Update to RTKLIB config file recommendations

I’ve just updated my “RTKLIB: Customizing the input configuration file” post from a few months ago with information on all of the new config parameters I have added to the demo5 code up through B26B.  I’ve also added more notes to some of the existing features based on my more recent experiences.

RTKLIB on a drone with u-blox M8T receivers

Drones are a popular application for RTKLIB and quite a few readers have shared their drone-collected data sets with me, usually with questions on how they can get better solutions. In many cases, the quality of this data has been fairly poor and it has been difficult to get satisfactory results. I was curious to understand why this environment tends to be so challenging since in theory a drone should have more open skies than just about any other application.

To do an experiment, I bought an inexpensive consumer drone from Amazon. I chose the X8C from Syma since it is a beginner model and a little larger than some options. I figured the larger size should make it better able to carry some extra weight.

After a few practice flights to get the hang of flying it, I used some duct tape and double-sided foam adhesive to attach a u-blox antenna and 90 mm diameter ground plane to the top of the drone and a u-blox M8T receiver with my custom CHIP data logger underneath where the camera usually goes. I used the landing gear as a spool to wind the unnecessary five meters of antenna cable which was the heaviest part of the whole setup. From a weight perspective, the Emlid Reach units would have been a better choice, but I wanted to collect data from the Galileo constellation of satellites as well as GPS and GLONASS so I used my CSG receiver with the newer 3.0 firmware. I used a second CSG receiver mounted on top of my car as the base station.  Here’s a stock photo of the drone on the left and after my modifications on the right.

drone1drone2a

Although the drone was able to lift the extra weight fairly easily, it seemed to affect the stability of the flight control system and after a few minutes the prop motors would start to fight each other. At that point the drone would start to descend even at full throttle and the drone would land hard enough to usually bounce on its side or back. Still I was able to make a number of short flights which were adequate for testing purposes.

Here’s the observation data for the first set of flights, base station on the left and drone on the right. Red ticks are cycle-slips and gray ticks are half-cycle ambiguities. Ideally, the drone data would look as clean as the base but as you can see it is significantly worse and it turned out to be unusable for any sort of reliable position solution.  The regions without cycle-slips in the drone observations are the times in between flights in which the drone is sitting on the ground.

drone3

Clearly, while the drone is flying, something is interfering with the GPS receiver or antenna, most likely either EMI or mechanical vibration. I could have used a fancy test stand and RF sniffer to try and locate the source of interference but since this blog is focused on low-cost solutions I just used some duct tape, a steel bar, and the RTKLIB code instead.

I used two types of duct tape, both the polyester/fabric type that everyone calls duct tape, and also the metal foil type that is actually used to repair or install ducts. I first used the non-metal duct tape to securely attach the landing gear to the heavy steel bar. The steel bar was convenient because it was easy to attach but anything heavy enough to prevent the drone lifting off under full throttle would work fine.

I then started an instance of RTKNAVI on my laptop and connected it to the receiver on the drone.  The goal was to simulate a complete drone flight while the drone was sitting on the ground and at the same time watch the RTKNAVI observations to detect any degradation of the measurements.  I used a wireless connection but a USB cable would have worked too.

Unfortunately RTKNAVI won’t plot the observation data in real time, but by selecting the tiny “RTK Monitor” box in the bottom left corner of the main RTKNAVI screen, then choosing “Obs Data” from the menu, I was able to get a continuously updating listing of the observations.  Cycle-slips show up as non-zero values in the first column with the I heading. I chose a location outdoors with open enough skies that any degradation in the observation data would be obvious.

drone4

I first observed the cycle-slip column with the drone powered down to verify I wasn’t getting cycle-slips on any but the lowest elevation satellites. I then continued to observe the cycle-slip column while sequencing through the steps required to fly the drone. I first powered on the drone, then powered on the transmitter, then issued the calibration/connection sequence, then turned on the throttle to low. So far, so good, no sign of cycle-slips. I then started moving the joysticks to issue steering commands which caused the motors to change speeds. All of a sudden I started getting cycle-slips, and the more aggressive the steering commands, the more cycle-slips I saw. Aggressive changes in throttle also caused cycle-slips but full throttle with no adjustments or steering commands was fine.

Next I moved just the antenna, then just the receiver away from the drone while issuing steering commands. Moving the antenna away had no effect but moving the receiver away eliminated the cycle-slips.

At this point my guess was that the interference was coming from the relatively high power switching in the motor control circuits and that the antenna ground plane was shielding the antenna from this interference but nothing was shielding the receiver. To test this theory, I attached a layer of the metal duct tape to the bottom of the drone to act as a shield between the drone controller board and the receiver.  I then re-attached the receiver to the bottom of the drone and re-ran the experiment. This time there were almost no cycle-slips even with the most aggressive steering.

I then removed the steel bar and ran a second set of short flights with the layer of metal tape still in place. I was a little concerned that the new shield would interfere with commands sent from the transmitter to the drone so I first tested everything while still on the ground and then kept the drone fairly close during the flight. Fortunately I didn’t see any sign of commands not getting through.

The drone data looked much cleaner in this flight!  Unfortunately, this time the base data was no good, with many simultaneous cycle-slips throughout the observation data. At this point I realized that I had placed the base station receiver directly on the top of the car when collecting the data, and the roof was very hot at the time. Usually I keep the receiver in the car to avoid this and only place the antenna on the roof. I have seen this kind of severe temperature effect cause simultaneous cycle-slips before but never to this extent. Again the data was completely unusable.

So, back out there again for a third round of flights. This time, everything looked much better. I still saw a few cycle-slips, especially when first applying the throttle at take-off, so my shielding was not perfect but a dramatic improvement over the first flight. The plots below show the results. The top two plots are position solutions for the z-axis. The top plot is with continuous ambiguity resolution and the middle plot is with fix-and-hold enabled. The bottom plot is the drone observation data.

drone5

I made three adjustments to the input configuration file from what I would normally use for my car based measurements.  First of all, since the drones have very open skies, I adjusted the minimum elevation angles from 15 degrees to 10 degrees.   Then, after plotting and observing the accelerations from an initial solution, I increased the vertical acceleration dynamics estimate (stats-prnaccelv) from 0.25 to 1.0.  Finally, because I was seeing slightly higher position variances in the initial solution than I usually do, I adjusted the position variance AR threshold (pos2-arthres1) from 0.004 to 0.1.  Both of these last two changes would make sense if the level of vibration were higher in the drone than I am used to seeing, which is quite likely.
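As a rough sketch, the three changes might look like this in the configuration file. The names stats-prnaccelv and pos2-arthres1 come from the text above; pos1-elmask is assumed here to be the elevation mask option, so check it against your own config file.

```
pos1-elmask        =10         # (deg) minimum elevation, was 15
stats-prnaccelv    =1.0        # (m/s^2) vertical acceleration, was 0.25
pos2-arthres1      =0.1        # position variance AR threshold, was 0.004
```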

Each time the drone landed/crashed due to the unstable flight control system it would bounce to the side or upside-down and that is what is causing the cycle-slips and switch from fix to float at the end of each flight. As you can see though in every case I quickly get another fix after I put the drone upright again. The fixes are solid enough to hold through the entire flight even in continuous mode for all but one of the flights. With fix-and-hold enabled all flights maintained 100% fix rate. The data is as good as or better than similar experiments where I have mounted the rover on top of a car.

This is not surprising since the skies are more open in this experiment. Having over twenty satellites available for ambiguity resolution also helped. I used all the satellites (GPS/GLONASS/Galileo/SBAS) for ambiguity resolution and took advantage of the new feature in the demo5 b26 code that cycles through all the satellites and will throw a single one out if it is preventing a fix. This will automatically occur anytime the number of satellites available for ambiguity resolution is greater than the config parameter “pos2-mindropsats” which defaults to twenty.

I have added the raw data and the configuration file to the  sample data set section at rtkexplorer.com

I imagine different drones will have different issues and not all will be as easy to fix as this one, but the method described here or something similar should be helpful any time drone data is not looking as clean as the base station data.

The fix I chose was very easy to implement but a better fix would probably have been to wrap just the receiver in a shield rather than placing a shield between the control board and the receiver. This would protect the receiver better and avoid affecting commands sent from the transmitter.  In fact, based on these results, I suspect shielding the GPS receiver on a drone is always a good idea.

Zero baseline experiment

I’ve been busy with some consulting projects recently so it’s been a while since my last post but I’m finally caught up and had some time to write something.  I thought I would describe an experiment I did to both try out the “fixed” mode in RTKLIB and also provide some insight into the composition of the errors in the pseudorange and carrier phase measurements in the u-blox M8T receiver.

The “fixed” mode is an alternative to “static” or “kinematic” in which the exact rover location is specified as well as the base position and remains fixed.  The residual errors are then calculated  from the actual position rather than the measured position.  I describe it in a little more detail in this post.  It is intended to be used as a tool to characterize and analyze the residual errors in the pseudorange and carrier phase measurements.

The basic idea in this experiment was to connect two M8T receivers to a single antenna and then compare residuals between the two receivers.  I first looked at the solution using one receiver as base and the other as rover (the zero baseline case) and then compared solutions between each receiver and a local CORS reference station about 8 km away.

The M8T is typically set up to use an active antenna, for which it provides power on the antenna input.  I was concerned about connecting the two antenna power feeds together, so to avoid this, I added a 47 pF capacitor in series with one of the antenna feeds to act as a DC block.  In the photo below, the capacitor is inside the metal tape wrapped around a male to male SMA adapter.  I cut the adapter in half, soldered the capacitor to each end, then wrapped it in metal tape as a shield.

zeroBL

The receivers are from CSG and each one is connected to a Next Thing CHIP single board computer, which logs the data and transmits it over wireless to my laptop.  They are very similar to the Raspberry Pi data loggers I described in a previous post, but the on-board wireless makes them more convenient to use.  At $9 each, they are also quite affordable, especially since they do not need micro SD cards like the Raspberry Pi Zeros.  They also have a built-in LiPo battery connector which can be convenient for providing power, although they can also be powered over the USB connectors.  They are also Linux based, so setting them up is very similar to the instructions in my Raspberry Pi post.

I first looked at the zero baseline case where I used one receiver as base and the other as rover.  In this case the two receivers are seeing exactly the same signal from the single antenna.  Any error contributions from the satellites, atmosphere, or antenna should cancel and the only contributor to the residual errors should be from the receivers.

I collected about an hour of measurement data from my back patio.  It is next to the house and nearby trees so as usual, the data quality is only mediocre and will include both some multipath and signal attenuation.  I prefer to look at less than perfect data because that is where the challenges are, not in the perfect data sets collected in wide-open skies.

Here are the residuals for a high elevation, high signal strength GPS satellite.  Standard deviations are 0.24 meters for the pseudorange and 0.0008 meters for the carrier phase.

zeroBL1

For a lower elevation GPS satellite with low and varying signal strength, the standard deviations increased to 0.46 meters for the pseudorange and 0.0017 meters for the carrier phase.  Notice how the residuals increase as the signal strength decreases as you would expect.

zeroBL2

The GLONASS satellites had noticeably higher residuals.  Here is an example of a satellite with high elevation and reasonable signal strength.  The standard deviations were 1.02 meters for pseudorange and 0.0039 meters for carrier phase, more than twice the GPS residuals.

zeroBL3

I’m not quite sure how relevant it is, but the ratio between the pseudorange residuals and carrier phase residuals in each case is roughly 300, the same value I have found works best for “eratio1”, the config file input parameter that specifies the ratio between the two.
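To make the comparison concrete, here is a minimal sketch of the ratio calculation using the standard deviations quoted above. The function names are made up for this illustration and are not part of RTKLIB.

```c
#include <stddef.h>

/* Ratio of pseudorange to carrier-phase residual standard deviation.
   Both inputs are in meters; returns 0 if the phase value is not positive. */
double residual_ratio(double sd_pseudorange_m, double sd_phase_m)
{
    if (sd_phase_m <= 0.0) return 0.0;
    return sd_pseudorange_m / sd_phase_m;
}

/* Mean ratio over n satellites (n pairs of standard deviations). */
double mean_residual_ratio(const double *sd_pr, const double *sd_cp, size_t n)
{
    double sum = 0.0;
    size_t i;

    if (n == 0) return 0.0;
    for (i = 0; i < n; i++) sum += residual_ratio(sd_pr[i], sd_cp[i]);
    return sum / (double)n;
}
```

For the three satellites above (0.24/0.0008, 0.46/0.0017, and 1.02/0.0039) the ratios work out to roughly 300, 271, and 262, so "roughly 300" is a fair summary, with the noisier satellites coming in a little lower.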

RTKLIB also estimates the standard deviations of the GLONASS satellite measurements at 1.5 times the standard deviations of the GPS satellites, which is less than the difference I see in the example above.

However, my numbers are for only the receiver components of the measurement errors, and I’m not sure exactly which components the RTKLIB config parameters are intended to include.

For the second experiment, I calculated solutions for both receivers relative to a CORS reference station about 8 km away.  In this case, I was curious to see how close the two solutions are as they will have common satellite, atmospheric, and antenna errors but will differ in their receiver errors.  The plot below shows the residuals for a GPS satellite from each solution plotted on top of each other.  As you can see the errors are quite a bit larger than before and the correlation between the two receivers is very high.  Based on the frequency of the errors, I suspect they are dominated by multipath which will vary roughly sinusoidally as the direct path and reflected path go in and out of phase with each other.

I found it quite impressive to see how repeatable the errors are between the two solutions.  It indicates, at least at this distance, that the errors from the receiver are small compared to the other errors in the system.

zeroBL4

Again, the GLONASS results were not as good as the GPS results and include a DC shift in the carrier phase whose cause I’m not sure of.

zeroBL5

I haven’t spent a lot of time trying to figure out how to best use the information in these plots but in particular I found the similarity between the two receiver solutions in the longer baseline experiment quite encouraging.  If the errors are dominated by multipath as I expect, then the baseline length isn’t that relevant and I would expect to see similar results with shorter baselines.  If that’s true, then it may be possible to derive information about the receiver’s environment from the multipath data.  People do this with more expensive dual frequency receivers to monitor things like tides and ground moisture content.  It would be interesting to see if it can be done with these low cost receivers.  Or maybe it already has been done …

New firmware, new satellites, new code

CSG Shop is now shipping all of their M8N and M8T u-blox receivers with the latest version 3 firmware.  This is not such good news for the M8N units since the raw measurements are scrambled and these receivers need to be downgraded to the previous firmware version before using with RTKLIB.  For the M8T receivers though, the new firmware is good news because it contains support for the Galileo satellite system.

I now have two of their M8T receivers with the new firmware and did a little testing to see how RTKLIB works with the Galileo measurements.  I did have to make a couple small changes to get things working.

First of all, the RNX2RTKP compile option for including the Galileo code was not enabled.  For some reason, all the other apps did have this option enabled.  To enable it, I had to add “ENAGAL” to the “Preprocessor Definitions” for C/C++ in the Project menu in Visual Studio.

The second issue I ran into was in the decode_rxmrawx() function that decodes the raw u-blox RXM-RAWX messages.  There is a line of code in this function that sets the code type based on the system.

raw->obs.data[n].code[0]=
       sys==SYS_CMP?CODE_L1I:(sys==SYS_GAL?CODE_L1X:CODE_L1C);

This line sets the code to L1X for Galileo, but that code type doesn’t seem to be supported by RTKLIB and the measurements in the RINEX file for the Galileo satellites get left blank.  Changing the “L1X” in the above statement to “L1C” resolves the problem.  That leaves an unnecessary check in the code but I will leave it there at least until I understand what it was supposed to do.  After that everything else worked fine including ambiguity resolution with the Galileo satellites, so that was quite encouraging.

Next,  I put the two receivers outside in the front yard to collect a longer set of data.  Not an ideal environment because they were close to the house but fairly open skies otherwise.  In an hour of data collection I got measurements from 11 GPS satellites, 8 GLONASS satellites, 5 Galileo satellites, and 3 SBAS satellites. After collecting the data, I processed it with various constellation options to see how they compared.  For all the solutions, I set ambiguity resolution mode to “continuous”, position mode to “kinematic”, and opened up the position variance threshold for AR (arthres1) to allow the solution to lock up as early as possible.  I also enabled all constellations for ambiguity resolution in each case.  Here’s how they compared:

satcombos1a

satcombos1b

Note that the time scale on the GPS-only plot is very different than the others since it took much longer to lock up than any of the other combinations.  With the GPS satellites only, there was an initial short false fix after 14 minutes, then a good fix at 27 minutes that lasted a few minutes but it did not get a solid fix until 43 minutes after it started.  That’s a long time to wait!  Adding a second constellation significantly improved the results, with solid fixes coming after two minutes with GLONASS added, five minutes with SBAS added, and seven minutes with Galileo added.  Adding a third constellation improved things even more, with times to first solid fix varying from 12 secs for GPS+SBAS+GLO, 3.5 min for GPS+GAL+GLO, and 6 min for GPS+SBAS+GAL.  Using all four constellations gave a time to first solid fix of 2 minutes, not the fastest time, but better than two out of three of the three constellation answers.

It is risky to conclude too much from one data set, but these results are consistent with other data I’ve looked at (for three constellations) that show the more satellites you use the better the answer.  This seems to make sense to me since more information should be better than less information.  However, I often hear or read recommendations to use only the GPS data for better results which I don’t understand.  If anyone has data to support that recommendation I would like to see it to understand it better.

I do sometimes see that one bad satellite can prevent or delay a solution no matter how many good satellites there are and this may be part of the answer.  The more satellites you use, the higher chance there is of having a bad one and RTKLIB is not great at rejecting a bad satellite.  The “arlockcnt” and “ARFilter” features do help prevent bad satellites from getting into the AR solution but they do not reject a satellite if it goes bad after being accepted into the solution.  I have added a new feature starting with the demo5 b26a code that does try to reject bad satellites after they have been accepted into the AR solution but have not had a chance to do a lot of testing on it yet.  It was enabled for the test above and may possibly have helped, I did not look into the details.  The feature is enabled by setting the “pos2-mindropsats” to a value lower than the number of satellites in the solution, in which case it will cycle through dropping all the satellites, one by one, one each epoch, and reject a satellite that has a large negative effect on the AR ratio.  If you try this feature, be careful not to set the minimum satellite threshold too low or you will increase the chances of a false fix.  I would recommend values no lower than 10 satellites.
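The selection step can be sketched as follows. This is only an illustration of the idea described above, not the demo5 implementation: the real code drops one satellite per epoch, while this sketch assumes the AR ratios with each satellite excluded have already been gathered into an array. The function name and the min_gain parameter are hypothetical.

```c
/* Pick a satellite to reject, given the AR ratio obtained with all
   satellites (ratio_all) and the ratios obtained with each satellite
   excluded in turn (ratio_without[i]).  Returns the index of the satellite
   whose removal improves the ratio the most, provided the resulting ratio
   exceeds min_gain times ratio_all, or -1 if nothing should be dropped.
   min_sats plays the role of the minimum satellite threshold; min_gain is a
   hypothetical tuning input for this sketch. */
int pick_sat_to_drop(const double *ratio_without, int nsats,
                     double ratio_all, int min_sats, double min_gain)
{
    int i, best = -1;
    double best_ratio = ratio_all * min_gain;

    if (nsats <= min_sats) return -1;   /* too few satellites: keep them all */
    for (i = 0; i < nsats; i++) {
        if (ratio_without[i] > best_ratio) {
            best_ratio = ratio_without[i];
            best = i;
        }
    }
    return best;
}
```

Requiring the satellite count to exceed the threshold before anything is dropped reflects the caution above: with too few satellites left, removing one makes a false fix more likely.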

I have released a new version of the demo5 code (b26b) with the fixes for Galileo, a couple of new features and fixes, and GUI updates for RTKPOST and RTKNAVI for all the new input parameters for both b26a and b26b codes.  The binaries and a list of the changes are available here.  The source code is available on my Github page.


Demo5 b26a code release

I’ve just released a new version of the demo5 code.  It has the time tag adjustments for RTCM conversion described in the last post as well as a few new features that I will describe in future posts.

You can download the binaries from here.  There is also a short description of the new features on that page.  The source code is available on my  Github page.

A fix for the RTCM time tag issue

In my last post I described a problem with a loss of some of the raw measurement information caused by the lack of resolution in the time tags in the RTCM format.  Since the RTCM format is typically used to reduce bandwidth requirements in real-time applications, this can cause real-time solutions to fail even though post-processing the same raw data, without the translation to RTCM, gives good results.  In this post I will describe a fix for this problem.

First of all I want to thank Felipe Nievinski, Igor Vereninov from Emlid, and Anthony Woolridge for their comments to the last post that pointed me to the solution.  They make this a collaborative effort between the U.S., Brazil, Russia, and the U.K!  It still amazes me how enabling the internet can be!

I’ll start by showing again this example of a RINEX output from an M8T receiver with the official raw measurement output (RXM_RAWX) and the debug raw measurement output (TRK_MEAS) enabled simultaneously.  I think this provides a good insight into what is going on.  The RXM_RAWX message is the top 5 lines and the TRK_MEAS message is the bottom 5 lines for a single epoch.  The first line in each message is the time stamp and the following lines are the measurements for each satellite.  In the satellite measurements, the second column contains the pseudorange value.

trkmeas1

The time stamp specifies the receiver time of the received signals and the sixth column is the number of seconds.  For the TRK_MEAS message these values are always aligned to round numbers based on alignment to the sample rate.  For example in this case the measurement rate was 5 Hz and all the time stamps occur on multiples of 0.2.  This is because they are based on the raw receiver clock without any corrections.

The time stamps from the RXM_RAWX messages however often differ from the round numbers by small arbitrary amounts.  This is because the receiver has estimated the error in its own clock and adjusted the measurements to remove this error.  In this case the estimate of clock error is 0.001 seconds and so the time stamp is adjusted by this value (18.8000000 to 18.7990000).

To keep the time stamps consistent with the other parts of the measurement, the clock error also needs to be removed from the pseudorange and carrier phase values since they are based on the difference in time between satellite transmission and receiver reception and will include any errors in the receiver clock.  We see from the above observations that the pseudorange measurement for satellite G24 has been adjusted from 22675327.198 to 22375547.970, a difference of 299779.228 meters.   The speed of light is 299792458 meters per second so the clock error of 0.001 seconds is equivalent to 299792.458 meters,  a value very close to the amount that the pseudorange was adjusted by.
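The arithmetic in this paragraph can be written as a pair of small helper functions. This is just a sketch for checking the numbers; the function names are invented here, and only the CLIGHT constant name is borrowed from the RTKLIB code.

```c
#define CLIGHT 299792458.0   /* speed of light (m/s) */

/* Range equivalent (m) of a receiver clock offset (s). */
double clock_offset_to_range(double toff_s)
{
    return toff_s * CLIGHT;
}

/* Clock offset (s) implied by a change in pseudorange (m). */
double range_to_clock_offset(double dP_m)
{
    return dP_m / CLIGHT;
}
```

Calling clock_offset_to_range(0.001) gives 299792.458 m, and going the other way, the observed 299779.228 m adjustment corresponds to a clock offset of roughly 0.99996 msec, which is why the two numbers agree so closely.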

A similar adjustment needs to be made to the carrier phase measurement as well but it is not as easy to see in this example because the carrier phase measurements are relative rather than absolute and the two messages in this case use different references.  The carrier phase measurements are in cycles, not meters, so the frequency of the carrier phase needs to be included in the translation from clock error to carrier phase cycles but is otherwise the same as the pseudorange adjustment.  In equation form, the adjustments are:

P = P - toff*c
L = L - toff*freq

where P=pseudorange (meters), L=carrier phase (cycles), toff=the receiver’s estimate of its clock error (seconds), c=speed of light, and freq=carrier frequency

So, basically, the receiver is trying to help us out by removing its best estimate of the clock error from the measurements.  This is unnecessary since RTKLIB is quite good at estimating this clock error on its own, but by itself this adjustment does not cause a problem.

It is when the adjusted measurement is translated to RTCM that we get in trouble.  The resolution of the time stamps in the RTCM format is 0.001 seconds.  In this particular example it would not be an issue because the error is exactly 0.001 seconds or one count of the RTCM format.  Most of the time, however, this error is not an exact multiple of 1 millisec.

For example, here is a time stamp for the data set described in the previous posts.

> 2017  1 17 20 31 48.9995584  0  9

And here is the same time stamp after being translated to RTCM and then to RINEX

> 2017  1 17 20 31 49.0000000  0  9

As you can see, the clock adjustment was less than half a millisec so was completely lost in the roundoff to the RTCM format.  However, the adjustments the receiver made to the pseudorange and carrier phase are still present in those measurements.  We now have a problem because the clock correction is in part of the measurement and not the other pieces.  RTKLIB cannot correct for this lack of consistency within the measurement.

So, how do we avoid this problem?  Fortunately, RTKLIB has an option to adjust the time stamps to round values using the same equations described above to adjust time stamp, pseudorange, and carrier phase to maintain consistency within the measurement.   I imagine it was put in specifically to solve this problem. We can invoke this option by adding “-TADJ=0.001” in the “Options” box in the “Conversion Options” menu in STRSVR or using the “-opt” option in the command line with STR2STR.  Note that this option needs to be set in the conversion from raw binary format to RTCM format, not the conversion from RTCM to RINEX.  It is possible to set this option when converting from RTCM to RINEX but this won’t help because the damage has already been done in the earlier conversion.

Unfortunately, there is a bug in the implementation of this option in RTKLIB, at least for the u-blox receivers, so by itself, this is not enough.  The problem is that invalid carrier phase measurements are flagged in RTKLIB by setting the carrier phase value to zero.  The time stamp adjustment feature adjusts these zero values slightly so they are no longer recognized as invalid.  They end up getting included in the output as valid measurements and corrupt the solution.

Fortunately, the fix for this bug is very simple.  Here is the code in the decode_rxmrawx() function in ublox.c that makes the adjustment:

/* offset by time tag adjustment */
if (toff!=0.0) {
    fcn=(int)U1(p+23)-7;
    freq=sys==SYS_CMP?FREQ1_CMP:
         (sys==SYS_GLO?FREQ1_GLO+DFRQ1_GLO*fcn:FREQ1);
    raw->obs.data[n].P[0]-=toff*CLIGHT;
    raw->obs.data[n].L[0]-=toff*freq;
}

If we add a check to the first line of code to skip the adjustment if the carrier phase is zero, then all is fine.

if (toff!=0.0&&cp1!=0) {
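To see the effect of the guard in isolation, here is a standalone sketch of the same adjustment. It is not the RTKLIB function itself: the function name and signature are made up for the example, and it handles a single observation with the carrier frequency passed in directly.

```c
#define CLIGHT 299792458.0   /* speed of light (m/s) */

/* Apply the time-tag offset toff (s) to one pseudorange P (m) and carrier
   phase L (cycles) at carrier frequency freq (Hz).  A carrier phase of
   exactly zero marks an invalid measurement, so that observation is left
   untouched, mirroring the guarded condition above.
   Returns 1 if the adjustment was applied, 0 if it was skipped. */
int adjust_for_time_tag(double *P, double *L, double toff, double freq)
{
    if (toff == 0.0 || *L == 0.0) return 0;
    *P -= toff * CLIGHT;
    *L -= toff * freq;
    return 1;
}
```

Without the zero check, an invalid carrier phase of 0 would become -toff*freq, about -1575420 cycles for a 1 msec offset at the GPS L1 frequency, and would no longer be recognized as invalid.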

Below is the original solution after RTCM conversion on the left and with time tag adjustment and the bug fix on the right.  If you compare the solution on the right to the solution with no  RTCM correction in the previous post you will see they are nearly identical.

timetag

I am still wary of using RTCM because of its other limitations described in the last  post, particularly the loss of the half cycle invalid flag and the doppler information, but I believe this fix eliminates the most serious issue that comes from using RTCM.

I will release a new version of the demo5 code with this fix sometime in the next few days.  It will take a little while because I also want to include some other features that have been waiting in the pipeline.  If you want to try the fix right away, you just need to  modify the one line of code described above and rebuild.

Update 2/2/17:    I have taken Anthony Woolridge’s suggestion and modified the RTCM conversion code to automatically adjust the pseudorange and carrier phase measurements to compensate for any round off done to the time tag.  This means it is not necessary to set the time-tag adjust receiver option.  This change is currently checked into my Github page and I hope to post new executables in the next couple of days.
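The idea behind this change can be sketched as below. This is not the code checked into Github, only an illustration under the same equations given earlier; the struct, the function name, and the choice to leave zero (invalid) values untouched are assumptions made for the example.

```c
#define CLIGHT 299792458.0   /* speed of light (m/s) */

/* One observation: time tag (s), pseudorange (m), carrier phase (cycles). */
typedef struct {
    double t, P, L;
} obs_sketch_t;

/* Round the time tag to the nearest multiple of step (s) and shift P and L
   by the same amount of time so the three values stay consistent.
   freq is the carrier frequency (Hz).  Zero (invalid) values are not shifted. */
void round_time_tag(obs_sketch_t *o, double step, double freq)
{
    double n = o->t / step;
    long long k = (long long)(n >= 0.0 ? n + 0.5 : n - 0.5);
    double t_new = (double)k * step;
    double dt = t_new - o->t;          /* time added to the tag by rounding */

    o->t = t_new;
    if (o->P != 0.0) o->P += dt * CLIGHT;
    if (o->L != 0.0) o->L += dt * freq;
}
```

With the 48.9995584 time stamp from the example above and a step of 0.001, the tag moves forward by 0.0004416 seconds, so the pseudorange is shifted by roughly 132 km. That is the size of the inconsistency left in the measurement when the time tag is rounded on its own.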

Limitations of the RTCM raw measurement format

In the last post I described a process to troubleshoot problems occurring in real-time solutions that are not seen in post-processing solutions for the same data.  I collected a data set demonstrating this issue, and traced the problem to the conversion of the measurement data from raw binary format to the RTCM format.  This conversion is typically done in real-time applications to compress the data and minimize bandwidth requirements for the base to rover real-time data link.  In this post I will look into that example in more detail and also explore some of the limitations of the RTCM format.

First, it is important to understand that the conversion to RTCM is not a lossless process. There are several ways in which information is lost in this process.  In some cases these losses are probably not significant but in other cases it is not so clear that is the case.

So let’s look at some of those differences.  We actually have three formats to compare here: the raw binary format from the u-blox receiver, the RTCM format, and the RINEX format.  Both the RTCM and RINEX formats contain less information than the raw binary format and information is lost when the conversion is made to either format.  The reason I include the RINEX format here is because in the post-processing procedure, the measurements, whether they come from the raw binary format or the RTCM format, must first be converted to RINEX format before being input into the solution.   What I see with my example data set that fails in real-time is that it looks good in post-processing if the raw measurements are converted directly from raw binary to RINEX but fails if the raw measurements are first converted to RTCM and then the RTCM is converted to RINEX.  Therefore it is very likely that there is something critical that is lost in the conversion to RTCM that is not lost in the conversion to RINEX.

The official RTCM spec is not freely available on the internet (it must be purchased), so I have relied on this document from Geo++ for the RTCM details.  Here is a chart of the most significant differences I am aware of between the three formats.  In the case of RTCM, these numbers apply only to the older 1002/1010 messages used by Reach and most other systems, not the newer MSM messages.

Measurement              | U-blox binary                   | RINEX 3.0              | RTCM 3.0
Pseudorange resolution   | double precision floating point | 0.001 m                | 0.02 m
Carrier phase resolution | double precision floating point | 0.001 cycles = 0.2 mm  | 0.5 mm
Doppler resolution       | single precision floating point | 0.001 Hz               | Not supported
Time stamp resolution    | double precision floating point | 100 nsec               | 1 msec
Lock time                | 1 ms                            | Lock status only       | Variable (> 1 ms)
Half cycle invalid       | Supported                       | Supported              | Not supported
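
To make the resolution numbers in the chart a little more concrete, here is a small Python sketch that rounds a made-up measurement to the nearest multiple of each resolution and prints the resulting round-off error. The measurement values and the helper name are invented for illustration, and real encoders may truncate rather than round.

```python
def quantize(value, step):
    """Round value to the nearest multiple of step, as a fixed-resolution field would."""
    return round(value / step) * step

pseudorange_m = 21_345_678.123456   # made-up pseudorange (m)
phase_m = 21_345_678.123456         # made-up carrier phase expressed in metres

cases = [
    ("RINEX pseudorange, 0.001 m", pseudorange_m, 0.001),
    ("RTCM pseudorange, 0.02 m", pseudorange_m, 0.02),
    ("RTCM carrier phase, 0.5 mm", phase_m, 0.0005),
]
for label, value, step in cases:
    err_mm = (quantize(value, step) - value) * 1000.0
    print(f"{label}: round-off error {err_mm:+.3f} mm")
```

The worst-case error from rounding is half of the step size, so about 10 mm for the RTCM pseudorange and 0.25 mm for the RTCM carrier phase.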

 

To figure out which (if any) of these differences is responsible for the failure I needed a way to run the solution multiple times, each run done with only a single difference injected into the conversion.

I already had a MATLAB script I had written previously to parse a RINEX observation file into a set of variables in the MATLAB workspace.  So I wrote a second script that goes the other way, from variables in memory to a RINEX observation file.  Once I had done this, I could read in the good RINEX observation file translated directly from the u-blox binary file, modify a single measurement type, write it back to a new RINEX observation file, then run this file through a solution.
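
The idea of injecting one difference at a time can be sketched in a few lines of Python. This is not the MATLAB script described above, and it skips the RINEX reading and writing entirely. The `Obs` layout, the field names, and the names of the degradations are hypothetical, chosen only to show the structure of the experiment.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Obs:
    """One satellite observation at one epoch (hypothetical layout)."""
    time: float                        # receiver time tag (s)
    pseudorange: float                 # m
    phase: float                       # cycles
    doppler: Optional[float]           # Hz, None if not carried
    half_cycle_valid: Optional[bool]   # None if the flag is not carried

def quantize(value, step):
    """Round value to the nearest multiple of step."""
    return round(value / step) * step

# Each entry injects exactly one RTCM-style difference and leaves everything else alone.
DEGRADATIONS = {
    "pseudorange_0.02m": lambda o: replace(o, pseudorange=quantize(o.pseudorange, 0.02)),
    "no_doppler": lambda o: replace(o, doppler=None),
    "no_half_cycle_flag": lambda o: replace(o, half_cycle_valid=None),
    "time_tag_1ms": lambda o: replace(o, time=quantize(o.time, 1e-3)),
}

def make_variants(observations):
    """Return {name: degraded copy of observations}, one variant per difference."""
    return {name: [fn(o) for o in observations] for name, fn in DEGRADATIONS.items()}

# Made-up observation at the first time stamp from this post.
good = [Obs(49.9995584, 21_345_678.123, 112_233_445.678, -1234.5, True)]
variants = make_variants(good)
for name, obs_list in variants.items():
    print(name, obs_list[0])
```

Each variant would then be written out as its own observation file and run through the solution, so that a failure can be pinned on a single change.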

My first guess was that the missing “Half Cycle Invalid” flag would prove to be the culprit, since I have seen this before with the M8N receiver as described in this post.  Although I suspect that this probably is true in some cases, it did not make a difference with this data set.  My next suspect was the missing doppler measurements, since RTKLIB uses the doppler measurements when estimating the receiver clock bias, but again, it was not the case.  In the end it turned out to be my very last guess that made the difference, and that was the time stamp resolution.  So much for me thinking I was starting to get the hang of this RTK stuff!  The differences in the time stamps were so small relative to the distance between them that I had unconsciously ignored them.  For example, the first two time stamps in the good measurements were 49.9995584 and 50.999584 but the time stamps in the failing measurements had been rounded off to 50.0000000 and 51.0000000.  Even after discovering that this round-off error makes a difference, it still is not obvious to me why this is true.  In any GPS solution, the receiver clocks are assumed to lack sufficient accuracy to be relied upon without correction, and the clock errors are one of the unknowns in the solution along with the three position axes.  I don’t know why RTKLIB does not correctly estimate this error in its clock bias estimate and remove it.  Maybe one of you guys who has been doing this a lot longer than I have can explain this?
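
To get a feel for the size of this round-off, the Python sketch below computes the time tag error for the first epoch and converts it to two distances: how far light travels in that time, and how much the range to a satellite changes in that time if its line-of-sight range rate is 500 m/s. The range rate is a made-up but plausible value, and this is only meant to give a sense of scale, not to explain the failure.

```python
C_LIGHT = 299792458.0                  # speed of light (m/s)
L1_WAVELENGTH = C_LIGHT / 1575.42e6    # GPS L1 wavelength (m), about 0.19

def time_tag_error(t, step=1e-3):
    """Error introduced by rounding time tag t (s) to the nearest multiple of step."""
    return round(t / step) * step - t

dt = time_tag_error(49.9995584)
range_rate = 500.0   # m/s, illustrative line-of-sight range rate (assumption)

print(f"time tag error:        {dt * 1e3:.4f} ms")
print(f"light travel distance: {C_LIGHT * dt:.0f} m")
print(f"range change at {range_rate:.0f} m/s: {range_rate * dt:.3f} m "
      f"({range_rate * dt / L1_WAVELENGTH:.2f} L1 cycles)")
```

Under these assumptions the range change over the round-off interval is on the order of an L1 wavelength, and it would differ from satellite to satellite because each one has its own range rate.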

Just to be sure it wasn’t a fluke, I started the data processing at three different times in the data set, and I also ran additional solutions with the sign of the error in the time stamps reversed.  In every case, regardless of sign or starting location, the solution failed to get a fix when the error was present and succeeded when the error was not there.

I have read somewhere that more expensive receivers will typically align their time stamps to round numbers, which would avoid the need for as much resolution.  The only expensive receivers I have access to are the CORS stations so I took a look at data from a couple of them.  Sure enough, it appears to be true that they do use round numbers for their time stamps.  If this is more generally true it might explain why the RTCM spec does not have sufficient resolution for the u-blox data but would work fine for more commonly used, higher priced receivers.

I was curious why the u-blox time stamps don’t occur at round numbers so I took a look at the hardware description spec.  I found this explanation:

“In practice the receiver’s local oscillator will not be as stable as the atomic clocks to which GNSS systems are referenced and consequently clock bias will tend to accumulate. However, when selecting the next navigation epoch, the receiver will always try to use the 1 kHz clock tick which it estimates to be closest to the desired fix period as measured in GNSS system time”

I interpret this to mean that the receiver is aware of alignment error in its clock source relative to GPS system time, and it adjusts the time stamp values to include its estimate of that error.
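
Here is a small Python sketch of that interpretation. It assumes the receiver’s 1 kHz ticks fall at a fixed offset from round milliseconds of GNSS time and that the receiver picks the tick nearest to the desired epoch. The function, its parameters, and the offset value are hypothetical; the offset was simply chosen so that the result lands on the first time stamp quoted earlier in this post.

```python
def nearest_tick(desired_epoch, tick_offset, tick_period=1e-3):
    """GNSS time (s) of the clock tick closest to desired_epoch.

    tick_offset is the receiver's estimate of where its ticks fall relative
    to round milliseconds of GNSS time (hypothetical parameter).
    """
    k = round((desired_epoch - tick_offset) / tick_period)
    return k * tick_period + tick_offset

offset = -0.0004416   # s, illustrative clock offset
for epoch in (50.0, 51.0):
    print(f"desired {epoch:.1f} s -> measurement at {nearest_tick(epoch, offset):.7f} s")
```

With a zero offset the ticks land exactly on the desired epochs, which is the round-number behavior seen in the CORS data.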

Something else I am curious about but have not had time to investigate in any detail is how this issue is affected by differences between the RXM_RAWX measurements, which are what is normally used with the M8T receiver, and the debug TRK_MEAS messages, which also contain the raw measurements and are the only raw measurement messages available on the M8N receiver.  Looking at several data sets from both the M8N and M8T, it appears that the TRK_MEAS time stamps for both receivers are aligned to round numbers while the RXM_RAWX measurements are not aligned.  This means that the TRK_MEAS messages would not be affected by the lack of resolution in the RTCM format.   However, the TRK_MEAS measurements lack the compensation for inter-channel frequency delays in the GLONASS measurements and so would not be a good substitute.  Maybe it’s possible to combine the two into a single set of measurements?  The two include different references and clock errors so it is not obvious if that is possible. Below is an example of partial TRK_MEAS and RXM_RAWX outputs for the same epoch when both were enabled, TRK_MEAS on the top, and RXM_RAWX below.

[Image: trkmeas1, partial TRK_MEAS output (top) and RXM_RAWX output (bottom) for the same epoch]

Another avenue I considered is using the newer MSM messages (1077, 1087) in the RTCM format instead of the current 1002/1010 messages that Reach and most other users are using.  These have higher resolutions for the pseudorange and carrier phase, and include doppler and half cycle invalid flags.  Unfortunately, the resolution for the time stamps does not seem to have changed, or if it has, it hasn’t changed enough to see a difference in the output for the small deltas in my example.

There also appears to be a bug in the RTKLIB implementation of the encode or decode of these messages which sometimes causes the number of integer cycles in the carrier phase measurements to be incorrect (the fractional part is fine).    This bug appears to be present in both the official 2.4.3 release and the demo5 code but some of the changes I have made to the u-blox translation in the demo5 code seem to have increased the frequency of these incorrect measurements.
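
One way to spot this kind of corruption is to compare the carrier phase before and after the encode/decode round trip and split the difference into a whole-cycle part and a fractional part. The Python sketch below does that on made-up numbers. The function names, the tolerance, and the sample values are all illustrative assumptions; this is not RTKLIB code.

```python
def split_phase_error(original_cycles, decoded_cycles):
    """Split a carrier phase difference into whole-cycle and fractional parts."""
    diff = decoded_cycles - original_cycles
    whole = round(diff)
    return whole, diff - whole

def find_integer_jumps(pairs, frac_tol=0.01):
    """Indices where the decoded phase is off by whole cycles but the fraction agrees."""
    bad = []
    for i, (orig, dec) in enumerate(pairs):
        whole, frac = split_phase_error(orig, dec)
        if whole != 0 and abs(frac) < frac_tol:
            bad.append(i)
    return bad

# Made-up (original, decoded) phase values in cycles.
# The second pair is off by exactly 1024 cycles; the third differs only by round-off.
samples = [
    (112233445.678, 112233445.678),
    (112233999.125, 112235023.125),
    (112234500.500, 112234500.501),
]
print(find_integer_jumps(samples))
```

A measurement flagged this way has a correct fractional phase but the wrong number of integer cycles, which matches the symptom described above.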

Reach does use the MSM messages for the SBAS measurements although it does not need to since the 1002 message supports SBAS as well as GPS.   It is possible this could introduce a problem for users in North America where the WAAS satellites used for SBAS correction include carrier phase measurements.  Users in Europe would not see this problem because the EGNOS satellites used for SBAS correction in Europe don’t provide the carrier phase.  I did not see any corruption in the SBAS carrier phase measurements in the initial RTCM data in this example but after I enabled the 1077 and 1087 measurements, I did see corruption in the measurements in all three systems.

So, unfortunately this is still somewhat a work in progress and I don’t have any easy answer how to fix this.  I am hoping some of the experts out there can comment and help put some of the pieces of the puzzle together.

In the meantime I would suggest using the u-blox binary format for the base-rover data-link instead of the RTCM format.  The bandwidth requirements will be 2.5 to 3 times higher but some of this can be offset by reducing the measurement sample rate for the base station.
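
As a rough illustration of that trade-off, the Python sketch below estimates the bandwidth of a raw binary link relative to an RTCM link. It assumes bandwidth scales linearly with the base station sample rate and ignores navigation messages and other overhead. The 5 Hz and 1 Hz rates are made-up examples; only the 2.5 to 3 size ratio comes from the text.

```python
def relative_bandwidth(size_ratio, rtcm_rate_hz, raw_rate_hz):
    """Raw-link bandwidth relative to an RTCM link (1.0 means the same bandwidth)."""
    return size_ratio * raw_rate_hz / rtcm_rate_hz

for ratio in (2.5, 3.0):
    same_rate = relative_bandwidth(ratio, 5.0, 5.0)
    slower_base = relative_bandwidth(ratio, 5.0, 1.0)
    print(f"size ratio {ratio}: {same_rate:.2f}x at the same rate, "
          f"{slower_base:.2f}x with a 1 Hz raw base")
```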

I believe a long-term fix is going to require two things.  The first is a workaround for the time tag resolution issue described in this post.  But even with that fixed, the half cycle valid flag and doppler information will still be lost.  I haven’t done any tests to understand how critical the doppler measurements are, but I have demonstrated in the post I referenced above that losing the half cycle valid flag can definitely degrade the solution.  Fortunately, the newer MSM RTCM messages do include both the half cycle valid flag and doppler.  However, they do not appear to be usable until the bug in the encode/decode of the carrier phase data is fixed, so that is the second thing that will have to happen.

On the other hand, I suspect most real-time RTK systems do use RTCM and manage to live with its limitations so maybe I am overreacting here.  I would be interested in other people’s opinions and experiences with RTCM on u-blox or other receiver types.