Comparison of the u-blox M8T to the u-blox M8P

I recently acquired a couple of u-blox C94-M8P receivers and so I am now able to do a direct comparison between the u-blox M8T and M8P modules, something I’ve been wanting to do for a while.

I believe the hardware between the two receivers is identical; the differences are only in firmware.  The most significant difference in firmware is that the M8P includes an internal RTK solution.  In order to squeeze this into the firmware, however, they had to remove support for the Galileo and SBAS constellations, so this is another fairly significant difference.

Cost, of course, is also different.  The M8T receivers I usually use are available from CSGShop for $75.  CSG sells an M8P receiver for $240 or you can buy a kit with two C94-M8P eval boards from u-blox for $400.  The C94-M8P boards also include on-board radios.

In this post, I will describe how I used RTKLIB and a cell phone hot spot to connect the M8Ps rather than using the internal radios.  I will also compare the RTKLIB solutions for a pair of M8T receivers with the internal u-blox RTK solution for a pair of M8P receivers. I hope to compare the RTKLIB solution to the internal solution for a pair of M8Ps in a future post.  If you are mostly interested in the M8T to M8P comparison results, you can jump directly to the end of this post.

To set up the experiment I first connected both a C94-M8P receiver and a CSG M8T receiver to the GPS survey antenna on my roof using a signal splitter.  I then connected both receivers to a laptop with USB cables.  I configured the M8P using u-center following instructions in the C94-M8P Setup Guide with a few modifications.  First of all, I normally use an NTRIP caster over a cell phone hot spot for my real-time data link, so I didn't want to bother with the on-board radios.  I disabled the radios following the instructions in the setup guide.  This involves removing a plastic cover, soldering a wire to a capacitor, drilling a hole in the plastic cover to run the wire, then plugging the other end into a pin on the connector.  It is also possible to do this by connecting an external voltage and ground to the correct pins on the connector, but a simple jumper option would have been a lot more convenient.

The setup instructions are intended to run the configuration and solution output messages over the USB port while running the RTCM raw observation messages between receivers over the UART interface.  The UART interface is internally connected to the radios.  Since I am not using the radios, I wanted to run all communication over the USB interface to avoid extra cables.  To do this, I disabled the UART interface and configured the USB interface for UBX messages in and RTCM messages out using the UBX-CFG-PRT configure command from u-center.  Following the C94-M8P setup instructions, I then specified a fixed base station location with the UBX-CFG-TMODE3 command and enabled the 1005, 1077, 1087, and 1230 RTCM messages, which include the raw observations, base station location, and GLONASS biases.

I set up the base station M8T receiver as I usually do to output the raw observations using the UBX-RXM-RAWX messages.

I then started two copies of the RTKLIB STRSVR app on the base station laptop and streamed both sets of base observations to an NTRIP server as I've described in an earlier post, using the free RTK2GO.com community NTRIP server.  With this setup, I can receive the NTRIP streams anywhere I have cell phone coverage using a cell phone hot spot and a laptop connected to the two rovers, which lets me test over much longer distances than the radios would allow.
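For reference, roughly equivalent str2str command lines would look something like this (the COM ports, mount point names, and password here are just placeholders; both streams are passed through unchanged, the M8P base already outputting RTCM3 and the M8T base outputting raw UBX):

# M8P base: RTCM3 messages generated by the receiver, passed through to rtk2go
str2str -in serial://com3:115200 -out ntrips://:password@rtk2go.com:2101/MY_M8P_BASE &

# M8T base: raw UBX observations, also passed through unchanged
str2str -in serial://com4:115200 -out ntrips://:password@rtk2go.com:2101/MY_M8T_BASE &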

One thing to be aware of is that there are separate versions of the receiver firmware for the M8P base and M8P rover.  I didn't realize this at first.  It turned out that both of my receivers came loaded with the rover firmware, so I spent an embarrassingly long time unsuccessfully trying to configure one of them as a base station.  Once I realized the problem, I was able to fairly quickly download the base station firmware from the u-blox website and use u-center to load it onto the receiver.

Next, I attached the two rover receivers to a laptop using USB cables and to a single antenna using a signal splitter.  To make the results more comparable to some of my other recent experiments I used the same GPS-500 L1/L2 antenna from SwiftNav that I used for those experiments.  This is a more expensive antenna than generally would be used with these receivers but I wanted to avoid mixing differences between antennas with differences between receivers.

The M8P rover does not need much configuration since it will by default process any incoming RTCM messages as inputs to an RTK solution.  I did configure the USB port to support NMEA, UBX, and RTCM messages in and UBX and NMEA messages out.  For the static rover test, I left the position message output rate at the default 1 Hz, but increased it to 5 Hz for the dynamic rover test.  I suspect this only affects the message output rate and not the solution itself, since the solution is probably run at a faster internal rate.

The only other setting I modified in the M8P rover receiver was the dynamic model of the navigation mode using the UBX-CFG-NAV5 message.  I set it to “Pedestrian” for my static rover test and “Automotive” for my moving rover test.

I then started another copy of STRSVR on the rover laptop to stream the M8P base station data from the NTRIP server to the M8P rover over the USB COM port.

At this point, getting the solution output messages from the M8P rover is a little tricky.  Only a single user can attach to a Windows COM port at one time and the STRSVR app is not fully bi-directional.  One solution might be to use u-center to stream the NTRIP data and receive the rover messages but I did not verify if this is possible.

Instead, I used a feature of STRSVR that allows you to pipe the data coming from the other direction to a TCP/IP port which can then be processed by either another copy of STRSVR, or plotted with RTKPLOT, or used as an input to RTKNAVI, or all three at once.  Below, on the left, I show how I modified the STRSVR setup that streams the base data from the NTRIP client to the rover by checking the “Output Received Stream to TCP Port” and selecting port 1000.  On the right, I show the STRSVR setup necessary to stream this incoming data from the rover to a file.  This is a convenient work-around for the problem of only being able to connect one user at a time to a COM port.

strsvr2
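The second STRSVR instance could just as well be a str2str command reading the TCP repeat of the COM stream; a minimal sketch (port 1000 as configured above, output file name arbitrary) would be:

str2str -in tcpcli://localhost:1000 -out rover_m8p_output.log

RTKPLOT or RTKNAVI can be pointed at the same tcpcli://localhost:1000 stream at the same time, which is what makes this work-around so convenient.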

In my case I configured the rover receiver to send both the M8P internal solution position using NMEA messages and the raw observations using UBX messages so that I could post-process the raw observations later with RTKLIB.  In general though, it is only necessary to send the NMEA messages.  I configured the rover to send GGA and RMC NMEA messages, but others will work as well.

For the M8Ts, I ran a real-time RTKLIB solution using RTKNAVI and my normal kinematic solution configuration for both static and moving rover tests.  As usual, I used the demo5 version of RTKLIB for this test.

So, how did they compare?  First of all, there are a few differences to consider between the solutions.  The M8P solution only uses GPS and GLONASS satellites since the SBAS and Galileo constellations have been disabled in the M8P firmware, as I mentioned earlier.  This should give the M8T solution an advantage.  The M8P also has limited processing power relative to the laptop running the RTKLIB solution, so this may also give the M8T solution an advantage.  On the other hand, I'm sure that u-blox has lots of smart people who know all the internal details of the hardware, which should give the M8P solution an advantage.

For my first comparison, I did a static rover test with the antenna located on a tripod a few meters from the house and partially obstructed by trees.  To avoid different starting conditions between the two receivers, I connected both receivers to the antenna and waited until they both converged to fixed solutions before starting the test.  I then disconnected the antenna for about 30 seconds, and then reconnected it.  This will cause both receivers to lose lock with all satellites and the solutions will have to re-converge from scratch, so this should be a fair comparison.  I disconnected and reconnected the antenna several times to compare the times to re-acquire fix and the accuracies of the fixes.

The results are plotted below, the M8P internal solution on the top and the real-time RTKLIB M8T solution on the bottom.  I re-connected the antenna five times.  Once, the internal M8P solution re-acquired fix slightly quicker than the RTKLIB M8T solution; once, the M8T re-acquired fix slightly quicker than the M8P.  The other three times, the M8T re-acquired fix significantly quicker than the M8P.  The M8P times to re-acquire varied from 52 to 220 seconds.  The M8T took from 50 to 76 seconds.

The M8T RTKLIB solutions had no false fixes, while the M8P internal solutions had false fixes with errors of several meters lasting over five seconds in two of the five tests.  The false fixes are circled in red in the plot below.

m8p1

The E-W and N-S axes matched within one centimeter between the two solutions.  The U-D axis at first appears to differ by 21.5 meters between the two solutions but this is because I specified the RTKLIB vertical solution to be relative to the ellipsoid, while the M8P solution is relative to the geoid.  If I subtract the geoid to ellipsoid offset (available in the GGA message from the M8P), then the two solutions match in the vertical axis as well.
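For anyone who wants to make the same correction, the geoid separation is right in the GGA sentence: field 10 is the altitude above the geoid (MSL) and field 12 is the geoid separation, so the ellipsoid height is just the sum of the two.  A quick sketch with a made-up GGA sentence (the values and checksum are not real):

echo '$GNGGA,192612.00,3944.1234567,N,10505.1234567,W,4,14,0.8,1580.45,M,-21.50,M,1.0,0000*00' |
awk -F, '{ printf "ellipsoid height = %.2f m\n", $10 + $12 }'    # -> 1558.95 m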

Overall, I would have to say that in this particular run, the RTKLIB M8T solution out-performed the M8P internal solution by a fairly significant amount.

One other observation about the M8P solution is that the precision reported (at least in the GGA message) is only one centimeter in the horizontal axes and 10 centimeters in the vertical axis.  I didn't see any obvious way to get outputs with better precision than this, at least with the NMEA messages.  [Note 8/16/18: It turns out that the UBX-CFG-NMEA message can be used to enable high precision mode to fix this.  Thanks to Charles Wang for pointing this out in a comment]

Next I mounted the same antenna on the roof of my car and drove around the neighborhood.  The sky view varied from nearly unobstructed to moderate tree canopy.  Here are the results of that test, the M8P internal RTK solution on the left and the M8T real-time RTKLIB solution on the right.

m8p2

The fix rate for the M8P solution was 52.4%.  The M8T solution was noticeably better with a 95.2% fix rate.

Here's a plot of the difference between the two solutions (green indicates both solutions are fixed, yellow indicates that at least one is float).  The U-D axis differences are fairly large, mostly because of the 10 cm vertical precision of the M8P position messages.  There is also a fairly large discrepancy between the two solutions around 22:55:00.  This corresponds to a 15 second gap in base station data, probably due to a loss of cell phone reception.  This seems to be a large error for such a short base outage but I was not able to narrow down the cause beyond this or determine which receiver contributed more to this combined error.  This probably deserves more investigation in a future post.

m8p2d

Again, the real-time RTKLIB M8T solution appeared in this test to be better than the internal M8P solution.

The most likely reason the M8T is getting a better solution is that it is using more satellites.  Here are the rover raw observations for the M8P on the left, and the M8T on the right.  With the GPS and GLONASS constellations, both receivers are seeing 13 satellites.  With Galileo and SBAS, the M8T is seeing an additional 7 satellites.  One of these Galileo satellites is not fully operational yet and so is not used in the solution, but that still gives 6 extra satellites, nearly 50% more than the M8P solution.

m8p3

If the extra satellites are what is improving the M8T solution, then running a solution with the M8T data without the SBAS and Galileo satellites should give a solution similar to the M8P solution.

This is easy to do, so let’s try it.  Here’s a post-processed solution for the M8T data using only the GPS and GLONASS satellites.

m8p4

Fix rate is 66.5%, reasonably close to the 52.4% fix rate from the M8P internal solution.  This suggests that a large part of the improvement for the M8T solution is coming from the additional satellites and not from any differences in the solution algorithms.
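For anyone who wants to repeat this, here is roughly how the restricted solution can be run from the command line with rnx2rtkp (the file names are placeholders; the key point is limiting pos1-navsys in the config file to GPS and GLONASS):

# in the solution config file:  pos1-navsys = 5    (bit mask: 1=GPS, 2=SBAS, 4=GLO, 8=GAL)
rnx2rtkp -k kinematic_gps_glo.conf -o m8t_gps_glo.pos rover.obs base.obs rover.nav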

As in all my comparisons, this is intended to be one user’s initial experience, and not any kind of rigorous comparison test.  Please interpret these results with that in mind.  If you have experiences that differ from these I would be very interested to hear about them.


A fix for the RTCM time tag issue

In my last post I described a problem with a loss of some of the raw measurement information caused by the lack of resolution in the time tags in the RTCM format.  Since the RTCM format is typically used to reduce bandwidth requirements in real-time applications, this issue causes real-time solutions to fail in cases where post-processing the same raw data without the translation to RTCM gives good results.  In this post I will describe a fix for this problem.

First of all I want to thank Felipe Nievinski, Igor Vereninov from Emlid, and Anthony Woolridge for their comments to the last post that pointed me to the solution.  They make this a collaborative effort between the U.S., Brazil, Russia, and the U.K!  It still amazes me how enabling the internet can be!

I'll start by showing again this example of a RINEX output from an M8T receiver with the official raw measurement output (RXM_RAWX) and the debug raw measurement output (TRK_MEAS) enabled simultaneously.  I think this provides good insight into what is going on.  The TRK_MEAS message is the top 5 lines and the RXM_RAWX message is the bottom 5 lines for a single epoch.  The first line in each message is the time stamp and the following lines are the measurements for each satellite.  In the satellite measurements, the second column contains the pseudorange value.

trkmeas1

The time stamp specifies the receiver time of the received signals and the sixth column is the number of seconds.  For the TRK_MEAS message these values are always aligned to round numbers based on the sample rate.  For example, in this case the measurement rate was 5 Hz and all the time stamps occur on multiples of 0.2 seconds.  This is because they are based on the raw receiver clock without any corrections.

The time stamps from the RXM_RAWX messages however often differ from the round numbers by small arbitrary amounts.  This is because the receiver has estimated the error in its own clock and adjusted the measurements to remove this error.  In this case the estimate of clock error is 0.001 seconds and so the time stamp is adjusted by this value (18.8000000 to 18.7990000).

To keep the time stamps consistent with the other parts of the measurement, the clock error also needs to be removed from the pseudorange and carrier phase values since they are based on the difference in time between satellite transmission and receiver reception and will include any errors in the receiver clock.  We see from the above observations that the pseudorange measurement for satellite G24 has been adjusted from 22675327.198 to 22375547.970, a difference of 299779.228 meters.  The speed of light is 299792458 meters per second, so the clock error of 0.001 seconds is equivalent to 299792.458 meters, a value very close to the amount that the pseudorange was adjusted by.

A similar adjustment needs to be made to the carrier phase measurement as well but it is not as easy to see in this example because the carrier phase measurements are relative rather than absolute and the two messages in this case use different references.  The carrier phase measurements are in cycles, not meters, so the frequency of the carrier phase needs to be included in the translation from clock error to carrier phase cycles but is otherwise the same as the pseudorange adjustment.  In equation form, the adjustments are:

P = P - toff*c
L = L - toff*freq

where P = pseudorange, L = carrier phase, toff = receiver clock error estimate, c = speed of light, and freq = carrier frequency
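Plugging the G24 example into the pseudorange half of this (with toff assumed to be the 0.001 second clock error estimate) gives a number very close to what the receiver reported:

awk 'BEGIN {
    c = 299792458.0; toff = 0.001
    P = 22675327.198                                # TRK_MEAS pseudorange for G24 (m)
    printf "P - toff*c = %.3f m\n", P - toff*c      # ~22375534.7, close to the RXM_RAWX value
}'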

So, basically, the receiver is trying to help us out by removing its best estimate of the clock error from the measurements.  This is unnecessary since RTKLIB is quite good at estimating this clock error on its own, but by itself this adjustment does not cause a problem.

It is when the adjusted measurement is translated to RTCM that we get in trouble.  The resolution of the time stamps in the RTCM format is 0.001 seconds.  In this particular example it would not be an issue because the error is exactly 0.001 seconds or one count of the RTCM format.  Most of the time, however, this error is not an exact multiple of 1 millisec.

For example, here is a time stamp for the data set described in the previous posts.

> 2017  1 17 20 31 48.9995584  0  9

And here is the same time stamp after being translated to RTCM and then to RINEX

> 2017  1 17 20 31 49.0000000  0  9

As you can see, the clock adjustment was less than half a millisec so it was completely lost in the round-off to the RTCM format.  However, the adjustments the receiver made to the pseudorange and carrier phase are still present in those measurements.  We now have a problem because the clock correction is present in one part of the measurement but not in the others.  RTKLIB cannot correct for this inconsistency within the measurement.

So, how do we avoid this problem?  Fortunately, RTKLIB has an option to adjust the time stamps to round values, using the equations described above to adjust the time stamp, pseudorange, and carrier phase together so that the measurement stays consistent.  I imagine it was put in specifically to solve this problem.  We can invoke this option by adding "-TADJ=0.001" in the "Options" box in the "Conversion Options" menu in STRSVR or using the "-opt" option in the command line with STR2STR.  Note that this option needs to be set in the conversion from raw binary format to RTCM format, not the conversion from RTCM to RINEX.  It is possible to set this option when converting from RTCM to RINEX but this won't help because the damage has already been done in the earlier conversion.
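As a rough sketch, a base station conversion from u-blox binary to RTCM3 with the time tag adjustment applied at the right place might look something like this (the port, mount point, password, and message list are placeholders, not my actual setup):

str2str -in serial://com3:115200#ubx \
        -out ntrips://:password@rtk2go.com:2101/MY_BASE#rtcm3 \
        -msg "1005(10),1077,1087" -opt -TADJ=0.001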

Unfortunately, there is a bug in the implementation of this option in RTKLIB, at least for the u-blox receivers, so by itself, this is not enough.  The problem is that invalid carrier phase measurements are flagged in RTKLIB by setting the carrier phase value to zero.  The time stamp adjustment feature adjusts these zero values slightly so they are no longer recognized as invalid.  They end up getting included in the output as valid measurements and corrupt the solution.

Fortunately, the fix for this bug is very simple.  Here is the code in the decode_rxmrawx() function in ublox.c that makes the adjustment:

/* offset by time tag adjustment */
if (toff!=0.0) {
    fcn=(int)U1(p+23)-7;
    freq=sys==SYS_CMP?FREQ1_CMP:
         (sys==SYS_GLO?FREQ1_GLO+DFRQ1_GLO*fcn:FREQ1);
    raw->obs.data[n].P[0]-=toff*CLIGHT;
    raw->obs.data[n].L[0]-=toff*freq;
}

If we add a check to the first line of code to skip the adjustment if the carrier phase is zero, then all is fine.

if (toff!=0.0&&cp1!=0) {

Below is the original solution after RTCM conversion on the left, and with time tag adjustment and the bug fix on the right.  If you compare the solution on the right to the solution with no RTCM correction in the previous post you will see they are nearly identical.

timetag

I am still wary of using RTCM because of its other limitations described in the last post, particularly the loss of the half cycle invalid flag and the doppler information, but I believe this fix eliminates the most serious issue that comes from using RTCM.

I will release a new version of the demo5 code with this fix sometime in the next few days.  It will take a little while because I also want to include some other features that have been waiting in the pipeline.  If you want to try the fix right away, you just need to modify the one line of code described above and rebuild.

Update 2/2/17:    I have taken Anthony Woolridge’s suggestion and modified the RTCM conversion code to automatically adjust the pseudorange and carrier phase measurements to compensate for any round off done to the time tag.  This means it is not necessary to set the time-tag adjust receiver option.  This change is currently checked into my Github page and I hope to post new executables in the next couple of days.

Limitations of the RTCM raw measurement format

In the last post I described a process to troubleshoot problems occurring in real-time solutions that are not seen in post-processing solutions for the same data.  I collected a data set demonstrating this issue, and traced the problem to the conversion of the measurement data from raw binary format to the RTCM format.  This conversion is typically done in real-time applications to compress the data and minimize bandwidth requirements for the base to rover real-time data link.  In this post I will look into that example in more detail and also explore some of the limitations of the RTCM format.

First, it is important to understand that the conversion to RTCM is not a lossless process.  There are several ways in which information is lost in this process.  In some cases these losses are probably not significant, but in other cases it is less clear.

So let's look at some of those differences.  We actually have three formats to compare here: the raw binary format from the u-blox receiver, the RTCM format, and the RINEX format.  Both the RTCM and RINEX formats contain less information than the raw binary format, and information is lost when the conversion is made to either format.  The reason I include the RINEX format here is that in the post-processing procedure the measurements, whether they come from the raw binary format or the RTCM format, must first be converted to RINEX format before being input into the solution.  What I see with my example data set that fails in real-time is that it looks good in post-processing if the raw measurements are converted directly from raw binary to RINEX, but fails if they are first converted to RTCM and then from RTCM to RINEX.  Therefore it is very likely that there is something critical that is lost in the conversion to RTCM that is not lost in the conversion to RINEX.

The official RTCM spec is not freely available on the internet (it must be purchased), so I have relied on this document from Geo++ for the RTCM details.  Here is a chart of the most significant differences I am aware of between the three formats.  In the case of RTCM, these numbers apply only to the older 1002/1010 messages used by Reach and most other systems, not the newer MSM messages.

                           u-blox binary                     RINEX 3.0                  RTCM 3.0
Pseudorange resolution     double precision floating point   0.001 m                    0.02 m
Carrier phase resolution   double precision floating point   0.001 cycles (= 0.2 mm)    0.5 mm
Doppler resolution         single precision floating point   0.001 Hz                   Not supported
Time stamp resolution      double precision floating point   100 nsec                   1 msec
Lock time                  1 ms                              Lock status only           Variable (> 1 ms)
Half cycle invalid flag    Supported                         Supported                  Not supported
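As a rough illustration of the last two resolution rows (and ignoring the modulo encoding RTCM actually uses for pseudoranges), here is what those resolutions do to a typical u-blox time stamp and pseudorange:

awk 'BEGIN {
    ts = 49.9995584                                  # RXM-RAWX style time stamp (sec)
    printf "RTCM time tag:    %.7f sec\n", int(ts*1000 + 0.5)/1000   # rounds to 50.0000000
    P = 22675327.198                                 # pseudorange (m)
    printf "RTCM pseudorange: %.2f m\n", int(P/0.02 + 0.5)*0.02      # quantized to 0.02 m steps
}'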

 

To figure out which (if any) of these differences is responsible for the failure I needed a way to run the solution multiple times, each run done with only a single difference injected into the conversion.

I already had a matlab script I had previously written to parse a RINEX observation file into a set of variables in the matlab space.  So I wrote a second script that goes the other way, from variables in memory to a RINEX observation file.  Once I had done this, I could read in the good RINEX observation file translated directly from the u-blox binary file, modify a single measurement type, write it back to a new RINEX observation file, then run this file through a solution.

My first guess was that the missing "Half Cycle Invalid" flag would prove to be the culprit since I have seen this before with the M8N receiver as described in this post.  Although I suspect that this probably is true in some cases, it did not make a difference with this data set.  My next suspect was the missing doppler measurements, since RTKLIB uses the doppler measurements when estimating the receiver clock bias, but again, it was not the case.  In the end it turned out to be my very last guess that made the difference, and that was the time stamp resolution.  So much for me thinking I was starting to get the hang of this RTK stuff!  The differences in the time stamps were so small relative to the distance between them that I had unconsciously ignored them.  For example, the first two time stamps in the good measurements were 49.9995584 and 50.999584 but the time stamps in the failing measurements had been rounded off to 50.0000000 and 51.0000000.  Even after discovering that this round-off error makes a difference, it still is not obvious to me why this is true.  In any GPS solution, the receiver clocks are assumed to lack sufficient accuracy to be relied upon without correction, and the clock errors are one of the unknowns in the solution along with the three position axes.  I don't know why RTKLIB does not correctly estimate this error in its clock bias estimate and remove it.  Maybe one of you guys who has been doing this a lot longer than I have can explain this?

Just to be sure it wasn't a fluke, I started the data processing at three different times in the data set, and I also ran additional solutions with the sign of the error in the time stamps reversed.  In every case, regardless of sign or starting location, the solution failed to get a fix when the error was present and succeeded when it was not.

I have read somewhere that more expensive receivers will typically align their time stamps to round numbers, which would avoid the need for as much resolution.  The only expensive receivers I have access to are the CORS stations, so I took a look at data from a couple of them.  Sure enough, it appears to be true that they do use round numbers for their time stamps.  If this is more generally true, it might explain why the RTCM spec does not have sufficient resolution for the u-blox data but would work fine for more commonly used, higher priced receivers.

I was curious why the u-blox time stamps don't occur at round numbers, so I took a look at the hardware description spec and found this explanation:

“In practice the receiver’s local oscillator will not be as stable as the atomic clocks to which GNSS systems are referenced and consequently clock bias will tend to accumulate. However, when selecting the next navigation epoch, the receiver will always try to use the 1 kHz clock tick which it estimates to be closest to the desired fix period as measured in GNSS system time”

I interpret this to mean that the receiver is aware of the alignment error in its clock source relative to GPS system time, and it adjusts the time stamp values to include its estimate of that error.

Something else I am curious about but have not had time to investigate in any detail is how this issue is affected by differences between the RXM_RAWX measurements, which are what is normally used with the M8T receiver, and the debug TRK_MEAS messages, which also contain the raw measurements and are the only raw measurement messages available on the M8N receiver.  Looking at several data sets from both the M8N and M8T, it appears that the TRK_MEAS time stamps for both receivers are aligned to round numbers while the RXM-RAWX measurements are not aligned.  This means that the TRK_MEAS messages would not be affected by the lack of resolution in the RTCM format.  However, the TRK_MEAS measurements lack the compensation for inter-channel frequency delays in the GLONASS measurements and so would not be a good substitute.  Maybe it's possible to combine the two into a single set of measurements?  The two include different references and clock errors, so it is not obvious whether that is possible.  Below is an example of partial TRK_MEAS and RXM-RAWX outputs for the same epoch when both were enabled, TRK_MEAS on the top and RXM_RAWX below.

trkmeas1

Another avenue I considered is using the newer MSM messages (1077, 1087) in the RTCM format instead of the current 1002/1010 messages that Reach and most other users are using.  These have higher resolutions for the pseudorange and carrier phase, and include doppler and half cycle invalid flags.  Unfortunately, the resolution for the time stamps does not seem to have changed, or if it has, it hasn't changed enough to see a difference in the output for the small deltas in my example.

There also appears to be a bug in the RTKLIB implementation of the encode or decode of these messages which sometimes causes the number of integer cycles in the carrier phase measurements to be incorrect (the fractional part is fine).  This bug appears to be present in both the official 2.4.3 release and the demo5 code, but some of the changes I have made to the u-blox translation in the demo5 code seem to have increased the frequency of these incorrect measurements.

Reach does use the MSM messages for the SBAS measurements although it does not need to since the 1002 message supports SBAS as well as GPS.   It is possible this could introduce a problem for users in North America where the WAAS satellites used for SBAS correction include carrier phase measurements.  Users in Europe would not see this problem because the EGNOS satellites used for SBAS correction in Europe don’t provide the carrier phase.  I did not see any corruption in the SBAS carrier phase measurements in the initial RTCM data in this example but after I enabled the 1077 and 1087 measurements, I did see corruption in the measurements in all three systems.

So, unfortunately this is still somewhat a work in progress and I don’t have any easy answer how to fix this.  I am hoping some of the experts out there can comment and help put some of the pieces of the puzzle together.

In the meantime I would suggest using the u-blox binary format for the base-rover data-link instead of the RTCM format.  The bandwidth requirements will be 2.5 to 3 times higher, but some of this can be offset by reducing the measurement sample rate for the base station.

I believe a long term fix is going to require two things.  First of all, a workaround to the time tag resolution issue described in this post.  But even with that fixed, the half cycle valid flag and doppler information will still be lost.  I haven't done any tests to understand how critical the doppler measurements are, but I have demonstrated in the post I referenced above that losing the half cycle valid flag can definitely degrade the solution.  Fortunately, the newer MSM RTCM messages do include both the half cycle valid flag and doppler.  They do not appear to be usable until the bug in the encode/decode of the carrier phase data is fixed, so that will have to happen as well.

On the other hand, I suspect most real-time RTK systems do use RTCM and manage to live with its limitations so maybe I am overreacting here.  I would be interested in other people’s opinions and experiences with RTCM on u-blox or other receiver types.


Exploring differences between real-time and post-processed solutions.

I've had a few questions recently about differences showing up when the same set of raw data measurements are processed real-time and when they are post-processed.  Since I haven't done a lot of real-time work I didn't have a good answer to these questions, but it seemed like an interesting problem so I thought I would dig into it a little bit.

In many cases, these differences can be traced to a poorly performing data link between base and rover that loses, delays or corrupts the base measurement data.  These problems are usually diagnosed fairly easily by looking at the “age of differential”  between base and rover or by seeing missing data in plots of the base observations.  My interest is not in these cases but rather where the data link is performing well and there is still a difference between the real-time and post-process solutions.

Troubleshooting real-time solutions is a little trickier than troubleshooting post-processed solutions because you may need a way to re-run the data through the real-time RTKLIB app (either RTKRCV or RTKNAVI) to recreate the problem.  The standard *.ubx log files do not contain enough information to do this since they contain only a time stamp for when the measurement was made and not when it was actually available to the solution.  There will usually be some delay between the two because of latencies in the data link between rover and base.  The post-processing solutions ignore this delay and simply align the two measurements assuming zero delay, but we need to know what these delays are to recreate the real-time solution.

The real-time solution apps have an option in the input stream setup to read from a file instead of a real-time stream.  This allows you to re-run previously recorded log files but when doing this they require a *.ubx.tag file in addition to the *.ubx file to provide the latency information.   These *.ubx.tag files are generated automatically when you log real-time data if you select the appropriate option before you collect the data.  For RTKRCV, this is a “::T” appended on to the end of the log file name.  For RTKNAVI, it is checking the “Time Tag” box in the log stream options.  I recommend always enabling these options when you are running real-time solutions because the extra files are not very large and you never know when you are going to get something unusual in the data that you would like to investigate later.
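As a minimal sketch, logging a stream with a matching tag file from the command line just means appending "::T" to the output file path (the serial port and file name here are placeholders):

str2str -in serial://ttyACM0:115200 -out rover.ubx::T

This produces rover.ubx plus rover.ubx.tag, which is what the file-input option of RTKNAVI or RTKRCV needs to replay the stream with its original timing.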

Since none of the data sets I had been sent to look at contained tag files, my first step was to try and collect some data that looked good in post-processing but not in real-time with time tags enabled.  I chose to use my Emlid Reach receivers to do this, in part because it is easy to do real-time solutions with the onboard wireless, and in part because I wanted to try out their recently released 2.1.6 version of the RTKLIB code.  This version is a very close cousin to my demo5 code and contains all of its features (although many of them are not currently accessible through the Reachview GUI).

I first added or modified a couple of lines of code in the Reach startup files to save time tag versions of both the base and rover data on the rover, and the base data on the base.  I've added some notes at the bottom of this post on how I did it, but I don't necessarily recommend doing it yourself unless you are fairly comfortable with linux because it can be a little tricky to recover without reflashing the unit if you make a mistake.  I wanted to be able to collect data on the Reach units using the command line based RTKRCV app but use the GUI based RTKNAVI on my laptop to re-create the real-time run.  This is because RTKNAVI has a much nicer interface with a lot more information available.  However, this meant that I needed to fix an incompatibility in the RTKLIB code between the time stamp formats of RTKRCV and RTKNAVI as described in the RTKLIB Github issue #99.  Using the fix recommended in the issue description, I rebuilt the code on the Reach unit to create a new str2str executable with this fix incorporated.

With these changes, I can collect measurement data that gives me the option to run post-process solutions or re-created real-time solutions.  In addition, these can be run either with measurements made before or after the data link and raw binary to RTCM conversion.  This gives me quite a bit of capability to investigate where a potential problem might be occurring.

To test this setup, I first collected some static data with both base and rover exposed to open skies.  I got all three sets of data and tag files and using these I was able to re-run the data using RTKNAVI.  Both real-time and post-processed solutions got a fix fairly quickly and the two solutions were very similar.  So, nothing interesting to look at in this example.

Next I placed both base and rover on my back patio, just a few meters away from the house and partially blocked by a large tree, knowing that this would be a more stressful measurement environment.  I may have just gotten lucky, but the very first data set I collected gave me multiple fixes in post-processing but none in real-time, as shown below (post-process on the left, real-time on the right).  The two losses of fix were caused by my restarting the data collection on the Reach rover.  In this case I ran the post-processing solution using the base data collected on the base in raw binary format (*.ubx), not the data after it had been converted to RTCM and transmitted to the rover (*.rtcm), since this is the way post-processing is usually done.

real_post

Next I ran a second post-processing solution, this time using the raw measurement file saved in RTCM format on the rover.  This time there was no fix and the solution looked nearly identical to the real-time solution plotted above.  So somewhere between when these two data files were saved, the problem is occurring.  Note that in this case I was able to do all this without the time tag files or re-creating a real-time run, but I imagine this capability will be helpful in future analysis.

I had monitored the age of differential while collecting the data and after collecting the data I plotted the base observations to verify there was no missing data.  This suggests that the data link was working fine.  So my next guess was that the conversion from raw binary measurements to RTCM format might be the cause of the problem.  In real-time solutions, the base data is typically translated to RTCM before transmitting over the data link to the rover to compress the data and reduce bandwidth requirements on the data link, and this is the default configuration of the Reach units.   The amount of compression will vary depending on the details of the data but in this case the RTCM file (*.rtcm) was about one third as large as the raw binary file (*.ubx).  Some of this is lossless compression but not all of it so there is potential for degrading the solution with this translation.

The next step was to isolate the effects of the RTCM translation from any effects of the data link latency.  I did this by using the STRSVR app to translate the raw binary base data saved on the base station to RTCM format.  I configured the conversion options to use the same RTCM messages as used by Reach, then ran this data through a post-process solution.  Sure enough, just converting the undelayed raw binary data to RTCM was enough to break the solution.  That means, at least for this case, we can ignore any effect of the data link delays and focus on the RTCM conversion.
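The same conversion can also be done with str2str instead of the STRSVR GUI; a rough sketch, with placeholder file names and a message list similar to what Reach uses (stop it once the whole input file has been read):

str2str -in base.ubx#ubx -out base_conv.rtcm3#rtcm3 -msg "1002,1006,1010,1019,1020"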

Note that the post-processing apps require all the measurement input files to be in RINEX format.  This means that both the raw binary files and the RTCM files are first converted to RINEX using RTKCONV as part of the post-processing procedure.  One thing to be aware of when using RTKCONV to convert from RTCM to RINEX is the signal mask input options.  The default signal mask has all observation types selected and if left this way it will cause the file header to be incorrect.  If you do not de-select all the extra observation types you will see this in your observation file header:

obstype

The number of observations is 8 instead of 4 and there are extra observation types listed.  This will confuse RTKLIB and it will not interpret the rest of the file properly. Specifically it will not pick up any of the GLONASS observations.  It won’t flag an error but it will cause all the GLONASS measurements to be left out of your solution.  The signal mask button is on the options page as shown below.  You want to un-check all options except “1C”.

sigmask1

This post is already getting fairly long so I will put off to the next post the rest of the story including discussion about what is actually lost in the translation to RTCM and why it caused this particular example to fail.  In general, though, it is important to understand there are real losses in this translation and that they may affect the quality of your solution.  If you have the bandwidth to transfer the raw binary format instead of the RTCM format I would recommend you consider doing that.  If you don’t have the bandwidth, I would suggest you consider the trade-offs from reducing the base sample rate enough so that you are able to transfer the measurements in raw format.  As I mentioned above, in this example the raw binary file was about three times as large as the RTCM file.


Notes on how I set up the Reach to collect extra data.  There may be a more elegant way to do this but I just wanted a quick hack.  Please be careful if you try to do this yourself and be sure to back up any files before modifying them:

RTKLIB has a “::T” option to record the time-tags but I don’t believe Reach supports this option.  I got around this by adding extra instances of str2str initiated from a function call I added to the “reach_setup” script in the /usr/bin folder.  This, and all the instructions below assume you are running the 2.1.6 version of Reach code.

I added the call right before the call to "reachview" in the "reach_setup" script as shown below (the added lines are the comment and the call to "reach_time_logs").  I did this on the rover receiver assuming it is getting the base measurements through a data link.
ncat -k -l 2000 < /dev/ttyMFD1 > /dev/ttyMFD1 &
 
#start logging data files with time stamps
reach_time_logs
 
# Run ReachView
led set_color green
 

I created the “reach_time_logs” script in the /usr/bin folder and put in the following lines of code

#!/bin/bash

# Log u-blox data to file with time stamp logs
# find unused file name
path="/home/root/logs/"
i=0
fname=$path"rover"$i".ubx"
while [ -f $fname ]; do
    let "i=i+1"
    fname=$path"rover"$i".ubx"
done

fnameR=$path"rover"$i".ubx"
fnameB=$path"base"$i".rtcm"

# start data collection from rover
/usr/bin/RTKLIB/app/str2str/gcc/str2str_tag -in tcpcli://localhost:2000 -out $fnameR::T &

# start data collection from base
/usr/bin/RTKLIB/app/str2str/gcc/str2str_tag -in tcpcli://192.168.43.186:9000 -out $fnameB::T &
 

This finds an unused filename and saves the measurements and the tags for both the rover and base data.  You will need to modify the specified input stream for the base data to match what you are using.  You can look at the inpstr2-type and path in the /usr/bin/RTKLIB/app/rtkrcv/rtk.conf file for the exact format.  You might be able to use the RTKLIB wildcards instead to create the file name but I just copied this code from my PiZero logger which doesn’t update the clock.  I don’t know if on the Reach the clock has been updated yet at this point in the start-up.

I also had to modify the str2str app to make the time-tags compatible with RTKNAVI.  I used the bug fix recommended in Github issue #99.  I recommend debugging by re-running the data through RTKNAVI (on a Windows machine) rather than RTKRCV because it has a much nicer interface with much more info available.  If you decide you want to re-run the data through RTKRCV you will either need to rebuild it with the bug fix or collect the data with the unmodified str2str.  I think it’s unlikely that you will see different solutions between RTKRCV and RTKNAVI assuming they are both configured the same.
 
Rename the modified str2str executable to str2str_tag and leave it in the /usr/bin/RTKLIB/app/str2str/gcc folder.  Use the chmod +x command to make this file and the "reach_time_logs" file both executable.
I also modified the base receiver and saved the base data in ubx format before it was converted to rtcm so I could compare before and after to see if the conversion or data link might be causing problems.  You can use the same modifications described above, except delete the last two lines in the “reach_time_logs” script.
With these changes in place, the units will automatically save time-tagged data to a new file every time they are turned on.
After collecting data, the data files will be in the /home/root/logs folder.  The file names will be basexx.rtcm, basexx.rtcm.tag, roverxx.ubx, and roverxx.ubx.tag where “xx” will increment every time you run until you delete the old files.  To run them through RTKNAVI, just specify files in the input stream and check the time tag box.
 
You then have the option of running post-processed or simulated real-time solutions with measurements from either before or after the data link/RTCM conversion.  This should give you a fair bit of insight into where the problem is occurring.
 
I had a bit of trouble with files I edited getting corrupted after a power cycle (maybe because I was using WinSCP through the wireless), so I suggest using the "reboot" or "shutdown" commands to avoid problems.  Also be sure to make copies of the files before you edit them.  At one point I corrupted the "reach_setup" script and then could only access the Reach by using the instructions in the Software Development section of the QuickStart guide.  Another time the /etc/reachview/stable_config.json file disappeared and I had to restore it.