PPK vs RTK: A look at RTKLIB for post-processing solutions

The “RTK” in RTKLIB is an abbreviation for “Real-time Kinematics”, but RTKLIB is probably used at least as often for “PPK” or “Post-Processed Kinematics” as it is for real-time work.  In applications like precision agriculture, where the solution is part of a real-time feedback loop, RTK is obviously a requirement, but in many other applications there is no need for a real-time solution.  For example, drones are often used for collecting photographic or other sensor data but only need precision positions after the fact to process the data.  PPK is simpler than RTK because there is no need for a real-time data link between GPS receivers and so is often preferable if there is a choice.  The downside of course is that if there is something wrong with the collected data, you may not find out until it’s too late.

For the most part, RTKLIB solutions are identical regardless if they are run on real-time data (RTK) or run on previously collected data (PPK).  The most significant exception to this rule is what RTKLIB calls the “Filter Type”.  This is selected in the configuration and can be set to forward, backward, or combined.  Forward is the default and this is the only mode that can be used in real-time solutions.  In forward mode, the observation data is processed through the kalman filter in the forward direction, starting with the beginning of the data and continuing through to the end.  Backward mode is the opposite,  data is run through the filter starting with the end of the data and continuing to the beginning.  In Combined mode, the filter is run both ways and the two results are combined into a single solution.   This mode is set using the “Filter Type” box in the Options menu if using one of the GUI apps, or with the “pos1-solytpe” input parameter in the configuration file if using a CUI app.

There are two advantages to a combined solution over a forward solution.  First of all, it gives two chances to find a fix for each data point.  Let’s say there is an anomaly in the middle of the data set that causes the solution to switch from fix to float and not come back to fix for some period of time.   It may cause both the forward and backward solutions to lose fix but they will lose fix on opposite sides of the anomaly.  By combining the two solutions we are likely to get a fix for everywhere except right at the anomaly.  Another case where it often helps is in recovering the beginning of a data set.  Let’s say the first fix didn’t occur until five minutes into the data set.  With a forward solution, you would need to guarantee that nothing important happened during that five minutes, but with a combined solution, the backward pass will normally provide a fix all the way to the very beginning of the data set so there is no lost data.

The second advantage of the combined solution is that it provides an extra level of validation of the results.  To understand how this happens, it’s important to understand how RTKLIB combines the forward and reverse solutions.  For each solution position point there are three possibilities; both passes are float, one is float and one is fix, or both are fixed.  If both passes generate a float position, then the combined result will be a float with a value equal to the average of the two positions.  If one is float, and the other is fix, the float is thrown away and the fix is used.  In the case where both are fixed, then RTKLIB will attempt to validate the result by comparing the two values.  If they differ by less than four sigma, then the result will be a fix, otherwise it will be downgraded to a float.  Either way, the value will be the average of the two positions.  This degrading the solution type when the answers from opposite directions differ provides an increased confidence in the solution, at least for points for which we got two fixed values.

I will show a couple examples of the differences between forward and combined modes.  The first example is a more typical case and demonstrates how combined mode will normally give you a higher fix percentage while at the same time increasing confidence in the solution.

The plots below were taken from an M8N receiver on a sailboat using a nearby CORS station as base.  With ambiguity resolution mode set to fix-and-hold, I was able to get a solution with nearly 100% fix except for the initial convergence, but I would prefer to use continuous ambiguity resolution because of the higher confidence of the solution.  In the position plots below, the top was run in forward mode, the middle in backwards mode, and the bottom in combined mode, all in continuous ambiguity resolution mode.

combined1

As you can see the forwards and backwards mode solutions are not bad but both have gaps of float in the middle as well as floats during the initial acquisition.  The combined solution though has almost 100% fix rate and in addition includes the additional confidence knowing that every point found the same solution when running the data in opposite directions.

This second example comes from a data set posted on the Emlid Reach forum with a question on why the combined solution was worse than the forward solution.  In the plots below, the top solution is forward, the middle is backward, and the bottom is combined.

combined2

This data was GPS and SBAS only, so had a fairly low number of satellites, also included a mix of poor observations and the solution was run with full tracking gain (i.e fix-and-hold with the default gain).  Both forward and backward runs found fixed (green) solutions and tracked them all the way through the data set.  However, at least one of them was most likely a false fix, causing the fix to be downgraded to float (yellow) for most of the combined solution as can be seen be seen in the bottom plot.

To confirm this, the plot below shows the difference between the forward and backward solutions.  As you can see, the two differ by a fairly substantial amount and it is not possible from this data to know which one is correct.

combined3

In this case, turning off fix-and-hold and running ambiguity resolution in continuous mode sheds some light on what may be going on.  The plots below are again forward, backward, and combined.  This time the forward solution loses fix early on and never recovers it, whereas the backwards solution maintains a fix through the whole data set and is probably correct since without fix-and-hold enabled, it is very unlikely to stay locked that long to an incorrect solution.  The backward solution is also consistent with the beginning of the forward solution, since the combined solution remains fixed in the early part of the data set where both forward and backward solutions are fixed.

combined4

Again, this can be confirmed by looking at the difference between the forward and backward solutions.  In this case they agree everywhere that both are fixed.

combined5

As this example demonstrates, if post-processing is an option, it often makes sense to run in combined mode with continuous ambiguity resolution instead of forward mode with fix-and-hold enabled.  The additional pass will increase the chances of getting a fixed solution without the risk of locking onto a false fix that fix-and-hold can cause.  Even if you find you can not disable fix-and-hold completely, it may allow you to reduce the tracking gain (pos2-varholdamb)

So one last question is why are there still some float values in the middle of the combined solution? We would expect that since the backwards solution is fixed and the forward solution is float, that the combined solution should just become the backwards solution and all but the very end should be fixed.

The answer to this question turns out to be the way the reverse pass of the kalman filter is initialized.  I have chosen in the demo5 code to not reset the filter between forward and reverse passes if continuous ambiguity resolution is selected.  If fix-and-hold is selected then the demo5 code does re-initialize the kalman filter between passes.  This is different from the release code which always resets the filter between passes.

In this case, the results would have been slightly better if the filter were re-initialized but most of the time I find that allowing the filter to stay converged avoids a large gap in the backwards solution during the active part of the data set where the filter is reconverging. With fix-and-hold enabled I have found the chance of staying locked to an incorrect fix is too high and so it is better to reset the filter.  This is a recent change and hasn’t yet made it into the released version of demo5 but I should get it out soon.  The current version of the demo5 code (b28a) does not reset the filter for either case.

Modifying the if statement in the existing code in postpos.c to match the line below will give you the newest behavior.  Removing the if statement altogether will cause the filter to always be reset and will match the release code.

combined6

The other factor to consider when deciding whether to run the filter type in forward or combined mode is that combined mode will take nearly twice as long to run since it is processing each data point twice.  Most of the time this shouldn’t be an issue since it is not being run in real-time.

So to summarize, my recommendation would be to use combined mode if you do not need a real-time solution as the only real cost is a small amount of additional computation time and it will give you both higher fix percentages and more confidence in those fixes.

RTKLIB on a drone with u-blox M8T receivers

Drones are a popular application for RTKLIB and quite a few readers have shared their drone-collected data sets with me, usually with questions on how they can get better solutions. In many cases, the quality of this data has been fairly poor and it has been difficult to get satisfactory results. I was curious to understand why this environment tends to be so challenging since in theory a drone should have more open skies than just about any other application.

To do an experiment, I bought an inexpensive consumer drone from Amazon. I chose the X8C from Syma since it is beginner model and a little larger than some options. I figured the larger size should make it better able to carry some extra weight.

After a few practice flights to get the hang of flying it, I used some duct tape and double-sided foam adhesive to attach a u-blox antenna and 90 mm diameter ground plane to the top of the drone and a u-blox M8T receiver with my custom CHIP data logger underneath where the camera usually goes. I used the landing gear as a spool to wind the unnecessary five meters of antenna cable which was the heaviest part of the whole setup. From a weight perspective, the Emlid Reach units would have been a better choice, but I wanted to collect data from the Galileo constellation of satellites as well as GPS and GLONASS so I used my CSG receiver with the newer 3.0 firmware. I used a second CSG receiver mounted on top of my car as the base station.  Here’s a stock photo of the drone on the left and after my modifications on the right.

drone1drone2a

Although the drone was able to lift the extra weight fairly easily, it seemed to affect the stability of the flight control system and after a few minutes the prop motors would start to fight each other. At that point the drone would start to descend even at full throttle and the drone would land hard enough to usually bounce on its side or back. Still I was able to make a number of short flights which were adequate for testing purposes.

Here’s the observation data for the first set of flights, base station on the left and drone on the right. Red ticks are cycle-slips and gray ticks are half-cycle ambiguities. Ideally, the drone data would look as clean as the base but as you can see it is significantly worse and it turned out to be unusable for any sort of reliable position solution.  The regions without cycle-slips in the drone observations are the times in between flights in which the drone is sitting on the ground.

drone3

Clearly, while the drone is flying, something is interfering with the GPS receiver or antenna, most likely either EMI or mechanical vibration. I could have used a fancy test stand and RF sniffer to try and locate the source of interference but since this blog is focused on low-cost solutions I just used some duct tape, a steel bar, and the RTKLIB code instead.

I used two types of duct tape, both the polyester/fabric type that everyone calls duct tape, and also the metal foil type that is actually used to repair or install ducts. I first used the non-metal duct tape to securely attach the landing gear to the heavy steel bar. The steel bar was convenient because it was easy to attach but anything heavy enough to prevent the drone lifting off under full throttle would work fine.

I then started an instance of RTKNAVI on my laptop and connected it to the receiver on the drone.  The goal was to simulate a complete drone flight while the drone was sitting on the ground and at the same time watch the RTKNAVI observations to detect any degradation of the measurements.  I used a wireless connection but a USB cable would have worked too.

Unfortunately RTKNAVI won’t plot the observation data real-time, but by selecting the tiny “RTK Monitor” box in the bottom left corner of the main RTKNAVI screen, then choosing “Obs Data” from the menu, I was able to get a continuously updating listing of the observations.  Cycle-slips show up as non-zero values in the first column with the I heading. I chose a location outdoors with open enough skies that any degradation in the observation data would be obvious.

drone4

I first observed the cycle-slip column with the drone powered down to verify I wasn’t getting any cycle-slips on all but the lowest elevation satellites. I then continued to observe the cycle-slip column while sequencing through the steps required to fly the drone. I first powered on the drone, then powered on the transmitter, then issued the calibration/connection sequence, then turned on the throttle to low. So far, so good, no sign of cycle-slips. I then started moving the joysticks to issue steering commands which caused the motors to change speeds. All of a sudden I started getting cycle-slips, the more aggressive the steering commands, the more cycle-slips I saw. Aggressive changes in throttle also caused cycle-slips but full throttle with no adjustments or steering commands was fine.

Next I moved just the antenna, then just the receiver away from the drone while issuing steering commands. Moving the antenna away had no effect but moving the receiver away eliminated the cycle-slips.

At this point my guess was that the interference was coming from the relatively high power switching in the motor control circuits and that the antenna ground plane was shielding the antenna from this interference but nothing was shielding the receiver. To test this theory, I attached a layer of the metal duct tape to the bottom of the drone to act as a shield between the drone controller board and the receiver.  I then re-attached the receiver to the bottom of the drone and re-ran the experiment. This time there were almost no cycle-slips even with the most aggressive steering.

I then removed the steel bar and ran a second set of short flights with the layer of metal tape still in place. I was a little concerned that the new shield would interfere with commands sent from the transmitter to the drone so I first tested everything while still on the ground and then kept the drone fairly close during the flight. Fortunately I didn’t see any sign of commands not getting through.

The drone data looked much cleaner in this flight!  Unfortunately, this time the base data was no good with many simultaneous cycle-slips throughout the observation data. At this point I realized that I had placed the base station receiver directly on the top of the car when collecting the data which was very hot at the time. Usually I keep the receiver in the car to avoid this and only place the antenna on the roof. I have seen this kind of severe temperature effects cause simultaneous cycle-slips before but never to this extent. Again the data was completely unusable.

So, back out there again for a third round of flights. This time, everything looked much better. I still saw a few cycle-slips, especially when first applying the throttle at take-off, so my shielding was not perfect but a dramatic improvement over the first flight. The plots below show the results. The top two plots are position solutions for the z-axis. The top plot is with continuous ambiguity resolution and the middle plot is with fix-and-hold enabled. The bottom plot is the drone observation data.

drone5

I made three adjustments to the input configuration file from what I would normally use for my car based measurements.  First of all, since the drones have very open skies, I adjusted the minimum elevation angles from 15 degrees to 10 degrees.   Then, after plotting and observing the accelerations from an initial solution, I increased the vertical acceleration dynamics estimate (stats-prnaccelv) from 0.25 to 1.0.  Finally, because I was seeing slightly higher position variances in the initial solution than I usually do, I adjusted the position variance AR threshold (pos2-arthres1) from 0.004 to 0.1  Both of these last two changes would make sense if the level of vibration were higher in the drone than I am used to seeing, which is quite likely.

Each time the drone landed/crashed due to the unstable flight control system it would bounce to the side or upside-down and that is what is causing the cycle-slips and switch from fix to float at the end of each flight. As you can see though in every case I quickly get another fix after I put the drone upright again. The fixes are solid enough to hold through the entire flight even in continuous mode for all but one of the flights. With fix-and-hold enabled all flights maintained 100% fix rate. The data is as good as or better than similar experiments where I have mounted the rover on top of a car.

This is not surprising since the skies are more open in this experiment. Having over twenty satellites available for ambiguity resolution also helped. I used all the satellites (GPS/GLONASS/Galileo/SBAS) for ambiguity resolution and took advantage of the new feature in the demo5 b26 code that cycles through all the satellites and will throw a single one out if it is preventing a fix. This will automatically occur anytime the number of satellites available for ambiguity resolution is greater than the config parameter “pos2-mindropsats” which defaults to twenty.

I have added the raw data and the configuration file to the  sample data set section at rtkexplorer.com

I imagine different drones will have different issues and not all will be as easy to fix as this one, but the method described here or something similar should be helpful any time drone data is not looking as clean as the base station data.

The fix I chose was very easy to implement but a better fix would probably have been to wrap just the receiver in a shield rather than placing a shield between the control board and the receiver. This would protect the receiver better and avoid affecting commands sent from the transmitter.  In fact, based on these results, I suspect shielding the GPS receiver on a drone is always a good idea.

Zero baseline experiment

I’ve been busy with some consulting projects recently so it’s been a while since my last post but I’m finally caught up and had some time to write something.  I thought I would describe an experiment I did to both try out the “fixed” mode in RTKLIB and also provide some insight into the composition of the errors in the pseudorange and carrier phase measurements in the u-blox M8T receiver.

The “fixed” mode is an alternative to “static” or “kinematic” in which the exact rover location is specified as well as the base position and remains fixed.  The residual errors are then calculated  from the actual position rather than the measured position.  I describe it in a little more detail in this post.  It is intended to be used as a tool to characterize and analyze the residual errors in the pseudorange and carrier phase measurements.

The basic idea in this experiment was to connect two M8T receivers to a single antenna and then compare residuals between the two receivers.  I first looked at the solution using one receiver as base and the other as rover (the zero baseline case) and then compared solutions between each receiver and a local CORS reference station about 8 km away.

The M8T is typically setup to use an active antenna for which it provides power on the antenna input.  I was concerned about connecting the two antenna power feeds together, so to avoid this, I added a 47 pf capacitor in series in one of the antenna feeds to act as a DC block.  In the photo below, the capacitor is inside the metal tape wrapped around a male to male SMA adapter.  I cut the adapter in half, soldered the capacitor to each end, then wrapped it in metal tape as a shield.

zeroBL

The receivers are from CSG and each one is connected to a Next Thing CHIP single board computer, which logs the data and transmits it over wireless to my laptop.  They are very similar to the Raspberry Pi data loggers I described in a previous post, but the on-board wireless makes them more convenient to use.  At $9 each, they are also quite affordable, especially since they do not need micro SD cards like the Raspberry Pi Zeros.  They also have a built-in LiPo battery connector which can be convenient for providing power., although they can also be powered over the USB connectors.  They are also linux based, so setting them up is very similar to the instructions in my Raspberry Pi post.

I first looked at the zero baseline case where I used one receiver as base and the other as rover.  In this case the two receivers are seeing exactly the same signal from the single antenna.  Any error contributions from the satellites, atmosphere, or antenna should cancel and the only contributor to the residual errors should be from the receivers.

I collected about an hour of measurement data from my back patio.  It is next to the house and nearby trees so as usual, the data quality is only mediocre and will include both some multipath and signal attenuation.  I prefer to look at less than perfect data because that is where the challenges are, not in the perfect data sets collected in wide-open skies.

Here are the residuals for a high elevation, high signal strength GPS satellite.  Standard deviations are 0.24 meters for the pseudorange and 0.0008 meters for the carrier phase.

zeroBL1

For a lower elevation GPS satellite with low and varying signal strength, the standard deviations increased to 0.46 meters for the pseudorange and 0.0017 meters for the carrier phase.  Notice how the residuals increase as the signal strength decreases as you would expect.

 

zeroBL2

The GLONASS satellites had noticeably higher residuals.  Here is an example of a satellite with high elevation and reasonable signal strength.  The standard deviations were 1.02 meters for pseudorange and 0.0039 meters for carrier phase, more than twice the GPS residuals.

zeroBL3

I’m not quite sure how relevant it is, but the ratio between the pseudorange residuals and carrier phase residuals in each case is roughly 300, the same value I have found works best for “eratio1”, the config file input parameter that specifies the ratio between the two.

RTKLIB also estimates the standard devations of the GLONASS satellites measurements at 1.5 times the standard deviations of the GPS satellites which is less than the difference I see in the example above.

However, my numbers are for only the receiver components of the measurement errors, I’m not sure exactly which components the RTKLIB config parameters are intended to include.

For the second experiment, I calculated solutions for both receivers relative to a CORS reference station about 8 km away.  In this case, I was curious to see how close the two solutions are as they will have common satellite, atmospheric, and antenna errors but will differ in their receiver errors.  The plot below shows the residuals for a GPS satellite from each solution plotted on top of each other.  As you can see the errors are quite a bit larger than before and the correlation between the two receivers is very high.  Based on the frequency of the errors, I suspect they are dominated by multipath which will vary roughly sinusoidally as the direct path and reflected path go in and out of phase with each other.

I found it quite impressive to see how repeatable the errors are between the two solutions.  It indicates, at least at this distance, that the errors from the receiver are small compared to the other errors in the system.zeroBL4

Again, the GLONASS results were not as good as the GPS results and include a DC shift in the carrier phase that I’m not sure exactly what the cause is.

zeroBL5

I haven’t spent a lot of time trying to figure out how to best use the information in these plots but in particular I found the similarity between the two receiver solutions in the longer baseline experiment quite encouraging.  If the errors are dominated by multipath as I expect, then the baseline length isn’t that relevant and I would expect to see similar results with shorter baselines.  If that’s true, then it may be possible to derive information about the receiver’s environment from the multipath data.  People do this with more expensive dual frequency receivers to monitor things like tides and ground moisture content.  It would be interesting to see if it can be done with these low cost receivers.  Or maybe it already has been done …

 

New firmware, new satellites, new code

CSG Shop is now shipping all of their M8N and M8T u-blox receivers with the latest version 3 firmware.  This is not such good news for the M8N units since the raw measurements are scrambled and these receivers need to be downgraded to the previous firmware version before using with RTKLIB.  For the M8T receivers though, the new firmware is good news because it contains support for the Galileo satellite system.

I now have two of their M8T receivers with the new firmware and did a little testing to see how RTKLIB works with the Galileo measurements.  I did have to make a couple small changes to get things working.

First of all, the RNX2RTKP compile options for including the Galileo code was not enabled.  For some reason, all the other apps did have this option enabled.  To enable it, I had to add “ENAGAL” to the “Preprocessor Defintions”  for C/C++ in the Project menu in Visual Studio.

The second issue I ran into was in the decode_rxmrawx() function that decodes the raw u-blox RXM-RAWX messages.  There is a line of code in this function that sets the code type based on the system.

raw->obs.data[n].code[0]=
       sys==SYS_CMP?CODE_L1I:(sys==SYS_GAL?CODE_L1X:CODE_L1C);

This line sets the code to L1X for Galileo, but that code type doesn’t seem to be supported by RTKLIB and the measurements in the RINEX file for the Galileo satellites get left blank.  Changing the “L1X” in the above statement to “L1C” resolves the problem.  That leaves an unnecessary check in the code but I will leave it there at least until I understand what it was supposed to do.  After that everything else worked fine including ambiguity resolution with the Galileo satellites, so that was quite encouraging.

Next,  I put the two receivers outside in the front yard to collect a longer set of data.  Not an ideal environment because they were close to the house but fairly open skies otherwise.  In an hour of data collection I got measurements from 11 GPS satellites, 8 GLONASS satellites, 5 Galileo satellites, and 3 SBAS satellites. After collecting the data, I processed it with various constellation options to see how they compared.  For all the solutions, I set ambiguity resolution mode to “continuous”, position mode to “kinematic”, and opened up the position variance threshold for AR (arthres1) to allow the solution to lock up as early as possible.  I also enabled all constellations for ambiguity resolution in each case.  Here’s how they compared:

satcombos1a

satcombos1b

Note that the time scale on the GPS-only plot is very different than the others since it took much longer to lock up than any of the other combinations.  With the GPS satellites only, there was an initial short false fix after 14 minutes, then a good fix at 27 minutes that lasted a few minutes but it did not get a solid fix until 43 minutes after it started.  That’s a long time to wait!  Adding a second constellation significantly improved the results, with solid fixes coming after two minutes with GLONASS added, five minutes with SBAS added, and 7 minutes with Galileo added.  Adding a third consellation improved things even more, with times to first solid fix varying from 12 secs for GPS+SBAS+GLO, 3.5 min for GPS+GAL+GLO, and 6 min for GPS+SBAS+GAL.  Using all four constellations gave a time to first solid fix of 2 minutes, not the fastest time, but better than two out of three of the three constellation answers.

It is risky to conclude too much from one data set, but these results are consistent with other data I’ve looked at (for three constellations) that show the more satellites you use the better the answer.  This seems to make sense to me since more information should be better than less information.  However, I often hear or read recommendations to use only the GPS data for better results which I don’t understand.  If anyone has data to support that recommendation I would like to see it to understand it better.

I do sometimes see that one bad satellite can prevent or delay a solution no matter how many good satellites there are and this may be part of the answer.  The more satellites you use, the higher chance there is of having a bad one and RTKLIB is not great at rejecting a bad satellite.  The “arlockcnt” and “ARFilter” features do help prevent bad satellites from getting into the AR solution but they do not reject a satellite if it goes bad after being accepted into the solution.  I have added a new feature starting with the demo5 b26a code that does try to reject bad satellites after they have been accepted into the AR solution but have not had a chance to do a lot of testing on it yet.  It was enabled for the test above and may possibly have helped, I did not look into the details.  The feature is enabled by setting the “pos2-mindropsats” to a value lower than the number of satellites in the solution, in which case it will cycle through dropping all the satellites, one by one, one each epoch, and reject a satellite that has a large negative effect on the AR ratio.  If you try this feature, be careful not to set the minimum satellite threshold too low or you will increase the chances of a false fix.  I would recommend values no lower than 10 satellites.

I have released a new version of the demo 5 code (b26b) with the fixes for Galileo, a couple of new features and fixes, and GUI updates for RTKPOST and RTKNAVI for all the new input parameters for both b26a and b26b codes.  The binaries and a list of the changes are available here.  The source code is available on my Github page.

 

 

 

Exploring differences between real-time and post-processed solutions.

I’ve had a few questions recently about differences showing up when the same set of raw data measurements are processed real-time and when they are post-processed.  Since I haven’t done a lot of real-time work I didn’t have a good answer to these questions, but it seemed like an interesting problem so I thought I would dig into a little bit.

In many cases, these differences can be traced to a poorly performing data link between base and rover that loses, delays or corrupts the base measurement data.  These problems are usually diagnosed fairly easily by looking at the “age of differential”  between base and rover or by seeing missing data in plots of the base observations.  My interest is not in these cases but rather where the data link is performing well and there is still a difference between the real-time and post-process solutions.

To troubleshoot real-time solutions is a little trickier than post-processing solutions because you may need a way to re-run the data through the real-time RTKLIB app (either RTKRCV or RTKNAVI) to recreate the problem.  The standard *.ubx log files do not contain enough information to do this since they contain only a time stamp for when the measurement was made and not when it was actually available to the solution.  There will usually be some delay between the two because of latencies in the data link between rover and base.  The post-processing solutions ignore this delay and simply align the two measurements assuming zero delay but we need to know what these delays are to recreate the real-time solution.

The real-time solution apps have an option in the input stream setup to read from a file instead of a real-time stream.  This allows you to re-run previously recorded log files but when doing this they require a *.ubx.tag file in addition to the *.ubx file to provide the latency information.   These *.ubx.tag files are generated automatically when you log real-time data if you select the appropriate option before you collect the data.  For RTKRCV, this is a “::T” appended on to the end of the log file name.  For RTKNAVI, it is checking the “Time Tag” box in the log stream options.  I recommend always enabling these options when you are running real-time solutions because the extra files are not very large and you never know when you are going to get something unusual in the data that you would like to investigate later.

Since none of the data sets I had been sent to look at contained tag files, my first step was to try and collect some data that looked good in post-processing but not in real-time with time tags enabled.  I chose to use my Emlid Reach receivers to do this, in part because it is easy to do real-time solutions with the onboard wireless, and in part because I wanted to try out their recently released 2.1.6 version of the RTKLIB code.  This version is a very close cousin to my demo5 code and contains all of its features (although many of them are not currently accessible through the Reachview GUI).

I first added or modified a couple of lines of code in the Reach startup files to save time tag versions of both the base and rover data on the rover, and the base data on the base.  I’ve added some notes at the bottom of this post on how I did it but I don’t necessarily recommend doing it yourself unless you are fairly comfortable with linux because it can be a little tricky to recover without reflashing the unit if you make a mistake.  I wanted to be able to collect data on the Reach units using the command line based RTKRCV app but use the GUI based RTKNAVI on my laptop to re-create the realtime run.  This is because RTKNAVI has a much nicer  interface with a lot more information available.  However, this meant that I needed to fix an incompatibility in the RTKLIB code between the time stamp formats of RTKRCV and RTKNAVI as described in the RTKLIB Github issue #99.  Using the fix recommended in the issue description,  I rebuilt the code on the Reach unit to create a new str2str executable with this fix incorporated.

With these changes, I can collect measurement data that gives me the option to run post-process solutions or re-created real-time solutions.  In addition, these can be run either with measurements made before or after the data link and raw binary to RTCM conversion.  This gives me quite a bit of capability  to investigate where a potential problem might be occurring.

To test this setup, I first collected some static data with both base and rover exposed to open skies.  I got all three sets of data and tag files and using these I was able to re-run the data using RTKNAVI.  Both real-time and post-processed solutions got a fix fairly quickly and the two solutions were very similar.  So, nothing interesting to look at in this example.

Next I placed both base and rover on my back patio, just a few meters away from the house and partially blocked by a large tree, knowing that this would be a more stressful measurement environment.  I may have just got lucky, but the very first data set I collected gave me multiple fixes in post-processing but none in real-time as shown below (post-process on the left, real-time on the right).  The two loss of fixes are caused by me restarting the data collection on the Reach rover.  In this case I ran the post-processing solution using the base data collected on the base in raw binary format (*.ubx), not the data after it had been converted to RTCM and transmitted to the rover (*.rtcm)  since this is the way post-processing is usually done.

real_post

Next I ran a second post-processing solution, this time using the raw measurement file saved in RTCM format on the rover.  This time there was no fix and the solution looked nearly identical to the real-time solution plotted above.   So somewhere between when these two data files were saved, the problem is occurring.  Note that in this case I was able to do all this without the time tag files or re recreating a real-time run but I imagine this capability will be helpful in future analysis.

I had monitored the age of differential while collecting the data and after collecting the data I plotted the base observations to verify there was no missing data.  This suggests that the data link was working fine.  So my next guess was that the conversion from raw binary measurements to RTCM format might be the cause of the problem.  In real-time solutions, the base data is typically translated to RTCM before transmitting over the data link to the rover to compress the data and reduce bandwidth requirements on the data link, and this is the default configuration of the Reach units.   The amount of compression will vary depending on the details of the data but in this case the RTCM file (*.rtcm) was about one third as large as the raw binary file (*.ubx).  Some of this is lossless compression but not all of it so there is potential for degrading the solution with this translation.

The next step was to isolate the effects of the RTCM translation from any effects from the data link latency.  I did this by using the STRSVR app to translate the raw binary base data saved on the base station to RTCM format.  I configured the conversion options to use the same RTCM messages as used by Reach.  ran this data through a post-process solution.  Sure enough, just converting the undelayed raw binary data to RTCM was enough to break the solution.  That means, at least for this case, we can ignore any effect of the data link delays and focus on the RTCM conversion.

Note that the post-processing apps require all the measurement input files to be in RINEX format.  This means that both the raw binary files and the RTCM files are converted to RINEX first using RTKCONV first as part of the post-processing procedure.  One thing to be aware of when using RTKCONV to convert from RTCM to RINEX is the signal mask input options.  The default signal mask has all observation types selected and if left this way it will cause the file header to be incorrect.  If you do not de-select all the extra observation types you will see this in your observation file header

obstype

The number of observations is 8 instead of 4 and there are extra observation types listed.  This will confuse RTKLIB and it will not interpret the rest of the file properly. Specifically it will not pick up any of the GLONASS observations.  It won’t flag an error but it will cause all the GLONASS measurements to be left out of your solution.  The signal mask button is on the options page as shown below.  You want to un-check all options except “1C”.

sigmask1

This post is already getting fairly long so I will put off to the next post the rest of the story including discussion about what is actually lost in the translation to RTCM and why it caused this particular example to fail.  In general, though, it is important to understand there are real losses in this translation and that they may affect the quality of your solution.  If you have the bandwidth to transfer the raw binary format instead of the RTCM format I would recommend you consider doing that.  If you don’t have the bandwidth, I would suggest you consider the trade-offs from reducing the base sample rate enough so that you are able to transfer the measurements in raw format.  As I mentioned above, in this example the raw binary file was about three times as large as the RTCM file.

 

 

Notes on how I set up the Reach to collect extra data.  There may be a more elegant way to do this but I just wanted a quick hack.  Please be careful if you try to do this yourself and be sure to back up any files before modifying them:

RTKLIB has a “::T” option to record the time-tags but I don’t believe Reach supports this option.  I got around this by adding extra instances of str2str initiated from a function call I added to the “reach_setup” script in the /usr/bin folder.  This, and all the instructions below assume you are running the 2.1.6 version of Reach code.

 I added the call right before the call to “reachview” in the “reach_setup” script as shown in blue below.  I did this on the rover receiver assuming it is getting the base measurements through a data link.
ncat -k -l 2000 < /dev/ttyMFD1 > /dev/ttyMFD1 &
 
#start logging data files with time stamps
reach_time_logs
 
# Run ReachView
led set_color green
 

I created the “reach_time_logs” script in the /usr/bin folder and put in the following lines of code

#!/bin/bash
 
# Log u-blox data to file with time stamp logs
# find unused file name
path=”/home/root/logs/”
i=0
fname=$path”rover”$i”.ubx”
while [ -f $fname ]; do
    let “i=i+1”
    fname=$path”rover”$i”.ubx”
done
 
fnameR=$path”rover”$i”.ubx”
fnameB=$path”base”$i”.rtcm”
 
# start data collection from rover
/usr/bin/RTKLIB/app/str2str/gcc/str2str_tag  -in tcpcli://localhost:2000 -out $fnameR::T &
 
# start data collection from base
/usr/bin/RTKLIB/app/str2str/gcc/str2str_tag  -in tcpcli://192.168.43.186:9000 -out $fnameB::T &
 

This finds an unused filename and saves the measurements and the tags for both the rover and base data.  You will need to modify the specified input stream for the base data to match what you are using.  You can look at the inpstr2-type and path in the /usr/bin/RTKLIB/app/rtkrcv/rtk.conf file for the exact format.  You might be able to use the RTKLIB wildcards instead to create the file name but I just copied this code from my PiZero logger which doesn’t update the clock.  I don’t know if on the Reach the clock has been updated yet at this point in the start-up.

I also had to modify the str2str app to make the time-tags compatible with RTKNAVI.  I used the bug fix recommended in Github issue #99.  I recommend debugging by re-running the data through RTKNAVI (on a Windows machine) rather than RTKRCV because it has a much nicer interface with much more info available.  If you decide you want to re-run the data through RTKRCV you will either need to rebuild it with the bug fix or collect the data with the unmodified str2str.  I think it’s unlikely that you will see different solutions between RTKRCV and RTKNAVI assuming they are both configured the same.
 
Rename the modified str2str executable to str2str_tag and leave it in the /usr/bin/RTKLIB/app/str2str/gcc folder .  Use the chmod +x command to make this file and the “reach_time_logs” file both executable.
I also modified the base receiver and saved the base data in ubx format before it was converted to rtcm so I could compare before and after to see if the conversion or data link might be causing problems.  You can use the same modifications described above, except delete the last two lines in the “reach_time_logs” script.
With these changes in place, the units will automatically save time-tagged data to a new file every time they are turned on.
After collecting data, the data files will be in the /home/root/logs folder.  The file names will be basexx.rtcm, basexx.rtcm.tag, roverxx.ubx, and roverxx.ubx.tag where “xx” will increment every time you run until you delete the old files.  To run them through RTKNAVI, just specify files in the input stream and check the time tag box.
 
You then have the options of running post-process or simulated real-time with measurements either before or after the data link/RTCM conversion.  This should give you a fair bit of insight into where the problem is occurring.
 
 I had a bit of trouble with files I edited getting corrupted after  a power cycle (maybe because I was using WinSCP through the wireless) so I suggest using the “reboot” or “shutdown” commands to avoid problems.  Also be sure to make copies of the files before you edit them.  At one point I corrupted the “reach_setup” script and then could only access the Reach by using the instructions in the Software Development section of the QuickStart guide.  Another time the /etc/reachview/stable_config.json disappeared and I had to restore it.

Ublox M8N vs M8T: Part 2

In my last post, I compared data from a pair of Ublox M8N receivers with data from a pair of Emlid Reach M8T based receivers, and found issues in both data sets. In this post I will look into those issues more closely.

To do this, I collected another data set with a few differences from the previous one.

  • On the M8N rover receiver, I replaced the original antenna that had only a 1” cable with a Ublox ANN-MS-0-005 antenna with a much longer cable. This allowed me to move the receiver from the car roof to inside the car, a friendlier thermal environment. The goal was to see if this would avoid the all-satellite cycle-slips I saw last time at the beginning of the data set. This antenna cost me $20 from CSG but is available from other places for less.
  • On both Reach M8T receivers, I modified the dynamic platform model setting from “Airborne < 4g” to lower acceleration settings; “Pedestrian” for the rover, and “Stationary” for the base. I also changed the setting on the M8N base receiver from “Pedestrian” to “Stationary” to make them consistent. The goal here was to avoid the erroneous fixed solution points I saw in the solution from the previous M8T data. To insure the modifications occurred correctly, I used the u-center software to read back the settings from the receivers after the data was taken.
  • I chose a measurement route with much more tree cover than the previous one to increase the difficulty of the solution and to help differentiate the two receiver sets.

Here is a Google map of the measurement route. In this area there are few native trees, so by driving through residential areas, rather than next to them as I did in the previous data set, I encountered many more trees. There were numerous locations with tall trees on both sides of the road as well as places where I drove directly under large tree branches.

walker1

Here’s an example of some of the more challenging tree cover encountered in the route. The route locations are only approximate since the base locations are not calibrated, that is why the lines are not actually on the road.

walker2

Here is the observation data from both rovers. The red ticks are cycle-slips, the gray ticks are half-cycle invalids.

                             M8N                                                                          Reach  M8Twalker3

The first thing to notice is that there are no simultaneous cycle-slips in the M8N data. Of course, this doesn’t prove we will never get them but it suggests that maybe moving the receiver inside the car did help. Unfortunately, there is one simultaneous cycle-slip in the M8T data very close to the start at 00:54. Apparently, both the M8N and the M8T are susceptible to these slips, which is not surprising, since the chips are part of the same family. We do see only a single slip, which is an improvement over the previous data. Since they always occur near the beginning of the data collection, I still believe they are most likely caused by thermal transients. I suspect that the best way to avoid them would be to turn on the receivers 15 minutes before starting to collect data in an environment as similar as possible to the data collection environment.

Next, let’s look at the SNR vs. elevation plots. Last time we saw noticeably better numbers with the Reach setup because of the more expensive antenna. This time, with the Ublox antenna on the M8N receiver, the two are much more similar. SNR isn’t everything, and there still are reasons why the more expensive Reach antennas are likely better than the Ublox antennas, but we’ve at least closed the gap some between them.

                            M8N                                                                          Reach  M8Twalker4

Here are the position solutions from both receiver sets.

                            M8N                                                                          Reach  M8Twalker5

The M8N solution is very good, with nearly 100% fix after the initial acquire until the very end when I parked the car under some trees in the driveway. This is despite a very challenging data set with many cycle-slips.

Unfortunately, the M8T solution is not as good. The initial acquire is delayed because of the simultaneous cycle-slip I mentioned earlier. It does eventually acquire, but then loses fix for quite a long period near the end of the data (the yellow part of the line in the plot above)

Even more concerning, when we look at the acceleration plots, we see the same spikes in the M8T data as we did in the previous data set.

                       M8N                                                                          Reach  M8Twalker6

Again, these spikes align with erroneous fixed solution points in the M8T data as can be seen in deviations from the circle in the difference between the two solutions.

walker7

Clearly, changing the dynamic platform model setting in the receiver did not fix the problem. A couple of experienced users have commented that this setting does affect the front end of the receiver on earlier Ublox models and it’s still possible it affects the M8T as well, but it does not appear to be the cause of this problem. We will need to look elsewhere for the solution.

We are running identical RTKLIB solution code on the two data sets and we have verified that the receiver setup is nearly identical for both data sets. So what else can be different between them? One possibility is the conversion utility, RTKCONV, that translates the raw binary output from the receiver into the RINEX observation files that are used as input to the solution. Since the raw measurements are output by different commands on the two receivers, there are two different functions in the RTKCONV code that process them.

Let’s look first at the RTKCONV code to convert the data from the UBX_TRK_MEAS command used by the M8N receiver. I won’t show the code here but just describe functionally what happens in the code. Each data sample from the receiver contains a status bit indicating carrier-phase lock and a count of consecutive phase locks. RTKCONV sets the cycle-slip flag for that sample if the carrier-phase is valid and an internal code flag is set. If the carrier-phase is not valid, then the cycle-slip flag is left in its previous state. The internal code flag is set if the phase lock count is zero or less than the phase lock count for the previous sample and is only reset for the next sample if the carrier-phase is valid.

For the UBX_RXM_RAWX command used by the M8T receiver, there is also a status bit indicating carrier-phase lock. Instead of a count, there is a time for consecutive phase locks but functionally it is equivalent. The RTKCONV code for this command does not use an internal code flag and the cycle-slip flag is set or cleared directly every sample that the carrier-phase is valid. The cycle-slip flag is set if the phase lock time is zero or less than the previous sample and cleared otherwise.

I know that’s a bit confusing and the difference is fairly subtle. The most important thing to understand is that RTKCONV is doing more than just translating from binary to text, it is deciding which samples to set cycle-slips for based on a somewhat complicated algorithm, and that algorithm is different for M8N and M8T.

Basically, the effect of this difference is that for the M8N, if a cycle-slip is followed by an invalid phase then the next valid phase will always be flagged as a cycle-slip while for an M8T it won’t necessarily be so. It’s easier to understand by looking at a picture. The observation plots below show the location of one of the acceleration spikes in the M8T solution which is caused by a cycle-slip on satellite G05. The plot on the left shows which samples were flagged as cycle-slips with the existing code, the plot on the right shows the cycle-slips after modifying the code to be functionally equivalent to the M8N code. Note the extra two cycle-slips with the change.  The extra cycle-slips in this case are a good thing because they are flagging samples with large phase errors and preventing RTKLIB from incorporating them into the solution.

              Reach M8T before change                                   Reach M8T after changewalker8

Modifying the RTKCONV code for the UBX_RXM_RAWX command in this way and re-running the solution gives us the the position plot on the right and the acceleration on the left.

walker9

Much better than before! Not quite as good as the M8N but still a significant improvement. The difference between the M8N solution and the improved M8T solution is shown below.

walker10

Remember this should be a circle if both solutions are free of errors.  Actually in this case not quite a circle because there is also a separation between the two base stations ,but that effect is small and we can ignore it for now.  This is much better than the equivalent plot from the original M8T solution shown earlier. There are obviously some erroneous fixes even after the improvement, so this is still a work in progress, but I think it is a big step in the right direction.

This data set was quite a bit more challenging than the previous one and in reality both solutions are quite good given the number of cycle-slips we saw, but there is always room for improvement.

The other thing to note with this data set is that the better quality antenna made a big difference on the quality of the M8N solution.  I suspect I will not be going back to the old antenna!  I knew the nicer antenna would help but I have resisted using it till now because of the extra cost.  There are similar looking antennas with similar gain specs available for as little as $4 so I may try some of these to see how they compare.

I’m not going to post the code to my Github repository or my binaries until I’ve had a little more time to understand the remaining errors but if anyone wants to take a closer look now, here are the code modifications to ublox.c that I made in the decode_rxmrawx function.

walker11