PPK vs RTK: A look at RTKLIB for post-processing solutions

The “RTK” in RTKLIB is an abbreviation for “Real-time Kinematics”, but RTKLIB is probably used at least as often for “PPK” or “Post-Processed Kinematics” as it is for real-time work.  In applications like precision agriculture, where the solution is part of a real-time feedback loop, RTK is obviously a requirement, but in many other applications there is no need for a real-time solution.  For example, drones are often used for collecting photographic or other sensor data but only need precision positions after the fact to process the data.  PPK is simpler than RTK because there is no need for a real-time data link between GPS receivers and so is often preferable if there is a choice.  The downside of course is that if there is something wrong with the collected data, you may not find out until it’s too late.

For the most part, RTKLIB solutions are identical regardless if they are run on real-time data (RTK) or run on previously collected data (PPK).  The most significant exception to this rule is what RTKLIB calls the “Filter Type”.  This is selected in the configuration and can be set to forward, backward, or combined.  Forward is the default and this is the only mode that can be used in real-time solutions.  In forward mode, the observation data is processed through the kalman filter in the forward direction, starting with the beginning of the data and continuing through to the end.  Backward mode is the opposite,  data is run through the filter starting with the end of the data and continuing to the beginning.  In Combined mode, the filter is run both ways and the two results are combined into a single solution.   This mode is set using the “Filter Type” box in the Options menu if using one of the GUI apps, or with the “pos1-solytpe” input parameter in the configuration file if using a CUI app.

There are two advantages to a combined solution over a forward solution.  First of all, it gives two chances to find a fix for each data point.  Let’s say there is an anomaly in the middle of the data set that causes the solution to switch from fix to float and not come back to fix for some period of time.   It may cause both the forward and backward solutions to lose fix but they will lose fix on opposite sides of the anomaly.  By combining the two solutions we are likely to get a fix for everywhere except right at the anomaly.  Another case where it often helps is in recovering the beginning of a data set.  Let’s say the first fix didn’t occur until five minutes into the data set.  With a forward solution, you would need to guarantee that nothing important happened during that five minutes, but with a combined solution, the backward pass will normally provide a fix all the way to the very beginning of the data set so there is no lost data.

The second advantage of the combined solution is that it provides an extra level of validation of the results.  To understand how this happens, it’s important to understand how RTKLIB combines the forward and reverse solutions.  For each solution position point there are three possibilities; both passes are float, one is float and one is fix, or both are fixed.  If both passes generate a float position, then the combined result will be a float with a value equal to the average of the two positions.  If one is float, and the other is fix, the float is thrown away and the fix is used.  In the case where both are fixed, then RTKLIB will attempt to validate the result by comparing the two values.  If they differ by less than four sigma, then the result will be a fix, otherwise it will be downgraded to a float.  Either way, the value will be the average of the two positions.  This degrading the solution type when the answers from opposite directions differ provides an increased confidence in the solution, at least for points for which we got two fixed values.

I will show a couple examples of the differences between forward and combined modes.  The first example is a more typical case and demonstrates how combined mode will normally give you a higher fix percentage while at the same time increasing confidence in the solution.

The plots below were taken from an M8N receiver on a sailboat using a nearby CORS station as base.  With ambiguity resolution mode set to fix-and-hold, I was able to get a solution with nearly 100% fix except for the initial convergence, but I would prefer to use continuous ambiguity resolution because of the higher confidence of the solution.  In the position plots below, the top was run in forward mode, the middle in backwards mode, and the bottom in combined mode, all in continuous ambiguity resolution mode.

combined1

As you can see the forwards and backwards mode solutions are not bad but both have gaps of float in the middle as well as floats during the initial acquisition.  The combined solution though has almost 100% fix rate and in addition includes the additional confidence knowing that every point found the same solution when running the data in opposite directions.

This second example comes from a data set posted on the Emlid Reach forum with a question on why the combined solution was worse than the forward solution.  In the plots below, the top solution is forward, the middle is backward, and the bottom is combined.

combined2

This data was GPS and SBAS only, so had a fairly low number of satellites, also included a mix of poor observations and the solution was run with full tracking gain (i.e fix-and-hold with the default gain).  Both forward and backward runs found fixed (green) solutions and tracked them all the way through the data set.  However, at least one of them was most likely a false fix, causing the fix to be downgraded to float (yellow) for most of the combined solution as can be seen be seen in the bottom plot.

To confirm this, the plot below shows the difference between the forward and backward solutions.  As you can see, the two differ by a fairly substantial amount and it is not possible from this data to know which one is correct.

combined3

In this case, turning off fix-and-hold and running ambiguity resolution in continuous mode sheds some light on what may be going on.  The plots below are again forward, backward, and combined.  This time the forward solution loses fix early on and never recovers it, whereas the backwards solution maintains a fix through the whole data set and is probably correct since without fix-and-hold enabled, it is very unlikely to stay locked that long to an incorrect solution.  The backward solution is also consistent with the beginning of the forward solution, since the combined solution remains fixed in the early part of the data set where both forward and backward solutions are fixed.

combined4

Again, this can be confirmed by looking at the difference between the forward and backward solutions.  In this case they agree everywhere that both are fixed.

combined5

As this example demonstrates, if post-processing is an option, it often makes sense to run in combined mode with continuous ambiguity resolution instead of forward mode with fix-and-hold enabled.  The additional pass will increase the chances of getting a fixed solution without the risk of locking onto a false fix that fix-and-hold can cause.  Even if you find you can not disable fix-and-hold completely, it may allow you to reduce the tracking gain (pos2-varholdamb)

So one last question is why are there still some float values in the middle of the combined solution? We would expect that since the backwards solution is fixed and the forward solution is float, that the combined solution should just become the backwards solution and all but the very end should be fixed.

The answer to this question turns out to be the way the reverse pass of the kalman filter is initialized.  I have chosen in the demo5 code to not reset the filter between forward and reverse passes if continuous ambiguity resolution is selected.  If fix-and-hold is selected then the demo5 code does re-initialize the kalman filter between passes.  This is different from the release code which always resets the filter between passes.

In this case, the results would have been slightly better if the filter were re-initialized but most of the time I find that allowing the filter to stay converged avoids a large gap in the backwards solution during the active part of the data set where the filter is reconverging. With fix-and-hold enabled I have found the chance of staying locked to an incorrect fix is too high and so it is better to reset the filter.  This is a recent change and hasn’t yet made it into the released version of demo5 but I should get it out soon.  The current version of the demo5 code (b28a) does not reset the filter for either case.

Modifying the if statement in the existing code in postpos.c to match the line below will give you the newest behavior.  Removing the if statement altogether will cause the filter to always be reset and will match the release code.

combined6

The other factor to consider when deciding whether to run the filter type in forward or combined mode is that combined mode will take nearly twice as long to run since it is processing each data point twice.  Most of the time this shouldn’t be an issue since it is not being run in real-time.

So to summarize, my recommendation would be to use combined mode if you do not need a real-time solution as the only real cost is a small amount of additional computation time and it will give you both higher fix percentages and more confidence in those fixes.

Advertisements

Tersus/M8T moving rover comparison

In my last couple of posts I compared a u-blox M8T single frequency receiver to a Tersus BX306 dual frequency receiver for a static rover using a fairly distant CORS receiver for base data.  Both receivers had over twenty raw phase measurements, but the Tersus receiver had much better overlap with the CORS receiver with twelve measurements available for ambiguity resolution (GPS L1 and L2) while the M8T had only six (GPS L1).  Not surprisingly, the Tersus provided a much better solution than the M8T.  I also compared the RTKLIB solution and the internal Tersus RTK solution and showed that they appeared to be roughly comparable.

In this post, I will add a second M8T receiver and compare a M8T to M8T short baseline solution to the Tersus to CORS longer baseline solution.  While this may not sound like a fair comparison, it could be a reasonable choice given that two M8T receivers are still significantly less expensive than one Tersus receiver.   Also, to make things more interesting,  I will use a moving rover this time rather than a stationary one.

For the experiment, I mounted both receivers in a car, each with it’s own antenna on the roof.  Given that we are making a comparison to a relatively expensive solution I felt it wouldn’t be unreasonable to add $20 to the M8T solution and upgraded its antenna from the standard $20 u-blox antenna I usually use to a Tallysman 1421 antenna available at Digikey for $42.   For the Tersus receiver I used a Tallysman dual frequency 3872 antenna which I believe is roughly a $200 antenna.  For the M8T base station, I used the same antenna on my house roof as in the previous experiment which gave a baseline less than 1 km for most of the M8T pair solution whereas the Tersus/CORS baseline was roughly 16-18 km.  For RTKLIB post-processing, I also ran a solution using base data from the nearest CORS station which gave a baseline of 7-9 km but I couldn’t use this data for the Tersus internal RTK solution because it is not available real-time.   Also, it should be noted that I collected all this data a few weeks ago before Tersus released their most recent firmware so it was all done using their previous version.

I chose a driving route very similar to the one I used for this M8N to M8T comparison in which I drive through a residential neighborhood with a moderate tree canopy.  This time I added a section of the route in a parking lot with no tree obstructions.  The parking lot is intended to be a low-stress environment and the neighborhood streets a moderate-stress environment.  Here’s a Google Earth image of the previous route to give a feel for the terrain.  Unfortunately this map feature no longer works in RTKLIB because Google has discontinued the API to Google Earth.

 

walker1

In this case the M8T  was receiving signals from the GPS, GLONASS, SBAS, and Galileo satellites and started the data set with a total of 21 phase measurements.  All of these can be used for ambiguity resolution since the two receivers are identical hardware.   The Tersus receiver measured only GPS and GLONASS but for all but a couple of satellites got both an L1 and an L2 measurement.  It started the data set with 24 phase measurements of which I would expect that only the 14 GPS phase measurements are available for ambiguity resolution because the receivers are not identical.

The previous time I ran this experiment I was able to get a nearly 100% fix solution from both the M8N and the M8T  receiver pairs but had to use some solution tracking gain (fix-and-hold) to achieve that.

In this case, with the extra Galileo satellites and the more expensive antenna, I was able to get nearly 100% fix using continuous ambiguity resolution instead of fix-and-hold. Continuous AR has the advantage of reducing the chances of locking to a false fix and is normally a preferrable solution if it is achievable.  The only float part of the solution was at the very end of the route where I parked the car underneath a large tree.

Here are three versions of the M8T receiver pair solution all run with continuous ambiguity resolution.  In all the plots, green is a fixed solution and yellow is a float solution.  The top left solution was run with 5 Hz measurements which is what I normally use for moving rovers.  I then realized that the Tersus data was only 1 Hz, so I re-ran the M8T solution after decimating the raw data down to 1 Hz (the latest Tersus firmware supports 5 Hz RTK solution).  The decimation can sometimes cause problems because the cycle slips aren’t always handled properly in the decimated data but in this case it seemed to work fine as can be seen in the plot on the top right.   The only noticeable difference is that the 1 sec data took a little longer to get to first fix.  This is less important in post-processed solutions because the solution can always be run in combined (forward/backward) mode which will usually get a fix for the beginning of the data.  This can be seen here in the bottom left solution which was run in combined mode.

ter_kin1

The zig-zag line from 21:22 to 21:26 is the lower stress circles in the parking lot followed by the moderate stress route through the residential neighborhood.

Next, let’s look at the Tersus solutions.  The internal Tersus RTK solution was run with the Tersus default settings.  The user interface for the Tersus console app is much simpler than RTKLIB so there are many fewer options to play with.  For most users this is probably an advantage because it avoids the rather overwhelming array of options that RTKLIB gives.   The RTKLIB solution was run with continuous ambiguity resolution with settings very similar to the M8T solution, just adjusted for dual frequency.  The internal solution is on the left and the RTKLIB solution on the right.

ter_kin2

The two solutions are fairly similar, both did well in the lower stress parking lot environment but struggled with the moderate stress on the residential streets.  The internal solution did a little better with scattered fixes in the latter part of the data.

Comparing differences between the internal and RTKLIB solutions and between the Tersus and M8T solutions for only the fixed points, it looks like most of the errors between the different solutions when they have a fix are small.  The Tersus/M8T differences are indicated by the distance from the circle as I have described before. I’m not too worried about the DC offsets between them.  It is somewhat tricky to get all the offsets correct and I did not spend a lot of time on that.  It is likely to be a issue with coordinate differences or handling of antenna offsets that explains the DC shifts.

ter_kin4

The above Tersus RTKLIB solutions were run with only GPS ambiguity resolution as I would not expect the GLONASS measurements to be useful for ambiguity resolution because of the inter-channel bias differences between the non-identical receivers.  However I was surprised to find that I did get fixes with the GLONASS ambiguity resolution set to “On” in the RTKLIB configuration file.  The solution was slightly worse than the GPS-only AR but I did verify that the GLONASS satellites were included in the ambiguity resolution.  I’m not quite sure what to make of this observation, whether or not it makes sense to include the GLONASS measurements in the ambiguity resolution, but I suspect it makes sense to leave them out for the reason mentioned above.

ter_kin5

I then ran another RTKLIB post-processed solution using the Tersus and base station data from a closer CORS base station.  This was to see how reducing the baseline affected the answer.  Here’s the result from a base station that is only 7-9 km away.

ter_kin6

Even though we reduced the baseline by a factor of two the solution only got slightly better and time to first fix actually increased.  This suggests that the long baseline may not be the primary reason for the poorer Tersus solution.

My suspicion is that it is a combination of two things,  at least for the RTKLIB solutions.  First of all I believe there is a mismatch between how RTKLIB interprets a cycle slip flag and how the cycle slip flag is defined in the Rinex spec.  The problem is that RTKLIB resets the phase bias estimate in the same epoch as the cycle slip is logged regardless of whether the receiver has had time to relock or not.  This can cause large errors in the bias estimates if the receiver flags a cycle slip before it has recovered from it.  In some of my earlier posts I have described having the same problem with the M8T receiver but in that case I have made some changes in the u-blox specific RTKLIB code to delay the cycle slips until the receiver has re-locked.  Something similar may need to be done for other RTKLIB receiver specific code  including the Tersus or it may be possible to modify the main RTKLIB code to better interpret these cycle slip flags.

Maybe more important, though, is the difference in the measurements between the two receivers.  As mentioned before, the M8T receiver has 21 phase measurements all of which can be used for ambiguity resolution while the Tersus has 24 of which only 14 can be used for ambiguity resolution assuming we don’ t try and use the GLONASS satellites.  Note, though, that there are only seven different satellite-receiver paths for the Tersus since each satellite is providing two measurements.  This compares to the 21 satellite-receiver paths for the M8T receiver where each satellite only provides a single measurement.  Now imagine that the receivers are under a partial tree canopy and four of the satellites are obstructed for both receivers.   The M8T will lose four measurements and still have 17 to work with but the Tersus receiver will lose 8 measurements and only have six to work with.  This is a significant disadvantage and I suspect can explain a large part of the difference in results.

If I had used a local Tersus base station, then the matched Tersus receiver pair would enable use of the GLONASS satellites for ambiguity resolution.  In the case of four obstructed satellites, the two cases would be much more similar with 17 available measurements for the M8T and 16 for the Tersus.  As more satellites were obstructed the M8T would start to gain a bigger advantage since the Tersus would lose two measurements for each obstructed satellite and the M8T would only lose one.  Of course the M8T would tend to have more obstructed satellites than the Tersus since it has more satellites to start with that can be obstructed.  That would work in favor of the Tersus reciever.  It’s hard to say which would give a better solution but my suspicion would be that if the cycle slip handling issue in RTKLIB was fixed the two solutions would be fairly similar when calculated with RTKLIB.  I don’t know enough about the internal Tersus RTK engine to predict how it would do.  Hopefully I can get my hands on a second full dual frequency receiver and run this experiment soon.

Although I ran this experiment at a random time without looking at the satellite alignment first, it may be that the satellite alignment was such that it accentuated this effect.  Note in the observations (Tersus on the top, M8T on the bottom) that the Galileo (Exx) and SBAS (Ixx) satellites have less cycle slips than any of the other satellites.

ter_kin7

Looking at the skyplot for those observations we see that three of the four Galileo satellites are at very high elevations which will tend to be blocked less from nearby trees. This would have helped the M8T solution since the Tersus receiver did not have access to these high elevation satellites.

ter_kin8

I will try to summarize what I think this data suggests but let me first emphasize that this is by no means intended to be any sort of rigorous analysis.  I don’t have the time, resources or knowledge to do that.  Instead, please take these as no more than the sharing of my thought process as I try to understand some of the differences between single and dual frequency RTK solutions.

Rover to CORS or other traditional dual frequency receiver:  Tersus has a significant advantage over the M8T both because of more matched measurements and opportunities to take advantage of the nature of the dual frequency measurements.  This advantage applies both to the RTKLIB solution and the Tersus solution although I suspect the Tersus solution takes better advantage of the dual-frequency measurements.  The advantage also increases as the baseline increases.

Matched pair of receivers with short baseline:  Good results with the RTKLIB solution will be limited to low stress environments for a pair of Tersus receivers because of limitations in the cycle slip flag handling.   With the M8N and M8T, RTKLIB can also handle moderate stress environments because of receiver specific changes in the RTKLIB cycle slip handling code.   Relative to a Tersus/CORS combination, the M8T matched pair solution will in general be superior for short baselines because of more matched measurements.

Matched pair of receivers with long baseline:  The data in this experiment doesn’t cover this case but as the baseline increases the dual frequency receiver pair should have a greater advantage because of the additional information that can be derived from the dual frequency measurements.

From a cost trade-off perspective, this suggests that the ideal way to combine these receivers might be to build the base with both an M8T single frequency receiver and a Tersus dual frequency receiver, both sharing a single antenna.  The rover would then be a second M8T receiver.  This would give the advantage of the dual frequency receiver for locating the absolute position of the base using long baseline solutions to distant reference stations or even PPP solutions while taking advantage of the matched pair of lower cost receivers for the moving rover piece of the solution.

 

AR Filter:A RTKLIB cycle-slip enhancement

Some of you may remember, one of the first code changes I made to RTKLIB was fixing a bug in the arlockcnt feature. Arlockcnt is an input parameter that specifies how many samples delay occurs before a new satellite (or a satellite that just recovered from a cycle-slip) is used for ambiguity resolution. Holding off use of the new phase-biase estimate from the kalman filter until it has had enough time to converge prevents corruption of the ambiguity resolution integer set. This in turn prevents a loss of fix.

Although waiting a fixed number of samples is fairly effective, it is not an optimal solution. Ideally we would use information from a new satellite as soon as it was converged and not after a fixed amount of time since some satellites will converge faster than others. When your data looks like this one, then every additional sample you can process is going to help.

arfilt0

This is what the AR filter attempts to do. In the current code implementation, a new satellite is unconditionally added to the integer ambiguity set when the arlockcnt expires regardless of the effect it has on the AR (ambiguity resolution) ratio. This means that the arlockcnt must be set conservatively, to insure the slowest satellite has converged, and means that most satellites will not be used for ambiguity resolution as early as they could be. In the case of frequent cycle-slips, this could mean loss of fix from having too few satellites available or it could mean a false fix since less satellites gives a less robust ambiguity resolution.

When the AR filter is enabled, a new satellite is still added to the integer ambiguity set when the arlockcnt expires but the effect of adding each new satellite is evaluated and if it causes a significant degradation in the AR ratio, that satellite’s use in ambiguity resolution will be delayed for a few more samples before being re-evaluated. Exactly how to define “significant degradation” is a bit subjective. I have chosen to disqualify a new satellite if it causes the AR ratio to drop below the AR fix threshold or if drops by more than a factor of two and the result is within 10% of the AR fix threshold. Exactly how long a satellite should be delayed is also subjective. I chose to delay a disqualified satellite for five samples plus a stagger of one sample for additional satellites. If two satellites are disqualified on the same sample, it could be either satellite or both that caused the disqualification. By adding a stagger to the delay for the second satellite, they will be re-evaluated independently on different samples.

To evaluate the change, I ran two solutions on the Ublox M8T data from my previous series of “M8N vs M8T” posts. This is my most challenging data set from a cycle-slip perspective. The solution below on the left is with arlockcnt=0 and AR filter disabled. The solution on the right is with arlockcnt=0 and AR filter enabled. As always, the yellow represents a float solution, and the green, a fixed solution. As you can see, enabling the AR filter significantly improved the number of fixes. Normally I would not set arlockcnt to zero if the AR filter was not enabled, this was for comparison purposes only.

arfilt1

As you would expect, with the AR filter disabled, increasing arlockcnt from 0 to 75 samples (15 sec) improves the solution for this data set as shown below but it still loses fix relatively often compared to the solution above with the AR filter enabled.

arfilt2

The plot below compares the number of satellites available for ambiguity resolution between the “arlockcnt=75/filter off“ solution and the “arlockcnt=0/filter on” solution”. Notice that we have significantly reduced the number of samples with less than 10 satellites available for AR by enabling the AR filter. More satellites should mean less chance of losing fix and also less chance of a false fix.

arfilt3

In this example, the accuracy of the fixed solution points did not seem to be noticeably affected by enabling the AR filter. As usual, I evaluate accuracy by comparing the receiver position relative to the position of a second receiver mounted on the same rover, both relative to the same base receiver. The difference between the two rovers should be a perfect circle, so any errors will appear as deviations from the circle. Plotting for only the fixed points, the “arlockcnt=75/filter off“ solution is on the left and the “arlockcnt=0/filter on” solution on the right. In both cases the errors appear to be very similar and within a few centimeters. This probably makes sense since the same satellites were used to calculate position in both cases, it was only the ambiguity resolution that differed. Any advantage from having more satellites in the AR calculations could be offset by the fact that the additional satellites were probably noisier since they may not have been fully converged. Also, the plot on the left does not include many of the points on the right, since samples without a fix are not included.

arfilt4

I actually created the AR filter feature quite a while ago but never got around to describing it or even fully testing it by reducing arlockcnt to zero. I have now done that, and made some small improvements to it in the last few days. I have updated my Github repository and executables folder with the latest version.

That pretty much completes my general explanation of this feature but there are a few details to be aware of if you are interested in trying it out yourself.

First of all, enabling the AR filter will slightly increase the code execution time since if a satellite is rejected, the ambiguity resolution has to be re-run without the rejected satellite. The difference is small enough however that I don’t think it will be an issue in the vast majority of cases.

The second thing is to understand is how the arlockcnt interacts with the half-cycle valid bit. A typical cycle-slip (at least on a Ublox receiver) looks like the plot below. There is usually a gap of no data, then a cycle-slip (red tick), then a number of half-cycle invalid samples (gray tick), then a final cycle-slip. The second cycle-slip is actually not reported by the receiver, but is added during the translation from raw data to RINEX format when the half-cycle valid bit transitions. Any time the half-cycle status is invalid, that satellite will not be used for ambiguity resolution regardless of the arlockcnt. The arlockcnt will be reset by the second cycle-slip and count from there. So, in this example, if arlockcont were set to 10, all the samples from the beginning of the gap until 10 samples after the second cycle-slip will be ignored for ambiguity resolution.

arfilt5

The last thing to mention is that one of the recent code changes I referred to above was to add a pseudo half-cycle invalid bit for the SBAS satellites for the M8T receiver. For some reason, the Ublox receivers don’t seem to report the half-cycle status for the SBAS satellites. The change I made was in the raw data to RINEX translation where I set the half-cycle invalid bit for a fixed delay after a cycle-slip on a SBAS satellite.  This makes cycle-slips on the SBAS satellites look very similar to the rest of the satellites.  I had previously done this for the M8N receiver and that change has been migrated to the release code but hadn’t got around to doing it for the M8T. This attempts to avoid the half-cycle uncertainties from possibly causing a false fix if the SBAS satellites were used too early for ambiguity resolution.

More Moving Rover / Fixed Base Data

I decided to try something a little more challenging for the next data set. Here’s a section of a dirt road with many trees on one side of the road and just a few on the other side. It could be representative of a farm field bordered by woods. The trees should cause a fairly large number of cycle slips on some of the satellites and just a few on others. The road is also rough enough to bounce the receiver around a bit. This should add some vertical and lateral accelerations.

road (1024x768)

I used the same two Ublox M8N receivers and antennas as last time, only this time I remembered to bring the pie pan so didn’t have to use the old beer can again for a ground plane. Here’s a photo of the improved base station.

piepan (1024x768)

I again set the receivers up to collect 5 Hz GPS and GLONASS data. Last time this worked with no issue, but this time I saw what other people have reported … I got 5 Hz data but all the GLONASS satellites were missing. After a bit of fiddling with the receiver setup what I found was that I had to first collect a brief bit of data at 1 Hz, then switch the receivers to 5 Hz and everything worked fine. To make this easier, I added an extra line to the beginning of the list of startup commands in STRSVR to set the sample rate.

I now have two startup files, one for 1Hz and the other for 5 Hz.

The 1 Hz startup file now looks like this:

!UBX CFG-RATE 1000 1 1
!UBX CFG-GNSS 0 32 32 1 0 10 32 0 1
!UBX CFG-GNSS 0 32 32 1 6 8 16 0 1
!UBX CFG-MSG 3 15 0 1 0 1 0 0
!UBX CFG-MSG 3 16 0 1 0 1 0 0
!UBX CFG-MSG 1 32 0 1 0 1 0 0

and the 5 Hz startup file looks like this:

!UBX CFG-RATE 200 1 1
!UBX CFG-GNSS 0 32 32 1 0 10 32 0 1
!UBX CFG-GNSS 0 32 32 1 6 8 16 0 1
!UBX CFG-MSG 3 15 0 1 0 1 0 0
!UBX CFG-MSG 3 16 0 1 0 1 0 0
!UBX CFG-MSG 1 32 0 1 0 1 0 0

Once I got past this hurdle, I was able to collect some data. I set up the base station in the middle of the field with unobstructed skies and attached the rover receiver and antenna to the top of the car. For the first fifteen minutes the car was stationary on a part of the road with relatively unobstructed skies, and for the remaining time I drove back and forth along the edge of the trees. Here’s a couple plots of the observation data files, base on the left, and rover on the right. The red ticks indicate cycle slips.

argeles0

As expected, the base data is free of cycle slips, but the rover starts to see a fairly large number of them as soon as the trees start to obstruct the rover antenna. For comparison, in the previous data set I collected in open skies, the rover data was nearly completely free of cycle slips.

Processing the data with RTKCONV, then running the solution with my demo3 version of RTKLIB, gave the following solution, ground track on the left, and position on the right.

argeles1

So the results looks very good, 100% fixes after the initial acquire and through all the cycle slips. But how do we know the positions are all correct? In the earlier data sets, I mounted both receivers on a single rover which forced the solution to be a circle of fixed radius. In that case any point that fell off the circle must be an error and was quite easily spotted. In this case, verifying the data is more difficult because we don’t know what the correct solution is. I don’t think there is any one easy answer to this question, but there are some things we can do to gain confidence in the results. If we run the solution with different input configurations and different sets of satellites and get the same answer, that helps. Sometimes errors are more obvious in the vertical component since it tends to be more constrained. For example, if you drive in circles and the starting point varies by 10 cm in the x or y direction that could be normal variation, but if you end up 10 cm deeper every circle something is probably wrong.

The best single test I was able to come up with is to compare two solutions, one with ambiguity resolution mode set to fix-and-hold and the other with it set to continuous. In continuous AR mode, fixes are much more independent of each other since there is no direct input to the kalman filter from the result of the fixes. That is not true of fix-and-hold, which feeds information from the previous fix into the kalman filter to help find subsequent fixes. In general, most problems with erroneous fixes occur when fix-and-hold is enabled. So the goal won’t be to prove the position is correct, only that fixes are validated as reliably with fix-and-hold enabled as they are with continuous AR mode.

RTKPLOT has a nice feature that makes this comparison quite easy. After running both solutions and then opening them as solution 1 and solution 2 in RTKPLOT, you can then plot the difference using the ‘1-2’ button in the top left of the GUI.

For this particular instance, to make the two runs even more independent and to validate my code changes, I ran the continuous AR case using the original RTKLIB code and configuration without any of my modifications (except enabling dynamics), and ran the fix-and-hold case with my demo3 code and configuration file with the extended fix-and-hold enabled for both GPS and GLONASS satellites.

Below on the left is the continuous AR solution. It finds fixes quite easily while the rover is stationary but after the rover starts moving and cycle slips start occurring it has more trouble and gradually drifts off, getting fewer and fewer fixes. The plot on the right below shows the difference between the two solutions. The green indicates where both solutions had fixes, and the yellow where one or both were not fixed.

argeles2

As you can see, every time both solutions get a fix, they differ by less than a couple centimeters in the x and y directions and a little more in the z direction. These are usually quite acceptable and too small to be caused by invalid fixes since one cycle is about 20 cm. We can not directly confirm the positions for which the continuous AR mode did not get fixes but based on the continuity of the points in between they are most likely correct as well.  So, at least for this data set I believe the fix-and-hold has reliably given us the correct position.

What happens with this test if there are erroneous positions in the fix-and-hold data? On the left below is an example of another fix-and-hold solution of the same data set. Although this one also has 100% fix after the initial acquire, it is definitely incorrect. You can see after six passes on the road, the car is now half a meter higher than it started! Comparing it with the same continuous AR case as above gives the plot on the left, where it is even more obvious that it is not correct since the green dots are all significantly distant from zero.

argeles3

I was able to create the incorrect solution above simply by delaying the starting point of the solution from the beginning of the data where the car was stationary and there were no cycle slips, to a point later in time where the car was moving and there were numerous cycle slips. You can see in the above plots that the solution starts at 7:16, where previously it started at 7:00. There is nothing magic about 7:16, I just moved the starting point around until I found a point where it broke the solution.

This brings up a very important point. Fix-and-hold is always going to be most vulnerable to error before it has acquired its first fix. There are things we can do to reduce this vulnerability but that will always be its weakest point. For that reason, I believe it is essential when using fix-and-hold, at least in it’s current state, to start it only on clean data where the rover is stationary and the skies are unobstructed. This would normally be done by turning the receivers on and leaving them in a fixed, open-sky position for some amount of time before letting the rover move. How long this is will depend on the quality of the antennas, whether GLONASS satellites are enabled, etc. If the data was being processed real-time, it would be easy to use an LED to indicate when fix-and-hold has locked to the data. I imagine some people are already doing this.

I’ve added this data and the configuration file I used with it to my library of data sets available here in the argeles_car folder in case anyone else wants to experiment with it.

Fix-and-Hold extended to GLONASS and SBAS

I have just added a demo3  branch to my GitHub repository with a couple of new features.  The first is an extension of the fix-and-hold feature to enable ambiguity resolution for the GLONASS and SBAS satellites. This is what I will discuss in this post. I have also added a ambiguity resolution filter to prevent misbehaving satellites from being added to the set of satellites used for AR but I will leave that to a future post.

A couple posts back I discussed double differences of the raw data and the kalman filter phase-bias states, in the context of cycle slips. I will do something similar here but this time focusing on the initial acquire, when the solution first locks on to the correct set of integer ambiguities.

First, let’s go over a little background material.

The kalman filter can have a varying number of states depending on the input configuration settings but the most important states are the position states and the phase-bias states. The position states are estimates of the receiver position. The phase-bias states are estimates of the missing offsets that need to be added to the carrier phase measurements. There is one for each satellite, and they represent single differences between measurements from the two receivers. Subtracting one phase-bias state from another results in a double difference, which is what we looked at in the earlier post. If all error sources are counted for exactly, then the double differences will always be integers for geometric reasons.

At the end of each epoch, after the phase-bias states have been updated, the integer ambiguity resolution algorithm attempts to map the set of double differences float values to a set of integers. If it can do that with enough confidence to meet the ambiguity resolution criteria, then we have a fix and the plot line in RTKLPOT will switch from yellow to green. RTKLIB then uses the phase-bias estimates to update the position states. If we have a successful ambiguity resolution, the position states are updated from a position derived from the set of integers and is referred to as a fixed solution. If the ambiguity resolution was unsuccessful, the position update is derived from the non-integer phase-bias values instead and is referred to as a float solution.

In the example below, from RTKPLOT, the first successful ambiguity resolution occurs between 17:30 and 17:31, at which point you can see the switch from yellow to green, and the sudden jump in position when it is first derived from the integer values instead of the float values.

acquire1

Unlike the position states, the phase-bias states are always updated from the float values and not the integer values, whether a successful fix occurred or not. Hence there is no corresponding jump in the double difference phase-biases when the first fix occurs. You can see this in the plot of a phase-bias double difference plot below for the same example as above.  There is no jump where the first fix occurred between 17:30 and 17:31 in either plot line. The red line is for a run with continuous ambiguity resolution, the blue line for one with fix-and-hold enabled.

acquire2
So that’s a quick attempt to describe the ‘fix” in “fix-and-hold” What about the hold?

In the plot above, you can see in the red line for the continuous AR run that there is no discontinuity at all. The phase-bias double difference gradually and continuously approaches the the actual double difference integer of 4. That is what happens when there is no “hold”. When fix and hold is enabled however, and the hold criteria are all met, then an additional update is made to the phase bias states using the set of integer values from the ambiguity resolution. This is a very high gain update and the response in the phase-bias filter states is almost immediate. In the plot above, hold is enabled between 17:31 and 17:32, and you can see the response in the sudden jump in the blue line.

In the plot below, the same phase-bias DD is plotted on the left with fix-and-hold enabled. The lower plot is zoomed into when the hold occurs. You can see the correction is nearly immediate and the result is very close to the actual integer value on the first correction. This is for a GPS satellite. The significant errors for the GPS satellites are all accounted for,at least for short baselines, and so the initial correction can be quite accurate.  The corrections continue every epoch for which the hold criteria are met, and if the remaining error is visible to the kalman filter, the value will continue to converge to the integer value.

acquire3

acquire4
Below is the same plot for a GLONASS satellite. In this case, there is a significant unaccounted for error in the double difference, namely the inter-channel bias that we’ve discussed before, and which comes from the fact that the GLONASS system uses multiple frequencies. This error is invisible to the kalman filter and hence the double difference never converges to the actual integer value. You can see below, the jump to near zero from the hold update, but then the line does not converge towards zero after that.

acquire5
But we know the double difference really is an integer, and at this point we should have accounted for all the other significant errors, so it might not be entirely unreasonable to assume this remaining difference is caused by the inter-channel bias and remove it, especially if we remove it slowly enough that the removal does not interact with any other feedback path.  We need to be careful how we handle it though. It won’t work to try and simply push the bias-estimate to the nearest integer. If we did that, the feedback loop would see the error and push it back. Instead we create a new array of variables to hold the inter-channel biases and move the error for each GLONASS satellite into this array. We can then adjust the double differences by these error values before feeding them into the kalman filter. To see exactly how this is done, you can look at the changes in holdamb() where the biases are adjusted and ddres() where the IC biases are removed from the kalman filter errors. To avoid upsetting anything else in the loop, I chose to be quite conservative about how quickly the adjustment occurs. It does not need to be done quickly. The plots below shows the error removed from the phase-bias state at a 0.5% per epoch rate (red) and a 1% (yellow) rate.  Again, the lower plot is zoomed in to after the hold.

acquire6

acquire7

This is fast enough to remove enough error to resolve the GLONASS ambiguities in 1 to 2 minutes after the GPS ambiguities have been resolved.  If necessary it could probably be done much more quickly.   Note that we do need to wait until the GPS ambiguities are resolved before beginning this process because before that point we have not yet accurately determined the other error terms.

Here’s an example ambiguity resolution output from the trace file for a run with this feature enabled.

acquire8

Each double difference is made up of a reference satellite and a fix satellite. They are listed in the first two lines. “N(0)” is the float solution for the double differences. “N(1)” is the set of integers with lowest residuals, and “N(2)” is the set of integers with the second lowest residuals. “Ratio” is the ratio of the residuals between the best two sets. In general, the larger this number, the more confidence we have in the solution. Satellite numbers less than or equal to 32 are GPS, 33 to 59 are GLONASS. In this example there are seven GPS satellite pairs, and four GLONASS pairs. The other three pairs are SBAS satellites paired with a GPS satellite. I’m not sure why SBAS satellites are not usable without this feature since they use the same single frequency as the GPS satellites, but at least with the NEO-M8N, I have found that until I added this feature, I had to disable them. Unfortunately, unlike the GLONASS satellites, RTKLIB does not have a mechanism to allow SBAS satellites to be used for positioning without ambiguity resolution so they have to be removed completely.

In this example, after adding the SBAS satellites, enabling this feature has doubled the number of satellite pairs used for ambiguity resolution, and increased the number used for positioning by almost 25%.

So far, I’ve tried this on several data sets and have consistently resolved the ambiguities on the GLONASS and SBAS satellites, but I would say it is still fairly experimental at this point. If you’d like to try it on your data, you can download the code from the demo3 branch. To enable the feature for the GLONASS satellites, set “pos2-gloarmode” in the config file to “fix-and-hold”. If you want to enable the SBAS satellites as well, set bit1 in “pos1-navsys” .

If you do try it, let me know how it goes. I’d be happy to help if you have any difficulties with it. You can always contact me at rtklibexplorer@gmail.com.

For those of you that read my previous post on GLONASS integer ambiguity resolution, this method is related to that one, but I think this should do a better job of driving the IC bias errors to zero because of the more continuous, lower gain nature of the updates to the IC biases.

Another Kayak Data Set: Fix-and-Hold Fails Again

Matt from Reefmaster was kind enough to send me a second data set from another two hour kayak session on the water so I thought I’d run it with the latest configuration/code. I ran it with fix-and-hold enabled both with and without GLONASS ambiguity resolution enabled. With GLONASS AR, everything looked great. Without GLONASS AR I got another false fix that corrupted the kalman filter states and caused the rest of the solution to be very poor. At this point I am really starting to question the value of fix-and-hold, but I’m not quite ready to give up on it yet.

The previous two false fixes I looked at had fairly obvious causes. In the first case the fix occurred with only a very small number of valid satellites. In the second case, the error occurred before the kalman filter states had time to converge. This case, unfortunately, does not have any obvious single cause.

Before getting into the details, lets review all of the input configuration constraints that need to be met for fixes and holds to occur. Here they are along with the values I used for this experiment

Fix Constraints:

  • AR ratio > pos2-arthres                                                      (3.0)
  • # of valid satellites > pos2-minfixsats                           (4)
  • # of samples since satellite lock > pos2-arlockcnt    (150)  30 sec*5 samples/sec
  • elevation for each sat > pos2-arelmask                         (15) 

Hold Constraints:

  • # of consecutive fixes > pos2-arminfix                          (50) 10 sec*5 samples/sec
  • # of valid satellites > pos2-minholdsats                        (5)
  • position variance < pos2-arthres1                                   (.002)
  • elevation for each sat > pos2-elmaskhold                    (0)

The constraints in blue are the ones I have either added or fixed. Note that pos2-armaxiter, pos2-arthres2, pos2-arthres3, pos2-arthres4 are listed in the configuration file but are not used by the code.

OK, time for the details. From the trace file I can see that the very first fix attempt after the position variance constraint was met is a false fix. A short time later after the minimum number of consecutive fixes constraint (arminfix)  is met, the hold occurs, and the bad fix is fed into the kalman filter. There were five valid satellites both when the first fix occurred and when the hold occurred.   

To make this easier to see, I have added some additional writes to the trace file into the code.  This includes the satellites used for fix and the position variance.  N(0) is the float solution, N(1) is the lowest residual fixed solution, and N(2) is the 2nd lowest residual fixed solution.

Output from trace file:

3 P[0]=0.001998
3 ddmat :
3 refSats= 2 2 2 2
3 fixSats= 5 7 9 30
3 N(0)= -18.475 -9.131 -6.555 16.554
3 N(1)= -19.000 -9.000 -7.000 17.000
3 N(2)= -17.000 -6.000 -1.000 18.000
3 resamb : validation ok (nb=4 ratio=4.56 s=82.25/375.07)

By plotting the solutions using “fix-and-hold” and “continuous” modes I can see that in continuous mode, the false fix lasted for approximately 16 seconds (blue/yellow) before unlocking. With fix-and-hold enabled, the false fix continued for much longer (blue/green).

kayak_false_lock

From these observations it is apparent that the false hold could be prevented by adjusting any one of several input parameters. Specifically, increasing arthres, arthres1, arminfix, or minholdsats would avoid the false hold.

With just one example it is difficult to know which is the best parameter or parameters to adjust.  I chose to increase minholdsats from 5 to 6, and increase arminfix from 10 seconds to 20 seconds. Since, in this example, there were 5 valid satellites, and the false fix lasted for 16 seconds, either change by itself will avoid the erroneous hold, but we’ll change both.  By making these changes, fix-and-hold will get invoked less, but we will have higher confidence that it is not being invoked in error.

Rerunning with these changes and then calculating my standard metrics for the results produced the following plot, where the cases on the x-axis are:

  1. GLONASS AR off, fix-and-hold 
  2. GLONASS AR off, continuous
  3. GLONASS AR on, fix-and-hold
  4. GLONASS AR on, continuous

 

kayak2_metrics

As you would expect, the best results occurred when both GLONASS AR and fix-and-hold were enabled (case 3). However, I would be very careful enabling fix-and-hold in any real application without some significant testing and probably some adjustment of the input parameters to match your particular environment.

 

 

GLONASS Ambiguity Resolution: Identical Receivers

In a previous post I explained that integer ambiguity resolution doesn’t generally work on the GLONASS satellites in RTKLIB because we have not accounted for the inter-channel biases caused by the multiple frequencies used with the different GLONASS satellites. That is why I have been setting gloarmode=0 in the configuration file even though I am using the GLONASS satellites for the float solution.

In theory, there is an exception to this rule, and that is when both receivers are using identical hardware. In this case, the biases should cancel and can be ignored. Whether this is true for some of the low cost receivers seems to be in question. Carcanague, in his thesis, describes finding that in some cases, biases can change with power on/off cycles, which if true, means they must be accounted for to make the GLONASS integer ambiguity resolution work reliably.

In my demo1 data set, both receivers use Ublox M8N receivers, although on non-identical boards. I get virtually no fixes if I enable GLONASS integer ambiguity resolution (AR), and this led me to believe that GLONASS AR was not an option unless I accounted for the inter-channel biases. 

So I was quite surprised when I enabled GLONASS AR (gloarmode=1) on this new data set and got great results! Here’s a comparison of my metrics for three solutions. The first is with fix-and-hold disabled, the second is fix-and-hold enabled with the position variance criteria described in the previous post, and the third is the same with GLONASS AR enabled.

demo2_metrics1.png

As you can see, the mean number of satellites used for the fixed solution has nearly doubled when GLONASS AR was enabled. This is a very significant improvement!  I ran this configuration on three other data sets from the same receivers taken on different days and got similar improvements on all three, suggesting that there is most likely not an issue with power cycles with these (M8T) receivers.

I was very excited to discover GLONASS AR worked on this data but now I need to understand why I don’t see the same behavior with my receivers. I can think of several possible reasons:

  1. The Reach receivers use the Ublox M8T, while my receivers are both using the Ublox M8N. I was under the impression these two receivers are very similar (they share the same documentation) but maybe there is some subtle difference between them?
  2. Although my two receivers both use the M8N chip, the boards they are on are from different companies. I wouldn’t expect this to make a difference, but maybe it does?
  3. The two antennas I used to collect my data are different, although they are both basic patch antennas.
  4. This data was taken at at 5 samples/sec. My data was taken at 1 sample/sec.

I hope to run some experiments to answer this question in the near future. In the meantime though, can anybody else explain what I’m seeing, or confirm they’ve seen similar behavior?

Update 5/2/16:  I just came across this comment from Tomoji Takasu, the author of RTKLIB, regarding differences between the M8N and M8T on the Emlid forum

“NEO-M8N does not support raw-data officially. However, it is known that some F/W version can be configured to output raw-data as TRK-MEAS and TRK-SFRBX messages. I confirmed it by F/W version 2.01. The latest RTKLIB (2.4.2 p11) can also handle such messages. For details, refer the following material.
http://gpspp.sakura.ne.jp/paper2005/gpssymp_2014_ttaka.pdf62
I found some limitation by M8N compared to M8T.
(1) Raw message can be used only with 1 Hz (NG with 5 Hz).
(2) Only GPS ambiguities can be resolved (NG for GLONASS or BeiDou)
Again, these are not supported by u-blox formally. Other F/W versions may not support them.”

It looks like he has had the same experience I am seeing, where the M8N can resolve GLONASS integer ambiguities and M8T can not.