PPK vs RTK: A look at RTKLIB for post-processing solutions

The “RTK” in RTKLIB is an abbreviation for “Real-time Kinematics”, but RTKLIB is probably used at least as often for “PPK” or “Post-Processed Kinematics” as it is for real-time work.  In applications like precision agriculture, where the solution is part of a real-time feedback loop, RTK is obviously a requirement, but in many other applications there is no need for a real-time solution.  For example, drones are often used for collecting photographic or other sensor data but only need precision positions after the fact to process the data.  PPK is simpler than RTK because there is no need for a real-time data link between GPS receivers and so is often preferable if there is a choice.  The downside of course is that if there is something wrong with the collected data, you may not find out until it’s too late.

For the most part, RTKLIB solutions are identical regardless if they are run on real-time data (RTK) or run on previously collected data (PPK).  The most significant exception to this rule is what RTKLIB calls the “Filter Type”.  This is selected in the configuration and can be set to forward, backward, or combined.  Forward is the default and this is the only mode that can be used in real-time solutions.  In forward mode, the observation data is processed through the kalman filter in the forward direction, starting with the beginning of the data and continuing through to the end.  Backward mode is the opposite,  data is run through the filter starting with the end of the data and continuing to the beginning.  In Combined mode, the filter is run both ways and the two results are combined into a single solution.   This mode is set using the “Filter Type” box in the Options menu if using one of the GUI apps, or with the “pos1-solytpe” input parameter in the configuration file if using a CUI app.

There are two advantages to a combined solution over a forward solution.  First of all, it gives two chances to find a fix for each data point.  Let’s say there is an anomaly in the middle of the data set that causes the solution to switch from fix to float and not come back to fix for some period of time.   It may cause both the forward and backward solutions to lose fix but they will lose fix on opposite sides of the anomaly.  By combining the two solutions we are likely to get a fix for everywhere except right at the anomaly.  Another case where it often helps is in recovering the beginning of a data set.  Let’s say the first fix didn’t occur until five minutes into the data set.  With a forward solution, you would need to guarantee that nothing important happened during that five minutes, but with a combined solution, the backward pass will normally provide a fix all the way to the very beginning of the data set so there is no lost data.

The second advantage of the combined solution is that it provides an extra level of validation of the results.  To understand how this happens, it’s important to understand how RTKLIB combines the forward and reverse solutions.  For each solution position point there are three possibilities; both passes are float, one is float and one is fix, or both are fixed.  If both passes generate a float position, then the combined result will be a float with a value equal to the average of the two positions.  If one is float, and the other is fix, the float is thrown away and the fix is used.  In the case where both are fixed, then RTKLIB will attempt to validate the result by comparing the two values.  If they differ by less than four sigma, then the result will be a fix, otherwise it will be downgraded to a float.  Either way, the value will be the average of the two positions.  This degrading the solution type when the answers from opposite directions differ provides an increased confidence in the solution, at least for points for which we got two fixed values.

I will show a couple examples of the differences between forward and combined modes.  The first example is a more typical case and demonstrates how combined mode will normally give you a higher fix percentage while at the same time increasing confidence in the solution.

The plots below were taken from an M8N receiver on a sailboat using a nearby CORS station as base.  With ambiguity resolution mode set to fix-and-hold, I was able to get a solution with nearly 100% fix except for the initial convergence, but I would prefer to use continuous ambiguity resolution because of the higher confidence of the solution.  In the position plots below, the top was run in forward mode, the middle in backwards mode, and the bottom in combined mode, all in continuous ambiguity resolution mode.

combined1

As you can see the forwards and backwards mode solutions are not bad but both have gaps of float in the middle as well as floats during the initial acquisition.  The combined solution though has almost 100% fix rate and in addition includes the additional confidence knowing that every point found the same solution when running the data in opposite directions.

This second example comes from a data set posted on the Emlid Reach forum with a question on why the combined solution was worse than the forward solution.  In the plots below, the top solution is forward, the middle is backward, and the bottom is combined.

combined2

This data was GPS and SBAS only, so had a fairly low number of satellites, also included a mix of poor observations and the solution was run with full tracking gain (i.e fix-and-hold with the default gain).  Both forward and backward runs found fixed (green) solutions and tracked them all the way through the data set.  However, at least one of them was most likely a false fix, causing the fix to be downgraded to float (yellow) for most of the combined solution as can be seen be seen in the bottom plot.

To confirm this, the plot below shows the difference between the forward and backward solutions.  As you can see, the two differ by a fairly substantial amount and it is not possible from this data to know which one is correct.

combined3

In this case, turning off fix-and-hold and running ambiguity resolution in continuous mode sheds some light on what may be going on.  The plots below are again forward, backward, and combined.  This time the forward solution loses fix early on and never recovers it, whereas the backwards solution maintains a fix through the whole data set and is probably correct since without fix-and-hold enabled, it is very unlikely to stay locked that long to an incorrect solution.  The backward solution is also consistent with the beginning of the forward solution, since the combined solution remains fixed in the early part of the data set where both forward and backward solutions are fixed.

combined4

Again, this can be confirmed by looking at the difference between the forward and backward solutions.  In this case they agree everywhere that both are fixed.

combined5

As this example demonstrates, if post-processing is an option, it often makes sense to run in combined mode with continuous ambiguity resolution instead of forward mode with fix-and-hold enabled.  The additional pass will increase the chances of getting a fixed solution without the risk of locking onto a false fix that fix-and-hold can cause.  Even if you find you can not disable fix-and-hold completely, it may allow you to reduce the tracking gain (pos2-varholdamb)

So one last question is why are there still some float values in the middle of the combined solution? We would expect that since the backwards solution is fixed and the forward solution is float, that the combined solution should just become the backwards solution and all but the very end should be fixed.

The answer to this question turns out to be the way the reverse pass of the kalman filter is initialized.  I have chosen in the demo5 code to not reset the filter between forward and reverse passes if continuous ambiguity resolution is selected.  If fix-and-hold is selected then the demo5 code does re-initialize the kalman filter between passes.  This is different from the release code which always resets the filter between passes.

In this case, the results would have been slightly better if the filter were re-initialized but most of the time I find that allowing the filter to stay converged avoids a large gap in the backwards solution during the active part of the data set where the filter is reconverging. With fix-and-hold enabled I have found the chance of staying locked to an incorrect fix is too high and so it is better to reset the filter.  This is a recent change and hasn’t yet made it into the released version of demo5 but I should get it out soon.  The current version of the demo5 code (b28a) does not reset the filter for either case.

Modifying the if statement in the existing code in postpos.c to match the line below will give you the newest behavior.  Removing the if statement altogether will cause the filter to always be reset and will match the release code.

combined6

The other factor to consider when deciding whether to run the filter type in forward or combined mode is that combined mode will take nearly twice as long to run since it is processing each data point twice.  Most of the time this shouldn’t be an issue since it is not being run in real-time.

So to summarize, my recommendation would be to use combined mode if you do not need a real-time solution as the only real cost is a small amount of additional computation time and it will give you both higher fix percentages and more confidence in those fixes.

Advertisements

More Moving Rover / Fixed Base Data

I decided to try something a little more challenging for the next data set. Here’s a section of a dirt road with many trees on one side of the road and just a few on the other side. It could be representative of a farm field bordered by woods. The trees should cause a fairly large number of cycle slips on some of the satellites and just a few on others. The road is also rough enough to bounce the receiver around a bit. This should add some vertical and lateral accelerations.

road (1024x768)

I used the same two Ublox M8N receivers and antennas as last time, only this time I remembered to bring the pie pan so didn’t have to use the old beer can again for a ground plane. Here’s a photo of the improved base station.

piepan (1024x768)

I again set the receivers up to collect 5 Hz GPS and GLONASS data. Last time this worked with no issue, but this time I saw what other people have reported … I got 5 Hz data but all the GLONASS satellites were missing. After a bit of fiddling with the receiver setup what I found was that I had to first collect a brief bit of data at 1 Hz, then switch the receivers to 5 Hz and everything worked fine. To make this easier, I added an extra line to the beginning of the list of startup commands in STRSVR to set the sample rate.

I now have two startup files, one for 1Hz and the other for 5 Hz.

The 1 Hz startup file now looks like this:

!UBX CFG-RATE 1000 1 1
!UBX CFG-GNSS 0 32 32 1 0 10 32 0 1
!UBX CFG-GNSS 0 32 32 1 6 8 16 0 1
!UBX CFG-MSG 3 15 0 1 0 1 0 0
!UBX CFG-MSG 3 16 0 1 0 1 0 0
!UBX CFG-MSG 1 32 0 1 0 1 0 0

and the 5 Hz startup file looks like this:

!UBX CFG-RATE 200 1 1
!UBX CFG-GNSS 0 32 32 1 0 10 32 0 1
!UBX CFG-GNSS 0 32 32 1 6 8 16 0 1
!UBX CFG-MSG 3 15 0 1 0 1 0 0
!UBX CFG-MSG 3 16 0 1 0 1 0 0
!UBX CFG-MSG 1 32 0 1 0 1 0 0

Once I got past this hurdle, I was able to collect some data. I set up the base station in the middle of the field with unobstructed skies and attached the rover receiver and antenna to the top of the car. For the first fifteen minutes the car was stationary on a part of the road with relatively unobstructed skies, and for the remaining time I drove back and forth along the edge of the trees. Here’s a couple plots of the observation data files, base on the left, and rover on the right. The red ticks indicate cycle slips.

argeles0

As expected, the base data is free of cycle slips, but the rover starts to see a fairly large number of them as soon as the trees start to obstruct the rover antenna. For comparison, in the previous data set I collected in open skies, the rover data was nearly completely free of cycle slips.

Processing the data with RTKCONV, then running the solution with my demo3 version of RTKLIB, gave the following solution, ground track on the left, and position on the right.

argeles1

So the results looks very good, 100% fixes after the initial acquire and through all the cycle slips. But how do we know the positions are all correct? In the earlier data sets, I mounted both receivers on a single rover which forced the solution to be a circle of fixed radius. In that case any point that fell off the circle must be an error and was quite easily spotted. In this case, verifying the data is more difficult because we don’t know what the correct solution is. I don’t think there is any one easy answer to this question, but there are some things we can do to gain confidence in the results. If we run the solution with different input configurations and different sets of satellites and get the same answer, that helps. Sometimes errors are more obvious in the vertical component since it tends to be more constrained. For example, if you drive in circles and the starting point varies by 10 cm in the x or y direction that could be normal variation, but if you end up 10 cm deeper every circle something is probably wrong.

The best single test I was able to come up with is to compare two solutions, one with ambiguity resolution mode set to fix-and-hold and the other with it set to continuous. In continuous AR mode, fixes are much more independent of each other since there is no direct input to the kalman filter from the result of the fixes. That is not true of fix-and-hold, which feeds information from the previous fix into the kalman filter to help find subsequent fixes. In general, most problems with erroneous fixes occur when fix-and-hold is enabled. So the goal won’t be to prove the position is correct, only that fixes are validated as reliably with fix-and-hold enabled as they are with continuous AR mode.

RTKPLOT has a nice feature that makes this comparison quite easy. After running both solutions and then opening them as solution 1 and solution 2 in RTKPLOT, you can then plot the difference using the ‘1-2’ button in the top left of the GUI.

For this particular instance, to make the two runs even more independent and to validate my code changes, I ran the continuous AR case using the original RTKLIB code and configuration without any of my modifications (except enabling dynamics), and ran the fix-and-hold case with my demo3 code and configuration file with the extended fix-and-hold enabled for both GPS and GLONASS satellites.

Below on the left is the continuous AR solution. It finds fixes quite easily while the rover is stationary but after the rover starts moving and cycle slips start occurring it has more trouble and gradually drifts off, getting fewer and fewer fixes. The plot on the right below shows the difference between the two solutions. The green indicates where both solutions had fixes, and the yellow where one or both were not fixed.

argeles2

As you can see, every time both solutions get a fix, they differ by less than a couple centimeters in the x and y directions and a little more in the z direction. These are usually quite acceptable and too small to be caused by invalid fixes since one cycle is about 20 cm. We can not directly confirm the positions for which the continuous AR mode did not get fixes but based on the continuity of the points in between they are most likely correct as well.  So, at least for this data set I believe the fix-and-hold has reliably given us the correct position.

What happens with this test if there are erroneous positions in the fix-and-hold data? On the left below is an example of another fix-and-hold solution of the same data set. Although this one also has 100% fix after the initial acquire, it is definitely incorrect. You can see after six passes on the road, the car is now half a meter higher than it started! Comparing it with the same continuous AR case as above gives the plot on the left, where it is even more obvious that it is not correct since the green dots are all significantly distant from zero.

argeles3

I was able to create the incorrect solution above simply by delaying the starting point of the solution from the beginning of the data where the car was stationary and there were no cycle slips, to a point later in time where the car was moving and there were numerous cycle slips. You can see in the above plots that the solution starts at 7:16, where previously it started at 7:00. There is nothing magic about 7:16, I just moved the starting point around until I found a point where it broke the solution.

This brings up a very important point. Fix-and-hold is always going to be most vulnerable to error before it has acquired its first fix. There are things we can do to reduce this vulnerability but that will always be its weakest point. For that reason, I believe it is essential when using fix-and-hold, at least in it’s current state, to start it only on clean data where the rover is stationary and the skies are unobstructed. This would normally be done by turning the receivers on and leaving them in a fixed, open-sky position for some amount of time before letting the rover move. How long this is will depend on the quality of the antennas, whether GLONASS satellites are enabled, etc. If the data was being processed real-time, it would be easy to use an LED to indicate when fix-and-hold has locked to the data. I imagine some people are already doing this.

I’ve added this data and the configuration file I used with it to my library of data sets available here in the argeles_car folder in case anyone else wants to experiment with it.

Fix-and-Hold extended to GLONASS and SBAS

I have just added a demo3  branch to my GitHub repository with a couple of new features.  The first is an extension of the fix-and-hold feature to enable ambiguity resolution for the GLONASS and SBAS satellites. This is what I will discuss in this post. I have also added a ambiguity resolution filter to prevent misbehaving satellites from being added to the set of satellites used for AR but I will leave that to a future post.

A couple posts back I discussed double differences of the raw data and the kalman filter phase-bias states, in the context of cycle slips. I will do something similar here but this time focusing on the initial acquire, when the solution first locks on to the correct set of integer ambiguities.

First, let’s go over a little background material.

The kalman filter can have a varying number of states depending on the input configuration settings but the most important states are the position states and the phase-bias states. The position states are estimates of the receiver position. The phase-bias states are estimates of the missing offsets that need to be added to the carrier phase measurements. There is one for each satellite, and they represent single differences between measurements from the two receivers. Subtracting one phase-bias state from another results in a double difference, which is what we looked at in the earlier post. If all error sources are counted for exactly, then the double differences will always be integers for geometric reasons.

At the end of each epoch, after the phase-bias states have been updated, the integer ambiguity resolution algorithm attempts to map the set of double differences float values to a set of integers. If it can do that with enough confidence to meet the ambiguity resolution criteria, then we have a fix and the plot line in RTKLPOT will switch from yellow to green. RTKLIB then uses the phase-bias estimates to update the position states. If we have a successful ambiguity resolution, the position states are updated from a position derived from the set of integers and is referred to as a fixed solution. If the ambiguity resolution was unsuccessful, the position update is derived from the non-integer phase-bias values instead and is referred to as a float solution.

In the example below, from RTKPLOT, the first successful ambiguity resolution occurs between 17:30 and 17:31, at which point you can see the switch from yellow to green, and the sudden jump in position when it is first derived from the integer values instead of the float values.

acquire1

Unlike the position states, the phase-bias states are always updated from the float values and not the integer values, whether a successful fix occurred or not. Hence there is no corresponding jump in the double difference phase-biases when the first fix occurs. You can see this in the plot of a phase-bias double difference plot below for the same example as above.  There is no jump where the first fix occurred between 17:30 and 17:31 in either plot line. The red line is for a run with continuous ambiguity resolution, the blue line for one with fix-and-hold enabled.

acquire2
So that’s a quick attempt to describe the ‘fix” in “fix-and-hold” What about the hold?

In the plot above, you can see in the red line for the continuous AR run that there is no discontinuity at all. The phase-bias double difference gradually and continuously approaches the the actual double difference integer of 4. That is what happens when there is no “hold”. When fix and hold is enabled however, and the hold criteria are all met, then an additional update is made to the phase bias states using the set of integer values from the ambiguity resolution. This is a very high gain update and the response in the phase-bias filter states is almost immediate. In the plot above, hold is enabled between 17:31 and 17:32, and you can see the response in the sudden jump in the blue line.

In the plot below, the same phase-bias DD is plotted on the left with fix-and-hold enabled. The lower plot is zoomed into when the hold occurs. You can see the correction is nearly immediate and the result is very close to the actual integer value on the first correction. This is for a GPS satellite. The significant errors for the GPS satellites are all accounted for,at least for short baselines, and so the initial correction can be quite accurate.  The corrections continue every epoch for which the hold criteria are met, and if the remaining error is visible to the kalman filter, the value will continue to converge to the integer value.

acquire3

acquire4
Below is the same plot for a GLONASS satellite. In this case, there is a significant unaccounted for error in the double difference, namely the inter-channel bias that we’ve discussed before, and which comes from the fact that the GLONASS system uses multiple frequencies. This error is invisible to the kalman filter and hence the double difference never converges to the actual integer value. You can see below, the jump to near zero from the hold update, but then the line does not converge towards zero after that.

acquire5
But we know the double difference really is an integer, and at this point we should have accounted for all the other significant errors, so it might not be entirely unreasonable to assume this remaining difference is caused by the inter-channel bias and remove it, especially if we remove it slowly enough that the removal does not interact with any other feedback path.  We need to be careful how we handle it though. It won’t work to try and simply push the bias-estimate to the nearest integer. If we did that, the feedback loop would see the error and push it back. Instead we create a new array of variables to hold the inter-channel biases and move the error for each GLONASS satellite into this array. We can then adjust the double differences by these error values before feeding them into the kalman filter. To see exactly how this is done, you can look at the changes in holdamb() where the biases are adjusted and ddres() where the IC biases are removed from the kalman filter errors. To avoid upsetting anything else in the loop, I chose to be quite conservative about how quickly the adjustment occurs. It does not need to be done quickly. The plots below shows the error removed from the phase-bias state at a 0.5% per epoch rate (red) and a 1% (yellow) rate.  Again, the lower plot is zoomed in to after the hold.

acquire6

acquire7

This is fast enough to remove enough error to resolve the GLONASS ambiguities in 1 to 2 minutes after the GPS ambiguities have been resolved.  If necessary it could probably be done much more quickly.   Note that we do need to wait until the GPS ambiguities are resolved before beginning this process because before that point we have not yet accurately determined the other error terms.

Here’s an example ambiguity resolution output from the trace file for a run with this feature enabled.

acquire8

Each double difference is made up of a reference satellite and a fix satellite. They are listed in the first two lines. “N(0)” is the float solution for the double differences. “N(1)” is the set of integers with lowest residuals, and “N(2)” is the set of integers with the second lowest residuals. “Ratio” is the ratio of the residuals between the best two sets. In general, the larger this number, the more confidence we have in the solution. Satellite numbers less than or equal to 32 are GPS, 33 to 59 are GLONASS. In this example there are seven GPS satellite pairs, and four GLONASS pairs. The other three pairs are SBAS satellites paired with a GPS satellite. I’m not sure why SBAS satellites are not usable without this feature since they use the same single frequency as the GPS satellites, but at least with the NEO-M8N, I have found that until I added this feature, I had to disable them. Unfortunately, unlike the GLONASS satellites, RTKLIB does not have a mechanism to allow SBAS satellites to be used for positioning without ambiguity resolution so they have to be removed completely.

In this example, after adding the SBAS satellites, enabling this feature has doubled the number of satellite pairs used for ambiguity resolution, and increased the number used for positioning by almost 25%.

So far, I’ve tried this on several data sets and have consistently resolved the ambiguities on the GLONASS and SBAS satellites, but I would say it is still fairly experimental at this point. If you’d like to try it on your data, you can download the code from the demo3 branch. To enable the feature for the GLONASS satellites, set “pos2-gloarmode” in the config file to “fix-and-hold”. If you want to enable the SBAS satellites as well, set bit1 in “pos1-navsys” .

If you do try it, let me know how it goes. I’d be happy to help if you have any difficulties with it. You can always contact me at rtklibexplorer@gmail.com.

For those of you that read my previous post on GLONASS integer ambiguity resolution, this method is related to that one, but I think this should do a better job of driving the IC bias errors to zero because of the more continuous, lower gain nature of the updates to the IC biases.

Another Kayak Data Set: Fix-and-Hold Fails Again

Matt from Reefmaster was kind enough to send me a second data set from another two hour kayak session on the water so I thought I’d run it with the latest configuration/code. I ran it with fix-and-hold enabled both with and without GLONASS ambiguity resolution enabled. With GLONASS AR, everything looked great. Without GLONASS AR I got another false fix that corrupted the kalman filter states and caused the rest of the solution to be very poor. At this point I am really starting to question the value of fix-and-hold, but I’m not quite ready to give up on it yet.

The previous two false fixes I looked at had fairly obvious causes. In the first case the fix occurred with only a very small number of valid satellites. In the second case, the error occurred before the kalman filter states had time to converge. This case, unfortunately, does not have any obvious single cause.

Before getting into the details, lets review all of the input configuration constraints that need to be met for fixes and holds to occur. Here they are along with the values I used for this experiment

Fix Constraints:

  • AR ratio > pos2-arthres                                                      (3.0)
  • # of valid satellites > pos2-minfixsats                           (4)
  • # of samples since satellite lock > pos2-arlockcnt    (150)  30 sec*5 samples/sec
  • elevation for each sat > pos2-arelmask                         (15) 

Hold Constraints:

  • # of consecutive fixes > pos2-arminfix                          (50) 10 sec*5 samples/sec
  • # of valid satellites > pos2-minholdsats                        (5)
  • position variance < pos2-arthres1                                   (.002)
  • elevation for each sat > pos2-elmaskhold                    (0)

The constraints in blue are the ones I have either added or fixed. Note that pos2-armaxiter, pos2-arthres2, pos2-arthres3, pos2-arthres4 are listed in the configuration file but are not used by the code.

OK, time for the details. From the trace file I can see that the very first fix attempt after the position variance constraint was met is a false fix. A short time later after the minimum number of consecutive fixes constraint (arminfix)  is met, the hold occurs, and the bad fix is fed into the kalman filter. There were five valid satellites both when the first fix occurred and when the hold occurred.   

To make this easier to see, I have added some additional writes to the trace file into the code.  This includes the satellites used for fix and the position variance.  N(0) is the float solution, N(1) is the lowest residual fixed solution, and N(2) is the 2nd lowest residual fixed solution.

Output from trace file:

3 P[0]=0.001998
3 ddmat :
3 refSats= 2 2 2 2
3 fixSats= 5 7 9 30
3 N(0)= -18.475 -9.131 -6.555 16.554
3 N(1)= -19.000 -9.000 -7.000 17.000
3 N(2)= -17.000 -6.000 -1.000 18.000
3 resamb : validation ok (nb=4 ratio=4.56 s=82.25/375.07)

By plotting the solutions using “fix-and-hold” and “continuous” modes I can see that in continuous mode, the false fix lasted for approximately 16 seconds (blue/yellow) before unlocking. With fix-and-hold enabled, the false fix continued for much longer (blue/green).

kayak_false_lock

From these observations it is apparent that the false hold could be prevented by adjusting any one of several input parameters. Specifically, increasing arthres, arthres1, arminfix, or minholdsats would avoid the false hold.

With just one example it is difficult to know which is the best parameter or parameters to adjust.  I chose to increase minholdsats from 5 to 6, and increase arminfix from 10 seconds to 20 seconds. Since, in this example, there were 5 valid satellites, and the false fix lasted for 16 seconds, either change by itself will avoid the erroneous hold, but we’ll change both.  By making these changes, fix-and-hold will get invoked less, but we will have higher confidence that it is not being invoked in error.

Rerunning with these changes and then calculating my standard metrics for the results produced the following plot, where the cases on the x-axis are:

  1. GLONASS AR off, fix-and-hold 
  2. GLONASS AR off, continuous
  3. GLONASS AR on, fix-and-hold
  4. GLONASS AR on, continuous

 

kayak2_metrics

As you would expect, the best results occurred when both GLONASS AR and fix-and-hold were enabled (case 3). However, I would be very careful enabling fix-and-hold in any real application without some significant testing and probably some adjustment of the input parameters to match your particular environment.

 

 

A second raw GPS data set: Fix-and-Hold issues again

In the previous series of posts I focused on a particular data set taken while driving a car around a parking lot with both receivers mounted on top of the car. By analyzing the data and making various changes to the input configuration and to the code I was able to noticeably improve the solution results. Now it’s time to try those changes on another data set to begin to find out how universal those improvements really are. After all, it’s very easy to deceive yourself when using the same data to evaluate a change that you used to develop it in the first place. I expect it will take several iterations of improvements over several data sets before finding something that is consistently reliable.

I have been given a very nice data set for this purpose. It was taken with two Emlid Reach receivers mounted fore and aft on a kayak while paddling around in the ocean. The Reach receivers are a fully integrated RTK solution built around Ublox M8T receivers  and although I have not used them myself, they look to be a good choice for someone who wants to get up to speed with working hardware very quickly. They are a little expensive to meet my goal of “ultra-low cost” but still much lower priced than any traditional alternative. (Related note: Emlid has kindly offered to send me a couple of their receivers to play with and I plan to take them up on their offer at some point here in the fairly near future.)

The data I have is from a small company called Reefmaster that provides software solutions for processing sonar images. (Thanks to Matt for providing me with the data and helping analyze the results.) They have some very cool looking sonar mosaics on their website created by stitching together multiple sonar scans. They hope to use RTKLIB to improve the location tagging of the sonar scans to improve the stitching process. This could be an ideal application for RTKLIB; open sky visibility combined with relatively slow velocities and accelerations from the boat. These sort of well defined, friendly environments are, at least in the short term, where I believe RTKLIB is going to be most successful. Sometimes it seems that people are expecting too much from these low cost receivers, using them in very challenging environments, and then being disappointed when they don’t perform well.

As in the previous data set, both receivers were mounted on the rover so I can evaluate the accuracy of the solution by measuring the deviation from a perfect circle in the difference between the two receivers. In this case I really should use a sphere instead of a circle because, unlike the previous example, there is occasionally a significant elevation variation between the two receivers. Most of this occurs when the kayak is being rolled up and down over an embankment to get it in and out of the water. For now, I’ll just focus on the data taken while on the water and ignore the extra dimension.

Let’s start by running with the final code and configuration we ended up with in the demo1 code which is available in my Github repository (For now I am going to put aside the recent experiments I made with “fix-and-bump” and GLONASS AR). The ground track result is plotted below. This is not a good start! It’s better than the original default configuration, but not by much.

demo2_a

I’m always suspicious of fix-and-hold because of it’s tendency to lock to an incorrect fix so let’s first turn it off and see what happens. In the plot below, the solution was run after “pos2-armode” in the configuration file was changed from “fix-and-hold” to “continuous”.  Again, I am only plotting the time period when the kayak was on the water.

 

demo2_b

Much better! It looks like fix-and-hold is the culprit. In the demo1 changes, I added a couple of criteria (minfixsats, minholdsats) to try and prevent exactly this kind of issue but clearly my changes were not sufficient.

A quick analysis of the fix-and-hold enabled results shows that the fix criteria was met very early, while there was still large errors in all the kalman states. When fix-and-hold was disabled, these errors were only temporary, but when fix-and-hold was enabled, the states corrupted by the incorrect fix were locked in a short time later when the fix-and-hold criteria were met, and the solution never recovered. Based on the results obtained with fix-and-hold disabled, the rest of the changes in the demo1 code/configuration look like they are working well on this data set.

It really does not make sense to look for a fix while all the states are still converging so it might be reasonable to disable integer ambiguity resolution for some period of time at the beginning of the data until the states have time to converge. In this article, Carcanague lists disabling integer ambiguity resolution for the first three minutes of data as part of his procedure.

Rather than picking an arbitrary time to elapse before enabling AR, I have chosen to wait until the variance in the position solution drops below a threshold. This seems like a little more direct measurement of what we are interested in, and will get re-engaged if we get a severe disturbance and all phase-biases get reset, unlike a pure time-from-start criteria.

Interestingly enough, in the 2.4.3 release of RTKLIB, four new parameters have been added to the config file: pos-arthres1, pos-arthres2, pos-arthres3, and pos-arthres4. Apparently someone else was thinking that integer ambiguity resolution needed some additional constraints. At the moment, though, these new parameters are unused in the code, so we will co-opt pos-arthres1 for our own use.

I modified the function resamb_LAMBA as shown below to add one more condition that must be met before invoking integer ambiguity resolution. rtk->P[0] is the variance of the x direction of the position state and rtk->opt.thresar[1] is the variable that pos-arthres1 is copied into.

Before:

if (rtk->opt.mode<=PMODE_DGPS||rtk->opt.modear==ARMODE_OFF||
     rtk->opt.thresar[0]<1.0) {
     return 0;
}

After:

if (rtk->opt.mode<=PMODE_DGPS||rtk->opt.modear==ARMODE_OFF||
     rtk->opt.thresar[0]<1.0 || rtk->P[0]>=rtk->opt.thresar[1]) {
     return 0;
}

I arbitrarily set pos-arthres1 to 0.002 because it was a round number that very roughly corresponded to three minutes into the data set.  The standard deviations of the positions are output to the .pos solution file along with the position values.  We get the variance by squaring the standard deviation.  Here is the variance of x plotted for the first 300 secs of this data set.

demo2_var

 

Re-running with fix-and-hold enabled and with the new code gave the following result.

 

demo2_c

Very similar to the previous result, but with a slightly earlier first correct fix because we have removed the very early bad fix.

Overall, quite good. I suspect this is not the last additional constraint we will need to add to fix-and-hold before we are done!

 

RTKLIB: Thoughts on Fix-and-Hold

In a previous post I briefly discussed the difference in RTKLIB between the “Continuous” mode and “Fix-and-Hold” mode for handling integer ambiguity resolution. In this post, I’d like to dig a little deeper into how fix-and-hold works and discuss some of its advantages and disadvantages, then introduce a variation I call “Fix-and-Bump”.

Each satellite has a state in the kalman filter to estimate it’s phase-bias. The phase-bias is the number of carrier-phase cycles that needs to be added to the carrier-phase observation to get the correct range. By definition, the actual phase-biases must all be integers, but their estimates will be real numbers due to errors and approximations. Each epoch, the double-difference range residuals are used to update the phase-bias estimates. After these have been updated, the integer ambiguity resolution algorithm evaluates how close the phase-bias estimates (more accurately, the differences between the phase-bias estimates) are to an integer solution. In general, the closer the estimates are to an integer solution, the more confidence we have that the solution is valid. This is because all the calculations are done with real numbers and there is no inherent preference in the calculations towards integer results.

It is important to understand that the validation process (integer ambiguity resolution) used to determine if the result is high quality (fixed) or low quality (float) is based entirely on how far it deviates from a set of perfect integers and not at all on any inherent accuracy. It doesn’t matter how wrong a result may be, if it is all integers, then the integer ambiguity resolution will deem it a high quality solution. We rely on the fact that it is very unlikely that an erroneous solution will align precisely to any set of integers.

Therefore, for the fixed solution vs float solution distinction to be reliable, it is key that we don’t violate the assumption that there is no inherent preference in the calculations toward integer results. We can use our extra knowledge that the answer should be a set of integers either to improve our answer or to verify our answer, but it is not really valid to do both. Unfortunately, that is exactly what fix-and-hold does. It uses the difference between the fixed and float solutions to push the phase-bias estimates toward integer values, then uses integer ambiguity resolution to determine how good the answer is based on how close to integers it is. This will usually improve the phase-bias estimates since it is very likely the chosen integers are the correct actual biases, but it will always improve the confidence in the result as determined by the integer ambiguity resolution, even when the result is wrong. Thus we have effectively short-circuited the validation process by adding a preference in the calculations for integer results.

This does not mean fix-and-hold is a bad thing, in fact most of the time it will improve the solution. We just need to be aware that we have short-circuited the validation process and have compromised our ability to independently verify the quality of the solution. It also means that it is easy to fool ourselves into thinking that fix-and-hold is helping more than it really is. For example, two of the metrics I have been using to evaluate my solutions, percent fixed solutions, and median AR ratio, are relatively meaningless once we have turned it on since they may improve regardless of the actual quality of the solution. The value of a particular solution is a combination of its accuracy and our confidence in that accuracy. Fix-and-hold gives up confidence in the solution in exchange for improvement in its accuracy.

Is it possible that fix-and-hold is too much of a good thing? Once it is turned on the phase-bias adjustments towards the integer solutions are done every epoch, tightly constraining the phase-biases and making it very difficult to “let go” of an incorrect solution. What if the fix-and-hold adjustments were done only for a single epoch when we first meet the enabling criteria, then turned off not used again until we lose our fix and meet the criteria again? Instead of “holding” on to the integer solution after a fix, we “bump” the solution in the right direction, then let go. I will call this “Fix-and-Bump”. Note that the “cheating” effect of adjusting the biases will be retained in the phase-bias states so turning off fix-and-hold is not enough to immediately re-validate the integer ambiguity resolution. I suspect however that erroneous fixes will not be stable and without the continuous adjustments from fix-and-hold, they will drift away from the integer solutions fairly quickly.

A good analogy might be an electric motor in a circuit with a 5 amp fuse and an 8 amp starting current. Short-circuiting the fuse (fix-and-hold) will make the motor run but a malfunction (erroneous fix) after the motor has started could cause a fire. Short-circuiting the fuse just long enough to start the motor (fix-and-bump) would also make the motor run but would be much safer in the case of a malfunction (erroneous fix) after the motor has started.

Below I have plotted the fractional part one of the phase-bias estimate differences for the three cases. The discontinuity at 180 seconds is where fix-and-hold and fix-and-bump were enabled. The plot on the right is just a zoomed-in version of the plot on the left.

phbias1a

Here is the solution stats for each run. (1=float, 2=fix,3=hold)

phbias1b.png

As you can, even a single adjustment (fix-and-bump) gives us an estimate of the phase-bias that is very close to what we get if we make adjustments every epoch (fix-and-hold). This supports the idea that it may be possible to get most of the benefit of fix-and-hold while avoiding some of the downside.

At this point, this is more of a thought exercise, than a proposal to change RTKLIB but I think it may be worth considering. I hope to evaluate this change further when I have more data sets to look at.

I will use this feature though in my next post, though, which is one reason I’ve devoted a fair bit of time to it here. In that post I will extend fix-and-bump to the GLONASS satellites. Instead of using the feedback short-circuit path (as I’ve called it) to improve the phase-bias estimates, I will use it to estimate the inter-channel biases, then enable GLONASS integer ambiguity resolution. Not having the inter-channel bias values is what currently prevents integer ambiguity resolution (gloarmode) for the GLONASS satellites from working. Fix-and-bump should provide better evaluation of the quality of the combined GPS/GLONASS solutions.  That’s enough for now … I’ll leave the rest to the next post.

Note:  Readers familiar with the solution algorithms will notice that I have left out certain details and that some of my statements are not quite precisely correct. For the most part this was done intentionally to keep things focused but I believe the gist of the argument is valid even when all the details are included. Please let me know if you think I have left anything important out.

Improving RTKLIB solution: Fix-and-Hold

Integer ambiguity resolution is done by RTKLIB after calculating a float solution to the carrier phase ambiguities to attempt to take advantage of the fact that the carrier phase ambiguities must be integers. Constraining the solution to integers improves both the accuracy and convergence time.

RTKLIB provides three options for the integer ambiguity resolution: instantaneous, continuous, and fix-and-hold. “Instantaneous” is the most conservative and relies on recalculating the phase bias estimates every epoch. “Continuous” is the default option and uses the kalman filter to estimate the phase biases continuously over many epochs. In general, this provides much less noisy estimates of the phase biases and is essential for working with low cost receivers. The downside of the continuous solution is that if bad data is allowed to get into the filter outputs it can remain for multiple epochs and corrupt subsequent solutions, so some care must be taken to avoid that.

Fix-and-Hold” takes the concept of feeding information derived from the current epoch forward to subsequent epochs one step farther. In the “Continuous” method, only the float solution is used to update the phase bias states in the kalman filter. In the “Fix-and-Hold” method, an additional kalman filter update is done using pseudo-measurements derived from the fixed solution. This is only done if the fixed solution is determined to be valid. All of this is explained in greater detail in Appendix E.7(5) of the RTKLIB user manual.

In general, taking advantage of all the information we have this epoch to improve the solution for the following epochs seems to be a good idea, especially given the high levels of noise in the low-cost receivers and antennas. Even more so than in the “Continuous” method, however, we need to be careful about feeding erroneous data into the kalman filter which may take a long time to work its way out.

Testing the “Fix-and-Hold” method on a few data sets shows the results we would expect, the solution usually does improve but can occasionally cause it to have difficulty recovering from a period of bad data where we have corrupted the phase-bias estimates.

To minimize the chance of this happening, we need to do a better job of validating the solution before feeding it forward. Looking at a couple examples of bad data, what I find is that the errors are introduced when solutions are derived from a very small number of valid satellites. The current integer ambiguity validation is based only on the ratio of residuals between the best solution and the second best solution and does not take into account the number of satellites used to find the solutions.

I have introduced two new input parameters, “minfixsats”, and “minholdsats” into the code. These set the number of satellites required to declare a fixed solution as valid, and the number of satellites required to implement the fix-and-hold feature. I have found setting the minimum number of satellites for fix to 4 and fix-and-hold to 5 seems to work quite well for my data, but I have made them input parameters in the config file to allow them to be set to any value.

I won’t include all the code here, but it is available on GitHub at https://github.com/rtklibexplorer/RTKLIB on the demo1 branch

The two key changes are:

For minfix, in resamb_LAMBDA() from:

if ((nb=ddmat(rtk,D))<=0) {
errmsg(rtk,”
no valid double-difference\n”);

to:

if ((nb=ddmat(rtk,D))<(rtk->opt.minfixsats-1)) { // nb is sat pairs
    errmsg(rtk,”not enough valid double-differences\n”);

and for minhold, in holdamb() from:

if (nv>0) {
    R=zeros(nv,nv);
    for (i=0;i<nv;i++) R[i+i*nv]=VAR_HOLDAMB;

    /* update states with constraints */
    if ((info=filter(rtk->x,rtk->P,H,v,R,rtk->nx,nv))) {
    errmsg(rtk,”filter error (info=%d)\n”,info);
}

to:

if (nv>=rtk->opt.minholdsats-1) { // nv=sat pairs, so subtract 1
    R=zeros(nv,nv);
    for (i=0;i<nv;i++) R[i+i*nv]=VAR_HOLDAMB;

    /* update states with constraints */
    if ((info=filter(rtk->x,rtk->P,H,v,R,rtk->nx,nv))) {
    errmsg(rtk,”filter error (info=%d)\n”,info);
}

Again, the changes to the position solution are subtle and difficult to see in a plot, but are much more obvious in the Ratio Factor for AR validation. A higher ratio represents greater confidence in the solution. The green is before the changes, the blue is after. The RTKLIB code limits the AR ratio to a maximum of 1000.

ar2

As you can see, there is a very significant increase in the AR ratio with the change.  The improvement is coming from switching the ambiguity resolution from continuous to fix-and-hold, not from the code changes which we made to try and reduce the occasional occurence of corrupting the filter states with bad data.

Let’s also look at the metrics described earlier for both this change and the bug fix described in the previous post. On the x-axis, 1-3 are from before the two previous changes, 4 is with the bias sum bug fix and 5 is with fix-and-hold and the min sat thresholds for fix and hold.

metrics2

From the metric plots you can see that at this point we have maxed out the % fixed solutions and median AR ratio metrics and have got the distance metric down below 0.5 cm so I will stop here and declare success, at least for this data set.

If you’d like to try this code with your data, the code for all the changes I have made so far are available at https://github.com/rtklibexplorer/RTKLIB on the demo1 branch.

If anyone does try this code on their own data sets please let me know how it goes. I’m hoping these changes are fairly universally useful, at least for open skies and low velocities, but the only way to know is to test it on multiple data sets.

Update 4/17/16:  Enabling the “fix-and-hold” feature on other data sets has had mixed success.  I still see “fix-and-hold” locking to bad fixes and corrupting the states.  It looks like the extra checks I added are not sufficient to prevent this.  So, for know,I recommend turning off this feature, or if you do use it, be aware that it can cause problems.  I hope to take another look at this soon.