A second raw GPS data set: Fix-and-Hold issues again

In the previous series of posts I focused on a particular data set taken while driving a car around a parking lot with both receivers mounted on top of the car. By analyzing the data and making various changes to the input configuration and to the code I was able to noticeably improve the solution results. Now it’s time to try those changes on another data set to begin to find out how universal those improvements really are. After all, it’s very easy to deceive yourself when using the same data to evaluate a change that you used to develop it in the first place. I expect it will take several iterations of improvements over several data sets before finding something that is consistently reliable.

I have been given a very nice data set for this purpose. It was taken with two Emlid Reach receivers mounted fore and aft on a kayak while paddling around in the ocean. The Reach receivers are a fully integrated RTK solution built around Ublox M8T receivers  and although I have not used them myself, they look to be a good choice for someone who wants to get up to speed with working hardware very quickly. They are a little expensive to meet my goal of “ultra-low cost” but still much lower priced than any traditional alternative. (Related note: Emlid has kindly offered to send me a couple of their receivers to play with and I plan to take them up on their offer at some point here in the fairly near future.)

The data I have is from a small company called Reefmaster that provides software solutions for processing sonar images. (Thanks to Matt for providing me with the data and helping analyze the results.) They have some very cool looking sonar mosaics on their website created by stitching together multiple sonar scans. They hope to use RTKLIB to improve the location tagging of the sonar scans to improve the stitching process. This could be an ideal application for RTKLIB; open sky visibility combined with relatively slow velocities and accelerations from the boat. These sort of well defined, friendly environments are, at least in the short term, where I believe RTKLIB is going to be most successful. Sometimes it seems that people are expecting too much from these low cost receivers, using them in very challenging environments, and then being disappointed when they don’t perform well.

As in the previous data set, both receivers were mounted on the rover so I can evaluate the accuracy of the solution by measuring the deviation from a perfect circle in the difference between the two receivers. In this case I really should use a sphere instead of a circle because, unlike the previous example, there is occasionally a significant elevation variation between the two receivers. Most of this occurs when the kayak is being rolled up and down over an embankment to get it in and out of the water. For now, I’ll just focus on the data taken while on the water and ignore the extra dimension.

Let’s start by running with the final code and configuration we ended up with in the demo1 code which is available in my Github repository (For now I am going to put aside the recent experiments I made with “fix-and-bump” and GLONASS AR). The ground track result is plotted below. This is not a good start! It’s better than the original default configuration, but not by much.

demo2_a

I’m always suspicious of fix-and-hold because of it’s tendency to lock to an incorrect fix so let’s first turn it off and see what happens. In the plot below, the solution was run after “pos2-armode” in the configuration file was changed from “fix-and-hold” to “continuous”.  Again, I am only plotting the time period when the kayak was on the water.

 

demo2_b

Much better! It looks like fix-and-hold is the culprit. In the demo1 changes, I added a couple of criteria (minfixsats, minholdsats) to try and prevent exactly this kind of issue but clearly my changes were not sufficient.

A quick analysis of the fix-and-hold enabled results shows that the fix criteria was met very early, while there was still large errors in all the kalman states. When fix-and-hold was disabled, these errors were only temporary, but when fix-and-hold was enabled, the states corrupted by the incorrect fix were locked in a short time later when the fix-and-hold criteria were met, and the solution never recovered. Based on the results obtained with fix-and-hold disabled, the rest of the changes in the demo1 code/configuration look like they are working well on this data set.

It really does not make sense to look for a fix while all the states are still converging so it might be reasonable to disable integer ambiguity resolution for some period of time at the beginning of the data until the states have time to converge. In this article, Carcanague lists disabling integer ambiguity resolution for the first three minutes of data as part of his procedure.

Rather than picking an arbitrary time to elapse before enabling AR, I have chosen to wait until the variance in the position solution drops below a threshold. This seems like a little more direct measurement of what we are interested in, and will get re-engaged if we get a severe disturbance and all phase-biases get reset, unlike a pure time-from-start criteria.

Interestingly enough, in the 2.4.3 release of RTKLIB, four new parameters have been added to the config file: pos-arthres1, pos-arthres2, pos-arthres3, and pos-arthres4. Apparently someone else was thinking that integer ambiguity resolution needed some additional constraints. At the moment, though, these new parameters are unused in the code, so we will co-opt pos-arthres1 for our own use.

I modified the function resamb_LAMBA as shown below to add one more condition that must be met before invoking integer ambiguity resolution. rtk->P[0] is the variance of the x direction of the position state and rtk->opt.thresar[1] is the variable that pos-arthres1 is copied into.

Before:

if (rtk->opt.mode<=PMODE_DGPS||rtk->opt.modear==ARMODE_OFF||
     rtk->opt.thresar[0]<1.0) {
     return 0;
}

After:

if (rtk->opt.mode<=PMODE_DGPS||rtk->opt.modear==ARMODE_OFF||
     rtk->opt.thresar[0]<1.0 || rtk->P[0]>=rtk->opt.thresar[1]) {
     return 0;
}

I arbitrarily set pos-arthres1 to 0.002 because it was a round number that very roughly corresponded to three minutes into the data set.  The standard deviations of the positions are output to the .pos solution file along with the position values.  We get the variance by squaring the standard deviation.  Here is the variance of x plotted for the first 300 secs of this data set.

demo2_var

 

Re-running with fix-and-hold enabled and with the new code gave the following result.

 

demo2_c

Very similar to the previous result, but with a slightly earlier first correct fix because we have removed the very early bad fix.

Overall, quite good. I suspect this is not the last additional constraint we will need to add to fix-and-hold before we are done!

 

Advertisements

RTKLIB: GLONASS Ambiguity Resolution

Several readers now have suggested that adding inter-channel bias calibration to RTKLIB would be a worthwhile enhancement so I thought I would give it a shot, or at least explore the idea.

Inter-channel biases are what make integer ambiguity resolution more difficult on the GLONASS satellites than on the GPS satellites. In the GPS system, double-differencing the carrier-phase observations is a very powerful trick that causes the largest sources of error to all cancel each other out. These errors include the satellite and receiver clock biases, as well as the majority of the ionospheric and tropospheric delays and ephemeris errors. The remaining errors are quite small, meaning that the estimates of the unknown phase-biases we are trying to solve for will be quite close to integer values. We can then use integer ambiguity resolution to select the most likely set of integers, thus rejecting most of the remaining error.

Unfortunately, unlike the GPS system in which all satellites operate at the same frequency, in the GLONASS system, the satellites use a set of multiple frequencies. The different frequencies introduce an additional set of errors from the receiver into the solution, called the inter-channel biases. There will be a different bias for each frequency but the error between two satellites should be roughly proportional to the frequency difference between the two satellites. For a more complete discussion, see here

A quite detailed description of at least one method to solve for the inter-channel biases can be found here. Fitting this solution into the existing RTKLIB architecture, however, looks to be quite challenging and more than I really want to take on, but I suspect this might be the “right” way to do it.

Instead, I decided to try a “feedback short-circuit” similar to what is is used in the RTKLIB fix-and-hold feature. If you haven’t already done so, you might want to read my previous post in which I discuss that feature and a modification I made to it I call “fix-and-bump” which I will use here.

Fix-and-hold uses the fact that we know the correct answer must be a set of integers to push the phase-bias estimates towards integer values, using the difference between fixed and float solutions as feedback, rather than deriving it from physical principles. My premise is that just as we can blindly adjust the phase-biases estimates based on the difference between fixed and float solutions, we ought be able to do something similar for the inter-channel biases.

The adjustments made to the phase-biases during fix-and-hold are done in the holdamb() function in the rtkpos.c file in RTKLIB. This function is called only when we have a high confidence in our fixed solution, to avoid chasing an incorrect solution. The key line of code from this function is

    v[nv]=(xa[index[0]]-xa[index[i]])-(rtk->x[index[0]]-rtk->x[index[i]]);

where:

v[] is a pseudo-innovation (or pseudo measurement residual) used to update the     kalman filter
xa [] is the set of fixed solution phase-bias estimates
rtk->x[] is the set of float solution phase-bias estimates
index[i] is the satellite to update
index[0] is the reference satellite.

So what this line of code does is generate a double-differenced error input to the kalman filter for each satellite based on the difference between the fixed solution phase-biases and the float solution phase-biases. This will drive the phase-bias states towards a set of integers, with all the advantages and disadvantages as discussed in the previous post.

So far I have just reviewed what is in the existing RTKLIB code. Now lets talk about an addition to the fix-and-hold code for the GLONASS satellites. It is reasonable to drive the phase-bias estimates for the GPS satellites towards integer values because we have accounted for all the significant errors (mostly through cancellation in the double-differences) and so we expect them to be integers. On the other hand, we can not expect the phase-bias estimates for the GLONASS satellites to converge towards integer values since we have not accounted for the errors from the inter-channel biases. These are different for each satellite and hence do not cancel. So, in order to account for them, I add an additional variable for each satellite to hold the estimated inter-channel biases. These values are subtracted from the measurement residuals every epoch when calculating the kalman filter innovations, just as the other error terms are. To do things properly I should make them kalman filter states, but for now they are just stand-alone variables.

The inter-channel bias estimates are initialized to zero and then only adjusted during a fix-and-hold update. The new update is done after the existing fix-and-hold updates are made to the phase-biases for the satellites with fixed solutions. At this point, we assume that all the remaining error in the GLONASS phase-bias estimates is coming from the inter-channel biases, so we move that error from one to the other. To make the new update, the fractional components of the GLONASS phase-bias estimates are removed (subtracted) from the phase-bias estimates and moved (added) to the inter-channel bias variables. Ideally we would subtract the fixed solutions from the float solutions as we did with the GPS satellites, but since we don’t have fixed solutions yet for the GLONASS satellites, we use our best guess for the fixed solution, which is just the closest integer. In a final implementation, this process would be iterated over multiple epochs to improve the estimates but at this point I am only making a single adjustment to simply evaluation of the solution.

These are the key lines of code added to holdamb() to do this:

dd=rtk->x[IB(j+1,f,&rtk->opt)]-rtk->x[IB(i+1,f,&rtk->opt)];
dd=dd-ROUND(dd); // throw out integer component
rtk->x[IB(j+1,f,&rtk->opt)]-=dd; // remove fractional part from phase bias
rtk->ssat[j].icbias[f]+=dd; // and move to IC bias

With the inter-channel biases now accounted for, we can turn on integer ambiguity resolution for the GLONASS satellites (gloarmode=1) Since we have forced the phase-bias estimates to integers, the integer ambiguity resolution criteria will always be satisfied initially, regardless of the quality of the estimates. If we then let the system run without any more fix-and-hold updates, meaning we have removed any preference in the solution calculations for integer results, the solution should hopefully either stay converged if it correct, or drift away from integer values if it is not. To do this,I modified fix-and-hold to occur only once (“fix-and-bump”) instead of every epoch, as described in my previous post.

So how well did it work? Below are some plots taken after making the code changes described above. The first plots are for a baseline taken with GPS ambiguity resolution enabled and GLONASS disabled, as RTKLIB is typically run. I have modified the meanings of green and yellow in the RTKPLOTs of residuals. Green now represents a satellite that was successfully used for ambiguity resolution, yellow represents a satellite that was valid but not successfully used for ambiguity resolution. If no fixed solution was found, then all satellites will be yellow. The GPS satellites are on the left, the GLONASS satellites on the right. You can see the GPS satellites are yellow during the initial lock-on and occasionally after a cycle slip but otherwise stay green, while the GLONASS satellites are always yellow because ambiguity resolution is disabled for GLONASS.

gloamb1

Now I make a second run with ambiguity resolution disabled for GPS (left) and enabled for GLONASS (right). I do this only for evaluation purposes, normally we would run with both enabled. Note that we still require GPS ambiguity resolution to calibrate the inter-channel biases which shows up below at the start of the plots.

gloamb2

As you can see, the solution remains fixed when using only the GLONASS satellites for ambiguity resolution for a significant period of time and the residuals remain small. This is very encouraging.

To further verify things are working the way we think they are, lets look at the output traces for three runs, GPS AR only, GLONASS AR only, and GPS+GLONASS AR. I’ve added some trace output to list the satellite pairs used to make it easier to follow. RefSats are the reference satellites for each double difference pair, fixSats are the other satellite in each pair. N(0) is the float solution, N(1) is the fixed solution, and ratio is the AR ratio. All three outputs are for the same epoch but in three different runs. I picked a random epoch from the middle of the run.

Integer ambiguity resolution with GPS satellites (17:56:52)

3 resamb_LAMBDA : nx=98
3 ddmat: gps=1/1 glo=0/0
3 refSats= 10 10 10 10 10 10 10
3 fixSats= 13 15 18 20 21 22 29
3 N(0)= -1021543.012 -583342.043 -321520.024 -1128921.004 -351484.032 -1244853.988 -1342905.008
3 N(1)= -1021543.000 -583342.000 -321520.000 -1128921.000 -351484.000 -1244854.000 -1342905.000
3 N(2)= -1021543.000 -583342.000 -321520.000 -1128921.000 -351484.000 -1244854.000 -1342904.000
3 resamb : validation ok (nb=7 ratio=160.92 s=132.06/21251.08)

 

Integer ambiguity resolution with GLONASS satellites (17:56:52)

3 resamb_LAMBDA : nx=98
3 ddmat: gps=0/0 glo=1/1
3 refSats= 33 33 33 33 33
3 fixSats= 39 40 54 55 56
3 N(0)= -1572711.936 -785246.994 -1482008.966 -496531.985 151149.024
3 N(1)= -1572712.000 -785247.000 -1482009.000 -496532.000 151149.000
3 N(2)= -1572711.000 -785247.000 -1482008.000 -496532.000 151149.000
3 resamb : validation ok (nb=5 ratio=165.97 s=38.82/6443.06)

 

Integer ambiguity resolution with GPS and GLONASS satellites (17:56:52)

3 resamb_LAMBDA : nx=98
3 ddmat: gps=1/1 glo=1/1
3 refSats= 10 10 10 10 10 10 10 33 33 33 33 33
3 fixSats= 13 15 18 20 21 22 29 39 40 54 55 56
3 N(0)= -1021543.029 -583342.057 -321520.028 -1128921.020 -351484.040 -1244853.987 -1342905.016 -1572711.928 -785246.982 -1482008.949 -496531.972 151149.020
3 N(1)= -1021543.000 -583342.000 -321520.000 -1128921.000 -351484.000 -1244854.000 -1342905.000 -1572712.000 -785247.000 -1482009.000 -496532.000 151149.000
3 N(2)= -1021543.000 -583342.000 -321520.000 -1128921.000 -351484.000 -1244854.000 -1342905.000 -1572712.000 -785247.000 -1482008.000 -496532.000 151149.000
3 resamb : validation ok (nb=12 ratio=53.40 s=473.13/25263.69)

 

Everything looks as we would expect, so again that is encouraging.

Next I made a comparison using my four standard metrics for four solutions using different AR settings. The first is no AR, the second is GPS AR, the third is GLONASS AR, and the fourth is GPS AR+GLONASS AR. The plot is below, the x-axis is the four cases.

glometrics1.png

All looks good except that there is a small increase in the std(dist) metric for GLONASS AR which is measuring position error in the solution. This is not surprising and probably would be concerning if it were not so. The code has made a single guess at the inter-channel biases and so we would not expect to get them precisely correct. Any error in our inter-channel bias estimates will translate directly into error in the fixed solutions. If we want to improve the accuracy of the GLONASS fixed solutions we will need to improve the estimates of the inter-channel biases. This should not be difficult if we iterate the estimations over multiple epochs. Even without improved accuracy, we should still get increased robustness from having more satellites to use for the fixed solution. Note the increase from 8 to 13 satellites used on average for the fixed solution in the bottom right plot. Having more satellites available for the fixed solution should improve our ability to stay locked.  To take advantage of this, however, we may need to exclude individual satellites from being used for ambiguity resolution or break the satellites into multiple sets since with the current algorithm a single bad satellite can prevent getting a fix.

Anybody have experience with using this sort of approach to estimate the inter-channel biases?  Any obvious pitfalls I’m missing?  Any other thoughts?

This is still just an experiment, not meant to be a functional feature yet, but I’ve posted the code to my Github repository in the exp1 branch in case anyone else would like to play with these options.  The config file in the exp1\data\demo1 folder is setup to run with these two features both enabled.

RTKLIB: Thoughts on Fix-and-Hold

In a previous post I briefly discussed the difference in RTKLIB between the “Continuous” mode and “Fix-and-Hold” mode for handling integer ambiguity resolution. In this post, I’d like to dig a little deeper into how fix-and-hold works and discuss some of its advantages and disadvantages, then introduce a variation I call “Fix-and-Bump”.

Each satellite has a state in the kalman filter to estimate it’s phase-bias. The phase-bias is the number of carrier-phase cycles that needs to be added to the carrier-phase observation to get the correct range. By definition, the actual phase-biases must all be integers, but their estimates will be real numbers due to errors and approximations. Each epoch, the double-difference range residuals are used to update the phase-bias estimates. After these have been updated, the integer ambiguity resolution algorithm evaluates how close the phase-bias estimates (more accurately, the differences between the phase-bias estimates) are to an integer solution. In general, the closer the estimates are to an integer solution, the more confidence we have that the solution is valid. This is because all the calculations are done with real numbers and there is no inherent preference in the calculations towards integer results.

It is important to understand that the validation process (integer ambiguity resolution) used to determine if the result is high quality (fixed) or low quality (float) is based entirely on how far it deviates from a set of perfect integers and not at all on any inherent accuracy. It doesn’t matter how wrong a result may be, if it is all integers, then the integer ambiguity resolution will deem it a high quality solution. We rely on the fact that it is very unlikely that an erroneous solution will align precisely to any set of integers.

Therefore, for the fixed solution vs float solution distinction to be reliable, it is key that we don’t violate the assumption that there is no inherent preference in the calculations toward integer results. We can use our extra knowledge that the answer should be a set of integers either to improve our answer or to verify our answer, but it is not really valid to do both. Unfortunately, that is exactly what fix-and-hold does. It uses the difference between the fixed and float solutions to push the phase-bias estimates toward integer values, then uses integer ambiguity resolution to determine how good the answer is based on how close to integers it is. This will usually improve the phase-bias estimates since it is very likely the chosen integers are the correct actual biases, but it will always improve the confidence in the result as determined by the integer ambiguity resolution, even when the result is wrong. Thus we have effectively short-circuited the validation process by adding a preference in the calculations for integer results.

This does not mean fix-and-hold is a bad thing, in fact most of the time it will improve the solution. We just need to be aware that we have short-circuited the validation process and have compromised our ability to independently verify the quality of the solution. It also means that it is easy to fool ourselves into thinking that fix-and-hold is helping more than it really is. For example, two of the metrics I have been using to evaluate my solutions, percent fixed solutions, and median AR ratio, are relatively meaningless once we have turned it on since they may improve regardless of the actual quality of the solution. The value of a particular solution is a combination of its accuracy and our confidence in that accuracy. Fix-and-hold gives up confidence in the solution in exchange for improvement in its accuracy.

Is it possible that fix-and-hold is too much of a good thing? Once it is turned on the phase-bias adjustments towards the integer solutions are done every epoch, tightly constraining the phase-biases and making it very difficult to “let go” of an incorrect solution. What if the fix-and-hold adjustments were done only for a single epoch when we first meet the enabling criteria, then turned off not used again until we lose our fix and meet the criteria again? Instead of “holding” on to the integer solution after a fix, we “bump” the solution in the right direction, then let go. I will call this “Fix-and-Bump”. Note that the “cheating” effect of adjusting the biases will be retained in the phase-bias states so turning off fix-and-hold is not enough to immediately re-validate the integer ambiguity resolution. I suspect however that erroneous fixes will not be stable and without the continuous adjustments from fix-and-hold, they will drift away from the integer solutions fairly quickly.

A good analogy might be an electric motor in a circuit with a 5 amp fuse and an 8 amp starting current. Short-circuiting the fuse (fix-and-hold) will make the motor run but a malfunction (erroneous fix) after the motor has started could cause a fire. Short-circuiting the fuse just long enough to start the motor (fix-and-bump) would also make the motor run but would be much safer in the case of a malfunction (erroneous fix) after the motor has started.

Below I have plotted the fractional part one of the phase-bias estimate differences for the three cases. The discontinuity at 180 seconds is where fix-and-hold and fix-and-bump were enabled. The plot on the right is just a zoomed-in version of the plot on the left.

phbias1a

Here is the solution stats for each run. (1=float, 2=fix,3=hold)

phbias1b.png

As you can, even a single adjustment (fix-and-bump) gives us an estimate of the phase-bias that is very close to what we get if we make adjustments every epoch (fix-and-hold). This supports the idea that it may be possible to get most of the benefit of fix-and-hold while avoiding some of the downside.

At this point, this is more of a thought exercise, than a proposal to change RTKLIB but I think it may be worth considering. I hope to evaluate this change further when I have more data sets to look at.

I will use this feature though in my next post, though, which is one reason I’ve devoted a fair bit of time to it here. In that post I will extend fix-and-bump to the GLONASS satellites. Instead of using the feedback short-circuit path (as I’ve called it) to improve the phase-bias estimates, I will use it to estimate the inter-channel biases, then enable GLONASS integer ambiguity resolution. Not having the inter-channel bias values is what currently prevents integer ambiguity resolution (gloarmode) for the GLONASS satellites from working. Fix-and-bump should provide better evaluation of the quality of the combined GPS/GLONASS solutions.  That’s enough for now … I’ll leave the rest to the next post.

Note:  Readers familiar with the solution algorithms will notice that I have left out certain details and that some of my statements are not quite precisely correct. For the most part this was done intentionally to keep things focused but I believe the gist of the argument is valid even when all the details are included. Please let me know if you think I have left anything important out.

Evaluating results: Moving base vs fixed base

 

In the last few posts, I have been focusing on the relative position solution between two receivers where both are moving but fixed relative to each other rather than the more typical scenario where one receiver is fixed (the base) and the other is moving (the rover). As described in an earlier post, I do this because it makes it possible to accurately measure the error in the solution when we don’t have an absolute trajectory reference to compare to.

Normally, though, from a functional viewpoint, we are more interested in the second case where one receiver is fixed and the other is moving. In this post I will spend a little time looking at how we can evaluate the results for that scenario.

Since we don’t know what our exact trajectory was when we collected the data, we can not measure our solution error directly like we can in the first case. However, we can look for consistency between multiple solutions calculated for the same rover data, but with varying input configurations and differing base data and satellites. The solution should not vary by more than our error tolerance no matter how much we change the input configurations.  If we believe a solution is accurate to a couple of centimeters then it would be very disturbing to find it moved 50 cm if we used the same rover data but adjusted something in the input configuration, satellites, or base data, particularly if the differences occur for epochs with fixed solutions.  We cannot know which of the two solutions is incorrect but we do know that at least one of them must have large undetected errors in it, which means we really can’t trust either solution. Of course, if the two solutions are equal it does not guarantee that they are correct, they may just both be wrong in the same way. However, the more we can vary the input conditions and still get the same answer, the more confident we will start to have in the solution.

To test the consistency of solution for the previous data set, I processed the data from the two receivers in a different way. Instead of measuring the relative distance between the two receivers, I measured the distance between each receiver and two independent base stations, giving me a total of four solutions, two for each Ublox receiver Since the two base receivers are in different locations the solutions will be different, but if we ask RTKLIB to calculate absolute position rather than relative distance then we can compare the two directly. By changing the out-solformat parameter in the input config file from “enu” to “xyz”, RTKLIB will add the absolute location of the base station to the solution, converting it from relative to absolute. With this change, the two solutions calculated from different base stations should be identical.

For the two base receivers, I used data from two nearby stations in the NOAA CORS network, TMG0 which is about 5 km away from where I collected the data and ZDV1 which is about 13 km away. TMG0 data includes Glonass satellites, ZDV does not so this also helps differentiate the two solutions by giving them different but overlapping satellite sets. To further increase the difference between the two satellite sets, I changed the minimum elevation variables in RTKLIB for the TMG station from 15 degrees to 10 degrees while leaving them at 15 degrees for the ZDV station. This will add the low elevation satellites to the TMG solution.

Now we can compare the two solutions for each Ublox receiver, each calculated with different base stations and different sets of satellites. If everything is working properly they should be identical within our expected error tolerance. Since our baseline distances between base and rover have increased from roughly 30 cm in the previous analysis to 5-13 km in this case we will expect the errors to be somewhat larger than they were before.

The plots below show the results of this experiment. The plots on the left are from the Ebay receiver and the plots on the right are from the CSG receiver. The first row of plots show the ground tracks for both solutions overlayed on top of each other, the second row shows position in x, y, and z directions, again for both solutions overlayed on top of each other. The x and y directions are indistinguishable at this scale but we do see some error in the z direction. The third row of plots shows the difference between the two solutions. We can interpret these differences as a lower bound for our error. As we would expect with the longer baselines, the errors are somewhat larger than before, but at least the x and y errors are generally still quite reasonable, most of them falling between +/- 2 cm for epochs with fixed solutions (green). The z errors are somewhat larger but fortunately we usually care less about this direction. I have made no attempt to reduce the ephemeris errors, either through SBAS or precise orbit files. Presumably, at these longer baselines, doing so would help reduce the errors.

sol18a

sol18b

sol18c

So, how do we interpret these results? I believe this analysis is consistent with the previous analysis. It is not as conclusive since we have no absolute error measurement, but since the solutions are more similar in format to how they would be done in practice, it should give us more overall confidence in our results.

Maybe more important, it also provides a baseline for using similar consistency analysis on other data sets where we often won’t have the luxury of having two receivers tracking the same object.

Let me add a few words to help put these results in context in case you are comparing them with other data sets.  This data was all taken in very open skies with fairly low velocities (<6 m/sec) so it is relatively easy data to get a good solution.  It will be more difficult to get similar results for data taken with obstructions from trees or buildings, or that includes high velocities or accelerations or high vibration.  In particular, there were very few cycle slips in this data.   None of the modifications described here will help if the data has a large number of cycle slips.

Also remember that this data was taken with very low cost hardware (the Ebay receiver cost less than $30 for receiver and antenna combined) so the data will be lower quality relative to data taken under similar conditions with more expensive hardware, especially higher quality antennas.  

That is the goal of my project, though, to get precise, reliable positioning with ultra-low cost hardware under benign environmental conditions. 

 

 

 

 

Improving RTKLIB solution: Fix-and-Hold

Integer ambiguity resolution is done by RTKLIB after calculating a float solution to the carrier phase ambiguities to attempt to take advantage of the fact that the carrier phase ambiguities must be integers. Constraining the solution to integers improves both the accuracy and convergence time.

RTKLIB provides three options for the integer ambiguity resolution: instantaneous, continuous, and fix-and-hold. “Instantaneous” is the most conservative and relies on recalculating the phase bias estimates every epoch. “Continuous” is the default option and uses the kalman filter to estimate the phase biases continuously over many epochs. In general, this provides much less noisy estimates of the phase biases and is essential for working with low cost receivers. The downside of the continuous solution is that if bad data is allowed to get into the filter outputs it can remain for multiple epochs and corrupt subsequent solutions, so some care must be taken to avoid that.

Fix-and-Hold” takes the concept of feeding information derived from the current epoch forward to subsequent epochs one step farther. In the “Continuous” method, only the float solution is used to update the phase bias states in the kalman filter. In the “Fix-and-Hold” method, an additional kalman filter update is done using pseudo-measurements derived from the fixed solution. This is only done if the fixed solution is determined to be valid. All of this is explained in greater detail in Appendix E.7(5) of the RTKLIB user manual.

In general, taking advantage of all the information we have this epoch to improve the solution for the following epochs seems to be a good idea, especially given the high levels of noise in the low-cost receivers and antennas. Even more so than in the “Continuous” method, however, we need to be careful about feeding erroneous data into the kalman filter which may take a long time to work its way out.

Testing the “Fix-and-Hold” method on a few data sets shows the results we would expect, the solution usually does improve but can occasionally cause it to have difficulty recovering from a period of bad data where we have corrupted the phase-bias estimates.

To minimize the chance of this happening, we need to do a better job of validating the solution before feeding it forward. Looking at a couple examples of bad data, what I find is that the errors are introduced when solutions are derived from a very small number of valid satellites. The current integer ambiguity validation is based only on the ratio of residuals between the best solution and the second best solution and does not take into account the number of satellites used to find the solutions.

I have introduced two new input parameters, “minfixsats”, and “minholdsats” into the code. These set the number of satellites required to declare a fixed solution as valid, and the number of satellites required to implement the fix-and-hold feature. I have found setting the minimum number of satellites for fix to 4 and fix-and-hold to 5 seems to work quite well for my data, but I have made them input parameters in the config file to allow them to be set to any value.

I won’t include all the code here, but it is available on GitHub at https://github.com/rtklibexplorer/RTKLIB on the demo1 branch

The two key changes are:

For minfix, in resamb_LAMBDA() from:

if ((nb=ddmat(rtk,D))<=0) {
errmsg(rtk,”
no valid double-difference\n”);

to:

if ((nb=ddmat(rtk,D))<(rtk->opt.minfixsats-1)) { // nb is sat pairs
    errmsg(rtk,”not enough valid double-differences\n”);

and for minhold, in holdamb() from:

if (nv>0) {
    R=zeros(nv,nv);
    for (i=0;i<nv;i++) R[i+i*nv]=VAR_HOLDAMB;

    /* update states with constraints */
    if ((info=filter(rtk->x,rtk->P,H,v,R,rtk->nx,nv))) {
    errmsg(rtk,”filter error (info=%d)\n”,info);
}

to:

if (nv>=rtk->opt.minholdsats-1) { // nv=sat pairs, so subtract 1
    R=zeros(nv,nv);
    for (i=0;i<nv;i++) R[i+i*nv]=VAR_HOLDAMB;

    /* update states with constraints */
    if ((info=filter(rtk->x,rtk->P,H,v,R,rtk->nx,nv))) {
    errmsg(rtk,”filter error (info=%d)\n”,info);
}

Again, the changes to the position solution are subtle and difficult to see in a plot, but are much more obvious in the Ratio Factor for AR validation. A higher ratio represents greater confidence in the solution. The green is before the changes, the blue is after. The RTKLIB code limits the AR ratio to a maximum of 1000.

ar2

As you can see, there is a very significant increase in the AR ratio with the change.  The improvement is coming from switching the ambiguity resolution from continuous to fix-and-hold, not from the code changes which we made to try and reduce the occasional occurence of corrupting the filter states with bad data.

Let’s also look at the metrics described earlier for both this change and the bug fix described in the previous post. On the x-axis, 1-3 are from before the two previous changes, 4 is with the bias sum bug fix and 5 is with fix-and-hold and the min sat thresholds for fix and hold.

metrics2

From the metric plots you can see that at this point we have maxed out the % fixed solutions and median AR ratio metrics and have got the distance metric down below 0.5 cm so I will stop here and declare success, at least for this data set.

If you’d like to try this code with your data, the code for all the changes I have made so far are available at https://github.com/rtklibexplorer/RTKLIB on the demo1 branch.

If anyone does try this code on their own data sets please let me know how it goes. I’m hoping these changes are fairly universally useful, at least for open skies and low velocities, but the only way to know is to test it on multiple data sets.

Update 4/17/16:  Enabling the “fix-and-hold” feature on other data sets has had mixed success.  I still see “fix-and-hold” locking to bad fixes and corrupting the states.  It looks like the extra checks I added are not sufficient to prevent this.  So, for know,I recommend turning off this feature, or if you do use it, be aware that it can cause problems.  I hope to take another look at this soon.

Improving RTKLIB solution: Phase-bias sum error

While working through the code to add comments as described in the last post, I stumbled across what looks like another bug. This one is more subtle than the previous bug (arlockcnt) and only has a small effect on the results, but still I thought it was worth discussing.

During each epoch, RTKLIB estimates the phase-bias for each satellite using the differences between the raw carrier-phase measurements and the raw pseudorange measurements. In a perfect system, these should always differ by an integer number of cycles because the carrier-phase measurement have an uncertainty in the number of cycles whereas the pseudorange measurements do not. These phase-bias estimates are then used to update the kalman filter.

Before updating the kalman filter however, RTKLIB calculates the common component between all satellites and subtracts this off of each phase-bias states (the kalman filter outputs) My guess is that this code is left over from the GPS-only days where all satellites operated at the same frequency since the estimates are all in units of cycles. As long as the frequencies of each satellite are identical, it is fine to add the cycles from one satellite to another, but this doesn’t work anymore once the satellite cycles are of different frequencies.

My code modification simply converts the biases to meters first using the wavelength for each satellite before summing them and then converting back to cycles for each satellite. The changes all occur in the udbias() routine. The following lines of code:

1) bias[i]=cp-pr/lami;

2) offset+=bias[i]-rtk->x[IB(sat[i],f,&rtk->opt)];

3) if (rtk->x[IB(i,f,&rtk->opt)]!=0.0) rtk->x[IB(i,f,&rtk->opt)]+=offset/j;

become

1) bias[i]=cp*lami-pr;

2) lami=nav->lam[sat[i]-1][f];
    offset+=bias[i]-rtk->x[IB(sat[i],f,&rtk->opt)]*lami;

3)   if (rtk->x[IB(i,f,&rtk->opt)]!=0.0) {
       lami=nav->lam[i-1][f];
      rtk->x[IB(i,f,&rtk->opt)]+=offset/lami/j;
     }

This improves the solution slightly but is most obvious when the receiver does a clock update. Since the receivers use inexpensive crystal clocks, unlike the satellites which use atomic clocks, there is a continuous drift between the two. At some point, when the difference between these two clocks gets too large, the receiver will update its clock to remove the error. Most of the time the error introduced by this bug is too small to easily pick out of the noise, but when there is a large adjustment from the clock correction it becomes quite obvious and shows up as a large spike in the residuals. Adding this code change eliminates the spikes from the residuals.

While exploring this issue, I modified the code that outputs to the state file to also output the phase-biases since I felt they provided a fair bit of insight to what was happening in the solution. What I found however when looking at these phase-bias outputs is that they are dominated by this common term (presumably some sort of clock bias) and it is difficult to see the behavior of the individual phase biases. To avoid this problem I made another modification to the code. Instead of adding the common component to every phase-bias state, I saved it in a separate common bias variable and used this when initializing phase-biases for new satellites. Since all position calculations are done with the difference of the phase-biases and not the phase-biases themselves, this change does not have any effect on the output solution. It does however remove the large common variation from the phase-bias states and makes them easier to analyze.

Here are the residuals before and after the code modification (zoomed way out to see the spike).

res1ares1b

The position solution doesn’t change much, the improvement is more obvious in the confidence of the solution which can be seen in the Ratio Factor for AR validation. Here it is before (green) and after (blue) the code modification.

ar1

As you can see there is a small but persistent increase in the AR ratio with this change.

Anyone else run across this issue and solved it in a different way? Or believe this is not a bug, and is actually the intended behavior? Or possibly know of other cases where cycles of different wavelengths are not handled properly in RTKLIB?

RTKLIB: Code comments + new Github repository

If you’ve spent any time perusing the RTKLIB source code you will surely be aware that it is very much what I would call “old-school” code. Comments are sparse and variable names tend to be short and not always very descriptive. The upside is that the code is quite compact, significant pieces of it can be seen on a single page which can sometimes be quite nice. The downside is it can be difficult to follow if you have not been poring over it for a long time. To help myself understand the code I have added a fair bit of commenting to most of the code used to calculate the carrier-phase solution, which is really the heart of the RTKLIB kinematic solution.  This is mostly in the rtkpos.c file.

The changes are too extensive to spell out here so I have forked the RTKLIB code in Github and posted my commented code there. If you are interested, you can download it from the demo1 branch at https://github.com/rtklibexplorer/RTKLIB.  

All the previous code changes are also there as well as a couple I have not yet posted about.  The receiver data I have been using for these posts is also there in the rtklib/data/demo1 folder.  The solution file updates for VS2010 c++ are in the rtklib/app/rnx2rtkp/msc folder and the modified executable is in the rtklib/bin folder.