RTKLIB: GLONASS Ambiguity Resolution

Several readers have now suggested that adding inter-channel bias calibration to RTKLIB would be a worthwhile enhancement, so I thought I would give it a shot, or at least explore the idea.

Inter-channel biases are what make integer ambiguity resolution more difficult on the GLONASS satellites than on the GPS satellites. In the GPS system, double-differencing the carrier-phase observations is a very powerful trick that causes the largest sources of error to all cancel each other out. These errors include the satellite and receiver clock biases, as well as the majority of the ionospheric and tropospheric delays and ephemeris errors. The remaining errors are quite small, meaning that the estimates of the unknown phase-biases we are trying to solve for will be quite close to integer values. We can then use integer ambiguity resolution to select the most likely set of integers, thus rejecting most of the remaining error.
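
For readers new to the idea, the double-difference arithmetic itself is simple enough to sketch in a few lines. This is my own illustration, not RTKLIB code:

```c
#include <assert.h>

/* Double-differenced carrier phase (cycles): difference between the
 * two receivers first (this cancels the satellite clock bias), then
 * difference those single differences between two satellites (this
 * cancels the receiver clock bias). */
static double double_diff(double phi_base_ref, double phi_rov_ref,
                          double phi_base_sat, double phi_rov_sat)
{
    double sd_ref = phi_rov_ref - phi_base_ref; /* single diff, reference sat */
    double sd_sat = phi_rov_sat - phi_base_sat; /* single diff, other sat */
    return sd_sat - sd_ref;
}
```

The residual after double-differencing is what the kalman filter phase-bias states try to pin down to integers.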

Unfortunately, unlike the GPS system, in which all satellites operate at the same frequency, the GLONASS satellites transmit on a set of different frequencies. The different frequencies introduce an additional set of receiver-dependent errors into the solution, called the inter-channel biases. There will be a different bias for each frequency, but the error between two satellites should be roughly proportional to the frequency difference between the two satellites. For a more complete discussion, see here.
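
To make the frequency dependence concrete, each GLONASS satellite transmits L1 on its own FDMA channel offset from a common center frequency. The constants below are the published GLONASS values, not taken from the RTKLIB code:

```c
#include <assert.h>

#define CLIGHT 299792458.0  /* speed of light, m/s */

/* GLONASS L1 FDMA carrier frequency for channel number k (k = -7..+6):
 * f = 1602.0 MHz + k * 0.5625 MHz */
static double glo_l1_freq(int k)
{
    return 1602.0e6 + k * 0.5625e6;
}

/* corresponding carrier wavelength, meters */
static double glo_l1_lam(int k)
{
    return CLIGHT / glo_l1_freq(k);
}
```

Since each channel has a slightly different wavelength, a receiver bias that is constant in meters shows up as a different number of cycles on each satellite.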

A quite detailed description of at least one method to solve for the inter-channel biases can be found here. Fitting this solution into the existing RTKLIB architecture, however, looks to be quite challenging and more than I really want to take on, but I suspect this might be the “right” way to do it.

Instead, I decided to try a “feedback short-circuit” similar to what is used in the RTKLIB fix-and-hold feature. If you haven’t already done so, you might want to read my previous post in which I discuss that feature and a modification I made to it that I call “fix-and-bump”, which I will use here.

Fix-and-hold uses the fact that we know the correct answer must be a set of integers to push the phase-bias estimates towards integer values, using the difference between fixed and float solutions as feedback, rather than deriving it from physical principles. My premise is that just as we can blindly adjust the phase-bias estimates based on the difference between fixed and float solutions, we ought to be able to do something similar for the inter-channel biases.

The adjustments made to the phase-biases during fix-and-hold are done in the holdamb() function in the rtkpos.c file in RTKLIB. This function is called only when we have a high confidence in our fixed solution, to avoid chasing an incorrect solution. The key line of code from this function is

    v[nv]=(xa[index[0]]-xa[index[i]])-(rtk->x[index[0]]-rtk->x[index[i]]);

where:

v[] is a pseudo-innovation (or pseudo measurement residual) used to update the kalman filter
xa[] is the set of fixed solution phase-bias estimates
rtk->x[] is the set of float solution phase-bias estimates
index[i] is the satellite to update
index[0] is the reference satellite

So what this line of code does is generate a double-differenced error input to the kalman filter for each satellite based on the difference between the fixed solution phase-biases and the float solution phase-biases. This will drive the phase-bias states towards a set of integers, with all the advantages and disadvantages as discussed in the previous post.
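
Put in context, that line sits inside a loop over the fixed satellites, producing one pseudo-innovation per satellite pair. A simplified standalone version of the loop (the real holdamb() also fills in the corresponding design matrix H for the filter update, which I omit here):

```c
#include <assert.h>

/* Build the fix-and-hold pseudo-innovations: for each satellite, the
 * double-differenced gap between the fixed (xa) and float (x)
 * phase-bias estimates, relative to the reference satellite idx[0].
 * Simplified from holdamb(). */
static int build_hold_v(const double *xa, const double *x,
                        const int *idx, int n, double *v)
{
    int i, nv = 0;
    for (i = 1; i < n; i++)
        v[nv++] = (xa[idx[0]] - xa[idx[i]]) - (x[idx[0]] - x[idx[i]]);
    return nv;
}
```

Each v[] entry is then treated as a measurement of zero error, which nudges the float phase-bias states towards the fixed integers.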

So far I have just reviewed what is in the existing RTKLIB code. Now let’s talk about an addition to the fix-and-hold code for the GLONASS satellites. It is reasonable to drive the phase-bias estimates for the GPS satellites towards integer values because we have accounted for all the significant errors (mostly through cancellation in the double-differences) and so we expect them to be integers. On the other hand, we cannot expect the phase-bias estimates for the GLONASS satellites to converge towards integer values, since we have not accounted for the errors from the inter-channel biases. These are different for each satellite and hence do not cancel. So, in order to account for them, I add an additional variable for each satellite to hold the estimated inter-channel bias. These values are subtracted from the measurement residuals every epoch when calculating the kalman filter innovations, just as the other error terms are. To do things properly I should make them kalman filter states, but for now they are just stand-alone variables.

The inter-channel bias estimates are initialized to zero and then only adjusted during a fix-and-hold update. The new update is done after the existing fix-and-hold updates are made to the phase-biases for the satellites with fixed solutions. At this point, we assume that all the remaining error in the GLONASS phase-bias estimates is coming from the inter-channel biases, so we move that error from one to the other. To make the new update, the fractional components of the GLONASS phase-bias estimates are removed (subtracted) from the phase-bias estimates and moved (added) to the inter-channel bias variables. Ideally we would subtract the fixed solutions from the float solutions as we did with the GPS satellites, but since we don’t have fixed solutions yet for the GLONASS satellites, we use our best guess for the fixed solution, which is just the closest integer. In a final implementation, this process would be iterated over multiple epochs to improve the estimates, but at this point I am only making a single adjustment to simplify evaluation of the solution.

These are the key lines of code added to holdamb() to do this:

    dd=rtk->x[IB(j+1,f,&rtk->opt)]-rtk->x[IB(i+1,f,&rtk->opt)];
    dd=dd-ROUND(dd); // throw out integer component
    rtk->x[IB(j+1,f,&rtk->opt)]-=dd; // remove fractional part from phase bias
    rtk->ssat[j].icbias[f]+=dd; // and move to IC bias

With the inter-channel biases now accounted for, we can turn on integer ambiguity resolution for the GLONASS satellites (gloarmode=1). Since we have forced the phase-bias estimates to integers, the integer ambiguity resolution criteria will always be satisfied initially, regardless of the quality of the estimates. If we then let the system run without any more fix-and-hold updates, meaning we have removed any preference in the solution calculations for integer results, the solution should hopefully either stay converged if it is correct, or drift away from integer values if it is not. To do this, I modified fix-and-hold to occur only once (“fix-and-bump”) instead of every epoch, as described in my previous post.

So how well did it work? Below are some plots taken after making the code changes described above. The first plots are for a baseline taken with GPS ambiguity resolution enabled and GLONASS disabled, as RTKLIB is typically run. I have modified the meanings of green and yellow in the RTKPLOTs of residuals. Green now represents a satellite that was successfully used for ambiguity resolution, yellow represents a satellite that was valid but not successfully used for ambiguity resolution. If no fixed solution was found, then all satellites will be yellow. The GPS satellites are on the left, the GLONASS satellites on the right. You can see the GPS satellites are yellow during the initial lock-on and occasionally after a cycle slip but otherwise stay green, while the GLONASS satellites are always yellow because ambiguity resolution is disabled for GLONASS.

gloamb1

Now I make a second run with ambiguity resolution disabled for GPS (left) and enabled for GLONASS (right). I do this only for evaluation purposes; normally we would run with both enabled. Note that we still require GPS ambiguity resolution to calibrate the inter-channel biases, which shows up at the start of the plots below.

gloamb2

As you can see, the solution remains fixed when using only the GLONASS satellites for ambiguity resolution for a significant period of time and the residuals remain small. This is very encouraging.

To further verify things are working the way we think they are, let’s look at the output traces for three runs: GPS AR only, GLONASS AR only, and GPS+GLONASS AR. I’ve added some trace output to list the satellite pairs used, to make it easier to follow. RefSats are the reference satellites for each double-difference pair, fixSats are the other satellites in each pair. N(0) is the float solution, N(1) is the fixed solution, and ratio is the AR ratio. All three outputs are for the same epoch but from three different runs. I picked a random epoch from the middle of the run.

Integer ambiguity resolution with GPS satellites (17:56:52)

3 resamb_LAMBDA : nx=98
3 ddmat: gps=1/1 glo=0/0
3 refSats= 10 10 10 10 10 10 10
3 fixSats= 13 15 18 20 21 22 29
3 N(0)= -1021543.012 -583342.043 -321520.024 -1128921.004 -351484.032 -1244853.988 -1342905.008
3 N(1)= -1021543.000 -583342.000 -321520.000 -1128921.000 -351484.000 -1244854.000 -1342905.000
3 N(2)= -1021543.000 -583342.000 -321520.000 -1128921.000 -351484.000 -1244854.000 -1342904.000
3 resamb : validation ok (nb=7 ratio=160.92 s=132.06/21251.08)


Integer ambiguity resolution with GLONASS satellites (17:56:52)

3 resamb_LAMBDA : nx=98
3 ddmat: gps=0/0 glo=1/1
3 refSats= 33 33 33 33 33
3 fixSats= 39 40 54 55 56
3 N(0)= -1572711.936 -785246.994 -1482008.966 -496531.985 151149.024
3 N(1)= -1572712.000 -785247.000 -1482009.000 -496532.000 151149.000
3 N(2)= -1572711.000 -785247.000 -1482008.000 -496532.000 151149.000
3 resamb : validation ok (nb=5 ratio=165.97 s=38.82/6443.06)


Integer ambiguity resolution with GPS and GLONASS satellites (17:56:52)

3 resamb_LAMBDA : nx=98
3 ddmat: gps=1/1 glo=1/1
3 refSats= 10 10 10 10 10 10 10 33 33 33 33 33
3 fixSats= 13 15 18 20 21 22 29 39 40 54 55 56
3 N(0)= -1021543.029 -583342.057 -321520.028 -1128921.020 -351484.040 -1244853.987 -1342905.016 -1572711.928 -785246.982 -1482008.949 -496531.972 151149.020
3 N(1)= -1021543.000 -583342.000 -321520.000 -1128921.000 -351484.000 -1244854.000 -1342905.000 -1572712.000 -785247.000 -1482009.000 -496532.000 151149.000
3 N(2)= -1021543.000 -583342.000 -321520.000 -1128921.000 -351484.000 -1244854.000 -1342905.000 -1572712.000 -785247.000 -1482008.000 -496532.000 151149.000
3 resamb : validation ok (nb=12 ratio=53.40 s=473.13/25263.69)


Everything looks as we would expect, so again that is encouraging.

Next I made a comparison using my four standard metrics for four solutions with different AR settings. The first is no AR, the second is GPS AR, the third is GLONASS AR, and the fourth is GPS AR+GLONASS AR. The plot is below; the x-axis is the four cases.

glometrics1.png

All looks good except that there is a small increase in the std(dist) metric for GLONASS AR, which is measuring position error in the solution. This is not surprising; in fact, it would probably be more concerning if it were not so. The code has made a single guess at the inter-channel biases and so we would not expect to get them precisely correct. Any error in our inter-channel bias estimates will translate directly into error in the fixed solutions. If we want to improve the accuracy of the GLONASS fixed solutions we will need to improve the estimates of the inter-channel biases. This should not be difficult if we iterate the estimations over multiple epochs. Even without improved accuracy, we should still get increased robustness from having more satellites to use for the fixed solution. Note the increase from 8 to 13 satellites used on average for the fixed solution in the bottom right plot. Having more satellites available for the fixed solution should improve our ability to stay locked.  To take advantage of this, however, we may need to exclude individual satellites from being used for ambiguity resolution or break the satellites into multiple sets, since with the current algorithm a single bad satellite can prevent getting a fix.

Anybody have experience with using this sort of approach to estimate the inter-channel biases?  Any obvious pitfalls I’m missing?  Any other thoughts?

This is still just an experiment, not meant to be a functional feature yet, but I’ve posted the code to my Github repository in the exp1 branch in case anyone else would like to play with these options.  The config file in the exp1\data\demo1 folder is set up to run with these two features both enabled.

RTKLIB: Thoughts on Fix-and-Hold

In a previous post I briefly discussed the difference in RTKLIB between the “Continuous” mode and “Fix-and-Hold” mode for handling integer ambiguity resolution. In this post, I’d like to dig a little deeper into how fix-and-hold works and discuss some of its advantages and disadvantages, then introduce a variation I call “Fix-and-Bump”.

Each satellite has a state in the kalman filter to estimate its phase-bias. The phase-bias is the number of carrier-phase cycles that needs to be added to the carrier-phase observation to get the correct range. By definition, the actual phase-biases must all be integers, but their estimates will be real numbers due to errors and approximations. Each epoch, the double-difference range residuals are used to update the phase-bias estimates. After these have been updated, the integer ambiguity resolution algorithm evaluates how close the phase-bias estimates (more accurately, the differences between the phase-bias estimates) are to an integer solution. In general, the closer the estimates are to an integer solution, the more confidence we have that the solution is valid. This is because all the calculations are done with real numbers and there is no inherent preference in the calculations towards integer results.

It is important to understand that the validation process (integer ambiguity resolution) used to determine if the result is high quality (fixed) or low quality (float) is based entirely on how far it deviates from a set of perfect integers and not at all on any inherent accuracy. It doesn’t matter how wrong a result may be, if it is all integers, then the integer ambiguity resolution will deem it a high quality solution. We rely on the fact that it is very unlikely that an erroneous solution will align precisely to any set of integers.
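
Concretely, the validation RTKLIB performs is a ratio test on the residuals of the two best integer candidate sets found by the LAMBDA search; the default acceptance threshold is 3.0. A minimal sketch:

```c
#include <assert.h>

/* AR ratio test: accept the best integer candidate set only if the
 * second-best candidate fits the data noticeably worse. s_best and
 * s_second are the residual sums for the two best candidates from the
 * LAMBDA search; RTKLIB's default threshold is 3.0. */
static int validate_ratio(double s_best, double s_second, double thres)
{
    return s_best > 0.0 && s_second / s_best >= thres;
}
```

In the trace outputs from the previous post, for example, s=132.06/21251.08 gives a ratio of about 161, an easy pass.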

Therefore, for the fixed solution vs float solution distinction to be reliable, it is key that we don’t violate the assumption that there is no inherent preference in the calculations toward integer results. We can use our extra knowledge that the answer should be a set of integers either to improve our answer or to verify our answer, but it is not really valid to do both. Unfortunately, that is exactly what fix-and-hold does. It uses the difference between the fixed and float solutions to push the phase-bias estimates toward integer values, then uses integer ambiguity resolution to determine how good the answer is based on how close to integers it is. This will usually improve the phase-bias estimates since it is very likely the chosen integers are the correct actual biases, but it will always improve the confidence in the result as determined by the integer ambiguity resolution, even when the result is wrong. Thus we have effectively short-circuited the validation process by adding a preference in the calculations for integer results.

This does not mean fix-and-hold is a bad thing, in fact most of the time it will improve the solution. We just need to be aware that we have short-circuited the validation process and have compromised our ability to independently verify the quality of the solution. It also means that it is easy to fool ourselves into thinking that fix-and-hold is helping more than it really is. For example, two of the metrics I have been using to evaluate my solutions, percent fixed solutions and median AR ratio, are relatively meaningless once we have turned it on, since they may improve regardless of the actual quality of the solution. The value of a particular solution is a combination of its accuracy and our confidence in that accuracy. Fix-and-hold gives up confidence in the solution in exchange for improvement in its accuracy.

Is it possible that fix-and-hold is too much of a good thing? Once it is turned on the phase-bias adjustments towards the integer solutions are done every epoch, tightly constraining the phase-biases and making it very difficult to “let go” of an incorrect solution. What if the fix-and-hold adjustments were done only for a single epoch when we first meet the enabling criteria, then turned off and not used again until we lose our fix and meet the criteria again? Instead of “holding” on to the integer solution after a fix, we “bump” the solution in the right direction, then let go. I will call this “Fix-and-Bump”. Note that the “cheating” effect of adjusting the biases will be retained in the phase-bias states so turning off fix-and-hold is not enough to immediately re-validate the integer ambiguity resolution. I suspect however that erroneous fixes will not be stable and without the continuous adjustments from fix-and-hold, they will drift away from the integer solutions fairly quickly.
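
The fix-and-bump logic amounts to a one-shot gate in front of the hold update. A sketch of the idea; the `armed` state variable is my own illustration, not an actual RTKLIB field:

```c
#include <assert.h>

/* One-shot gate for fix-and-bump: allow the hold update only on the
 * first epoch a valid fix appears, then suppress it until the fix is
 * lost and re-acquired. */
static int bump_gate(int *armed, int fix_valid)
{
    if (!fix_valid) { *armed = 1; return 0; } /* lost fix: re-arm */
    if (!*armed) return 0;                    /* already bumped this fix */
    *armed = 0;                               /* bump this epoch only */
    return 1;
}
```

With this gate in place, the filter gets a single nudge towards the integers per fix rather than a continuous constraint.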

A good analogy might be an electric motor in a circuit with a 5 amp fuse and an 8 amp starting current. Short-circuiting the fuse (fix-and-hold) will make the motor run but a malfunction (erroneous fix) after the motor has started could cause a fire. Short-circuiting the fuse just long enough to start the motor (fix-and-bump) would also make the motor run but would be much safer in the case of a malfunction (erroneous fix) after the motor has started.

Below I have plotted the fractional part of one of the phase-bias estimate differences for the three cases. The discontinuity at 180 seconds is where fix-and-hold and fix-and-bump were enabled. The plot on the right is just a zoomed-in version of the plot on the left.

phbias1a

Here are the solution stats for each run (1=float, 2=fix, 3=hold).

phbias1b.png

As you can see, even a single adjustment (fix-and-bump) gives us an estimate of the phase-bias that is very close to what we get if we make adjustments every epoch (fix-and-hold). This supports the idea that it may be possible to get most of the benefit of fix-and-hold while avoiding some of the downside.

At this point, this is more of a thought exercise than a proposal to change RTKLIB, but I think it may be worth considering. I hope to evaluate this change further when I have more data sets to look at.

I will use this feature in my next post, though, which is one reason I’ve devoted a fair bit of time to it here. In that post I will extend fix-and-bump to the GLONASS satellites. Instead of using the feedback short-circuit path (as I’ve called it) to improve the phase-bias estimates, I will use it to estimate the inter-channel biases, then enable GLONASS integer ambiguity resolution. Not having the inter-channel bias values is what currently prevents integer ambiguity resolution (gloarmode) for the GLONASS satellites from working. Fix-and-bump should provide better evaluation of the quality of the combined GPS/GLONASS solutions.  That’s enough for now … I’ll leave the rest to the next post.

Note:  Readers familiar with the solution algorithms will notice that I have left out certain details and that some of my statements are not quite precisely correct. For the most part this was done intentionally to keep things focused but I believe the gist of the argument is valid even when all the details are included. Please let me know if you think I have left anything important out.

Evaluating results: Moving base vs fixed base


In the last few posts, I have been focusing on the relative position solution between two receivers where both are moving but fixed relative to each other rather than the more typical scenario where one receiver is fixed (the base) and the other is moving (the rover). As described in an earlier post, I do this because it makes it possible to accurately measure the error in the solution when we don’t have an absolute trajectory reference to compare to.

Normally, though, from a functional viewpoint, we are more interested in the second case where one receiver is fixed and the other is moving. In this post I will spend a little time looking at how we can evaluate the results for that scenario.

Since we don’t know what our exact trajectory was when we collected the data, we can not measure our solution error directly like we can in the first case. However, we can look for consistency between multiple solutions calculated for the same rover data, but with varying input configurations and differing base data and satellites. The solution should not vary by more than our error tolerance no matter how much we change the input configurations.  If we believe a solution is accurate to a couple of centimeters then it would be very disturbing to find it moved 50 cm if we used the same rover data but adjusted something in the input configuration, satellites, or base data, particularly if the differences occur for epochs with fixed solutions.  We cannot know which of the two solutions is incorrect but we do know that at least one of them must have large undetected errors in it, which means we really can’t trust either solution. Of course, if the two solutions are equal it does not guarantee that they are correct, they may just both be wrong in the same way. However, the more we can vary the input conditions and still get the same answer, the more confident we will start to have in the solution.

To test the consistency of solution for the previous data set, I processed the data from the two receivers in a different way. Instead of measuring the relative distance between the two receivers, I measured the distance between each receiver and two independent base stations, giving me a total of four solutions, two for each Ublox receiver. Since the two base receivers are in different locations the solutions will be different, but if we ask RTKLIB to calculate absolute position rather than relative distance then we can compare the two directly. By changing the out-solformat parameter in the input config file from “enu” to “xyz”, RTKLIB will add the absolute location of the base station to the solution, converting it from relative to absolute. With this change, the two solutions calculated from different base stations should be identical.
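
For reference, this is the line to change in the .conf file, in the format used by the standard RTKLIB sample configs:

```
out-solformat      =xyz        # (0:llh,1:xyz,2:enu,3:nmea)
```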

For the two base receivers, I used data from two nearby stations in the NOAA CORS network, TMG0 which is about 5 km away from where I collected the data and ZDV1 which is about 13 km away. TMG0 data includes GLONASS satellites while ZDV1 does not, so this also helps differentiate the two solutions by giving them different but overlapping satellite sets. To further increase the difference between the two satellite sets, I changed the minimum elevation variables in RTKLIB for the TMG station from 15 degrees to 10 degrees while leaving them at 15 degrees for the ZDV station. This will add the low elevation satellites to the TMG solution.

Now we can compare the two solutions for each Ublox receiver, each calculated with different base stations and different sets of satellites. If everything is working properly they should be identical within our expected error tolerance. Since our baseline distances between base and rover have increased from roughly 30 cm in the previous analysis to 5-13 km in this case we will expect the errors to be somewhat larger than they were before.

The plots below show the results of this experiment. The plots on the left are from the Ebay receiver and the plots on the right are from the CSG receiver. The first row of plots show the ground tracks for both solutions overlaid on top of each other, the second row shows position in x, y, and z directions, again for both solutions overlaid on top of each other. The x and y directions are indistinguishable at this scale but we do see some error in the z direction. The third row of plots shows the difference between the two solutions. We can interpret these differences as a lower bound for our error. As we would expect with the longer baselines, the errors are somewhat larger than before, but at least the x and y errors are generally still quite reasonable, most of them falling between +/- 2 cm for epochs with fixed solutions (green). The z errors are somewhat larger but fortunately we usually care less about this direction. I have made no attempt to reduce the ephemeris errors, either through SBAS or precise orbit files. Presumably, at these longer baselines, doing so would help reduce the errors.

sol18a

sol18b

sol18c

So, how do we interpret these results? I believe this analysis is consistent with the previous analysis. It is not as conclusive since we have no absolute error measurement, but since the solutions are more similar in format to how they would be done in practice, it should give us more overall confidence in our results.

Maybe more important, it also provides a baseline for using similar consistency analysis on other data sets where we often won’t have the luxury of having two receivers tracking the same object.

Let me add a few words to help put these results in context in case you are comparing them with other data sets.  This data was all taken in very open skies with fairly low velocities (<6 m/sec) so it is relatively easy data to get a good solution.  It will be more difficult to get similar results for data taken with obstructions from trees or buildings, or that includes high velocities or accelerations or high vibration.  In particular, there were very few cycle slips in this data.   None of the modifications described here will help if the data has a large number of cycle slips.

Also remember that this data was taken with very low cost hardware (the Ebay receiver cost less than $30 for receiver and antenna combined) so the data will be lower quality relative to data taken under similar conditions with more expensive hardware, especially higher quality antennas.  

That is the goal of my project, though, to get precise, reliable positioning with ultra-low cost hardware under benign environmental conditions. 


Improving RTKLIB solution: Fix-and-Hold

RTKLIB performs integer ambiguity resolution after calculating a float solution for the carrier-phase ambiguities, to take advantage of the fact that the ambiguities must be integers. Constraining the solution to integers improves both the accuracy and the convergence time.

RTKLIB provides three options for the integer ambiguity resolution: instantaneous, continuous, and fix-and-hold. “Instantaneous” is the most conservative and relies on recalculating the phase bias estimates every epoch. “Continuous” is the default option and uses the kalman filter to estimate the phase biases continuously over many epochs. In general, this provides much less noisy estimates of the phase biases and is essential for working with low cost receivers. The downside of the continuous solution is that if bad data is allowed to get into the filter outputs it can remain for multiple epochs and corrupt subsequent solutions, so some care must be taken to avoid that.

“Fix-and-Hold” takes the concept of feeding information derived from the current epoch forward to subsequent epochs one step farther. In the “Continuous” method, only the float solution is used to update the phase bias states in the kalman filter. In the “Fix-and-Hold” method, an additional kalman filter update is done using pseudo-measurements derived from the fixed solution. This is only done if the fixed solution is determined to be valid. All of this is explained in greater detail in Appendix E.7(5) of the RTKLIB user manual.

In general, taking advantage of all the information we have this epoch to improve the solution for the following epochs seems to be a good idea, especially given the high levels of noise in the low-cost receivers and antennas. Even more so than in the “Continuous” method, however, we need to be careful about feeding erroneous data into the kalman filter which may take a long time to work its way out.

Testing the “Fix-and-Hold” method on a few data sets shows the results we would expect: the solution usually does improve, but occasionally it has difficulty recovering from a period of bad data in which we have corrupted the phase-bias estimates.

To minimize the chance of this happening, we need to do a better job of validating the solution before feeding it forward. Looking at a couple examples of bad data, what I find is that the errors are introduced when solutions are derived from a very small number of valid satellites. The current integer ambiguity validation is based only on the ratio of residuals between the best solution and the second best solution and does not take into account the number of satellites used to find the solutions.

I have introduced two new input parameters, “minfixsats”, and “minholdsats” into the code. These set the number of satellites required to declare a fixed solution as valid, and the number of satellites required to implement the fix-and-hold feature. I have found setting the minimum number of satellites for fix to 4 and fix-and-hold to 5 seems to work quite well for my data, but I have made them input parameters in the config file to allow them to be set to any value.
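
In the config file the two thresholds look something like the following; the option names here are illustrative, so check the demo1 branch for the exact spelling:

```
pos2-minfixsats    =4     # min sats needed to validate a fixed solution
pos2-minholdsats   =5     # min sats needed to apply fix-and-hold
```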

I won’t include all the code here, but it is available on GitHub at https://github.com/rtklibexplorer/RTKLIB on the demo1 branch.

The two key changes are:

For minfix, in resamb_LAMBDA() from:

if ((nb=ddmat(rtk,D))<=0) {
    errmsg(rtk,"no valid double-difference\n");

to:

if ((nb=ddmat(rtk,D))<(rtk->opt.minfixsats-1)) { // nb is sat pairs
    errmsg(rtk,"not enough valid double-differences\n");

and for minhold, in holdamb() from:

if (nv>0) {
    R=zeros(nv,nv);
    for (i=0;i<nv;i++) R[i+i*nv]=VAR_HOLDAMB;

    /* update states with constraints */
    if ((info=filter(rtk->x,rtk->P,H,v,R,rtk->nx,nv))) {
        errmsg(rtk,"filter error (info=%d)\n",info);
    }

to:

if (nv>=rtk->opt.minholdsats-1) { // nv=sat pairs, so subtract 1
    R=zeros(nv,nv);
    for (i=0;i<nv;i++) R[i+i*nv]=VAR_HOLDAMB;

    /* update states with constraints */
    if ((info=filter(rtk->x,rtk->P,H,v,R,rtk->nx,nv))) {
        errmsg(rtk,"filter error (info=%d)\n",info);
    }

Again, the changes to the position solution are subtle and difficult to see in a plot, but are much more obvious in the Ratio Factor for AR validation. A higher ratio represents greater confidence in the solution. The green is before the changes, the blue is after. The RTKLIB code limits the AR ratio to a maximum of 1000.

ar2

As you can see, there is a very significant increase in the AR ratio with the change.  The improvement is coming from switching the ambiguity resolution from continuous to fix-and-hold, not from the code changes, which we made to try and reduce the occasional occurrence of corrupting the filter states with bad data.

Let’s also look at the metrics described earlier for both this change and the bug fix described in the previous post. On the x-axis, 1-3 are from before the two previous changes, 4 is with the bias sum bug fix and 5 is with fix-and-hold and the min sat thresholds for fix and hold.

metrics2

From the metric plots you can see that at this point we have maxed out the % fixed solutions and median AR ratio metrics and have got the distance metric down below 0.5 cm so I will stop here and declare success, at least for this data set.

If you’d like to try this code with your data, the code for all the changes I have made so far is available at https://github.com/rtklibexplorer/RTKLIB on the demo1 branch.

If anyone does try this code on their own data sets please let me know how it goes. I’m hoping these changes are fairly universally useful, at least for open skies and low velocities, but the only way to know is to test it on multiple data sets.

Update 4/17/16:  Enabling the “fix-and-hold” feature on other data sets has had mixed success.  I still see “fix-and-hold” locking to bad fixes and corrupting the states.  It looks like the extra checks I added are not sufficient to prevent this.  So, for now, I recommend turning off this feature, or if you do use it, be aware that it can cause problems.  I hope to take another look at this soon.

Improving RTKLIB solution: Phase-bias sum error

While working through the code to add comments as described in the last post, I stumbled across what looks like another bug. This one is more subtle than the previous bug (arlockcnt) and only has a small effect on the results, but still I thought it was worth discussing.

During each epoch, RTKLIB estimates the phase-bias for each satellite using the differences between the raw carrier-phase measurements and the raw pseudorange measurements. In a perfect system, these would always differ by an integer number of cycles, because the carrier-phase measurements have an uncertainty of a whole number of cycles whereas the pseudorange measurements do not. These phase-bias estimates are then used to update the kalman filter.
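That single-satellite estimate can be sketched as follows, with units called out in the comments (the names here are illustrative; in RTKLIB the actual calculation happens inside the udbias() routine):

```c
#include <assert.h>
#include <math.h>

/* Phase-bias estimate for one satellite, in cycles: the carrier phase
   (already in cycles) minus the pseudorange converted from meters to
   cycles. With perfect measurements the result would be the integer
   carrier-phase ambiguity for that satellite. */
static double phase_bias_cycles(double cp_cycles, double pr_meters,
                                double wavelength_m)
{
    return cp_cycles - pr_meters / wavelength_m;
}
```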

Before updating the kalman filter, however, RTKLIB calculates the component common to all satellites and subtracts it from each of the phase-bias states (the kalman filter outputs). My guess is that this code is left over from the GPS-only days, when all satellites operated at the same frequency, since the estimates are all in units of cycles. As long as the frequencies of the satellites are identical, it is fine to add cycles from one satellite to cycles from another, but this no longer works once the satellites’ cycles have different wavelengths.
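A toy calculation (the values and function here are purely illustrative) makes the unit problem concrete: the same common error in meters corresponds to a different number of cycles on each GLONASS channel, so averaging raw cycle counts mixes units, while converting to meters first does not:

```c
#include <assert.h>
#include <math.h>

/* Average a set of per-satellite biases correctly by converting each
   from cycles to meters with its own wavelength, averaging in meters,
   then converting back to cycles for one target satellite. Averaging
   the raw cycle values directly would give a wavelength-dependent,
   physically meaningless result when the wavelengths differ. */
static double common_offset_cycles(const double *bias_cycles,
                                   const double *lam_m, int n,
                                   double lam_out_m)
{
    double sum_m = 0.0;
    int i;
    for (i = 0; i < n; i++) sum_m += bias_cycles[i] * lam_m[i];
    return (sum_m / n) / lam_out_m;
}
```

For example, a 1 m common error on two satellites with wavelengths of 0.19 m and 0.20 m is about 5.26 and 5.00 cycles respectively; averaged in meters it stays exactly 1 m, while a raw cycle average would land somewhere in between and be wrong for both satellites.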

My code modification simply converts the biases to meters first, using the wavelength for each satellite, before summing them, and then converts back to cycles for each satellite. The changes all occur in the udbias() routine. The following lines of code:

1) bias[i]=cp-pr/lami;

2) offset+=bias[i]-rtk->x[IB(sat[i],f,&rtk->opt)];

3) if (rtk->x[IB(i,f,&rtk->opt)]!=0.0) rtk->x[IB(i,f,&rtk->opt)]+=offset/j;

become

1) bias[i]=cp*lami-pr;

2) lami=nav->lam[sat[i]-1][f];
    offset+=bias[i]-rtk->x[IB(sat[i],f,&rtk->opt)]*lami;

3) if (rtk->x[IB(i,f,&rtk->opt)]!=0.0) {
        lami=nav->lam[i-1][f];
        rtk->x[IB(i,f,&rtk->opt)]+=offset/lami/j;
    }

This improves the solution slightly but is most obvious when the receiver does a clock update. Since the receivers use inexpensive crystal clocks, unlike the satellites which use atomic clocks, there is a continuous drift between the two. At some point, when the difference between these two clocks gets too large, the receiver will update its clock to remove the error. Most of the time the error introduced by this bug is too small to easily pick out of the noise, but when there is a large adjustment from the clock correction it becomes quite obvious and shows up as a large spike in the residuals. Adding this code change eliminates the spikes from the residuals.
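To get a feel for the size of these clock events: many receivers steer their clock in millisecond-sized steps (this is receiver-dependent, and the numbers below are illustrative only), and a millisecond is enormous compared to normal carrier-phase residuals:

```c
#include <assert.h>
#include <math.h>

/* Range equivalent of a receiver clock jump. A typical 1 ms step
   corresponds to roughly 300 km of range, which is why any bias that
   scales with the jump shows up as an unmistakable spike against
   residuals that are normally at the centimeter level. */
#define CLIGHT 299792458.0   /* speed of light, m/s */

static double clock_jump_meters(double jump_seconds)
{
    return CLIGHT * jump_seconds;
}
```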

While exploring this issue, I modified the code that outputs to the state file to also output the phase-biases, since I felt they provided a fair bit of insight into what was happening in the solution. What I found, however, when looking at these phase-bias outputs is that they are dominated by this common term (presumably some sort of clock bias), and it is difficult to see the behavior of the individual phase-biases. To avoid this problem I made another modification to the code. Instead of adding the common component to every phase-bias state, I saved it in a separate common-bias variable and used it when initializing the phase-biases for new satellites. Since all position calculations are done with the differences of the phase-biases and not the phase-biases themselves, this change does not have any effect on the output solution. It does, however, remove the large common variation from the phase-bias states and makes them easier to analyze.
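The reason this bookkeeping change is safe can be shown in two lines: a component common to two phase-bias states cancels when they are differenced (these are illustrative functions, not the actual RTKLIB code):

```c
#include <assert.h>

/* Only differences of phase-bias states enter the position solution,
   so subtracting the same common bias from every state (and tracking
   it in a separate variable) leaves every difference, and hence the
   solution, unchanged. */
static double single_diff(double bias_a, double bias_b)
{
    return bias_a - bias_b;        /* common component cancels here */
}

static double remove_common(double bias, double comm_bias)
{
    return bias - comm_bias;       /* comm_bias kept separately */
}
```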

Here are the residuals before and after the code modification (zoomed way out to see the spike).

res1a res1b

The position solution doesn’t change much; the improvement is more obvious in the confidence of the solution, which can be seen in the Ratio Factor for AR validation. Here it is before (green) and after (blue) the code modification.

ar1

As you can see there is a small but persistent increase in the AR ratio with this change.

Anyone else run across this issue and solved it in a different way? Or believe this is not a bug, and is actually the intended behavior? Or possibly know of other cases where cycles of different wavelengths are not handled properly in RTKLIB?

RTKLIB: Code comments + new Github repository

If you’ve spent any time perusing the RTKLIB source code, you will surely be aware that it is very much what I would call “old-school” code. Comments are sparse, and variable names tend to be short and not always very descriptive. The upside is that the code is quite compact; significant pieces of it can be seen on a single page, which can sometimes be quite nice. The downside is that it can be difficult to follow if you have not been poring over it for a long time. To help myself understand the code, I have added a fair bit of commenting to most of the code used to calculate the carrier-phase solution, which is really the heart of the RTKLIB kinematic solution. This is mostly in the rtkpos.c file.

The changes are too extensive to spell out here so I have forked the RTKLIB code in Github and posted my commented code there. If you are interested, you can download it from the demo1 branch at https://github.com/rtklibexplorer/RTKLIB.  

All the previous code changes are also there, as well as a couple I have not yet posted about. The receiver data I have been using for these posts is also there, in the rtklib/data/demo1 folder. The solution file updates for VS2010 c++ are in the rtklib/app/rnx2rtkp/msc folder, and the modified executable is in the rtklib/bin folder.

Define metrics for evaluating the quality of a solution

So far, the improvements in the quality of the solution with each change have been quite obvious in the resulting plots from RTKLIB. Going forward though, the improvements are going to be more subtle and it will be more difficult to evaluate the changes.

Let us define some metrics we can use to help evaluate these future changes.

There are many metrics we could come up with, but I will start with these:

1) Stdev of xy distances between base and rover: This is the standard deviation of the distance between the two receivers. As discussed in a previous post, the distance between the receivers should remain constant regardless of changes in orientation, so ideally the standard deviation would be zero.

2) Median AR (ambiguity ratio) between best solution and 2nd best solution: This is a measure of the confidence of the solution coming from the integer ambiguity resolution algorithm. In general, higher is better, but there is a trade-off with the number of satellites used: a lower AR with a large number of satellites may be preferable to a high AR with only a small number of satellites.

3) Ratio of epochs with fixed solution to total epochs: A measure of how often the result of the integer ambiguity resolution algorithm exceeds the threshold for using a fixed solution. Higher is better, provided there are no significant errors in any of the fixed solutions. For integrity, we rely on a fixed solution indicating a very high confidence in the solution.

4) Median number of satellites used in float solution: This is the total number of satellites used to determine position. A higher number of satellites gives us more confidence in the result, all other metrics being equal. Including low-quality satellites in the solution will increase this metric while decreasing others.

5) Median number of satellites used in fixed solution: This is the number of satellites considered high enough quality to use for the fixed solution. Similar trade-offs apply as with the number of satellites used in the float solution.

6) Fraction of incorrect fixed solutions: The number of erroneous fixed-solution epochs divided by the total number of fixed-solution epochs. Any fixed solution with significant errors dramatically reduces our confidence in the answer, so this metric should always be zero or very close to it. We need to define a threshold for this metric based on how much error we consider acceptable in the solution.

Some of these metrics will probably prove to be more valuable than others, but I will start by monitoring all of them until I have a better understanding of which are most important. Some of these metrics will tend to move in opposite directions. For example, if the algorithm excludes all but the best-quality satellites, the median AR and ratio of fixed epochs will increase because the quality of the data is better, but the number of satellites used will decrease. Increasing the number of satellites used will increase the confidence that the solution is correct but may reduce some of the other metrics because of the inclusion of lower-quality data.

All of the metrics except #5 can be derived from data output to the status file, which is generated when RTKLIB calculates a solution. Metric #5, the number of satellites used in the fixed solution, requires a small code modification to change the format of the status data. The code currently calculates the solution status for each satellite separately but, at the end of each epoch, changes the solution status of all satellites to the status of the overall solution. In other words, if five of eight satellites are used to determine the fixed solution but the fixed solution does not meet the AR threshold criteria, the solution status for all eight satellites is set to float. The code modification comments out one line at the end of the relpos() function in rtkpos.c, as shown below, to retain the individual status for each satellite. This change does not affect the functionality of the code in any way, since these variables are not used again after they are changed.

for (i=0;i<MAXSAT;i++) for (j=0;j<nf;j++) {
    /* don't lose track of which sats were used to try to resolve the ambiguities */
    /* if (rtk->ssat[i].fix[j]==2&&stat!=SOLQ_FIX) rtk->ssat[i].fix[j]=1; */
    if (rtk->ssat[i].slip[j]&1) rtk->ssat[i].slipc[j]++;
}

I will track these metrics as I make changes to the input parameters and algorithm in order to quantify any improvement. To add further insight, I will track the three different conditions in my input data separately: static, moving slowly with open skies, and moving quickly with partially obstructed skies.

Here is the plot of the metrics for the changes we have made so far in the three previous posts for the period when the receivers are moving.

metrics1

Case 1 on the x-axis represents the baseline parameters, case 2 is after increasing eratio1 from 100 to 300, case 3 is after fixing the bug in lock count with arlockcnt=20, and case 4 is after increasing arlockcnt to 120.

The metrics in the first three plots are all moving in the desired direction. The most important, the stdev of the error, is reduced from over 15 cm to less than 0.5 cm, indicating a significant improvement in accuracy. Increases in % fixed solutions and median AR ratio both indicate increased confidence in the solution. In the last plot, the large difference between the number of satellites used for the float solution and the number used for the fixed solution is because we have disabled integer ambiguity resolution for the GLONASS satellites. This suggests a large amount of information is not being used and represents an opportunity for future improvement. The slight drop in the number of satellites used for the fixed solution in cases 3 and 4 occurs because we do not use position data from satellites that have lost lock until the kalman filter has had some time to reconverge.

Plots for the other two environments (receivers stationary, and receivers moving quickly) show similar trends.

For all cases after the first improvement, the last metric, fraction of incorrect fixed positions, is zero for the period when the receivers were moving slowly, which is what we want to see. I defined an erroneous position as any time the error in a fixed solution exceeded 5 cm. Note that for the period when the receivers were moving quickly, this value was non-zero, and hence this setup probably would not be adequate for that environment.

I’m interested in what metrics other people are using to evaluate the quality of their solutions. Please leave a comment if you have any thoughts or other metrics that might be more useful.