Define metrics for evaluating the quality of a solution

So far, the improvements in the quality of the solution with each change have been quite obvious in the resulting plots from RTKLIB. Going forward though, the improvements are going to be more subtle and it will be more difficult to evaluate the changes.

Let us define some metrics we can use to help evaluate these future changes.

There are many metrics we could come up with, but I will start with these:

1) Stdev of xy distances between base and rover: This is a standard deviation of the distance between receivers. The distance between receivers should remain constant regardless of the change in orientation as discussed in a previous post, so ideally the standard deviation should be zero.

2) Median AR (ambiguity ratio) between best solution and 2nd best solution: This is a measure of the confidence of the solution coming from the integer ambiguity resolution algorithm. In general higher is better, but there is a trade-off with number of satellites used. A lower AR with a large number of satellites may be preferable to a high AR with only a small number of satellites

3) Ratio of epochs with fixed solution to total epochs: A measure of how often the result of the integer ambiguity resolution algorithm exceeds the threshold for using a fixed solution. Higher is better provided there are no significant errors in any of the fixed solutions. For integrity, we rely on a fixed solution indicating a very high confidence in the solution.

4) Median number of satellites used in float solution: This is the total number of satellites used to determine position. A higher number of satellites gives us more confidence in the result provided, if all other metrics are equal. Including low quality satellites in the solution will increase this metric while decreasing other metrics.

5) Median number of satellites used in fixed solution: This is the number of satellites considered high enough quality to use for the fixed solution. Similar trade-offs apply as with the number of satellites used in the float solution.

6) Fraction of incorrect fixed solutions: Number of erroneous fixed solution epochs divided by total number of fixed solution epochs. Any fixed solutions with significant errors dramatically reduces our confidence in the answer, so this metric should always be zero or very close to zero. We need to define a threshold for this metric based on how much error we consider acceptable in the solution.

Some of these metrics will probably prove to be more valuable than others, but I will start by monitoring all of these until have a better understanding of which are most important. Some of these metrics will tend to move in opposite directions. For example, if the algorithm excludes all but the best quality satellites, the median AR and ratio of fixed epochs will increase because the quality of the data is better, but the number of satellites used will decrease. Increasing the number of satellites used will increase the confidence that the solution is correct but may reduce some of the metrics because of the inclusion of lower quality data.

All of the metrics except #5 can be derived from data output to the status file, which is generated when RTKLIB calculates a solution. Metric #5, the satellites used in the fixed solution, requires a code modification to make a small change to the format of the status data. The code currently calculates the solution status for each satellite separately, but at the end of each epoch, changes the solution status of all satellites to the status of the solution. In other words, if five of eight satellites are used to determine the fixed solution, but the fixed solution does not meet the AR threshold criteria, the solution status for all 8 satellites are set to float. The code modification comments out one line from the end of the relpos() function in rtkpos.c as shown below in green to retain the individual status for each satellite. This change does not affect the functionality of the code in any way since this these variables are not used again after they are changed.

for (i=0;i<MAXSAT;i++) for (j=0;j<nf;j++)  {
      // Don’t lose track of which sats were used to try and resolve the ambiguities
     // if (rtk->ssat[i].fix[j]==2&&stat!=SOLQ_FIX) rtk->ssat[i].fix[j]=1;
     if (rtk->ssat[i].slip[j]&1) rtk->ssat[i].slipc[j]++;

I will track these metrics as I make changes in input parameters and algorithm to quantify any improvement. To add further insight, I will track the three different conditions in my input data separately. Those are static, moving slowly with open skies, and moving quickly with partially obstructed skies.

Here is the plot of the metrics for the changes we have made so far in the three previous posts for the period when the receivers are moving.


Case 1 on the x-axis represents the baseline parameters, case 2 is after increasing eratio1 from 100 to 300, case 3 is after fixing the bug in lock count with arlockcnt=20, and case 4 is after increasing arlockcnt to 120.

The metrics in the first three plots are all moving in the desired direction. The most important, stdev of the error is reduced from over 15 cm to less than 0.5 cm indicating a significant improvement in accuracy. Increases in % fixed solutions and median AR ratio both indicate increased confidence in the solution. In the last plot, the large difference between number of satellites used for the float solution and number of satellites used for the fixed solution is because we have disabled integer ambiguity resolution for the GLONASS satellites. This suggests a large amount of information is not being used and represents opportunity for future improvement. The slight drop in number of satellites used for the fixed solution in cases 3 and 4 occur because we are not using position data from satellites that have lost lock until the kalman filter has had some time to reconverge.

Plots for the other two environments, (receivers stationary, and receivers moving quickly) show similar trends.

For all three cases after the first improvement, the last metric, fraction of incorrect fixed positions, is zero for the period when the receivers were moving, which is what we want to see. I defined an erroneous position as anytime an error in a fixed solution exceeded 5 cm. Note that for the period of time when the receivers were moving quickly, this value was non-zero and hence this setup probably would not be adequate for that environment.

I’m interested in what metrics other people are using to evaluate the quality of solutions? Please leave a comment if you have any thoughts or other metrics that might be more useful.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s