Several readers have now mentioned that they had to set the receiver dynamics option in the input configuration file to “off” when running solutions in real-time because of limited CPU bandwidth, and that this leads to poorer results. I don’t have this problem in my experiments because I am post-processing the data, so the CPU does not need to keep up with the input data, and I have always had this option set to “on”. But I hope to switch to real-time processing with an SBC at some point, so I decided to take a look at this issue.
First of all, I tried disabling receiver dynamics and re-running the solution for the data set I introduced in the previous post, using my demo3 version of RTKLIB again. The plot on the left is position with receiver dynamics enabled and the plot on the right is with dynamics disabled. Otherwise the input options are identical.
Clearly there is some serious degradation with dynamics disabled! The difference is not a complete surprise because when we disable dynamics, we are throwing away some valuable information. The amount of degradation probably should have been a clue that something else was wrong, but at the time I didn’t look closely enough at why things got worse. Instead I went ahead and implemented a “pseudo-dynamics” mode that uses a small fraction of the calculations of the full dynamics mode but gives most of the benefit. I think this is a useful improvement and I will discuss it in the next post. It did make the problem go away, but it did not address the root cause; it just covered it back up again.
It wasn’t until I was testing this new feature that I started to see some strange things and realized that the lack of dynamics was not enough to explain what was going on.
So let’s take a closer look at the results with dynamics disabled. Unfortunately the problem cannot be seen in RTKPLOT or in the output files, so it requires digging into the trace files. Below are some snippets from the trace files showing the residuals of the initial double differences from a sample just as the solution first started to degrade. The residuals available in RTKPLOT and the output files are the residuals after the last iteration of the Kalman filter, not the initial residuals. Those final residuals are significantly smaller and so do not show the problem.
The trace on the left is with dynamics enabled, and the one on the right is with dynamics disabled. I will discuss more about how dynamics works in the next post. For now, if you are not familiar with the feature, just be aware that it improves RTKLIB’s initial guess of the receiver’s position each sample by using information from the previous positions.
The double difference residual for each satellite pair is listed after the “v=”. The L1 and P1 rows are for the phase measurements and the pseudorange measurements respectively. Because the initial position estimates are more accurate with dynamics on, you can see that the residuals on the left are significantly smaller than the ones on the right. Also, in this particular sample the receiver reported a cycle slip on satellite 33, and you can see the residuals are largest in both cases for this satellite. The most important difference between the two is that the larger residual with dynamics off was large enough to exceed the outlier rejection threshold, so that residual was not used as an input to the Kalman filter. Introducing a non-linearity like this into a feedback loop always risks affecting its stability, and that looks like what happened here. Without any feedback, the errors continued to grow and to be rejected, eventually causing other residuals to be rejected, until the whole solution fell apart.
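To make the loss-of-feedback mechanism concrete, here is a toy Python sketch. It is not RTKLIB code and not a real Kalman filter: it tracks a one-dimensional position with no dynamics model, uses a fixed gain as a stand-in for the filter update, and rejects any residual larger than a threshold. The function name, the gain, the target speed and the 40 meter starting error are all made-up values chosen for illustration.

```python
def run_filter(threshold_m, initial_error_m=40.0, velocity_m=1.0,
               gain=0.5, steps=20):
    """Track a moving 1-D target with a gated, fixed-gain update.

    Returns the absolute position error (meters) after each step.
    All parameters are hypothetical illustration values.
    """
    truth = initial_error_m   # true position; the estimate starts at 0
    estimate = 0.0
    errors = []
    for _ in range(steps):
        truth += velocity_m                # target moves every sample
        # No dynamics: the prediction is simply the previous estimate.
        residual = truth - estimate        # noise-free measurement residual
        if abs(residual) <= threshold_m:
            estimate += gain * residual    # feedback applied
        # else: residual rejected as an outlier, so no feedback this sample
        errors.append(abs(truth - estimate))
    return errors


gated = run_filter(30.0)       # tight gate
open_gate = run_filter(1000.0)  # gate effectively disabled

print("threshold 30:   first error", gated[0], "last error", gated[-1])
print("threshold 1000: first error", open_gate[0], "last error", open_gate[-1])
```

With the threshold at 30 the very first residual is rejected, the estimate never moves, and the error grows from 41 to 60 meters as the target moves away. With the threshold at 1000 the same starting error is accepted and the error settles close to 1 meter, which is just the lag of a fixed gain chasing a moving target.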
The threshold used by RTKLIB to reject outliers is adjustable and is set by the input parameter “pos2-rejionno” in the input configuration file. The name is an abbreviation for “reject innovations”, although there seems to be an extra “o”. Innovations is a term for the error inputs to the Kalman filter. The default value for this threshold, and the one I have been using in my experiments, is 30 meters. This is consistent with the two residuals rejected in the above example, both greater than 30.
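For anyone who wants to try different values of this threshold from a script, here is a minimal Python sketch that rewrites one option in the text of a configuration file. It assumes the file uses one “key =value” pair per line with an optional “#” comment after the value; check that against your own file before relying on it. The helper name set_option and the sample text are hypothetical.

```python
def set_option(conf_text, key, value):
    """Return conf_text with the value of `key` replaced by `value`.

    Assumes "key =value  # comment" lines; raises KeyError if the key
    is not present. Any trailing comment on the line is preserved.
    """
    out = []
    found = False
    for line in conf_text.splitlines():
        body, hash_mark, comment = line.partition("#")
        name, equals, _old_value = body.partition("=")
        if equals and name.strip() == key:
            new_line = f"{name}={value}"
            if hash_mark:
                new_line += " #" + comment
            out.append(new_line)
            found = True
        else:
            out.append(line)
    if not found:
        raise KeyError(key)
    return "\n".join(out) + "\n"


# Hypothetical one-line sample; a real file has many more options.
sample = "pos2-rejionno =30 # (m)\n"
print(set_option(sample, "pos2-rejionno", 1000), end="")
```

Working on the text rather than the file itself keeps the original configuration untouched, so each experiment can be written out under a new file name.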
There doesn’t seem to be anything magic about 30 meters, especially when we are striving for centimeter accuracy, so I went ahead and increased it all the way up to 1000 meters to be sure I didn’t trip over it again, then re-ran the solution. Here is the result. Position is on the left and the difference in position with dynamics on and off is on the right.
Increasing the outlier threshold completely eliminated the problem. What is more surprising is that there is very little difference in the position solution with dynamics on or off. The larger errors in the initial position estimate are still there, as are the larger initial residuals, but the additional iteration of the Kalman filter is apparently able to remove nearly all of the initial position error, as can be seen in the right plot above.
So the bottom line is that I don’t think outlier rejection is working properly in RTKLIB, and I plan to leave this threshold at 1000 to effectively disable the feature until I see a need to re-enable it.
This problem is not limited to when receiver dynamics are turned off and can happen anytime large residuals occur. For example, once I knew what to look for, I was able to find the same problem occurring in the initial transient at the beginning of the solution.
To demonstrate this I did another experiment. In a previous post I described moving the solution start point around within the part of the data in which the rover was moving until I was able to get a bad fix. This time I did the same thing but in the part of the data in which the rover was stationary, with the outlier threshold set back to 30. Again I was able to find a start time that caused an initial bad fix. I checked the trace file for rejected outliers during the initial transient and sure enough they were there. So once again, I increased “pos2-rejionno” from 30 to 1000 and re-ran. The transient was almost entirely eliminated, and I got a good first fix. Here are the position plots for the two cases, threshold=30 on the left, threshold=1000 on the right.
Notice the difference in y-axis scales and the size of the initial transient. With the threshold set to 1000, as would be expected, there were no outliers rejected in the trace file.
I suspect another thing that aggravates this problem in my case is that I changed the input parameter eratio1 (ratio of pseudorange measurement errors to carrier phase measurement errors) from 100 to 300. This reduced the time to first fix but also increased the overshoot of the initial transient, which makes it more likely to trip the outlier threshold.
So is there a risk that opening up this limit will cause other problems where data that should have been rejected is not? Possibly, but I suspect the benefits of opening up this limit will outweigh any downside. I plan to keep an eye out for true bad data points and deal with them once I have some real examples, but won’t worry about hypothetical cases for now.
So to sum up, I would suggest increasing this limit even if you are not seeing problems at the moment, and looking out for “outlier rejected” messages in your trace files if you are having problems.
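Trace files get large quickly, so a small script can do the looking for you. Here is a minimal Python sketch that reports every line containing the text “outlier rejected”. It only does a plain substring match and assumes nothing else about the layout of a trace line; the function name and the sample lines are made-up placeholders, not real trace output.

```python
def find_outlier_messages(lines, needle="outlier rejected"):
    """Return (line_number, text) for every line that contains `needle`.

    `lines` can be any iterable of strings, including an open file.
    Line numbers start at 1.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]


# Placeholder lines standing in for a trace file (not real RTKLIB output).
sample_lines = [
    "placeholder line without the message\n",
    "placeholder line: outlier rejected\n",
    "another placeholder line\n",
]

for number, text in find_outlier_messages(sample_lines):
    print(number, text)
```

To scan a real file, open it and pass the file object in place of sample_lines. If the list comes back empty with the threshold at 1000 but not at 30, that matches what I saw in my own runs.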