Comparing real-time precise (RTK) solutions for the u-blox F9P

The u-blox F9P internal RTK solution engine is very good and I have been recommending using it rather than RTKLIB for real-time RTK solutions, saving RTKLIB for F9P post-processing solutions. However, in this post, I will take another look at that recommendation.

Tomoji Takasu, the creator of RTKLIB, recently published some data also confirming the high quality of the F9P RTK solutions. He ran comparisons of the F9P against several other receivers of various price and quality. You can see his results in the 2021/07/01 entry of his online diary. His conclusion (at least the Google translation of the original Japanese) is that “Considering the price, there is no reason to choose other than F9P at this time.”

He did his comparisons by splitting an antenna output, feeding it to the two receivers being compared, then cycling the antenna output on and off. He fed the output of the receivers to RTKPLOT to produce plots like the example shown below from his diary, which show acquire times and the consistency of the fix location. In this case the upper plot is a Bynav C1-8S receiver and the lower plot is a u-blox F9P. The yellow intervals in the plot indicate float status and show the acquire times for each signal cycle. The green intervals indicate fix status and show the consistency of the fix location. Clearly the F9P outperformed the C1-8S in this example.

Receiver comparison data from Tomoji Takasu online diary

Inspired by his comparisons and the recent changes I incorporated into the demo5 b34c code, particularly the variable AR ratio threshold option described in my previous post, I decided it was time to do a similar head to head comparison between the internal F9P RTK solution and the latest demo5 RTKNAVI RTK solution.

To set up this experiment I used u-center to configure an Ardusimple F9P receiver to output both NMEA position messages (GGA) and raw observation messages (RAWX, SFRBX) to the USB port. I then set up STRSVR and RTKNAVI on my laptop as shown in the image below.

RTKLIB configuration for the comparison

The first instance of STRSVR is configured to stream NTRIP data from a nearby UNAVCO reference base (P041) to both the u-blox receiver and a local TCP/IP port. The output of the u-blox receiver is streamed to a second local TCP/IP port by selecting the “Output Received Stream to TCP Port” in the “Serial Options” window. The second instance of STRSVR then streams the second TCP/IP port containing the output of the u-blox receiver to a file for logging the internal F9P solution. This file will contain the raw observations in binary format as well as the receiver solution but RTKPLOT will ignore the binary data and plot just the NMEA message data.

The demo5_b34c version of RTKNAVI is configured to use the two TCP/IP ports configured above, one from the receiver, and one from the UNAVCO base, as inputs for rover and base. I also configured two instances of RTKPLOT so that I could see the real-time status of both solutions in addition to saving the results to files.

To set up the RTKNAVI config file for this experiment, I started with the PPK config file from the 12/24/20 u-blox F9P kinematic PPP data set and made a few changes to make it more suitable for a real-time solution. Time to first fix is not as important in post-processed solutions since they are usually run in both directions, so the config parameters in that file are set to minimize the chance of a false fix at the expense of relatively long acquire times. To shift this balance towards faster acquire times, I increased the maximum filter variance allowed for fix (Max Pos Var for AR / pos2-arthres1) from 0.1 to 1.0 and decreased the number of fix samples required for hold (Min Fix / pos2-arminfix) from 20 to 10. I also changed the solution type from combined to forward, since combined mode is not possible for real-time solutions.
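Expressed as lines in an RTKLIB .conf file, the changes described above look like this (a sketch of just the modified lines, not a complete config file; option names are as used by the demo5 code):

```
pos1-soltype       =forward    # combined mode is not possible in real-time
pos2-arthres1      =1.0        # max filter variance to attempt AR (was 0.1)
pos2-arminfix      =10         # min fix samples before hold (was 20)
```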

Most importantly, I enabled the new variable AR ratio threshold described in my previous post by setting the AR ratio threshold min and max (pos2-arthresmin, pos2-arthresmax) to 1.5 and 10. I left the nominal AR ratio threshold at 3.0. This means that instead of using a fixed AR ratio threshold of 3.0 for all epochs, the AR ratio threshold used in each epoch is selected from the blue curve below, with the limitation that it cannot drop below the specified minimum of 1.5.

AR ratio threshold curves

This will allow the AR ratio to drop well below 3.0 when there are many available satellites and the model strength is very high, while still protecting against false fixes by increasing the AR ratio when the number of satellites and the model strength are low.

To run the experiment, I connected the antenna to the receiver, waited until both solutions had acquired a fix, disconnected the antenna for 10-20 seconds to force the receiver to lose lock on all the satellites, then reconnected the antenna. I repeated this sequence for about 15 cycles.

I ran this experiment twice. The first time, I connected the receiver to a Harxon geodetic GPS-500 antenna mounted on my roof with a reasonably good view of the sky. The second time, I connected the receiver to a smaller, less expensive u-blox ANN-MB antenna with a ground plane on a tripod in my backyard, a moderately challenging environment with the sky view partially blocked by the house and nearby trees. In both cases, the base station (P041) was 17 km away and included Galileo E1 signals in addition to GPS and GLONASS.

Below is the result from the first experiment. As previously, yellow is float and green is fix. The internal F9P solution is on the top and the RTKNAVI solution is below. The blue in the F9P plot indicates a dead reckoning solution, which only occurs when the antenna is disconnected, so it can be ignored. To reduce clutter in the plots I am showing only the U/D axis. In this case I know the precise location of my roof antenna, so the y-axis values are absolute errors.

F9P vs RTKNAVI, GPS-500 antenna on roof

Here is a zoomed-in version of the same plot. Both solutions acquire first fix fairly quickly in most cases. The absolute errors during fix are small for both solutions but do appear to be a little larger in the internal F9P solution. None of the errors are large enough to be considered false fixes.

F9P vs RTKNAVI, GPS-500 antenna on roof (zoomed in)

Below is the same plot for the second experiment where the smaller ANN-MB antenna is located in a more challenging environment. Again, both solutions tend to acquire first fix quite quickly, and again the internal F9P errors appear to be larger than the RTKNAVI errors. In this case I don’t know the precise location of the antenna so the errors are relative.

F9P vs RTKNAVI, ANN-MB antenna in backyard (zoomed in)

Here is a summary of the acquire times for both solutions measured from the above plots. The plots don’t show precisely when the antenna was reconnected, so I measured both acquire times starting from the first solution output sample after the disconnect gap, regardless of which solution it came from. The first column in the table is the number of acquisitions measured; the other columns show the minimum, maximum, mean, and standard deviation of the time to first fix in seconds.

GPS-500 antenna on roof
ANN-MB antenna in backyard
Time to first fix comparison between F9P and RTKNAVI

Both solutions performed quite well in both experiments. The RTKNAVI solution outperformed the F9P internal solution in both cases, but given the very small amount of data and the limited testing conditions, I would be reluctant to declare RTKNAVI the winner. It does suggest, though, that the RTKLIB solution has closed the gap and should be considered at least roughly equal in performance to the internal RTK engine for real-time solutions.

In many cases it will still be simpler for real-time solutions to use the internal solution than RTKNAVI since it requires less configuration. There are cases, however, where it makes more sense to use an external solution even in real-time. For example, one user recently shared data with me where the rover is measuring real-time buoy activity. In order to avoid needing a bi-directional radio link, he sends the rover raw observations back to shore and calculates the solution there. Otherwise he would need to send the raw base observations to the buoy, and then send the resulting solution back to shore.

If anyone else runs a similar experiment comparing RTKNAVI to the internal F9P solution and wants to share their results, please leave a comment below.


A variable ambiguity resolution threshold for RTKLIB

The ambiguity resolution ratio (AR ratio) threshold is used by RTKLIB to determine if there is enough confidence in an integer ambiguity resolution solution to declare it as a fixed solution. If not, the float solution is used instead. More specifically, the AR ratio is the ratio of the sum of the squared residuals of the second-best fixed solution to the sum of the squared residuals of the best fixed solution.
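In code form, the ratio test is simple. This sketch (hypothetical variable names, not RTKLIB internals) shows the decision:

```python
def ratio_test(ssr_best, ssr_second, threshold=3.0):
    """Ambiguity-resolution ratio test.

    ssr_best:   sum of squared residuals of the best integer candidate
    ssr_second: sum of squared residuals of the second-best candidate
    Returns (fixed, ar_ratio).
    """
    ar_ratio = ssr_second / ssr_best  # >= 1 by construction
    return ar_ratio >= threshold, ar_ratio

# A clear winner passes the test, a marginal one does not
print(ratio_test(0.5, 2.0))  # (True, 4.0)  -> fixed solution
print(ratio_test(0.9, 1.8))  # (False, 2.0) -> fall back to float
```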

The higher this ratio is, the higher the confidence that the best solution is the correct solution. However, there is no simple way to determine what the optimal value for this threshold should be. In practice, it is usually set based on empirical results, and typical values vary from 1.5 to 3.0. RTKLIB uses 3.0 as the default value, which is fairly conservative, especially when large numbers of satellites are available for ambiguity resolution. If too low a value is chosen, false fixes will be common; if too high a value is chosen, fix rates will be unnecessarily low. In general, the stronger the underlying GNSS model, the lower the threshold can be. The GNSS model strength is affected by many parameters, including the number of tracked satellites, the measurement precision, the number of measurement epochs, the number of frequencies used, and many others. If the model strength is high, the threshold can be reduced; if it is very low, the threshold should be increased.

Teunissen and Verhagen make a compelling argument in this paper (and several others) that the AR ratio threshold should be adjusted not only for different solution environments but also on an epoch-by-epoch basis within a single solution. They argue that as the model strength varies, the threshold should be adjusted so that there is a fixed probability of an incorrect fix. It turns out that the exact model strength is very difficult to calculate and that the relationship between model strength and the optimal threshold is also complicated. They propose a solution in which model strength is estimated from two parameters: (1) the number of satellites, and (2) an estimate of the ambiguity resolution failure rate based on the covariance matrix of the float ambiguities. These two values are then applied to a set of lookup tables, along with a desired failure rate, to calculate an optimal threshold value. The lookup tables are generated using Monte Carlo simulations of various models and model strengths. They named this technique the fixed failure-rate ratio test (FFRT).

They provide example MATLAB and Python code to demonstrate their algorithm, which can be downloaded from here along with some documentation explaining how it works. For anyone who wants to understand their methods more completely, this is a very useful tool.

One of the things they predicted in their paper, written a number of years ago, is that as the number of available satellites increases with the newer constellations, the need to move away from a fixed threshold will become greater. I believe that with the introduction of Galileo and BeiDou, and the newer low cost dual-frequency receivers, we have reached the point where a fixed threshold is no longer adequate. The problem is that, especially for a dynamic rover scenario, the number of available satellite pairs can vary from just a handful in obstructed sky conditions to several dozen in open skies. This results in a huge variation of model strength within a single solution. If a fixed threshold is chosen to be adequate for the obstructed sky conditions, it is much too conservative for the open sky conditions.

So, the next step was to try to add some of this capability to the demo5 version of RTKLIB. I decided early on to reduce the complexity of the solution by making the goal simply an improvement over the existing algorithm rather than trying to achieve an optimal solution. The first step in this direction was to replace the lookup tables with a set of cascaded polynomial coefficients that approximate the contents of the tables with a much smaller set of numbers. I then rewrote the FFRT pieces of the above-mentioned Python code in the RTKLIB C code. The example code also allowed the user to choose one of two targeted failure rates. To further simplify the solution, I eliminated the higher fixed failure rate target option and supported only the 0.001 failure rate target.

So far, so good. I was able to get very similar results between the Python example code and the RTKLIB code. Unfortunately, I was not completely happy with these results. The threshold adjustments based on the number of satellites seemed to behave well, but the adjustments based on the covariance matrix of the float ambiguities were often too optimistic and would cause the threshold to be set too low. I suspect the reason is very similar to the reason the accuracy estimates of the Kalman filter also tend to be very optimistic. The math is based on the assumptions that the input measurements all have zero-mean errors and are uncorrelated with each other and in time, none of which is true in the real world. To some extent the lookup tables should compensate for this, but that assumes the error characteristics in the simulations match the actual errors, which is difficult to achieve for a general case.

My conclusion was that the ambiguity covariance matrix is not sufficient in many cases to accurately estimate the remaining model strength after the number-of-satellites component is accounted for. Fortunately, however, I believe that the largest variation in model strength within a solution is generally going to be caused by the variation in the number of available satellites, and not by the other components. I ended up, at least as an initial implementation, with a solution in which the threshold is adjusted for the number of satellites while the remaining components of the model strength are still specified by the user. To keep the user interface more compatible with the existing RTKLIB code, I used the existing convention that model strength is adjusted by adjusting the AR ratio threshold directly. However, it is now the nominal AR ratio threshold that is adjusted rather than a fixed threshold. The nominal AR ratio threshold is defined as the threshold for the case where there are eight satellite pairs used for ambiguity resolution. If the number of satellite pairs in the ambiguity set increases, the threshold is reduced, and if it decreases, the threshold is increased.

The plot below attempts to show how this method correlates to the FFRT algorithm given a fixed target failure rate of 0.1%. I won’t bother to describe what Pf_ILS is in detail here; it is enough to know that it is the previously mentioned estimate of model strength calculated in FFRT from the float ambiguity covariance matrix.

The three trajectories of dots in the plot below are calculated in the FFRT python code using the lookup tables for specified model strengths (Pf_ILS) of 0.005, 0.035, and 0.85. The x-axis is the number of satellites used in ambiguity resolution and the y-axis is the corresponding AR ratio threshold. The three colored lines are the equivalent values calculated in RTKLIB for the nominal AR ratio thresholds that correspond to those model strengths.

Comparison of AR ratio threshold for FFRT and RTKLIB

One detail worth mentioning is that the FFRT method uses the number of satellites as input, while the RTKLIB implementation uses the number of satellite pairs as input. They are aligned in the computations for the one frequency, one constellation case. Hence you can see that the nominal AR ratio thresholds in the legend appear when the number of satellites on the x-axis is nine (one reference satellite and eight differencing satellites). As the number of reference satellites increases with multiple constellations and frequencies the difference between number of satellites and number of satellite pairs will increase.
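To make the satellites-vs-pairs distinction concrete, here is a small sketch (hypothetical satellite counts, and ignoring constellation-specific complications such as GLONASS inter-frequency biases) of how the number of double-difference pairs relates to the number of tracked satellites, with one reference satellite consumed per constellation per frequency:

```python
def sat_pairs(sats_per_const, n_freq):
    """Number of double-difference satellite pairs.

    sats_per_const: tracked satellites per constellation, e.g. {'GPS': 8}
    Each constellation contributes (n - 1) pairs per frequency, since one
    satellite per constellation serves as the reference.
    """
    return sum(max(n - 1, 0) for n in sats_per_const.values()) * n_freq

# Single constellation, single frequency: 9 satellites -> 8 pairs
print(sat_pairs({'GPS': 9}, 1))                      # 8
# Three constellations, two frequencies: 24 satellites -> 42 pairs
print(sat_pairs({'GPS': 9, 'GAL': 8, 'GLO': 7}, 2))  # (8+7+6)*2 = 42
```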

I’ve introduced these code changes into the demo5 b34c code, which is available from the rtklibexplorer Github page or the download section of this website.

I’ve added two new config parameters to RTKLIB in an attempt to make this feature as flexible as possible. The two new parameters, minimum AR ratio and maximum AR ratio, are highlighted in the image below from RTKPOST. They allow the new feature to be completely disabled (the default), partially enabled, or fully enabled. The AR ratio threshold for each epoch is first calculated as shown in the plot above, based on the nominal AR ratio threshold and the number of satellite pairs. It is then clipped to the minimum or maximum value if it exceeds either one.

New input options for variable AR ratio threshold

The default values are 3.0 for minimum, maximum, and nominal. This forces the ratio to always equal 3.0 and is equivalent to the previous single default value of 3.0. To completely enable the feature, set the minimum value to something less than or equal to 1.0 and the maximum value to 10.0 or greater. This means the calculated threshold will never be clipped. In the example above (min=1.5, nom=3, max=10), I have partially enabled the feature. The calculated value from the blue line in the above plot will be used as long as it is greater than or equal to 1.5. If the number of satellite pairs is greater than 25, the calculated threshold will drop below 1.5 but the final value will be set to 1.5, since it will be clipped by the minimum threshold.
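As a rough illustration of the clipping logic, here is a toy model. The exponential decay curve below is an illustrative stand-in, not the actual polynomial fit to the FFRT lookup tables used in the demo5 code; its decay rate is chosen so that, with min=1.5, clipping begins near 25 satellite pairs:

```python
import math

def ar_threshold(n_pairs, nom=3.0, thr_min=1.5, thr_max=10.0):
    """Variable AR ratio threshold, clipped to [thr_min, thr_max].

    The nominal threshold applies at 8 satellite pairs; fewer pairs raise
    the threshold, more pairs lower it. The exponential shape and decay
    rate here are illustrative only.
    """
    raw = nom * math.exp(0.04 * (8 - n_pairs))   # toy decay curve
    return min(max(raw, thr_min), thr_max)

print(ar_threshold(8))    # 3.0 (nominal)
print(ar_threshold(4))    # raised above nominal (~3.5)
print(ar_threshold(30))   # 1.5 (clipped by the minimum)
```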

These changes make the largest difference when running real-time solutions with the u-blox F9P or other dual frequency receivers with multiple constellations since the number of satellite pairs used in ambiguity resolution can be quite large and time to first fix is more critical. In this case it can significantly reduce time to first fix for the nominal case while still protecting against false fixes if only a few satellites are available.

Be aware however that a variable threshold will not always improve fix rate and in some cases will lower it. Any time the number of available satellite pairs is below eight, the AR ratio threshold will increase. This will reduce the number of false fixes but will also reduce the fix rate. Overall, though, it should give a more constant failure rate over the full range of satellite counts.

In my next post, I will compare the internal F9P real-time solution with the RTKNAVI real-time solution using this feature.

Sub-meter GNSS accuracy with standard precision solutions? – a look at the Allystar 1201

Being able to get centimeter-level errors with RTK or PPK solutions is a very powerful technique, but it does involve a certain amount of complexity and a lack of robustness in more challenging conditions. In many cases, users don’t need quite this much accuracy and are looking for a simpler, more robust solution. In a recent post, I demonstrated that a u-blox F9P receiver can achieve sub-meter accuracy with its standard precision solution for static measurements, and even better accuracy for dynamic measurements. This is a good choice for users who are primarily interested in an RTK/PPK solution but would like the standard precision solution as a backup. It is a relatively expensive option, however, for those looking for just a standard precision solution. The Allystar Tau1201 dual-frequency GNSS module recently caught my eye as a possible answer for a lower cost option. It advertises a sub-meter CEP for its standard precision solution and is available for only $35 on an EVK board, or $13.50 for just the bare module in quantities of 10.

It does not have an internal RTK engine or provide access to the raw observations like the u-blox F9P does. This means that it can not be used for anything other than standard precision solutions, but if that is not a requirement, then this could be quite an attractive alternative to the F9P.

Here are the GNSS signals received and the accuracy spec for the Tau1201

Specs from Allystar 1201 datasheet

For comparison, here are the equivalent specs for the u-blox F9P.

Specs from u-blox F9P datasheet

As you can see, the Tau1201 claims <1 meter CEP while the F9P claims 1.0 meter CEP. CEP stands for Circular Error Probable and is defined as the radius of a circle, centered on the true position, that contains 50% of the actual GNSS measurements. In other words, half of the measurements should have an error less than or equal to the CEP. Real-world performance will vary depending on atmospheric conditions, GNSS antenna, multipath conditions, and satellite visibility and geometry. Neither spec lists specific test conditions, but generally such specs assume good visibility, low multipath, nominal atmospheric conditions, and a reasonably high quality antenna.
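Given a set of position errors relative to a known truth, CEP can be estimated empirically as the median horizontal miss distance. A minimal sketch (hypothetical sample values):

```python
import math
import statistics

def cep(errors_en):
    """Estimate CEP: the radius containing 50% of horizontal errors.

    errors_en: list of (east, north) errors in meters relative to truth.
    """
    radii = [math.hypot(e, n) for e, n in errors_en]
    return statistics.median(radii)

# Four samples with radii 0.3, 0.4, 0.5, 1.0 m -> median is 0.45 m
print(cep([(0.3, 0.0), (0.0, 0.4), (0.3, 0.4), (0.6, 0.8)]))  # 0.45
```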

I was curious to see how these two receivers compared in a real-world test so I ordered a Tau1201 EVK board from Digikey and collected some data. I ran for 24 hours with both receivers connected through an antenna splitter to a Harxon GPS500 survey grade antenna on the roof of my house. I configured both receivers to use all available constellations and output the internal solution with NMEA GGA messages.

Here are the resulting ground track plots for both receivers, u-blox F9P on the left, and the Allystar Tau1201 on the right. They are plotted with RTKPLOT and have statistics enabled from the options menu. I also set the coordinate origin in the options menu to the precise location of the antenna in WGS84 coordinates.

Ground tracks for 24 hours for the F9P and Tau1201

Here are the positions plotted versus time.

Position vs time for 24 hours for the F9P and Tau1201

The statistics may be difficult to read in the above plots, so I’ve copied them below.

u-blox F9P plot statistics
Allystar Tau1201 plot statistics

The standard deviations (STD) include only the measurement noise. The root-mean-square values (RMS) include both measurement noise and bias, so they are the more relevant statistic in this case. We can combine the East and North values using the square root of the sum of the squares to get the horizontal RMS (sometimes called drms). This gives us 0.38 meters for the F9P and 0.96 meters for the Tau1201. It is not possible to calculate the exact CEP metrics directly from the RMS values, but we can estimate them by assuming Gaussian and circular error distributions. Neither of these assumptions is entirely true, but the estimate should still be reasonably accurate. Using a conversion factor of 1.19 derived in this article from GPS World, we get estimated CEP values of 0.32 meters for the F9P and 0.81 meters for the Tau1201.
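Spelled out as code, the arithmetic looks like this. The per-axis RMS inputs below are hypothetical values chosen so that the horizontal RMS matches the 0.38 m and 0.96 m figures from the plot statistics, and the 1.19 factor is the drms-to-CEP approximation from the GPS World article, valid only for roughly circular Gaussian errors:

```python
import math

def cep_from_rms(rms_east, rms_north, drms_to_cep=1.19):
    """Estimate CEP from per-axis RMS errors.

    drms = sqrt(rms_E^2 + rms_N^2); CEP ~= drms / 1.19, assuming a
    roughly circular Gaussian error distribution.
    """
    drms = math.hypot(rms_east, rms_north)
    return drms, drms / drms_to_cep

# Illustrative per-axis values back-solved from the drms figures above
print(cep_from_rms(0.27, 0.27))    # F9P:     drms ~0.38 m, CEP ~0.32 m
print(cep_from_rms(0.72, 0.635))   # Tau1201: drms ~0.96 m, CEP ~0.81 m
```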

In this test, both receivers achieved their advertised spec for CEP. Obviously, though, the more expensive F9P receiver outperformed the lower cost Tau1201. Exact results will vary based on some of the factors mentioned earlier (visibility, multipath, atmospheric conditions, etc.), but I would expect the relative differences between the two receivers to be fairly consistent.

These results are for static measurements. The CEP values for dynamic measurements would be noticeably smaller due to the averaging out of multipath errors, as I demonstrated in the post referenced above.

Another interesting difference between the two measurements is in the position errors averaged over 24 hours. The F9P average errors were less than 10 cm in both horizontal axes, while the Tau1201 error was small in the east direction but relatively large in the north direction, at 63 cm. It’s hard to say from one measurement how consistent this difference would be over multiple measurements, but if it did hold up, then averaging measurements over time to reduce the errors would be much more effective with the F9P than with the Tau1201.

In order to focus on the performance of the receiver, I used a relatively expensive antenna for this experiment, at least compared to the cost of the Tau1201. Although I did not repeat the test with any lower cost antennas, I would not expect significantly different results with any antenna intended for dual-frequency RTK solutions such as those from Ardusimple or DataGNSS, provided they are used with the appropriate ground plane. Even lower cost antennas, as long as they are intended for dual-frequency use, might work as well, but would require more testing and would probably have a larger effect on the results. If anybody has run a similar experiment with very low cost antennas and would like to share their results, please leave a comment below.

So, to summarize, the Tau1201 receiver is not able to match the performance of the F9P, but it is significantly less expensive, does achieve sub-meter accuracy, and combined with a low cost antenna, could be a good choice for some applications.