A few simple debugging tips for RTKLIB

Unfortunately, error reporting is not one of RTKLIB’s strengths.  Sometimes it will halt with an error message that is less than helpful, other times it will just continue through the entire solution with Q=0 or Q=2 without ever getting a fix because of a minor error in the input file configuration or parameters. I suspect anyone who has used RTKLIB with any regularity has run into cases like this innumerable times. It still happens to me all the time, almost always because I have made a silly mistake somewhere. For a new RTKLIB user, it can be quite difficult to figure out why they are not getting a valid solution in cases like this.

In this post, I will demonstrate some examples of common errors and some techniques to help identify them. First let’s take a quick look at a couple of real-time solution examples with RTKNAVI, then I will jump to post-processing cases for the rest of the examples.

In my experience, the most common error when using RTKNAVI is to improperly specify the base location. To demonstrate this, I go to the “Positions” tab on the “Options” menu in RTKNAVI and switch the Base Station position type from “Average of Single Position” to “RTCM Position”. In this example, my base stream is u-blox binary and contains no RTCM messages so RTKLIB will not get any valid base location information and can not calculate a solution. However, rather than halting and letting the user know there is a problem, RTKNAVI just runs through the whole solution without ever reporting either an error or a valid position. To help understand what happened in a case like this, the first step is to click on the tiny unlabeled square above the “Start/Stop” button. This will bring up the RTK Monitor window. Choose “Error/Warning” from the Monitor options as shown in the image below to bring up additional error and warning information.

In this case, every sample reports an “initial base station position error”. This should be enough information to track down the problem fairly quickly. Often this will be the case. If the additional information in this window doesn’t help, then the next step is to enable the debug trace and look at the trace file. This is covered in more detail in the post-processing examples further below.

In my experience, the second most common reason to get no solution with RTKNAVI is that navigation messages haven’t been enabled for either base or rover receiver. If this is the case, then all the SNR bars will be gray. When navigation data is available, the bars will be colored as in the image above. Lack of SNR bar color is the easiest way to detect this problem but another symptom is that the “Error/Warning” window will report “No common satellites” for every sample even though satellites are appearing in the SNR window.

A more general troubleshooting technique that will work for either real-time or post-processing and for GUI or command line RTKLIB applications is the debug trace file. I will demonstrate a couple of examples of using this with RTKPOST but the same basic technique can also be used with RTKNAVI, RNX2RTKOP, RTKCONV, CONVBIN, or STRSVR.

To enable the debug trace, set “Output Debug Trace” in the “Options” menu to something other than “OFF”. For simple errors, level 2 is enough and will produce a less overwhelming amount of information. For more detailed information about the GNSS solution, especially the ambiguity resolution, level 3 is better. For more details about the communication streams, level 4 can be useful. Level 5 includes many of the solution matrices and is too detailed for most general debug. Level 1 only contains the error messages available in the error window and is not useful. The image below from the RTKPOST Options menu shows the trace level set to 3. I usually set it to level 3 since I am often looking at the details of the ambiguity resolution, but for simple debug I would recommend starting with level 2 and that is what I will use in the examples below.

Next, I will intentionally make some silly mistakes and show how they appear in the error window and the trace file.

Let’s start with an example where RTKPOST actually does a good job of reporting the error to demonstrate how it should work, and occasionally does. To keep the example simple I will use files in the root directory, with names: rover.obs, base.obs, and rover.nav.

In this first case, I have made a typo in the name of the .nav file. Now, when I click on “Execute” to start the solution, it almost immediately halts and reports “Error: no nav data” in the error window near the bottom of the window. If only RTKLIB would always be as straightforward as this!

Next, I intentionally make a typo in the rover observation file and click on “Execute”. Again, the solution halts almost immediately but in this case the error message in the error window is much less useful. Not only does it not tell me which file has a problem, but it also suggests there is something specific wrong with the input data, not that the file is completely missing. Fortunately, clicking on another tiny unlabeled square (circled in red below) brings up the trace file, assuming trace debug was enabled as described above. This tells us exactly what went wrong, that the input rover observation file could not be opened.

For the next example, I will correct the typo in the file name but then set the location in the base rinex file header to all zeros. This is another common problem and usually happens because the rinex file was generated by RTKCONV or other application from a binary receiver output file that did not contain navigation messages. Without navigation information, RTKCONV can not estimate the receiver location and sets the approximate position field in the rinex header to all zeros. Again, the solution halts almost immediately, and the error message is identical to the previous case. However, looking at the trace file, there is now no mention of a file open error. To understand the difference between these two trace files, it’s important to know that the entries in the trace file beginning with 2 are warnings, and those beginning with 1 are errors. RTKLIB treats the missing file as a warning since there may be other observation files that are present, and doesn’t get an actual error until it looks for the base position information. With just the error, it is impossible to tell these two case apart, but by looking at the warnings as well, it is fairly easy to differentiate them.

Another fairly common error is to use the wrong base observation file. If the base observations do not overlap the rover observations in time, then the solution will run all the way through the data, without error, but the solution status will always be Q=0. Here’s the trace output for this example. I aborted the run before it completed, after it was obvious that something is wrong which is why the error windows shows “aborted”. The “age of differential” is the time between rover observation and closest base observation so again in this case, looking at the trace file makes the problem much more obvious

If the wrong navigation file is used, the solution will again run to the end with no error, and the status will remain Q=0, just as above, but this time the trace file will report “no common satellites”

Another mistake I make fairly often, is when I am specifying the precise base location rather than using the approximate location in the rinex header, and I forget to change it when moving to a different data set with a different base location. In this case, the solution will run without error, but the fix status will remain in float (Q=2) for the entire solution. There are several reasons why you might never get a fix, including just poor quality data, but in this case when I look at the trace file I see extremely large outliers right from the first sample. If all the satellites are reporting outliers and they are as large as this, that almost always means that the base location has been specified incorrectly.

These are only a few examples but probably cover over 80% of cases I look at where people are unable to get a solution. Most of the rest are caused by low quality data.

If you have other examples of common failures that I haven’t covered here, please describe them in a comment, along with the signature seen in the trace file.