RTKLIB: Demo 4 vs 2.4.3 B16 code

Over the last six months or so I have made a number of changes to the RTKLIB code. A couple of these (ArlockCnt bug and M8N half-cycle bit) have made it back into the officially released code on Github but most of them have not. While I was in the mode of comparing solutions, I thought it would be worthwhile comparing solutions from the latest demo4 code from my Github repository to the latest released code (2.4.3 B16) to see how much they differ. The demo4 code is based on the 2.4.3 B12 released code but I do not believe the differences between releases B12 and B16 should affect this comparison.

I used the same data sets and input configuration files I used for the M8T vs M8N comparisons I described in the last few posts. The only differences being that I removed the options and input parameters that do not exist in the released version from the configuration files for the released code runs.

Here are the solutions from the previous post for both the pair of M8N receivers and the pair of M8T receivers using the latest demo4 code.

     M8N                                                                             Reach M8Trelease1

Here are the solutions for the same data sets using the 2.4.3 B16 code, both RTKCONV and RNX2RTKP.

     M8N                                                                             Reach M8Trelease2

A quick look at the vertical scales as well as the shape of the plots shows that the M8N solution is completely invalid whereas the M8T solution looks reasonably close although with a noticeably lower fix ratio. The difference between the M8N and the M8T is much greater with the release code than it was with the demo4 code. This is consistent with observations from other users that the M8T is significantly better than the M8N, something I did not really see in the demo4 solutions.

Let’s also look at the differences between solutions since they can indicate errors. Here is the difference between the demo4 M8N and M8T solutions from the previous post. Remember that this difference should be a circle with a radius equal to the distance between the rover antennas since the data is from two different receivers mounted on the same rover. I have only plotted data for the fixed points since those are the ones we have highest confidence in.

release3

There is no point in looking at the release M8N solution since it is so poor, but we can compare the M8T solutions from the two code versions. In this case, because we are using two solutions from the same data set, the difference should be zero. Here is a plot of the difference between those two solutions for the M8T data set, again only for the fixed solution points.

release4

Here is the same data, plotted by axis.

release5

The differences are quite large, up to nearly a meter in x and y, and more in the z axis. Based on the previous comparison between the M8N and the M8T with the demo4 code, I have fairly high confidence in the demo4 solution so I suspect the errors are in the release solution. We can verify this though by looking for discontinuities in the solutions at the erroneous points.

Let’s pick the point at 01:22:47 which shows a large difference between solutions in all 3 axes. Here are the two solutions zoomed into this point in time.

Reach M8T: demo4                                Reach M8T: 2.4.3 B16release6

Notice the large jump in the released code solution. Since this data is from a car driving on roads, the large discontinuities in the y and z axes are not real and confirm the error is in the release code and not the demo4 code.

So, several people now have asked if these changes will make it into the official code. As I’ve mentioned before in one of my comments, I’ve chosen not to submit all of my code changes to the official code repository. This is because my changes are focused on one small corner of the world of GPS (low cost) and tested on an even smaller corner (low-cost Ublox) and I am not sure how well they all will translate into the larger realm that RTKLIB supports. At minimum, it seems they would require a fair bit of testing on multiple hardware platforms which I don’t have the time or resources to do. I’m also not sure some of them would all be considered sufficiently mathematically rigorous for general widespread use. But I will continue to make them available in my Github repository for anyone that would like to incorporate them into their code.

I have just submitted my most recent change, the improvement in decoding cycle-slips from the M8T raw data, as an issue to the official Github repository.

Ublox M8N vs M8T: Part 3

 

This is the third part in a series of posts comparing data from a pair of Ublox M8T receivers with data from a pair of Ublox M8N receivers. In the first post I identified a problem with erroneous fixes in the RTKLIB solution for the M8T data. In the second post I found that the source of the problem appeared to be in the handling of cycle slips in the translation from the receiver binary output to the text input used by RTKLIB. I also showed that adopting an algorithm more similar to the one used to translate the M8N binary improved but did not entirely fix the problem. In this post I will show that if the cycle slip translation is more precisely adapted to the differences in binary output between the M8N and the M8T then we can virtually eliminate the erroneous fixes and get an M8T solution that is now slightly more accurate than the M8N solution.

One thing I may not have made clear in the previous posts is the way I am collecting and processing the data from the Emlid Reach M8T receivers. I am using their ReachView software to collect the raw GPS data but all the processing is done afterwards using my own demo4 version of RTKLIB. I don’t have access to Emlid’s RTKLIB source code and don’t know if they have made any changes that might have improved this issue in their own real-time version of RTKLIB. So this is more a comparison between the M8N and the M8T receivers using my version of RTKLIB and not an evaluation of anything specific to the RTKLIB version of Reach.

Last post I finished with this plot showing the difference between the M8N solution and the M8T. For reasons I’ve explained before, we know this should be a perfect circle with a radius equal to the distance between the receivers.

walker12

By looking at spikes in the z-axis accelerations we were able to show that the deviations from the circle in the above plot were due to errors in the M8T solution and not the M8N solution. This plot was taken after I had modified the binary to text translation in RTKCONV to make the M8T translation more similar to the M8N translation. Before that, the errors were much larger.

Let’s start by taking a closer look at the binary outputs of the two receivers. The details for the M8T are available in the M8 receiver spec and for the M8N in this document from Tomoji Takasu but I will summarize them here.

Both receivers provide three values for each satellite output sample that can be used to evaluate the quality of the carrier phase measurement. The first is the carrier-phase valid bit. Both receivers provide this in the tracking status byte. It indicates that the receiver is currently locked to the carrier-phase for this satellite. Note that it does not tell us if the receiver has been continuously locked since the previous output sample and it also does not indicate anything about the quality of the measurement.

The second output (locktime or lock2) indicates how long the receiver has been locked to the carrier-phase for this satellite. If this count is lower than the count for the previous sample, then an unlock, i.e. cycle-slip, has occurred. Both receivers also provide this value, and this is what RTKCONV uses to detect a cycle-slip.

The third value (mesQI or cpStdev) is a quality indicator and this is where the receivers differ. On the M8N this appears to be more of a state indicator than a true quality indicator. The comments indicate that it’s value corresponds to:

0:idle
1:search
2:acquired
4-7:lock

For the M8N, the RTKCONV code throws out any phase measurement for which this quality indicator is not between 4 and 7.

On the M8T, the quality indicator is actually a numeric estimate of the standard deviation of the carrier phase measurement. If the value is between 1 and 7, then the estimate of standard deviation is this value multiplied by 0.004 meters. If this value is 15, then there is no valid estimate. This value is currently ignored by the RTKCONV code and that is the root of the problem.

Before discussing how to modify the code to incorporate this information, we need to understand exactly how the cycle-slip bit is interpreted by RTKLIB. On a sample for which this bit is set, RTKLIB will reset the phase-bias state for that satellite based on that carrier-phase measurement. Note that the reset is done using the measurement from the current sample, not the following sample. What this means is that the cycle-slip bit should not be used to indicate that there is a cycle-slip on the current sample. Rather it should indicate that there has been a cycle-slip since the last good measurement and that the quality of the current sample is now high enough to use it to reset the phase-bias state.

So how high should the quality indicator be before resetting the phase-bias? It will be a trade-off between time and accuracy. I chose the value of four since it’s in the middle of the scale and somewhat equivalent to what is used on the M8N, but it could be argued that this value should be higher or lower. Forcing the cycle-slip to be set on a good sample also eliminated the problem we previously saw with samples subsequent to the cycle-slip not being flagged, so I was able to remove the code I added in the previous post.

Here are the changes I made to the original code for the M8T receiver. The new variable cpstd is the standard deviation of the carrier phase measurement as described above. The constant STD_SLIP is the quality threshold described above and is set to 4.

walker13

Here is the position solution calculated with the above code change.

walker14

Here is the difference in position between the two solutions. As you can see, the deviations from the circle have been significantly reduced.

walker15

The noise in the z-axis accelerations has also been significantly reduced and is now slightly lower than in the M8N data.

       M8N                                                                           Reach M8Twalker16

I have posted the raw data and config files to my data set library in the niwot2_car folder and the code changes to my Github page.

Ublox M8N vs M8T: Part 2

In my last post, I compared data from a pair of Ublox M8N receivers with data from a pair of Emlid Reach M8T based receivers, and found issues in both data sets. In this post I will look into those issues more closely.

To do this, I collected another data set with a few differences from the previous one.

  • On the M8N rover receiver, I replaced the original antenna that had only a 1” cable with a Ublox ANN-MS-0-005 antenna with a much longer cable. This allowed me to move the receiver from the car roof to inside the car, a friendlier thermal environment. The goal was to see if this would avoid the all-satellite cycle-slips I saw last time at the beginning of the data set. This antenna cost me $20 from CSG but is available from other places for less.
  • On both Reach M8T receivers, I modified the dynamic platform model setting from “Airborne < 4g” to lower acceleration settings; “Pedestrian” for the rover, and “Stationary” for the base. I also changed the setting on the M8N base receiver from “Pedestrian” to “Stationary” to make them consistent. The goal here was to avoid the erroneous fixed solution points I saw in the solution from the previous M8T data. To insure the modifications occurred correctly, I used the u-center software to read back the settings from the receivers after the data was taken.
  • I chose a measurement route with much more tree cover than the previous one to increase the difficulty of the solution and to help differentiate the two receiver sets.

Here is a Google map of the measurement route. In this area there are few native trees, so by driving through residential areas, rather than next to them as I did in the previous data set, I encountered many more trees. There were numerous locations with tall trees on both sides of the road as well as places where I drove directly under large tree branches.

walker1

Here’s an example of some of the more challenging tree cover encountered in the route. The route locations are only approximate since the base locations are not calibrated, that is why the lines are not actually on the road.

walker2

Here is the observation data from both rovers. The red ticks are cycle-slips, the gray ticks are half-cycle invalids.

                             M8N                                                                          Reach  M8Twalker3

The first thing to notice is that there are no simultaneous cycle-slips in the M8N data. Of course, this doesn’t prove we will never get them but it suggests that maybe moving the receiver inside the car did help. Unfortunately, there is one simultaneous cycle-slip in the M8T data very close to the start at 00:54. Apparently, both the M8N and the M8T are susceptible to these slips, which is not surprising, since the chips are part of the same family. We do see only a single slip, which is an improvement over the previous data. Since they always occur near the beginning of the data collection, I still believe they are most likely caused by thermal transients. I suspect that the best way to avoid them would be to turn on the receivers 15 minutes before starting to collect data in an environment as similar as possible to the data collection environment.

Next, let’s look at the SNR vs. elevation plots. Last time we saw noticeably better numbers with the Reach setup because of the more expensive antenna. This time, with the Ublox antenna on the M8N receiver, the two are much more similar. SNR isn’t everything, and there still are reasons why the more expensive Reach antennas are likely better than the Ublox antennas, but we’ve at least closed the gap some between them.

                            M8N                                                                          Reach  M8Twalker4

Here are the position solutions from both receiver sets.

                            M8N                                                                          Reach  M8Twalker5

The M8N solution is very good, with nearly 100% fix after the initial acquire until the very end when I parked the car under some trees in the driveway. This is despite a very challenging data set with many cycle-slips.

Unfortunately, the M8T solution is not as good. The initial acquire is delayed because of the simultaneous cycle-slip I mentioned earlier. It does eventually acquire, but then loses fix for quite a long period near the end of the data (the yellow part of the line in the plot above)

Even more concerning, when we look at the acceleration plots, we see the same spikes in the M8T data as we did in the previous data set.

                       M8N                                                                          Reach  M8Twalker6

Again, these spikes align with erroneous fixed solution points in the M8T data as can be seen in deviations from the circle in the difference between the two solutions.

walker7

Clearly, changing the dynamic platform model setting in the receiver did not fix the problem. A couple of experienced users have commented that this setting does affect the front end of the receiver on earlier Ublox models and it’s still possible it affects the M8T as well, but it does not appear to be the cause of this problem. We will need to look elsewhere for the solution.

We are running identical RTKLIB solution code on the two data sets and we have verified that the receiver setup is nearly identical for both data sets. So what else can be different between them? One possibility is the conversion utility, RTKCONV, that translates the raw binary output from the receiver into the RINEX observation files that are used as input to the solution. Since the raw measurements are output by different commands on the two receivers, there are two different functions in the RTKCONV code that process them.

Let’s look first at the RTKCONV code to convert the data from the UBX_TRK_MEAS command used by the M8N receiver. I won’t show the code here but just describe functionally what happens in the code. Each data sample from the receiver contains a status bit indicating carrier-phase lock and a count of consecutive phase locks. RTKCONV sets the cycle-slip flag for that sample if the carrier-phase is valid and an internal code flag is set. If the carrier-phase is not valid, then the cycle-slip flag is left in its previous state. The internal code flag is set if the phase lock count is zero or less than the phase lock count for the previous sample and is only reset for the next sample if the carrier-phase is valid.

For the UBX_RXM_RAWX command used by the M8T receiver, there is also a status bit indicating carrier-phase lock. Instead of a count, there is a time for consecutive phase locks but functionally it is equivalent. The RTKCONV code for this command does not use an internal code flag and the cycle-slip flag is set or cleared directly every sample that the carrier-phase is valid. The cycle-slip flag is set if the phase lock time is zero or less than the previous sample and cleared otherwise.

I know that’s a bit confusing and the difference is fairly subtle. The most important thing to understand is that RTKCONV is doing more than just translating from binary to text, it is deciding which samples to set cycle-slips for based on a somewhat complicated algorithm, and that algorithm is different for M8N and M8T.

Basically, the effect of this difference is that for the M8N, if a cycle-slip is followed by an invalid phase then the next valid phase will always be flagged as a cycle-slip while for an M8T it won’t necessarily be so. It’s easier to understand by looking at a picture. The observation plots below show the location of one of the acceleration spikes in the M8T solution which is caused by a cycle-slip on satellite G05. The plot on the left shows which samples were flagged as cycle-slips with the existing code, the plot on the right shows the cycle-slips after modifying the code to be functionally equivalent to the M8N code. Note the extra two cycle-slips with the change.  The extra cycle-slips in this case are a good thing because they are flagging samples with large phase errors and preventing RTKLIB from incorporating them into the solution.

              Reach M8T before change                                   Reach M8T after changewalker8

Modifying the RTKCONV code for the UBX_RXM_RAWX command in this way and re-running the solution gives us the the position plot on the right and the acceleration on the left.

walker9

Much better than before! Not quite as good as the M8N but still a significant improvement. The difference between the M8N solution and the improved M8T solution is shown below.

walker10

Remember this should be a circle if both solutions are free of errors.  Actually in this case not quite a circle because there is also a separation between the two base stations ,but that effect is small and we can ignore it for now.  This is much better than the equivalent plot from the original M8T solution shown earlier. There are obviously some erroneous fixes even after the improvement, so this is still a work in progress, but I think it is a big step in the right direction.

This data set was quite a bit more challenging than the previous one and in reality both solutions are quite good given the number of cycle-slips we saw, but there is always room for improvement.

The other thing to note with this data set is that the better quality antenna made a big difference on the quality of the M8N solution.  I suspect I will not be going back to the old antenna!  I knew the nicer antenna would help but I have resisted using it till now because of the extra cost.  There are similar looking antennas with similar gain specs available for as little as $4 so I may try some of these to see how they compare.

I’m not going to post the code to my Github repository or my binaries until I’ve had a little more time to understand the remaining errors but if anyone wants to take a closer look now, here are the code modifications to ublox.c that I made in the decode_rxmrawx function.

walker11

Ublox M8N vs M8T

Thanks to the generosity of Emlid, I am now the proud owner of two of their Reach precision GPS units. At a list price of $570 for a pair, these fall more into the category of low-cost rather than the ultra-low cost receivers I have been working with, but they do allow me to do some comparisons between the two. Their units use the more capable (and more expensive) Ublox M8T receivers rather than my Ublox M8N receivers and also have higher quality (and more expensive) Tallysman 4721 antennas, rather than the cheap antennas that were included with my receivers.

To compare the two I collected a fairly challenging data set, with maximum distances from the base station of over two kilometers and maximum velocities of over 70 km/hr, with relatively unobstructed rovers, but partially obstructed base stations. I mounted one of my receivers and a Reach unit on top of my car and also one of each at the base location. Here is a Google earth plot of the ground track. The Google Earth plots are a really nice feature of RTKPLOT I have not used until now but have quickly become a fan of. I could not make this feature work with the 2.4.3 version of RTKPLOT so had to go back to the 2.4.2 version. I also found I had to specify the solution format (out-solformat) as “xyz” instead of the “enu” that I usually use to get the solution in a format Google can use. The track was over a mix of dirt and paved roads running through agricultural fields and next to residential areas.

union1

Here is a plot of the base station location, marked in red below. It is somewhat obstructed by the sheds and boats around it. I selected this location in part because it was very windy that day and this spot was fairly sheltered from the wind, but also thought it would make the solution a bit more challenging and therefore help differentiate the two receiver sets.

union2

Most of the rover’s route was relatively unobstructed, but there were a few scattered trees near the road in some locations. The plot below shows an example of one of these spots. Note also, that the ground track goes off the road a bit at the end of the loop. I suspect this is due to inaccuracies in the base station location (and maybe the Google maps) rather than the RTKLIB solution since I did not make any attempt to calibrate the base station against any known reference.

union3

Here’s the base station observation data, the M8N is on the left, the Reach M8T data on the right. From a cycle slip perspective they are fairly similar but we see a few of the satellites (G15, R16, and R18) are noticeably better with the Reach. This is most likely because of the higher quality antenna.

                                  M8N                                                              Reach M8Tunion4

Looking at the SNR vs elevation for both receivers, we see the Reach has noticeably higher signal strength especially in the lower elevation (10-20 degrees) region where it is most important. Again, this is to be expected with the higher quality antenna.  For some reason, the M8T SNR numbers are rounded off to the nearest integer. I don’t know why that is, but that is what makes the plots look like they are formatted differently.

                                 M8N                                                                       Reach M8Tunion5

Looking next at the rover data, here are the observations for the two receivers.

                           M8N                                                                              Reach M8Tunion6a

The rovers were stationary until 00:34 and then moving till 00:57, then stationary again. While they were moving and at the end when they were stationary, the two look fairly similar from a cycle-slip perspective with maybe a slight advantage for the Reach. However, during the initial stationary period, there were several slips that occurred simultaneously on every satellite in the M8N data. I’ve circled one of them in blue above.  I believe these must be caused by some sort of discontinuity in the receiver and have nothing to do with the satellite signals themselves. My best guess is that they were caused by temperature fluctuations in the chip. It was a very hot day, with an intense sun heating the dark car roof, combined with a strong wind that would create a cooling effect. Because the antenna that was connected to the M8N receiver has only a one inch lead, it was mounted on top of the car in the sun, while the M8T receiver, with a longer antenna lead, was mounted inside the car and not subject to the same temperature fluctuations. Also, notice that the simultaneous slips all occurred near the beginning of the data set, possibly while the receiver was still reaching some sort of thermal equilibrium. To try and avoid this in the future, I will either switch to another inexpensive antenna I have with a longer lead, or let the receiver sit longer before starting to collect data.

Unfortunately these slips occurred during the initial stationary period I use for the first acquire and prevented that acquire from occurring. Rather than give up on the data, though, I decided to try running a solution with the M8N rover and the M8T base. I don’t know if it was because of the better signal strength, or because I just got lucky, but I was able to get an initial acquire with that combination. So for this exercise, the rest of the data is all based on a comparison between the M8N rover and the Reach M8T rover, both referenced to the M8T base station. It turns out that having a single base station also makes the accuracy analysis a little easier as I will describe later. This is not exactly the comparison I wanted to make, but one I think is still worth doing.

So how did they do? Here’s the solutions using my demo4 code. The input configurations were identical with one exception. Since with the M8T receivers, RTKLIB can resolve the GLONASS integer ambiguities without assistance, while with the M8N receivers, it can not, I set the input parameter “gloarmode” to “on” for the M8Ts and to “fix-and-hold” for the M8Ns. This enables my extension to the fix-and-hold feature to adjust for the additional errors on the GLONASS and SBAS satellites.

                             M8N                                                                        Reach M8Tunion7

The Reach solution (on the right) looks very clean with a very fast acquire and then 100% fixed values after that. The M8N solution (on the left) also acquired quickly but then ran into the simultaneous cycle-slips, causing problems until it re-acquired at 00:40. After that it stayed fixed for nearly 100% of the time with a short dropout around 00:49.

The next questions, of course, are: Do the solutions match? And are the fixes all accurate? To check this, I will use a similar technique I did earlier when I had only two receivers, both mounted on the rover. For that case, I solved for the distance between the two receivers which forms a circle equal to the distance between the receivers. In this case, I will solve for the distance between each rover and the base, then difference the two. This should also give us a circle with radius equal to the distance between the two rover receivers. Having only one base simplifies things by avoiding addition of a second term caused by the separation between the two base receivers. Here’s the result of that operation, plotted only for the fixed solution points since those are the ones we are counting on to be accurate. On the left is the position difference (x,y,z) and on the right is the ground track difference.

                           M8N                                                                            Reach M8Tunion8

The separation between the two receivers on the car roof was 15 cm and we see here that most of the points fall on the circle of that radius. However, there are several deviations, up to half a meter in error. They are also easy to see as spikes in the z-axis on the left plot. These are unexpected and quite concerning since we really rely on the fix status to let us know which points are valid and accurate.

Zooming in on the observation data for the largest of these spikes at around 00:51 shows that both rover receivers saw cycle slips on satellites G02 and G13 at this time, presumably from passing a tree or other obstruction.

                                   M8N                                                                Reach M8Tunion9

Zooming in on the position data for this point, shows discontinuities in the Reach data but not the M8N data which is surprising, I had expected the opposite. As the driver of the car, I am certain that I did not drive over any meter high cliffs during the test, so the Reach M8T data must be wrong.

                                 M8N                                                                          Reach M8Tunion10

Looking at other error samples shows the same thing. It is more easily seen as spikes in the accelerations as shown here, especially the z-axis accelerations which only occur on the M8T data and not the M8N data

                               M8N                                                                       Reach M8Tunion11

So what could be causing these errors? The two solutions used identical code and input parameters except for the fix-and-hold for GLONASS integer ambiguities setting. I reran the M8T solution with this parameter enabled to make them completely identical but this did not fix the problem. With the code being identical we have to suspect the difference is in the receivers themselves or at least in their setup. The next step I took was to use the Ublox u-center evaluation software to examine all the registers for both receivers. It’s a little trickier to do this with the Reach receiver since we don’t have direct access to the M8T chip, but there are instructions on setting up a tcp port in the Reach documentation, which is what I did.

For the most part, the differences between the two receiver setups were slight but I did find one significant difference. The receiver dynamic platform model settings are quite different. I have my M8N receivers set up for a “Pedestrian” model, while the Reach M8T receivers are set up for the “Airborne <4g” setting. According to the Ublox M8 Receiver Description document, that setting is only recommended for extremely dynamic environments. Here’s the details from the manual.

union12

I have heard differing opinions on whether this setting affects the front-end of the receiver or whether it only affects the back-end position calculations. If it only affected the back-end, then this setting would not matter since we use only the raw GPS measurements from the receiver. However, if it does affect the front-end, we might expect it to have an effect like this since it would most likely be opening up the bandwidths of the phase lock loops tracking the carrier-phases and thus minimizing any filtering effects.

So that’s where things stand at the moment with this data set. I see issues with both receiver sets and have speculated on what may be causing the problems but have not had time to verify that either one of my guesses is correct. I had hoped to do that before posting this article but it’s been almost two weeks since my last post, so I’ve decided to just go ahead and post what I have.

This also gives me the chance to ask if anybody else has seen similar issues and/or might have other possible explanations for what is going on?

I have also uploaded this data to my data set library.

RTKLIB: Static-start feature

The static-start feature is something I added to the demo4 code a while ago but never got around to explaining, so I will do it now.

As I’ve mentioned before, I always like to initially leave the rover stationary after first turning on the receiver long enough to get a first fix-and-hold before allowing it to move. This significantly reduces the chances of getting an initial false fix. However, we could benefit even more from doing this if we could tell RTKLIB that the rover was stationary during this time. This is because we would remove the uncertainty of rover motion from the solution and the kalman filter would converge faster. By doing this we should be able to reduce the time to first fix.

RTKLIB has a solution mode for a moving rover (kinematic) and one for a stationary rover (static) but currently no way to switch between the two. What the new static-start mode I have added does is to start the solution in static mode and then switch to kinematic mode after the solution first qualifies for a fix-and-hold. It does not require fix-and-hold to be enabled, it will still make the switch to kinematic mode when that qualification occurs, even when fix-and-hold is not enabled.

Below is the only functional code modification and it is in the relpos() function in rtkpos.c. The new line of code is in red. There are also a handful of bookkeeping changes to handle the new option in the input configuration file that I won’t include here but can be seen in the demo4 code.

/* hold integer ambiguity if meet minfix count */
if (++rtk->nfix>=rtk->opt.minfix) {
    if (rtk->opt.modear==ARMODE_FIXHOLD)
        holdamb(rtk,xa);
    /* switch to kinematic after qualify for hold if in static-start mode */
    if (rtk->opt.mode==PMODE_STATIC_START)
        rtk->opt.mode=PMODE_KINEMA;
}

This feature is enabled by changing “pos1-posmode” in the input configuration file from “kinematic” to “static-start”.

So, let’s do a little testing to see how it does. I used my latest demo4 data set and ran three solutions. The initial acquire for all three are plotted below.

static_start1

The first two are both in kinematic mode. For the first one I have used the default value of 100 for “stats-eratio1”, the ratio of variances between pseudorange and carrier-phase measurements. In the second plot, I have used my preferred value of 300. This is unrelated to the static-start modification but is relevant to the acquire time so I have brought it up again. As you can see changing this value reduced the time to first fix from roughly eight minutes to a little less than five.

In the third plot, I changed “pos1-posmode” from “kinematic” to “static-start” to enable the static start feature and again used 300 for “stats-eratio1”. This reduced the time to first fix from a little less than five minutes to about two and a half minutes, nearly a factor of two. The actual times to first fix will vary fairly significantly depending on signal quality, number of satellites and quality of the sky view but the trends shown here should hold even if the times change.

To understand this change a little better it would be good to understand what is the difference between “static” mode and “kinematic” mode in RTKLIB, so let’s discuss that next.

In my earlier post describing the RTKLIB receiver dynamics feature, I explained how the kalman filter makes two estimates each sample, where the first estimate is a prediction of how far the rover has moved since the last sample. If receiver dynamics is enabled, this estimate is made using the estimated velocity and acceleration of the rover, if not, RTKLIB simply using the pseudorange measurement for the current sample along with its large uncertainty.

In either case, the uncertainty (or variance) of the position states are increased after this update because of the unknowns of what happened during that time interval. The second kalman filter estimate (the measurement update) will then reduce the variances again using the additional information from the carrier-phase measurements. So for each sample, the uncertainties of the position states will be increased by the prediction estimate, and then decreased by a larger amount with the measurement update, zigzagging towards zero The smaller the variances, assuming they are accurate, the more likely it is that the integer ambiguity resolution will resolve a fix.

If however, we know that the rover did not move during that time, there is no need for the kalman filter to increase the variances with passing time, since the rover will be exactly where it was one sample earlier., and the variances can converge towards zero more quickly. That is exactly what RTKLIB does. If static mode is enabled, then RTKLIB skips the prediction update step entirely and just assumes the rover is where it was last sample and that the uncertainty in that position has not changed. Here is a plot of the variances from the last two solutions above comparing kinematic to static-start.  I show only the z-axis because it tends to be the limiting factor to finding a fix,at least when the rover is stationary.  This is because the geometric diversity of the satellites is always most limited in the up-down direction, since all the satellites below the antenna are obscured by the earth.

static_start2

The sudden drop in variance in both traces occurs when the integer ambiguities are first resolved and the solution switches from float to fix. The step up in the static-start trace occurs when fix-and-hold first occurs and the solution switches from static to kinematic. At this point, the uncertainties from a moving rover are introduced and the variance jumps accordingly. The reason the line is smooth for the kinematic case and not zigzagging as I described above is because the data I am plotting is from the output position file and includes only one variance value for each sample. This is the value at the end of the sample, after the measurement update.

So what happens if the rover is not left stationary long enough and there is no fix before the rover starts moving? If static start is not enabled, a fix will very likely be found while the rover is moving. The chance of this fix being wrong can be fairly high however, which leads to an incorrect solution that RTKLIB reports with high confidence of being correct (i.e. fixed), although it is not.

If static-start is enabled and no fix is found before the rover starts moving, then the chance of finding any fix, correct or incorrect, while the rover is moving is almost zero.  This is because the model the kalman filter is using so poorly matches the actual rover. If it did find a fix, it would not be maintained for more than a very short time, since again, the kalman filter is incorrectly assuming that the rover is not moving.

I would argue this is a win-win situation. Not only have we reduced the time to first fix, but we have also reduced the chance of RTKLIB reporting data as being valid when it is not. Any time RTKLIB reports a bad solution as good, it can significantly undermine confidence in the entire solution.

Ublox M8N half cycle valid bit

In my last post I introduced a new more challenging data set with a higher number of cycle slips than in previous sets. Even after improving the RTKLIB code to better handle missing data samples, there are still a large number of points in the solution with only a float status (yellow in the plot). In addition, the last 15 minutes appears to have at least a meter of error in the vertical axis since the data was taken in a series of loops and the vertical measurements should be the same as earlier loops.

 

half_cycle1

By plotting the observations of the rover, we can see that there are many cycle-slips on a large number of the satellites. This is the primary cause of the poor solution. Here’s the rover observations with a 15 degree elevation mask.

half_cycle2

In order to improve this solution, we are going to have to take a closer look at the cycle slips. The first thing I noticed when examining a few of these cycle slips is that there are a lot of half cycle errors in the carrier-phase measurements following a reported cycle slip. This is true whether or not an actual cycle slip occurred. I had seen this earlier as well when doing an exercise of looking at the double differences. Here’s a plot from that previous post showing what I am talking about. The second red circle contains a number of samples all in error by almost exactly one half cycle.

half_cycle3

Let’s go back to the raw ublox data before it is converted into RINEX format to see if that can shed any light on what is going on. To see the raw unconverted ublox data I enabled the trace option when running convbin by adding “-trace” to the command line. I also had to change an “if #0” in the code in the decode_trkmeas() function in ublox.c to an “if #1” to output the full debug information. Below on the top is the rover observations zoomed into a time period around 6:45:27.0 and below is the raw ublox data for that sample.

half_cycle4a

half_cycle4b

The flag byte is what we are interested in. Since the M8N does not officially support raw output, there is no documentation for this data byte. From the existing ublox.c code though we can see how RTKLIB is using it. It is interpreting bit 5 (0x20) as phase lock and bit 6 (0x40) as the half cycle bit. When the half cycle bit is set, one half cycle is added to the carrier-phase measurement. The other 6 bits are being ignored.

Looking at the documentation for the M8T which does officially support raw data provides a hint as to what might be going on. In the description of the RAWX command one of the bits in the trkStat byte is used to indicate whether the half cycle bit is valid or not. Since the M8N and the M8T share the same core, it would be reasonable to assume there was an equivalent bit in the M8N.

We know from the data that the half cycle errors occur shortly after a reported cycle slip so let’s look at those satellites. From the observation plot we can see that in this case satellites G30, R03, R22, and R23 all have reported cycle slips in the last few seconds before the sample we have raw data for (6:45:27.0). If we look at the flag byte for those satellites, we can see that bit 7 (0x80) is clear for all four satellites. It is set for all the other satellites except G21 which has no valid data and I120 which is the SBAS satellite. Each line in the raw data is one satellite, “sys” is the system (0=GPS,1=SBAS,6=GLO) and “prn” is the satellite number. We can also see that when bit 7 is clear, the half cycle bit (0x40) is never set which would also be consistent with bit 7 being the “half cycle valid” flag.

I added the following line of code to the decode_trkmeas() function in ublox.c to update the observation with the half cycle valid status in the same way it is done for the M8T.  RTKLIB refers to this bit as the “Parity Unknown” bit but in a comment in the RTKPLOT part of the manual explains that this mean the half‐cycle ambiguities in carrier‐phase are not resolved.

     raw->obs.data[n].LLI[0]|=flag&0x80?0:2; /* half cycle valid */

I then reran the conversion using the convbin raw data conversion CUI. Plotting the updated observation data using RTKPLOT with the “Parity Unknown” option set to “ON” gave the following result:

half_cycle5

The gray ticks indicate unknown parity (i.e. half cycle invalid). Unfortunately the “unknown parity” ticks seem to take precedence over the “cycle slip” ticks, so any sample that has both shows up as “unknown parity”. This is why there seems to be less cycle slips in this plot than the original plot above.

So, let’s rerun the solution on the improved observation data using the same configuration settings as previously. The result is plotted below.

half_cycle6

This looks a lot better. Not only do we see a lot more fixes, but the obvious vertical errors between 7:15 and 7:30 have gone away. The two yellow sections in the ground track plot on the left align with the two locations where the car went directly underneath overhanging branches so it is not surprising that those spots are going to be more difficult to maintain fixes for.

There may be more opportunity for improvement here if we can figure out how to take better advantage of the half cycle status. The current RTKLIB code simply resets the phase bias estimate for that satellite every time this bit changes state.

I have uploaded this change to the demo4_b12 branch in my GitHub repository. I have also posted the modified rtkconv executable here.

The reason that I posted rtkconv (the GUI version) instead of convbin (the CUI version) is that I am now able to build the GUIs with my new Embarcadero compiler and because I have found that convbin does not seem to work for the newer (3.xx) formats of RINEX.  I prefer the newer formats because they are easier to parse.

RTKLIB: Missing data samples

Update 7/3/16:  A couple of things to be aware of before reading this post:

  • I inadvertently had the base and rover observation files switched as input parameters to rnx2rtkp when I ran the solution.  When rerun with the moving receiver as the rover, the problem described here does not occur.
  • I have removed this change from the latest code since it also turns out to have some undesirable side effects which are explained in another update at the bottom of the post.

 

The demo3 code is looking pretty clean on the last data set so it’s time to collect some more challenging data. I went back to the previous location but this time instead of running back and forth on the top road in the picture below, I drove around the full loop. The trees are closer on the lower road, and driving on both sides alternates between obstructing different sets of satellites. In a couple of spots, the road passes underneath tree branches and causes a momentary loss of nearly all the satellites.

argeles4

Here’s the observation data for the base on the left and the rover on the right, where the red ticks are cycle slips reported by the receiver. The base receiver was located in the middle of the field with an unobstructed view of the sky.

miss_samp1

There are no cycle slips at the beginning and end of the rover observations because, as in all my data sets, I left the rover stationary with unobstructed skies during these times. I use the initial stationary period to allow the RTKLIB solution to get a reliable fix before the rover starts moving and the stationary period at the end is to help with verification of the solution. I20 is an EGNOS SBAS satellite but because the EGNOS satellites don’t provide ranging information, it is not used in the solutions.

Below is a plot of the solution using the current demo3 code. Position is on the left. The yellow sections indicate float positions where RTKLIB was not able to get a fix. On the right is a plot of velocity, zoomed in a little to show some detail. There is an obvious problem here with all the large spikes in the velocity.

miss_samp2

So what is causing these velocity spikes? Zooming in further and looking at the observations and the positions shows a couple of things.

miss_samp3

First of all, the rover observation data plotted on the left shows that the velocity spikes, which in this case occur at 7:04:09.6 and 7:04:11.6, are occurring for epochs where there are no valid satellite observations. Second, the corresponding positions shows that these are obviously incorrect as well. Specifically, the reported position for epochs with no valid rover data appear to be based on the last valid data and don’t reflect any movement of the rover since then. What is most concerning is that although these samples have large errors (about half a meter in this case) RTKLIB reports them as valid fixes.

For now I will not worry about why I am getting the missing data samples, but instead use them as an opportunity to understand why RTKLIB is responding so poorly to the missing data.

Digging into the code a little bit sheds some light on to what is going on. RTKLIB does check the ages of the two observations and internally records the difference between them. However, unless the age of the data exceeds the input parameter “pos2-maxage”, it does very little with this information. The solution is calculated from the last valid data and not flagged in any way to indicate it’s poor quality.

So, how do we fix this? First of all, let’s go into a little more background on how the kalman filter works since if used properly it is very good at handling situations like this. Also, let’s assume that the receiver dynamics option is enabled, since we will need the additional states it provides. If you are running RTKLIB without receiver dynamics enabled, you might want to read my last post where I describe a change to make this feature run much faster.

In the last post I briefly explained how enabling the receiver dynamics feature in RTKLIB extends the kalman filter to include receiver velocity and acceleration states. I also described how this improves the initial position estimate RTKLIB makes at the beginning of each sample. However, as I mentioned before, the final position estimate is based on a combination of this initial prediction estimate and a second estimate calculated from the carrier phase measurements for that sample. Most of the time the final position estimate is derived primarily from the measurement-based estimate and the accuracy of the initial prediction estimate does not have much influence on the final result.

This is because the relative weighting between the two estimates depends on our best guess of the variances of each. It is important to understand that every input to the kalman filter has two components. The first is the actual value of the measurement or prediction. The second component is an estimate of the variance of that value. Large variances mean the uncertainty of the measurement is high, small variances indicate low uncertainty, or high accuracy. If you are more familiar with standard deviations, just remember that the variance is the square of the standard deviation. If the variance estimates are accurate, and the measurement distributions are well behaved, then the kalman filter will select the optimal combination of the measurement and the prediction. For example, if we know the measurements are noisy, the measurement variances will reflect this and the kalman filter will rely more on the predicted position, if we know the rover can accelerate rapidly in unknown directions between samples, the prediction variances will reflect this, and the kalman filter will rely more on the measurements.

So how does RTKLIB determine the variances for the measurements and the predictions? The variances for the measurements are a function of the two input parameters “stats-errphase” and “stats-errphaseel” as well as the satellite elevations. The variances for the predictions are calculated internally to the kalman filter and are also a function of the input parameters “stats-prnaccelh”, stats-prnaccelv” and the time between samples. I’ll leave the details to another time, but for now, what is important to understand is that normally the variances of the measurements are much smaller than the variances of the predictions. Hence the final estimate is weighted heavily towards the measurements and the predicted position matters much less.

Back to our problem. What is happening in the case of a missing sample, is that RTKLIB is feeding measurements into the kalman filter that it knows have large uncertainty because of their age, but does not modify the variances to make the kalman filter aware of this possible error. 

So how do we fix the problem? The best solution would probably be simply to not feed this measurement into the kalman filter since it is redundant to the previous one and is not providing any additional information. It turns out the architecture of the code does not lend itself to this fix easily, or at least I did not find an easy way to do this.

What I found to be a simpler fix was to adjust the variance of this measurement to reflect its large uncertainty but otherwise let the code continue through its normal path. Multiplying the estimated velocity with the time since the last good measurement gives a rough estimate of the error. Adding this to the variance is not exactly correct but is good enough for what we are trying to do, and is a much better estimate than what we do now, which is nothing. Since the variance of the measurement will now be much larger than the variance of the prediction, the final estimate will be virtually equal to the prediction, which is what we want.

It is much better to overestimate the variance of a measurement than to underestimate it. Overestimating a variance just means less information is extracted from that measurement than could be. Underestimating a variance, on the other hand, can be very detrimental to the solution since we are telling the filter we have high confidence in a measurement that is very likely wrong.

I added these lines of code to the ddres() function that calculates the double-difference residuals to accomplish what I described above.

/* adjust measurement error variances for missing samples if have velocity estimate */
if (dt>0&&rtk->opt.dynamics) {
     Rv=dt*sqrt(SQR(rtk->x[3])+SQR(rtk->x[4])+SQR(rtk->x[5]));
     Ri[nv]+=Rv;
     Rj[nv]+=Rv;
     }

Ri and Rj are the variances of the carrier phase measurements from the base and the rover respectively.

Here is the velocity and position for the solution run with the above code change. Both the spikes in the velocity and the discontinuities in the position are now gone.

miss_samp4

Often times, the variances of the measurements tend to get ignored or at least under-emphasized. This is usually because they can be hard to estimate and it is not as obvious how they affect the solution, but I think this example shows how they can be important and should not be overlooked. Improving the accuracy of the variance estimates is also a way to improve the quality of the solution and is something I hope to come back to soon.

I have started a new demo4_b12 branch in my GitHub repository which I have included this change. I have also merged in updates from the main RTKLIB repository bringing my code up to date with the 2.4.3 b12 level.  

I have also uploaded this new raw data set into the argeles2_car folder in my library of data sets.  One of the changes in this newer version of RTKLIB is that the executables have been removed and relocated to a separate repository.  I have copied the executables that I am using to here.

Update 7/3/16:  I’ve removed this change from the demo4_b12 code for the time being.  There’s a couple things I’ve realized since writing this post.  First of all, I inadvertently had the base and rover observation files switched as input parameters to rnx2rtkp when I ran the solution.  The velocity spikes disappeared when I corrected this since RTKLIB does not do updates for missing rover observations.  In theory, the problem still exists if there are missing base observations and the base is moving, but this would only happen in a moving base situation.  

I also found that this change causes problems when the base station sample rate is slower than the rover sample rate.  This is quite common in practice since the base station is usually not moving and so there is no need to sample it’s position as frequently. The lower sample rate is treated by this change as many missing samples which is what it effectively is.   Unfortunately, my change tends to overestimate the measurement variances for the missing samples as I mentioned in the description above.  This was not a problem when it was a small number of missing samples, but when it is the majority of the samples, then it is a problem and the solution is degraded.