Ublox M8N vs M8T: Part 2

In my last post, I compared data from a pair of Ublox M8N receivers with data from a pair of Emlid Reach M8T based receivers, and found issues in both data sets. In this post I will look into those issues more closely.

To do this, I collected another data set with a few differences from the previous one.

  • On the M8N rover receiver, I replaced the original antenna that had only a 1” cable with a Ublox ANN-MS-0-005 antenna with a much longer cable. This allowed me to move the receiver from the car roof to inside the car, a friendlier thermal environment. The goal was to see if this would avoid the all-satellite cycle-slips I saw last time at the beginning of the data set. This antenna cost me $20 from CSG but is available from other places for less.
  • On both Reach M8T receivers, I modified the dynamic platform model setting from “Airborne < 4g” to lower acceleration settings; “Pedestrian” for the rover, and “Stationary” for the base. I also changed the setting on the M8N base receiver from “Pedestrian” to “Stationary” to make them consistent. The goal here was to avoid the erroneous fixed solution points I saw in the solution from the previous M8T data. To insure the modifications occurred correctly, I used the u-center software to read back the settings from the receivers after the data was taken.
  • I chose a measurement route with much more tree cover than the previous one to increase the difficulty of the solution and to help differentiate the two receiver sets.

Here is a Google map of the measurement route. In this area there are few native trees, so by driving through residential areas, rather than next to them as I did in the previous data set, I encountered many more trees. There were numerous locations with tall trees on both sides of the road as well as places where I drove directly under large tree branches.

walker1

Here’s an example of some of the more challenging tree cover encountered in the route. The route locations are only approximate since the base locations are not calibrated, that is why the lines are not actually on the road.

walker2

Here is the observation data from both rovers. The red ticks are cycle-slips, the gray ticks are half-cycle invalids.

                             M8N                                                                          Reach  M8Twalker3

The first thing to notice is that there are no simultaneous cycle-slips in the M8N data. Of course, this doesn’t prove we will never get them but it suggests that maybe moving the receiver inside the car did help. Unfortunately, there is one simultaneous cycle-slip in the M8T data very close to the start at 00:54. Apparently, both the M8N and the M8T are susceptible to these slips, which is not surprising, since the chips are part of the same family. We do see only a single slip, which is an improvement over the previous data. Since they always occur near the beginning of the data collection, I still believe they are most likely caused by thermal transients. I suspect that the best way to avoid them would be to turn on the receivers 15 minutes before starting to collect data in an environment as similar as possible to the data collection environment.

Next, let’s look at the SNR vs. elevation plots. Last time we saw noticeably better numbers with the Reach setup because of the more expensive antenna. This time, with the Ublox antenna on the M8N receiver, the two are much more similar. SNR isn’t everything, and there still are reasons why the more expensive Reach antennas are likely better than the Ublox antennas, but we’ve at least closed the gap some between them.

                            M8N                                                                          Reach  M8Twalker4

Here are the position solutions from both receiver sets.

                            M8N                                                                          Reach  M8Twalker5

The M8N solution is very good, with nearly 100% fix after the initial acquire until the very end when I parked the car under some trees in the driveway. This is despite a very challenging data set with many cycle-slips.

Unfortunately, the M8T solution is not as good. The initial acquire is delayed because of the simultaneous cycle-slip I mentioned earlier. It does eventually acquire, but then loses fix for quite a long period near the end of the data (the yellow part of the line in the plot above)

Even more concerning, when we look at the acceleration plots, we see the same spikes in the M8T data as we did in the previous data set.

                       M8N                                                                          Reach  M8Twalker6

Again, these spikes align with erroneous fixed solution points in the M8T data as can be seen in deviations from the circle in the difference between the two solutions.

walker7

Clearly, changing the dynamic platform model setting in the receiver did not fix the problem. A couple of experienced users have commented that this setting does affect the front end of the receiver on earlier Ublox models and it’s still possible it affects the M8T as well, but it does not appear to be the cause of this problem. We will need to look elsewhere for the solution.

We are running identical RTKLIB solution code on the two data sets and we have verified that the receiver setup is nearly identical for both data sets. So what else can be different between them? One possibility is the conversion utility, RTKCONV, that translates the raw binary output from the receiver into the RINEX observation files that are used as input to the solution. Since the raw measurements are output by different commands on the two receivers, there are two different functions in the RTKCONV code that process them.

Let’s look first at the RTKCONV code to convert the data from the UBX_TRK_MEAS command used by the M8N receiver. I won’t show the code here but just describe functionally what happens in the code. Each data sample from the receiver contains a status bit indicating carrier-phase lock and a count of consecutive phase locks. RTKCONV sets the cycle-slip flag for that sample if the carrier-phase is valid and an internal code flag is set. If the carrier-phase is not valid, then the cycle-slip flag is left in its previous state. The internal code flag is set if the phase lock count is zero or less than the phase lock count for the previous sample and is only reset for the next sample if the carrier-phase is valid.

For the UBX_RXM_RAWX command used by the M8T receiver, there is also a status bit indicating carrier-phase lock. Instead of a count, there is a time for consecutive phase locks but functionally it is equivalent. The RTKCONV code for this command does not use an internal code flag and the cycle-slip flag is set or cleared directly every sample that the carrier-phase is valid. The cycle-slip flag is set if the phase lock time is zero or less than the previous sample and cleared otherwise.

I know that’s a bit confusing and the difference is fairly subtle. The most important thing to understand is that RTKCONV is doing more than just translating from binary to text, it is deciding which samples to set cycle-slips for based on a somewhat complicated algorithm, and that algorithm is different for M8N and M8T.

Basically, the effect of this difference is that for the M8N, if a cycle-slip is followed by an invalid phase then the next valid phase will always be flagged as a cycle-slip while for an M8T it won’t necessarily be so. It’s easier to understand by looking at a picture. The observation plots below show the location of one of the acceleration spikes in the M8T solution which is caused by a cycle-slip on satellite G05. The plot on the left shows which samples were flagged as cycle-slips with the existing code, the plot on the right shows the cycle-slips after modifying the code to be functionally equivalent to the M8N code. Note the extra two cycle-slips with the change.  The extra cycle-slips in this case are a good thing because they are flagging samples with large phase errors and preventing RTKLIB from incorporating them into the solution.

              Reach M8T before change                                   Reach M8T after changewalker8

Modifying the RTKCONV code for the UBX_RXM_RAWX command in this way and re-running the solution gives us the the position plot on the right and the acceleration on the left.

walker9

Much better than before! Not quite as good as the M8N but still a significant improvement. The difference between the M8N solution and the improved M8T solution is shown below.

walker10

Remember this should be a circle if both solutions are free of errors.  Actually in this case not quite a circle because there is also a separation between the two base stations ,but that effect is small and we can ignore it for now.  This is much better than the equivalent plot from the original M8T solution shown earlier. There are obviously some erroneous fixes even after the improvement, so this is still a work in progress, but I think it is a big step in the right direction.

This data set was quite a bit more challenging than the previous one and in reality both solutions are quite good given the number of cycle-slips we saw, but there is always room for improvement.

The other thing to note with this data set is that the better quality antenna made a big difference on the quality of the M8N solution.  I suspect I will not be going back to the old antenna!  I knew the nicer antenna would help but I have resisted using it till now because of the extra cost.  There are similar looking antennas with similar gain specs available for as little as $4 so I may try some of these to see how they compare.

I’m not going to post the code to my Github repository or my binaries until I’ve had a little more time to understand the remaining errors but if anyone wants to take a closer look now, here are the code modifications to ublox.c that I made in the decode_rxmrawx function.

walker11

Advertisements

Ublox M8N vs M8T

Thanks to the generosity of Emlid, I am now the proud owner of two of their Reach precision GPS units. At a list price of $570 for a pair, these fall more into the category of low-cost rather than the ultra-low cost receivers I have been working with, but they do allow me to do some comparisons between the two. Their units use the more capable (and more expensive) Ublox M8T receivers rather than my Ublox M8N receivers and also have higher quality (and more expensive) Tallysman 4721 antennas, rather than the cheap antennas that were included with my receivers.

To compare the two I collected a fairly challenging data set, with maximum distances from the base station of over two kilometers and maximum velocities of over 70 km/hr, with relatively unobstructed rovers, but partially obstructed base stations. I mounted one of my receivers and a Reach unit on top of my car and also one of each at the base location. Here is a Google earth plot of the ground track. The Google Earth plots are a really nice feature of RTKPLOT I have not used until now but have quickly become a fan of. I could not make this feature work with the 2.4.3 version of RTKPLOT so had to go back to the 2.4.2 version. I also found I had to specify the solution format (out-solformat) as “xyz” instead of the “enu” that I usually use to get the solution in a format Google can use. The track was over a mix of dirt and paved roads running through agricultural fields and next to residential areas.

union1

Here is a plot of the base station location, marked in red below. It is somewhat obstructed by the sheds and boats around it. I selected this location in part because it was very windy that day and this spot was fairly sheltered from the wind, but also thought it would make the solution a bit more challenging and therefore help differentiate the two receiver sets.

union2

Most of the rover’s route was relatively unobstructed, but there were a few scattered trees near the road in some locations. The plot below shows an example of one of these spots. Note also, that the ground track goes off the road a bit at the end of the loop. I suspect this is due to inaccuracies in the base station location (and maybe the Google maps) rather than the RTKLIB solution since I did not make any attempt to calibrate the base station against any known reference.

union3

Here’s the base station observation data, the M8N is on the left, the Reach M8T data on the right. From a cycle slip perspective they are fairly similar but we see a few of the satellites (G15, R16, and R18) are noticeably better with the Reach. This is most likely because of the higher quality antenna.

                                  M8N                                                              Reach M8Tunion4

Looking at the SNR vs elevation for both receivers, we see the Reach has noticeably higher signal strength especially in the lower elevation (10-20 degrees) region where it is most important. Again, this is to be expected with the higher quality antenna.  For some reason, the M8T SNR numbers are rounded off to the nearest integer. I don’t know why that is, but that is what makes the plots look like they are formatted differently.

                                 M8N                                                                       Reach M8Tunion5

Looking next at the rover data, here are the observations for the two receivers.

     M8N                                                                              Reach M8Tunion6a

The rovers were stationary until 00:34 and then moving till 00:57, then stationary again. While they were moving and at the end when they were stationary, the two look fairly similar from a cycle-slip perspective with maybe a slight advantage for the Reach. However, during the initial stationary period, there were several slips that occurred simultaneously on every satellite in the M8N data. I’ve circled one of them in blue above.  I believe these must be caused by some sort of discontinuity in the receiver and have nothing to do with the satellite signals themselves. My best guess is that they were caused by temperature fluctuations in the chip. It was a very hot day, with an intense sun heating the dark car roof, combined with a strong wind that would create a cooling effect. Because the antenna that was connected to the M8N receiver has only a one inch lead, it was mounted on top of the car in the sun, while the M8T receiver, with a longer antenna lead, was mounted inside the car and not subject to the same temperature fluctuations. Also, notice that the simultaneous slips all occurred near the beginning of the data set, possibly while the receiver was still reaching some sort of thermal equilibrium. To try and avoid this in the future, I will either switch to another inexpensive antenna I have with a longer lead, or let the receiver sit longer before starting to collect data.

Unfortunately these slips occurred during the initial stationary period I use for the first acquire and prevented that acquire from occurring. Rather than give up on the data, though, I decided to try running a solution with the M8N rover and the M8T base. I don’t know if it was because of the better signal strength, or because I just got lucky, but I was able to get an initial acquire with that combination. So for this exercise, the rest of the data is all based on a comparison between the M8N rover and the Reach M8T rover, both referenced to the M8T base station. It turns out that having a single base station also makes the accuracy analysis a little easier as I will describe later. This is not exactly the comparison I wanted to make, but one I think is still worth doing.

So how did they do? Here’s the solutions using my demo4 code. The input configurations were identical with one exception. Since with the M8T receivers, RTKLIB can resolve the GLONASS integer ambiguities without assistance, while with the M8N receivers, it can not, I set the input parameter “gloarmode” to “on” for the M8Ts and to “fix-and-hold” for the M8Ns. This enables my extension to the fix-and-hold feature to adjust for the additional errors on the GLONASS and SBAS satellites.

                             M8N                                                                        Reach M8Tunion7

The Reach solution (on the right) looks very clean with a very fast acquire and then 100% fixed values after that. The M8N solution (on the left) also acquired quickly but then ran into the simultaneous cycle-slips, causing problems until it re-acquired at 00:40. After that it stayed fixed for nearly 100% of the time with a short dropout around 00:49.

The next questions, of course, are: Do the solutions match? And are the fixes all accurate? To check this, I will use a similar technique I did earlier when I had only two receivers, both mounted on the rover. For that case, I solved for the distance between the two receivers which forms a circle equal to the distance between the receivers. In this case, I will solve for the distance between each rover and the base, then difference the two. This should also give us a circle with radius equal to the distance between the two rover receivers. Having only one base simplifies things by avoiding addition of a second term caused by the separation between the two base receivers. Here’s the result of that operation, plotted only for the fixed solution points since those are the ones we are counting on to be accurate. On the left is the position difference (x,y,z) and on the right is the ground track difference.

                           M8N                                                                            Reach M8Tunion8

The separation between the two receivers on the car roof was 15 cm and we see here that most of the points fall on the circle of that radius. However, there are several deviations, up to half a meter in error. They are also easy to see as spikes in the z-axis on the left plot. These are unexpected and quite concerning since we really rely on the fix status to let us know which points are valid and accurate.

Zooming in on the observation data for the largest of these spikes at around 00:51 shows that both rover receivers saw cycle slips on satellites G02 and G13 at this time, presumably from passing a tree or other obstruction.

                                   M8N                                                                Reach M8Tunion9

Zooming in on the position data for this point, shows discontinuities in the Reach data but not the M8N data which is surprising, I had expected the opposite. As the driver of the car, I am certain that I did not drive over any meter high cliffs during the test, so the Reach M8T data must be wrong.

                                 M8N                                                                          Reach M8Tunion10

Looking at other error samples shows the same thing. It is more easily seen as spikes in the accelerations as shown here, especially the z-axis accelerations which only occur on the M8T data and not the M8N data

                               M8N                                                                       Reach M8Tunion11

So what could be causing these errors? The two solutions used identical code and input parameters except for the fix-and-hold for GLONASS integer ambiguities setting. I reran the M8T solution with this parameter enabled to make them completely identical but this did not fix the problem. With the code being identical we have to suspect the difference is in the receivers themselves or at least in their setup. The next step I took was to use the Ublox u-center evaluation software to examine all the registers for both receivers. It’s a little trickier to do this with the Reach receiver since we don’t have direct access to the M8T chip, but there are instructions on setting up a tcp port in the Reach documentation, which is what I did.

For the most part, the differences between the two receiver setups were slight but I did find one significant difference. The receiver dynamic platform model settings are quite different. I have my M8N receivers set up for a “Pedestrian” model, while the Reach M8T receivers are set up for the “Airborne <4g” setting. According to the Ublox M8 Receiver Description document, that setting is only recommended for extremely dynamic environments. Here’s the details from the manual.

union12

I have heard differing opinions on whether this setting affects the front-end of the receiver or whether it only affects the back-end position calculations. If it only affected the back-end, then this setting would not matter since we use only the raw GPS measurements from the receiver. However, if it does affect the front-end, we might expect it to have an effect like this since it would most likely be opening up the bandwidths of the phase lock loops tracking the carrier-phases and thus minimizing any filtering effects.

So that’s where things stand at the moment with this data set. I see issues with both receiver sets and have speculated on what may be causing the problems but have not had time to verify that either one of my guesses is correct. I had hoped to do that before posting this article but it’s been almost two weeks since my last post, so I’ve decided to just go ahead and post what I have.

This also gives me the chance to ask if anybody else has seen similar issues and/or might have other possible explanations for what is going on?

I have also uploaded this data to my data set library.

RTKLIB: Static-start feature

The static-start feature is something I added to the demo4 code a while ago but never got around to explaining, so I will do it now.

As I’ve mentioned before, I always like to initially leave the rover stationary after first turning on the receiver long enough to get a first fix-and-hold before allowing it to move. This significantly reduces the chances of getting an initial false fix. However, we could benefit even more from doing this if we could tell RTKLIB that the rover was stationary during this time. This is because we would remove the uncertainty of rover motion from the solution and the kalman filter would converge faster. By doing this we should be able to reduce the time to first fix.

RTKLIB has a solution mode for a moving rover (kinematic) and one for a stationary rover (static) but currently no way to switch between the two. What the new static-start mode I have added does is to start the solution in static mode and then switch to kinematic mode after the solution first qualifies for a fix-and-hold. It does not require fix-and-hold to be enabled, it will still make the switch to kinematic mode when that qualification occurs, even when fix-and-hold is not enabled.

Below is the only functional code modification and it is in the relpos() function in rtkpos.c. The new line of code is in red. There are also a handful of bookkeeping changes to handle the new option in the input configuration file that I won’t include here but can be seen in the demo4 code.

/* hold integer ambiguity if meet minfix count */
if (++rtk->nfix>=rtk->opt.minfix) {
    if (rtk->opt.modear==ARMODE_FIXHOLD)
        holdamb(rtk,xa);
    /* switch to kinematic after qualify for hold if in static-start mode */
    if (rtk->opt.mode==PMODE_STATIC_START)
        rtk->opt.mode=PMODE_KINEMA;
}

This feature is enabled by changing “pos1-posmode” in the input configuration file from “kinematic” to “static-start”.

So, let’s do a little testing to see how it does. I used my latest demo4 data set and ran three solutions. The initial acquire for all three are plotted below.

static_start1

The first two are both in kinematic mode. For the first one I have used the default value of 100 for “stats-eratio1”, the ratio of variances between pseudorange and carrier-phase measurements. In the second plot, I have used my preferred value of 300. This is unrelated to the static-start modification but is relevant to the acquire time so I have brought it up again. As you can see changing this value reduced the time to first fix from roughly eight minutes to a little less than five.

In the third plot, I changed “pos1-posmode” from “kinematic” to “static-start” to enable the static start feature and again used 300 for “stats-eratio1”. This reduced the time to first fix from a little less than five minutes to about two and a half minutes, nearly a factor of two. The actual times to first fix will vary fairly significantly depending on signal quality, number of satellites and quality of the sky view but the trends shown here should hold even if the times change.

To understand this change a little better it would be good to understand what is the difference between “static” mode and “kinematic” mode in RTKLIB, so let’s discuss that next.

In my earlier post describing the RTKLIB receiver dynamics feature, I explained how the kalman filter makes two estimates each sample, where the first estimate is a prediction of how far the rover has moved since the last sample. If receiver dynamics is enabled, this estimate is made using the estimated velocity and acceleration of the rover, if not, RTKLIB simply using the pseudorange measurement for the current sample along with its large uncertainty.

In either case, the uncertainty (or variance) of the position states are increased after this update because of the unknowns of what happened during that time interval. The second kalman filter estimate (the measurement update) will then reduce the variances again using the additional information from the carrier-phase measurements. So for each sample, the uncertainties of the position states will be increased by the prediction estimate, and then decreased by a larger amount with the measurement update, zigzagging towards zero The smaller the variances, assuming they are accurate, the more likely it is that the integer ambiguity resolution will resolve a fix.

If however, we know that the rover did not move during that time, there is no need for the kalman filter to increase the variances with passing time, since the rover will be exactly where it was one sample earlier., and the variances can converge towards zero more quickly. That is exactly what RTKLIB does. If static mode is enabled, then RTKLIB skips the prediction update step entirely and just assumes the rover is where it was last sample and that the uncertainty in that position has not changed. Here is a plot of the variances from the last two solutions above comparing kinematic to static-start.  I show only the z-axis because it tends to be the limiting factor to finding a fix,at least when the rover is stationary.  This is because the geometric diversity of the satellites is always most limited in the up-down direction, since all the satellites below the antenna are obscured by the earth.

static_start2

The sudden drop in variance in both traces occurs when the integer ambiguities are first resolved and the solution switches from float to fix. The step up in the static-start trace occurs when fix-and-hold first occurs and the solution switches from static to kinematic. At this point, the uncertainties from a moving rover are introduced and the variance jumps accordingly. The reason the line is smooth for the kinematic case and not zigzagging as I described above is because the data I am plotting is from the output position file and includes only one variance value for each sample. This is the value at the end of the sample, after the measurement update.

So what happens if the rover is not left stationary long enough and there is no fix before the rover starts moving? If static start is not enabled, a fix will very likely be found while the rover is moving. The chance of this fix being wrong can be fairly high however, which leads to an incorrect solution that RTKLIB reports with high confidence of being correct (i.e. fixed), although it is not.

If static-start is enabled and no fix is found before the rover starts moving, then the chance of finding any fix, correct or incorrect, while the rover is moving is almost zero.  This is because the model the kalman filter is using so poorly matches the actual rover. If it did find a fix, it would not be maintained for more than a very short time, since again, the kalman filter is incorrectly assuming that the rover is not moving.

I would argue this is a win-win situation. Not only have we reduced the time to first fix, but we have also reduced the chance of RTKLIB reporting data as being valid when it is not. Any time RTKLIB reports a bad solution as good, it can significantly undermine confidence in the entire solution.