New release of demo5 B29e RTKLIB code

With the recent upgrades to the SwiftNav firmware and the upcoming release of the u-blox F9 receiver, the last couple of months have been an exciting time in the world of low-cost precision GNSS!  It has kept me very busy, both making necessary updates to the demo5 version of the RTKLIB code and with consulting work related to the new receivers.  Unfortunately, this has meant I haven’t got a blog post out in over two months.

I have, however, just recently released a new version (b29e) of the demo5 RTKLIB code with some fairly significant changes from the previous version.  These changes have been much more of a group effort than my previous releases, so I first want to thank everyone who helped with the new features.

Here’s a list of the most important changes:

1)  U-blox F9 support:  Support for the new dual-frequency u-blox raw binary messages.  The updated code will now run real-time and post-processed solutions for the F9 receiver using all available raw binary observations and navigation messages.

2) Swiftnav F/W 2.0 support:  Support for the new Galileo and BeiDou Swiftnav binary messages.  The updated code will now run real-time and post-processed solutions for the Piksi Multi receiver using all available raw binary observations and navigation messages.

3) Galileo E5b frequency support:  Both the u-blox F9 and the Swiftnav receiver are using the E5b frequency for the second Galileo frequency.  It was difficult to set the option for this frequency in the RTKLIB solutions and including it caused the solutions to run quite slowly.  Since the demo5 code is focused on low-cost receivers, and both SwiftNav and u-blox, the two most popular low-cost dual frequency receivers, are both using E5b, I have re-ordered the frequency tables in RTKLIB so that a three frequency solution now includes L1, L2, and E5b.  Previously, you would need to run what was effectively a five frequency solution to include E5b which caused RTKLIB to run noticeably slower.

4)  Event logging and event position logging:  This is a nice feature that has been available in the Emlid version of RTKLIB for a long time.  I have ported the code over from their open-source code base and have extended support to the Swiftnav receivers as well as the u-blox receivers.  Any events recorded by the receivers (e.g. camera triggers) are decoded from the binary messages and added to the rinex files.  Post-processing the rinex files will now generate two position logs.  The first is unchanged from before, with a solution position for every rover time stamp.  The second, only includes positions for the logged events which are interpolated from the time stamp positions.

5) Fix for using time-tag files to emulate real-time RTKNAVI solutions with file inputs:  This is a really useful feature that was broken by changes ported from the official 2.4.3 code quite a while ago, so it is really nice to have it working again.  Thanks to Christophe for figuring this one out and giving me the necessary code fixes!

6) Reduce unnecessary NTRIP connection requests:  RTKLIB was behaving quite badly on both server and client side whenever a receiver was disconnected without shutting down an NTRIP caster connection and was hammering the caster with nearly continuous connection requests.  This was causing bandwidth issues for the NTRIP casters, and was causing some users (including me) to get temporarily banned for misuse.   Thanks to David from SNIP for helping resolve this one and also for helping me to test the code.

7)  Improve cycle-slip handling for non-u-blox receivers:  RTKLIB was ignoring cycle-slips in cases where the carrier-phase was not set or set to zero.  This was causing it in some cases to ignore valid cycle-slips, which can significantly degrade the solution.  The u-blox receiver code already had a fix for this, so this change primarily affects non-u-blox receivers.
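To make the event position log from item 4 a little more concrete, here is a minimal Python sketch of linear interpolation between the two solution epochs that bracket an event time. It only illustrates the idea and is not the demo5 or Emlid implementation; the function name and the `(time, (x, y, z))` layout are hypothetical.

```python
from bisect import bisect_right

def interpolate_event(solutions, t_event):
    """Linearly interpolate a position at t_event.

    solutions: list of (time_s, (x, y, z)) tuples sorted by time
               (hypothetical layout, chosen for illustration).
    Returns (x, y, z), or None if the event is outside the solution span.
    """
    if not solutions:
        return None
    times = [t for t, _ in solutions]
    if t_event == times[-1]:
        return solutions[-1][1]          # event exactly on the last epoch
    i = bisect_right(times, t_event)     # first epoch strictly after the event
    if i == 0 or i == len(times):
        return None                      # no bracketing pair of epochs
    (t0, p0), (t1, p1) = solutions[i - 1], solutions[i]
    w = (t_event - t0) / (t1 - t0)       # fraction of the way from t0 to t1
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))
```

An event that falls a quarter of the way between two rover epochs gets a position a quarter of the way along the line joining the two solutions.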

Several of these changes were written specifically for clients that needed the features or fixes for their own use but were willing to share them with the larger community.  I appreciate their willingness to share and hope I can continue to bring more changes this way into the open-source code in the future.

I’ve had a chance to run real-time and post-processed solutions with this code with raw observations from both the u-blox F9 receivers and with the SwiftNav receivers with F/W 2.0 and am getting great results with both of them.  I hope to share more results in the near future, but just wanted to say that the quality and number of raw observations I am seeing from both receivers is excellent.

If you’d like to try the new code, Windows executables can be downloaded here and the source code is available here.


Tersus/M8T moving rover comparison

In my last couple of posts I compared a u-blox M8T single frequency receiver to a Tersus BX306 dual frequency receiver for a static rover using a fairly distant CORS receiver for base data.  Both receivers had over twenty raw phase measurements, but the Tersus receiver had much better overlap with the CORS receiver with twelve measurements available for ambiguity resolution (GPS L1 and L2) while the M8T had only six (GPS L1).  Not surprisingly, the Tersus provided a much better solution than the M8T.  I also compared the RTKLIB solution and the internal Tersus RTK solution and showed that they appeared to be roughly comparable.

In this post, I will add a second M8T receiver and compare a M8T to M8T short baseline solution to the Tersus to CORS longer baseline solution.  While this may not sound like a fair comparison, it could be a reasonable choice given that two M8T receivers are still significantly less expensive than one Tersus receiver.   Also, to make things more interesting,  I will use a moving rover this time rather than a stationary one.

For the experiment, I mounted both receivers in a car, each with its own antenna on the roof.  Given that we are making a comparison to a relatively expensive solution I felt it wouldn’t be unreasonable to add $20 to the M8T solution, so I upgraded its antenna from the standard $20 u-blox antenna I usually use to a Tallysman 1421 antenna available at Digikey for $42.   For the Tersus receiver I used a Tallysman dual frequency 3872 antenna which I believe is roughly a $200 antenna.  For the M8T base station, I used the same antenna on my house roof as in the previous experiment which gave a baseline less than 1 km for most of the M8T pair solution whereas the Tersus/CORS baseline was roughly 16-18 km.  For RTKLIB post-processing, I also ran a solution using base data from the nearest CORS station which gave a baseline of 7-9 km but I couldn’t use this data for the Tersus internal RTK solution because it is not available real-time.   Also, it should be noted that I collected all this data a few weeks ago before Tersus released their most recent firmware so it was all done using their previous version.

I chose a driving route very similar to the one I used for this M8N to M8T comparison in which I drive through a residential neighborhood with a moderate tree canopy.  This time I added a section of the route in a parking lot with no tree obstructions.  The parking lot is intended to be a low-stress environment and the neighborhood streets a moderate-stress environment.  Here’s a Google Earth image of the previous route to give a feel for the terrain.  Unfortunately this map feature no longer works in RTKLIB because Google has discontinued the API to Google Earth.

 

walker1

In this case the M8T  was receiving signals from the GPS, GLONASS, SBAS, and Galileo satellites and started the data set with a total of 21 phase measurements.  All of these can be used for ambiguity resolution since the two receivers are identical hardware.   The Tersus receiver measured only GPS and GLONASS but for all but a couple of satellites got both an L1 and an L2 measurement.  It started the data set with 24 phase measurements of which I would expect that only the 14 GPS phase measurements are available for ambiguity resolution because the receivers are not identical.

The previous time I ran this experiment I was able to get a nearly 100% fix solution from both the M8N and the M8T  receiver pairs but had to use some solution tracking gain (fix-and-hold) to achieve that.

In this case, with the extra Galileo satellites and the more expensive antenna, I was able to get nearly 100% fix using continuous ambiguity resolution instead of fix-and-hold. Continuous AR has the advantage of reducing the chances of locking to a false fix and is normally a preferable solution if it is achievable.  The only float part of the solution was at the very end of the route where I parked the car underneath a large tree.

Here are three versions of the M8T receiver pair solution all run with continuous ambiguity resolution.  In all the plots, green is a fixed solution and yellow is a float solution.  The top left solution was run with 5 Hz measurements which is what I normally use for moving rovers.  I then realized that the Tersus data was only 1 Hz, so I re-ran the M8T solution after decimating the raw data down to 1 Hz (the latest Tersus firmware supports 5 Hz RTK solution).  The decimation can sometimes cause problems because the cycle slips aren’t always handled properly in the decimated data but in this case it seemed to work fine as can be seen in the plot on the top right.   The only noticeable difference is that the 1 sec data took a little longer to get to first fix.  This is less important in post-processed solutions because the solution can always be run in combined (forward/backward) mode which will usually get a fix for the beginning of the data.  This can be seen here in the bottom left solution which was run in combined mode.
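One way decimation can lose cycle slips is when the slip is flagged in an epoch that gets thrown away. The Python sketch below shows one possible way to guard against that, by carrying any slip seen in a dropped epoch forward into the next kept epoch for that satellite. The data layout (one dict of satellite id to slip flag per epoch) and the function name are hypothetical, and this is not how RTKLIB itself decimates data.

```python
def decimate_with_slips(epochs, step):
    """Keep every `step`-th epoch, carrying slip flags forward.

    epochs: list of dicts mapping satellite id -> slip flag (bool),
            one dict per epoch (hypothetical layout).
    A slip flagged in a dropped epoch is OR-ed into the next kept epoch
    that contains the same satellite, so it is not silently lost.
    """
    kept = []
    pending = set()  # satellites with a slip not yet written to a kept epoch
    for i, epoch in enumerate(epochs):
        pending.update(sat for sat, slip in epoch.items() if slip)
        if i % step == 0:
            kept.append({sat: (sat in pending) for sat in epoch})
            pending.difference_update(epoch)  # these slips are now recorded
    return kept
```

For 5 Hz data decimated to 1 Hz, `step` would be 5, and a slip flagged at any of the four dropped epochs shows up on the following kept epoch.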

ter_kin1

The zig-zag line from 21:22 to 21:26 is the lower stress circles in the parking lot followed by the moderate stress route through the residential neighborhood.

Next, let’s look at the Tersus solutions.  The internal Tersus RTK solution was run with the Tersus default settings.  The user interface for the Tersus console app is much simpler than RTKLIB so there are many fewer options to play with.  For most users this is probably an advantage because it avoids the rather overwhelming array of options that RTKLIB gives.   The RTKLIB solution was run with continuous ambiguity resolution with settings very similar to the M8T solution, just adjusted for dual frequency.  The internal solution is on the left and the RTKLIB solution on the right.

ter_kin2

The two solutions are fairly similar, both did well in the lower stress parking lot environment but struggled with the moderate stress on the residential streets.  The internal solution did a little better with scattered fixes in the latter part of the data.

Comparing differences between the internal and RTKLIB solutions and between the Tersus and M8T solutions for only the fixed points, it looks like most of the errors between the different solutions when they have a fix are small.  The Tersus/M8T differences are indicated by the distance from the circle as I have described before. I’m not too worried about the DC offsets between them.  It is somewhat tricky to get all the offsets correct and I did not spend a lot of time on that.  It is likely to be an issue with coordinate differences or handling of antenna offsets that explains the DC shifts.

ter_kin4

The above Tersus RTKLIB solutions were run with only GPS ambiguity resolution as I would not expect the GLONASS measurements to be useful for ambiguity resolution because of the inter-channel bias differences between the non-identical receivers.  However I was surprised to find that I did get fixes with the GLONASS ambiguity resolution set to “On” in the RTKLIB configuration file.  The solution was slightly worse than the GPS-only AR but I did verify that the GLONASS satellites were included in the ambiguity resolution.  I’m not quite sure what to make of this observation, whether or not it makes sense to include the GLONASS measurements in the ambiguity resolution, but I suspect it makes sense to leave them out for the reason mentioned above.

ter_kin5

I then ran another RTKLIB post-processed solution using the Tersus and base station data from a closer CORS base station.  This was to see how reducing the baseline affected the answer.  Here’s the result from a base station that is only 7-9 km away.

ter_kin6

Even though we reduced the baseline by a factor of two the solution only got slightly better and time to first fix actually increased.  This suggests that the long baseline may not be the primary reason for the poorer Tersus solution.

My suspicion is that it is a combination of two things,  at least for the RTKLIB solutions.  First of all I believe there is a mismatch between how RTKLIB interprets a cycle slip flag and how the cycle slip flag is defined in the Rinex spec.  The problem is that RTKLIB resets the phase bias estimate in the same epoch as the cycle slip is logged regardless of whether the receiver has had time to relock or not.  This can cause large errors in the bias estimates if the receiver flags a cycle slip before it has recovered from it.  In some of my earlier posts I have described having the same problem with the M8T receiver but in that case I have made some changes in the u-blox specific RTKLIB code to delay the cycle slips until the receiver has re-locked.  Something similar may need to be done for other RTKLIB receiver specific code  including the Tersus or it may be possible to modify the main RTKLIB code to better interpret these cycle slip flags.

Maybe more important, though, is the difference in the measurements between the two receivers.  As mentioned before, the M8T receiver has 21 phase measurements all of which can be used for ambiguity resolution while the Tersus has 24 of which only 14 can be used for ambiguity resolution assuming we don’t try and use the GLONASS satellites.  Note, though, that there are only seven different satellite-receiver paths for the Tersus since each satellite is providing two measurements.  This compares to the 21 satellite-receiver paths for the M8T receiver where each satellite only provides a single measurement.  Now imagine that the receivers are under a partial tree canopy and four of the satellites are obstructed for both receivers.   The M8T will lose four measurements and still have 17 to work with but the Tersus receiver will lose 8 measurements and only have six to work with.  This is a significant disadvantage and I suspect can explain a large part of the difference in results.

If I had used a local Tersus base station, then the matched Tersus receiver pair would enable use of the GLONASS satellites for ambiguity resolution.  In the case of four obstructed satellites, the two cases would be much more similar with 17 available measurements for the M8T and 16 for the Tersus.  As more satellites were obstructed the M8T would start to gain a bigger advantage since the Tersus would lose two measurements for each obstructed satellite and the M8T would only lose one.  Of course the M8T would tend to have more obstructed satellites than the Tersus since it has more satellites to start with that can be obstructed.  That would work in favor of the Tersus receiver.  It’s hard to say which would give a better solution but my suspicion would be that if the cycle slip handling issue in RTKLIB was fixed the two solutions would be fairly similar when calculated with RTKLIB.  I don’t know enough about the internal Tersus RTK engine to predict how it would do.  Hopefully I can get my hands on a second full dual frequency receiver and run this experiment soon.
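The counting argument in the last two paragraphs is simple enough to write down. The Python sketch below just reproduces that arithmetic. It assumes, as the text does, that every dual frequency satellite contributes exactly two measurements and that every obstructed satellite was one being used for ambiguity resolution; the function name is made up for illustration.

```python
def ar_measurements_left(total, per_satellite, obstructed):
    """Phase measurements still usable for ambiguity resolution
    after `obstructed` satellites are blocked."""
    return max(total - per_satellite * obstructed, 0)

# (total AR measurements, measurements per satellite), taken from the text
cases = {
    "M8T pair, single frequency": (21, 1),
    "Tersus/CORS, GPS only": (14, 2),
    "Tersus pair, GPS+GLONASS": (24, 2),
}

for name, (total, per_sat) in cases.items():
    print(name, ar_measurements_left(total, per_sat, obstructed=4))
```

With four obstructed satellites this gives 17, 6 and 16 measurements respectively, matching the numbers quoted above.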

Although I ran this experiment at a random time without looking at the satellite alignment first, it may be that the satellite alignment was such that it accentuated this effect.  Note in the observations (Tersus on the top, M8T on the bottom) that the Galileo (Exx) and SBAS (Ixx) satellites have fewer cycle slips than any of the other satellites.

ter_kin7

Looking at the skyplot for those observations we see that three of the four Galileo satellites are at very high elevations which will tend to be blocked less from nearby trees. This would have helped the M8T solution since the Tersus receiver did not have access to these high elevation satellites.

ter_kin8

I will try to summarize what I think this data suggests but let me first emphasize that this is by no means intended to be any sort of rigorous analysis.  I don’t have the time, resources or knowledge to do that.  Instead, please take these as no more than the sharing of my thought process as I try to understand some of the differences between single and dual frequency RTK solutions.

Rover to CORS or other traditional dual frequency receiver:  Tersus has a significant advantage over the M8T both because of more matched measurements and opportunities to take advantage of the nature of the dual frequency measurements.  This advantage applies both to the RTKLIB solution and the Tersus solution although I suspect the Tersus solution takes better advantage of the dual-frequency measurements.  The advantage also increases as the baseline increases.

Matched pair of receivers with short baseline:  Good results with the RTKLIB solution will be limited to low stress environments for a pair of Tersus receivers because of limitations in the cycle slip flag handling.   With the M8N and M8T, RTKLIB can also handle moderate stress environments because of receiver specific changes in the RTKLIB cycle slip handling code.   Relative to a Tersus/CORS combination, the M8T matched pair solution will in general be superior for short baselines because of more matched measurements.

Matched pair of receivers with long baseline:  The data in this experiment doesn’t cover this case but as the baseline increases the dual frequency receiver pair should have a greater advantage because of the additional information that can be derived from the dual frequency measurements.

From a cost trade-off perspective, this suggests that the ideal way to combine these receivers might be to build the base with both an M8T single frequency receiver and a Tersus dual frequency receiver, both sharing a single antenna.  The rover would then be a second M8T receiver.  This would give the advantage of the dual frequency receiver for locating the absolute position of the base using long baseline solutions to distant reference stations or even PPP solutions while taking advantage of the matched pair of lower cost receivers for the moving rover piece of the solution.

 

RTKLIB on a drone with u-blox M8T receivers

Drones are a popular application for RTKLIB and quite a few readers have shared their drone-collected data sets with me, usually with questions on how they can get better solutions. In many cases, the quality of this data has been fairly poor and it has been difficult to get satisfactory results. I was curious to understand why this environment tends to be so challenging since in theory a drone should have more open skies than just about any other application.

To do an experiment, I bought an inexpensive consumer drone from Amazon. I chose the X8C from Syma since it is a beginner model and a little larger than some options. I figured the larger size should make it better able to carry some extra weight.

After a few practice flights to get the hang of flying it, I used some duct tape and double-sided foam adhesive to attach a u-blox antenna and 90 mm diameter ground plane to the top of the drone and a u-blox M8T receiver with my custom CHIP data logger underneath where the camera usually goes. I used the landing gear as a spool to wind the unnecessary five meters of antenna cable which was the heaviest part of the whole setup. From a weight perspective, the Emlid Reach units would have been a better choice, but I wanted to collect data from the Galileo constellation of satellites as well as GPS and GLONASS so I used my CSG receiver with the newer 3.0 firmware. I used a second CSG receiver mounted on top of my car as the base station.  Here’s a stock photo of the drone on the left and after my modifications on the right.

drone1 drone2a

Although the drone was able to lift the extra weight fairly easily, it seemed to affect the stability of the flight control system and after a few minutes the prop motors would start to fight each other. At that point the drone would start to descend even at full throttle and the drone would land hard enough to usually bounce on its side or back. Still I was able to make a number of short flights which were adequate for testing purposes.

Here’s the observation data for the first set of flights, base station on the left and drone on the right. Red ticks are cycle-slips and gray ticks are half-cycle ambiguities. Ideally, the drone data would look as clean as the base but as you can see it is significantly worse and it turned out to be unusable for any sort of reliable position solution.  The regions without cycle-slips in the drone observations are the times in between flights in which the drone is sitting on the ground.

drone3

Clearly, while the drone is flying, something is interfering with the GPS receiver or antenna, most likely either EMI or mechanical vibration. I could have used a fancy test stand and RF sniffer to try and locate the source of interference but since this blog is focused on low-cost solutions I just used some duct tape, a steel bar, and the RTKLIB code instead.

I used two types of duct tape, both the polyester/fabric type that everyone calls duct tape, and also the metal foil type that is actually used to repair or install ducts. I first used the non-metal duct tape to securely attach the landing gear to the heavy steel bar. The steel bar was convenient because it was easy to attach but anything heavy enough to prevent the drone lifting off under full throttle would work fine.

I then started an instance of RTKNAVI on my laptop and connected it to the receiver on the drone.  The goal was to simulate a complete drone flight while the drone was sitting on the ground and at the same time watch the RTKNAVI observations to detect any degradation of the measurements.  I used a wireless connection but a USB cable would have worked too.

Unfortunately RTKNAVI won’t plot the observation data real-time, but by selecting the tiny “RTK Monitor” box in the bottom left corner of the main RTKNAVI screen, then choosing “Obs Data” from the menu, I was able to get a continuously updating listing of the observations.  Cycle-slips show up as non-zero values in the first column with the I heading. I chose a location outdoors with open enough skies that any degradation in the observation data would be obvious.

drone4

I first observed the cycle-slip column with the drone powered down to verify I wasn’t getting cycle-slips on any but the lowest elevation satellites. I then continued to observe the cycle-slip column while sequencing through the steps required to fly the drone. I first powered on the drone, then powered on the transmitter, then issued the calibration/connection sequence, then turned on the throttle to low. So far, so good, no sign of cycle-slips. I then started moving the joysticks to issue steering commands which caused the motors to change speeds. All of a sudden I started getting cycle-slips; the more aggressive the steering commands, the more cycle-slips I saw. Aggressive changes in throttle also caused cycle-slips but full throttle with no adjustments or steering commands was fine.

Next I moved just the antenna, then just the receiver away from the drone while issuing steering commands. Moving the antenna away had no effect but moving the receiver away eliminated the cycle-slips.

At this point my guess was that the interference was coming from the relatively high power switching in the motor control circuits and that the antenna ground plane was shielding the antenna from this interference but nothing was shielding the receiver. To test this theory, I attached a layer of the metal duct tape to the bottom of the drone to act as a shield between the drone controller board and the receiver.  I then re-attached the receiver to the bottom of the drone and re-ran the experiment. This time there were almost no cycle-slips even with the most aggressive steering.

I then removed the steel bar and ran a second set of short flights with the layer of metal tape still in place. I was a little concerned that the new shield would interfere with commands sent from the transmitter to the drone so I first tested everything while still on the ground and then kept the drone fairly close during the flight. Fortunately I didn’t see any sign of commands not getting through.

The drone data looked much cleaner in this flight!  Unfortunately, this time the base data was no good with many simultaneous cycle-slips throughout the observation data. At this point I realized that I had placed the base station receiver directly on the top of the car when collecting the data which was very hot at the time. Usually I keep the receiver in the car to avoid this and only place the antenna on the roof. I have seen this kind of severe temperature effects cause simultaneous cycle-slips before but never to this extent. Again the data was completely unusable.

So, back out there again for a third round of flights. This time, everything looked much better. I still saw a few cycle-slips, especially when first applying the throttle at take-off, so my shielding was not perfect but a dramatic improvement over the first flight. The plots below show the results. The top two plots are position solutions for the z-axis. The top plot is with continuous ambiguity resolution and the middle plot is with fix-and-hold enabled. The bottom plot is the drone observation data.

drone5

I made three adjustments to the input configuration file from what I would normally use for my car based measurements.  First of all, since the drones have very open skies, I adjusted the minimum elevation angles from 15 degrees to 10 degrees.   Then, after plotting and observing the accelerations from an initial solution, I increased the vertical acceleration dynamics estimate (stats-prnaccelv) from 0.25 to 1.0.  Finally, because I was seeing slightly higher position variances in the initial solution than I usually do, I adjusted the position variance AR threshold (pos2-arthres1) from 0.004 to 0.1.  Both of these last two changes would make sense if the level of vibration were higher in the drone than I am used to seeing, which is quite likely.
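For anyone who wants to apply the same three changes to an existing configuration file without hand editing, here is a small Python sketch that rewrites `key =value` lines in the text of a config file. It assumes each option sits on its own line with an optional trailing `#` comment. The text above names `stats-prnaccelv` and `pos2-arthres1`; `pos1-elmask` is my assumption for the elevation mask option, and the function name is hypothetical.

```python
def apply_overrides(conf_text, overrides):
    """Return conf_text with matching `key =value` lines updated.

    Lines without '=' and full-line comments pass through unchanged.
    Trailing '# ...' comments on updated lines are preserved.
    """
    out = []
    for line in conf_text.splitlines():
        body, sep, comment = line.partition("#")
        key, eq, _ = body.partition("=")
        name = key.strip()
        if eq and name in overrides:
            line = f"{name} ={overrides[name]}"
            if sep:
                line += f"  #{comment}"
        out.append(line)
    return "\n".join(out)

# Values from the text; only the first option name is an assumption.
drone_overrides = {
    "pos1-elmask": "10",       # assumed option name; 15 -> 10 degrees
    "stats-prnaccelv": "1.0",  # was 0.25
    "pos2-arthres1": "0.1",    # was 0.004
}
```

Reading the file, calling `apply_overrides(text, drone_overrides)` and writing the result back leaves every other option untouched.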

Each time the drone landed/crashed due to the unstable flight control system it would bounce to the side or upside-down and that is what is causing the cycle-slips and switch from fix to float at the end of each flight. As you can see though in every case I quickly get another fix after I put the drone upright again. The fixes are solid enough to hold through the entire flight even in continuous mode for all but one of the flights. With fix-and-hold enabled all flights maintained 100% fix rate. The data is as good as or better than similar experiments where I have mounted the rover on top of a car.

This is not surprising since the skies are more open in this experiment. Having over twenty satellites available for ambiguity resolution also helped. I used all the satellites (GPS/GLONASS/Galileo/SBAS) for ambiguity resolution and took advantage of the new feature in the demo5 b26 code that cycles through all the satellites and will throw a single one out if it is preventing a fix. This will automatically occur anytime the number of satellites available for ambiguity resolution is greater than the config parameter “pos2-mindropsats” which defaults to twenty.
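The idea of throwing out a single satellite that is preventing a fix can be sketched roughly as follows in Python. This is only an illustration of the concept, not the demo5 code: the real implementation may spread the trials over several epochs rather than looping inside one, `ar_ratio` is a stand-in supplied by the caller for the actual integer ambiguity search, and the default ratio threshold of 3.0 is an assumption.

```python
def try_fix_with_exclusion(sats, ar_ratio, threshold=3.0, min_drop_sats=20):
    """Try ambiguity resolution, dropping at most one satellite.

    sats:          list of satellite ids available for AR
    ar_ratio:      caller-supplied function(list of sats) -> AR ratio
                   (hypothetical stand-in for the real ratio test)
    min_drop_sats: exclusion is only attempted with more satellites than
                   this, mirroring the role of pos2-mindropsats in the text
    Returns (ratio, excluded_sat); excluded_sat is None if nothing was dropped.
    """
    ratio = ar_ratio(sats)
    if ratio >= threshold or len(sats) <= min_drop_sats:
        return ratio, None
    for sat in sats:
        trial = [s for s in sats if s != sat]
        trial_ratio = ar_ratio(trial)
        if trial_ratio >= threshold:
            return trial_ratio, sat   # this one satellite was blocking the fix
    return ratio, None                # no single exclusion was enough
```

Requiring more than `min_drop_sats` satellites before trying exclusions keeps the filter from weakening a solution that is already short on measurements.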

I have added the raw data and the configuration file to the  sample data set section at rtkexplorer.com

I imagine different drones will have different issues and not all will be as easy to fix as this one, but the method described here or something similar should be helpful any time drone data is not looking as clean as the base station data.

The fix I chose was very easy to implement but a better fix would probably have been to wrap just the receiver in a shield rather than placing a shield between the control board and the receiver. This would protect the receiver better and avoid affecting commands sent from the transmitter.  In fact, based on these results, I suspect shielding the GPS receiver on a drone is always a good idea.

Receiver warm-up glitches

I’ve described before the occasional glitches that both the M8N and M8T seem to be susceptible to in their first few minutes of operation, but my previous description was buried in one of my more technical posts and maybe not seen by people more interested in just the practical side of using RTKLIB, so I thought it was worth bringing them up again.

Here is an example of one of these glitches which was in a data set recently sent to me by a reader, and one that was giving him trouble finding a solution.  The data is very clean, except for a nearly simultaneous cycle-slip (shown by red ticks) on every satellite.

rec_glitch

Here is a zoomed in image of the same glitch.

rec_glitch2

I see these glitches on both the M8N and the M8T receivers.  In every occurrence I have seen, the glitch occurred within a few minutes of turning on the receiver, and was present on every satellite.  In this example it occurred seven minutes after starting up; usually I see it within the first five minutes.

These glitches are very disruptive to the RTKLIB solution.  Since the cycle-slips affect every satellite, all the phase-bias kalman filter states are reset and the solution has to start again from the beginning.  In some cases, the phase-biases initial values may have larger than normal errors in which case it is even worse than starting over.
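If you want to check a data set for this kind of glitch without eyeballing the plot, a simple screen is to look for epochs where nearly every tracked satellite reports a cycle slip at once. The Python sketch below does that on a hypothetical per-epoch layout (a dict of satellite id to slip flag); the function name and both thresholds are arbitrary choices of mine. Since the slips in a real glitch are only nearly simultaneous, a more careful version would look across a few adjacent epochs.

```python
def find_glitch_epochs(epochs, min_fraction=0.9, min_sats=4):
    """Return indices of epochs where nearly all satellites slipped.

    epochs:       list of dicts mapping satellite id -> slip flag (bool)
                  (hypothetical layout)
    min_fraction: fraction of satellites that must slip together
    min_sats:     ignore epochs with too few satellites to be meaningful
    """
    hits = []
    for i, epoch in enumerate(epochs):
        if len(epoch) < min_sats:
            continue
        slipped = sum(1 for slip in epoch.values() if slip)
        if slipped / len(epoch) >= min_fraction:
            hits.append(i)
    return hits
```

A single satellite slipping behind a tree will not trigger this, but a receiver-wide event will.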

I don’t have any good suggestions on how to deal with these other than to avoid them in the first place.  From my experience I believe they are more likely to occur if the external environment of the receiver has just changed.  For example if it went from hot to cold, or into the sun.  Once the receiver has had time to stabilize, everything is usually OK.

Giving the receiver time to adapt to its current environment before collecting data and protecting the receiver from sudden changes should help avoid these glitches.  Using external antennas with cables rather than the small antennas that come with the receivers helps because it allows you to place the receiver in a more protected location than the antenna.  For example, when I collect data from a moving car, I place the antenna on the roof but keep the receiver in the car.

For information on plotting the observations with cycle slip enabled see this post.  For another post where I discuss this problem in more detail, see this post.

Does anyone else have more information on what causes these glitches and maybe other steps that can be taken to avoid or deal with them?

 

 

AR Filter: An RTKLIB cycle-slip enhancement

Some of you may remember that one of the first code changes I made to RTKLIB was fixing a bug in the arlockcnt feature. Arlockcnt is an input parameter that specifies how many samples of delay occur before a new satellite (or a satellite that just recovered from a cycle-slip) is used for ambiguity resolution. Holding off use of the new phase-bias estimate from the kalman filter until it has had enough time to converge prevents corruption of the ambiguity resolution integer set. This in turn prevents a loss of fix.

Although waiting a fixed number of samples is fairly effective, it is not an optimal solution. Ideally we would use information from a new satellite as soon as it was converged and not after a fixed amount of time since some satellites will converge faster than others. When your data looks like this one, then every additional sample you can process is going to help.

arfilt0

This is what the AR filter attempts to do. In the current code implementation, a new satellite is unconditionally added to the integer ambiguity set when the arlockcnt expires regardless of the effect it has on the AR (ambiguity resolution) ratio. This means that the arlockcnt must be set conservatively, to ensure the slowest satellite has converged, and means that most satellites will not be used for ambiguity resolution as early as they could be. In the case of frequent cycle-slips, this could mean loss of fix from having too few satellites available or it could mean a false fix since fewer satellites give a less robust ambiguity resolution.

When the AR filter is enabled, a new satellite is still added to the integer ambiguity set when the arlockcnt expires but the effect of adding each new satellite is evaluated and if it causes a significant degradation in the AR ratio, that satellite’s use in ambiguity resolution will be delayed for a few more samples before being re-evaluated. Exactly how to define “significant degradation” is a bit subjective. I have chosen to disqualify a new satellite if it causes the AR ratio to drop below the AR fix threshold or if it drops by more than a factor of two and the result is within 10% of the AR fix threshold. Exactly how long a satellite should be delayed is also subjective. I chose to delay a disqualified satellite for five samples plus a stagger of one sample for additional satellites. If two satellites are disqualified on the same sample, it could be either satellite or both that caused the disqualification. By adding a stagger to the delay for the second satellite, they will be re-evaluated independently on different samples.
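To make the disqualification rule above concrete, here is a minimal Python sketch of the decision logic. It is not the RTKLIB C code. The function names are hypothetical, and reading “within 10% of the AR fix threshold” as “no more than 10% above the threshold” is an assumption made for illustration.

```python
def should_disqualify(ratio_before, ratio_after, fix_thresh):
    """Return True if adding the new satellite degraded the AR ratio too much.

    ratio_before: AR ratio without the new satellite
    ratio_after:  AR ratio with the new satellite included
    fix_thresh:   AR ratio threshold required to declare a fix
    """
    # Case 1: the new satellite pushes the ratio below the fix threshold.
    dropped_below = ratio_after < fix_thresh
    # Case 2: the ratio more than halves AND lands close to the threshold
    # (assumed here to mean less than 10% above it).
    halved = ratio_after < ratio_before / 2.0
    marginal = ratio_after < 1.1 * fix_thresh
    return dropped_below or (halved and marginal)


def retry_delays(num_disqualified, base_delay=5, stagger=1):
    """Delay in samples for each disqualified satellite.

    The stagger makes satellites rejected on the same sample come up
    for re-evaluation on different samples.
    """
    return [base_delay + i * stagger for i in range(num_disqualified)]
```

With a fix threshold of 3.0, a satellite that drops the ratio from 10.0 to 3.2 would be held back under this reading, while one that drops it from 10.0 to 4.0 would be kept.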

To evaluate the change, I ran two solutions on the Ublox M8T data from my previous series of “M8N vs M8T” posts. This is my most challenging data set from a cycle-slip perspective. The solution below on the left is with arlockcnt=0 and AR filter disabled. The solution on the right is with arlockcnt=0 and AR filter enabled. As always, the yellow represents a float solution, and the green, a fixed solution. As you can see, enabling the AR filter significantly improved the number of fixes. Normally I would not set arlockcnt to zero if the AR filter was not enabled; this was for comparison purposes only.

arfilt1

As you would expect, with the AR filter disabled, increasing arlockcnt from 0 to 75 samples (15 sec) improves the solution for this data set as shown below but it still loses fix relatively often compared to the solution above with the AR filter enabled.

arfilt2

The plot below compares the number of satellites available for ambiguity resolution between the “arlockcnt=75/filter off” solution and the “arlockcnt=0/filter on” solution. Notice that we have significantly reduced the number of samples with fewer than 10 satellites available for AR by enabling the AR filter. More satellites should mean less chance of losing fix and also less chance of a false fix.

arfilt3

In this example, the accuracy of the fixed solution points did not seem to be noticeably affected by enabling the AR filter. As usual, I evaluate accuracy by comparing the receiver position relative to the position of a second receiver mounted on the same rover, both relative to the same base receiver. The difference between the two rovers should be a perfect circle, so any errors will appear as deviations from the circle. Plotting for only the fixed points, the “arlockcnt=75/filter off” solution is on the left and the “arlockcnt=0/filter on” solution on the right. In both cases the errors appear to be very similar and within a few centimeters. This probably makes sense since the same satellites were used to calculate position in both cases; it was only the ambiguity resolution that differed. Any advantage from having more satellites in the AR calculations could be offset by the fact that the additional satellites were probably noisier since they may not have been fully converged. Also, the plot on the left does not include many of the points on the right, since samples without a fix are not included.

arfilt4

I actually created the AR filter feature quite a while ago but never got around to describing it or even fully testing it by reducing arlockcnt to zero. I have now done that, and made some small improvements to it in the last few days. I have updated my Github repository and executables folder with the latest version.

That pretty much completes my general explanation of this feature but there are a few details to be aware of if you are interested in trying it out yourself.

First of all, enabling the AR filter will slightly increase the code execution time since if a satellite is rejected, the ambiguity resolution has to be re-run without the rejected satellite. The difference is small enough however that I don’t think it will be an issue in the vast majority of cases.

The second thing to understand is how the arlockcnt interacts with the half-cycle valid bit. A typical cycle-slip (at least on a Ublox receiver) looks like the plot below. There is usually a gap of no data, then a cycle-slip (red tick), then a number of half-cycle invalid samples (gray tick), then a final cycle-slip. The second cycle-slip is actually not reported by the receiver, but is added during the translation from raw data to RINEX format when the half-cycle valid bit transitions. Any time the half-cycle status is invalid, that satellite will not be used for ambiguity resolution regardless of the arlockcnt. The arlockcnt will be reset by the second cycle-slip and count from there. So, in this example, if arlockcnt were set to 10, all the samples from the beginning of the gap until 10 samples after the second cycle-slip will be ignored for ambiguity resolution.

arfilt5
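The interaction between arlockcnt and the half-cycle bit described above can be sketched in a few lines of Python. This is a simplified model, not the RTKLIB implementation. The state labels and the exact counting convention (the slip sample resets the count to zero, and a satellite becomes usable once more than arlockcnt good samples have followed it) are assumptions.

```python
def usable_for_ar(samples, arlockcnt):
    """Mark which samples of one satellite may be used for ambiguity resolution.

    samples: list of per-sample states, each one of
             'ok'   - valid phase, half-cycle resolved
             'gap'  - no data
             'slip' - cycle-slip flagged on this sample
             'half' - half-cycle invalid
    """
    count = arlockcnt + 1  # assume the satellite starts out converged
    usable = []
    for state in samples:
        if state == "slip":
            count = 0                  # every slip restarts the lock count
            usable.append(False)
        elif state in ("gap", "half"):
            usable.append(False)       # never used, whatever arlockcnt is
        else:
            count += 1
            usable.append(count > arlockcnt)
    return usable
```

Feeding it the typical pattern (gap, slip, a run of half-cycle invalid samples, second slip) shows that the satellite stays out of ambiguity resolution from the start of the gap until arlockcnt samples after the second slip.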

The last thing to mention is that one of the recent code changes I referred to above was to add a pseudo half-cycle invalid bit for the SBAS satellites for the M8T receiver. For some reason, the Ublox receivers don’t seem to report the half-cycle status for the SBAS satellites. The change I made was in the raw data to RINEX translation where I set the half-cycle invalid bit for a fixed delay after a cycle-slip on a SBAS satellite.  This makes cycle-slips on the SBAS satellites look very similar to the rest of the satellites.  I had previously done this for the M8N receiver and that change has been migrated to the release code, but I hadn’t got around to doing it for the M8T. This attempts to avoid the half-cycle uncertainties from possibly causing a false fix if the SBAS satellites were used too early for ambiguity resolution.

Ublox M8N vs M8T: Part 3

 

This is the third part in a series of posts comparing data from a pair of Ublox M8T receivers with data from a pair of Ublox M8N receivers. In the first post I identified a problem with erroneous fixes in the RTKLIB solution for the M8T data. In the second post I found that the source of the problem appeared to be in the handling of cycle slips in the translation from the receiver binary output to the text input used by RTKLIB. I also showed that adopting an algorithm more similar to the one used to translate the M8N binary improved but did not entirely fix the problem. In this post I will show that if the cycle slip translation is more precisely adapted to the differences in binary output between the M8N and the M8T then we can virtually eliminate the erroneous fixes and get an M8T solution that is now slightly more accurate than the M8N solution.

One thing I may not have made clear in the previous posts is the way I am collecting and processing the data from the Emlid Reach M8T receivers. I am using their ReachView software to collect the raw GPS data but all the processing is done afterwards using my own demo4 version of RTKLIB. I don’t have access to Emlid’s RTKLIB source code and don’t know if they have made any changes that might have improved this issue in their own real-time version of RTKLIB. So this is more a comparison between the M8N and the M8T receivers using my version of RTKLIB and not an evaluation of anything specific to the RTKLIB version of Reach.

Last post I finished with this plot showing the difference between the M8N solution and the M8T. For reasons I’ve explained before, we know this should be a perfect circle with a radius equal to the distance between the receivers.

walker12

By looking at spikes in the z-axis accelerations we were able to show that the deviations from the circle in the above plot were due to errors in the M8T solution and not the M8N solution. This plot was taken after I had modified the binary to text translation in RTKCONV to make the M8T translation more similar to the M8N translation. Before that, the errors were much larger.

Let’s start by taking a closer look at the binary outputs of the two receivers. The details for the M8T are available in the M8 receiver spec and for the M8N in this document from Tomoji Takasu but I will summarize them here.

Both receivers provide three values for each satellite output sample that can be used to evaluate the quality of the carrier phase measurement. The first is the carrier-phase valid bit. Both receivers provide this in the tracking status byte. It indicates that the receiver is currently locked to the carrier-phase for this satellite. Note that it does not tell us if the receiver has been continuously locked since the previous output sample and it also does not indicate anything about the quality of the measurement.

The second output (locktime or lock2) indicates how long the receiver has been locked to the carrier-phase for this satellite. If this count is lower than the count for the previous sample, then an unlock, i.e. cycle-slip, has occurred. Both receivers also provide this value, and this is what RTKCONV uses to detect a cycle-slip.

The third value (mesQI or cpStdev) is a quality indicator and this is where the receivers differ. On the M8N this appears to be more of a state indicator than a true quality indicator. The comments indicate that its value corresponds to:

0:idle
1:search
2:acquired
4-7:lock

For the M8N, the RTKCONV code throws out any phase measurement for which this quality indicator is not between 4 and 7.

On the M8T, the quality indicator is actually a numeric estimate of the standard deviation of the carrier phase measurement. If the value is between 1 and 7, then the estimate of standard deviation is this value multiplied by 0.004 meters. If this value is 15, then there is no valid estimate. This value is currently ignored by the RTKCONV code and that is the root of the problem.
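As a compact restatement of the two quality indicators, here is a small Python sketch. The function and constant names are made up for illustration. Only the values spelled out in the text are handled (4 to 7 for the M8N; 1 to 7 and 15 for the M8T), and treating every other M8T value as “no estimate” is an assumption.

```python
CPSTD_SCALE_M = 0.004  # scale factor quoted above, per count


def m8n_phase_usable(mes_qi):
    """M8N: keep the phase measurement only in the 'lock' states (4-7)."""
    return 4 <= mes_qi <= 7


def m8t_phase_std_m(cp_stdev):
    """M8T: convert the raw indicator to an estimated standard deviation.

    Returns None when there is no valid estimate (15), and also, as an
    assumption, for any value the text does not describe.
    """
    if 1 <= cp_stdev <= 7:
        return cp_stdev * CPSTD_SCALE_M
    return None
```

For example, a raw M8T value of 4 corresponds to an estimated standard deviation of 0.016 under this scaling.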

Before discussing how to modify the code to incorporate this information, we need to understand exactly how the cycle-slip bit is interpreted by RTKLIB. On a sample for which this bit is set, RTKLIB will reset the phase-bias state for that satellite based on that carrier-phase measurement. Note that the reset is done using the measurement from the current sample, not the following sample. What this means is that the cycle-slip bit should not be used to indicate that there is a cycle-slip on the current sample. Rather it should indicate that there has been a cycle-slip since the last good measurement and that the quality of the current sample is now high enough to use it to reset the phase-bias state.

So how high should the quality indicator be before resetting the phase-bias? It will be a trade-off between time and accuracy. I chose the value of four since it’s in the middle of the scale and somewhat equivalent to what is used on the M8N, but it could be argued that this value should be higher or lower. Forcing the cycle-slip to be set on a good sample also eliminated the problem we previously saw with samples subsequent to the cycle-slip not being flagged, so I was able to remove the code I added in the previous post.

Here are the changes I made to the original code for the M8T receiver. The new variable cpstd is the standard deviation of the carrier phase measurement as described above. The constant STD_SLIP is the quality threshold described above and is set to 4.

walker13
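For readers who cannot see the code image, the idea can be sketched roughly in Python. This is an illustration of the approach described above, not the actual change to RTKLIB. The names other than cpstd and STD_SLIP are hypothetical. The sketch assumes that a smaller cpstd means a better measurement, so a sample qualifies when cpstd is at or below STD_SLIP, and that phase samples failing the check are simply not used.

```python
STD_SLIP = 4  # quality threshold from the text


def flag_m8t_samples(samples, std_slip=STD_SLIP):
    """Decide, sample by sample, whether to use the phase and flag a slip.

    samples: iterable of (phase_valid, lock_time, cpstd) for one satellite.
    Returns a list of (use_phase, slip_flag) tuples.
    """
    pending_slip = False  # a slip has occurred since the last good sample
    prev_lock = 0
    out = []
    for phase_valid, lock_time, cpstd in samples:
        # A lock time of zero, or one that went backwards, means a slip.
        if lock_time == 0 or lock_time < prev_lock:
            pending_slip = True
        prev_lock = lock_time

        if phase_valid and cpstd <= std_slip:
            # Good enough to reset the phase-bias: report the slip here.
            out.append((True, pending_slip))
            pending_slip = False
        else:
            # Poor sample: hold the slip until quality recovers.
            out.append((False, False))
    return out
```

The point of holding the slip in pending_slip is that the flag lands on the first good sample after the slip, which is the sample RTKLIB will use to reset the phase-bias state.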

Here is the position solution calculated with the above code change.

walker14

Here is the difference in position between the two solutions. As you can see, the deviations from the circle have been significantly reduced.

walker15

The noise in the z-axis accelerations has also been significantly reduced and is now slightly lower than in the M8N data.

M8N (left), Reach M8T (right)

walker16

I have posted the raw data and config files to my data set library in the niwot2_car folder and the code changes to my Github page.

Ublox M8N vs M8T: Part 2

In my last post, I compared data from a pair of Ublox M8N receivers with data from a pair of Emlid Reach M8T based receivers, and found issues in both data sets. In this post I will look into those issues more closely.

To do this, I collected another data set with a few differences from the previous one.

  • On the M8N rover receiver, I replaced the original antenna that had only a 1” cable with a Ublox ANN-MS-0-005 antenna with a much longer cable. This allowed me to move the receiver from the car roof to inside the car, a friendlier thermal environment. The goal was to see if this would avoid the all-satellite cycle-slips I saw last time at the beginning of the data set. This antenna cost me $20 from CSG but is available from other places for less.
  • On both Reach M8T receivers, I modified the dynamic platform model setting from “Airborne < 4g” to lower acceleration settings; “Pedestrian” for the rover, and “Stationary” for the base. I also changed the setting on the M8N base receiver from “Pedestrian” to “Stationary” to make them consistent. The goal here was to avoid the erroneous fixed solution points I saw in the solution from the previous M8T data. To ensure the modifications occurred correctly, I used the u-center software to read back the settings from the receivers after the data was taken.
  • I chose a measurement route with much more tree cover than the previous one to increase the difficulty of the solution and to help differentiate the two receiver sets.

Here is a Google map of the measurement route. In this area there are few native trees, so by driving through residential areas, rather than next to them as I did in the previous data set, I encountered many more trees. There were numerous locations with tall trees on both sides of the road as well as places where I drove directly under large tree branches.

walker1

Here’s an example of some of the more challenging tree cover encountered in the route. The route locations are only approximate since the base locations are not calibrated, that is why the lines are not actually on the road.

walker2

Here is the observation data from both rovers. The red ticks are cycle-slips, the gray ticks are half-cycle invalids.

M8N (left), Reach M8T (right)

walker3

The first thing to notice is that there are no simultaneous cycle-slips in the M8N data. Of course, this doesn’t prove we will never get them but it suggests that maybe moving the receiver inside the car did help. Unfortunately, there is one simultaneous cycle-slip in the M8T data very close to the start at 00:54. Apparently, both the M8N and the M8T are susceptible to these slips, which is not surprising, since the chips are part of the same family. We do see only a single slip, which is an improvement over the previous data. Since they always occur near the beginning of the data collection, I still believe they are most likely caused by thermal transients. I suspect that the best way to avoid them would be to turn on the receivers 15 minutes before starting to collect data in an environment as similar as possible to the data collection environment.

Next, let’s look at the SNR vs. elevation plots. Last time we saw noticeably better numbers with the Reach setup because of the more expensive antenna. This time, with the Ublox antenna on the M8N receiver, the two are much more similar. SNR isn’t everything, and there still are reasons why the more expensive Reach antennas are likely better than the Ublox antennas, but we’ve at least closed the gap some between them.

M8N (left), Reach M8T (right)

walker4

Here are the position solutions from both receiver sets.

M8N (left), Reach M8T (right)

walker5

The M8N solution is very good, with nearly 100% fix after the initial acquire until the very end when I parked the car under some trees in the driveway. This is despite a very challenging data set with many cycle-slips.

Unfortunately, the M8T solution is not as good. The initial acquire is delayed because of the simultaneous cycle-slip I mentioned earlier. It does eventually acquire, but then loses fix for quite a long period near the end of the data (the yellow part of the line in the plot above).

Even more concerning, when we look at the acceleration plots, we see the same spikes in the M8T data as we did in the previous data set.

M8N (left), Reach M8T (right)

walker6

Again, these spikes align with erroneous fixed solution points in the M8T data as can be seen in deviations from the circle in the difference between the two solutions.

walker7

Clearly, changing the dynamic platform model setting in the receiver did not fix the problem. A couple of experienced users have commented that this setting does affect the front end of the receiver on earlier Ublox models and it’s still possible it affects the M8T as well, but it does not appear to be the cause of this problem. We will need to look elsewhere for the solution.

We are running identical RTKLIB solution code on the two data sets and we have verified that the receiver setup is nearly identical for both data sets. So what else can be different between them? One possibility is the conversion utility, RTKCONV, that translates the raw binary output from the receiver into the RINEX observation files that are used as input to the solution. Since the raw measurements are output by different commands on the two receivers, there are two different functions in the RTKCONV code that process them.

Let’s look first at the RTKCONV code to convert the data from the UBX_TRK_MEAS command used by the M8N receiver. I won’t show the code here but just describe functionally what happens in the code. Each data sample from the receiver contains a status bit indicating carrier-phase lock and a count of consecutive phase locks. RTKCONV sets the cycle-slip flag for that sample if the carrier-phase is valid and an internal code flag is set. If the carrier-phase is not valid, then the cycle-slip flag is left in its previous state. The internal code flag is set if the phase lock count is zero or less than the phase lock count for the previous sample and is only reset for the next sample if the carrier-phase is valid.

For the UBX_RXM_RAWX command used by the M8T receiver, there is also a status bit indicating carrier-phase lock. Instead of a count, there is a time for consecutive phase locks but functionally it is equivalent. The RTKCONV code for this command does not use an internal code flag and the cycle-slip flag is set or cleared directly every sample that the carrier-phase is valid. The cycle-slip flag is set if the phase lock time is zero or less than the previous sample and cleared otherwise.

I know that’s a bit confusing and the difference is fairly subtle. The most important thing to understand is that RTKCONV is doing more than just translating from binary to text, it is deciding which samples to set cycle-slips for based on a somewhat complicated algorithm, and that algorithm is different for M8N and M8T.
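Since the prose description is hard to follow, here is a minimal Python model of the two flagging schemes as described in the two paragraphs above. It is a functional sketch, not the RTKCONV source. The function names are hypothetical, and updating the “previous lock” value on every sample, valid or not, is an assumption.

```python
def m8n_style_flags(samples):
    """UBX_TRK_MEAS-style logic: an internal flag remembers a pending slip.

    samples: list of (phase_valid, lock_count) for one satellite.
    Returns the cycle-slip flag state after each sample.
    """
    pending = False   # internal code flag
    slip = False      # cycle-slip flag, held while the phase is invalid
    prev_lock = 0
    flags = []
    for phase_valid, lock in samples:
        if lock == 0 or lock < prev_lock:
            pending = True
        prev_lock = lock
        if phase_valid:
            slip = pending
            pending = False   # only cleared once a valid phase has seen it
        flags.append(slip)
    return flags


def m8t_style_flags(samples):
    """UBX_RXM_RAWX-style logic: no internal flag, set or clear directly."""
    slip = False
    prev_lock = 0
    flags = []
    for phase_valid, lock in samples:
        if phase_valid:
            slip = lock == 0 or lock < prev_lock
        prev_lock = lock
        flags.append(slip)
    return flags
```

Running both on a slip that is followed by an invalid phase, for example `[(True, 10), (False, 0), (True, 1), (True, 2)]`, flags the third sample in the first scheme but flags nothing in the second.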

Basically, the effect of this difference is that for the M8N, if a cycle-slip is followed by an invalid phase then the next valid phase will always be flagged as a cycle-slip while for an M8T it won’t necessarily be so. It’s easier to understand by looking at a picture. The observation plots below show the location of one of the acceleration spikes in the M8T solution which is caused by a cycle-slip on satellite G05. The plot on the left shows which samples were flagged as cycle-slips with the existing code, the plot on the right shows the cycle-slips after modifying the code to be functionally equivalent to the M8N code. Note the extra two cycle-slips with the change.  The extra cycle-slips in this case are a good thing because they are flagging samples with large phase errors and preventing RTKLIB from incorporating them into the solution.

Reach M8T before change (left), Reach M8T after change (right)

walker8

Modifying the RTKCONV code for the UBX_RXM_RAWX command in this way and re-running the solution gives us the position plot on the right and the acceleration on the left.

walker9

Much better than before! Not quite as good as the M8N but still a significant improvement. The difference between the M8N solution and the improved M8T solution is shown below.

walker10

Remember this should be a circle if both solutions are free of errors.  Actually in this case not quite a circle because there is also a separation between the two base stations, but that effect is small and we can ignore it for now.  This is much better than the equivalent plot from the original M8T solution shown earlier. There are obviously some erroneous fixes even after the improvement, so this is still a work in progress, but I think it is a big step in the right direction.

This data set was quite a bit more challenging than the previous one and in reality both solutions are quite good given the number of cycle-slips we saw, but there is always room for improvement.

The other thing to note with this data set is that the better quality antenna made a big difference in the quality of the M8N solution.  I suspect I will not be going back to the old antenna!  I knew the nicer antenna would help but I have resisted using it till now because of the extra cost.  There are similar looking antennas with similar gain specs available for as little as $4 so I may try some of these to see how they compare.

I’m not going to post the code to my Github repository or my binaries until I’ve had a little more time to understand the remaining errors but if anyone wants to take a closer look now, here are the code modifications to ublox.c that I made in the decode_rxmrawx function.

walker11

Ublox M8N vs M8T

Thanks to the generosity of Emlid, I am now the proud owner of two of their Reach precision GPS units. At a list price of $570 for a pair, these fall more into the category of low-cost rather than the ultra-low cost receivers I have been working with, but they do allow me to do some comparisons between the two. Their units use the more capable (and more expensive) Ublox M8T receivers rather than my Ublox M8N receivers and also have higher quality (and more expensive) Tallysman 4721 antennas, rather than the cheap antennas that were included with my receivers.

To compare the two I collected a fairly challenging data set, with maximum distances from the base station of over two kilometers and maximum velocities of over 70 km/hr, with relatively unobstructed rovers, but partially obstructed base stations. I mounted one of my receivers and a Reach unit on top of my car and also one of each at the base location. Here is a Google Earth plot of the ground track. The Google Earth plots are a really nice feature of RTKPLOT I have not used until now but have quickly become a fan of. I could not make this feature work with the 2.4.3 version of RTKPLOT so had to go back to the 2.4.2 version. I also found I had to specify the solution format (out-solformat) as “xyz” instead of the “enu” that I usually use to get the solution in a format Google can use. The track was over a mix of dirt and paved roads running through agricultural fields and next to residential areas.

union1

Here is a plot of the base station location, marked in red below. It is somewhat obstructed by the sheds and boats around it. I selected this location in part because it was very windy that day and this spot was fairly sheltered from the wind, but also thought it would make the solution a bit more challenging and therefore help differentiate the two receiver sets.

union2

Most of the rover’s route was relatively unobstructed, but there were a few scattered trees near the road in some locations. The plot below shows an example of one of these spots. Note also, that the ground track goes off the road a bit at the end of the loop. I suspect this is due to inaccuracies in the base station location (and maybe the Google maps) rather than the RTKLIB solution since I did not make any attempt to calibrate the base station against any known reference.

union3

Here’s the base station observation data, the M8N is on the left, the Reach M8T data on the right. From a cycle slip perspective they are fairly similar but we see a few of the satellites (G15, R16, and R18) are noticeably better with the Reach. This is most likely because of the higher quality antenna.

M8N (left), Reach M8T (right)

union4

Looking at the SNR vs elevation for both receivers, we see the Reach has noticeably higher signal strength especially in the lower elevation (10-20 degrees) region where it is most important. Again, this is to be expected with the higher quality antenna.  For some reason, the M8T SNR numbers are rounded off to the nearest integer. I don’t know why that is, but that is what makes the plots look like they are formatted differently.

M8N (left), Reach M8T (right)

union5

Looking next at the rover data, here are the observations for the two receivers.

M8N (left), Reach M8T (right)

union6a

The rovers were stationary until 00:34 and then moving till 00:57, then stationary again. While they were moving and at the end when they were stationary, the two look fairly similar from a cycle-slip perspective with maybe a slight advantage for the Reach. However, during the initial stationary period, there were several slips that occurred simultaneously on every satellite in the M8N data. I’ve circled one of them in blue above.  I believe these must be caused by some sort of discontinuity in the receiver and have nothing to do with the satellite signals themselves. My best guess is that they were caused by temperature fluctuations in the chip. It was a very hot day, with an intense sun heating the dark car roof, combined with a strong wind that would create a cooling effect. Because the antenna that was connected to the M8N receiver has only a one inch lead, it was mounted on top of the car in the sun, while the M8T receiver, with a longer antenna lead, was mounted inside the car and not subject to the same temperature fluctuations. Also, notice that the simultaneous slips all occurred near the beginning of the data set, possibly while the receiver was still reaching some sort of thermal equilibrium. To try and avoid this in the future, I will either switch to another inexpensive antenna I have with a longer lead, or let the receiver sit longer before starting to collect data.

Unfortunately these slips occurred during the initial stationary period I use for the first acquire and prevented that acquire from occurring. Rather than give up on the data, though, I decided to try running a solution with the M8N rover and the M8T base. I don’t know if it was because of the better signal strength, or because I just got lucky, but I was able to get an initial acquire with that combination. So for this exercise, the rest of the data is all based on a comparison between the M8N rover and the Reach M8T rover, both referenced to the M8T base station. It turns out that having a single base station also makes the accuracy analysis a little easier as I will describe later. This is not exactly the comparison I wanted to make, but one I think is still worth doing.

So how did they do? Here are the solutions using my demo4 code. The input configurations were identical with one exception. Since with the M8T receivers, RTKLIB can resolve the GLONASS integer ambiguities without assistance, while with the M8N receivers, it cannot, I set the input parameter “gloarmode” to “on” for the M8Ts and to “fix-and-hold” for the M8Ns. This enables my extension to the fix-and-hold feature to adjust for the additional errors on the GLONASS and SBAS satellites.

M8N (left), Reach M8T (right)

union7

The Reach solution (on the right) looks very clean with a very fast acquire and then 100% fixed values after that. The M8N solution (on the left) also acquired quickly but then ran into the simultaneous cycle-slips, causing problems until it re-acquired at 00:40. After that it stayed fixed for nearly 100% of the time with a short dropout around 00:49.

The next questions, of course, are: Do the solutions match? And are the fixes all accurate? To check this, I will use a technique similar to the one I used earlier when I had only two receivers, both mounted on the rover. For that case, I solved for the distance between the two receivers, which forms a circle with a radius equal to the distance between the receivers. In this case, I will solve for the distance between each rover and the base, then difference the two. This should also give us a circle with radius equal to the distance between the two rover receivers. Having only one base simplifies things by avoiding addition of a second term caused by the separation between the two base receivers. Here’s the result of that operation, plotted only for the fixed solution points since those are the ones we are counting on to be accurate. On the left is the position difference (x,y,z) and on the right is the ground track difference.

M8N, Reach M8T

union8

The separation between the two receivers on the car roof was 15 cm and we see here that most of the points fall on the circle of that radius. However, there are several deviations, up to half a meter in error. They are also easy to see as spikes in the z-axis on the left plot. These are unexpected and quite concerning since we really rely on the fix status to let us know which points are valid and accurate.
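The circle check can be expressed as a simple calculation. The sketch below is only an illustration with a hypothetical function name. It assumes both rover solutions are east/north/up offsets in meters from the same base, that the two antennas sit at the same height, and that only the horizontal components matter for the circle.

```python
import math


def circle_deviation(rover_a_enu, rover_b_enu, separation):
    """Distance (m) of one differenced sample from the ideal circle.

    rover_a_enu, rover_b_enu: (east, north, up) of each rover relative
                              to the shared base, same epoch
    separation: known antenna separation on the roof, e.g. 0.15
    A result near zero means the pair of fixes is consistent.
    """
    de = rover_a_enu[0] - rover_b_enu[0]
    dn = rover_a_enu[1] - rover_b_enu[1]
    return math.hypot(de, dn) - separation
```

Applied to every epoch where both solutions are fixed, values of a few centimeters are what good fixes should produce, while the half-meter errors mentioned above would stand out immediately.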

Zooming in on the observation data for the largest of these spikes at around 00:51 shows that both rover receivers saw cycle slips on satellites G02 and G13 at this time, presumably from passing a tree or other obstruction.

M8N (left), Reach M8T (right)

union9

Zooming in on the position data for this point shows discontinuities in the Reach data but not the M8N data. This is surprising, as I had expected the opposite. As the driver of the car, I am certain that I did not drive over any meter high cliffs during the test, so the Reach M8T data must be wrong.

M8N (left), Reach M8T (right)

union10

Looking at other error samples shows the same thing. It is more easily seen as spikes in the accelerations as shown here, especially the z-axis accelerations, which occur only in the M8T data and not the M8N data.

M8N (left), Reach M8T (right)

union11

So what could be causing these errors? The two solutions used identical code and input parameters except for the setting that enables fix-and-hold for the GLONASS integer ambiguities. I reran the M8T solution with this parameter enabled to make them completely identical but this did not fix the problem. With the code being identical we have to suspect the difference is in the receivers themselves or at least in their setup. The next step I took was to use the u-blox u-center evaluation software to examine all the registers for both receivers. It’s a little trickier to do this with the Reach receiver since we don’t have direct access to the M8T chip, but there are instructions on setting up a TCP port in the Reach documentation, which is what I did.

For the most part, the differences between the two receiver setups were slight but I did find one significant difference. The receiver dynamic platform model settings are quite different. I have my M8N receivers set up for a “Pedestrian” model, while the Reach M8T receivers are set up for the “Airborne <4g” setting. According to the u-blox M8 Receiver Description document, that setting is only recommended for extremely dynamic environments. Here are the details from the manual.

union12

I have heard differing opinions on whether this setting affects the front-end of the receiver or whether it only affects the back-end position calculations. If it only affected the back-end, then this setting would not matter since we use only the raw GPS measurements from the receiver. However, if it does affect the front-end, we might expect it to have an effect like this since it would most likely be opening up the bandwidths of the phase lock loops tracking the carrier-phases and thus minimizing any filtering effects.

So that’s where things stand at the moment with this data set. I see issues with both receiver sets and have speculated on what may be causing the problems but have not had time to verify that either one of my guesses is correct. I had hoped to do that before posting this article but it’s been almost two weeks since my last post, so I’ve decided to just go ahead and post what I have.

This also gives me the chance to ask if anybody else has seen similar issues and/or might have other possible explanations for what is going on?

I have also uploaded this data to my data set library.

Ublox M8N half cycle valid bit

In my last post I introduced a new more challenging data set with a higher number of cycle slips than in previous sets. Even after improving the RTKLIB code to better handle missing data samples, there are still a large number of points in the solution with only a float status (yellow in the plot). In addition, the last 15 minutes appear to have at least a meter of error in the vertical axis: the data was taken in a series of loops, so the vertical measurements should be the same as in the earlier loops.

 

half_cycle1

By plotting the rover observations, we can see that there are many cycle slips on a large number of the satellites. This is the primary cause of the poor solution. Here are the rover observations with a 15 degree elevation mask.

half_cycle2

In order to improve this solution, we are going to have to take a closer look at the cycle slips. The first thing I noticed when examining a few of these cycle slips is that there are a lot of half cycle errors in the carrier-phase measurements following a reported cycle slip. This is true whether or not an actual cycle slip occurred. I had seen this earlier as well when doing an exercise of looking at the double differences. Here’s a plot from that previous post showing what I am talking about. The second red circle contains a number of samples all in error by almost exactly one half cycle.

half_cycle3

Let’s go back to the raw ublox data before it is converted into RINEX format to see if that can shed any light on what is going on. To see the raw unconverted ublox data I enabled the trace option when running convbin by adding “-trace” to the command line. I also had to change an “#if 0” in the decode_trkmeas() function in ublox.c to “#if 1” to output the full debug information. Below, the first image shows the rover observations zoomed in to a time period around 6:45:27.0 and the second shows the raw ublox data for that sample.

half_cycle4a

half_cycle4b

The flag byte is what we are interested in. Since the M8N does not officially support raw output, there is no documentation for this data byte. From the existing ublox.c code though we can see how RTKLIB is using it. It is interpreting bit 5 (0x20) as phase lock and bit 6 (0x40) as the half cycle bit. When the half cycle bit is set, one half cycle is added to the carrier-phase measurement. The other 6 bits are being ignored.
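To make the bit layout concrete, here is a small sketch of that interpretation. The macro and function names are my own for illustration and are not the identifiers used in ublox.c; only the two bit positions and the half cycle adjustment come from the description above.

```c
#include <stdint.h>

#define TRK_PHASE_LOCK 0x20u /* bit 5: carrier phase is locked */
#define TRK_HALF_CYCLE 0x40u /* bit 6: add one half cycle to the phase */

/* Returns the carrier phase (in cycles) after applying the half cycle
 * bit, and reports the phase lock status through *locked. */
double apply_half_cycle(double phase_cycles, uint8_t flag, int *locked)
{
    *locked = (flag & TRK_PHASE_LOCK) != 0;
    if (flag & TRK_HALF_CYCLE) phase_cycles += 0.5;
    return phase_cycles;
}
```

For example, a flag byte of 0x60 has both bits set, so the phase is reported as locked and is shifted by half a cycle.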

Looking at the documentation for the M8T which does officially support raw data provides a hint as to what might be going on. In the description of the RAWX command one of the bits in the trkStat byte is used to indicate whether the half cycle bit is valid or not. Since the M8N and the M8T share the same core, it would be reasonable to assume there was an equivalent bit in the M8N.

We know from the data that the half cycle errors occur shortly after a reported cycle slip so let’s look at those satellites. From the observation plot we can see that in this case satellites G30, R03, R22, and R23 all have reported cycle slips in the last few seconds before the sample we have raw data for (6:45:27.0). If we look at the flag byte for those satellites, we can see that bit 7 (0x80) is clear for all four satellites. It is set for all the other satellites except G21 which has no valid data and I120 which is the SBAS satellite. Each line in the raw data is one satellite, “sys” is the system (0=GPS,1=SBAS,6=GLO) and “prn” is the satellite number. We can also see that when bit 7 is clear, the half cycle bit (0x40) is never set which would also be consistent with bit 7 being the “half cycle valid” flag.

I added the following line of code to the decode_trkmeas() function in ublox.c to update the observation with the half cycle valid status in the same way it is done for the M8T. RTKLIB refers to this bit as the “Parity Unknown” bit, but a comment in the RTKPLOT part of the manual explains that this means the half-cycle ambiguities in the carrier phase are not resolved.

     raw->obs.data[n].LLI[0]|=flag&0x80?0:2; /* half cycle valid */
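Written out more verbosely, the idea behind that one-liner looks something like the sketch below. The names are hypothetical and chosen for readability. The two LLI bit values follow the RINEX loss-of-lock indicator convention (bit 0 for lost lock, bit 1 for a possible half-cycle ambiguity), while treating bit 7 of the M8N flag byte as “half cycle valid” is my inference from the data rather than anything documented.

```c
#include <stdint.h>

#define LLI_SLIP_BIT   0x01u /* RINEX LLI bit 0: lost lock / cycle slip */
#define LLI_HALFC_BIT  0x02u /* RINEX LLI bit 1: half-cycle ambiguity unresolved */
#define TRK_HALF_VALID 0x80u /* assumed: bit 7 of M8N flag = half cycle valid */

/* Returns the LLI value with the half-cycle-unknown bit set whenever the
 * receiver does not report the half cycle as valid. Other bits, such as a
 * cycle slip already recorded in lli, are left untouched. */
uint8_t lli_with_half_cycle(uint8_t lli, uint8_t flag)
{
    if (!(flag & TRK_HALF_VALID)) lli |= LLI_HALFC_BIT;
    return lli;
}
```

Note that a sample can carry both bits at once, for example an LLI of 3 when a cycle slip is reported while the half cycle is still unresolved.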

I then reran the conversion using the convbin raw data conversion CUI. Plotting the updated observation data using RTKPLOT with the “Parity Unknown” option set to “ON” gave the following result:

half_cycle5

The gray ticks indicate unknown parity (i.e. half cycle invalid). Unfortunately the “unknown parity” ticks seem to take precedence over the “cycle slip” ticks, so any sample that has both shows up as “unknown parity”. This is why there seem to be fewer cycle slips in this plot than in the original plot above.

So, let’s rerun the solution on the improved observation data using the same configuration settings as previously. The result is plotted below.

half_cycle6

This looks a lot better. Not only do we see a lot more fixes, but the obvious vertical errors between 7:15 and 7:30 have gone away. The two yellow sections in the ground track plot on the left align with the two locations where the car went directly underneath overhanging branches so it is not surprising that those spots are going to be more difficult to maintain fixes for.

There may be more opportunity for improvement here if we can figure out how to take better advantage of the half cycle status. The current RTKLIB code simply resets the phase bias estimate for that satellite every time this bit changes state.
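As a rough picture of that behavior, the sketch below resets a per-satellite ambiguity state whenever the half-cycle-unknown bit toggles between epochs. It is a simplified illustration under my own assumptions: the `amb_state_t` structure and both functions are hypothetical and do not mirror RTKLIB’s internal data structures.

```c
#include <stdint.h>

#define LLI_HALFC_BIT 0x02u /* half-cycle-unknown bit in the LLI value */

/* Hypothetical per-satellite phase bias state. */
typedef struct {
    double  bias;     /* current phase bias estimate (cycles) */
    double  var;      /* variance of that estimate */
    uint8_t prev_lli; /* LLI value seen at the previous epoch */
} amb_state_t;

/* Returns 1 if the half-cycle-unknown bit differs between two epochs. */
int half_cycle_state_changed(uint8_t prev_lli, uint8_t cur_lli)
{
    return ((prev_lli ^ cur_lli) & LLI_HALFC_BIT) != 0;
}

/* Re-initialize the bias estimate when the bit changes state; otherwise
 * leave the estimate alone. Always remembers the latest LLI value. */
void update_ambiguity(amb_state_t *s, uint8_t cur_lli,
                      double init_bias, double init_var)
{
    if (half_cycle_state_changed(s->prev_lli, cur_lli)) {
        s->bias = init_bias;
        s->var  = init_var;
    }
    s->prev_lli = cur_lli;
}
```

A smarter scheme might keep the estimate and only correct it by half a cycle once the bit becomes valid again, which is the kind of improvement I have in mind.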

I have uploaded this change to the demo4_b12 branch in my GitHub repository. I have also posted the modified rtkconv executable here.

The reason that I posted rtkconv (the GUI version) instead of convbin (the CUI version) is that I am now able to build the GUIs with my new Embarcadero compiler and because I have found that convbin does not seem to work for the newer (3.xx) formats of RINEX.  I prefer the newer formats because they are easier to parse.

RTKLIB: Receiver Dynamics and Outlier Rejection

Several readers now have mentioned that they have had to set the receiver dynamics option in the input configuration file to “off” when running solutions in real-time because of limited CPU bandwidth and that this leads to poorer results. I don’t have this problem in my experiments because I am post-processing the data and so the CPU does not need to keep up with the input data.  Hence I have always had this option set to “on”. But I hope to switch to real-time processing with an SBC at some point and decided to take a look at this issue.

First of all I tried disabling receiver dynamics and re-running the solution for the data set I introduced in the previous post, using my demo3 version of RTKLIB again. The plot on the left is position with receiver dynamics enabled, on the right is with dynamics disabled, otherwise the input options are identical.

dynamics1

Clearly there is some serious degradation with dynamics disabled! The difference is not a complete surprise because when we disable dynamics, we are throwing away some valuable information. The amount of degradation probably should have been a clue that something else was wrong, but at the time I didn’t investigate closely enough why things got worse. Instead I went ahead and implemented a “pseudo-dynamics” mode that uses a small fraction of the calculations of the full dynamics mode but gives most of the benefit. I think this is a useful improvement, and I will discuss it in the next post. It did make the problem go away, but it turns out that it did not address the root cause; it just covered it back up again.

It wasn’t until I was testing this new feature that I started to see some strange things and realized that the lack of dynamics was not enough to explain what was going on.

So let’s take a closer look at the results with dynamics disabled. Unfortunately there are no outputs visible to RTKPLOT or in the output files where the problem can be seen, so it requires digging into the trace files. Below are some snippets from the trace files showing the residuals of the initial double differences from a sample just as the solution first started to degrade. The residuals available in RTKPLOT and the output files are the residuals after the last iteration of the Kalman filter and not the initial residuals. These are significantly smaller and so do not show the problem.

The trace on the left is with dynamics enabled, and the right is with them disabled. I will discuss more about how dynamics works in the next post, for now, if you are not familiar with the feature, just be aware that it improves RTKLIB’s initial guess of the receiver’s position each sample by using information from the previous positions.

dynamics2

The double difference residual for each satellite pair is listed after the “v=”. The L1 and P1 rows are for the phase measurements and the pseudorange measurements respectively. Because the initial position estimates are more accurate with dynamics on, you can see that the residuals on the left are significantly smaller than the ones on the right. Also, in this particular sample the receiver reported a cycle slip on satellite 33 and you can see the residuals are largest in both cases for this satellite. The most important difference between the two is that the larger residual with dynamics off was large enough to trip the outlier rejection threshold, so that residual was not used as an input to the Kalman filter. Introducing a non-linearity like this into a feedback loop always risks affecting its stability, which looks like what happened here. Without any feedback, the errors continued to grow and to be rejected, eventually causing other residuals to be rejected, until the whole solution fell apart.
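To give a feel for why dynamics shrinks the initial residuals, here is a generic kinematic extrapolation of the receiver position from the previous epoch. This is only a textbook-style sketch with a hypothetical function name, not RTKLIB’s actual filter code; it simply shows how velocity and acceleration estimates carried over from earlier samples produce a better starting guess for a moving rover.

```c
/* Extrapolate a 3-D position forward by dt seconds using the previous
 * velocity and acceleration estimates:
 *   out = pos + vel*dt + 0.5*acc*dt^2   (per axis) */
void extrapolate_position(const double pos[3], const double vel[3],
                          const double acc[3], double dt, double out[3])
{
    int i;
    for (i = 0; i < 3; i++) {
        out[i] = pos[i] + vel[i] * dt + 0.5 * acc[i] * dt * dt;
    }
}
```

For a car moving at 10 m/s with a 5 Hz sample rate, ignoring the velocity term alone would put the initial guess off by about 2 meters each sample, which then shows up directly in the initial residuals.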

The threshold used by RTKLIB to reject outliers is adjustable and is set by the input parameter “pos2-rejionno” in the input configuration file. The name is an abbreviation for “reject innovations”, although there seems to be an extra “o”. Innovations is a term for the error inputs to the Kalman filter. The default value for this threshold, and the one I have been using in my experiments, is 30 meters. This is consistent with the two residuals rejected in the above example, both greater than 30.
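In its simplest form, a threshold test like this just compares the magnitude of each innovation to the limit and drops the ones that exceed it. The sketch below shows the idea; the function name and interface are hypothetical and are not copied from RTKLIB.

```c
#include <math.h>
#include <stddef.h>

/* Mark each innovation v[i] (meters) as used (1) or rejected (0) against
 * the threshold max_inno. Returns the number of rejected innovations. */
size_t reject_innovations(const double *v, size_t n, double max_inno, int *use)
{
    size_t i, rejected = 0;
    for (i = 0; i < n; i++) {
        use[i] = fabs(v[i]) <= max_inno;
        if (!use[i]) rejected++;
    }
    return rejected;
}
```

With made-up residuals of 1.2, -35.0, 31.5 and 0.4 meters, a threshold of 30 rejects two of them while a threshold of 1000 rejects none. The important point is that a rejected measurement no longer pulls the filter state back toward the truth, so once the state has drifted, the same measurement is likely to be rejected again on the next sample.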

There doesn’t seem to be anything magic about 30 meters, especially when we are striving for centimeter accuracy so I went ahead and increased it all the way up to 1000 meters to be sure I didn’t trip over it again, then re-ran the solution.  Here is the result. Position is on the left and the difference in position with dynamics on and off is on the right.

dynamics3

Increasing the outlier threshold completely eliminated the problem. What is more surprising is that there is very little difference in the position solution with dynamics on or off. The larger errors in the initial position estimate are still there, as are the larger initial residuals, but the additional iteration of the Kalman filter is apparently able to remove nearly all of the initial position error, as can be seen in the right plot above.

So bottom line is, I don’t think outlier rejection is working properly in RTKLIB and I plan to leave this threshold at 1000 to effectively disable this feature until I see a need to re-enable it.

This problem is not limited to when receiver dynamics are turned off and can happen anytime large residuals occur.  For example, once I knew what to look for, I was able to find the same problem occurring in the initial transient at the beginning of the solution.

To demonstrate this I did another experiment.  In a previous post I described adjusting the solution start point around in the part of the data in which the rover was moving until I was able to get a bad fix. This time I did the same thing but in the part of the data in which the rover was stationary. I did this with the outlier threshold set back to 30.  Again I was able to find a start time that caused an initial bad fix. I checked the trace file for rejected outliers during the initial transient and sure enough they were there. So once again, I increased “pos2-rejionno” from 30 to 1000 and re-ran. The transient was almost entirely eliminated, and I got a good first fix. Here’s the position plots for the two cases, threshold=30 on the left, threshold=1000 on the right.

dynamics4

Notice the difference in y-axis scales and the size of the initial transient. With the threshold set to 1000, as would be expected, there were no outliers rejected in the trace file.

I suspect another thing that aggravates this problem in my case is that I adjusted the input parameter eratio1 (ratio of pseudorange measurement errors to carrier phase measurement errors) from 100 to 300. This reduced the time to first fix but also increased the overshoot of the initial transient and hence would be more likely to trip the outlier threshold.
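To put some numbers on what that ratio means, the sketch below converts it into an implied pseudorange standard deviation and a relative weighting. The function names are hypothetical, and the 3 mm carrier phase sigma in the example that follows is just an assumed value for illustration.

```c
/* Pseudorange standard deviation (meters) implied by a carrier phase
 * standard deviation and the code/phase error ratio. */
double code_sigma(double phase_sigma_m, double eratio)
{
    return eratio * phase_sigma_m;
}

/* How much more heavily a phase measurement is weighted than a code
 * measurement, taking weights as inverse variances. */
double phase_to_code_weight(double eratio)
{
    return eratio * eratio;
}
```

Assuming a 3 mm phase sigma, a ratio of 100 corresponds to a 0.3 m pseudorange sigma and a ratio of 300 to 0.9 m, so the pseudoranges are trusted nine times less (in variance terms) after the change.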

So is there a risk that opening up this limit will cause other problems where data that should have been rejected is not? Possibly, but I suspect the benefits of opening up this limit will outweigh any downside. I plan to keep an eye out for true bad data points and deal with them once I have some real examples, but won’t worry about hypothetical cases for now.

So to sum up, I would suggest increasing this limit even if you are not seeing problems at the moment, and be on the lookout for “outlier rejected” messages in your trace files if you are having problems.