Google Smartphone Decimeter Challenge 2022

In a previous post, I described my experience using RTKLIB to analyze smartphone GNSS data from last year’s Google Smartphone Decimeter Challenge. In that case, I did not get involved until after the competition was complete. After making a few modifications to RTKLIB to handle the relatively low quality smartphone data, I was able to generate a set of solutions that would have placed 5th out of 811 teams in the final standings. I shared the code to duplicate my results in this code release on Github. It includes a custom version of RTKLIB with changes specifically made for the smartphone data, as well as a set of python scripts to automatically run solutions on all of the 2021 Google test rides.

Google is hosting a second competition this year. It started at the beginning of May and will finish at the end of July. This year I decided to join the fun and submit some results while the competition is still ongoing.

Since last year, I have incorporated all of the changes that were previously only in the GSDC version of RTKLIB into the main branch of the demo5 fork of RTKLIB. These are in the latest b34f release, so the special release is no longer required.

Google changed the format of some of the files for this year’s competition, so I did have to rewrite the python scripts. One of the more significant changes they made was to include only one set of phone data for each ride in the test data set. Last year it was possible to combine results from multiple phones on a single ride to improve the results, but that is not an option this time.

To encourage participation in this year’s competition, I have shared the code and instructions to duplicate my initial attempt on this year’s data in a Kaggle notebook. If followed correctly, it will generate a score of 3.135 meters when submitted to Kaggle, the competition host. At the time I first published it, this was good enough for first place. However, the competition has picked up since then, and some teams have taken advantage of this code. It will no longer get you into first place, but it will still put you into a tie for 21st place out of 234 teams. This means that anyone interested in jumping in now can still start near the front of the pack.

Since sharing the notebook, I have made a few local tweaks to the code, config files, and python scripts which improve my score to 2.152 meters. This is currently good enough for first place, but given that there are nearly two months left in the competition, I don’t think this will be good enough to win without further improvements.

To keep things interesting, I don’t plan to share my most recent changes until the competition is complete, but anyone who follows some of the suggested hints at the end of my Kaggle notebook should be able to get a good part of the way there. Getting all the way there will require a little more ingenuity, but I also believe there is still plenty of room for further improvement on my results.

However, I suspect that winning the competition using RTKLIB will require more than just configuration changes and python script changes. I believe it will also require making changes to the RTKLIB code itself.

As anyone who has worked with the RTKLIB code is probably aware, it can be quite a challenging environment to work in. To make things easier and to encourage innovation in the code and algorithms, I have recently ported a subset of RTKLIB sufficient to generate PPK solutions into Python, which I described in this post. The actual code is available on GitHub here. I have also generated a second Kaggle notebook with instructions on duplicating the C/C++ version results on the Google data with the Python code. I have not actually submitted the results of this code to Kaggle, but based on results from this year’s training data set and last year’s test data set, I believe this code should give slightly better results than the C/C++ code.

The python code is primarily intended for those planning to develop or modify algorithms internal to the PPK solutions, not for those who just want to run the code as-is or with only configuration changes. For the latter group, the C code will run much faster. However, the python version provides a friendlier development platform. When development is complete, the modified python code can either be run on the complete data set on a faster PC with a little patience, or the completed changes can be fairly easily ported back into the C code, since the two code sets are very closely aligned. This alignment includes file names, function names, variable names, and comments. The code does not align on a line-by-line basis because of the extensive use of Numpy in the python code, but structurally it is very similar.

Based on the discussion threads on the Kaggle forum for this competition, it appears that most competitors are more familiar with machine learning and post-solution filtering techniques than they are with GNSS theory. I suspect anyone who already has a reasonably solid background in GNSS can do quite well in the competition without an enormous amount of effort. Using some of the tools I describe here should help to get there even more quickly.

My hope is that providing these tools will encourage at least a few more people from the GNSS community to participate and help them to do well. For any of you who decide to take the challenge, I wish you good luck and hope to see you near the top of the leaderboard!

11 thoughts on “Google Smartphone Decimeter Challenge 2022”

  1. Thank you, This is excellent work!
    But I have a question, how to gracefully calculate and adjust the values of the three variables in (stats-eratio)?
    Maybe this question can be expressed more clearly, divided into several parts:
    1. How to calculate quantitatively (stats-eratio) (L1, L2, L5)?
    2. How to adjust these variables according to the data to achieve a higher Fix rate?
    In your previous articles, we can see the trend of the adjustments: when the residuals of the L5 band are small, we should reduce the value of (stats-eratio) (L5) appropriately to increase the weight of the L5 band data in the Kalman filter.
    For example, in the article “Cell phone RTK/PPK - How important is a ground plane?” we can see this: “To take advantage of this difference, I adjusted the code/carrier phase error ratio for L1 (stats-eratio1) from 300 to 1500 and left the L5 error ratio (stats-eratio5) at 300. This will cause the kalman filter to weight the L5 pseudorange observations more heavily than the L1 observations.” My question is, how are the values of 300 and 1500 calculated?
    Thanks!

    1. The eratio parameters represent how much more heavily the carrier phase measurements are weighted relative to the pseudorange measurements in the Kalman filter. At least as a starting point, they should be set to roughly the ratio of the errors in the pseudorange measurements to the errors in the carrier phase measurements. It is difficult to measure exactly what this ratio is, but comparing the solution residuals for the two measurement types will at least give an estimate of this number. From there, it can be adjusted up and down by trial and error to see if the performance can be improved.
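As a rough illustration of that starting point, here is a minimal Python sketch. It assumes you have already extracted pseudorange and carrier phase residuals (in meters) for one frequency band from a solution, however you choose to do that. The function name `estimate_eratio` and the synthetic residuals are made up for illustration and are not part of RTKLIB.

```python
import numpy as np

def estimate_eratio(pr_resid, cp_resid):
    """Rough starting value for an eratio parameter: the ratio of the
    pseudorange residual spread to the carrier phase residual spread."""
    pr_resid = np.asarray(pr_resid, dtype=float)
    cp_resid = np.asarray(cp_resid, dtype=float)
    return np.std(pr_resid) / np.std(cp_resid)

# Synthetic residuals in meters, for illustration only (not real data)
rng = np.random.default_rng(0)
pr = rng.normal(0.0, 3.0, 5000)    # pseudorange residuals, about 3 m spread
cp = rng.normal(0.0, 0.01, 5000)   # carrier phase residuals, about 1 cm spread

eratio = estimate_eratio(pr, cp)
print(f"starting eratio estimate: {eratio:.0f}")
```

The result is only a first guess. Running it separately per band would give one candidate value for each of the eratio parameters, which can then be tuned by trial and error as described above.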

  2. Thank you, your work is truly amazing. I am really looking forward to digging into your version of RTKLIB.
    I just finished reading your article about last year’s competition, and I noticed that you didn’t mention anything about correction data, so I assume that the competition doesn’t allow the use of correction data from base stations.
    How accurate do you think RTKLIB demo5 can get when applied on Android GNSS data, when combined with RTCM correction data from base stations? Is it possible to get decimeter accuracy in open-sky? or even in urban canyons?

    1. The competition does allow using any data source that is publicly available, including base station data. All of the solutions I describe for this and the previous competition were PPK solutions using nearby CORS stations for base station data, although the data is too low quality to attempt any ambiguity resolution. Duty cycling is turned off for this data, but because the phones are inside the vehicle and are not using ground planes, the quality is very low. I am currently seeing 50th percentile horizontal errors a little below one meter for the higher quality rides.

      1. Thank you! Theoretically, do you believe that these results could improve significantly if the phones were outside the cars? For example, if the phone were handheld by a pedestrian walking the same path?

          1. Here are some example error scores for the RTKLIB solutions from the 2022 training set. Each line lists the ride and phone, followed by the 50th percentile error, the 95th percentile error, and the mean of the two, all in meters.

            2020-05-15-US-MTV-1/GooglePixel4XL , 1.210, 4.654, 2.932
            2020-05-21-US-MTV-1/GooglePixel4 , 1.470, 2.722, 2.096
            2020-05-21-US-MTV-1/GooglePixel4XL , 0.905, 1.797, 1.351
            2020-05-21-US-MTV-2/GooglePixel4 , 0.923, 2.006, 1.464
            2020-05-21-US-MTV-2/GooglePixel4XL , 1.447, 2.928, 2.187
            2020-05-28-US-MTV-2/GooglePixel4 , 1.175, 2.088, 1.631
            2020-05-28-US-MTV-2/GooglePixel4XL , 1.056, 2.189, 1.623
            2020-05-29-US-MTV-1/GooglePixel4 , 0.527, 1.743, 1.135
            2020-05-29-US-MTV-1/GooglePixel4XL , 0.681, 1.664, 1.172
            2020-05-29-US-MTV-2/GooglePixel4 , 1.310, 2.101, 1.705
            2020-05-29-US-MTV-2/GooglePixel4XL , 0.849, 1.466, 1.157
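The three columns can be reproduced with a few lines of Python. This is a minimal sketch, assuming you already have per-epoch horizontal errors in meters against the ground truth. The function name `ride_score` and the sample error values are made up for illustration.

```python
import numpy as np

def ride_score(horiz_err_m):
    """Return (50th percentile, 95th percentile, mean of the two)
    for an array of per-epoch horizontal errors in meters."""
    err = np.asarray(horiz_err_m, dtype=float)
    p50 = np.percentile(err, 50)
    p95 = np.percentile(err, 95)
    return p50, p95, (p50 + p95) / 2.0

# Hypothetical per-epoch errors for a single ride (not real data)
errors = np.array([0.4, 0.6, 0.8, 0.9, 1.1, 1.3, 1.6, 2.0, 2.7, 4.5])
p50, p95, score = ride_score(errors)
print(f"{p50:.3f}, {p95:.3f}, {score:.3f}")
```

As a check against the list above, the first line gives (1.210 + 4.654) / 2 = 2.932.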

        1. The only time I have been able to achieve centimeter level accuracy with a smartphone is when the smartphone is mounted on a ground plane and is static, as I describe in this post. I have tried analyzing data from a phone mounted on top of a car, which provides a good ground plane, but I was not able to resolve the integer ambiguities for this data set. The errors were therefore relatively large, although better than those from the phones mounted inside the car.
