Google Smartphone Decimeter Challenge 2022

In a previous post, I described my experience using RTKLIB to analyze smartphone GNSS data from last year’s Google Smartphone Decimeter Challenge. In that case, I did not get involved until after the competition was complete. After making a few modifications to RTKLIB to handle the relatively low-quality smartphone data, I was able to generate a set of solutions that would have placed 5th out of 811 teams in the final standings. I shared the code to duplicate my results in this code release on GitHub. It includes a custom version of RTKLIB with changes made specifically for the smartphone data, as well as a set of Python scripts to automatically run solutions on all of the 2021 Google test rides.

Google is hosting a second competition this year. It started at the beginning of May and will finish at the end of July. This year I decided to join the fun and submit some results while the competition is still ongoing.

Since last year, I have incorporated all of the changes that were previously only in the GSDC version of RTKLIB into the main branch of the demo5 fork of RTKLIB. These changes are in the latest b34f release, so the special release is no longer required.

Google changed the format of some of the files for this year’s competition, so I did have to rewrite the Python scripts. One of the more significant changes was to include only one set of phone data for each ride in the test data set. Last year it was possible to combine results from multiple phones on a single ride to improve the results, but that is not an option this time.

To encourage participation in this year’s competition, I have shared the code and instructions to duplicate my initial attempt on this year’s data in a Kaggle notebook. If followed correctly, it will generate a score of 3.135 meters when submitted to Kaggle, the competition host. At the time I first published it, that score was good enough for first place. However, the competition has picked up since then, and some teams have taken advantage of this code. It will no longer get you into first place, but it will still put you into a tie for 21st place out of 234 teams. This means that anyone interested in jumping in now can still start near the front of the pack.

Since sharing the notebook, I have made a few local tweaks to the code, config files, and Python scripts that improve my score to 2.152 meters. This is currently good enough for first place, but given that there are nearly two months left in the competition, I don’t think it will be good enough to win without further improvements.

To keep things interesting, I don’t plan to share my most recent changes until the competition is complete, but anyone who follows some of the suggested hints at the end of my Kaggle notebook should be able to get a good part of the way there. Getting all the way there will require a little more ingenuity, but I also believe there is still plenty of room for further improvement on my results.

However, I suspect that winning the competition using RTKLIB will require more than just changes to the configuration and the Python scripts. I believe it will also require changes to the RTKLIB code itself.

As anyone who has worked with the RTKLIB code is probably aware, it can be quite a challenging environment to work in. To make things easier and to encourage innovation in the code and algorithms, I have recently ported a subset of RTKLIB sufficient to generate PPK solutions into Python, which I described in this post. The actual code is available on GitHub here. I have also generated a second Kaggle notebook with instructions on duplicating the C/C++ version results on the Google data with the Python code. I have not actually submitted the results of this code to Kaggle, but based on results from this year’s training data set and last year’s test data set, I believe this code should give slightly better results than the C/C++ code.

The Python code is primarily intended for those planning to develop or modify algorithms internal to the PPK solutions, not for those running the code as-is or with only configuration changes. For the latter users, the C code will run much faster. However, the Python version provides a friendlier development platform. When development is complete, the modified Python code can either be run on the complete data set on a faster PC with a little patience, or the completed changes can be fairly easily ported back into the C code, since the two code sets are very closely aligned. This alignment includes file names, function names, variable names, and comments. The code does not align on a line-by-line basis because of the extensive use of NumPy in the Python code, but structurally it is very similar.
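To give a feel for why NumPy-based code can match C code structurally without matching it line by line, here is a minimal sketch. It is not taken from RTKLIB or from the Python port; the function names, the toy coordinates, and the plain Euclidean range calculation are all assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical example, not code from RTKLIB or the Python port.

def ranges_loop(sat_pos, rcv_pos):
    """C-style version: one satellite and one coordinate at a time."""
    ranges = []
    for sat in sat_pos:
        sum_sq = 0.0
        for k in range(3):
            d = sat[k] - rcv_pos[k]
            sum_sq += d * d
        ranges.append(sum_sq ** 0.5)
    return ranges

def ranges_vectorized(sat_pos, rcv_pos):
    """NumPy version: all satellites in a single array expression."""
    return np.linalg.norm(sat_pos - rcv_pos, axis=1)

# Toy positions in meters (illustrative values, not real ephemeris data)
sat_pos = np.array([[2.0e7, 0.0, 0.0],
                    [0.0, 2.0e7, 0.0],
                    [0.0, 0.0, 2.0e7]])
rcv_pos = np.array([6.0e6, 0.0, 0.0])

# Both versions compute the same receiver-to-satellite distances
assert np.allclose(ranges_loop(sat_pos, rcv_pos),
                   ranges_vectorized(sat_pos, rcv_pos))
```

Both functions have the same inputs, outputs, and purpose, so a change prototyped in one style can be carried over to the other even though the nested loops collapse into a single line.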

Based on the discussion threads on the Kaggle forum for this competition, it appears that most competitors are more familiar with machine learning and post-solution filtering techniques than they are with GNSS theory. I suspect anyone who already has a reasonably solid background in GNSS can do quite well in the competition without an enormous amount of effort. Using some of the tools I describe here should help to get there even more quickly.

My hope is that providing these tools will encourage at least a few more people from the GNSS community to participate and help them to do well. For any of you who decide to take the challenge, I wish you good luck and hope to see you near the top of the leaderboard!