Implementing the GPSTest Database: A Serverless Architecture
Recently, I set out to implement a new feature in the GPSTest Android app — crowdsourcing device Global Navigation Satellite System (GNSS) capabilities. You can read more about why and a description of the new feature in this article. I’d suggest reading that article before this one, as it will make a lot more sense. And if you want to see the data from the end result — the GPSTest Database — you can see it here.
In this article, I also wanted to share exactly how I implemented this new feature in case it’s useful to other developers. The GPSTest Android app is open-source on GitHub, and I wanted to share the design and software behind this “server-based” feature as well.
Allowing users to contribute their Android device GNSS features to a central database obviously required some software infrastructure beyond the Android app itself — a first for any feature in GPSTest.
Typically this architecture would involve the mobile app making an HTTP request to a web server, which would then store the information in a database server.
I would also typically need to implement a web application to retrieve the information from the database and make it visible to users in a web browser.
Like many developers, I really didn’t want to get into the responsibility and costs of maintaining my own web and database servers. GPSTest is something I work on in my spare time. It’s free to download and doesn’t contain ads — there isn’t a dedicated funding source for this application.
So here were my basic goals:
- Don’t require a stand-alone web server
- Don’t require a stand-alone database server
- Write as little server code as possible — including no custom web application to view the data
- Write server code in a standard programming language (vs. custom scripts for proprietary environments)
Here’s an overview of the architecture I decided to use for the GPSTest Database:
The GPSTest mobile app sends an HTTP GET request with the device info to a Google App Script, which then stores the data in the Google Sheet and sends a success or failure message back to the app.
This implementation is referred to as a “server-less” architecture because I’m not responsible for maintaining any servers — I just write some code in a
doGet() method to obtain the HTTP GET parameters and set the value on a
Sheet object, and magically it executes somewhere in the cloud (triggered on an incoming HTTP request to Google’s infrastructure) and we see the result in the Google Sheet.
Another benefit of this design is that because the Google Sheet is publicly viewable, I don’t have to implement a separate web application to allow people to view the data — the default Google Sheet user interface works great for things like sorting or searching for records.
I can still control who can change the contents — only I can edit it directly, and everyone else has a read-only view. Users can also download the information as a CSV or Excel file or copy it to their own Google Sheet for their own purposes. And, if another developer wants to query this Sheet programmatically, there are APIs for that.
Once I had this basic architecture in place for submitting data to the database, there was some specific additional functionality I needed to tackle — including the offline data cleanup — which is discussed in the following subsections.
Avoiding duplicate uploads
First, a client-side optimization to prevent duplicate records — after successfully sending the data, the GPSTest app stores a hash of the data. When you tap “Upload” again, it will only submit new information if something changed (i.e., the hash is different) and will tell the user if nothing has changed. This cuts down on duplicate submissions that need to be cleaned up (see below), and also makes it easier for you to track if something changed on your device.
Adding device name
Strangely, Android doesn’t include an API for retrieving the device name — it only offers the device manufacturer and model, like “Samsung SM-G955U”. But that’s not very human-readable. So where does the Name column in the database come from?
Fortunately, Google published a CSV file mapping the manufacturer and model to device names like “Galaxy S8+”. I wrote a script to pull this file every few hours and populate the Name column with this data. I also found that some newer models weren’t listed in the Google file, so I’m keeping a mapping for these newer devices in another file on GitHub. The script parses this file as well. If you find that your device name remains blank for more than a few hours, feel free to contribute it with the make and model on GitHub!
Removing (near) duplicate submissions
There are obviously a lot of devices out there running GPSTest that are the same make and model. When crowdsourcing data, how do we deal with a large number of submissions that may be identical, or close to identical?
I wrote another script to to help with this. First, every 12 hours identical records (aside from the submission timestamp) are removed from the database. That’s the easy part.
But what about records that are almost the same, and might differ by only one or two fields? This is harder, and the exact criteria will likely evolve as I continue to see more data. It also requires a detailed explanation of some fields in the database. Instead of repeating that here, please head over to the first article if you haven’t already and read the “Key features” section — I’ll wait.
Ok — first, we need to figure out which devices are the “same” model device. The script does this by comparing:
- GNSS hardware year
- GNSS hardware model name
If any one of these fields is different, both records will be kept. As mentioned in the other article, I’ve seen cases where the GNSS hardware model name version changes with a firmware update, which also corresponds with changes in capabilities. For example, in the March 2021 Pixel 5 update the hardware name changed from
redfin;MPSS.HI.2.0.c8–00069-SAIPAN_GEN_PACK-1.364697.1.369458.1, which also corresponded to the Pixel 5 Carrier Phase support going from
SUPPORTED. Keeping both records helps users understand why a feature may or may not be working on their particular device depending on what software version they are on.
The script also prefers records with more GNSS constellations and carrier frequencies and more SBAS constellations and carrier frequencies (see “Caveats” section in the first article). It also prefers records that contain Carrier Phase (accumulated delta range (ADR)) over those that don’t contain this information — this is due to a bug in early GPSTest beta testing that forced me to discard the ADR column for early submissions. So as users contribute new data for these same devices, we want to replace the records that are missing ADR info.
This particular implementation met my goals, but it certainly isn’t the only way to architect this type of system. Do you have suggestions for improvements, or alternate ideas of how to approach it? Let me know in the comments below!
Thanks to everyone who helped with the testing of this new feature. I’d also like to give a shout out in particular Cagri Cetin, who implemented an earlier version of this server-less architecture in the OneBusAway app. This implementation in GPSTest was inspired by it’s success!