Implementing the GPSTest Database: A Serverless Architecture

Recently, I set out to implement a new feature in the GPSTest Android app — crowdsourcing device Global Navigation Satellite System (GNSS) capabilities. You can read more about why and a description of the new feature in this article. I’d suggest reading that article before this one, as it will make a lot more sense. And if you want to see the data from the end result — the GPSTest Database — you can see it here.

In this article, I also wanted to share exactly how I implemented this new feature in case it’s useful to other developers. The GPSTest Android app is open-source on GitHub, and I wanted to share the design and software behind this “server-based” feature as well.

Requirements

Allowing users to contribute their Android device GNSS features to a central database obviously required some software infrastructure beyond the Android app itself — a first for any feature in GPSTest.

Typically this architecture would involve the mobile app making an HTTP request to a web server, which would then store the information in a database server.

Traditional architecture for sending information from a mobile device to a database server

I would also typically need to implement a web application to retrieve the information from the database and make it visible to users in a web browser.

Like many developers, I really didn’t want to get into the responsibility and costs of maintaining my own web and database servers. GPSTest is something I work on in my spare time. It’s free to download and doesn’t contain ads — there isn’t a dedicated funding source for this application.

So here were my basic goals:

  • Don’t require a stand-alone web server
  • Don’t require a stand-alone database server
  • Write as little server code as possible — including no custom web application to view the data
  • Write server code in a standard programming language (vs. custom scripts for proprietary environments)

Implementation

Here’s an overview of the architecture I decided to use for the GPSTest Database:

GPSTest sends the device info to a Google App Script, which stores it in a Google Sheet

The GPSTest mobile app sends an HTTP GET request with the device info to a Google App Script, which then stores the data in the Google Sheet and sends a success or failure message back to the app.

I’ve uploaded a copy of the source code of the Google App Script to GitHub here, although I edit, test, and deploy it within the Google App Script Editor. It’s written in JavaScript, so it’s nice that the bulk of the data processing code isn’t specific to Google’s infrastructure. Included in this file are also several Google App Script JavaScript functions that run offline in batch mode to do cleanup of submissions (more on this below). These scripts run offline because looping through and comparing a lot of data takes time, and we don’t want the mobile app to wait for this process before getting an HTTP response.

This implementation is referred to as a “server-less” architecture because I’m not responsible for maintaining any servers — I just write some code in a doGet() method to obtain the HTTP GET parameters and set the value on a Sheet object, and magically it executes somewhere in the cloud (triggered on an incoming HTTP request to Google’s infrastructure) and we see the result in the Google Sheet.

Google App Script code in the doGet() method receives data from the mobile app and stores it in the Google Sheet

Similarly, I can configure Google App Script to trigger other JavaScript methods for the data cleanup functionality based on timers — no code necessary to schedule these events.

Another benefit of this design is that because the Google Sheet is publicly viewable, I don’t have to implement a separate web application to allow people to view the data — the default Google Sheet user interface works great for things like sorting or searching for records.

Google Sheets serves as a user-interface to the GPSTest Database for Sort, Find, and more operations

I can still control who can change the contents — only I can edit it directly, and everyone else has a read-only view. Users can also download the information as a CSV or Excel file or copy it to their own Google Sheet for their own purposes. And, if another developer wants to query this Sheet programmatically, there are APIs for that.

Once I had this basic architecture in place for submitting data to the database, there was some specific additional functionality I needed to tackle — including the offline data cleanup — which is discussed in the following subsections.

Avoiding duplicate uploads

First, a client-side optimization to prevent duplicate records — after successfully sending the data, the GPSTest app stores a hash of the data. When you tap “Upload” again, it will only submit new information if something changed (i.e., the hash is different) and will tell the user if nothing has changed. This cuts down on duplicate submissions that need to be cleaned up (see below), and also makes it easier for you to track if something changed on your device.

Adding device name

Strangely, Android doesn’t include an API for retrieving the device name — it only offers the device manufacturer and model, like “Samsung SM-G955U”. But that’s not very human-readable. So where does the Name column in the database come from?

Fortunately, Google published a CSV file mapping the manufacturer and model to device names like “Galaxy S8+”. I wrote a script to pull this file every few hours and populate the Name column with this data. I also found that some newer models weren’t listed in the Google file, so I’m keeping a mapping for these newer devices in another file on GitHub. The script parses this file as well. If you find that your device name remains blank for more than a few hours, feel free to contribute it with the make and model on GitHub!

Removing (near) duplicate submissions

There are obviously a lot of devices out there running GPSTest that are the same make and model. When crowdsourcing data, how do we deal with a large number of submissions that may be identical, or close to identical?

I wrote another script to to help with this. First, every 12 hours identical records (aside from the submission timestamp) are removed from the database. That’s the easy part.

But what about records that are almost the same, and might differ by only one or two fields? This is harder, and the exact criteria will likely evolve as I continue to see more data. It also requires a detailed explanation of some fields in the database. Instead of repeating that here, please head over to the first article if you haven’t already and read the “Key features” section — I’ll wait.

Ok — first, we need to figure out which devices are the “same” model device. The script does this by comparing:

  • Manufacturer
  • Model
  • GNSS hardware year
  • GNSS hardware model name

If any one of these fields is different, both records will be kept. As mentioned in the other article, I’ve seen cases where the GNSS hardware model name version changes with a firmware update, which also corresponds with changes in capabilities. For example, in the March 2021 Pixel 5 update the hardware name changed from redfin;MPSS.HI.2.0.c8–00003-SAIPAN_GEN_PACK-1.335488.1.336118.1 to redfin;MPSS.HI.2.0.c8–00069-SAIPAN_GEN_PACK-1.364697.1.369458.1, which also corresponded to the Pixel 5 Carrier Phase support going from UNSUPPORTED to SUPPORTED. Keeping both records helps users understand why a feature may or may not be working on their particular device depending on what software version they are on.

The script also prefers records with more GNSS constellations and carrier frequencies and more SBAS constellations and carrier frequencies (see “Caveats” section in the first article). It also prefers records that contain Carrier Phase (accumulated delta range (ADR)) over those that don’t contain this information — this is due to a bug in early GPSTest beta testing that forced me to discard the ADR column for early submissions. So as users contribute new data for these same devices, we want to replace the records that are missing ADR info.

Closing thoughts

This particular implementation met my goals, but it certainly isn’t the only way to architect this type of system. Do you have suggestions for improvements, or alternate ideas of how to approach it? Let me know in the comments below!

Thanks to everyone who helped with the testing of this new feature. I’d also like to give a shout out in particular Cagri Cetin, who implemented an earlier version of this server-less architecture in the OneBusAway app. This implementation in GPSTest was inspired by it’s success!

And if you have an Android device, please contribute to the GPSTest Database! Check out the “Tips & tricks” section of the first article, and then submit away!

Was this article helpful? Consider following me on Medium. If you’re a user of the open-source GPSTest app and you’d like to support it, you can check out the GPSTest “Buy me a coffee” page:

Improving the world, one byte at a time. @sjbarbeau, https://github.com/barbeau, https://www.linkedin.com/in/seanbarbeau/. I work @CUTRUSF. Posts are my own.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store