The coronavirus is causing heavy damage to many industry sectors. One hope is therefore to contain its spread with Coronavirus geotracking apps and cell analysis. One potential solution is Bluetooth-driven apps, which do not violate privacy; another is Coronavirus geotracking with third-party data. A further solution is complete Coronavirus geotracking with big Time Series Analysis, storing and analyzing the data in a Time Series Database backend.
In this article, we discuss how extended geotracking with smartphone apps works. Furthermore, we explain how location fingerprinting can be realized on top of extended location data to gain more data privacy.
Coronavirus Geotracking VS Anonymous Bluetooth tracking
In a previous article, we discussed anonymous tracking via Bluetooth.
The huge disadvantage of Coronavirus geotracking apps over pure Bluetooth solutions is the lack of privacy.
On the other hand, extended Coronavirus geotracking enables much more precise matching and makes it possible to understand infection chains from a geographic perspective as well.
In addition, extended tracking can use a fused approach that goes far beyond pure GPS coordinates.
Tracking which WiFi networks are received at a certain strength, or which ones are connected, leads to new insights. For example, when two smartphones are connected to the same WiFi, there is a high likelihood that their owners were in the same room; and when two different WiFis are connected at the same time, those are likely different flats, even if the GPS position is similar.
The same applies to nearby Bluetooth signals from devices (speakers, keyboards and so on): recording which devices are found near a smartphone reveals which other smartphones were near the same devices at the same time. In addition, smartphones can broadcast Bluetooth signals themselves, which are then received by other smartphones.
An additional data point is the signal strength of cell towers. It can reveal insights when two people have nearly identical readings of all surrounding cell towers.
This additional data, which goes beyond pure geolocation, makes it possible to estimate the distance between two smartphones, and thus the infection risk, with much higher precision.
All of this data offers new possibilities and a much larger analysis vector than pure Bluetooth or GPS, which can lead to completely new insights when machine learning is applied.
Therefore, countries, politicians and users have to decide whether the benefits of advanced Coronavirus geotracking outweigh the privacy concerns.
In the following, we describe how an advanced Coronavirus geotracking approach is implemented.
How Coronavirus Geotracking works
- A smartphone app receives different location-indicating data. These are:
– Cell tower signals of the mobile provider
– The connected WiFi and the signal strength of non-connected WiFis
– Bluetooth devices and Bluetooth Beacons near the smartphone
– GPS position
– A triangulated location from the operating system provider of the mobile phone
- All of the location data described above is sent to a Coronavirus database or cached locally on the smartphone.
- Once a person is infected, they report the infection to the central database.
- The infected person's movement history is matched with the movements of other people to see which people were at similar coordinates, logged into the same WiFi or received the same Bluetooth Beacons. The matching can be done in a central database to which all users transmit their telemetry regularly, or via an interface (API) against which healthy users can match their own movement profile.
- From the matching, users receive a risk alert when they were in the same WiFi, received the same Bluetooth Beacons or were at the same geocoordinates at the same time as an infected person.
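The matching step in this list can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the article's implementation: the `Observation` and `ContactMatcher` classes, the minute-based time representation and the alert threshold are hypothetical. An observation window counts toward a contact only if it overlaps in time with a window of the infected person and shares at least one identifier (a WiFi BSSID, a Bluetooth address or a coarse GPS cell).

```java
import java.util.List;
import java.util.Set;

// Hypothetical observation window of one smartphone.
final class Observation {
    final long fromMinute;          // start of the window (e.g. minutes since midnight)
    final long toMinute;            // end of the window
    final Set<String> identifiers;  // WiFi BSSIDs, Bluetooth addresses, coarse GPS cells

    Observation(long fromMinute, long toMinute, Set<String> identifiers) {
        this.fromMinute = fromMinute;
        this.toMinute = toMinute;
        this.identifiers = identifiers;
    }
}

final class ContactMatcher {
    // Sums the minutes in which a window of the infected person and a window of
    // another user overlap in time AND share at least one identifier.
    // Overlapping windows on the same side are counted twice; a real system
    // would merge them first.
    static long sharedMinutes(List<Observation> infected, List<Observation> other) {
        long total = 0;
        for (Observation a : infected) {
            for (Observation b : other) {
                long overlap = Math.min(a.toMinute, b.toMinute)
                        - Math.max(a.fromMinute, b.fromMinute);
                if (overlap <= 0) {
                    continue; // not at the same time
                }
                boolean sharesIdentifier = a.identifiers.stream().anyMatch(b.identifiers::contains);
                if (sharesIdentifier) {
                    total += overlap;
                }
            }
        }
        return total;
    }

    // Risk alert once the shared time reaches a (hypothetical) threshold.
    static boolean riskAlert(List<Observation> infected, List<Observation> other, long thresholdMinutes) {
        return sharedMinutes(infected, other) >= thresholdMinutes;
    }
}
```

Whether this runs on a central server or on the healthy user's phone via an API is independent of the matching logic itself.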
Coronavirus Geotracking app
Developing a background Coronavirus geotracking app that collects cell tower signals, WiFis, Bluetooth devices, GPS and locations from the operating system is challenging.
The main challenge is that the Coronavirus geotracking app runs in the background and receives regular location updates without using too much energy. For instance, GPS tracking drains the battery heavily, whereas cell signal scanning does not consume much power.
iPhones have restrictions on apps running in the background, which we already discussed for the pure Bluetooth app. Therefore, we focus on Android in the following.
Low energy scanning
To save battery during location tracking, one needs to consider that scanning intervals and scanning types (WiFi, GPS, Bluetooth) have a large impact on battery life.
Therefore, one can use energy-efficient passive scans, which barely drain the battery, to decide which signals are captured and whether GPS is turned on.
From experience, we propose the following states for deciding between battery-heavy and battery-light location scans.
- Refinement
Often, the location provider of the operating system first delivers a coarse-grained location and, a few seconds later, a more fine-grained location within the proximity of the coarse-grained one.
When that happens, a GPS and WiFi update is not needed, and it is enough to scan for Bluetooth devices.
- Tiny Movement
The location has moved only slightly (<40 m) since the last reading: the passive location provider of the OS shows only a tiny distance update, and the cell tower readings and WiFi signals do not differ much. Alternatively, a connection is established to a WiFi network that was previously only visible but not connected.
Cell, WiFi and Bluetooth readings should be updated.
- Movement
A WiFi network is disconnected, the signal strength of the cell towers differs a lot, or the operating system indicates a larger movement.
All signals, including GPS, should be updated.
Since this regularly happens when moving in a vehicle, the GPS update should be postponed until the movement stops to avoid continuous movement tracking. However, since other people can be in the same vehicle, Bluetooth and WiFi scans should still be done, as they are more energy efficient.
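The three states can be turned into a scan plan roughly like this. The sketch assumes the caller already knows whether the new fix refines a coarse one, how far the location moved, and whether a WiFi was disconnected or the cell readings changed strongly; the enum, class and method names are hypothetical, and the 40 m threshold comes from the Tiny Movement rule.

```java
import java.util.EnumSet;
import java.util.Set;

// States from the list above (names are hypothetical).
enum MovementState { REFINEMENT, TINY_MOVEMENT, MOVEMENT }

// Scan types the app can trigger.
enum ScanType { BLUETOOTH, WIFI, CELL, GPS }

final class ScanPlanner {
    // Threshold of the Tiny Movement rule (<40 m).
    static final double TINY_MOVEMENT_METERS = 40.0;

    // Maps the observed change onto one of the three states.
    static MovementState classify(boolean refinesCoarseFix, double movedMeters,
                                  boolean wifiDisconnected, boolean cellChangedStrongly) {
        if (refinesCoarseFix) {
            return MovementState.REFINEMENT;
        }
        if (wifiDisconnected || cellChangedStrongly || movedMeters >= TINY_MOVEMENT_METERS) {
            return MovementState.MOVEMENT;
        }
        return MovementState.TINY_MOVEMENT;
    }

    // Chooses which scans to run for a state.
    static Set<ScanType> scansFor(MovementState state, boolean inVehicle) {
        switch (state) {
            case REFINEMENT:
                return EnumSet.of(ScanType.BLUETOOTH);
            case TINY_MOVEMENT:
                return EnumSet.of(ScanType.CELL, ScanType.WIFI, ScanType.BLUETOOTH);
            default: // MOVEMENT
                Set<ScanType> scans = EnumSet.allOf(ScanType.class);
                if (inVehicle) {
                    scans.remove(ScanType.GPS); // postpone GPS until the movement stops
                }
                return scans;
        }
    }
}
```

Checking Refinement first is a design choice of this sketch: a refined fix from the OS is treated as no real movement even if other signals fluctuate at the same moment.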
Location Datatypes
Another key challenge is signal strength. It is not enough to store only the devices and networks that are seen; one also needs to store how long they were seen, so that they can later be matched with other people who were near the same devices.
Here is how each entry of the list of Bluetooth devices can be stored.

```java
import java.util.Date;
import java.util.Map;

class BluetoothDevice {
    // when the device appeared for the first time
    Date seenFrom;
    // when the device appeared for the last time
    Date seenTo;
    // the maximum distance
    float maxdistance;
    // the MAC address of the device
    String bluetoothAddress;
    // optional extras such as a URL for Eddystone beacons, beacon regions, ...
    Map<String, String> additions;
}
```
With simple rules, the similarity between two scans can then be computed to derive the movement type. A simple approach, listed below, counts the devices that appear in both the current and the prior scan; an actual implementation can use more complex measures, for example based on signal strength.

```java
int similarity = 0;
for (BluetoothDevice btDevice : btDevices) {
    for (BluetoothDevice priorBtDevice : priorBtDevices) {
        if (priorBtDevice.bluetoothAddress.equals(btDevice.bluetoothAddress)) {
            // count each device that was also seen in the prior scan once
            similarity++;
            break;
        }
    }
}
```
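A raw match count grows with the number of devices around, so it is hard to compare between a crowded and an empty place. One option, assumed here rather than taken from the article, is to normalize it into the Jaccard similarity of the two sets of Bluetooth addresses (for example, collected from the `bluetoothAddress` field of each entry). `ScanSimilarity` is a hypothetical helper.

```java
import java.util.HashSet;
import java.util.Set;

final class ScanSimilarity {
    // Jaccard similarity: shared addresses divided by all distinct addresses.
    // 1.0 means identical scans, 0.0 means no device in common.
    static double jaccard(Set<String> current, Set<String> prior) {
        if (current.isEmpty() && prior.isEmpty()) {
            return 1.0; // two empty scans are treated as identical
        }
        Set<String> intersection = new HashSet<>(current);
        intersection.retainAll(prior);
        Set<String> union = new HashSet<>(current);
        union.addAll(prior);
        return (double) intersection.size() / union.size();
    }
}
```

A value close to 1 points towards Refinement or Tiny Movement, a value close to 0 towards Movement; the exact cut-offs would have to be tuned on real scans.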
Similarly to Bluetooth devices, the routers around a smartphone can be scanned and assembled into a list. The properties of a scanned WiFi are shown below; they are similar to those of the Bluetooth devices, and the same goes for cell information.
An example WiFi record:

```java
import java.util.Date;

class WifiInformation {
    // when the network appeared for the first time
    Date seenFrom;
    // when the network appeared for the last time
    Date seenTo;
    // the hardware address of the router
    String BSSID;
    String ssid;
    // the signal strength over the reading period (dBm)
    int signalDbMin;
    int signalDbMax;
    int signalDbAvg;
    // whether the smartphone was connected to the network
    boolean isConnected;
}
```
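The min/max/avg fields of the WiFi record can be maintained incrementally while the reading period is open, so individual readings do not have to be kept. A minimal sketch, assuming readings arrive as integer dBm values; `SignalStats` is a hypothetical helper that mirrors the three fields.

```java
final class SignalStats {
    int signalDbMin = Integer.MAX_VALUE;
    int signalDbMax = Integer.MIN_VALUE;
    int signalDbAvg;
    private long sum;   // running sum of all readings
    private int count;  // number of readings so far

    // Folds one reading (dBm) into min, max and the rounded average.
    void add(int signalDb) {
        signalDbMin = Math.min(signalDbMin, signalDb);
        signalDbMax = Math.max(signalDbMax, signalDb);
        sum += signalDb;
        count++;
        signalDbAvg = (int) Math.round((double) sum / count);
    }
}
```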
All in all, this leads to detailed location information containing Bluetooth, WiFi and cell tower data. This extended location can then be used to track the location history and to compute the movement relative to prior locations.

```java
import java.util.List;

// Extended location record (the class name is illustrative). CellInfo, BluetoothBeacon
// and GPSLocation are assumed to be defined elsewhere; WifiInformation is the record above.
class ExtendedLocation {
    List<CellInfo> cellInfo;
    List<WifiInformation> wifiInformation;
    List<BluetoothBeacon> bluetoothInformation;
    GPSLocation operatingSystemtriangulatedLocation;
    GPSLocation gpsLocation;
}
```

The information is now much richer than a pure GPS position. This is an advantage because a GPS fix is often outdated and expensive in terms of battery. It also makes it possible to match potential infection contacts based on additional criteria.
Extended location Time Series Analysis
The following table shows time series data from four different smartphones.
The smartphones recognize different signals and GPS positions. Imagine that one of the records belongs to an infected person; we want to find possible matches and estimate how high the likelihood of an infection is.
Time from | Duration (min) | Smartphone ID | Bluetooth devices | WiFis | Connected WiFi | Cell tower IDs | Best GPS (OS and signal) |
---|---|---|---|---|---|---|---|
14:00 | 15 | 1 | b3:c0:83:16:27:cd fa:c9:27:cb:8e:40 fe:41:85:56:ee:78 | 84:56:04:a4:6b:58 5d:03:4c:ea:21:d9 | 5d:03:4c:ea:21:d9 | 11111 22222 | 49.2949,8.6468 |
14:03 | 15 | 2 | – | 5d:03:4c:ea:21:d9 | 5d:03:4c:ea:21:d9 | 33333 | 49.2949,8.6468 |
14:05 | 25 | 3 | b3:c0:83:16:27:cd | 84:56:04:a4:6b:58 | – | 22222 | 49.294,8.646 |
14:07 | 3 | 4 | – | 84:56:04:a4:6b:58 5d:03:4c:ea:21:d9 | – | 33333 | 49.2949,8.6468 |
… |
Time Series Analysis for matching infection risks
In the simplest case, this time series data can be loaded into a Time Series Database, and time series analysis queries can then be run on top of it to find likely matches. The following table shows a few examples.
Query | Bluetooth | WiFis | Connected WiFis | Cell towers | GPS precision | Smartphones: shared minutes |
---|---|---|---|---|---|---|
Nearby Bluetooth and GPS distance | b3:c0:83:16:27:cd | – | – | – | 110 m | 1 & 3: 10 min |
Nearby WiFi | – | 84:56:04:a4:6b:58 5d:03:4c:ea:21:d9 | – | – | – | 1 & 4: 3 min |
Connected WiFi | – | – | 5d:03:4c:ea:21:d9 | – | – | 1 & 2: 12 min |
Same GPS | – | – | – | – | 11 m | 1 & 2: 12 min; 1 & 4: 3 min; 2 & 4: 3 min |
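The queries in this table stand for Time Series Database queries; the sketch below reproduces their logic in plain Java rather than in any specific database's query language. It assumes each row has a start minute and a duration in minutes, and treats two smartphones as matching when the selected column (Bluetooth devices or connected WiFi) shares at least one value and their time windows overlap. `Reading` and `SharedTimeQuery` are hypothetical names.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

// One row of the time series table (only the columns used here).
final class Reading {
    final int smartphone;
    final int fromMinute;             // minutes since midnight, e.g. 14:00 -> 840
    final int durationMinutes;
    final Set<String> bluetooth;      // received Bluetooth addresses
    final Set<String> connectedWifi;  // BSSID of the connected WiFi, if any

    Reading(int smartphone, int fromMinute, int durationMinutes,
            Set<String> bluetooth, Set<String> connectedWifi) {
        this.smartphone = smartphone;
        this.fromMinute = fromMinute;
        this.durationMinutes = durationMinutes;
        this.bluetooth = bluetooth;
        this.connectedWifi = connectedWifi;
    }

    int toMinute() {
        return fromMinute + durationMinutes;
    }
}

final class SharedTimeQuery {
    // Returns "a-b" -> minutes during which smartphones a and b overlapped in time
    // while sharing at least one value of the selected column.
    static Map<String, Integer> sharedMinutes(List<Reading> readings,
                                              Function<Reading, Set<String>> column) {
        Map<String, Integer> result = new TreeMap<>();
        for (int i = 0; i < readings.size(); i++) {
            for (int j = i + 1; j < readings.size(); j++) {
                Reading a = readings.get(i);
                Reading b = readings.get(j);
                if (Collections.disjoint(column.apply(a), column.apply(b))) {
                    continue; // no common device or network
                }
                int overlap = Math.min(a.toMinute(), b.toMinute())
                        - Math.max(a.fromMinute, b.fromMinute);
                if (overlap > 0) {
                    String pair = Math.min(a.smartphone, b.smartphone) + "-"
                            + Math.max(a.smartphone, b.smartphone);
                    result.merge(pair, overlap, Integer::sum);
                }
            }
        }
        return result;
    }
}
```

Applied to the four rows of the time series table, the connected-WiFi column yields 1-2: 12 minutes and the Bluetooth column yields 1-3: 10 minutes, which agrees with the query results.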
Same Bluetooth and GPS distance
We query for the same received Bluetooth devices and the GPS distance. We find two smartphones, 1 and 3, which received the same Bluetooth device for 10 minutes.
The GPS positions match at a precision of about 110 meters, and taken together this could indicate a possible infection.
Nearby WiFi
WiFi signals can indicate whether people were near one another. Hence, a Time Series Database query answers the question of which smartphones received the same WiFis at the same time.
The result shows that smartphones 1 and 4 shared them for 3 minutes. A longer analysis that takes the signal strength or a longer time period into account can then give a clearer indication of whether the smartphones were in the same room.
Connected WiFi
From the signal time series data, we also query the Time Series Database for which smartphones were connected to the WiFi and for how long.
This is especially important because a WiFi normally belongs to one entity, such as a household or a restaurant, and is therefore a good indicator of potential infection.
The Time Series Database query results in 12 minutes of shared time for smartphones 1 and 2.
Same GPS
Last but not least, GPS coordinates can be used to match devices which have similar GPS coordinates at the same time.
We see different matching results for the different smartphone pairs. Smartphones 1 and 2 were likely within a radius of 11 meters of each other for 12 minutes, whereas the pairs 1 and 4 and 2 and 4 were near one another for only 3 minutes.
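Matching on similar GPS coordinates requires a distance between two positions. A common choice, assumed here, is the haversine great-circle distance with a mean Earth radius of 6,371 km; `GeoDistance` and its radius check are hypothetical helpers.

```java
final class GeoDistance {
    static final double EARTH_RADIUS_METERS = 6_371_000.0; // mean Earth radius

    // Great-circle distance between two latitude/longitude pairs (haversine formula).
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(h));
    }

    // Two readings count as "same GPS" if they lie within the given radius.
    static boolean sameGps(double lat1, double lon1, double lat2, double lon2, double radiusMeters) {
        return haversineMeters(lat1, lon1, lat2, lon2) <= radiusMeters;
    }
}
```

For the coordinates of smartphones 1 and 3 (49.2949,8.6468 and 49.294,8.646) this gives a little over 100 m, so the pair does not fall within an 11 m radius.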
Fused querying; Signal strength; Downsampling
All in all, the queries shown above are examples that can be combined. For instance, one can query for devices that were in the same WiFi at the same time or that were at the same geolocation.
Using these criteria together achieves higher accuracy than matching on geolocation alone.
Furthermore, signal strength, accuracy likelihood and concrete timings can be taken into account whenever a GPS position or another signal is found. They help to introduce new matching criteria and to improve the matching results.
Downsampling and aggregation, which Time Series Databases support natively, are also important for gaining precision. Signal readings are condensed over a time interval to extract averages, peaks and the most common signal values within that interval.
A high infection likelihood is typically associated with roughly 15 minutes of contact within a distance of two meters. Hence, signal readings can be condensed via aggregation rules into time intervals shorter than 15 minutes (e.g. 5 minutes), and the most descriptive readings are then used for matching. These most descriptive factors can be the average signal strength of the received WiFis or the precision digits of the GPS coordinate that did not change during that time. Typically this downsampling and aggregation into time intervals
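The downsampling described above can be illustrated with fixed time windows. This minimal sketch, assuming raw WiFi readings with a minute timestamp, groups them into windows (for example, 5 minutes) and keeps the average signal strength per BSSID. In practice a Time Series Database would do this with its own aggregation functions; `Downsampler` and `WifiReading` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class Downsampler {
    // Hypothetical raw WiFi reading.
    static final class WifiReading {
        final long minute;   // timestamp in minutes
        final String bssid;  // hardware address of the router
        final int signalDb;  // signal strength in dBm

        WifiReading(long minute, String bssid, int signalDb) {
            this.minute = minute;
            this.bssid = bssid;
            this.signalDb = signalDb;
        }
    }

    // Returns window start -> (BSSID -> average signal strength in that window).
    static Map<Long, Map<String, Double>> averageByWindow(List<WifiReading> readings,
                                                          long windowMinutes) {
        // Accumulate sum and count per window and BSSID.
        Map<Long, Map<String, double[]>> sums = new TreeMap<>();
        for (WifiReading r : readings) {
            long windowStart = Math.floorDiv(r.minute, windowMinutes) * windowMinutes;
            double[] sumAndCount = sums.computeIfAbsent(windowStart, k -> new TreeMap<>())
                    .computeIfAbsent(r.bssid, k -> new double[2]);
            sumAndCount[0] += r.signalDb;
            sumAndCount[1] += 1;
        }
        // Turn the sums into averages.
        Map<Long, Map<String, Double>> averages = new TreeMap<>();
        sums.forEach((window, perBssid) -> {
            Map<String, Double> perWindow = new TreeMap<>();
            perBssid.forEach((bssid, sc) -> perWindow.put(bssid, sc[0] / sc[1]));
            averages.put(window, perWindow);
        });
        return averages;
    }
}
```

Peaks or the most common value per window could be kept the same way by changing what is accumulated.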
Downsampling, signal readings and querying for multiple factors can then be combined to increase the accuracy of matching potentially infected people.
Value of extended Time Series Analysis
Extended location tracking based on signals besides GPS opens up new precision possibilities to determine when people have access to the same Bluetooth device or are in the same WiFi. One could even go further and use the earth's magnetic field for positioning, or just compare magnetic fingerprint readings from smartphones.
All readings and signals together with GPS increase the likelihood of accurately matching people who have been near one another.
Storing this time series data in a Time Series Database then provides an efficient way of aggregating and downsampling it.
Extended location fingerprinting
The huge downside of extended Coronavirus geotracking is privacy, where other approaches without such data transmission offer more secrecy for the user.
Readings of which devices and WiFis a user is connected to reveal even more information than pure geolocations.
In addition, one can mine habits even without artificial intelligence, simply by querying the Time Series Database in which the data is stored. For instance, one can find out in which weeks a user is connected to which WiFi and at which geopositions these WiFis are located.
Therefore, the geolocation needs to be fuzzified, and the signal readings and received devices should not be transmitted over the internet.
Geoposition fuzzification
One way to fuzzify the geoposition in Coronavirus geotracking is to remove decimals from the GPS coordinate until the desired degree of fuzzification is reached.
The geolocation can also be fuzzified and abstracted via the tiles of a Mercator projection.
The Mercator projection is commonly used in web maps, which are split into tiles so that one can zoom in and view the map at different resolutions.
The advantage of using the Mercator projection to abstract the geoposition to a tile ID is that the two-dimensional position becomes a one-dimensional ID, which is often easier to process further. However, simply removing decimals from a GPS position works, too.
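Both fuzzification options can be sketched in a few lines. Truncating decimals keeps a fixed number of digits; the tile variant uses the standard Web Mercator (slippy map) tile formula, and the row-major encoding `y * 2^zoom + x` into a single ID is an assumption of this sketch (quadkeys would be another option). `GeoFuzzifier` is a hypothetical helper.

```java
final class GeoFuzzifier {
    // Removes decimals: keeps `decimals` digits after the point, truncating toward zero.
    static double truncate(double coordinate, int decimals) {
        double factor = Math.pow(10, decimals);
        return (long) (coordinate * factor) / factor;
    }

    // Web Mercator tile at the given zoom level, encoded as one row-major ID.
    static long tileId(double lat, double lon, int zoom) {
        long n = 1L << zoom; // number of tiles per axis
        double latRad = Math.toRadians(lat);
        long x = (long) Math.floor((lon + 180.0) / 360.0 * n);
        long y = (long) Math.floor(
                (1.0 - Math.log(Math.tan(latRad) + 1.0 / Math.cos(latRad)) / Math.PI) / 2.0 * n);
        // Clamp to valid tile indices (edges of the map).
        x = Math.min(Math.max(x, 0), n - 1);
        y = Math.min(Math.max(y, 0), n - 1);
        return y * n + x; // two-dimensional tile -> one-dimensional ID
    }
}
```

Each removed decimal makes the grid ten times coarser: one unit in the fourth decimal of latitude is roughly 11 m, in the third roughly 110 m. Likewise, lower zoom levels give larger tiles; at zoom 0 the whole world is a single tile.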
Privacy and location fingerprinting
The fuzzification of the geolocation alone might not be enough anonymization.
One way to avoid sending signal and geolocation details to a server and a Time Series Database is to compute a fingerprint of the situation the smartphone observes and then derive a one-way signature from it.
One can use conventional hashing techniques, but the problem is that different smartphones might have fluctuating readings, which would then lead to a different fingerprint.
A better option is to compute fingerprints via locality-sensitive hashing (LSH).
LSH is an algorithmic technique that hashes similar input items into the same “buckets” with high probability.
https://en.wikipedia.org/wiki/Locality-sensitive_hashing
Locally on a smartphone, signals can be collected for a certain time interval. Once a time interval is complete, the representative signal strength (e.g. the average value or a main quantile) can be combined with the fuzzified geoposition and other values via locality-sensitive hashing.
Another smartphone in the same situation will likely arrive at a similar fingerprint.
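The article does not fix a particular LSH family. As one concrete option, the sketch below uses MinHash, an LSH scheme for Jaccard similarity: each smartphone turns its representative observations into string tokens (for example fuzzified tile IDs, BSSIDs and Bluetooth addresses) and computes a signature, and the fraction of equal signature components estimates how similar two situations were. `MinHashFingerprint`, the token format, the number of hash functions and the shared seed are assumptions; all phones must use the same seed to produce comparable signatures.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.Set;

final class MinHashFingerprint {
    private static final long PRIME = 2_147_483_647L; // 2^31 - 1
    private final long[] a;
    private final long[] b;

    // All smartphones must use the same numHashes and seed to get comparable signatures.
    MinHashFingerprint(int numHashes, long seed) {
        Random random = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + random.nextInt((int) (PRIME - 1)); // in [1, PRIME - 1]
            b[i] = random.nextInt((int) (PRIME - 1));     // in [0, PRIME - 2]
        }
    }

    // One minimum per hash function h_i(x) = (a_i * x + b_i) mod PRIME over all tokens.
    long[] signature(Set<String> tokens) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String token : tokens) {
            long x = Math.floorMod((long) token.hashCode(), PRIME);
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME; // product < 2^62, no overflow
                sig[i] = Math.min(sig[i], h);
            }
        }
        return sig;
    }

    // Fraction of equal components: an estimate of the Jaccard similarity of the token sets.
    static double similarity(long[] s1, long[] s2) {
        int equal = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                equal++;
            }
        }
        return (double) equal / s1.length;
    }
}
```

Unlike a conventional hash, a token that differs between two phones changes only some signature components, so phones in nearly the same situation still end up with a high similarity.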
The fingerprints are then submitted to the server, and the Time Series Database will only contain entries like the following:
Time from | Duration (min) | Fingerprint | Smartphone |
---|---|---|---|
14:00 | 15 | AEFGIJ-845D-UZ00-1122004986 | 1 |
14:03 | 15 | 000000-005D-UZ00-0000334986 | 2 |
Now, one can compute similarities between the fingerprints, while the actual information is more anonymous than before. It is certainly not a perfect anonymization model, but it fuzzifies the data to some degree.
To make the fingerprint even more anonymous, the current day can be included in it, which makes it even harder to attack.
It is even possible to include the duration and the time in the fingerprint, which then has a direct impact on the LSH distance between two fingerprints. The following example includes the timestamp in the fingerprint and leaves the duration out.
Duration (min) | Fingerprint | Smartphone |
---|---|---|
15 | AEFGIJ-845D-UZ00-1122004986-07 | 1 |
15 | 000000-005D-UZ00-0000334986-08 | 2 |
The downside is that queries for the different features with logical conditions (e.g. "Same Bluetooth and GPS distance") are no longer easily possible. To allow them, each signal group would need to be fingerprinted on its own, which would in turn reduce anonymity.
After all, privacy is a trade-off against time series analysis capabilities. We described how locality-sensitive hashing helps; a specific project will need to decide which level of privacy and functionality it requires.
If you have additional ideas that you consider noteworthy, please contact us with feedback.
Conclusion
We discussed the differences between Coronavirus geotracking and privacy-preserving Bluetooth solutions. We gave insights into how a Coronavirus geotracking app works, as well as concrete implementation details for Android.
Afterwards, we discussed how Time Series Data Analysis can be done with a Time Series Database. We showed how queries on the smartphone data from Coronavirus geotracking can be executed and what value Time Series analysis contributes here.
Finally, we talked about possible privacy extensions for Coronavirus geotracking. Geolocation and signal fingerprinting with locality-sensitive hashing showed that more privacy is generally possible. We saw first indications that more privacy likely also means more limitations in Time Series analysis capabilities.
All in all, Coronavirus geotracking and Time Series analysis of signals and GPS data opens new possibilities for tracking Coronavirus infections and doing extended analysis.
From a technological perspective, we see that Time Series Databases ease this kind of analysis.
Because the data is transmitted over the web and stored in central databases, the privacy risk is inherently higher than with pure Bluetooth tracking. At the same time, the analysis capabilities open up a completely new set of possibilities.
Ultimately, it is up to societies and their politicians to weigh the benefits of geotracking against more anonymous tracking methods.
Update (29th April 2021): Check out our latest project, Fahrbar!