Posted on

## Driver fingerprinting

Perhaps it doesn’t come as a surprise that we all have unique driving styles. My father is a calm, sometimes even slow driver. My mother is more temperamental in her driving ways – always cutting the line where she can (this was a highly useful skill when we were late to school). When I close my eyes, I can immediately tell whether my mother or father is behind the wheel.

But so can anybody – using in-vehicle logs. Turns out that these driving styles can translate to bits and bytes that our in-vehicle sensors record.

# In-vehicle sensors

There are a number of systems built in our cars for various purposes – to regulate emission, to optimize engine performance or to log and detect malfunctions. Think of your dashboard for example – you can see your speed, consumption and RPM all at a glance! The data required for these comes from in-vehicle sensors. This data is converted and processed by ECUs (Electronic Control Unit), which are essentially small computers inside our cars. The ECUs communicate with each other through the so-called CAN-bus (CAN – Controller Area Network).

## The CAN-bus whisperer: OBD dongle

The data sent over the CAN-bus can be easily captured. For example, so-called ”insurance dongles” (or On-Board Diagnostics dongles) can collect and send in-vehicle sensor data to insurance companies. They use this information to give you discounts for example, if you drive safely.

Another similar approach to lowering insurance premiums is using telematics data, as described here. The biggest difference however, is that telematics data are currently subject to data protection laws, unlike CAN logs.

The OBD dongles connect to the ODBII interface of the vehicle.
These devices are not expensive at all (prices start at 10$), and can come with mobile apps so you can look at your data from your phone! In some cases, we cannot access CAN logs directly through an OBD port (or at least not all parts of it). There is usually a gateway regulating access to these data. Even though CAN-bus data can be readily accessible (and not encrypted), the interpretation of the messages is still a challenge. There is no global standard, and the structure of the messages varies from manufacturer to manufacturer. Without the cooperation of manufacturers, it is hard to reverse engineer these messages… but not impossible! For example we can use our phone to record the speed while driving. Then, we can extract the brake signal (which records how far we press the brake pedal) by searching for a signal that usually decreases right before the observed speed decreases (of course it does not need to always match, there are multiple scenarios where one can lose speed without braking…). A similar search can be conducted for the gaspedal signal as well. Of course this is one approach, but if you want to know how to reverse-engineer signals using machine learning techniques, check out my talented colleagues’ paper (Lestyan et al., 2019)! # Driver fingerprinting So what can we do with the obtained CAN-bus data? Turns out, that we can use it to determine who was behind the wheel. ## How it works Driver fingerprinting or driver re-identification uses the data collected from the CAN logs (called traces) to correctly identify drivers. From these traces, we can extract the data from one sensor to obtain a time series (measuring speed for example). The input of such a model can consist of multiple time series, like in the example image below. However, to obtain these time series, first we need to reverse-engineer them. Then, we can use Machine Learning to make predictions on the driver’s identity. For my 2018 Scientific Students’ Association paper I used the measurements of 5 sensors (namely brake, clutch, gaspedal, rpm and speed). Brake, clutch and gaspedal sensors measure the position of the respective pedals. As for the rest: rpm records the revolutions per minute and speed measures the vehicle’s speed. I was also able to work on data collected from 33 drivers, whereas state-of-the-art driver re-identification papers only published results on models that had to distinguish 2-15 drivers. My model achieved a 66% mean accuracy, given a mere 10 second trace as input! One might think that this time-window can be pushed down even further – theorizing that the distinguishing part in everyone’s driving could be the part where they accelerate or shift gears, which is about 2-3 seconds long. However, when driving in a city there are a lot of situations where one must stand still (in morning traffic, in front of a stoplight, or while letting that old lady cross the street). Related work suggests that all the information needed to identify a driver can be obtained from the measurements of a single turn. This confirms the intuition that someone’s driving can be quite accurately distinguished based on how they slow down, switch gears and accelerate. After writing this paper me and my colleagues were motivated to extend this research by eliminating the first step: reverse engineering the raw data. ## Driver re-identification on raw data In our paper titled Automatic Driver Identification from In-Vehicle Network Logs we demonstrated that it is possible to achieve the same results without having to reverse engineer and pre-select signals from the CAN log. We collected 72 time series from unknown sensors and trained a Deep Learning model on each of them. Not all time series were equally good inputs for the re-identification task, which is why we combined the top-10 performing classifiers into one mixture model. We achieved a mean accuracy of 75-85% for CAN traces whose length varied between 20 and 120 seconds. This approach outperforms the model that was trained on extracted signals, which might indicate that among the 72 time-series there are some that are even better for distinguishing drivers than the ones that were originally extracted! However, we don’t know their meaning, and thus their possible connection to the drivers’ fingerprints. Great! Acquiring an OBD dongle that can read CAN logs is not only cheap, it turns out you don’t have to understand the data to build a working driver-fingerprinting classifier with it. # Privacy implications So now that we know that our in-vehicle data could potentially identify us, why should we protect it? I mean, aside from well-wishing insurance companies who give us discounts, who could possibly get our data? Except if… car companies were motivated to sell CAN logs to 3rd parties. And currently they could do so. Without our consent (because it is not considered to be personal data as of yet). Uh-oh. ## Driving data: the car industry’s new cash cow? Many sources suggest the lucrativity of driving data – McKinzey&Co, a management consulting firm, estimates that ”data from connected cars will be worth up to$750 billion by 2030”. Companies are already preparing for the market
for data from cars, which is expected to be even bigger than the market for the cars themselves according to this Forbes article.

Future ideas include selling data to map provider firms (Google Maps, Waze, etc.) looking to provide more accurate
traffic information, or selling consumption-related and other statistics to operators of larger vehicle fleets.

Just like with any other personal information that could potentially identify us: using such data should require our consent!

## GDPR and awareness

Because of how new these findings are, it is still unclear whether drivers are completely informed about the wealth of personal information that their CAN logs potentially reveal. Hopefully this post was an eye-opener to the reader that CAN log data are not only personal, but that there is also a great incentive to sell/use them.

Although car companies stress that the collected network logs are “anonymized”, the proper anonymization of such high-dimensional data is notoriously difficult in practice without significantly degrading accuracy (Aggarwal et al., 2005). Even aggregated metrics aren’t always completely safe (Dinur and Nissim, 2003).

More about reconstruction attacks in theory and in practice.

The feasibility to identify (and/or profile) a driver means that CAN logs are indeed personal data and, as such, are subject to the European General Data Protection Regulation (GDPR) as of 25 May 2018. It is a fundamental duty of companies handling such data to adequately inform drivers and protect their personal data. Failing to do so does not only ruin customer trust, but can also result in a fine up to 20 million Euros.

# Closing thoughts

We have seen that it is fairly easy to access this data, and will probably become even more accessible in the not-so-distant future of connected vehicles.

Massachusets has voted (Nov. 4th, 2020) on standardizing access to car data (to extend the state’s right-to-repair law)! This could set a standard for the vehicles to come…

We have also seen that even without the reverse-engineering of the logs, identifying drivers is still possible. Turns out, our phones aren’t the only devices that store and/or process information which could potentially identify us. 🙂

I don’t know what driving fingerprint I possess, but I hope I can eventually get the best from both of my parents: my father’s calm (but ultimately safe) driving and my mother’s mad parking skills!

A special thank you to the following people who have provided me with useful feedback and suggestions:
Gergely Ács, Andris Gazdag, Ferenc Huszár and Viktor Remeli.

Posted on

## Post-Quantum Cryptography: The First Steps

This article was prepared in cooperation with Microsec Ltd. Check out their blog for more interesting readings!

In a previous post, we discussed how the possibility of quantum attacks will affect the security of currently used cryptographic methods. As we saw, the post quantum security level of symmetric key methods including block ciphers and hash functions can reach the classical security level after doubling the key length. At the same time, we need alternative solutions instead of the ubiquitously used public key cryptosystems that are known to be breakable with scalable quantum computers. In this post, we comment on the crypto aspect of a recent breakthrough in quantum computing and briefly introduce the most important standardization effort of the post-quantum transition.

Posted on

## Gépi Tanulás & Személyes Adatok Védelme

Ez a blogposzt az utolsó egy három részes sorozatból mely a gépi tanulás adatvédelmi kockázatairól kíván közérthető nyelven egy átfogó képet nyújtani. Az első egy bevezető volt a gépi tanulás világába, a második a létező támadásokat ölelte át. Ebben a záró részben a lehetséges védekezéseket mutatjuk be.

Posted on

Ez a blogposzt a második egy három részes sorozatból mely a gépi tanulás adatvédelmi kockázatairól kíván közérthető nyelven egy átfogó képet nyújtani. Az első egy bevezető volt a gépi tanulás világába, ez a létező támadásait öleli át, míg az utolsó rész a lehetséges védekezéseket mutatja majd be.

Posted on

## Gépi Tanulás

Ez a blogposzt az első egy három részes sorozatból mely a gépi tanulás adatvédelmi kockázatairól kíván közérthető nyelven egy átfogó képet nyújtani. Ez az első egy bevezető a gépi tanulás világába. A következő a létező támadásokat fogja átölelni, míg az utolsó záró rész a lehetséges védekezéseket mutatja majd be.

Posted on

## c0r3dump @ VolgaCTF

VolgaCTF is one of the largest hacker competitions of the Russian Federation. This year, our one-year-old hacker team, c0r3dump, successfully qualified for the VolgaCTF Final by achieving the 16th place in the qualification round in March. The 9th VolgaCTF Final was held in Holiday Inn, Samara between September 17 (Tuesday) and September 20 (Friday). 19 teams were competing in the final, 3 Austrian, 1 Vietnamese, 14 Russian and 1 Hungarian. This was the first CTF final of “c0r3dump”.

Posted on

## Post-Quantum Cryptography: What and Why Needs to be Changed Soon?

This article was prepared in cooperation with Microsec Ltd. Check out their blog for more interesting readings!

In this post, we explain the impact of prospective quantum computers on current implementations of cryptographic primitives, including digital signatures. Continue reading Post-Quantum Cryptography: What and Why Needs to be Changed Soon?

Posted on

## Certificate Transparency – the current landscape in Hungary

This blog post was written by Márton Horváth who worked on implementing a monitor for Certificate Transparency logs in the context of his student semester project. As a proof-of-concept, he used the monitor he created to collect information about logged certificates issued in Hungary or issued to Hungarian web sites. This blog post contains the main findings of the analysis of those certificates. Continue reading Certificate Transparency – the current landscape in Hungary

Posted on

## Enabling WiFi and converting the Raspberry Pi into a WiFi AP

This blog post, written by Márton Juhász, is the fifth in a series of blog posts on transforming the Raspberry Pi into a security enhanced IoT platform. This post specifically will explain how to convert the Raspberry Pi into a WiFi access point such that it can perform some gateway-like functionality. First, we describe how to enable WiFi and then how to enable other software components to make the Pi an access point. Continue reading Enabling WiFi and converting the Raspberry Pi into a WiFi AP