DFWORKS

Preventing damages caused by cyberattackers, online disinformation distributors and propagandists.

Winning Elections with MAID data

Posted on February 25, 2025

Introduction

Having worked in intelligence, in both a government and a private capacity, I have had my interest piqued by several recent articles on Mobile Advertising IDs (MAIDs) and bidstream data. A recurring theme in the articles below (which are well worth reading) is that private intelligence agencies are widely leveraging bidstream data for what is essentially global surveillance.

Bidstream data seems to be widely used by private intelligence agencies, but its use, unless I’m severely out of the loop, appears to be more prevalent in the US, where there are fewer regulatory hurdles. In contrast, UK privacy laws like GDPR make handling such datasets significantly more risky. The legal and ethical concerns surrounding bidstream data are well-documented elsewhere, but I want to take a practical look at how the aggregation and analysis of this data amplifies risks, making it even more dangerous than some may realise.

Mobile Advertising ID (MAID) data has typically been seen as a tool for targeted advertising, but its aggregation and analysis raise significant concerns. Large datasets enable the tracking of individuals, the inference of behaviors, and even the modeling of political affiliations. The modeling presented in this article is basic, probabilistic, and significantly reduced in complexity to demonstrate ‘the art of the possible’. It aims to demonstrate that with access to more data and greater resources, intelligence agencies or major tech companies could develop highly sophisticated models capable of achieving far more precise and nuanced outcomes.

Understanding MAID and Bidstream Data

Mobile Advertising IDs are persistent but resettable alphanumeric identifiers assigned to mobile devices. They were initially designed to enable advertisers to track user behavior without relying on directly identifiable information such as phone numbers or email addresses. In theory, this allows for targeted advertising while maintaining user anonymity.

However, in practice, a thriving industry has emerged around the collection, enrichment, and resale of MAIDs. Marketing firms, data brokers, and advertisers routinely compile extensive lists of MAIDs, supplementing them with additional details - ranging from basic identifiers like names and emails to more invasive data such as social media profiles, precise GPS locations, and consumer behaviour insights. As a result, what was once an anonymous identifier has become a powerful tool for tracking individuals over time.

Bidstream data originates from the real-time bidding (RTB) process, a system used in digital advertising where ad space is auctioned off in milliseconds before a webpage or app loads. For example, when you visit an article on The Sun, you might notice a grey space around the content that gets filled with an ad a second or two later. In that brief moment, information about your device is sent into an automated auction, where advertisers use complex algorithms to decide if you fit their target audience and, if so, place a bid to display their advert.

This data exchange is largely invisible to users, yet it occurs every time an ad-supported app is used or a website with programmatic advertising is visited. Many common apps, including weather, fitness, and social networking apps, collect and transmit MAID data as part of this ecosystem. The issue is that this data is often shared openly, allowing not only advertisers but also virtually anyone involved in the ad-tech supply chain, such as data brokers and intelligence firms, to observe the auction and aggregate the data without actually placing any bids.
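To make the exchange more concrete, here is a minimal Python sketch of the fields a passive observer could log from a single auction. The field names follow the public OpenRTB 2.x convention (`device.ifa` for the advertising ID, `device.geo` for coordinates, `app.bundle` for the app), but the sample request, its values and the `extract_observation` helper are my own illustrative assumptions rather than real traffic.

```python
import json

# Hypothetical, heavily trimmed bid request. Field names follow the OpenRTB 2.x
# convention; every value here is invented for illustration.
SAMPLE_BID_REQUEST = json.dumps({
    "id": "auction-0001",
    "app": {"bundle": "com.example.weather"},
    "device": {
        "ifa": "00000000-0000-4000-8000-000000000001",
        "geo": {"lat": 53.4083, "lon": -2.1494},
    },
})


def extract_observation(raw_request: str) -> dict:
    """Return the fields an auction participant could log without ever bidding."""
    request = json.loads(raw_request)
    device = request.get("device", {})
    geo = device.get("geo", {})
    return {
        "maid": device.get("ifa"),
        "lat": geo.get("lat"),
        "lon": geo.get("lon"),
        "app": request.get("app", {}).get("bundle"),
    }


observation = extract_observation(SAMPLE_BID_REQUEST)
print(observation)
```

Nothing in this sketch requires winning, or even entering, the auction. Receiving the request is enough.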

The Fallacy of Privacy Settings in Mobile Advertising

While MAIDs and bidstream data are technically anonymised, their sheer richness makes de-anonymisation trivial. Movement patterns, repeated locations, and app usage can quickly reveal the identity of a person, even without a name attached. Investigations into bidstream datasets have demonstrated that it is possible to establish precise movement profiles of individuals, highlighting where they live, work, and spend their leisure time.

The potential for abuse is significant. Intelligence agencies, corporate entities, and even malicious actors can leverage bidstream data to monitor individuals, track dissidents, or infer political affiliations. Even when users opt out of tracking, there are often alternative device identifiers (such as IFV, device IDs, or app-specific identifiers) that can be exploited for cross-app tracking, further eroding the illusion of privacy controls.

Beyond traditional ad-tracking mechanisms, many apps transmit unexpected data points such as screen brightness, battery levels, and motion sensor data. These seemingly minor signals, when analysed in context alongside OSINT methods, can provide valuable insights into user behaviour. Researcher Tim (tim.sh) has suggested that Uber may use battery level data to influence ride pricing. The theory is that users with lower battery levels are more likely to accept surge pricing, as they may fear their phone dying before they can request another ride. While Uber has denied this practice, it highlights how seemingly inconsequential data points can be monetised.

Warrantless Surveillance and the Use of Bidstream Data in Investigations

Babel Street and its subsidiary, Atlas, have leveraged MAID data to conduct location-based tracking of individuals near sensitive locations, including mosques, courtrooms, and abortion clinics. This data, typically sourced from mobile applications, allows for near real-time geolocation monitoring without the need for direct user consent or judicial authorisation.

A particularly concerning instance of this practice involved investigators using MAID data to track jurors in a New Jersey courtroom. By analysing the location signals from mobile devices, investigators were able to determine not only the presence of jurors at the courthouse but also their movements before and after trial proceedings. This level of surveillance raises significant ethical and legal concerns, as it could be used to exert undue influence, monitor private behavior, or compromise the integrity of judicial processes.

The primary legal concern surrounding the use of MAID data in this manner is the lack of regulatory oversight. Unlike traditional surveillance techniques, which require warrants or legal justification, MAID-based tracking is often conducted without scrutiny. Anybody can purchase such data from third-party vendors for around £10,000.

The tracking of individuals near religious sites, medical facilities, and legal institutions raises civil liberties concerns, particularly regarding privacy rights and potential discrimination. The ability to monitor individuals based on their location near a mosque, for instance, could enable religious profiling, while tracking individuals near abortion clinics in US states where abortion has been outlawed could be used to intimidate or target those seeking medical care.

Without clear regulatory safeguards, the commercial availability of MAID data enables extensive surveillance capabilities with minimal accountability. Experts who know this subject far better than I do have called urgently for legislative intervention to establish stricter controls on how location data is collected, sold, and used by both private entities and government agencies.

Synthetic Data Generation

Accessing authentic MAID data in the UK is prohibitively expensive and fraught with legal complications, effectively rendering it off-limits for research or experimentation. Therefore, I created a synthetic dataset designed to mimic real-world patterns. Although this synthetic data is almost certainly markedly different from actual MAID/bidstream data, it still incorporates similar fields and exhibits enough variability and randomness to facilitate basic modelling and illustrate privacy concerns.

For those interested in the underlying methodology, the code used to build this dataset is available here. Developed using actual locations in Stockport, UK, it facilitates the random (with constraints) simulation of user movements and interactions within a realistic urban setting.

To simulate realistic usage, fictional device personas are assigned based on common lifestyle behaviors. These personas dictate movement patterns and interactions with locations. For example, a user may be:

Foodie - Frequently visits restaurants, cafés, and food markets.
Pub-goer - Spends evenings at bars and pubs, often making late-night movements.
Commuter - Follows a structured daily route between home and work locations.
Shopper - Regularly visits retail areas and shopping centers.
Smoker - Pauses at convenience stores and designated smoking areas.

Each persona follows a probabilistic movement model, meaning that their routines include both predictable behaviors (e.g., a commuter heading to work at 8 AM) and spontaneous deviations (e.g., stopping at a café). By incorporating stochastic elements, the dataset better reflects real-world unpredictability.
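As a rough illustration of what a probabilistic movement model can look like, the sketch below draws a persona's next destination from a weighted distribution. The persona names echo the list above, but the weights, the `PERSONA_WEIGHTS` table and the `next_destination` helper are assumptions made for this example and are not taken from the code behind the dataset.

```python
import random

# Hypothetical weights: the chance of each destination type for an evening
# outing. The numbers are illustrative only.
PERSONA_WEIGHTS = {
    "foodie": {"restaurant": 0.6, "cafe": 0.3, "pub": 0.1},
    "pub_goer": {"pub": 0.7, "restaurant": 0.2, "cafe": 0.1},
    "commuter": {"home": 0.8, "cafe": 0.15, "pub": 0.05},
}


def next_destination(persona: str, rng: random.Random) -> str:
    """Sample one destination type according to the persona's weights."""
    weights = PERSONA_WEIGHTS[persona]
    return rng.choices(list(weights), weights=list(weights.values()), k=1)[0]


rng = random.Random(42)  # fixed seed so the simulation is repeatable
sample = [next_destination("pub_goer", rng) for _ in range(1000)]
pub_share = sample.count("pub") / len(sample)
print(f"pub share for pub_goer: {pub_share:.2f}")
```

The pub-goer mostly ends up in the pub, but not always, which is the mix of routine and deviation described above.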

The dataset is generated through a multi-step simulation process:

Geo-Spatial Mapping - A map of Stockport is enriched with open source business and land use data from overpass turbo, including points of interest (e.g., restaurants, cafes, businesses).
Device Seeding - Synthetic devices are distributed across residential areas, with each assigned a job, favourite locations and a persona.
Movement Simulation - Devices interact with amenities, workplaces, and transit routes, following realistic daily schedules. Variations are introduced to account for weekends, holidays, and unexpected detours.
Bidstream Data Emulation - The synthetic devices generate timestamped location pings from fictional applications (social media, weather app, dating app), replicating the kind of data observed in real MAID datasets. This includes signal frequency variations, dwell times, and movement speed to prevent artificial uniformity.
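The final step, emitting pings while a device dwells at a location, could be sketched as follows. This is a simplified stand-in for the real generator: the `emit_pings` function, the app labels, the jitter size and the 1 to 10 minute gap between pings are all assumptions chosen for illustration.

```python
import random
from datetime import datetime, timedelta


def emit_pings(device_id, lat, lon, start, dwell_minutes, rng,
               apps=("social", "weather", "dating")):
    """Emit irregularly spaced pings while a device dwells at one location."""
    pings = []
    t = start
    end = start + timedelta(minutes=dwell_minutes)
    while t < end:
        pings.append({
            "device_id": device_id,
            "timestamp": t.isoformat(),
            # Small Gaussian jitter (roughly 10 m) mimics GPS noise.
            "lat": round(lat + rng.gauss(0, 0.0001), 6),
            "lon": round(lon + rng.gauss(0, 0.0001), 6),
            "app": rng.choice(apps),
        })
        # Irregular gaps avoid an artificially uniform signal frequency.
        t += timedelta(seconds=rng.randint(60, 600))
    return pings


rng = random.Random(7)
pings = emit_pings("device-a", 53.4083, -2.1494,
                   datetime(2025, 2, 4, 8, 0), dwell_minutes=60, rng=rng)
print(len(pings), pings[0])
```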

By leveraging realistic user segmentation and geospatial modeling, this synthetic dataset enables experimentation with bidstream mechanics, ad targeting simulations, and behavioral analysis, all while avoiding the legal and ethical concerns tied to real-world MAID data usage.

Demonstrating the Predictive Power of Bidstream Data

Before diving into modelling, we can first explore some basic patterns within the dataset. These images show how location data clusters, how individual movement patterns emerge, and how co-location analysis can hint at relationships between devices.

The first visualisation is a map displaying all data points in the dataset, revealing key clusters around residential areas, workplaces, and common travel routes. Since this dataset is based on real locations in Stockport, UK, we would expect to see concentrations near transport links, shopping areas, and hospitality venues, and this is mirrored in our fictional data.

Next, we examine the movement patterns of a single device over time. This visualisation highlights:

Primary locations such as home, workplace, and frequent leisure spots.
Preferred routes for commuting, errands, or social activities.
Temporal habits, indicating when this individual is most likely to be in specific areas.

This type of pattern of life analysis is valuable for understanding user routines, potential deviations, and broader behavioral trends that could be leveraged for targeted advertising, surveillance, or predictive modeling.
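A very simple version of this analysis is to snap pings to a coarse grid and ask which cell a device occupies most often at night and during working hours. The sketch below does exactly that on invented pings. The hour ranges, the three-decimal grid and the `infer_anchor` helper are assumptions for illustration, not the method used to produce the visualisations.

```python
from collections import Counter
from datetime import datetime

NIGHT_HOURS = set(range(0, 6)) | {22, 23}  # assumed "at home" window
WORK_HOURS = set(range(9, 17))             # assumed "at work" window


def grid_cell(lat, lon, precision=3):
    """Snap a coordinate to a coarse grid (3 decimals is roughly 100 m)."""
    return (round(lat, precision), round(lon, precision))


def infer_anchor(pings, hours):
    """Return the most frequently occupied cell during the given hours."""
    cells = Counter(
        grid_cell(p["lat"], p["lon"])
        for p in pings
        if datetime.fromisoformat(p["timestamp"]).hour in hours
    )
    return cells.most_common(1)[0][0] if cells else None


# Invented pings for a single synthetic device.
device_pings = [
    {"timestamp": "2025-02-03T23:10:00", "lat": 53.4083, "lon": -2.1494},
    {"timestamp": "2025-02-04T02:30:00", "lat": 53.4081, "lon": -2.1492},
    {"timestamp": "2025-02-04T10:15:00", "lat": 53.4012, "lon": -2.1603},
    {"timestamp": "2025-02-04T14:40:00", "lat": 53.4014, "lon": -2.1601},
    {"timestamp": "2025-02-04T19:05:00", "lat": 53.4150, "lon": -2.1550},
]

home = infer_anchor(device_pings, NIGHT_HOURS)
work = infer_anchor(device_pings, WORK_HOURS)
print("likely home cell:", home)
print("likely work cell:", work)
```

A real analysis would also need to handle shift workers, weekends and sparse data, but even this crude heuristic separates the overnight cell from the daytime one.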

By identifying instances where two or more devices consistently appear in the same locations at similar times, we can infer potential relationships. If this co-location occurs in a residential setting, it may suggest cohabitation, such as roommates or family members. If the overlap happens at public amenities, it could indicate social connections, such as friendships or dating relationships.

By mapping these overlapping location patterns, we can infer:

Shared residences (devices returning to the same address each night).
Regular joint activities (frequenting the same venues or traveling together).
Possible workplace connections (devices present at the same office or worksite).

While this is a simple demonstration, at scale, such methods could be used for network analysis, law enforcement investigations, or commercial audience segmentation.

The above map shows two devices colocating at a Farm Cafe and in a residential area. It would be reasonable to assume that the owners of these devices have a relationship of some sort.
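One simple way to surface pairs like this is to bucket pings by grid cell and hour, then count how many buckets each pair of devices shares. The sketch below is a minimal version of that idea on invented pings. The bucket size, the device names and the `colocation_counts` helper are assumptions, not the method behind the map.

```python
from collections import defaultdict
from datetime import datetime
from itertools import combinations


def colocation_counts(pings, precision=3):
    """Count shared (grid cell, date, hour) buckets for each pair of devices."""
    buckets = defaultdict(set)
    for p in pings:
        ts = datetime.fromisoformat(p["timestamp"])
        key = (round(p["lat"], precision), round(p["lon"], precision),
               ts.date(), ts.hour)
        buckets[key].add(p["device_id"])

    pair_counts = defaultdict(int)
    for devices in buckets.values():
        for pair in combinations(sorted(devices), 2):
            pair_counts[pair] += 1
    return dict(pair_counts)


# Invented pings: two devices meet at lunchtime and again late at night.
sample_pings = [
    {"device_id": "device-a", "timestamp": "2025-02-04T12:05:00", "lat": 53.4201, "lon": -2.1302},
    {"device_id": "device-b", "timestamp": "2025-02-04T12:20:00", "lat": 53.4203, "lon": -2.1304},
    {"device_id": "device-a", "timestamp": "2025-02-04T23:10:00", "lat": 53.4083, "lon": -2.1494},
    {"device_id": "device-b", "timestamp": "2025-02-04T23:40:00", "lat": 53.4081, "lon": -2.1492},
    {"device_id": "device-c", "timestamp": "2025-02-04T12:30:00", "lat": 53.4150, "lon": -2.1550},
]

counts = colocation_counts(sample_pings)
print(counts)
```

Fixed buckets miss devices that sit either side of a cell or hour boundary, so a more careful version would use a distance and time tolerance instead. A single shared bucket also means little on its own. It is the repetition across days and settings that makes the inference credible.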

Behavioural Profiling

At present, our raw bidstream data is structured as follows:

While real-world bidstream data purchased from brokers often contains a far richer set of attributes, our synthetic dataset serves as an approximation of the kind of sensitive information being shared. It includes timestamped latitude/longitude coordinates, device IDs (which, in this case, act as a proxy for MAIDs), and app data of the kind typically found in programmatic advertising bidstreams. This simplified version still demonstrates the potential risks associated with such data, even without the full breadth of fields available in commercial datasets.
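For readers who prefer code to prose, a single record with the fields just described might be represented like this. The class name, field names and types are my assumptions for illustration. The actual column names in the synthetic dataset may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BidstreamPing:
    """One synthetic ping; field names are illustrative assumptions."""
    device_id: str       # stands in for the MAID
    timestamp: datetime  # when the ping was observed
    lat: float           # latitude in decimal degrees
    lon: float           # longitude in decimal degrees
    app: str             # fictional app that produced the ping


example = BidstreamPing(
    device_id="device-a",
    timestamp=datetime(2025, 2, 4, 8, 0),
    lat=53.4083,
    lon=-2.1494,
    app="weather",
)
print(example)
```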

As previously discussed, there are companies that offer MAID data enriched with PII, allowing direct linkage between device activity and individual users. However, even without direct PII access, those familiar with OSINT (Open Source Intelligence) methodologies will recognise that public records can be leveraged to approximate device ownership.

For example:

Residential Identification: By cross-referencing observed home locations with publicly available property records, electoral rolls, or tenancy databases, one can infer probable owners or tenants of a given residential address.
Workplace Identification: Similarly, using lat/lon coordinates observed during working hours, we can reverse engineer workplaces by matching frequent locations to known business addresses.

This approach effectively converts anonymous geospatial data into actionable intelligence, bridging the gap between digital footprints and real-world identities. While this process does not provide a definitive link between a MAID and a specific individual, it significantly narrows the field of potential owners, making re-identification highly feasible when combined with other datasets.
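The narrowing step can be illustrated with a nearest-address lookup: given an inferred home coordinate, list every known address within a small radius. The sketch below uses the standard haversine formula. The address book is entirely fictional, and the 50 m radius and the `candidate_addresses` helper are assumptions chosen for the example.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def candidate_addresses(point, address_book, radius_m=50.0):
    """Return labels of addresses within radius_m of point, nearest first."""
    lat, lon = point
    hits = [(haversine_m(lat, lon, a["lat"], a["lon"]), a["label"])
            for a in address_book]
    return [label for dist, label in sorted(hits) if dist <= radius_m]


# Entirely fictional address book.
ADDRESS_BOOK = [
    {"label": "1 Example Street", "lat": 53.4083, "lon": -2.1494},
    {"label": "2 Example Street", "lat": 53.4084, "lon": -2.1495},
    {"label": "99 Far Road", "lat": 53.4200, "lon": -2.1300},
]

candidates = candidate_addresses((53.4083, -2.1494), ADDRESS_BOOK)
print(candidates)
```

The result is a shortlist rather than a single answer, which matches the point above: location alone narrows the field, and other datasets do the rest.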

Using just the fields in our synthetic data, we can begin to profile individual devices and infer characteristics about their likely owners. While this is a simplified model, it illustrates how even basic bidstream data can reveal sensitive personal attributes, raising concerns about how such data can be exploited when combined with additional sources.

Some examples of inferences that can be drawn from movement patterns and app interactions include:

Occupation & Income Level:
- Work location and commuting patterns can distinguish between blue-collar and white-collar roles.
- The types of retail and leisure locations visited can suggest approximate disposable income (e.g., high-end retailers vs. discount stores).
Health Indicators:
- Frequent visits to pharmacies, clinics, or hospitals may indicate chronic health conditions.
- Fast food consumption patterns, fitness tracker usage, or presence at gyms could suggest dietary habits and exercise routines.
- Regular visits to e-cigarette shops or tobacco stores could indicate smoking habits.
Age Estimation:
- Dating app usage, social media behavior, and visits to nightclubs or youth-centric venues can suggest younger demographics.
- Conversely, regular attendance at locations associated with senior services, medical centers, or quieter suburban amenities may indicate an older demographic.
Religious & Ethnic Affiliation:
- Visits to places of worship (churches, mosques, temples, synagogues) can reveal religious beliefs.
- Certain grocery stores, restaurants, or cultural centers may provide proxies for ethnic background, particularly in areas with strong community ties.

With even a basic dataset, one can begin to infer political inclinations based on lifestyle, income, and demographic factors. While these are broad statistical generalisations rather than precise classifications, they demonstrate how a more sophisticated analysis could refine these assumptions:

Income Level & Ideology:
- Lower-income households are often more likely to support liberal policies because of the liberal stance on social welfare and taxation.
- Higher-income households may lean conservative based on fiscal priorities and tax incentives.
Age & Political Views:
- Younger individuals tend to lean more progressive, aligning with policies on climate change, social justice, and economic reform.
- Older demographics generally show more conservative voting patterns, particularly on economic and social stability issues.
Occupation & Ideology:
- Blue-collar workers may lean conservative, influenced by economic policies favoring job security and trade protections.
- White-collar professionals, particularly in urban settings, are often more liberal, favoring globalisation, tech policies, and corporate regulations.
Marital Status & Political Preference:
- Single individuals tend to lean more liberal, aligning with policies on social issues, housing affordability, and worker rights.
- Married couples are more likely to lean conservative, with a focus on economic stability, family policies, and tax benefits.

Political modelling

While these estimates are inherently probabilistic, drawing on broad trends in social science and political research, even approximate models of behavior can be pivotal in tightly contested elections with millions of participants. In the UK, where campaign spending is capped, ensuring a high probability that adverts or videos reach the most likely swing voters is essential. With access to greater computational resources and enriched datasets, companies could refine these models to a remarkable degree.

Building on these assumptions, we can model our fictional dataset to rank devices along a spectrum from most conservative to most liberal. By associating behavioral indicators with ideological tendencies, we can construct a rough ideological map of our simulated population. The ability to identify and target swing voters has been critical in U.S. elections and played a major role in the operations of firms like Cambridge Analytica, whose core offering revolved around identifying and influencing undecided or persuadable voters.
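A toy version of this ranking is an additive score: each inferred attribute nudges a device towards one end of the spectrum, and devices are then sorted by their total. The directions of the weights below follow the generalisations listed earlier, but the magnitudes, the feature labels and the `ideology_score` helper are illustrative assumptions rather than the model used for the results that follow.

```python
# Hypothetical weights: positive leans conservative, negative leans liberal.
# Directions follow the generalisations above; magnitudes are arbitrary.
FEATURE_WEIGHTS = {
    "older": 1.0, "younger": -1.0,
    "married": 0.5, "single": -0.5,
    "blue_collar": 0.5, "white_collar": -0.5,
    "higher_income": 0.5, "lower_income": -0.5,
}


def ideology_score(features):
    """Sum the weights of a device's inferred features (unknown ones count 0)."""
    return sum(FEATURE_WEIGHTS.get(f, 0.0) for f in features)


# Invented feature sets for three synthetic devices.
devices = {
    "device-a": {"older", "married", "blue_collar"},
    "device-b": {"younger", "single", "white_collar"},
    "device-c": {"older", "single", "white_collar"},
}

# Rank from most conservative to most liberal.
ranking = sorted(devices, key=lambda d: ideology_score(devices[d]), reverse=True)
for device_id in ranking:
    print(device_id, ideology_score(devices[device_id]))
```

With richer data the hand-picked weights would typically be replaced by coefficients learned from labelled examples, but the structure of the problem stays the same.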

According to our model, the most conservative device in our dataset belongs to an individual who exhibits a set of behaviors traditionally associated with right-leaning views. This person appears to be an older, presumably married man with a routine that includes regular church attendance and frequent visits to a designated smoking area. His employment at a car garage, combined with spending patterns indicative of a lower income bracket, further aligns with demographic groups often linked to conservative voting tendencies. While our dataset is synthetic and lacks real-world verification, these markers provide a useful proxy for mapping ideological leanings.

At the opposite end of the spectrum, our most liberal device appears to be associated with a younger female living in a house of multiple occupancy. Her location and movement data suggest employment at a pilates studio, and she regularly engages with both a dating app and a fitness application. These behaviors, coupled with her urban lifestyle and flexible work patterns, align with characteristics often associated with left-leaning political preferences. While individual ideology is far more nuanced than any model can fully capture, the broad trends in our synthetic dataset reflect real-world correlations found in political and social research.

Focusing on the middle 10% of our ideological spectrum, we can identify a subset of devices representing potential swing voters. By mapping their home locations, we can visualise geographic clusters where political persuasion is more fluid. In a real-world scenario, these areas would be of particular interest to political campaigns, as small shifts in voter sentiment within these regions could be decisive in a closely contested election. The ability to target advertising and messaging to these individuals with high precision underscores why even probabilistic models of behavior hold significant strategic value.
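Selecting that middle band is straightforward once every device has a score. The sketch below interprets "middle 10%" as the central 10% of devices by rank, which is my assumption. One could equally take the central 10% of the score range. The `middle_slice` helper and the scores are invented for the example.

```python
def middle_slice(scores, fraction=0.10):
    """Return the device IDs in the central `fraction` of the ranking."""
    ranked = sorted(scores, key=scores.get)  # most liberal to most conservative
    n = len(ranked)
    k = max(1, round(n * fraction))          # size of the middle band
    start = (n - k) // 2
    return ranked[start:start + k]


# Invented scores for 20 synthetic devices, ranging from -10 to 9.
scores = {f"device-{i:02d}": i - 10 for i in range(20)}
swing = middle_slice(scores)
print(swing)
```

The home locations of the devices in `swing` are what would then be plotted to find the geographic clusters described above.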

Conclusion

The analysis presented here, while based on a synthetic dataset, demonstrates just how revealing bidstream and MAID data can be when aggregated and analysed at scale. Even with a basic probabilistic model, we can infer broad ideological leanings, identify swing voters, and simulate how political campaigns, intelligence agencies, or commercial entities might exploit such data for targeted influence. When applied to real-world datasets, where enriched personal information is often available, the potential for invasive tracking and manipulation becomes even more alarming.

The UK’s regulatory framework, particularly GDPR, places legal obstacles in the way of widespread bidstream exploitation. However, legal restrictions do not always deter those with the resources and intent to operate in legal grey areas. The sheer volume of location data circulating in the advertising ecosystem makes it highly susceptible to misuse, whether by political operatives seeking to influence election outcomes, private intelligence firms conducting mass surveillance, or malicious actors engaging in coercion and profiling.

This article has aimed to provide a practical look at how seemingly harmless individual bidstream events can be transformed into a powerful surveillance tool. As regulatory scrutiny lags behind the rapid evolution of ad-tech and data brokerage, the implications for privacy, democracy, and civil liberties remain profound.

If nothing else, this should serve as a stark reminder: when you see an ad pop up on your phone, you may not just be a potential customer, you may also be a data point in a far larger, more intricate system of influence and surveillance.