DFWORKS

Preventing damages caused by cyberattackers, online disinformation distributors and propagandists.

Zero-shot learning and the AMITT framework

Posted on May 12, 2021

Problem Statement

Can organisations or individuals without nation-state resources assimilate information quickly enough to orchestrate an effective response to combative influence operations?

Online Influence Operations

Influence operations include the collection of information about an adversary as well as the dissemination of propaganda in pursuit of a competitive advantage over an opponent. The way state actors exploit the openness and reach of the internet to conduct influence operations in pursuit of strategic objectives has been well documented and studied in academia. The history of online influence operations is brief relative to that of more traditional forms of propaganda, but malicious, anonymous and opposing online actors have already used computational propaganda techniques to spread disinformation, censor and attack journalists, and create fake trends.

In the last decade, clear-cut cases of influence operations have been observed in;

To fully understand the complexity and motives of an online influence operation, an organisation often requires both specific domain expertise and a lot of data. This is achievable when the operation is studied retrospectively, but it is much harder to assimilate information quickly enough to orchestrate a response when you are the target of an active and aggressive influence operation. The problem is more acute for an organisation facing a fast-moving smear campaign or a localised influence operation, where nation-state resources and friendly media outlets are unavailable. Situations which may call for an imperfect but faster response include:

  • Trial by media - A requirement to impact television and newspaper coverage regarding an individual or organisation’s reputation by seizing a narrative and influencing perceptions of guilt or innocence before a verdict.
  • Defamation/Libel - If an organisation is suffering reputational damage as a result of damaging articles or statements, the offending content often spreads faster than legal proceedings can have it removed.
  • Negative Exposure - Negative but accurate media reporting can be combated with counter influence operations.
  • Shareholder activism - Hedge fund activism designed to exert pressure on boards and push them into making favourable decisions is often supplemented with PR and media activity to seize a narrative and sway neutral voters.

An organisation that is not typically equipped for handling aggressive influence operations would therefore need to quickly understand the actors involved and the information disseminated, and to have a framework for forming a response.

Quickly assimilating information

Zero-Shot Learning (ZSL) is a machine learning method that can detect classes that a model hasn’t observed during training. It resembles our ability as humans to generalize and identify new things without explicit supervision. While ZSL models are unlikely to achieve the accuracy or utility of a model trained specifically for a task, they are a suitable tool for this problem. Hand-labeled training sets are expensive, time consuming to assemble and often require domain expertise, none of which is conducive to responding quickly during a crisis.

Until fairly recently, Natural Language Processing (NLP) models were limited to classifying text into a predefined set of candidate categories. These categories had to be set in advance during training, and adding a new category meant re-training the model with more examples. There are now excellent open-source NLP models, based on Hugging Face Transformers, that work well for zero-shot text classification. I have used two of them in this project:

  • zeroshot_topics is distributed on PyPI as a universal wheel and harnesses KeyBERT, an easy-to-use keyword extraction technique that uses BERT embeddings to find the keywords and phrases most similar to the document they appear in. The aim in using this library was to identify themes or clusters of articles/posts deployed in an influence operation.
  • zero_shot_re is a zero-shot relation extractor project based on the paper ‘Exploring the zero-shot limit of FewRel’ by Alberto Cetoli. Being able to agnostically generate knowledge graphs from text will help determine threat actors and themes.
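To make the idea of zero-shot classification concrete, here is a minimal sketch that calls the Hugging Face Transformers `zero-shot-classification` pipeline directly. It is not the notebook's code and it does not use zeroshot_topics. The model name `facebook/bart-large-mnli`, the helper names (`build_classifier`, `top_label`, `classify_articles`) and the 0.5 threshold are all assumptions chosen for illustration.

```python
def build_classifier(model_name="facebook/bart-large-mnli"):
    """Load a zero-shot classification pipeline.

    The model name is an assumption; the model is downloaded on first use.
    """
    from transformers import pipeline  # imported lazily: heavy dependency
    return pipeline("zero-shot-classification", model=model_name)


def top_label(result, threshold=0.5):
    """Return the highest-scoring label, or None if its score is below threshold.

    `result` is the dict the pipeline returns for a single text, with
    parallel "labels" and "scores" lists.
    """
    label, score = max(zip(result["labels"], result["scores"]),
                       key=lambda pair: pair[1])
    return label if score >= threshold else None


def classify_articles(texts, candidate_labels, classifier, threshold=0.5):
    """Map each text to its most likely theme (or None if no label is confident).

    `classifier` is any callable with the pipeline's calling convention,
    which makes it easy to swap in a stub when experimenting.
    """
    themes = {}
    for text in texts:
        result = classifier(text, candidate_labels=candidate_labels)
        themes[text] = top_label(result, threshold)
    return themes
```

With the `transformers` package installed, `classify_articles(articles, ["election", "economy", "security"], build_classifier())` would label each article with one of the candidate themes. The labels can be changed at any time without re-training, which is the property that matters when responding under time pressure.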

You can follow along with the notebook on GitHub.

A framework for responding to online influence operations

The Adversarial Misinformation and Influence Tactics and Techniques (AMITT) framework was created from a need for a common language for disinformation. The structure and propagation patterns of misinformation attacks have many similarities to those seen in information security and computer hacking, so the framework adopts a similar structure to that of the MITRE ATT&CK framework, which has been heavily adopted in the cyber security industry.

AMITT is a set of data standards and an open source knowledge base of both red and blue team disinformation tactics and techniques. AMITT’s intended users are disinformation responders; its purpose is to give them the ability to tactically respond to influence operations, plan defences and to transfer information security principles to the disinformation sphere. The framework consists of blue team (defence) and red team (attack) models as well as a repository of examples, descriptions and mitigations.

Project

We have used Neo4j, a graph database that features the labeled property graph model, to present the information we are extracting from each article. The articles scraped will have a sentiment, a theme/topic that has been classified by ZSL and a series of entities that have been extracted.

Relations between entities are stored as intermediate nodes instead of direct links in order to more easily display sentiment as well as providing an audit trail of the source text from which the relation was extracted. With the labeled property graph model, you can’t have a relationship pointing to another relationship. For this reason, we refactor the connection between extracted entities into an intermediate node. Feel free to try topics and search terms of your own.
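The intermediate-node pattern can be sketched as a parameterised Cypher statement built in Python. This is an illustration only, not the project's actual schema: the node labels `Entity` and `Relation`, the relationship types `SOURCE` and `TARGET`, the property names and the `relation_to_cypher` helper are all hypothetical.

```python
def relation_to_cypher(head, relation, tail, sentiment, source_text):
    """Build a Cypher statement that stores one extracted relation.

    The relation becomes its own node, so it can carry a sentiment score
    and the sentence it was extracted from. All labels, relationship types
    and property names here are assumptions made for this sketch.
    """
    query = (
        "MERGE (h:Entity {name: $head})\n"
        "MERGE (t:Entity {name: $tail})\n"
        "CREATE (r:Relation {type: $relation, sentiment: $sentiment, "
        "source_text: $source_text})\n"
        "CREATE (h)-[:SOURCE]->(r)\n"
        "CREATE (r)-[:TARGET]->(t)"
    )
    params = {
        "head": head,
        "tail": tail,
        "relation": relation,
        "sentiment": sentiment,
        "source_text": source_text,
    }
    return query, params
```

The returned query and parameters can then be passed to a Neo4j session, for example with `session.run(query, params)` in the official Python driver. `MERGE` is used for the entities so that repeated mentions reuse the same node, while each extracted relation gets a fresh `Relation` node that preserves its own source text.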

Overlaying AMITT framework and further work

There will still be elements of manual analysis required to understand the context of a given influence operation, but the graph data could, for example, be used to help determine the following more quickly:

  • Discovery of frequently discussed topics can help to determine opponent strategic objectives (dismiss, distort, distract, dismay, divide) as well as existing and competing narratives.
  • Analysis of the entire network can help inform a centre of gravity study where key communities, influencers and media outlets can be identified.
  • Replacing media articles with tweets or other social media posts and incorporating bot or “fake news” detection will help build a picture of where and how computational propaganda is being used.
  • Identification of friendly influencers who may also be targeted.
  • Identification of fault lines between communities can help develop counter narratives that are sympathetic to opposing forces.

Issues and Hints

  • Neo4j can only display a limited number of nodes and relationships. Filtering nodes by a minimum number of relationships will help display useful information.
  • The codebase conducts coreference resolution as well as pairing mathematically similar words and phrases, but there will still be cases where a single entity ends up as multiple nodes (James, Jim, Jimmy, James Johnson). Manual correction may be required.
  • Occasionally where a website renders an error, or something else unusual, the text will be processed by the program and included as nodes which will need to be removed.
  • The more obscure a relationship you try to extract, the less successful your project will be. I have had success using “linked”, “associates” and “interacts” but more complex relationships like “shareholder” or “beneficial owner” are less successful.
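The first two hints can be sketched in plain Python, working on the extracted relations as a list of `(entity, entity)` pairs before they are loaded into the graph. This is a hedged illustration rather than the project's code: the helper names, the hand-maintained alias map and the sample data are all hypothetical.

```python
from collections import Counter


def canonicalise(edges, aliases):
    """Replace known aliases with a canonical name and drop self-loops.

    `aliases` is a hand-maintained mapping, e.g. {"Jim": "James Johnson"}.
    """
    merged = [(aliases.get(a, a), aliases.get(b, b)) for a, b in edges]
    return [(a, b) for a, b in merged if a != b]


def filter_by_degree(edges, min_degree):
    """Keep only edges whose endpoints both have at least `min_degree` connections.

    Degrees are counted once on the full edge list (a single pass, not an
    iterative pruning), which mirrors a simple display filter.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    keep = {node for node, count in degree.items() if count >= min_degree}
    return [(a, b) for a, b in edges if a in keep and b in keep]


# Made-up sample data for illustration only.
edges = [
    ("Jim", "Party A"),
    ("James Johnson", "Party A"),
    ("James Johnson", "Newspaper B"),
    ("Jimmy", "Candidate C"),
    ("Candidate C", "Party A"),
]
aliases = {"Jim": "James Johnson", "Jimmy": "James Johnson", "James": "James Johnson"}

clean_edges = canonicalise(edges, aliases)
visible_edges = filter_by_degree(clean_edges, min_degree=2)
```

Merging aliases before counting connections matters: otherwise an entity split across several nodes may fall below the display threshold even though it is central to the network.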

Example

The recent 2021 Gambia election wasn’t a subject that I was particularly knowledgeable about, but the underlying graph data for the below chart was scraped and analysed from 25 news articles in less than half an hour. The chart shows nodes with more than 2 connections, and the annotations were added manually. Whilst not groundbreaking, the chart is hopefully demonstrative of what could be done with more data and supplementary manual analysis.

Gambia Election