Generative AI is the New Attack Vector for Platforms, According to ActiveFence Threat Intelligence

New ActiveFence report reveals how generative AI is being abused to create child sex abuse, disinformation, fraud and extremism content on online platforms of all sizes

NEW YORK, May 18, 2023 – ActiveFence, whose mission is to protect online platforms and their users from malicious behavior and harmful content, today released the “Generative AI: The New Attack Vector for Platforms” report. For this research, ActiveFence investigated hidden communities to examine how threat actors are abusing generative AI to produce child sex abuse material (CSAM) and to spread disinformation, fraud, and extremism.

“The explosion of generative AI has far-reaching implications for all corners of the internet,” said Noam Schwartz, CEO and founder of ActiveFence. “We’ve identified three key areas of concern. First, we’re seeing that threat actors are now able to accelerate and amplify their operations, leading to unprecedented mass production of malicious content. Second, these same actors are exploring ways to exploit generative AI, manipulating these models and revealing their inherent vulnerabilities. Finally, these evolving threats place increased pressure on digital platforms to improve the precision and efficiency of their data training protocols.”

The report identified several key ways that generative AI is being abused:

Creation of child sex abuse material, ranging from visual images to erotic narratives
Generation of fraudulent images that are deceiving millions
Production of deepfake audio files that tout extremism

Child sex abuse material

ActiveFence has tracked a 172% increase in the volume of shared CSAM produced by generative AI in the first quarter of this year. It also detected a poll conducted by administrators of a closed child predator forum on the dark web, which surveyed almost 3,000 predators about their use of generative AI. The poll revealed that 78% of respondents have used or plan to use generative AI to create CSAM, and the remaining 22% said they had plans to try the technology. These predator forums leverage generative AI algorithms to produce sexual images as well as textual descriptions, stories, and narratives.

In one instance that ActiveFence observed, a major generative AI platform refused a request to write an erotic story involving two minors, calling the request “inappropriate and potentially illegal.” But when the same request was made with just a few altered words, the algorithm produced an erotic story describing an adult male who inappropriately watched two young boys swimming.

Child predators are also using generative AI to create tutorials of their creations, which helps them gain credibility within the child predator community, encourage others to replicate their efforts, and share recommended phrases and keywords to evade platform safeguards. To bypass these platform limitations, ActiveFence detected child predators making requests in different languages, using alternative and suggestive terms, and manipulating the AI algorithm with various prompts, inputs and dedicated models.

Disinformation and fraudulent content

While fraud and disinformation are not new concepts, generative AI has allowed threat actors to create fraudulent images more quickly, more accurately, and with greater reach.

One AI-generated image that ActiveFence detected on Telegram falsely shows Russian President Vladimir Putin kneeling before Chinese President Xi Jinping, begging for his support in the Ukraine conflict. ActiveFence identified several telltale signs that the image was AI-generated: obscured faces, blurred hands, distorted pieces of furniture, and a lack of photo attribution. Despite these indicators, the misleading content reached 10 million users.

ActiveFence also detected methods that threat actors use to override several policies of major generative AI platforms and manipulate their chatbots for malicious purposes. In one case, exploiters were able to produce a generative AI phishing email, and in another, they successfully prompted a bot to write an inauthentic positive review of an app that is widely accessible on a major online marketplace. Although the review in this example was positive, the same tactic used maliciously not only misleads a platform’s users but can also harm the platform’s credibility as a secure place for online activity.

Violent extremism

ActiveFence detected numerous instances where threat actors have exploited generative AI to create hyper-realistic yet harmful content that incites violence and promotes extremist propaganda. These threat actors are using generative AI to create racist, nationalist or extremist manifestos or speeches.

ActiveFence discovered an AI-generated deepfake audio file that exploited growing political and economic distress. This fabricated audio falsely imitated a well-known UK reporter and incited a rebellion against the British government. The misleading manifesto provided instructions on procuring weapons from the underground market and urged an assault on British national infrastructure.

ActiveFence made these discoveries through its technology and analysis capabilities, which arm organizations with accurate, detailed, context-led and actionable insights into online harms to help close policy gaps, improve enforcement and increase safety. With expertise in over 100 languages, ActiveFence has far-reaching access on the clear and dark web to threat actor communities, including those engaged in child sexual abuse, disinformation, hate speech, terrorism, violent extremism and fraud.

ActiveFence today announced that it provides the following capabilities for GenAI platforms and larger platforms that seek to integrate with them:

Automated Prompt Moderation – stops prompt injection and jailbreaking.
Automated Output Filtering – detects violative outputs at scale via contextual analysis model.
AI Model Safety Testing – keeps AI training data safe.
Gen AI Red Teaming – identifies exposures and loopholes in product, policy, and enforcement.
Threat Landscaping – reports on Dark Web and off-platform threats and attacks.
Generative AI T&S Platform – provides end-to-end enforcement and management.

To learn more about how ActiveFence safeguards online platforms and users against online harm, please visit our website at www.activefence.com.

About ActiveFence:
ActiveFence is the leading Trust and Safety provider for online platforms, protecting over three billion users daily from malicious behavior and content. Trust and Safety teams of all sizes rely on ActiveFence to keep their users safe from the widest spectrum of online harms, including child abuse, disinformation, hate speech, terror, fraud, and more. We offer a full stack of capabilities with our deep intelligence research, AI-driven harmful content detection and moderation platform. ActiveFence protects platforms globally, in over 100 languages, letting people interact and thrive safely online.