New ActiveFence report reveals how generative AI is being abused to create child sex abuse, disinformation, fraud and extremism content on online platforms of all sizes
NEW YORK, May 18, 2023 - ActiveFence, whose mission is to protect online platforms and their users from malicious behavior and harmful content, today released the “Generative AI: The New Attack Vector for Platforms” report. For this research, ActiveFence investigated hidden communities to examine how threat actors are abusing generative AI to produce child sex abuse material (CSAM) and to spread disinformation, fraud, and extremism.
“The explosion of generative AI has far-reaching implications for all corners of the internet,” said Noam Schwartz, CEO and founder of ActiveFence. “We’ve identified three key areas of concern. First, we’re seeing that threat actors are now able to accelerate and amplify their operations, leading to unprecedented mass production of malicious content. Second, these same actors are exploring ways to exploit generative AI, manipulating these models and revealing their inherent vulnerabilities. Finally, these evolving threats place increased pressure on digital platforms to improve the precision and efficiency of their data training protocols.”
The report identified several key ways that generative AI is being abused:
Child sex abuse material
ActiveFence has tracked a 172% increase in the volume of shared CSAM produced by generative AI in the first quarter of this year. It also detected a poll conducted by administrators of a closed child predator forum on the dark web, which surveyed almost 3,000 predators about their use of generative AI. The poll revealed that 78% of respondents have used or plan to use generative AI for CSAM, and the remaining 22% said they had plans to try the technology. These predator forums leverage generative AI algorithms to produce sexual images as well as textual descriptions, stories, and narratives.
In one instance that ActiveFence observed, a major generative AI platform refused a request to write an erotic story involving two minors, calling the request “inappropriate and potentially illegal.” But when the same request was made with just a few altered words, the algorithm produced an erotic story describing an adult male who inappropriately watched two young boys swimming.
Child predators are also using generative AI to create tutorials of their creations, which helps them gain credibility within the child predator community, encourage others to replicate their efforts, and share recommended phrases and keywords to evade platform safeguards. ActiveFence detected child predators bypassing these platform limitations by making requests in different languages, using alternative and suggestive terms, and manipulating the AI algorithm with various prompts, inputs and dedicated models.
Disinformation and fraudulent content
While fraud and disinformation are not new concepts, generative AI has allowed threat actors to create fraudulent images more quickly, more accurately and with greater reach.
One AI-generated image that ActiveFence detected on Telegram falsely shows Russian President Vladimir Putin kneeling before Chinese President Xi Jinping, begging for his support in the Ukraine conflict. ActiveFence identified several key generative AI signifiers in this image: obscured faces, blurred hands, distorted pieces of furniture and a lack of photography attribution. Despite these indicators, the misleading content reached 10 million users.
ActiveFence also detected methods that threat actors use to override several policies of major generative AI platforms and manipulate their chatbots for malicious purposes. In one case, exploiters were able to produce an AI-generated phishing email, and in another, they successfully prompted a bot to write an inauthentic positive review of an app that is widely accessible on a major online marketplace. Although the review in this example was positive, the same tactic used maliciously not only misleads a platform’s users but can also harm the platform’s credibility as a secure place for online activity.
Violent extremism
ActiveFence detected numerous instances where threat actors have exploited generative AI to create hyper-realistic yet harmful content that incites violence and promotes extremist propaganda. These threat actors are using generative AI to create racist, nationalist or extremist manifestos or speeches.
ActiveFence discovered an AI-generated deepfake audio file that exploited growing political and economic distress. This fabricated audio falsely imitated a well-known UK reporter and incited a rebellion against the British government. The misleading manifesto provided instructions on procuring weapons from the underground market and urged an assault on British national infrastructure.
ActiveFence made these discoveries through its technology and analysis capabilities, which arm organizations with accurate, detailed, context-led and actionable insights into online harms to help close policy gaps, improve enforcement and increase safety. With expertise in over 100 languages, ActiveFence has far-reaching access on the clear and dark web to threat actor communities, including those engaged in child sexual abuse, disinformation, hate speech, terrorism, violent extremism and fraud.
ActiveFence today announced that it provides the following capabilities for GenAI platforms and larger platforms that seek to integrate with them:
To learn more about how ActiveFence safeguards online platforms and users against online harm, please visit our website at www.activefence.com.
About ActiveFence: ActiveFence is the leading Trust and Safety provider for online platforms, protecting over three billion users daily from malicious behavior and content. Trust and Safety teams of all sizes rely on ActiveFence to keep their users safe from the widest spectrum of online harms, including child abuse, disinformation, hate speech, terror, fraud, and more. We offer a full stack of capabilities with our deep intelligence research, AI-driven harmful content detection and moderation platform. ActiveFence protects platforms globally, in over 100 languages, letting people interact and thrive safely online.