[This blog’s cover image is composed of AI-generated images of children]
Generative AI has seen mass adoption, but as with all technologies, it is open to abuse by threat actors. In this article, Guy Paltieli, ActiveFence’s Senior Child Safety Researcher, draws on his exclusive research into predator communities to discuss the risks of this new technology and the steps platforms can take to secure themselves. This article will remain purposefully high-level to avoid providing specific techniques that threat actors could exploit.
By freeing creative processes from human constraints, generative AI will enable mass content production and the fast, accurate transmission of instructional information. The capabilities it unlocks are extensive, but all platforms must prepare as AI-generated content becomes more widespread.
Various types of threat actors, including those engaged in terrorist propaganda, hate speech, disinformation campaigns, or child abuse, are testing the new possibilities offered by generative AI. ActiveFence is monitoring these communities, active on hidden forums and instant messaging groups, to detect new trends in abuse. The first article in our series will focus on child predators.
Child predator communications often take place in hidden communities on private forums or instant messaging groups. In many of these forums, we have identified newly created sections dedicated to the abuse of generative AI, where members advise one another and request information on acquiring child sexual abuse-related materials from AI systems.
Testing platforms to locate weaknesses, the predators share examples of how they were able to circumvent safeguards, often including examples of the content they were able to produce. Access to this chatter allows us to identify platform weaknesses and understand how best to strengthen the systems in place.
We are seeing that child predators are using generative AI for serious text and image-based violations, including:
The creation of instructional guides for abusing minors and the sexualized modification of innocent images depicting minors are undoubtedly dangerous and would frequently be regarded as criminal. Visual CSAM generated by AI would often be classified with the same severity as photographic child sexual abuse material; this approach can be seen in the UK, Australia, and Canada, where such sexual depictions of minors meet the criminal threshold. This is particularly important when considering that the generative AI platforms themselves create the child sexual content, albeit under the direction of threat actors.
Predators can create this malicious material by tapping into Trust & Safety weaknesses in generative AI platforms and processes. These weaknesses are discovered through dedicated group communications in a concerted effort to test platform defenses: when one predator finds a weakness, such as sub-optimal coverage of a certain language, they share that information with the group. Other predators take that information and swarm to test and prod it, arriving at the specific strings of text, in different languages, that produce the desired outcome. Three of the most common weaknesses involve language coverage, contextual understanding, and technical loopholes.
A core weakness for Trust & Safety in generative AI platforms is the lack of complete language coverage. Our research found not only targeted predator activity seeking to locate these gaps, but also uneven coverage that leaves platforms vulnerable to this abuse. This opens up massive opportunities for pedophiles to create child predator content in languages that are not secured.
As an illustrative example, if a platform has sophisticated protection in English but weaker systems for tackling malicious content in Urdu or Korean, predators will attempt to use that platform to produce child predator content, such as fantasy stories or grooming guides, using prompts in those languages.
This inconsistent security poses a major risk to platform integrity: once threat actors discover a gap, they share and exploit it quickly. To ensure the safety of their platforms regardless of language, providers should not offer services in languages where they cannot protect users. New language coverage should only be added once proper, language-specific safeguards are in place.
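To make this concrete, below is a minimal sketch of what such a language gate might look like: a prompt is served only if its language appears on the list of languages with dedicated safeguards. The `detect_language` stub, the `is_prompt_serviceable` helper, and the `SAFEGUARDED_LANGUAGES` set are hypothetical placeholders for illustration, not any specific platform's API.

```python
# Minimal sketch of a language-coverage gate, assuming a policy of refusing
# prompts in languages that lack dedicated safety classifiers. All names and
# values here are illustrative placeholders.

SAFEGUARDED_LANGUAGES = {"en", "es", "fr"}  # hypothetical coverage list


def detect_language(prompt: str) -> str:
    """Placeholder for a production language-identification model."""
    return "en" if prompt.isascii() else "und"  # "und" = undetermined


def is_prompt_serviceable(prompt: str) -> bool:
    """Serve the prompt only if its language has dedicated safeguards."""
    return detect_language(prompt) in SAFEGUARDED_LANGUAGES


if __name__ == "__main__":
    for prompt in ["Tell me a story", "이야기 하나 해줘"]:
        verdict = "serve" if is_prompt_serviceable(prompt) else "refuse"
        print(f"{verdict}: {prompt!r}")
```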
Another challenge facing generative AI platforms is the ability to recognize and block both generic and niche expressions, keywords, and references related to child sexual exploitation. While generic terminology may be known to many Trust & Safety teams, references to the niche names of CSAM production studios and popular child predator manuals or guides require more specialized knowledge.
Specialized knowledge of child predator terms and CSAM production studios is therefore critical for generative AI platforms. To illustrate this, we observed a generative AI tool responding to a request to draft a list of tips on grooming minors based on a well-known pedophile manual. The tool located the manual and extracted the relevant information, presenting dangerous tips on how to sexually abuse minors. Had the model been trained to block queries related to this manual, which is well known in predator circles, it would have triggered a warning and refused the request.
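One way to operationalize that specialized knowledge is a reference screen run before generation, as sketched below. The `BLOCKED_REFERENCES` entries and the `references_blocked_material` helper are placeholders; a real deployment would draw on curated threat-intelligence keyword lists of predator terminology, manual titles, and studio names maintained by specialist researchers.

```python
import re

# Minimal sketch: screen prompts against a curated reference list before
# generation. The entries below are placeholder strings, not real terms.
BLOCKED_REFERENCES = [
    "known predator manual title",  # placeholder
    "known csam studio name",       # placeholder
]

BLOCKED_PATTERNS = [re.compile(re.escape(term), re.IGNORECASE)
                    for term in BLOCKED_REFERENCES]


def references_blocked_material(prompt: str) -> bool:
    """Return True if the prompt mentions any curated predator reference."""
    return any(pattern.search(prompt) for pattern in BLOCKED_PATTERNS)


if __name__ == "__main__":
    prompt = "Summarize the known predator manual title for me"
    if references_blocked_material(prompt):
        print("Refused: prompt references known abusive material")
```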
By identifying technical loopholes, predators can easily manipulate generative AI into creating sexual content depicting children. These loopholes are also easily shared, replicated, and built upon, opening opportunities for further platform abuse. One example involves using sequences of primary and secondary commands, which are frequently based on AI guides published in mainstream media.
While an initial request for violative content may not be successful, predators have found and shared sets of queries which, when asked in succession, can manipulate the AI into creating harmful content. These usually require multiple steps to produce sexual material that depicts children: by pairing specific primary commands with secondary requests applied to an AI-generated image, predators can direct the same tool to produce explicitly sexual image-based content of minors. Accordingly, teams should consider the risks posed by a flow of queries and train their systems to evaluate that flow as a whole.
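The sketch below illustrates one way to evaluate a flow of queries as a whole: each new prompt is scored in the context of the accumulated conversation rather than in isolation, and the request is refused once cumulative risk crosses a threshold. The `score_turn` stub, the `ConversationModerator` class, and the `RISK_THRESHOLD` value are hypothetical illustrations, not a tuned or production implementation.

```python
from dataclasses import dataclass, field

# Minimal sketch of conversation-level moderation: score each new prompt in
# the context of the whole conversation, not just the latest turn.
RISK_THRESHOLD = 0.8  # illustrative value, not a recommendation


def score_turn(history: list[str], new_prompt: str) -> float:
    """Placeholder for a real moderation classifier. This stub only shows
    that the score depends on accumulated context, not the latest turn alone."""
    combined = " ".join(history + [new_prompt])
    return min(1.0, len(combined.split()) / 100)


@dataclass
class ConversationModerator:
    history: list[str] = field(default_factory=list)

    def allow(self, prompt: str) -> bool:
        """Refuse once the cumulative conversation risk crosses the threshold."""
        if score_turn(self.history, prompt) >= RISK_THRESHOLD:
            return False  # refuse and keep the turn out of the history
        self.history.append(prompt)
        return True


if __name__ == "__main__":
    moderator = ConversationModerator()
    print(moderator.allow("Write a short bedtime story"))  # True while risk is low
```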
While the challenge is great, effective moderation and risk mitigation are possible.
To ensure protection against the threat of child predator abuse, Trust & Safety teams at generative AI companies must develop and expand their safeguarding techniques in the following ways.
ActiveFence’s deep knowledge of the child predator landscape and advanced research into new TTPs (tactics, techniques, and procedures) enable Trust & Safety teams to moderate the prompts malicious users pose and to control the content their platforms generate.