GenAI tools, and the Large Language Models (LLMs) that underpin them, are shaping the day-to-day lives of billions of users across the globe. But can these technologies be trusted to keep users safe?
This report examines how bad actors and vulnerable users can use this new technology to create dangerous content. By testing LLM responses to risky prompts, we can assess the models' relative safety, identify weaknesses and, most importantly, define actionable steps to improve LLM safety.
In this first independent benchmarking report on the LLM safety landscape, ActiveFence’s subject-matter experts put LLMs to the test. We ran more than 20,000 prompts to analyze the responses of six leading LLMs in seven major languages across four high-risk abuse areas:
The results give teams the data they need to understand their LLM’s relative strengths and weaknesses and to identify where resources should be allocated.