In our second edition of the Guide to Trust & Safety, we tackle the complex challenge of measuring the trust and safety of your online platform. From gaining visibility to enforcing policy, we make it simple to implement valuable metrics.
In the first part of our Guide to Trust & Safety, we share the four functions needed to build a trust and safety team. However, building a team isn’t enough. Trust & Safety must be managed, and a key component of that is evaluating its effectiveness and ensuring constant improvement. There are many factors to consider when measuring a Trust & Safety operation, ranging from hard metrics of enforcement rates, speed to action, and threat coverage, to the perception of your platform’s work, its fairness, and sincerity. Here we take you through the key questions that you should use to assess your team and build its priorities.
Who are my users?
It might sound obvious, but Trust & Safety teams exist to secure online communities. So, to be effective, your team should first assess its context to understand which safeguarding measures are needed.
The type of users a platform caters to will directly impact the type of harmful activity that can be expected, so be clear about who you are protecting and from what.
To take two examples, if your platform is a space for children, then child safety should be your principal concern. You will require strict measures against bullying and predatory activities. However, if you provide a place for political discussion, then extremism, hate speech, and disinformation must be carefully guarded against. While all threats should be considered, focus should be placed on the threats most relevant to your platform.
Understanding all of the ways that a platform could be exploited will enable your Trust & Safety team to create and implement proactive security measures. Drawing up concrete risk assessments will enable teams to focus their efforts appropriately, and then be evaluated on their results.
How effective is my policy?
Your Trust & Safety team is only as good as the policies it enforces. To help platforms create the most effective Trust & Safety operations, we have reviewed the policy wording of twenty-six of the leading technology companies to show how they handle and categorize platform violations. You can find the ActiveFence Trust & Safety Policy Series here.
Besides looking at leading companies’ policies for guidance, it is also crucial to understand where your company has blind spots. When harmful content is flagged (by users, trusted flaggers, or third-party vendors) but cannot be actioned, document it. To improve and strengthen the effectiveness of your platform policy, assess the non-actionable content to identify where gaps exist.
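For teams that keep such a log, a minimal sketch of a gap report might look like the following. The record fields `abuse_type` and `actionable` are hypothetical, assuming each flagged item is stored as a simple dict:

```python
from collections import Counter

def policy_gap_report(flagged_items):
    """Tally flagged items that could not be actioned, grouped by abuse type.

    Each item is assumed to be a dict such as:
    {"abuse_type": "harassment", "actionable": False}
    """
    gaps = Counter(
        item["abuse_type"] for item in flagged_items if not item["actionable"]
    )
    # The most frequent non-actionable categories point to policy gaps.
    return gaps.most_common()
```

The categories that surface most often in this report are the places where policy wording most needs revisiting.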
Am I trusted to be fair and consistent?
Your platform’s users must trust your moderators. So, record the percentage of successful appeals against your total moderation decisions, first overall and then by category, to identify areas where your policy enforcement is not fair.
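As a rough illustration, assuming a hypothetical log of moderation decisions with `category`, `appealed`, and `overturned` fields, the appeal rate could be computed overall and per category like this:

```python
from collections import defaultdict

def appeal_overturn_rates(decisions):
    """Share of moderation decisions overturned on appeal, overall and per category.

    Each decision is assumed to be a dict such as:
    {"category": "hate_speech", "appealed": True, "overturned": False}
    """
    totals = defaultdict(int)
    overturned = defaultdict(int)
    for d in decisions:
        totals[d["category"]] += 1
        if d.get("appealed") and d.get("overturned"):
            overturned[d["category"]] += 1

    overall = sum(overturned.values()) / max(sum(totals.values()), 1)
    per_category = {c: overturned[c] / totals[c] for c in totals}
    return overall, per_category
```

A category whose overturn rate sits well above the overall figure is a natural candidate for policy or moderator-training review.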
It is also important to evaluate the consistency of moderation activities between team members when faced with different examples of the same abuse. For instance, racial hate directed at two minority communities should be handled in the same way, and responses to political extremism from the left or the right should not diverge.
How do I measure success, successfully?
There is an expectation that the volume of harmful content would be the defining measurement of a Trust & Safety team. In reality, an increase in findings could indicate either an improvement in your team’s detection techniques or a growth in platform abuse. Additional information is required to interpret the raw numbers.
To understand the meaning behind the numbers, review which accounts are uploading harmful content and look for patterns, networked activity, or a high recurrence rate of individual users violating platform policy. Another key metric to evaluate is your average time-to-detection of harmful content.
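One possible way to compute these two signals, assuming hypothetical violation records that carry `uploader_id`, `uploaded_at`, and `detected_at` fields (the timestamps as datetime values):

```python
def detection_metrics(violations):
    """Average time-to-detection (in hours) and repeat-offender rate.

    Each violation is assumed to be a dict such as:
    {"uploader_id": "u123", "uploaded_at": <datetime>, "detected_at": <datetime>}
    """
    if not violations:
        return 0.0, 0.0

    hours = [
        (v["detected_at"] - v["uploaded_at"]).total_seconds() / 3600
        for v in violations
    ]
    avg_time_to_detection = sum(hours) / len(hours)

    # Count violations per uploader to surface repeat offenders.
    per_user = {}
    for v in violations:
        per_user[v["uploader_id"]] = per_user.get(v["uploader_id"], 0) + 1
    repeat_offender_rate = (
        sum(1 for n in per_user.values() if n > 1) / len(per_user)
    )

    return avg_time_to_detection, repeat_offender_rate
```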
What is the prevalence of harmful content on my platform?
An important indicator is the on-platform reach—the number of views received—of harmful content, prior to its removal. The fewer views that prohibited material gains, the more successful the operation has been. Track the reach of the highest risk on-platform content to evaluate your team’s work.
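A minimal sketch of this reach metric, assuming hypothetical removal records with a `severity` label and a `views_at_removal` count:

```python
from statistics import median

def reach_before_removal(removals, severity="high"):
    """Median views harmful items accumulated before removal, for one severity tier.

    Each removal record is assumed to be a dict such as:
    {"severity": "high", "views_at_removal": 42}
    """
    views = [
        r["views_at_removal"] for r in removals if r["severity"] == severity
    ]
    return median(views) if views else 0
```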
Another key performance indicator of a strong Trust & Safety team is the reduction of negative press directed at a platform due to user-generated content. If moderation is successful, the prevalence of harmful content falls, which reduces the platform’s exposure to bad actor activity.
What is our platform coverage?
Every interaction carries with it the risk of malicious intent, so you should aspire to total on-platform visibility. To achieve this, assess your language and file-type coverage against all the content uploaded to the platform. Review what percentage of your on-platform languages your team can review, and use these figures to allocate resources and build your capabilities.
If your team cannot read a content format, it is blind to it and reliant on user flagging, which exposes users to the very content they are meant to be shielded from. Record the percentage of flagged harmful content that was proactively detected by your team rather than by your users, and work to increase that share. To do so, partner with linguistic experts or companies that can provide this vital knowledge.
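Both coverage figures could be tracked with something like the sketch below, assuming hypothetical inputs: flagged items carrying a `flag_source` field, plus sets of the languages uploaded to the platform and the languages your team can review:

```python
def coverage_metrics(flagged_items, uploaded_languages, reviewable_languages):
    """Proactive-detection rate and language coverage.

    `flagged_items` is assumed to be a list of dicts such as
    {"flag_source": "internal"} or {"flag_source": "user"};
    the language arguments are sets of language codes.
    """
    # Share of harmful content surfaced by the team rather than by users.
    internal = sum(1 for i in flagged_items if i["flag_source"] == "internal")
    proactive_rate = internal / max(len(flagged_items), 1)

    # Share of on-platform languages the team is actually able to review.
    covered = uploaded_languages & reviewable_languages
    language_coverage = len(covered) / max(len(uploaded_languages), 1)

    return proactive_rate, language_coverage
```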
Can you see beyond your platform?
The metaverse—the idea of a future where platforms connect to form a single digital space—is dominating tech conversations. While the concept may appear futuristic, users already use multiple platforms simultaneously, broadcasting the same content across several live video streaming platforms at once. For example, harmful content may appear in a video game played by a user and then be broadcast simultaneously on numerous platforms. In this scenario, each platform’s Trust & Safety team is responsible for content produced externally.
Beyond the actual content, teams responsible for securing services should survey the entire online ecosystem to identify and evaluate threats that may emerge in the future. Access to threat actors’ communal chatter is essential not only to mitigate risks, but also to understand how harmful communities are responding to moderation actions—are they changing their tactics and continuing to exploit your service, or are they migrating to new digital spaces?
In the end, Trust & Safety success should be measured by evaluating the extent of a team’s visibility, its ability to respond quickly, and the combination of policy and its consistent enforcement. In addition to monitoring threat actor communications, teams should keep track of appeal success rates, average time-to-detection, the reach of harmful content before removal, the share of content detected proactively rather than flagged by users, and language and file-type coverage.
If you are looking to enjoy the fruits of the internet age, then build safety by design into your platform. Start by asking yourself the questions we have outlined here and review the answers to identify your team’s strengths and weaknesses, in order to build robust platforms with online threats kept at bay.
Check out part one of the Guide to Trust & Safety, The Four Functions of a Trust & Safety Team.