(See also: Precision, Recall)
Refers to the degree to which automated or manual moderation tools make the correct decisions. Sometimes measured as an inverse of the false-positive rate.
(See also: Bad Actor, Violative Content, Malicious Content)
General term for unwanted on-platform behavior
(See also: Automated Detection)
A reference to the volume of violative content and the flexibility of the system with the capacity to properly moderate the content.
(See also: Banning)
The act of bypassing platform moderation actions or circumventing a platform ban. It often involves the creation of at least one additional account.
By creating new accounts, threat actors can return to a platform whose policies they have violated, in an effort to continue to do harm.
(See also: Suspension, Ban evasion)
Permanently removing or blocking a user from the platform.
Read more: Policy Enforcement
(See also: Child Safety, Child Sexual Abuse Material)
Popular term among predator communities to denote “child pornography”
(See also: Grooming, Child Sexual Abuse Material, CP, Cyberbullying)
In Trust & Safety, refers to online risks to children, including exposure to harmful content, abuse, harassment, CSAM, and exploitation.
Read more: Policy Series: Child Safety
(See also: Child Safety)
Widely referred to as CSAM.
Images, text, or videos depicting the sexual abuse of minors (under 18 years old). For many platforms, this includes individuals who appear to be minors.
(See also: Dark Web)
Websites that are publicly accessible to all audiences, through standard browsers and search engines. Estimated to only host 5% of online content.
(See also: Anti-Money Laundering)
A set of laws and regulations that require financial institutions and financial technology firms to assist law enforcement in their attempt to block terrorist entities from accessing funds.
Read more: Funding White Supremacy
(See also: Content Moderation)
The process of detecting, flagging, and removing harmful comments, or denying abusive users the ability to post them.
Read more: Content Detection Tools
Moderation activities that are conducted by community members, and not professional moderation teams.
(See also: Trust & Safety, Harmful Content)
The internal process of screening user-generated content posted to online platforms, in order to determine whether or not it violates policy, and take appropriate action against violative content.
(See also: Policy Analyst, Policy Specialist/Policy Manager)
Team member involved in the dynamic process of creating and maintaining the community guidelines of an online platform.
The use of online platforms to bully individuals. The term generally refers to the abuse of children.
(See also: Clear Web)
The part of the online world that is only accessible via dedicated dark web browsers which use “onion routing” technology to allow completely anonymous browsing. While the dark web isn’t necessarily illegal or illicit, the anonymity it provides allows for these activities to take place.
(See also: Trust & Safety Platform)
A tree-shaped model (or flow chart) of questions or decisions, where each outcome determines the next question or decision to be made, in order to come to an eventual outcome.
In Trust & Safety, a decision tree is often created to streamline and standardize the moderation decision process. Policy teams will create a decision tree for moderators to use when making a decision about a specific item or account.
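A decision tree of this kind can be sketched in a few lines of Python. This is an illustrative example only: the questions, the outcomes, and the names `Question`, `TREE`, and `decide` are hypothetical and do not come from any real platform policy.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Question:
    """An internal node: a yes/no question and where each answer leads."""
    text: str
    if_yes: Union["Question", str]
    if_no: Union["Question", str]


# Hypothetical policy tree; leaves are moderation outcomes (plain strings).
TREE = Question(
    text="Does the item contain a threat of violence?",
    if_yes=Question(
        text="Is the threat directed at a specific person?",
        if_yes="remove and escalate",
        if_no="remove",
    ),
    if_no="allow",
)


def decide(node, answers):
    """Walk the tree using a dict that maps question text to True/False."""
    while isinstance(node, Question):
        node = node.if_yes if answers[node.text] else node.if_no
    return node
```

Because every moderator answers the same questions in the same order, two moderators who give the same answers reach the same outcome.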
(See also: Misinformation)
Intentionally misleading information that is shared and broadly distributed (disseminated) with the purpose of misleading or deceiving an audience. Often used as propaganda, disinformation has been widely used to sow public mistrust, influence elections, and legitimize wars. This is generally an organized, orchestrated effort.
A distinction should be made between disinformation – the intentional dissemination of misleading content, and misinformation – non-intentional distribution of misleading content.
Read more on disinformation related to elections, health, warfare, and social radicalization.
(See also: Community Moderation)
A type of content moderation where no individual person makes the moderation decision; rather, community members (individual users) vote to determine if an item should or should not be allowed on the platform or forum.
(See also: Online Safety Bill (UK))
In the UK’s drafted Online Safety Bill, online platforms have a Duty of Care to assess risks to their users, put policies and procedures in place to minimize that risk, and take actions to keep users safe.
Read more: The UK Online Safety Bill
(See also: Feature Blocking, Blocking, Banning, Ban evasion)
The broad range of actions taken by Trust & Safety teams when content violates policy.
(See also: Accuracy)
The rate at which items or moderation events are incorrectly identified. This is the inverse of accuracy.
Read more: Measuring Trust & Safety
(See also: Fraudster, Phishing)
Defined as criminal deception intended to result in financial or personal gain.
In Trust & Safety, this may also refer to deceiving a user into providing their personally identifiable information (PII) or unknowingly providing access to their devices or accounts.
(See also: Child Safety, Child Sexual Abuse Material, CP)
The act of preparing or manipulating a minor into sexual victimization. Generally involves prolonged online communication, which may begin as non-sexual, and gradually escalates into sexually suggestive communication, before eventually leading to sexually offensive activities which may include physical contact.
Read more: Supporting Child Safety
(See also: Community Guidelines, Community Moderation)
Any text, image, audio, video, or other content posted online that is considered violative, malicious, deceptive, illegal, offensive, or slanderous.
(See also: Hashing)
In Trust & Safety, organizations such as NCMEC aggregate databases of image hashes that are related to various offenses (in this case, child safety). Platforms can then compare image hashes from their content to hashes of known malicious content (such as CSAM). This way, moderators do not have to view and analyze a potentially harmful piece of content and can compare its hash to that of recognized images.
(See also: Hash Database)
Technology that creates a unique, fixed-length string of letters and numbers to represent a piece of data (often an image). The hash is non-reversible, meaning an image can’t be recreated from its hash.
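The hashing and hash-database ideas above can be sketched together. This is a minimal example under stated assumptions: it uses SHA-256 from Python's standard `hashlib` module, a cryptographic hash that matches only byte-identical files, whereas production image matching may rely on other hashing schemes. The names `sha256_hex`, `KNOWN_HASHES`, and `is_known`, and the sample bytes, are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return a fixed-length (64 hex characters) digest of the input bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for a shared database of hashes of known content.
KNOWN_HASHES = {sha256_hex(b"known violative file")}


def is_known(data: bytes, known_hashes: set) -> bool:
    """Check content against the database without anyone viewing it."""
    return sha256_hex(data) in known_hashes
```

The digest has the same length whatever the input size, and the lookup compares hashes only, so the content itself never has to be opened for review.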
(See also: Digital Services Act, Online Safety Bill (UK), Duty of Care)
Any expression or online content that incites, discriminates, justifies hatred, or promotes violence against an individual or group. Given distinct cultural and linguistic nuances, the detection of hate speech is often a complex task requiring regional and linguistic expertise.
(See also: Intelligence, Open Source Intelligence)
The collection of intelligence by means of interpersonal contact.
In Trust & Safety, specialized teams use human intelligence to infiltrate threat actor communities and proactively identify their means and methods.
(See also: Phishing)
Apps or websites that are intentionally created to resemble existing apps or services or appear to be a part of the user interface in order to gain access to personal data, passwords, bank accounts, etc.
(See also: Account Hijacking)
The creation of fake accounts, often using a target’s name or photo, in order to cause harm to that individual.
(See also: Human Intelligence)
In Trust & Safety, intelligence is used to proactively alert platforms about impending risks, and to inform better moderation decisions.
(See also: Intelligence, Human Intelligence, Open Source Intelligence)
Internal or vendor teams that are responsible for on- and off-platform intelligence collecting in support of the Trust & Safety team’s efforts. Utilizing a broad range of tactics, including OSINT, WEBINT, and HUMINT, the team detects new threats and trends, identifies tactics, techniques, and procedures (TTPs), conducts investigations into suspicious account behaviors, “red teams” platform policies, and more.
Read more: Trust & Safety Intelligence
(See also: Comment Moderation, Content Policy Manager, Automated Detection, Community Guidelines)
A form of content moderation that flags the instance of specific, potentially violative keywords used in text, audio, images, or videos posted on the platform. This type of moderation is limited in that it often lacks a contextual understanding of the keyword’s use, and requires constant updating of new violative keywords.
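A minimal sketch of keyword flagging on text, assuming a hypothetical keyword set (`BLOCKLIST`) and helper name (`flag_keywords`); it is not a description of any particular product.

```python
import re

# Hypothetical keyword list; a real list would need constant updating.
BLOCKLIST = {"scam", "counterfeit"}


def flag_keywords(text, keywords):
    """Return the sorted list of listed keywords that appear in the text."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return sorted(words & keywords)
```

Calling `flag_keywords("Report this scam account", BLOCKLIST)` flags the word "scam" even though the post is warning others about a scam, which illustrates the lack of contextual understanding described above.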
(See also: Anti-Money Laundering, Fraud, Foreign Terrorist Organization)
A component of a financial institution’s anti-money laundering policy, Know Your Customer (KYC) is a requirement for financial institutions and certain financial technology firms to verify the identity of a client to prevent illegal access to funds (for example – funding of terrorist activities).
(See also: Child Safety, Child Sexual Abuse Material, CP, Grooming)
Child predator slang for an underage female with a childlike appearance, or a woman who is of age but physically looks, or dresses like a minor.
(See also: Harmful Content)
Content that is created or shared with malicious intent. Includes but is not limited to child sexual abuse material, nudity, profanity, sexual content, bullying, terrorist content, violence, and disinformation.
(See also: Disinformation)
The unintentional creation or sharing of inaccurate or misleading information. This differs from disinformation in that misinformation is unintentional, while disinformation is the intentional distribution of misleading information.
(See also: Banning, Blocking, Blocklist, Community Guidelines, Downranking)
Platforms can allow users to mute other users so that their activities do not appear on their feeds.
(See also: Malicious Content, Child Sexual Abuse Material, Online Sexual Harassment)
Sexually explicit images that were either acquired unknowingly or unlawfully, or were taken consensually but shared or posted online without consent. NCII also includes the sharing of intimate imagery beyond the scope of its intended use (i.e., leaking intimate images shared on one platform across other platforms).
(See also: Duty of Care, Digital Services Act, Christchurch Voluntary Principles)
The UK’s Online Safety Bill is upcoming legislation that will require online platforms to take proactive action to keep users safe. The Bill outlines illegal, and some legal but harmful content that platforms will have to act against. The Bill’s current draft is in Parliament and is expected to pass by the end of 2022.
Read more: The UK Online Safety Bill
(See also: Non Consensual Intimate Imagery, Cyberstalking)
Sexual harassment or misconduct that is conducted online. This is disproportionately aimed at women and/or members of the LGBTQIA+ community.
(See also: Human Intelligence, Intelligence)
Also known as web intelligence (WEBINT). Intelligence that is collected via publicly available information online. In Trust & Safety, open-source intelligence is used to gain a contextual understanding of harmful activities, enabling proactive content moderation.
(See also: Community Guidelines)
The content policy of a user-generated content website defines what can and can’t be posted to that specific website. Also known as Content Policy or Community Guidelines.
(See also: Policy Enforcement, Policy Specialist/Policy Manager, Policy)
Member of the policy team, responsible for analyzing and examining the effectiveness of a company’s content policies.
(See also: Policy, Policy Analyst, Policy Specialist/Policy Manager)
The team responsible for enforcing the platform’s content policies and taking action against violative content.
(See also: Policy, Policy Analyst, Policy Enforcement)
Individuals in the Trust & Safety team responsible for establishing the platform’s community guidelines and defining what is and isn’t allowed to be shared on the platform.
Involves collaboration with internal teams as well as external agencies such as law enforcement, regulators, and industry partners.
(See also: Proactive Removal Rate)
A form of content moderation that aims to detect malicious or violative content before it is seen or reported by others. Utilizes various techniques, including automated detection and intelligence collecting to identify the violative content before it has a chance to harm user safety or platform integrity.
Proactive moderation was previously used only to detect illegal content such as CSAM and terrorist content, but pending legislation may require platforms to proactively moderate harmful content, not just illegal content.
See Harmful Content Detection
(See also: Proactive Moderation)
A metric that indicates the rate at which action was taken on content or accounts prior to being posted or reported by other users.
Calculated by the number of proactively moderated items divided by the total number of moderated items.
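The calculation above, as a short sketch. The function name and the sample figures are assumptions for illustration.

```python
def proactive_removal_rate(proactive_items, total_moderated_items):
    """Proactively moderated items divided by all moderated items."""
    if total_moderated_items <= 0:
        raise ValueError("total_moderated_items must be positive")
    return proactive_items / total_moderated_items


# Hypothetical figures: 80 of 100 moderated items were actioned proactively.
rate = proactive_removal_rate(80, 100)
```

With these figures the rate is 0.8, or 80%.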
A moderation process that relies on a platform’s community or other individuals to identify and flag content that may be in breach of a platform’s policies. Due to its reliance on the community, violative content is often seen by multiple users, before action is taken.
(See also: Precision)
The measure of how much of a platform’s malicious content is picked up by its moderation systems.
Calculated by the number of correctly identified malicious items, divided by the total number of malicious items on the platform. For example, if a platform has ten malicious pieces of content, and AI identified seven of them, that AI had a 70% recall rate. For most automated detection mechanisms, recall and precision are inversely correlated.
Read More: Measuring Trust & Safety
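The recall example above, worked through in code alongside precision for comparison. The function names are hypothetical, and the single false positive used in the precision line is an assumed figure that does not come from the glossary.

```python
def recall(true_positives, false_negatives):
    """Share of all malicious items that the system correctly identified."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Share of flagged items that were actually malicious."""
    return true_positives / (true_positives + false_positives)


# Glossary example: ten malicious items, seven identified, three missed.
example_recall = recall(true_positives=7, false_negatives=3)

# Assumed for illustration: the same system also flagged one benign item.
example_precision = precision(true_positives=7, false_positives=1)
```

Here `example_recall` is 0.7 (the 70% in the text) and `example_precision` is 0.875.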
(See also: Red Team)
A method or process where attempts are made to replicate a system, process, machine, device, or software. Used in cybersecurity and red teams to find malicious code in software, websites, and apps.
An enforcement mechanism that involves the temporary, time-limited banning of an account.
(See also: Request for Information (RFI))
Generally provided by intelligence providers or internal intelligence desks, TTPs are a description of the techniques used by bad actors to conduct harm. In Trust & Safety, an understanding of bad actor TTPs can provide teams with the insights needed to proactively detect and stop on-platform harm.
Learn more: Trust & Safety Intelligence
(See also: Pro-Ana/Pro-Mia)
Prevalent in eating disorder communities, the term references images or other content that encourages one to engage in extreme dieting behaviors. Also known as “thinspiration”, “bonespo”, “fitspo”.
Learn more: Eating Disorder Communities
(See also: False Positive, False Negative)
Content that is correctly flagged as violative
Teams that are focused on the development, management, and enforcement of a platform’s policies to ensure that only acceptable content is posted online and that users are safe.
(See also: User Flagging)
An individual or vendor who is considered by the platform to be an expert in their field. Content flagged by this entity is therefore given special notice by moderation teams.