Our messaging tool brings your volunteer community together in one secure place. To ensure conversations remain respectful without adding work for your team, automatic content moderation powered by artificial intelligence (AI) is built directly into the platform. This safety filter runs in the background to analyse messages before they become visible to other users, helping maintain a secure environment in real time.
Understanding AI Messaging Moderation
AI moderation is an automatic safety filter that runs in the background 24 hours a day across every communication channel. It analyses text, emojis, and media against defined platform policies before a message becomes visible to other users.
Unlike manual moderation methods that require a person to read every submission, this automated system assesses content instantly and at scale. This continuous screening process maintains a secure digital environment in real time without placing an operational burden on administrative teams or volunteers.
How the Moderation Process Works
When a user sends a message, the content passes through the AI engine before it is delivered to the channel. The engine scores the content against defined platform policies, and this real-time evaluation results in one of two automatic outcomes:
Message Approved: Content that falls within safe, acceptable boundaries is delivered normally. The entire process happens instantly, so approved messages appear with no noticeable delay.
Message Blocked: Content that violates platform policies is stopped immediately. The message is not delivered, the sender receives a notification, and other channel members will never see the flagged content.
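The two outcomes above can be pictured with a short sketch. This is an illustration only, not Rosterfy's implementation: the names moderate_and_deliver, ModerationResult, and toy_classifier are hypothetical, the wording of the sender notification is invented, and the toy classifier merely stands in for the AI engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ModerationResult:
    delivered: bool               # True if the message reached the channel
    sender_notice: Optional[str]  # Notification shown to the sender when blocked


def moderate_and_deliver(
    text: str,
    violates_policy: Callable[[str], bool],  # hypothetical stand-in for the AI engine
    channel: List[str],
) -> ModerationResult:
    """Screen a message before delivery and apply one of the two outcomes."""
    if violates_policy(text):
        # Message Blocked: nothing is added to the channel; only the sender is told.
        # The notice text here is an assumption made for this sketch.
        return ModerationResult(False, "Your message was blocked by content moderation.")
    # Message Approved: the message is delivered normally.
    channel.append(text)
    return ModerationResult(True, None)


def toy_classifier(text: str) -> bool:
    """Toy rule used only for this illustration; the real engine is an AI model."""
    return "badword" in text.lower()


channel: List[str] = []
ok = moderate_and_deliver("See you at the 9am shift!", toy_classifier, channel)
blocked = moderate_and_deliver("You are a BADWORD", toy_classifier, channel)
```

In this sketch the channel ends up holding only the approved message, which mirrors the behaviour described above: other members never see blocked content.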
Blocked Content Categories and Character Limits
The AI engine screens all message text and emojis against defined standards appropriate for a volunteer management environment. This includes filtering out:
Hate speech or discriminatory language
Harassment, threats, or abusive behaviour
Sexually explicit or graphic content
Spam or repetitive unsolicited content
Content violating privacy or confidentiality
To ensure consistent and reliable performance for all users across all channels, each message is subject to a 5,000-character limit.
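For teams that prepare messages outside the platform, a length check along these lines may be useful. It is a minimal sketch: the function names are hypothetical, and it assumes that characters are counted as Unicode code points and that a message of exactly 5,000 characters is accepted. The platform may count differently, for example for some emojis.

```python
MAX_MESSAGE_CHARS = 5000  # limit stated in this article


def within_character_limit(message: str) -> bool:
    """Return True if the message fits within the 5,000-character limit.

    Assumption: characters are counted as Unicode code points (Python's len),
    and the limit is inclusive.
    """
    return len(message) <= MAX_MESSAGE_CHARS


def characters_remaining(message: str) -> int:
    """Return how many characters are left; negative when over the limit."""
    return MAX_MESSAGE_CHARS - len(message)
```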
Moderation policies are defined and maintained by Rosterfy based on industry best practice and standards appropriate for a volunteer management environment. For a complete list of what can be detected, see the Full List of Flagged Categories section below.
System Limitations and Shared Responsibilities
Automated moderation is a support mechanism, not a flawless solution or a substitute for personal responsibility. The AI may occasionally miss policy violations or mistakenly flag appropriate content. Your team plays an important role alongside the automated system:
Inform volunteers: Ensure volunteers are aware that messaging is subject to automated content moderation before they begin using it.
Set expectations: Share your organisation's communication guidelines so that volunteers understand what is and is not appropriate.
Volunteers and administrators share the following responsibilities:
Read and understand the terms of use and take personal responsibility for sent content.
Report inappropriate content through the correct organisational channels.
Treat fellow volunteers with respect, regardless of system outcomes.
📌 Note: Users should not rely on the system to detect every policy violation and should always consider whether their message is suitable before sending it.
Data Privacy and Security Standards
Message data is processed solely for real-time safety screening. Processing follows strict data handling policies and privacy legislation, and meets the following standards:
Your data is not used to train AI models. Message content is never used to develop, train, or improve any AI or machine learning models.
Processing is limited in scope. Content is analysed in real time strictly for policy compliance and is not stored or analysed for any other purpose.
Data handling follows Rosterfy's privacy obligations. To maintain volunteer privacy within public channels, user profiles display only the volunteer's first name and the first initial of their surname. Full names are not visible to other channel members.
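The name format described above (first name plus the first initial of the surname) can be illustrated as follows. The function name public_display_name is hypothetical, and details such as spacing, capitalisation, and the absence of a trailing full stop are assumptions made for this sketch rather than confirmed platform behaviour.

```python
def public_display_name(first_name: str, surname: str) -> str:
    """Build a privacy-preserving display name: first name plus surname initial."""
    first = first_name.strip()
    last = surname.strip()
    if not last:
        # No surname on record: show the first name only (assumed fallback).
        return first
    return f"{first} {last[0].upper()}"
```

For example, under these assumptions a volunteer named Jordan Smith would appear to other channel members as "Jordan S".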
Excluded Features
The following administrative capabilities are not included in the platform:
No Blocked Message Access: Administrators do not have a view or inbox within the platform to read blocked messages.
No Moderation Analytics: Detailed moderation logs, metrics, and analytics are not available.
No Policy Customisation: Moderation policies are entirely managed and updated centrally by Rosterfy based on standard volunteer environment practices.
Full List of Flagged Categories
The AI moderation engine can detect and act on the following categories of content:
📌 Note: This is the full list of what the engine can detect and is subject to change. Rosterfy will determine which of these are active and what action is taken for each.
Severity levels also apply to many categories. Rather than a simple on/off, the engine classifies content as Low, Medium, High, or Critical, allowing different actions to be configured at each level.
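One way to picture severity-based actions is a threshold per category: content is blocked once its severity reaches the configured level. The sketch below is illustrative only. The thresholds in EXAMPLE_BLOCK_FROM, the action names "allow" and "block", and the function action_for are all hypothetical, and Rosterfy's actual configuration is not published in this article.

```python
from enum import IntEnum
from typing import Dict


class Severity(IntEnum):
    """The four severity levels named in this article, in ascending order."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical thresholds: the lowest severity at which a category is blocked.
EXAMPLE_BLOCK_FROM: Dict[str, Severity] = {
    "Threat": Severity.LOW,      # blocked at every level in this example
    "Vulgarity": Severity.HIGH,  # mild vulgarity passes in this example
}


def action_for(category: str, severity: Severity) -> str:
    """Return "block" or "allow" for a detected category at a given severity."""
    threshold = EXAMPLE_BLOCK_FROM.get(category)
    if threshold is None:
        # Category is not active in this illustrative configuration.
        return "allow"
    return "block" if severity >= threshold else "allow"
```

With these example thresholds, a Low-severity Threat is blocked, while Vulgarity is only blocked from High upwards.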
Harmful and Abusive Content
Hatred: Expressions of intense dislike or ill will towards individuals or groups.
Insult: Language that demeans or belittles others.
Threat: Statements expressing an intention to cause harm or violence.
Moral Harassment: Comments that aim to degrade or humiliate someone.
Body-shaming: Criticism or mockery of someone's physical appearance.
Discrimination
Racism: Discriminatory or prejudiced comments based on race or ethnicity.
Misogyny: Content expressing hatred or prejudice against women.
Ableism: Discriminatory comments based on physical or mental disabilities.
LGBTQIA+ Phobia: Negative or hostile content towards LGBTQIA+ individuals.
Sexual Content
Sexual Harassment: Unwelcome comments or advances of a sexual nature.
Sexually Explicit: Sexually graphic or suggestive content.
Pedophilia: Any content expressing sexual interest in minors.
Safety and Security
Self Harm: Expressions of intent to harm oneself or self-destructive behaviour.
Terrorism & Violent Extremism: Support or promotion of terrorist activities or ideologies.
Terrorism Reference: Mentions of terrorist organisations or activities.
Doxxing: Publicly revealing private information about an individual without consent.
Weapon Explicit: Content related to weapons or their use.
Privacy and Identity
PII (Personally Identifiable Information): Sharing personal details such as phone numbers, addresses or email addresses without consent.
Underage User: Statements indicating a user is below the minimum age requirement.
Reputation Harm: Damaging someone's reputation through false or malicious statements.
Spam and Unwanted Content
Scam: Fraudulent schemes or deceptive practices.
Flood: Excessive or repetitive posting of messages.
Ads: Promotional messages or advertisements.
Useless: Irrelevant or trivial content that does not contribute to the conversation.
Forbidden Link: Sharing links to prohibited or harmful websites.
Link: Sharing of URLs or hyperlinks (configurable).
Platform Bypass: Attempts to move conversation to another platform.
Contextual / Configurable Categories
Drug Explicit: Content related to drug use or trafficking.
Vulgarity: Crude or offensive language.
Negative Criticism: Harsh or unconstructive comments.
Boycott: Calls to avoid or stop supporting certain products or organisations.
Dating: Comments expressing romantic interest or seeking relationships.
Politics: Comments related to political matters or figures.
Geopolitical: Comments related to international relations or global issues.
