In conjunction with our Human Rights Due Diligence Working Group, GNI held a series of three calls for our members from May to October 2025 on the intersections of online content moderation, government restrictions and demands, possible impacts on freedom of expression and privacy, and related human rights due diligence practices. This blog post is the third in a series summarizing the takeaways from the calls. For further background, see the first two posts in this series: "Exploring Human Rights Due Diligence in Community Moderation Models" and "Hash Databases, Due Diligence, and the Boundaries of Government Oversight."
Online platforms are increasingly integrating AI and AI-based tools into content moderation, creating risks to fundamental rights, especially freedom of expression and privacy. As automated systems scale, they also concentrate power over what is seen, who can speak, and how platforms respond to government pressure.
In October, the Global Network Initiative (GNI) convened a learning call exploring how AI is being used in content moderation, how government interventions intersect with these tools, and what human rights due diligence (HRDD) looks like in this shifting landscape.
The call was part of a broader series that aims to deepen collective understanding of how different models of content governance can impact the rights to freedom of expression and privacy, particularly as governments around the world become more active in regulating online spaces. As with all GNI activities, these discussions were held under GNI’s policies, including our antitrust compliance policy and code of conduct.
Moderation at scale
A central theme was scale. Automation has become essential in content moderation, helping platforms meet – at times legally mandated – response times and reducing the psychological toll on human moderators. Yet this reliance brings significant rights concerns. Keyword filters and hash-matching tools routinely take down lawful speech, including documentation of abuses, because they cannot distinguish between harmful content and reporting. In one case, automated translation errors led to benign Arabic phrases being read as violent commands. These incidents underscore the sometimes imprecise nature of current systems. While these tools are significantly improved from a few years ago, they still lack the ability to fully grasp nuance or intent.
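To illustrate that limitation at a deliberately simplified level, the sketch below shows how a naive keyword filter and an exact hash match treat an incitement post and a journalist's report quoting that post identically. The keyword list, hash database, and example posts are all hypothetical, and real systems are far more sophisticated (perceptual hashing, trained classifiers), but the underlying constraint the participants described, matching on content rather than intent, is the same.

```python
import hashlib

# Hypothetical examples for illustration only.
BLOCKED_KEYWORDS = {"attack the square"}  # assumed keyword list
HASH_DATABASE = {hashlib.sha256(b"<known violent image bytes>").hexdigest()}  # assumed hash DB


def keyword_filter(text: str) -> bool:
    """Flag a post if it contains any blocked keyword, regardless of context."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in BLOCKED_KEYWORDS)


def hash_match(media_bytes: bytes) -> bool:
    """Flag media whose hash appears in the database, regardless of who shares it or why."""
    return hashlib.sha256(media_bytes).hexdigest() in HASH_DATABASE


# An incitement post and a news report documenting it are treated identically:
incitement = "Everyone should attack the square at noon."
reporting = 'Witnesses documented calls to "attack the square" during the crackdown.'

print(keyword_filter(incitement))  # True
print(keyword_filter(reporting))   # True: lawful documentation is removed as well

evidence = b"<known violent image bytes>"  # same bytes shared as evidence of abuse
print(hash_match(evidence))        # True: the match cannot tell abuse from documentation
```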
The conversation considered the effectiveness of AI tools across different problem areas. These systems tend to perform relatively well when content can be judged from the image or text itself, such as in the detection of nudity. They perform far less effectively in cases that depend on external context, such as misinformation or identity verification, where accuracy requires understanding beyond the content itself.
Participants noted that, while they come with risks, large language models (LLMs) also introduce new opportunities. Trust and safety teams are expanding their use of LLMs in operations, as the tools have improved enough to assist in shaping policy, managing appeals, and drafting user notices, not just flagging content. These systems can be updated quickly when policies or standards change, making moderation more responsive and adaptable.
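A rough sketch of what this flexibility can look like in practice is a "policy as prompt" setup, where the moderation policy is plain text fed to the model, so changing moderation behavior means editing that text rather than retraining a classifier. Everything here is illustrative: `call_llm` is a placeholder for whatever model API a platform actually uses, and the policy text and prompts are invented for the example.

```python
# Illustrative sketch only; not any platform's actual pipeline.

POLICY = """
Flag content as VIOLATING if it contains credible threats of violence.
Quoting or documenting such threats for news reporting is NOT violating.
"""  # hypothetical policy text: updating moderation behavior means editing this string


def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (hosted or self-hosted model)."""
    raise NotImplementedError("wire up a model client here")


def classify(post: str) -> str:
    """Ask the model to apply the current policy to a single post."""
    prompt = (
        f"Policy:\n{POLICY}\n"
        f"Post:\n{post}\n"
        "Answer with exactly one word: VIOLATING or ALLOWED."
    )
    return call_llm(prompt).strip().upper()


def draft_user_notice(post: str, decision: str) -> str:
    """LLMs can also assist with appeals and user-facing notices, not just flagging."""
    prompt = (
        f"Policy:\n{POLICY}\n"
        f"A post was marked {decision}. Draft a short, plain-language notice to the "
        "user explaining the decision and how to appeal."
    )
    return call_llm(prompt)
```

The same property that makes this adaptable, that behavior is steered by editable text rather than fixed training, is what makes it a potential channel for the indirect government influence discussed below.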
Yet many challenges and risks remain. The Center for Democracy & Technology (CDT)'s recent paper, Lost in Translation: Large Language Models in Non-English Content Analysis, offers a deeper look into some of these challenges. The study found that LLMs used for moderation perform unevenly across languages because most are trained primarily on English data. The problem compounds when crises erupt in regions where datasets are sparse and do not adequately represent new and evolving contexts and linguistic patterns. Systems that function adequately in English are much less accurate when analyzing speech from other linguistic or cultural communities.
Additionally, in this new tooling landscape, governments could seek to influence moderation more indirectly and opaquely, by providing standards or expectations that are then encoded into these systems. The same flexibility that allows platforms to refine moderation practices could also facilitate more direct state involvement.
Regulating automated moderation
Several participants warned that AI tools could become bargaining instruments in negotiations between platforms and governments. Companies might deploy or withhold certain moderation features to avoid regulatory penalties or to shape future oversight. Others cautioned against the euphemistic language often used to describe moderation, noting that, in practice, it can function as a form of censorship.
The discussion turned to the broader regulatory environment, where laws increasingly incentivize the use of AI while providing little guidance on how to govern its responsible use. For instance, laws in multiple jurisdictions require platforms to proactively take steps to mitigate content-related harms or risks, which in practice would likely involve increased reliance on automated approaches. While these laws may include language underscoring the importance of preserving freedom of expression, the focus of their compliance and enforcement provisions tends to prioritize content moderation. This “risk-based” orientation focuses on harm mitigation rather than on promoting pluralism or safeguarding freedom of expression. Without clearer guidance, platforms may end up over-removing content or implementing rigid systems that suppress lawful speech.
Participants reflected on how these systems are transforming not just moderation but the structure of online visibility itself. Freedom of expression online is no longer simply about the ability to post; it is also about the ability to be seen. With algorithmic curation, recommendation, and demotion at play, "freedom of reach" becomes as consequential as freedom of speech. Practices like "shadow banning" or "algorithmic throttling" can silence users without explicit removals, creating invisible layers of moderation that escape oversight.
A participant noted that, as a result of these approaches, the Internet is shifting from an "open by default" to a "closed by default" environment, with an increasing share of online interactions taking place between humans and bots trained to function both as chat assistants and as autocomplete systems. They described this dynamic as a form of value creation, suggesting that the online space as a whole should now be understood as operating within this emerging paradigm.
As governments increasingly demand action against “harmful” or “illegal” content, platforms face mounting pressure to embed compliance into their moderation infrastructure. The danger lies in allowing regulatory demands to dictate algorithmic design. AI, once intended as a tool for scale and safety, risks becoming a conduit for indirect state control. At the same time, platforms may use their AI capabilities as leverage in negotiations with governments, offering or withholding moderation features to shape regulation in their favor.
Role of human rights due diligence
Across the discussion, participants emphasized that effective HRDD must evolve alongside AI itself. It cannot be a one-off assessment; it has to be built into product design, testing, and deployment. Continuous monitoring, independent evaluation, transparency about error rates, and meaningful user recourse are essential if platforms are to ensure that their content moderation efforts do not create unintended consequences. Open channels for engagement with key stakeholders, such as civil society organizations and researchers, can also help platforms understand the impacts of automated tools and find ways to adjust them. Platforms should make moderation datasets and model evaluations public wherever possible, while regulators should focus on overseeing systems rather than dictating content outcomes.
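As one concrete, hypothetical illustration of what "transparency about error rates" can mean in practice, the sketch below computes per-language false positive and false negative rates from a sample of automated decisions that have been independently re-reviewed by humans, the kind of disaggregated reporting that would surface the cross-language disparities CDT describes. The record schema and sample data are assumptions for the example, not any platform's actual audit format.

```python
from collections import defaultdict

# Each record pairs an automated decision with an independent human review label.
# The schema is hypothetical, intended only to show the shape of such an audit.
audit_sample = [
    {"language": "en", "auto_removed": True,  "human_says_violating": True},
    {"language": "en", "auto_removed": False, "human_says_violating": False},
    {"language": "ar", "auto_removed": True,  "human_says_violating": False},  # false positive
    {"language": "ar", "auto_removed": False, "human_says_violating": True},   # false negative
]


def error_rates_by_language(records):
    """Return per-language false positive and false negative rates."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "negatives": 0, "positives": 0})
    for r in records:
        c = counts[r["language"]]
        if r["human_says_violating"]:
            c["positives"] += 1
            if not r["auto_removed"]:
                c["fn"] += 1  # violating content the system missed
        else:
            c["negatives"] += 1
            if r["auto_removed"]:
                c["fp"] += 1  # lawful content the system removed
    return {
        lang: {
            "false_positive_rate": c["fp"] / c["negatives"] if c["negatives"] else None,
            "false_negative_rate": c["fn"] / c["positives"] if c["positives"] else None,
        }
        for lang, c in counts.items()
    }


print(error_rates_by_language(audit_sample))
```

Publishing this kind of breakdown, alongside the datasets and model evaluations behind it, is one way platforms can make the performance of automated moderation legible to outside researchers and affected communities.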
This learning call contributed to GNI’s ongoing work to explore how human rights principles can be applied in the context of AI and content moderation. GNI will continue bringing together members and experts to identify practical approaches for enhancing transparency, accountability, oversight, and remedies across platforms, ensuring that automated systems are designed and operated in ways that respect fundamental rights.
Resources and references: