By Jacob Mchangama and Jordi Calvet-Bademunt
Summary
Anyone who has tested generative AI with slightly controversial issues is now familiar with expressions such as “I’m unable to help you with that” (Google’s Gemini). Or “I’m not able to generate content that takes a stand on controversial historical or political issues” (Inflection’s Pi). And “I am committed to promoting balanced and unbiased information” (OpenAI’s ChatGPT). These replies resulted from prompts on topics like transgender women’s participation in women’s tournaments, the Israeli-Hamas war, or the effects of European Colonialism. They are not only frustrating; they are also bad for our free-speech culture. The situation could be made even worse as generative AI becomes integrated into other applications such as email services, word processors, and web search.
We urgently need to have a conversation on free speech, including access to information and what type of content we want and do not want in generative AI. The harms potentially resulting from generative AI, like the generation of child exploitation material and even existential risks, have been widely discussed and need to be firmly tackled. However, we seem to have neglected to talk about what to do with controversial or even offensive content and accepted that issues like “disinformation” or “hate speech” should be totally banned from generative AI.
This paper, published by The Future of Free Speech in February 2024, contributes to the discussion on free speech in generative AI. The paper reviews the policies of six major chatbots – AI21 Labs Chat (AI21 Labs), Gemini (Google), ChatGPT (OpenAI), Claude (Anthropic), Coral (Cohere), and Pi (Inflection) – and examines the type of content they prohibit. The analysis focuses on generative AI models with web interfaces (“chatbots”) and their policies on disinformation, misinformation, and hate speech, which all the chatbots had.
Key findings
-
The policies of the selected chatbots do not align with the benchmark international human rights standards.
First, the policies on disinformation, misinformation, and hate speech are not sufficiently clear and specific. Regarding disinformation and misinformation, freedom of expression experts encourage digital companies to clearly define these terms and outline the potential harms that prohibited content may cause, like public health risks. None of the generative AI companies follow this guidance. Hate-speech policies are excessively vague, too, as they do not provide sufficient information on the specific categories of users that are protected from hatred (e.g., ethnicity, religion, gender), the reasons justifying the prohibition (e.g., threatening the right to vote), and other criteria the freedom of expression experts have proposed.
-
The policies are not proportionate and go significantly beyond the legitimate interests that justify speech restrictions.
These legitimate interests (e.g., the respect of the rights or reputations of others) are outlined in the benchmark international human rights standards and provide justifications for restricting freedom of expression, subject to a proportionality test. Due to limited resources, this analysis focuses only on hate-speech policies. Specifically, the paper finds that none of the companies precisely define which categories of users are protected from hate speech (for instance, based on race, nationality, or religion); rather, all of them use broad or open-ended restrictive clauses. The analysis also considers the Rabat Plan of Action, a key global standard that includes a six-part test providing guidance on how to balance between freedom of expression and incitement to hatred. The paper concludes that one of the six elements in the test – the extent of the dissemination of content – is likely less worrying in generative AI than in social media. The other elements do not so obviously change in a generative AI context. And yet, generative AI providers’ policies seem even more restrictive than social media’s, at least regarding hate speech.
-
Most chatbots seem to significantly restrict their content.
Chatbots refused to generate text for more than 40 percent of the prompts – and may be biased regarding specific topics – as they were generally willing to generate content supporting one side of the argument but not the other. The paper explores this point using anecdotal evidence. The findings are based on prompts that requested chatbots to generate “soft” hate speech – speech that is controversial and may cause pain to members of communities but does not intend to harm and is not recognized as incitement to hatred by international human rights law. Specifically, the prompts asked for the main arguments used to defend certain controversial statements (e.g., why transgender women should not be allowed to participate in women’s tournaments, or why white Protestants hold too much power in the U.S.) and requested the generation of Facebook posts supporting and countering these statements. The paper recognizes that policies other than those prohibiting hate speech may play a role in blocking content generation.