OpenAI is introducing a more accurate Moderation endpoint, an updated version of its content moderation tool that helps API developers protect their applications. The company says the update performs robustly across a wide range of applications, including social media, messaging systems, and AI chatbots.
The updated Moderation endpoint gives developers programmatic access to OpenAI’s Generative Pre-trained Transformer-based classifiers, which detect undesired content in applications.
Given an input text, the Moderation endpoint checks for content such as hate speech, sexual content, and abusive language so that it can be filtered out. It flags content generated by OpenAI’s API that violates OpenAI’s content policy, and it can also flag and block harmful content written by humans.
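As a rough illustration, the endpoint returns a JSON object with per-category flags and scores for the submitted text. The sketch below parses a sample response of that shape and decides whether to block the message; the field names follow OpenAI's documented response format, but the ID, scores, and category values shown are illustrative, and the `should_block` helper is a hypothetical policy, not part of the API. (In practice, the response would come from a call such as `client.moderations.create(input=text)` using the official `openai` Python client.)

```python
import json

# Sample Moderation endpoint response (shape per OpenAI's docs; values illustrative)
sample_response = json.loads("""
{
  "id": "modr-0001",
  "model": "text-moderation-latest",
  "results": [
    {
      "flagged": true,
      "categories": {"hate": true, "sexual": false, "violence": false},
      "category_scores": {"hate": 0.91, "sexual": 0.01, "violence": 0.02}
    }
  ]
}
""")

def should_block(response: dict) -> bool:
    """Hypothetical policy: block the message if any result is flagged."""
    return any(result["flagged"] for result in response["results"])

print(should_block(sample_response))  # True
```

A developer could equally key off `category_scores` to apply stricter or looser thresholds per category than the default `flagged` decision.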
For instance, NGL, an anonymous messaging platform, uses OpenAI’s tool to filter out hateful language, bullying, and racist remarks.
The company claims that the update significantly reduces the possibility of an AI model “saying” the wrong thing, making it suitable for more sensitive use cases such as education. The Moderation endpoint is free for content generated via the OpenAI API; for non-API content, users will have to pay a fee.
Developers can start using the tool after reviewing its documentation. OpenAI has also published a paper detailing the tool’s training and performance analysis, along with an evaluation dataset, to encourage further research in AI-driven moderation.