OpenAI offers error-prone AI detector amid fears of a machine-stuffed future

OpenAI has released a free online tool designed to predict whether a passage of text was generated by AI or written by a human.

Dubbed the “AI Text Classifier”, the software is powered by a language model and rates the likelihood a chunk of text was generated by an AI model on a five-point scale that goes from “very unlikely” to “unclear” to “likely.” We tried it on some of our articles from a decade or so ago, and several vultures may be peeved to learn their copy was rated “unclear if it is AI-generated.”
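
For the curious, a five-point scale like this usually amounts to a model probability chopped into buckets. Below is a minimal Python sketch of that idea. The function name `rate`, the cutoff values, and the two intermediate labels ("unlikely" and "possibly") are illustrative assumptions on our part, not details confirmed by OpenAI here.

```python
from bisect import bisect_right

# Hypothetical cutoffs, chosen for illustration only (an assumption,
# not OpenAI's documented thresholds).
CUTOFFS = [0.10, 0.45, 0.90, 0.98]

# Five labels, lowest to highest likelihood of AI authorship.
# "unlikely" and "possibly" are assumed names for the in-between buckets.
LABELS = ["very unlikely", "unlikely", "unclear", "possibly", "likely"]


def rate(probability: float) -> str:
    """Map a probability that text is AI-generated onto a five-point label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # bisect_right counts how many cutoffs are <= probability,
    # which is exactly the index of the bucket the value falls into.
    return LABELS[bisect_right(CUTOFFS, probability)]


if __name__ == "__main__":
    for p in (0.05, 0.30, 0.60, 0.95, 0.99):
        print(f"{p:.2f} -> {rate(p)}")
```

With wide middle buckets like these, a lot of perfectly human copy would land in "unclear", which is roughly what happened to ours.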

The tool, unveiled Tuesday, debuted months after the startup launched ChatGPT, a chatbot that automatically generates text when instructed, such as answering a question, telling a joke, or penning a poem. In effect, OpenAI helped fuel the rise in prattle-streaming bots and now offers a half-assed tool to detect this kind of stuff.

Schools and universities in the US, France, and India, at least, have since banned students from accessing ChatGPT using their networks or from submitting essays generated by the software. 


Experts also fear text-generation models could be used to flood the internet with misinformation, phishing emails, and reams and reams of nonsense. Several organizations and computer scientists have therefore started work on their own classifier tools to detect AI-generated content.

OpenAI’s AI Text Classifier isn’t perfect. “Our intended use for the AI Text Classifier is to foster conversation about the distinction between human-written and AI-generated content,” the Microsoft-bankrolled lab said.

“The results may help, but should not be the sole piece of evidence, when deciding whether a document was generated with AI,” the organization added. “The model is trained on human-written text from a variety of sources, which may not be representative of all kinds of human-written text.”

The AI Text Classifier is designed to detect machine-made text from various sources, not just its over-hyped ChatGPT. It was trained on both AI-written text from 34 models built by five organizations, and human-written text scraped from the internet and taken from an internal company dataset.

The tool requires text samples of at least 1,000 characters, and doesn’t work well for languages other than English. OpenAI’s head of alignment Jan Leike told Axios its predictions can include false positives and false negatives.

The classifier will not even be that useful for teachers who are looking to assess whether a student submitted an assignment generated by software like ChatGPT. The AI Text Classifier is not sensitive enough to tell which sentences or snippets may have been the work of AI, so any text produced by a computer and tweaked by a human may evade detection. 

“We caution that the model has not been carefully evaluated on many of the expected principal targets – including student essays, automated disinformation campaigns, or chat transcripts. Indeed, classifiers based on neural networks are known to be poorly calibrated outside of their training data. For inputs that are very different from text in our training set, the classifier is sometimes extremely confident in a wrong prediction,” the lab warned.

OpenAI is also reportedly exploring other approaches to detecting AI-generated text, including a watermarking technique that may be built into its future products.

The Register has asked OpenAI for further, preferably non-AI-generated, comment. ®


