An instructor at Texas A&M University-Commerce has accused students taking his agricultural science class of cheating by using AI software to write their essays.

As detailed in a now-viral Reddit thread this week, Jared Mumm, a coordinator at the American university's department of agricultural sciences and natural resources, informed students he had used ChatGPT to assess whether their submitted assignments were written by humans or generated by a computer.

We're told OpenAI's bot labeled at least some of the submitted work as machine-crafted, leading to grades being withheld pending an investigation. Students caught up in the row hit back, saying their essays were indeed their own work. As a result of the probe, diplomas were temporarily withheld for those graduating; it's understood about half the class had their diplomas put on hold.

Specifically, Mumm said he ran his seniors’ final three essays through ChatGPT twice, and if the bot said both times for each piece that it wrote the work, he would flunk that paper.

“I will be giving everyone in this course an X,” he reportedly told his class, and apparently told several students: “I’m not grading AI s***.”

Texas A&M University-Commerce confirmed the X grade means incomplete, and was a temporary measure while the affair was investigated. Several students have now been cleared of any cheating, we note, while some others opted to submit fresh essays to be graded. At least one pupil so far has admitted using ChatGPT to complete assignments.

“A&M-Commerce confirms that no students failed the class or were barred from graduating because of this issue,” the institution said in a statement. “University officials are investigating the incident and developing policies to address the use or misuse of AI technology in the classroom.”

“They are also working to adopt AI detection tools and other resources to manage the intersection of AI technology and higher education. The use of AI in coursework is a rapidly changing issue that confronts all learning institutions,” it continued.

A representative from the university declined to comment further. The Register has asked Mumm for comment.

One person familiar with the brouhaha at the uni told us: “So far it seems the situation is mostly resolved: the school admitted to students that the grades should not have been withheld in the first place. It was completely out of protocol and an inappropriate use of ChatGPT. They haven’t addressed the foul language in accusations yet.”

The kerfuffle highlights the question of whether educators should use software to detect AI-produced content in submitted coursework. ChatGPT is a poor tool for classifying machine-generated text; it cannot even reliably determine whether it wrote a given essay itself. In short, it should not be used this way, whether to detect its own output or that of some other model.

Other types of software specifically built to detect text generated by AI models are often not reliable, either, as is becoming increasingly apparent. 

A pre-publication study suggested it will be impossible to reliably discern AI-written text from human writing as models improve. Vinu Sankar Sadasivan, a PhD student at the University of Maryland and first author of that paper, told us the chances of detecting AI-generated text using the best detectors are no better than flipping a coin.

“Generative AI text models are trained using human text data with the objective of making their output resemble that of humans,” Sadasivan said.

“Some of these AI models even memorize human text and output it in some instances without citing the actual text source. As these large language models improve over time to mimic humans, the best possible detector would achieve only an accuracy of nearly 50 percent.

“This is because the probability distribution of text output from human and AI models can nearly be the same for a sufficiently advanced [large language model], making detection hard. Hence, we theoretically show that the task of reliable text detection is impossible in practice.”
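Sadasivan's 50 percent figure has a simple statistical basis: for a balanced mix of human and machine text, the best achievable classifier accuracy is 1/2 + TV/2, where TV is the total variation distance between the two text distributions. (The paper states its bound in terms of AUROC, but the coin-flip intuition is the same.) A minimal sketch, using made-up token frequencies rather than real model statistics:

```python
def total_variation(p, q):
    """TV distance between two discrete distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)

# Hypothetical word frequencies: a model trained to mimic human text
# ends up with a distribution very close to the human one.
human = {"the": 0.50, "a": 0.30, "an": 0.20}
model = {"the": 0.48, "a": 0.31, "an": 0.21}

tv = total_variation(human, model)
best_accuracy = 0.5 + tv / 2  # optimal detector on a 50/50 mix

print(f"TV distance: {tv:.3f}")            # 0.020
print(f"best accuracy: {best_accuracy:.3f}")  # 0.510 -- barely above a coin flip
```

As the model's distribution approaches the human one, TV tends to zero and the best possible detector's accuracy tends to exactly 50 percent, which is the "coin flip" Sadasivan describes.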

The paper also showed that such software can easily be tricked into classifying AI text as human if users make a few quick edits to paraphrase a model's output. Sadasivan said universities and schools should not use these detectors to check for plagiarism since they are unreliable.

“We should not use these detectors to make the final verdict. Borrowing words from my advisor, Prof Soheil Feizi: ‘I think we need to learn to live with the fact that we may never be able to reliably say if a text is written by a human or an AI’,” he said. ®