When Palo Alto High School students submit an essay through Turnitin.com, their teacher can see two numbers attached to it – the first for plagiarism, and the second for AI use. While both numbers are important, the latter is becoming one of the most crucial numbers for the English department, and, according to Palo Alto Online, a 76% AI use flag recently compromised a Paly sophomore’s semester of work.
According to the San Francisco Standard, a recently filed lawsuit against the Palo Alto Unified School District involves a Paly student accused of using AI on a school assignment. After being told to redo it by hand, he received a D on the assignment, significantly impacting his semester grade. Despite negotiations between the student’s parents and the district, the district has stood firm on the teacher’s decision to keep the student’s grade lowered.
This student’s experience is not an isolated incident – many Paly students face similar guilty-until-proven-innocent situations, and teachers are struggling to distinguish between what is and isn’t student work.
In December, AP Language and Composition teacher Alanna Williamson said student Turnitin AI use numbers were through the roof.
“It [AI detection] was over 40% in each one of my periods,” Williamson said. “So 10 to 13 students in each class period got flagged at like 50 to 80%.”
Williamson attributes the trend of using AI not to academic dishonesty, but to the intense academic pressure students face.
“It’s like handing [students] a digital gun,” Williamson said. “You guys don’t have the ability developmentally to not use something so enticing, right? … I don’t think kids are cheaters or liars. I just think that you guys are stressed out, and you’re looking for ways to get yourself help.”
Turnitin Director of Product Marketing Gretchen Hanson explained that this sentiment is reflected in Turnitin’s own AI chat, which students are using not as an essay writer, but as an on-demand tutor.
“What we found, which is pretty exciting, is that … students are using it as a 24/7 tutor,” Hanson said. “They’re like, ‘Can you help me with my grammar here?’ … which was not really what any of us expected. We kind of expected them to take the shortcut, and we’re not seeing them do that.”
Hanson also explained that AI detectors flag potential AI activity by comparing student writing to standard word patterns used by large language models such as ChatGPT.
“It’s basically looking at the word you write and what’s the next most likely word that is most common out there,” Hanson said. “When we look at an AI detector to say, ‘Oh, hey, do we think that this was written by ChatGPT or something similar,’ it’s looking to see how off that baseline is.”
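The idea Hanson describes, checking how often a writer picks the most likely next word, can be illustrated with a toy sketch. This is not Turnitin's method: real detectors rely on large language models, while the example below assumes a tiny word-pair count built from a made-up reference sentence. The function names `build_bigram_model` and `predictability` are hypothetical.

```python
from collections import Counter, defaultdict


def build_bigram_model(reference_text):
    """Count which word follows which in a reference text (toy stand-in for a language model)."""
    words = reference_text.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following


def predictability(text, model):
    """Fraction of words that match the single most common next word in the model.

    Word pairs whose first word never appears in the model are skipped.
    """
    words = text.lower().split()
    hits = 0
    total = 0
    for prev, nxt in zip(words, words[1:]):
        if prev not in model:
            continue
        total += 1
        most_likely = model[prev].most_common(1)[0][0]
        if nxt == most_likely:
            hits += 1
    return hits / total if total else 0.0


# Made-up reference text, assumed purely for illustration.
model = build_bigram_model("the cat sat on the mat and the cat sat on the rug")

# A text that always follows the most common word pairs scores high;
# a text that keeps choosing less common next words scores lower.
predictable_score = predictability("the cat sat on the cat", model)
surprising_score = predictability("the rug and the mat sat", model)
print(predictable_score, surprising_score)
```

In this toy setup, a high score only means the text closely tracks the reference patterns. How a real detector turns such a signal into a percentage is not described in the interview.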
Hanson said AI detectors also rely on word pattern recognition, explaining that students who deviate too far from what the detector expects of them will be flagged for AI usage.
“[It’s] also just looking at ‘Is this normal for a student of high school age’ or college age or whatnot, and so it’s sort of geared toward the level of writing that one would expect for it for this type of assignment,” Hanson said.
Williamson said she takes additional steps to verify the legitimacy of Turnitin’s result once she is warned of potential AI use in a student’s paper.
“I would ask for the student’s document to be shared with me so that I can look through their revision history and see if there’s anything suspicious in there,” Williamson said. “Like massive copy-pastes or writing that doesn’t sound like their own.”
Turnitin encourages secondary verification as Williamson does. According to Hanson, authentic human writing is messy with backtracking and deletions, with most students starting with an outline, jumping between sections, and revising, making revision history a crucial part of evaluating student work.
“Most students are doing this non-linear writing, but then you have one student who pastes the whole two pages worth of text in, or writes from beginning to end, all the way down,” Hanson said. “That’s suspicious. That doesn’t tell me that the student is really writing.”
Hanson said that perfectly detecting AI-generated text is getting harder and harder, so she reframed the focus of the detection tool. Rather than using Turnitin to catch students using AI, she hopes it will help teachers determine whether AI is being used as a tool or as a crutch. Hanson also hopes teachers will treat AI detectors as conversation starters for teaching students about responsible AI use rather than relying on them for “detection.”
Though Turnitin has helped Williamson detect students exploiting AI to complete assignments, she also said it flagged those who use tools like Grammarly that are accepted in her classroom.
“I think a lot of other kids are just using it [AI] in small ways that are like trying to get help,” Williamson said. “And unfortunately, that’s going to get flagged, and there’s no way for us to actually verify and see what they did.”
Paly senior Nina Faust said that she had her paper flagged for AI use even though it was completely her work.
“I felt pretty offended because I worked really hard on that paper,” Faust said. “And my teacher talked a lot about how she was [going to] punish anyone who used AI. I remember, my heart dropped when I saw how much AI it thought that I used.”
While Turnitin strives for accurate detections, false positives still occur. Although Turnitin’s system aims for a false positive rate of under 1%, Hanson cautions against using its results as absolute proof of academic dishonesty.
“We know this isn’t a perfect system,” Hanson said. “Those are just indicators that something might be up, and you might want to take the next step to investigate further … Your AI detection score should never really be considered as the end-all be-all of understanding if AI is in the paper.”
According to Hanson, Turnitin constantly makes changes to its AI detection model to adapt to rapidly advancing AI models.
“This is a very fast-moving space, so we have monitoring in place that is consistently looking at our models … comparing them to future models that are coming out,” Hanson said. “We do have an open dialogue with some of the major LLM providers to understand when new models will be coming, so we can do some testing in certain cases and try to keep pace.”
Hanson sees improvements in AI detectors as a means to transition AI detection from catching students being dishonest to helping students grow.
“And I think that [greater development of AI detection tools] is how AI becomes really a helpful and supportive part of education, rather than just a signal or indicator to police or be punitive, which is not the right environment for anyone to learn effectively,” Hanson said.
There are some limitations to Turnitin’s adaptability, however, as sometimes the release of models outpaces the release of AI detectors.
“[It’s] not always possible,” Hanson said. “We can’t always line up our updates with the newest thing that comes out, because sometimes we just don’t know, but we’re constantly working with other folks in the industry, and are tweaking what we’ve put together.”
According to an article Turnitin posted on April 9, 2024, its AI detector has reviewed over 200 million papers since its launch in April 2023. That number will only increase in the coming years.
Paly senior Jerry Yan, a researcher in AI prompt engineering, said that despite the low false positive rates AI detectors advertise, the errors detectors make still affect a large number of people because of the number of papers in the system.
“Even if, let’s say they [AI detectors] claim to have a 95% accuracy, that means 5% are still being falsely flagged,” Yan said. “Which is like, let’s say 10 million people turn it in … 500,000 people will get flagged.”
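Yan's arithmetic can be checked with a short sketch. It follows his simplifying assumptions, which are not Turnitin's figures: every submitted paper is human-written, and a detector that is 95% accurate wrongly flags the remaining 5%. The function name `expected_false_flags` is hypothetical.

```python
def expected_false_flags(num_papers, false_positive_rate):
    """Expected number of human-written papers wrongly flagged as AI.

    Assumes every one of num_papers is human-written, so the result
    is an upper bound for a real mix of papers.
    """
    if not 0 <= false_positive_rate <= 1:
        raise ValueError("false_positive_rate must be between 0 and 1")
    return round(num_papers * false_positive_rate)


# Yan's hypothetical: 10 million submissions at a 5% false flag rate.
yan_example = expected_false_flags(10_000_000, 0.05)

# The same volume at a 1% rate, the ceiling Turnitin says it aims to stay under.
one_percent_example = expected_false_flags(10_000_000, 0.01)

print(yan_example)          # 500000
print(one_percent_example)  # 100000
```

Even at the lower rate, the number of wrongly flagged papers scales directly with the number of submissions, which is the point Yan makes about volume.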
Williamson said that while Turnitin is not the most effective tool to prevent AI use, it is the only option teachers have. She anticipates that the technology will become more precise at detecting AI use in student work.
“Hopefully in the future things [AI detectors] will get more advanced, where we can actually see exactly what someone did and verify things,” Williamson said.
However, one way Williamson combats the high number of students flagged for AI use in their papers is to have everyone write the paper by hand. Since she implemented this method in her classroom, the majority of her students now display 0% AI use.
“That’s the only thing I’ve thought of so far, and it’s working,” Williamson said. “So everyone’s just gonna have to handwrite everything forever back to the 90s.”
