ChatGPT has released a tsunami of plagiarism issues in classrooms around the country. What if part of the answer to these issues were something very basic – the way that students actually write using keyboards? What if researchers used keystroke data to determine whether students actually wrote what they submitted? A new body of work is exploring whether these techniques can help teachers navigate the integration of AI in their classrooms.
Without question, AI has opened up new opportunities for students and teachers to make education more effective and personalized. But this technology has also created difficult issues and thorny questions regarding the training, use, and learning effects of AI in schools.
Last year at Texas A&M University–Commerce, for example, an entire class of students received failing grades after ChatGPT falsely flagged all of their papers as AI-generated when the professor asked the chatbot to check them for plagiarism.
This incident and others like it emphasize the need for safeguards and raise crucial questions on the ethical and practical implications of entrusting AI with the evaluation of student work.
Similar concerns may have been on President Biden’s mind recently when he announced an executive order directing the Department of Commerce to create guidance for authenticating and labeling AI-generated content. While the focus of this was likely preventing disinformation, the order may impact the use of AI in education. A recent survey by the National Education Association revealed that about one out of four teachers said they have caught students “cheating” by using an AI chatbot.
While few would urge students to use technology like ChatGPT to avoid challenging work or tough assignments, many researchers and experts argue that there’s a place for AI in today’s classroom. What’s missing are the tools teachers need to monitor and, when warranted, rein in the use of this emerging technology.
AI detection tools do exist, and all of them claim the ability to identify AI-generated material. But as the Texas example shows, they’re far from perfect. A recent analysis conducted by researchers across Europe and Mexico assessed 14 of the most widely used tools on their ability to accurately distinguish between AI-generated and human-generated writing. The researchers concluded that none of the detection tools were accurate or reliable. They also found that the tools were overwhelmingly biased toward classifying materials as human-written rather than AI-generated. Two of the tools tested, Turnitin and PlagiarismCheck, are among the most popular apps teachers use to check student work.
Even OpenAI, the company that created ChatGPT, took its AI-detection tool down after it found the model to be significantly flawed. Nearly 10% of the time the OpenAI tool mislabeled human-written text as “likely AI-written.” It was also wrong one-quarter of the time when asked to identify text written in languages other than English.
With so much inaccuracy and unreliability in the current models, educators need better tools and strategies to guide how AI is used in their classrooms. This is where developers can help.
Rather than build tools that can only recognize AI-generated content, developers should also train their technology on how students write in real life.
One such effort currently underway seeks to identify the telltale clues that both humans and machines leave behind like breadcrumbs in their writing process. I’ve been working with Vanderbilt University and Kaggle, a subsidiary of Google, to host a competition among data scientists to find these types of clues in student-produced writing and in text from large language models, like ChatGPT.
Some of these breadcrumbs may be hiding in plain sight. For instance, could studying a student’s keystroke behavior yield clues about a paper’s authenticity? After all, chatbots do not pause while writing, or delete and revise their work, the way a student would. Previous studies of writers’ keystrokes have been limited by small datasets, so more large-scale data analysis is needed, but the idea is both intriguing and promising. Moreover, studying these typing behaviors may reveal relationships between a student’s writing process and their writing performance. Does a student’s pattern of pauses and edits, for example, contribute to the overall logic and flow of their essay? Researching and identifying relationships between a student’s writing behavior and their writing performance could open up new lines of exploration on how to better identify machine-generated content. It could also give teachers valuable information on how to improve their in-person writing instruction.
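To make the idea concrete, here is a minimal sketch of what such keystroke analysis might look like. It assumes a hypothetical log of (timestamp, action) events; real keystroke-logging tools capture far richer data, and the feature names here are illustrative, not any specific research tool’s output:

```python
# Hypothetical keystroke log: (timestamp_in_seconds, action) pairs,
# where action is "insert" or "delete". This is an illustrative
# sketch, not data from any actual logging tool.
log = [
    (0.0, "insert"), (0.3, "insert"), (0.7, "insert"),
    (4.2, "insert"),                    # long pause before this keystroke
    (4.5, "delete"), (4.8, "delete"),   # a short revision burst
    (5.1, "insert"), (5.4, "insert"),
]

def keystroke_features(events, pause_threshold=2.0):
    """Summarize pause and revision behavior from a keystroke log."""
    timestamps = [t for t, _ in events]
    # Gaps between consecutive keystrokes reveal pauses for thought.
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    long_pauses = sum(1 for g in gaps if g >= pause_threshold)
    deletions = sum(1 for _, action in events if action == "delete")
    return {
        "total_events": len(events),
        "long_pauses": long_pauses,                # moments of reflection
        "deletion_rate": deletions / len(events),  # how often the writer revises
    }

features = keystroke_features(log)
```

Features like these – pause counts, deletion rates, typing bursts – could then be fed into a classifier alongside the finished text, giving a model evidence about the writing *process* rather than only the product.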
As the incident in Texas shows – as well as similar instances in California and elsewhere – it’s paramount that developers get this right, and quickly. In nearly every circumstance, an overreliance on a tool led to a grave error, causing significant distress for both students and educators. As AI becomes more common in classrooms, developers and others must invest in robust, accurate, and fair tools to help educators assess the authenticity of student work. By improving the technology that underpins these tools, we can mitigate the risks associated with AI while harnessing its benefits for educational advancement.
Regardless of the benchmarking tools the Department of Commerce uses in its effort to meet President Biden’s executive order, the challenge of creating reliable and accurate guidelines will be significant. As the President notes, AI will transform education. Teachers are rightfully concerned about their students using AI as a “cheat code” to good grades. But using inaccurate or unreliable tools can be just as detrimental if it leads to students being punished wrongly.
To ensure the equitable and fair use of AI in American schools, both teachers and students deserve tools that are accurate, free of bias, and safe.
Only through large-scale collaboration among the tech industry, government, and educators will AI meet its potential to be a positive force for change and progress in education.
Jules King is a Program Manager at The Learning Agency Lab where she helps organize and lead data science competitions. She has a strong background in research, development, and project management. Jules has worked in different analyst roles for state and federal agencies and non-profit organizations.