The Problem with Using Auto-Captions in Education
Educational institutions have a responsibility to ensure accessible learning materials for their students, including providing accurately captioned videos.* The question is, are institutions that rely on auto-captions providing truly accessible videos?
In the 2020 State of Captioning study, which measures captioning behaviors across industries, 64 percent of respondents reported using auto-captions for educational content. Yet, institutions that solely use auto-captions for recorded videos are not necessarily delivering on accessibility.
As the use of online videos in classrooms continues to grow, institutions must strive to meet video accessibility standards. So, do auto-captions meet quality standards? How accurate are they? Why might auto-captions be problematic for recorded video content?

The Auto-Caption Dilemma
Auto-captions use automatic speech recognition (ASR) technology to transcribe audio and synchronize text with videos to create closed captions. Platforms like YouTube offer auto-captioning tools for free, which is why auto-captions often appeal to so many.
Relying solely on ASR to caption recorded videos, however, typically comes at the cost of caption accuracy.
At its best, ASR typically generates captions that are about 80 to 90 percent accurate. Reaching that level, however, requires several conditions: little or no background noise, few grammatical errors, few mispronunciations, and excellent audio quality. If any of these conditions isn’t met, auto-caption accuracy can drop as low as 50 percent. That means that out of every ten words, five could be captioned incorrectly. With an accuracy rate that low, viewers are left to follow along with confusing and often inaccurate information.
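To make the "five out of ten words" arithmetic concrete, here is a minimal Python sketch of one way to measure caption accuracy. It assumes accuracy is defined as one minus the word error rate, a common convention for evaluating speech recognition; the post does not state how its percentages were measured. The function names and the sample sentences are hypothetical and chosen only for illustration.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Count word-level edits (substitutions, insertions, deletions)
    needed to turn the caption text into the reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j caption words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from caption
                curr[j - 1] + 1,     # extra word in caption
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def caption_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - (word errors / reference length), floored at 0.
    This definition is an assumption, not the post's stated method."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript is empty")
    return max(0.0, 1.0 - word_errors(reference, hypothesis) / ref_len)


# Hypothetical example: a ten-word sentence with five misheard words.
reference = "please read chapter four before the lab session on friday"
auto_caption = "please reed chapter for before a slab session on fried"

print(word_errors(reference, auto_caption))       # 5
print(caption_accuracy(reference, auto_caption))  # 0.5
```

In this made-up sentence, half the words are wrong, and the instruction to read chapter four before Friday's lab is no longer recoverable from the caption alone.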
The key to caption accuracy of 99 percent or higher is human involvement.
ASR can be a useful gateway tool for accurate captions since it does the bulk of the transcribing work. Once an auto-captioning tool generates a rough transcript for recorded video content, a human should conduct quality assurance and make necessary edits to ensure caption accuracy.
Caption Accuracy Goes a Long Way
The importance of digital media in education is increasing as more educators use videos in their curriculum. Educational videos may come in the form of recorded lectures, study materials, and recorded presentations. All of that content must be accessible, and yes, that means it should be accurately captioned.

For educational content especially, accuracy is critical. Students who are deaf or hard of hearing rely on captions to consume video materials for their courses, and they can’t do so effectively, if at all, when the captions are inaccurate.
Accurate captions also benefit all students: 80 percent of people who use captions are not deaf or hard of hearing. Many students without disabilities use captions as a tool to improve their learning, comprehension, and focus. One study by the University of South Florida St. Petersburg found that 42 percent of students use closed captions to help maintain focus.
At the very least, accurate captions are a powerful learning tool that helps all students. At their most important, accurate captions are what allow students with hearing disabilities to have equal opportunities in learning environments. These two reasons should serve as strong motivation for institutions to rethink their auto-caption use and to ensure high-quality, accessible captions for all recorded video content.
Caption Accuracy as a Legal Standard
Recent legal cases have established accurate captions as a legal expectation for educational institutions.
In 2015, the National Association of the Deaf (NAD) filed a class-action lawsuit against Harvard University for allegedly violating the Americans with Disabilities Act and the Rehabilitation Act by failing to provide accurate and comprehensive captioning for online educational videos.
The case between the NAD and Harvard was unique because it brought forth issues with the accuracy and comprehensiveness of the university’s captions.
After four years of litigation, the NAD and Harvard University settled on November 27, 2019. The settlement contained specific requirements for caption accuracy, specifying that captions should be on par with 3Play Media’s 99 percent accuracy rate.
The outcome of the NAD v. Harvard case (as well as the NAD v. MIT case) has set a legal precedent regarding caption accuracy. The new guidelines for Harvard and MIT may be a strong motivator for other institutions to focus on caption quality and perhaps move away from solely relying on auto-captions.
*Please note, for this post, we’re specifically referring to auto-captions for recorded video content.