News: AI tool for hospitals fabricates text in transcription, researchers say

CDI Strategies - Volume 18, Issue 49

Amid the many artificial intelligence (AI) tools introduced to improve documentation accuracy, a transcription tool built on OpenAI’s Whisper model is being used by more than 30,000 clinicians and 40 health systems to translate and transcribe patients’ consultations with doctors, MedPage Today reported. The tool, built by Nabla, is intended to reduce the time medical providers spend on notetaking and report writing, and it is estimated to have transcribed some seven million medical visits.

Whisper’s creator, OpenAI, however, has cautioned that the tool should not be used in “high-risk domains.” Three different researchers who studied the tool’s audio transcriptions found that it is prone to making up chunks of text and even entire sentences.

One researcher from the University of Michigan told MedPage Today that he found such invented text—known in the tech industry as “hallucinations”—in eight out of every 10 audio transcriptions. A machine learning engineer analyzed over 100 hours of the Whisper transcriptions and discovered hallucinations in about half of them. A third developer said he created 26,000 transcripts with Whisper and found hallucinations in almost every one. While transcription tools are expected to misspell words and make other errors, engineers and researchers said they haven’t seen another AI-powered transcription tool hallucinate as often as Whisper.

Some of the invented text can include racial commentary, violent rhetoric, and imagined medical treatments, experts have warned. In one transcription, for instance, Whisper invented a non-existent medication called “hyperactivated antibiotics.” Software developers said these hallucinations tend to occur during pauses or when background sounds or music are present. Unfortunately, there is no way to compare Nabla’s AI-generated transcripts against the original recordings, because the tool erases the original audio for “data safety reasons,” according to Nabla’s chief technology officer, Martin Raison.

Particularly in hospital settings, such mistakes could have “really grave consequences,” said Alondra Nelson, PhD, who led the White House Office of Science and Technology Policy until last year.

“This seems solvable if the company is willing to prioritize it,” said William Saunders, a San Francisco-based research engineer who quit OpenAI earlier this year over concerns about the company's direction. “It's problematic if you put this out there and people are overconfident about what it can do and integrate it into all these other systems.”

An OpenAI spokesperson told MedPage Today that the company appreciated the researchers' findings, continually studies how to reduce hallucinations, and incorporates that feedback into model updates.

Editor’s note: To read MedPage Today’s coverage of this story, click here.
