Mendel AI and the University of Massachusetts Amherst (UMass Amherst) have published key data on detecting hallucinations in AI-generated medical summaries using an artificial intelligence (AI) framework called “Hypercube.” Launched in 2023, Hypercube was designed to improve patient engagement and screening for clinical trials.
The Mendel and UMass Amherst study, titled “Faithfulness Hallucination Detection in Healthcare AI,” tackles a challenge posed by large language models (LLMs) like GPT-4o and Llama-3: despite their impressive capabilities, they are prone to generating inaccurate or misleading information, a phenomenon known as AI hallucination. The study evaluated how often these models deviate from the context of their instructions, sometimes producing summaries that contradict the provided data.
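For illustration only, and not as the study’s own protocol, one common way to test faithfulness is to run a natural language inference (NLI) model over each summary sentence and check whether the source note entails or contradicts it. A minimal sketch, assuming an off-the-shelf NLI model from Hugging Face, might look like this:

```python
# Minimal faithfulness-check sketch using an off-the-shelf NLI model.
# This illustrates the general technique only; it is not the method
# used in the Mendel/UMass study.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "roberta-large-mnli"  # assumed model choice for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
LABELS = ["contradiction", "neutral", "entailment"]  # label order for this model

def check_sentence(source_note: str, summary_sentence: str) -> tuple[str, float]:
    """Classify a summary sentence against the source clinical note."""
    inputs = tokenizer(source_note, summary_sentence,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1).squeeze()
    idx = int(probs.argmax())
    return LABELS[idx], float(probs[idx])

note = "Patient denies chest pain; reports mild shortness of breath on exertion."
label, score = check_sentence(note, "The patient reported chest pain.")
print(label, round(score, 2))  # a faithful summary should not contradict the note
```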
Key findings showed that GPT-4o, while generating longer summaries, was more prone to hallucinations because of its more complex reasoning, whereas Llama-3 produced fewer hallucinations but lower-quality summaries. The research underscored the necessity of accuracy in medical AI models to prevent misdiagnoses and inappropriate treatments.
To combat these challenges, the research team developed the Hypercube system, leveraging medical knowledge bases, symbolic reasoning and natural language processing (NLP) to detect hallucinations. Hypercube offers a cost-effective solution by providing a comprehensive representation of patient documents, facilitating initial detection before expert review.
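Mendel has not published Hypercube’s internals, but the general pattern described, extracting structured claims from both the patient document and the summary and then applying symbolic checks against a medical knowledge base, can be sketched in a few lines. Everything below, from the regular expression to the toy knowledge base, is a hypothetical illustration rather than the actual system:

```python
# Hypothetical sketch (not Mendel's actual implementation): represent the
# patient document as structured claims, then flag summary statements that
# lack support in that representation.
import re
from dataclasses import dataclass

# Toy "knowledge base" of drug-condition pairs; a real system would query
# a curated medical ontology.
KNOWN_INDICATIONS = {
    ("metformin", "type 2 diabetes"),
    ("lisinopril", "hypertension"),
}

@dataclass(frozen=True)
class Claim:
    drug: str
    condition: str

def extract_claims(text: str) -> list[Claim]:
    # Naive pattern standing in for a clinical NLP extractor.
    pattern = r"(\w+) (?:was|is) prescribed for ([\w\s]+)"
    return [Claim(d.lower(), c.strip().lower())
            for d, c in re.findall(pattern, text)]

def flag_hallucinations(source: str, summary: str) -> list[Claim]:
    source_claims = {(c.drug, c.condition) for c in extract_claims(source)}
    flags = []
    for claim in extract_claims(summary):
        pair = (claim.drug, claim.condition)
        # Symbolic checks: the claim must appear in the source document
        # and pass a sanity check against the knowledge base.
        if pair not in source_claims or pair not in KNOWN_INDICATIONS:
            flags.append(claim)
    return flags

source = "Metformin was prescribed for type 2 diabetes."
summary = "Metformin was prescribed for hypertension."
print(flag_hallucinations(source, summary))  # flags the unsupported claim
```

A production system would replace the regular expression with a clinical NLP extractor and the toy set with a curated ontology such as SNOMED CT or RxNorm; the appeal of symbolic checks is that each flag is explainable before it reaches an expert reviewer.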
A framework that systematically detects and categorizes AI hallucinations significantly improves the trustworthiness of AI in clinical settings. The Mendel AI and UMass collaboration is a step forward in ensuring the reliability and safety of AI applications in healthcare. Looking ahead, the team plans to refine their detection framework and further develop automated systems like Hypercube.
Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI) recently published an article about a Nature study in which AI-generated summaries often outperformed those written by human experts. The study highlighted the potential of AI to ease the documentation burden on doctors, allowing them to spend more time with patients and reducing the risk of errors.
Late last year, researchers at the Penn State College of Information Sciences and Technology published a paper on their own framework for managing unfaithfulness. The Faithfulness for Medical Summarization (FaMeSumm) framework was used to fine-tune smaller mainstream language models, and in the study these fine-tuned models outperformed the much larger GPT-3 at summarizing doctor-patient records.
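For readers curious about the mechanics, fine-tuning a smaller summarization model is a well-trodden path. The sketch below fine-tunes BART-base on toy dialogue-summary pairs with the Hugging Face Trainer; it uses a plain cross-entropy objective rather than FaMeSumm’s specialized faithfulness objectives, and the model choice and data are assumptions for illustration:

```python
# Generic fine-tuning sketch for a small summarization model; this does not
# reproduce FaMeSumm's faithfulness-aware training. Model and data are
# assumptions for illustration.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

# Hypothetical dialogue/summary pairs; real work would use a clinical corpus.
data = Dataset.from_dict({
    "dialogue": ["Doctor: How is the cough? Patient: Better since Tuesday."],
    "summary": ["Patient reports the cough has improved."],
})

def preprocess(batch):
    enc = tokenizer(batch["dialogue"], truncation=True, max_length=512)
    enc["labels"] = tokenizer(text_target=batch["summary"],
                              truncation=True, max_length=128)["input_ids"]
    return enc

train_set = data.map(preprocess, batched=True, remove_columns=data.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="summarizer-sketch",
                                  num_train_epochs=1,
                                  per_device_train_batch_size=1),
    train_dataset=train_set,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```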
With generative AI touted to augment clinical productivity across clinical workflows, clinician-patient interactions, patient care and administrative efficiency, refining LLMs is becoming increasingly crucial.