Using LLMs with decision support could improve diagnoses, MGB shows

Researchers at Mass General Brigham see value in a hybrid approach that uses generative artificial intelligence to diagnose patients.
Comparing two large language models, OpenAI's GPT-4 and Google's Gemini 1.5, against the homegrown diagnostic decision support system DXplain, MGB scientists found that the DDSS outperformed the LLMs at accurately diagnosing patients, but that the two types of AI could make each other better.
WHY IT MATTERS
DXplain was first developed back in 1984 in Boston as a standalone platform and has since evolved into a web-based diagnostic engine. It now draws on 2,680 diseases, more than 6,100 clinical findings and hundreds of thousands of data points to generate lists of potential diagnoses.
For the comparison, researchers from Massachusetts General Hospital's Laboratory of Computer Science prepared a set of 36 diverse clinical cases based on actual patients from three academic medical centers.
“A user can enter clinical findings, and the DDSS will generate a ranked list of diagnoses that could explain those findings,” the researchers explained in a report published last Thursday in JAMA Network Open.
LLMs, by contrast, have been shown to perform as well as physicians on certain types of board exams and have succeeded at analyzing case descriptions and generating accurate diagnoses.
“These results are noteworthy, as generative AI was not designed for clinical reasoning, but rather generates human-like text responses to any question using enormous datasets collected from the internet,” they said.
However, “amid all the attention on LLMs, it is easy to forget that the first artificial intelligence systems used successfully in medicine were expert systems.”
The researchers chose ChatGPT and Gemini because they were the top performers in previous studies published in the New England Journal of Medicine and JAMA.
“DDSSs have been shown to improve clinicians' diagnostic accuracy, reduce length of stay for medical inpatients with complex conditions and surface highly predictive findings for critical illnesses that could allow earlier detection.”
For the year-long study, three physicians evaluated the cases, identifying all clinical findings as well as the subsets relevant to the diagnoses, mapped to the DDSS vocabulary. They returned two sets of case versions, one identifying all clinical findings and the other only the pertinent positive and negative findings related to generating the diagnoses.
The researchers explained in the report that they chose two versions of case input to evaluate the DDSS because using all clinical findings is likely how a future electronic health record-integrated approach would be implemented.
Two other physicians, blinded to the diagnoses, entered each of these cases into the DDSS, LLM1 (ChatGPT) and LLM2 (Gemini) to run the AI-versus-AI comparison.
Unlike the LLMs, MGB's DDSS requires the user to work with a controlled vocabulary from its lexicon. It also relies on keyword matching and other lexical techniques. For the purposes of the study, the researchers extracted individual findings from each case and then mapped them to the DDSS's clinical vocabulary.
They compared both sets of the DDSS's top 25 diagnoses with the 25 diagnoses generated by each LLM for each of the 36 cases.
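The headline numbers that follow boil down to inclusion rates: for each case, does a system's top-25 differential contain the reference diagnosis? As a rough illustration only, here is a minimal Python sketch of that tally; the data structures, field names and toy cases are hypothetical and are not the study's actual code or data.

```python
# Minimal sketch of an inclusion-rate comparison (hypothetical data; not the study's code).
# Each case carries a reference diagnosis and a ranked candidate list per system.

def inclusion_rate(cases, system):
    """Fraction of cases whose reference diagnosis appears in the system's top-25 list."""
    hits = sum(
        1
        for case in cases
        if case["correct_diagnosis"].lower()
        in (dx.lower() for dx in case["differentials"][system][:25])
    )
    return hits / len(cases)

# Toy example with two cases and three systems (DDSS, LLM1, LLM2).
cases = [
    {"correct_diagnosis": "Sarcoidosis",
     "differentials": {"ddss": ["Sarcoidosis", "Tuberculosis"],
                       "llm1": ["Lymphoma", "Sarcoidosis"],
                       "llm2": ["Tuberculosis"]}},
    {"correct_diagnosis": "Pulmonary embolism",
     "differentials": {"ddss": ["Pneumonia"],
                       "llm1": ["Pulmonary embolism"],
                       "llm2": ["Pulmonary embolism", "Pneumonia"]}},
]

for system in ("ddss", "llm1", "llm2"):
    print(system, f"{inclusion_rate(cases, system):.0%}")
```

In a real evaluation, deciding whether a free-text LLM response includes the reference diagnosis generally requires clinician judgment rather than exact string matching.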
For the case versions with all findings but no laboratory test results, the DDSS included the correct diagnosis in its differential more often (56%) than ChatGPT (42%) and Gemini (39%), differences the researchers said were not statistically significant.
When laboratory test results were included in the case reports, however, all three systems were more successful at including the correct diagnosis: the DDSS 72% of the time, ChatGPT 64% and Gemini 58%.
“The LLMs performed well considering they were not designed for the medical domain,” the researchers said, although the models do not explain their reasoning, the black box challenge of LLM behavior.
The DDSS, which by its nature is designed to explain its conclusions, performed better as the data inputs included all findings and laboratory test results.
“Consequently, integration with clinical workflows, where all data is available, should improve performance compared with the current approach of physicians entering selected case findings, for example,” the researchers said.
Interestingly, the DDSS included the correct diagnosis in its differential more than half of the time when an LLM did not (58% compared with ChatGPT and 64% compared with Gemini), but each LLM included the diagnosis 44% of the time when the DDSS did not.
Thus, the researchers envision pairing DXplain with an LLM as an ideal way forward, because it would improve the clinical effectiveness of both systems.
“For example, querying LLMs to explain their reasoning for including correct diagnoses that the DDSS missed could help developers correct any knowledge base errors,” they said. “Conversely, prompting the LLM to consider a diagnosis that the DDSS listed but the LLM did not could allow the LLM to reconsider its differential diagnosis.”
THE LARGER TREND
A previous study by MGB researchers, conducted at the health system's MESH Incubator innovation group, put ChatGPT to the test working through an entire clinical encounter with a patient, recommending a diagnostic workup, deciding on the clinical management course and making a final diagnosis.
The LLM's performance was consistent across care modalities, but it struggled with differential diagnoses.
This is “the meat and potatoes of medicine,” Dr. Marc Succi, associate chair of innovation and commercialization and executive director of the MESH Incubator innovation group, said in a statement at the time.
“This is important because it tells us where physicians are truly experts and adding the most value: in the early stages of patient care, with little presenting information, when a list of possible diagnoses is needed.”
With trust a critical question for AI-enabled decision support, healthcare is likely to have many decision support systems working “simultaneously” for the foreseeable future, as Dr. Blackford Middleton, a renowned informatics expert and clinical consultant with more than 40 years of experience with clinical decision support, recently explained in a HIMSSCast interview with Healthcare IT News.
ON THE RECORD
A hybrid approach that combines the linguistic capabilities of LLMs with the deterministic and explanatory capabilities of traditional DDSSs could enhance diagnostic decision support, the MGB researchers said.
Andrea Fox is senior editor of Healthcare IT News.
Email: Afox@himss.org
Healthcare IT News is a HIMSS Media publication.