Instead of opening multiple websites or searching through medical articles, we can now type a question into a chatbot and receive a clear, conversational answer in seconds. Symptoms, medications, diets and risk factors – questions that once required careful searching now produce a single response that sounds confident and credible.
The convenience of turning to artificial intelligence (AI) for health information is appealing. But it raises an important question: how reliable are these answers when the topic is our health?
However, research examining large language models in health contexts suggests a mixed picture. AI tools can sometimes generate useful explanations or summaries of medical information, but they can also produce responses that are inaccurate, incomplete or unsupported by evidence.
That tension – between accessibility and reliability – shapes how people encounter and interpret health information.
Survey data from the United States suggest that about one in six adults use AI chatbots at least monthly for health information or advice, with use rising to roughly one in four among adults under 30.
At the same time, investment in AI technologies for health care is expanding rapidly. Market projections estimate the global generative AI health care sector could grow from approximately US$1.7 billion in 2023 to nearly US$15 billion by 2030.
However, increased use does not necessarily mean the information these systems provide is consistently accurate or transparent.
Researchers have been examining how large language models – the systems behind many AI chatbots – perform when asked health questions.
A systematic review evaluating large language model health-advice chatbots found wide variation in accuracy and safety across studies, along with inconsistent evaluation methods and documented risks such as hallucinated information.
Another systematic review examining the use of ChatGPT in health care identified potential applications alongside several limitations. While the technology may assist with tasks such as summarizing medical information or generating explanations, researchers also highlighted issues including lack of source transparency, the possibility of misleading outputs and potential safety concerns if responses are relied upon without oversight.
Survey research examining how people use AI-generated health information adds another dimension. Many users report perceived benefits, including convenience and easier understanding of health topics, while also expressing concerns about accuracy and about how AI-generated information might influence health decisions or care-seeking behaviour.
Researchers have also examined whether AI systems can evaluate the quality of health news. One study found that ChatGPT could sometimes distinguish higher-quality reporting from weaker reporting, but its reasoning and accuracy were inconsistent, suggesting caution when relying on AI systems as a filter for health misinformation.
Ethics researchers examining large language models in medicine have highlighted an additional concern: these systems can generate “convincing but inaccurate content,” meaning responses may appear credible even when they contain errors.
Taken together, the evidence suggests a consistent pattern: AI-generated health information can sometimes be useful, but its accuracy varies and its sources are often not transparent.
But if the limitations are known, why do AI health answers feel so compelling?
Part of the reason lies in how the information is delivered.
Traditional online searches often require sorting through multiple websites, conflicting advice and technical language. AI tools, by contrast, provide a single response written in conversational language that can feel clear and coherent.
That clarity can be appealing during moments of uncertainty. However, the appearance of certainty does not necessarily reflect the strength of the underlying evidence.
Unlike peer-reviewed research or clinical guidelines, AI responses may not clearly identify the sources supporting the information they provide. Without that context, it becomes difficult for users to judge whether a response reflects well-established medical evidence or uncertain information.
So how can users assess the reliability of what they receive?
One useful step is to check whether the response clearly identifies its sources.
Does the answer reference a study, guideline or credible health organization? Does it explain where the information comes from?
When sources are absent or vague, verifying the information through other reliable sources can help reduce the risk of acting on incomplete or misleading advice.
AI will likely remain part of the health information landscape. Its speed and accessibility make it attractive to users, and its capabilities continue to evolve.
But as these tools become more widely used, the challenge for health systems, researchers and communicators will be ensuring that convenience does not replace evidence.
Because when it comes to health decisions, confidence in an answer is not the same thing as confidence in the evidence behind it.
