More and more people are turning to artificial intelligence to ask questions about their health. The challenge is how to use it without getting hurt.
A man with chest discomfort types a question into a chatbot: could this be “just reflux”? The answer is straightforward. Heartburn is common. Anxiety can mimic symptoms. Urgent care seems “unlikely.” He waits. By morning, the pain is worse. In the emergency department (ED), the ECG suggests an evolving coronary syndrome. The problem isn’t only that the AI tool can be wrong; it’s that confident language hides uncertainty, and triage is not what these systems do best.
The risks show up in studies. This year, researchers fed leading AI chatbots clinical scenarios that included a made-up term (a fake disease, test or symptom) to see whether the bots would accept the fiction and build advice around it. They often did. Even when the researchers flagged that the prompt might be inaccurate, the systems still “ran with” the misinformation.
Case reports describe patients harmed after following AI-generated advice that seemed authoritative. In fact, newsrooms have shown that, with minimal prompting, chatbots can produce plausible-looking false health answers with fake citations. That is a different risk: not just a wrong answer, but a factory for wrong answers that seem peer reviewed.
So, how should patients use AI safely and usefully?
Start with what the tools are good at: explaining terms, translating jargon and helping you prepare for a visit. Ask for plain language. State your age, major conditions and medicines. Then ask the model to generate questions for your clinician, not answers. Bring those questions with you to your medical appointment.
Then add two failsafes to your interactions with AI.
First, verify. Take any AI explanation and compare it to a reputable Canadian source – a government page, a major hospital, a disease-specific charity. If they disagree, treat the AI output as a draft, not a decision.
Second, respect thresholds. If you have crushing chest pain, one-sided weakness, severe shortness of breath, heavy bleeding, new slurred speech, confusion, fainting or signs of a severe allergy, seek care now. AI is not a triage nurse.
When you do bring AI into a visit, say so. “I asked a chatbot about statins. It said the benefits were small and the side effects common. Can we look at my absolute risk and decide together?” That is a useful conversation. You are not asking the model to make the call; you are using it to organize the call you will make with a professional.
Clinicians have work to do, too. A 2025 systematic review of 137 studies found inconsistent reporting quality and thin safety detail; most papers used closed models with limited transparency. We should be cautious about broad claims based on that literature.
None of this means AI shouldn’t be used. It means we should be honest about what it does well and where it falls short. Some clinical evaluations find models helpful for writing summaries or drafting handoff notes with human review; others show solid performance in constrained tasks. But the words “with human review” do most of the safety work.
As mentioned, triage is a consistent weak spot. Some studies suggest large models look decent on average accuracy; others show under-triage for high-risk patients, which matters far more. If a tool misses the one person who needed urgent care, averages are cold comfort.
Hallucinations, the confident invention of facts, create a different kind of hazard. One survey in anesthesia warns that fabricated risks or reassurances can skew judgment during pre-op assessment. Another paper tracks “medical hallucination” as a workload issue: clinicians spend time verifying or correcting outputs instead of caring for patients.
We’ve already seen public examples outside the medical journals, as when Google’s health AI conflated anatomy and wrote of an “old left basilar ganglia infarct”, a structure that doesn’t exist.
Some slips may sound minor until you imagine them passing unchecked into a patient chart and affecting management decisions.
Automation bias is real: tidy prose can look authoritative even when the facts are weak. Studies in safety science warn that inconsistent inputs (typos, missing demographics, dramatic language) can nudge models toward different answers to the same clinical problem. That should make us careful about moving outputs into workflows without checks.
We can acknowledge AI’s promise, but given the high cost of potential errors, we must insist on safeguards, transparency and human responsibility.
