
60% of Nurses Don't Trust Their Hospital's AI — May 18, 2026

Show notes

Most hospitals running AI agents have no idea if they actually work.

Run time: 20:30

In today's episode:

  1. Only 59 percent of health orgs track their AI agents
  2. Qure.ai Aira deployed across Mozambique's national health system
  3. Annalise.ai goes network-wide at Parkway Singapore
  4. Nature Medicine: clinical trials for self-updating AI
  5. Nature Medicine: the AI co-scientist arrives in the clinic
  6. Google Health Coach launches tomorrow on Gemini
  7. Anthropic Code with Claude London hits tomorrow
  8. Anthropic briefs finance regulators on Claude Mythos risks

TL;DR:

  • Healthcare IT Today's May 17 industry digest surfaced two governance numbers worth pinning above every CIO's desk: only 59% of healthcare orgs actively track the performance of their AI agents in production, and 60% of nurses say they lack confidence in their organization's AI oversight — a credibility gap that lands the same week as Aidoc's CARE foundation model and Bayesian Health's continuous-sepsis monitor scale clinically.
  • Nature Medicine published "Clinical trials for continuously monitored and updated AI systems" by van Amsterdam, Oberst, Feng et al., proposing a conceptual framework that separates AI-intrinsic monitoring/updating from trial-oversight monitoring — the methodological companion to last week's Frontiers in Science manifesto on adaptive surgical robots and the FDA's real-time trials pilot.
  • VillageReach and Qure.ai have deployed Aira — Qure.ai's LLM-powered co-pilot for community health workers — across Mozambique's national health system under a Gates Foundation pandemic-preparedness grant, embedding it in the AlôVida hotline and PHEOC for early signal detection across primary care; pairs with last week's Anthropic-Gates $200M deal on neglected diseases.

Subscribe: YouTube

medAI Times is for educational and informational purposes only. The content does not constitute medical advice, diagnosis, treatment recommendation, or professional clinical guidance. Consult qualified healthcare professionals and refer to official sources before making clinical, research, regulatory, or business decisions.

Transcript

Auto-generated from the episode audio.

Most hospitals running AI agents have no idea if they actually work. Welcome to the medAI Times podcast, your daily update on medical AI. Don't forget to like and subscribe. So imagine you hand a surgical team this brand new, highly advanced robotic scalpel.

But then the hospital administration just casually admits that, well, nobody is actually going to check if it stays sharp after that first surgery. Which is wild to even think about. Yeah, but that is the exact reality we are crashing into today.

I mean, we're rapidly moving past the shiny, controlled environment of building AI and slamming straight into the messy, high stakes reality of managing it in the real world. The real world is a lot less predictable, for sure.

Exactly. So to set the stage for you on this deep dive today, I'm just going to run through today's headlines really rapidly, back to back. Ready? Here we go. Only 59% of health organizations track their AI agents.

Qure.ai's Aira co-pilot is deployed across Mozambique's national health system. Network-wide AI triage hits Parkway Singapore. Nature Medicine proposes clinical trials for self-updating AI. The AI co-scientist arrives in the clinic.

Google Health Coach launches tomorrow on Gemini. Anthropic's developer conference hits London tomorrow. And Anthropic briefs finance regulators on macroprudential AI risks. I mean, just the sheer velocity of those developments, it forces a total paradigm shift.

We're really no longer treating AI as this isolated experimental moonshot. Right, it's everywhere now. Yeah, suddenly we are dealing with governance, foundational infrastructure, and, well, systemic risk on a global scale. So let's start with that very first number, because I feel like it completely frames the scale of the problem we're facing.

59%. Yeah, that Healthcare IT Today digest. Right, it just revealed that only 59% of health care organizations actively track the performance of their AI agents after deployment. It's staggering. It is.

That means a massive 41% of organizations plug these incredibly complex, probabilistic systems into their hospital workflows and essentially just walk away. And the staff on the ground, they definitely feel it.

I mean, 60% of nurses are reporting low confidence in their employers' AI oversight. Well, and the clinical context requires a much closer look at what is actually running on the hospital floor right now. I mean, we're talking about foundation model triage tools like Aidoc's CARE and continuous patient monitors like the ones from Bayesian Health looking for sepsis.

The systems that are autonomously flagging critical anomalies. Exactly. They're scaling up massively. So the primary bottleneck for deploying these tools, it used to be getting clearance from regulators like the FDA. Just getting in the door. Right, but today, the real friction is in-hospital, post-market governance.

The clinical workforce delivering the care, like those nurses you mentioned, they understand something that hospital IT departments are seemingly ignoring. Which is? The absolute inevitability of clinical drift.

Clinical drift? OK, I've heard that term tossed around in research papers. But let's break down the actual mechanism there for the listener. Are we saying the AI itself inherently gets worse? Or is it that the environment around the AI changes? It is entirely about the environment outgrowing the algorithm.

You see, machine learning models are essentially frozen mathematical snapshots of the data they were trained on. OK, a snapshot. But human populations, they're dynamic. Demographics shift. Or a hospital might upgrade the specific type of scanning machines they use, which slightly alters the contrast or resolution of the images.

Oh, I see. So the input changes. Exactly. If an AI was trained on patient data from, say, 2021, and is suddenly analyzing a totally different patient demographic with brand new equipment in 2026, its accuracy is going to silently degrade.

The model didn't break. The world simply moved on without it. Which makes that 41% of hospitals doing zero monitoring just utterly terrifying. I mean, it's the dull robotic scalpel. You can't expect nurses, the actual people responsible for patient outcomes, to trust a black box that might be quietly failing.
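To make the drift idea concrete, here is a minimal sketch of one common way to notice that inputs have moved away from the training data: the Population Stability Index (PSI) on a single numeric feature. This is an illustration only and was not discussed in the episode. The feature (patient age), the sample values, and the 0.1 / 0.25 cutoffs (a widely quoted rule of thumb, not a clinical standard) are all assumptions.

```python
import math
import random

def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))

rng = random.Random(0)
# Hypothetical data: patient age at training time vs. two later populations.
train_2021 = [rng.gauss(50, 10) for _ in range(5000)]
same_population = [rng.gauss(50, 10) for _ in range(5000)]
older_population = [rng.gauss(62, 12) for _ in range(5000)]

psi_same = psi(train_2021, same_population)
psi_shift = psi(train_2021, older_population)

for name, value in [("same", psi_same), ("shifted", psi_shift)]:
    status = "investigate" if value > 0.25 else "watch" if value > 0.1 else "stable"
    print(f"{name}: PSI={value:.3f} -> {status}")
```

A check like this only watches the inputs. It says nothing about whether the model's outputs are still correct, which needs labeled outcomes to measure.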

No, you really can't. But here's the catch, right? The only alternative to a frozen degrading model is a model that continuously learns on the job. Yes, exactly. Which introduces a massive headache for regulators. And a new framework was just published in Nature Medicine by van Amsterdam, Oberst, Feng, and colleagues tackling this exact problem, basically how to run clinical trials for AI that updates itself.

Yeah, because think about the friction of trying to regulate something that flat out refuses to stay the same. I mean, the gold standard of medical research is the randomized controlled trial, the RCT. The foundation of modern medicine. Right, and the entire premise of an RCT relies on isolating variables to establish a stable baseline.

If you're testing a new blood pressure medication, the chemical composition of that pill cannot change halfway through the study. Obviously not. But a self-updating AI system is fundamentally designed to ingest new patient data and refine its internal weights constantly.

So the intervention itself becomes a moving target. So traditional RCTs just completely break down. You lose your baseline entirely. You do. But if we ban models from updating, we're essentially handicapping the technology, right?

We're telling the AI, you aren't allowed to get smarter from experience. Which is why that Nature Medicine framework is so crucial. They're proposing a dual monitoring structure. You explicitly separate the monitoring that is intrinsic to the AI.

So the algorithm auditing its own updates from the monitoring required for the scientific oversight of the trial itself. Oh, OK, so splitting the oversight. Yeah. You establish an entirely new taxonomy that allows the AI to iterate safely without destroying the scientific validity of the clinical study.

It's literally sketching out a post-RCT regulatory science. And when you look at this alongside the FDA's current real-time trials pilot, like the AstraZeneca and Amgen oncology pilot, and those pushes for adaptive surgical robots in Frontiers in Science, you realize we are actively writing the rulebook for continuously learning clinical AI right now.

We really are. But wait, tracking clinical drift or running these continuous dual monitoring systems, that assumes the hospital actually has the digital plumbing to see what the AI is doing in real time. Do most regional hospitals even have that underlying capability yet?

That is the massive operational hurdle. Because you cannot regulate what you cannot see. The infrastructure layer is basically becoming the defining battlefield for health care AI in 2026. It is entirely about platform consolidation underneath the AI layer.

Which perfectly explains those massive enterprise imaging rollouts we're seeing across acute care footprints right now, like Ardent Health rolling out FUJIFILM Synapse across their six-state network. On the surface, a hospital network upgrading its entire imaging system just sounds like a boring, expensive corporate IT upgrade.

But they're actually building an app store. That's exactly what it is. The Picture Archiving and Communication System, the PACS of record, dictates exactly which specialized AI tools a hospital can even run. It acts as the substrate.

Right, the foundation. Yeah, specialized vendors like Azra AI doing automated oncology workflows, or RevealDx doing lung nodule characterizations, they cannot operate in a vacuum. They have to plug into these massive consolidated platforms.

So they're layering specialized agents on top of the base infrastructure. Precisely. And once you have that consolidated plumbing, the scale you can achieve is just staggering. I mean, in Singapore right now, Parkway Radiology just implemented Annalise.ai's enterprise chest X-ray system across all their clinics and hospitals.

The first network-wide, country-level rollout. Yeah. And this isn't just looking for a collapsed lung, you know. The system flags over 124 different specific findings on a single scan. And clinically, this marks a massive shift.

It moves the entire Asia-Pacific market from these single-site experimental pilots to network defaults, which is a huge blow to rivals like Aidoc and Lunit, by the way. Oh, I bet. But the mechanism of scale here is vital to understand.

A human radiologist, no matter how brilliant or experienced, experiences biological eye fatigue after an hour of scanning complex images. They're only human. Right. But the machine never needs a coffee break. It applies the exact same level of microscopic scrutiny to the 124th finding on the 500th scan of the day as it did on the very first.

It is the ultimate high-resource optimization tool. But what absolutely fascinates me is the duality of how this tech is scaling globally. I mean, in Singapore, it's about pushing the absolute ceiling of medical imaging in a highly-resourced environment.

Yeah. But then you look at Mozambique, where an AI scale-up is acting as the ultimate infrastructure leapfrogging tool. VillageReach and Qure.ai, funded by the Gates Foundation, just deployed the Aira co-pilot across Mozambique's national health system.

Yeah, they embedded an LLM specifically designed for community health workers directly into their primary health centers and, crucially, into their AlôVida national public health hotline.

Wow. And that data feeds directly into their public health emergency operations center. And they aren't just using it to diagnose individual callers, right? It's functioning as a mechanism for pandemic preparedness. Exactly. Because traditional sentinel surveillance relies on doctors noticing a trend, filling out paperwork, and sending it up a bureaucratic chain until someone at a health ministry notices a cluster of strange fevers.

Which takes weeks. It's inherently, dangerously slow. But by running an LLM across a national public health hotline, the system can instantly aggregate thousands of scattered, localized conversations in real time. It's just listening to everything at once.

Yes. It is listening for emerging symptom patterns and mapping the early whispers of a virus before humans can even perceive a connection. It analyzes the noise to find the signal before the outbreak even registers on a traditional hospital's radar.
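As a rough illustration of finding the signal in the noise, the sketch below flags days on which mentions of a symptom cluster jump well above a trailing baseline. It is a toy z-score rule, not a description of how Aira or the PHEOC works. The call counts, the 7-day window, and the threshold of 3 are hypothetical.

```python
from statistics import mean, stdev

def flag_spikes(daily_counts, baseline_days=7, z_threshold=3.0):
    """Return indices of days whose count is far above the trailing baseline."""
    flagged = []
    for day in range(baseline_days, len(daily_counts)):
        window = daily_counts[day - baseline_days:day]
        mu = mean(window)
        # Floor the spread so a very flat baseline does not flag tiny changes.
        sigma = max(stdev(window), 1.0)
        if (daily_counts[day] - mu) / sigma >= z_threshold:
            flagged.append(day)
    return flagged

# Hypothetical daily counts of hotline calls mentioning fever plus rash in one district.
fever_rash_calls = [4, 5, 3, 6, 4, 5, 4, 5, 6, 4, 19]
spikes = flag_spikes(fever_rash_calls)
print(spikes)  # [10]
```

One design caveat: once a spike enters the trailing window it inflates the baseline, so a sustained outbreak can stop being flagged after the first day unless flagged days are excluded from later baselines.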

That is profound. It's revolutionary for primary care networks. I mean, we're talking about tracking tiny anomalies in a high-tech Singapore hospital and simultaneously hunting for a novel virus across an entire nation's primary care network.

But, you know, AI isn't just operating at these massive network levels. It is shrinking down directly to the individual patient. Right. Because the consumer wearable market crosses a major threshold tomorrow with the launch of the Google Health Premium service on Gemini.

Yeah. It's a $9.99 a month service, and it ingests data from that new $99 Fitbit Air. It pulls in your step count, your nutrition, your sleep cycles, and optionally, it ingests your historical U.S. medical records.

Which is huge. It is the first mass-market consumer AI designed to bridge the gap between real-time wearable biometric data and historical electronic health record data. The ultimate dossier. The AI synthesizes the past and the present to generate highly personalized continuous health insights.

I've really been trying to imagine the real-world friction of this in a clinic. I mean, think about the traditional doctor-patient dynamic. You sit in the paper gown, the doctor walks in with the chart and they hold all the information. The traditional gatekeepers.

Right. But what happens when a patient walks in already pre-processed by an LLM? They arrive with a synthesized 20-page dossier of medical hypotheses generated by Gemini over the last month.

Well, the physician is no longer the sole aggregator of the patient's data, which entirely upends the power dynamic of the consultation. I'd imagine as a physician, it would be deeply frustrating to have to spend the first 10 minutes of a 15-minute consultation debunking a hallucination that Gemini convinced the patient was a real symptom.

You know, you're no longer just diagnosing, you are negotiating with an AI's assumptions. The clinical encounter effectively becomes a tri-party negotiation. The doctor, the patient, and the algorithm. That sounds exhausting. It'll be an adjustment.

And that dynamic of AI autonomously generating hypotheses brings us to a major development on the laboratory side, actually. Oh, right, the spotlight feature. Yeah, while consumer tech is putting an AI health coach on the patient's wrist, the Nature Medicine spotlight feature details how AI is fundamentally altering the scientific method at the laboratory bench.

Yes, the AI co-scientist. I mean, we are moving so far past the idea of AI as a clever chatbot where you type a prompt and get an essay. Researchers are building multi-agent orchestrations. Let's unpack the mechanism of how a team of AI agents actually does science.

You really have to view it as a closed-loop virtual lab meeting, utilizing specialized roles. It begins with a generation agent. This component does nothing but ingest the vast, unreadable ocean of global scientific literature to propose novel biological hypotheses, then passes those ideas to a ranking agent.

So the generation agent just brainstorms wildly, and the ranking agent filters out the junk by scoring the ideas against the known laws of biology. Exactly, and then the surviving hypotheses are handed to a review agent. This agent is programmed to act as a relentless skeptic.

Oh, I love that. Its sole function is to aggressively probe the hypothesis for logical leaps, weak evidence, or methodological flaws. The critique is sent to an evolution agent, which refines the original idea based on that feedback.
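The four roles just described (generation, ranking, review, evolution) can be sketched as a simple loop. This is only a toy skeleton of the control flow, assuming stub functions in place of real LLM calls. Every function name, the scoring rule, and the round and keep counts are hypothetical and do not come from the Nature Medicine feature.

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    text: str
    score: float = 0.0
    critiques: list = field(default_factory=list)

def generate(topic):
    # Stand-in for the generation agent; a real one would draw on the literature.
    return [Hypothesis(f"{topic}: candidate {i}") for i in range(4)]

def rank(hypotheses):
    # Stand-in scorer: favors hypotheses that have survived more revisions.
    for h in hypotheses:
        h.score = h.text.count("[revised]")
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)

def review(h):
    # Stand-in skeptic: always returns a critique.
    return f"needs stronger evidence (round {len(h.critiques) + 1})"

def evolve(h, critique):
    # Stand-in evolution agent: produces a revised copy that carries the critique history.
    return Hypothesis(h.text + " [revised]", critiques=h.critiques + [critique])

def run_loop(topic, rounds=3, keep=2):
    pool = generate(topic)
    for _ in range(rounds):
        survivors = rank(pool)[:keep]
        pool = [evolve(h, review(h)) for h in survivors]
    return rank(pool)

final = run_loop("liver fibrosis drug repurposing")
for h in final:
    print(h.score, h.text, len(h.critiques))
```

The point of the structure is that nothing reaches the end of the loop without having been criticized and revised at least once per round.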

And this cycle iterates autonomously at incredible speeds. It is literally replicating the rigorous friction of peer review in silicon. Exactly. And the concrete example Nature Medicine highlighted is staggering. They tasked one of these multi-agent co-scientists with repurposing existing drugs to treat liver fibrosis.

And it didn't just spit out generic ideas, it identified two epigenomic modifier candidates that actually demonstrated significant antifibrotic activity and promoted liver regeneration when tested in physical lab models.

Alongside identifying three previously known agents, which basically proved the system's baseline accuracy against established science. But, you know, the deductive reasoning of the language model is only half the equation here. Once the LLM multi-agent system identifies a biological target, it hands the baton to a structural physics model, like an AlphaFold pipeline from Isomorphic or Iambic.

I want to make sure you all catch this because this handoff is the absolute magic trick of modern biomedical research. Explain how the language model connects to the physics model. So the language model reads the millions of textbooks and research papers to deduce what protein or receptor is causing the disease.

But language models cannot do math or physical chemistry. So it tells the structural AI, I think this specific receptor is the problem. Okay, here's the target. Yeah. Then the structural AI simulates the actual 3D shapes of chemical compounds, calculating the atomic physics to see if a drug molecule will physically lock into that receptor, basically like finding the exact 3D puzzle piece to jam the gears of the disease.

So the language model does the detective work and the structural model does the physics simulation to prove the drug will actually bind. That kind of autonomous pipeline radically accelerates the shots on goal for drug discovery.

But, and here's the pivot, it also introduces a massive new oversight burden far beyond the hospital floor. Because the multi-agent architecture powering these medical breakthroughs is the exact same architecture frontier AI labs are deploying into the broader global economy this week.

The convergence is happening immediately. I mean, Anthropic's Code with Claude developer conference hits London tomorrow, and they are focusing heavily on the Claude platform, Claude Code, and agent orchestration. Right.

They're laying the tracks for general enterprise use of the exact same multi-agent frameworks that are currently designing liver drugs. Because the expectation going into London isn't some massive new scary frontier model. Building on their SF keynote, it's about managed agents and infrastructure.

They want these autonomous multi-agent teams running supply chains, drafting legal contracts, and managing corporate logistics. The focus is shifting from raw intelligence to autonomous execution. And when you deploy agents that can take independent actions across interconnected enterprise systems, the blast radius of a failure expands exponentially.

Yeah, it's not just a bad essay anymore. No, which is why regulators are suddenly stepping in with an entirely new vocabulary. And this is perhaps the most quietly terrifying item we are looking at today. Anthropic is actively briefing finance ministries and central banks on cyber risks found during a preview of their Claude Mythos system.

And the regulators are treating these AI vulnerabilities as a macro-prudential risk. Yes, macro-prudential. Now, we usually only hear that word tossed around on financial news networks when they are talking about the housing market or interest rates.

Right, macro-prudential regulation is a framework designed to mitigate systemic risks to the entire financial system. It's the regulatory category reserved for major banking systems, clearinghouses, and global exchanges.

The too-big-to-fail category. Exactly. Institutions whose operations are so critical that if they seize up, the entire economy cascades into a recession. Applying that specific term to an AI software update is a massive escalation.

I mean, this builds on that recent Mozilla disclosure with those 271 patched Firefox vulnerabilities, but on a totally different scale. Let's play out the scenario. How exactly does a glitch in a language model crash an economy?

Show us the math of how a multi-agent system failure triggers a macro-prudential crisis. Okay, imagine an interconnected web of enterprise AI agents. You have agents managing the automated clearinghouse of a major international bank, and they are communicating directly with agents managing global shipping logistics and agents managing corporate payrolls.

Okay, they're all talking to each other. Right. Now, if a hallucination or a cyber vulnerability causes the banking agent to incorrectly perceive a liquidity threat, it might autonomously freeze international transfers to protect the bank's assets.

And because it's operating at machine speed, it locks up billions of dollars in milliseconds before a human operator even gets an alert on their dashboard. Precisely. And then the logistics agents see the frozen payments and automatically halt shipping containers at the ports.

The payroll agents register a failure to clear funds and suspend employee direct deposits globally. Oh, wow. You haven't just experienced a software glitch, you have inadvertently triggered a localized bank run and a supply chain freeze simultaneously, all driven by a cascading logic error between autonomous agents.
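The cascade in that scenario can be sketched as a walk over a dependency graph: one agent halts, and every agent that depends on it halts in turn. This is a deliberately simplified toy, assuming a made-up dependency map and the rule that any upstream halt stops a dependent. Real systems have timeouts, fallbacks, and human checkpoints that are not modeled here.

```python
from collections import deque

# Hypothetical dependency map: each agent halts if any agent it depends on halts.
DEPENDS_ON = {
    "bank_clearing": [],
    "shipping_logistics": ["bank_clearing"],
    "payroll": ["bank_clearing"],
    "port_operations": ["shipping_logistics"],
    "marketing_reports": [],
}

def cascade(initial_failure, depends_on):
    """Breadth-first propagation of a halt through dependent agents."""
    # Invert the map so each agent lists the agents that rely on it.
    dependents = {name: [] for name in depends_on}
    for name, upstream in depends_on.items():
        for u in upstream:
            dependents[u].append(name)

    halted, queue = [initial_failure], deque([initial_failure])
    while queue:
        for d in dependents[queue.popleft()]:
            if d not in halted:
                halted.append(d)
                queue.append(d)
    return halted

halted = cascade("bank_clearing", DEPENDS_ON)
print(halted)  # ['bank_clearing', 'shipping_logistics', 'payroll', 'port_operations']
```

Agents with no path back to the failed one, like the reporting agent here, keep running, which is why the shape of the dependency graph matters as much as the reliability of any single agent.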

That is a macro-prudential threat. The vulnerability isn't just a localized bug on a single laptop, it is a systemic shockwave. Wow. To synthesize this incredibly wild journey we've been on today, we started on the hospital floor, where 60% of nurses actively distrust their tools because hospitals lack the digital plumbing to catch an AI degrading in real time.

And that operational friction led us to the desperate need for self-updating clinical trials and the realization that AI is being used both to spot 124 tiny anomalies on an X-ray in Singapore and to aggregate the whispers of a pandemic across Mozambique.

The sheer scale of application. Yeah. We saw the consumer side, where a Gemini coach pre-processes patient data and completely flips the power dynamic of the doctor's office. And finally, we saw how multi-agent lab systems are autonomously discovering liver regeneration drugs, utilizing the exact same underlying architecture that has central banks terrified of a sudden machine-speed economic collapse.

We are watching the messy, unvarnished reality of implementation. The guardrails of the laboratory have been removed and these systems are now active load-bearing pillars of our global infrastructure. Which leaves me with one final lingering thought for you to ponder on your own today.

If consumer tech like Gemini is going to start ingesting and synthesizing millions of live wearable data streams and medical records starting tomorrow, and at the exact same time, Nature Medicine's AI co-scientists are successfully generating and validating novel drug targets in the lab today, what happens the moment those two domains are allowed to talk to each other?

How long is it until the AI agent sitting quietly on your wrist detects a subtle negative shift in your biomarkers, automatically reaches out to a multi-agent laboratory system in the cloud, and begins autonomously designing and validating a custom clinical trial entirely for you?

Thanks for listening. Find us on YouTube and your favorite podcast app. See you tomorrow.