FDA Cleared These AI Tools — But Are They Proven? — May 22, 2026
Show notes
FDA cleared does not mean it actually works. Today, the proof.
Run time: 16:28
In today's episode:
- AZmed's X-ray AI clears fractures, effusions, dislocations
- FDA clears first digital-pathology breast cancer risk tool
- Radiologist-plus-AI beats AI alone for clot detection
- Top journal: FDA cleared isn't FDA proven
- Ultromics raises $55 million for echo AI
- FDA now clears an AI device every 31 hours
- KPMG puts Claude in front of 276,000 staff
- Anthropic eyes $900 billion valuation, may pass OpenAI
- Microsoft's 100 agents find 16 Windows flaws
- xAI ships Grok Build, its first coding agent
TL;DR:
- A fresh wave of FDA clearances (AZmed's AZtrauma for fractures/effusions/dislocations; ArteraAI Breast) lands the same week a top journal warns that "cleared" is not "clinically proven" — the procurement question from yesterday's lung-CXR head-to-head is now everyone's problem.
- KPMG is rolling Claude out to 276,000+ employees as Anthropic courts a $30B raise at a $900B+ valuation that could top OpenAI for the first time.
- New peer-reviewed data shows radiologists plus AI beat AI alone at finding pulmonary embolism — the week's quiet through-line is collaboration and evidence, not replacement.
Sources cited:
- Diagnostic Imaging
- Artera
- AuntMinnie
- Annals of Internal Medicine
- Crescendo AI Healthcare News
- Innolitics 510(k) Year-in-Review
- Anthropic
- Bloomberg
- Microsoft Security Blog
- The Decoder
Subscribe: YouTube
medAI Times is for educational and informational purposes only. The content does not constitute medical advice, diagnosis, treatment recommendation, or professional clinical guidance. Consult qualified healthcare professionals and refer to official sources before making clinical, research, regulatory, or business decisions.
Transcript
Auto-generated from the episode audio.
FDA cleared does not mean it actually works. Today, the proof. Welcome to the medAI Times podcast, your daily update on medical AI. Don't forget to like and subscribe. Right now, there is this massive, rapidly widening gap between what artificial intelligence is legally allowed to do and what it is actually proven to do.
Yeah, the gap is huge right now. It is. And as someone trying to cut through the relentless noise of the tech hype cycle, that gap is the exact signal you need to be tracking. So today, we are tearing into a stack of updates, spanning everything from life-saving medical AI making its way into your local emergency room to the astronomical, multibillion-dollar power grabs happening in the corporate enterprise.
OK. Let's unpack this. And we'll do it. AZmed's X-ray AI clears fractures, effusions, and dislocations. FDA clears first digital pathology breast cancer risk tool. Radiologist plus AI beats AI alone for clot detection.
Right. Top journal says FDA cleared isn't FDA proven. Ultromics raises $55 million for echo AI. FDA now clears an AI device every 31 hours. That's crazy. KPMG puts Claude in front of 276,000 staff.
Anthropic eyes $900 billion valuation, may pass OpenAI. Wow. Microsoft's 100 agents find 16 Windows flaws. And xAI ships Grok Build, its first coding agent. I mean, it is a staggering list, right?
Hearing it all at once, it sounds like a dispatch from 10 years in the future. It really does. But those first few items, they aren't happening in a research lab. They are actively hitting the hospital floor as we speak. The sheer capability of what is rolling out is breathtaking.
But we really need to examine the context of how it actually gets into the hands of a doctor. Right. Take AZmed, for example. They just won their third FDA clearance for their AZtrauma suite. Yeah, their third one. Which is wild.
This is an AI that looks at a single X-ray for anyone over the age of two. And it hunts for three different trauma pathologies simultaneously. The performance numbers they're reporting are just, well, they're wild.
Extremely high, yeah. Roughly a 98% accuracy score for fractures, 97.5% for joint effusions, and 95.6% for dislocations. Which is incredible. So put this in perspective. Think of this like having a tireless genius super resident just hovering over the shoulder of an emergency room doctor at 3 in the morning.
That is the perfect analogy. Right. Because they are catching those incredibly subtle joint effusions, which is basically microscopic fluid building up around a joint. They are notoriously easy for a tired human eye to miss.
Oh, absolutely. And if you miss them, it almost always guarantees the patient will be returning to the hospital in extreme pain a few days later. Exactly. So that super resident analogy is the perfect lens for this. Because human fatigue is the exact variable this AI solves for.
But let's look at ArteraAI Breast, which was cleared on May 6. OK, yeah. Because this pulls AI out of the fast-paced emergency room and drops it into the highly complex, meticulous world of oncology. Right.
A totally different environment. Completely different. This is the very first digital pathology AI risk tool specifically built for breast cancer. Here is the workflow. It reads digitized H&E slides. And those are the standard tissue samples pathologists have used for decades, right?
Right, exactly. And it fuses that visual data with the patient's clinical chart to score the risk of distant metastasis in early-stage breast cancer. Specifically, hormone receptor-positive, HER2-negative, invasive breast cancer.
Hold on, let me make sure I'm getting this. Hormone receptor-positive, HER2-negative. What does that specific classification actually mean for the patient? Or even the AI's task? Well, it is a very common type of tumor. But predicting if it will eventually spread to other organs is incredibly tricky.
Usually, figuring out the treatment path is somewhat of a guessing game unless the oncologist orders a highly expensive, time-consuming molecular assay. Right, where they have to physically slice the tissue and ship it out. Yeah, ship it to a specialized lab, and you lose precious weeks waiting for the results.
So instead of slicing the tissue and making the patient wait two agonizing weeks, the AI just looks at the digital slide, cross-references their clinical chart, and gives an immediate prediction. Exactly. It's a multimodal model trained on over 8,500 trial patients.
Yeah, so they get same-day prognostic results right inside the standard pathology workflow. It allows the oncologist to confidently dial the chemotherapy intensity up or down almost immediately. Wait, if the AI is giving us instant oncology results and catching 98% of fractures, I have to ask an uncomfortable question.
I think I know what you're going to ask. Why do we even need the human doctor in the loop anymore? I mean, if the machine is that good, doesn't the radiologist just become a bottleneck? Well, a lot of tech evangelists are aggressively pushing that exact narrative.
But the peer-reviewed data tells a very different story. How so? Look at the study published May 13th in the journal Radiology: Artificial Intelligence. They examined pulmonary embolism detection. Blood clots in the lungs.
Right, which are highly fatal if missed. They tested an FDA-cleared CT pulmonary angiography tool. And what's fascinating here is that when you pair that AI tool with a radiologist, they reached a 99.2% sensitivity rate.
Wow, 99.2%. Exactly. That collaborative human-in-the-loop configuration definitively beat the AI operating autonomously. So the AI by itself was actually less effective than the AI working alongside the human.
How does the human actually improve the machine's score? Because they fail in fundamentally different ways. I mean, the AI is brilliant at rapid pattern recognition and flagging anomalies across thousands of pixels instantly.
But the human radiologist understands clinical context, anatomical variations, and the weird, unquantifiable artifacts from the scanning machine. Things that can easily trick an algorithm into seeing a clot that isn't there.
Precisely. The human catches the AI's blind spots, and the AI catches the human's fatigue. This data proves that the safest deployment in medicine is collaboration, not autonomous replacement. That makes total sense. Synergy over autonomy.
But, you know, this brings up another huge issue. If collaboration is the gold standard, how are so many of these fully autonomous or loosely regulated tools hitting the market in the first place? That is the million-dollar question. Or multi-million. I mean, investors are pouring unbelievable amounts of money into this space. Look at Ultromics.
They just raised a $55 million Series C to scale their AI echocardiography platform. Yeah, the funding is massive. And it's designed to catch a very specific type of heart failure called HFpEF, long before severe symptoms show.
Right, and medically, that is a brilliant application. HFpEF stands for heart failure with preserved ejection fraction. The heart pumps normally, but the muscle itself is stiff and doesn't relax properly to fill with blood. So it's an incredibly subtle mechanical failure.
Very subtle. It is routinely missed on a standard echo scan until it's too late. Right, the clinical intent is fantastic, but the sheer volume of these tools is dizzying. I mean, according to analytics data, the FDA is now clearing an AI or machine learning device roughly every 31 hours.
It's relentless. They cleared 24 in March, 27 in April, pushing the cumulative total past 1,400 devices. Are we supposed to believe all 1,400 of these are as rigorously tested as that pulmonary embolism tool? Definitely not, no.
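As a rough sanity check on the pace quoted above, the monthly counts can be converted into hours per clearance. This is only a sketch: the function name is illustrative, and it assumes the March and April counts from the episode cover full calendar months.

```python
def hours_per_clearance(days_in_month: int, clearances: int) -> float:
    """Average number of hours between clearances over one calendar month."""
    return days_in_month * 24 / clearances


# Counts as quoted in the episode; month lengths are ordinary calendar facts.
march = hours_per_clearance(31, 24)
april = hours_per_clearance(30, 27)

print(f"March: one clearance every {march:.1f} hours")
print(f"April: one clearance every {april:.1f} hours")
```

On these assumptions, March works out to one clearance every 31 hours, matching the headline figure, and April runs a little faster.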
And if we connect this to the bigger picture, you have to read the recent piece in the Annals of Internal Medicine by Rosen and Mandel. They're sounding a massive alarm, right? Massive alarm. The FDA, under Commissioner Makary, is actively signaling a policy shift toward post-market monitoring rather than intense pre-market scrutiny.
Wait, so they're checking them after they're already being used on patients? Basically. Yeah. What Rosen and Mandel are arguing is that FDA cleared absolutely does not mean clinically proven or superior. It is simply a permission slip to sell the software.
So it's legally compliant, but clinically it could be terrible. Yes. Think back to a finding from radiology we've discussed previously. They tested seven different FDA-cleared chest X-ray tools for detecting lung cancer nodules.
Right, I remember this. All seven had the exact same regulatory stamp of approval from the government. But when tested in the real world, their actual sensitivity varied wildly. From 78% all the way down to a terrifying 21%.
Right, exactly. So if a hospital buys the 21% tool because it's slightly cheaper, they're totally legally compliant, but they are fundamentally failing their patients. The volume of new software is completely outrunning the generation of comparative evidence.
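To make the sensitivity gap concrete, here is a minimal sketch of what 78% versus 21% means in missed cases. The cohort of 100 patients with true nodules is a hypothetical round number chosen for illustration, not a figure from the study.

```python
def expected_misses(patients_with_nodules: int, sensitivity: float) -> int:
    """Expected number of true cases a tool fails to flag (false negatives)."""
    return round(patients_with_nodules * (1 - sensitivity))


# Sensitivities as quoted in the episode; the cohort size is hypothetical.
best_tool_misses = expected_misses(100, 0.78)
worst_tool_misses = expected_misses(100, 0.21)

print(f"Best tool misses about {best_tool_misses} of 100 nodules")
print(f"Worst tool misses about {worst_tool_misses} of 100 nodules")
```

Both tools carry the same clearance, yet in this illustration one misses roughly 22 of every 100 nodules and the other roughly 79.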
Here's where it gets really interesting. Because while this somewhat chaotic wave of specialized AI is flooding hospitals, massive generalized AI is simultaneously invading the corporate enterprise. Oh, absolutely.
And it is doing it at a scale that requires funding so astronomical it almost doesn't sound real. Well, the transition from narrow medical AI to enterprise AI requires massive cognitive infrastructure.
We are moving from tools that analyze one specific type of X-ray to models capable of reasoning through entire corporate ecosystems. Just look at the footprint. KPMG just announced a global alliance to embed Anthropic's Claude into their operations.
They are putting this AI in front of over 276,000 employees. Targeting their tax and private equity clients first, which is a massive data load. Huge. If you work at a Fortune 500 company, this is the infrastructure being actively laid under your desk right now.
276,000 people suddenly armed with an AI assistant that can read thousands of pages of tax law, reason through it, and draft strategy. And KPMG is the third massive enterprise anchor to do this in just two weeks, following PwC and SAP.
It's moving so fast. It is. You have to look at the financial backdrop required to support an infrastructure rollout of that magnitude. Serving that kind of raw compute power to hundreds of thousands of enterprise workers simultaneously requires data centers, energy, and silicon on a planetary scale.
It's not just software anymore. No. They're trying to build the cognitive utility grid for the entire corporate world. That is why we are seeing reports that Anthropic is in talks to raise an additional $30 billion. $30 billion, with a B.
At a valuation of over $900 billion. That is just, it's hard to even conceptualize. If this closes by the end of May, it more than doubles their valuation from earlier this year. Crucially, it could push Anthropic past OpenAI in valuation for the first time.
The enterprise land grab we are seeing with KPMG and PwC is the direct financial justification for these numbers. But these models aren't just summarizing tax documents and writing polite emails anymore.
That's the generative phase. Right. That's already old news. Exactly. We are now officially entering the agentic phase where the models take autonomous action. Microsoft just unveiled something called MDAH.
This is a multi-model agentic scanning harness. Basically, they took over 100 specialized AI agents, gave them a goal, and let them loose against the Windows operating system. And the results were intense. They ended up finding 16 new vulnerabilities, including four critical remote code executions.
For context, a remote code execution is the holy grail for a hacker. Really? Oh, yeah. It means they can take over your system from across the world without you interacting with anything at all. No clicking a bad link, nothing. Wait, back up. A remote code execution means total control.
How did an AI system figure out how to do that completely autonomously? Through real-time adversarial learning. It's like locking 100 rival locksmiths in a room, handing them the blueprint to a bank vault and letting them argue.
Okay, I can picture that. So one AI agent acts as the attacker, writing a malicious script. A second agent acts as the critic, pointing out exactly why the operating system will block it, which forces the first agent to rewrite the approach.
Ah, I see. Then a third agent tests the new code. They methodically test, fail, rewrite, and execute until the digital door finally pops open. Okay, I get that Microsoft is using this to find flaws before the bad guys do.
Right, it's defensive. But aren't we just building a digital lockpick? I mean, if we are actively building agentic systems that are this incredibly good at executing complex cyber attacks, aren't we just one bad actor away from the ultimate cyber weapon?
It is a completely valid concern regarding the dual-use nature of agentic AI. What Microsoft's MDAH proves is that vulnerability discovery is officially moving out of academic research and into production-level defense.
Right. But the mechanics of production-level defense are identical to production-level offense. The AI doesn't know if it's working for Microsoft or a rogue state. It just knows how to break the code. And the tools to build these agents are becoming widely available.
Just look at xAI shipping Grok Build. Yeah, that just dropped. This is their first terminal-based coding agent. So instead of a friendly chat window in your browser, it lives right in the command line interface. The developer terminal is absolutely the ultimate battleground for agentic AI right now.
Because it's where the actual work happens. Exactly. Claude Code is there, OpenAI Codex, Cursor. xAI is doing a fast follow to get into this space because the terminal gives the AI the deepest possible access to the machine's execution environment.
So how does that actually change the daily workflow for a software engineer? Well, it fundamentally shifts them from a writer of code to a manager of agents. In a chat interface, the AI suggests code and the human copies, pastes, tests, and fixes the errors.
Right, it's very manual. But in the terminal, the AI is living in the engine room of the computer. It can write the code, compile it, run the test suite, read the error logs when it fails, and debug itself autonomously without the human developer ever lifting a finger.
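The write, run, read-the-errors, retry cycle described above can be sketched as a small loop. This is a toy illustration, not how any named product works: a real agent would generate each new attempt from the error log using a model, whereas here a fixed list of hypothetical candidate functions stands in for that step.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Callable[[int, int], int]


def run_tests(fn: Candidate) -> List[str]:
    """Run a tiny test suite; return failure messages (empty means all passed)."""
    failures = []
    for args, expected in [((2, 3), 5), ((-1, 1), 0)]:
        got = fn(*args)
        if got != expected:
            failures.append(f"add{args} returned {got}, expected {expected}")
    return failures


def agent_loop(
    candidates: List[Candidate], max_attempts: int = 5
) -> Tuple[Optional[Candidate], List[Tuple[int, List[str]]]]:
    """Try candidates in order until one passes, keeping the error log per attempt."""
    log = []
    for attempt, candidate in enumerate(candidates[:max_attempts], start=1):
        failures = run_tests(candidate)
        log.append((attempt, failures))
        if not failures:
            return candidate, log
    return None, log


# Hypothetical attempts: two buggy versions of add(), then a correct one.
attempts = [lambda a, b: a - b, lambda a, b: a * b, lambda a, b: a + b]
accepted, log = agent_loop(attempts)

for attempt, failures in log:
    print(f"attempt {attempt}: {'passed' if not failures else failures[0]}")
```

The point of the sketch is the control flow: the loop, not the human, decides when the code is done, based only on whether the test suite comes back clean.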
That's incredible. As this field consolidates, the pressure to make these agents cheaper, faster, and much more deeply autonomous is only gonna accelerate. So what does this all mean? We have these brilliant autonomous coding agents mapping out critical software flaws.
We have multi-billion dollar foundational models reading complex tax law for a quarter of a million corporate employees. And we have medical AIs diagnosing breast cancer and broken bones. It's a lot to process.
It is. But when we bring this all back to the real world, like to the hospital administrator who has to actually buy that X-ray AI to treat you or me, how do we reconcile this cutting-edge brilliance with the reality of healthcare procurement? Well, it all comes back to understanding the FDA 510(k) clearance pathway.
When a manufacturer creates a new medical AI, they rarely go through a rigorous multi-year clinical trial to prove it actually saves lives. Instead, they use the 510(k) pathway, which simply requires them to prove substantial equivalence.
Substantial equivalence, meaning they just have to prove their brand new AI is basically similar to an older device that the FDA already cleared years ago. Precisely. To really understand the danger here, think of the 510(k) pathway like making a photocopy of a document.
The first copy looks pretty much like the original, but then you make a photocopy of the copy and then a photocopy of that copy. You do this over years and years. I see where this is going. Yeah, eventually the current page looks absolutely nothing like the original document, but legally the FDA still considers it the same picture.
Wow. So you get this long, backward-looking chain of predicates, stretching back years, where the bar is just similarity, not clinical superiority on the actual patient population walking into a hospital today. Because this clinical testing requirement is often so minimal, two devices cleared for the exact same task can perform wildly differently in the real world.
Right, like those lung-nodule AIs we talked about. Exactly. Relying on the FDA clearance badge is simply no longer enough. The real test is head-to-head, site-specific validation. So if FDA clearance doesn't mean proven, how do you navigate this?
Should we build a one-page procurement scorecard tracking sensitivity, specificity, and predicate lineage so a hospital can actually compare two FDA-cleared AIs head-to-head? Thanks for listening. Find us on YouTube and your favorite podcast app.
See you tomorrow.
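One possible shape for the procurement scorecard raised at the end of the episode is sketched below. Everything in it is an assumption for illustration: the device names, the confusion-matrix counts, and the `predicate_depth` field (a hypothetical count of predicate devices in the 510(k) chain) are invented. The sensitivities are chosen to echo the 78% and 21% figures discussed earlier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeviceResult:
    """Results for one cleared tool on a hospital's own validation set."""
    name: str
    true_pos: int
    false_neg: int
    true_neg: int
    false_pos: int
    predicate_depth: int  # hypothetical: predicates in the 510(k) lineage

    @property
    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)


def scorecard(devices: List[DeviceResult]) -> str:
    """Render a plain-text comparison, highest sensitivity first."""
    rows = sorted(devices, key=lambda d: d.sensitivity, reverse=True)
    lines = [f"{'device':<10}{'sens':>8}{'spec':>8}{'predicates':>12}"]
    for d in rows:
        lines.append(
            f"{d.name:<10}{d.sensitivity:>8.1%}{d.specificity:>8.1%}"
            f"{d.predicate_depth:>12}"
        )
    return "\n".join(lines)


# All counts below are made up for illustration only.
device_a = DeviceResult("Device A", 78, 22, 850, 50, predicate_depth=2)
device_b = DeviceResult("Device B", 21, 79, 880, 20, predicate_depth=5)

print(scorecard([device_b, device_a]))
```

The design choice worth noting is that both tools are scored on the same local cases, which is what makes the comparison head-to-head and site-specific rather than a comparison of vendor-reported numbers.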