The Week AI Came for the Clinic | Weekly Recap May 2-8 — May 9, 2026
Show notes
The week AI came for the clinic. OpenAI just made ChatGPT free for every U.S. doctor.
Run time: 11:39
In today's episode:
- OpenAI ChatGPT for Clinicians free for every U.S. physician
- Harvard study: o1 outscores two ER attendings on diagnosis
- FDA opens live data feed inside two cancer trials
- OBSCORE risk model beats BMI across 18 obesity conditions
- Würzburg study: patients underreport symptoms to AI
- Topol calls out the medical-AI implementation paradox
- NEJM Catalyst frames 2026 as the proof-or-pull-back year
- npj review finds bias in AI medical-education images
- EU AI Act medical high-risk obligations slip toward 2027
TL;DR:
- OpenAI launched ChatGPT for Clinicians on May 5: free for verified U.S. physicians, NPs, PAs, and pharmacists; physician advisors rated 99.6% of nearly 7,000 test conversations safe and accurate; HealthBench Professional benchmark released alongside.
- Harvard / Beth Israel published in Science on May 3: OpenAI's o1 hit 67% triage accuracy on 76 ER patients vs 55% and 50% for two internal-medicine attendings.
- FDA opened its first real-time live data feed into two AstraZeneca and Amgen cancer trials via Paradigm Health on April 28; AI Chief Walsh projects 20 to 40% trial-time savings if expanded.
Sources cited:
- OpenAI announcement
- TechCrunch summary
- STAT News
- Nature Medicine
- Nature Health
- Ground Truths Substack
- NEJM Catalyst
- npj Digital Medicine
- Petrie-Flom Center
Subscribe: YouTube
medAI Times is for educational and informational purposes only. The content does not constitute medical advice, diagnosis, treatment recommendation, or professional clinical guidance. Consult qualified healthcare professionals and refer to official sources before making clinical, research, regulatory, or business decisions.
Transcript
Auto-generated from the episode audio.
The week AI came for the clinic. OpenAI just made ChatGPT free for every U.S. doctor. Welcome to MedAI Times podcast, your daily update on medical AI. Don't forget to like and subscribe. Welcome to this deep dive into the week's most critical sources. So we're tracking the week of May 2nd through May 8th, 2026. And honestly, it was a heavy week for medical AI.
Oh, absolutely massive. Massive, right? I mean, just looking at the framing here, OpenAI made ChatGPT free for every U.S. doctor. The FDA turned on this, like, live data feed inside two cancer trials, and a Harvard study showed OpenAI's o1 model actually beating ER attendings on diagnosis.
Yeah, the ground is completely shifting under our feet right now. Exactly. So the mission for our deep dive today is to really cut through this massive influx of headlines. We want to sort the Silicon Valley hype from the clinical reality and figure out what all this actually means for you and the future of patient care.
Okay, let's unpack this. Let's do it, because I think the main story we have to start with is that May 5th launch of ChatGPT for Clinicians. Yeah, so this isn't some small pilot. OpenAI made the platform totally free for verified U.S. physicians, nurse practitioners, PAs, and pharmacists.
Right, which is a huge demographic. It's huge. And they built in trusted clinical search with citations, prebuilt documentation skills, deep research, and they're even giving out free CME credit on eligible questions.
Which is such a smart hook. It really is. Plus you've already got Memorial Sloan Kettering, Boston Children's, and AdventHealth on the broader platform. But I have to push back a little here. Okay, go ahead. Like, making it free for verified clinicians, it kind of feels like a VIP fast pass at a theme park, but is it really just a Trojan horse? Like, are they just legally farming the best possible physician-graded training data?
Oh, well, I mean, that is exactly the point. Free access for verified clinicians is the ultimate distribution moat. By gating access with an NPI lookup and a license attestation, OpenAI gets this incredibly high-trust user base. So they are generating massive amounts of physician-graded data every day.
And doing it safely, apparently. Because running an LLM in a clinic usually raises huge red flags. Yeah, the safety data they brought to the table is what makes this viable. Their physician advisors ran, I think it was nearly 7,000 real clinical conversations through it.
Wow, 7,000. Yeah. And they rated 99.6% of those responses as safe and accurate. But the bigger standard they set was releasing the HealthBench Professional benchmark alongside the launch.
Because that becomes the proxy, right? Exactly. That becomes the proxy that purchasers are going to demand. If you're a hospital CIO, you're going to use that benchmark. So competitors like Anthropic and Google, they're basically being forced to match it within the next 6 to 12 months.
Which is wild. And, you know, getting back to actual clinical performance, we just got proof of why doctors might actually want this kind of reasoning power in their pocket. Right. The ER study. What's fascinating here is that this is the first published head to head where a frontier reasoning model actually crosses the clinician baseline in a real ER cohort.
Yeah, this was the May 3rd study in Science from Harvard and Beth Israel Deaconess Medical Center, led by Arjun Manrai and Adam Rodman. They looked at 76 ER patients. And this was just based on text-only case summaries, right?
Yes, text only. And the numbers are just shocking. OpenAI's o1 hit a 67% triage accuracy. And the human attendings? Two internal medicine attendings looked at the exact same cases. They hit 55% and 50%.
I mean, o1 beat them by a mile. Even though it was small-n and text-only, you pair that capability with the free clinician rollout we just talked about, and it fundamentally shifts the medical landscape. It really does. And it's not just individual patient diagnosis being revolutionized by data either.
It's the entire regulatory pipeline for new drugs. Oh, right. The FDA announcement. So Commissioner Marty Makary and their AI chief, Jeremy Walsh, just announced the first real-time live data feed into two cancer trials.
One with AstraZeneca and one with Amgen. Exactly. So instead of waiting years for a massive PDF, they're streaming predefined endpoint and signal data straight from the sponsor trials into this cloud platform built by Paradigm Health and the FDA monitors it live.
Right. Which is a massive paradigm shift, no pun intended. But I've got to ask, is plugging the FDA directly into the live matrix of raw trial data actually risky? I mean, what happens if the FDA sees early noise before the data is finalized?
Do they just panic and shut things down? Well, the key phrase there is predefined endpoint and signal data. They aren't just staring at raw, unfiltered noise. This live agency telemetry could actually compress oncology label timelines massively. Walsh is projecting 20 to 40% reductions in overall trial time.
That's years of waiting just erased for cancer patients. Exactly. And the FDA wants to scale this up. They have public RFI responses due on May 29th. So they're moving fast. And you know, just as trial data is becoming more precise in real time, our definitions of chronic disease are shifting too.
Yeah. We're finally moving away from blunt metrics to high-resolution AI data, which brings us to the OBSCORE study in Nature Medicine. This one was fascinating. It really was. They trained it on about 200,000 UK Biobank participants.
And instead of just looking at BMI, it uses 20 clinical features like waist-to-height ratio, HbA1c, lipids. And it uses those to predict what, a 10-year incidence? Exactly. A 10-year incidence across 18 different obesity-related conditions.
But here is the real aha detail for me. About 40% of the model's highest risk cohort were merely overweight by traditional BMI. Wait, really? 40%? Yeah. They weren't even considered clinically obese by the old standards.
That is wild. And I mean, the broader implications of that are huge. This completely resets the eligibility logic for preventative programs or, you know, GLP-1 prescribing. Because right now, everything is anchored on those outdated BMI cutoffs.
Right. But, you know, all these brilliant diagnostic and predictive models rely on one crucial thing. Accurate input from human patients. Exactly. But humans don't talk to AI the way they talk to doctors.
No, they really don't. And that Würzburg, Charité, and Cambridge study in Nature Health proved it. They looked at 500 patients. It was a pre-registered study too, which adds to the weight. What were the numbers?
So when patients wrote symptom reports to an AI, they averaged about 228 characters. But they gave over 255 characters to a human physician. So they're self-truncating. Yeah.
It's like leaving a short, awkward voicemail versus having a real conversation. People just get weird when they know it's a bot. And that exposes a really critical flaw in these front-door symptom checkers. Downstream AI is getting worse, less nuanced data than a human doctor would get.
Which means the output is worse. Right. It leads to measurably worse triage suitability. The model can't evaluate symptoms that the patient refuses to type out. And honestly, this patient hesitation connects perfectly to the institutional hesitation we're seeing.
Oh, you mean Eric Topol's piece? Yeah, his piece on the Ground Truths Substack about the paradox of medical AI implementation. Because healthcare systems themselves are struggling to adopt AI that actually works. Despite all this massive LLM hype we just talked about, Topol points out that we have decades-old, rigorously proven image AI for mammography, CT, retinal scans, colonoscopy,
and it still isn't reaching patients. This raises an important question, though. Because we have something like 295 FDA-cleared AI and ML devices, but clearance is not adoption. And adoption is not outcomes.
Topol is calling for prospective, randomized, independently adjudicated outcome studies. Which is exactly what health system executives are now demanding, too. Look at the May commentary in NEJM Catalyst. The Tire to the Height. Totally.
They literally framed 2026 as the proof-or-pull-back year. They pointed out that ambient scribes and triage chatbots are showing measurable productivity gains. Sure, because those save time. Right. But diagnostic AI lacks robust outcomes evidence outside of, like, isolated academic centers.
And this isn't just theory. This is the literal operational rubric that health system CIOs are using right now. If a project doesn't have proof, it's not getting funding through FY27. Proof is everything now. But before these tools even reach the hospital CIO, they're being used to train the next generation of doctors.
And that is where a massive hidden risk lies. The medical education bias study. Yes. The systematic review in npj Digital Medicine. It was a PRISMA-guided synthesis of 36 empirical studies.
And the findings are just disturbing. Yeah. The AI-generated medical images. Right. They systematically underrepresent darker skin tones, female anatomy, and rare disease representations. And even worse, they render pathognomonic findings inaccurately enough to actually mislead trainees.
Which is incredibly dangerous. A pathognomonic finding is the definitive visual sign of a disease. If the AI hallucinates that, the student learns it wrong. Exactly. So my pushback here is, if medical school deans and USMLE prep vendors are using these generation pipelines to save money, are we basically just automating and hard-coding historical biases directly into the brains of young doctors?
Yeah, we absolutely are. Which is why this peer-reviewed evidence is so important. It finally gives medical educators a concrete mandate. They have to add strict human curation gates before using gen-AI illustrations.
They can't just blindly trust the output. No, they can't. But while educators are trying to build their own curation gates, formal government guardrails are slipping further down the calendar. Oh, the EU AI Act delays. Yeah. The Harvard Petrie-Flom Center just detailed this.
The medical high-risk obligations are likely slipping. So what are the new dates? Stand-alone high-risk AI obligations are pushed to December 2027. And AI-embedded regulated products are pushed to August 2028.
Plus, devices on the market before August 2, 2026 keep their grandfathered status. Okay, so they just get a pass. Right. If we connect this to the bigger picture, it creates a massive loophole and a strategic trap.
MedVLM and software-as-a-medical-device vendors get another year of runway. But wait, don't they lose that grandfathered status if they change the software? Yes. If there is a significant design change, they lose the status.
And that will completely dominate software update planning. Because why would you update your software to fix a bug if it means triggering years of new regulatory hurdles? Exactly. It's potentially freezing innovation just to avoid the paperwork.
Wow. So, looking at everything we've unpacked today, we are seeing these massive shifts across the board. You have the rapid, completely free deployment of clinician AI happening at the exact same time that regulatory and implementation realities are just severely lagging behind.
It's a huge collision. It really is. Which leaves you with a final question to mull over this week. Given that OpenAI's new HealthBench Professional is now the proxy standard grading these clinical LLMs, how do we ensure the grading rubric itself isn't baking in the very biases and blind spots we just discussed, effectively standardizing a flawed medical consensus?
Free is the new moat in medical AI. And the proof era starts now. Thanks for listening. Find us on YouTube and your favorite podcast app. See you tomorrow.