Anthropic + Gates Aim Claude at HPV and Preeclampsia — May 15, 2026

Show notes

Anthropic just put two hundred million dollars into AI for diseases pharma forgot.

Run time: 21:22

In today's episode:

  1. Anthropic-Gates put $200M into Claude for neglected diseases
  2. Frontiers in Science manifesto on adaptive surgical robots
  3. Nature Medicine asks is AI actually improving care
  4. Companion Nature Medicine piece on meaningful AI evaluation
  5. npj Digital Medicine: RL boosts LLM radiology accuracy
  6. medRxiv preprint on AI literacy micro-credentials
  7. Claude for Small Business launches with fifteen workflows
  8. PwC adopts Claude for enterprise deal execution

TL;DR:

  • Anthropic and the Gates Foundation announced a four-year, $200M partnership on May 14 — the headline medical pillar is Claude usage credits for research centers hunting drug candidates in HPV and preeclampsia, two indications big pharma has long skipped.
  • Nature Medicine's May issue keeps hammering the same theme it opened the year with: "Is AI actually improving healthcare?" and a companion methodology piece argue static benchmarks are misleading and most deployed medical AI still lacks evidence of attributable clinical benefit.
  • A Frontiers in Science consensus piece from Dasgupta and Granados (King's College London) calls for an entirely new regulatory pathway for adaptive surgical robots that "continue to learn and change after approval" — a direct shot at how FDA and notified bodies currently classify SaMD.

Subscribe: YouTube

medAI Times is for educational and informational purposes only. The content does not constitute medical advice, diagnosis, treatment recommendation, or professional clinical guidance. Consult qualified healthcare professionals and refer to official sources before making clinical, research, regulatory, or business decisions.

Transcript

Auto-generated from the episode audio.

Anthropic just put $200 million into AI for diseases pharma forgot. Welcome to the medAI Times podcast, your daily update on medical AI. Don't forget to like and subscribe. You know, usually when we talk about a medical diagnosis, there's this expectation of mechanical precision, right?

Like you break your arm, the X-ray shows that jagged white line, and the physician just points and says, you know, there it is, broken. Right, yeah. We want a binary answer. I mean, we really prefer our biology to be visible, neatly categorized into an actionable box.

But you step into the world of artificial intelligence and health care, and suddenly that X-ray machine is kind of broken. Oh, absolutely. We're looking at a diagnostic and, frankly, a regulatory landscape that is incredibly murky.

So we're going to look at the map of how the industry is trying to navigate that right now. It's a lot to unpack. It is. So let me just run through the quickfire headlines from our stack of sources real quick to set the parameters for you. First, Anthropic and the Gates Foundation put $200 million into Claude for neglected diseases.

A huge deal. Right. Then, a Frontiers in Science manifesto on adaptive surgical robots. Nature Medicine asks if AI is actually improving care. A companion Nature Medicine piece on meaningful AI evaluation.

Yeah, those two go hand in hand. Totally. Then, npj Digital Medicine shows reinforcement learning boosts LLM radiology accuracy. A medRxiv preprint on AI literacy micro-credentials.

Anthropic launches Claude for Small Business with 15 workflows. And finally, PwC adopts Claude for enterprise deal execution. I mean, it is a massive cross-section of data. It stretches from the deepest corners of molecular biology all the way to multinational corporate restructuring.

Yeah, it really does. So let's start with the resources making this all possible. Because the capital flow here is highly unusual. Very unusual, yeah. Usually when we hear about billion dollar AI investments in the medical space, the money is sprinting toward those blockbuster high ROI areas.

You know, oncology, obesity therapeutics. The areas where pharmaceutical conglomerates have a guaranteed massive return on their R&D spend. Exactly. But this first move is disruptive. Anthropic and the Gates Foundation just formalized this four-year, $200 million partnership.

Anthropic is dedicating, like, heavy usage credits for their Claude model and embedded technical staff. And the Gates Foundation brings the operational framework and the grant funding. And critically, they are aiming this specifically at drug discovery for HPV and preeclampsia.

Which to me is like taking a cutting edge super telescope, pointing it away from the most popular galaxies and focusing it on the dark corners everyone else ignores. That's a great way to put it. Clinically speaking, those are notoriously underprioritized indications.

I mean, preeclampsia, for instance, disproportionately affects pregnant women, particularly in lower resource settings. And testing novel compounds on pregnant populations carries staggering liability risks. Oh, I bet. So traditional venture capital runs in the exact opposite direction.

Yeah. The profit margins simply don't justify the legal exposure when you compare it to say developing another chronic lifestyle drug for a wealthy nation. Sure. But does throwing a massive language model at a biological problem actually bypass that bottleneck?

I mean, computing power isn't a wet lab. No, of course not. Yeah. You can't just ask an AI to hallucinate a safe drug and completely skip the clinical trials. Right, you definitely can't skip the trials. But what you do is completely alter the geometry of the funnel leading up to them.

Okay, how so? Well, before you ever get to a physical wet lab, researchers have to sift through millions of molecular structures and biological pathways just to find a viable candidate. That sounds incredibly expensive.

It is. It's the most expensive stage. So by routing frontier level reasoning engines at these specific neglected diseases, the Gates Foundation is essentially subsidizing that massive computational cost.

You know, screening chemical libraries, predicting protein binding affinities. Oh, I see. So they are absorbing the upfront risk that big pharma refuses to take. Exactly, they're setting a new philanthropic template.

That's fascinating. So they are democratizing that early stage, but I guess that assumes the models are actually highly capable and reliable. Well, yes, and that is the billion dollar question. Right, because money and compute aren't magic, which brings us to a massive friction point in the data.

Our sources include a pair of pieces from the journal Nature Medicine that kind of throw a serious bucket of cold water on the whole ecosystem. They really do. The lead editorial by Goldenberg and Wiens asks bluntly, is AI actually improving healthcare?

Their conclusion is surprisingly sobering. I mean, they argue we simply don't have the methodological framework to know for sure. Which is wild to hear. I found their argument fascinating because they point out a major confounding variable. You know, we constantly see press releases claiming a hospital deployed an AI tool and patient outcomes improved by some impressive percentage.

Right, the victory laps. Yeah, but Goldenberg and Wiens argue that the evaluation frameworks rarely isolate the AI's actual direct contribution. Exactly. This is the fundamental difference between a technological intervention and a workflow intervention.

Break that down for me. So let's say a hospital deploys a predictive AI for sepsis. Suddenly, mortality rates drop. The vendor claims a massive victory. But did the patient survive because the AI identified a subtle biomarker, or did they survive because the hospital administrators basically mandated strict new hourly vital sign checks just to support the new software? Maybe they hired a dedicated floor nurse just to monitor the AI dashboard.

So the AI is just acting as a very expensive catalyst for better management. It basically just forces the humans to pay attention. Yeah. The field lacks rigorous ablation studies to separate the software's efficacy from the massive operational overhaul that almost always accompanies it.

Okay, that makes sense. But the companion piece in Nature Medicine goes further into the technical weeds, right? It exposes a vulnerability that, frankly, should terrify any hospital procurement board. Syntactic fragility, yes.

Stopped me in my tracks. The piece notes that if you test a large language model's diagnostic accuracy on a medical exam, its performance can swing from 25% all the way up to 98%. Just massive variance. And that variance has nothing to do with the complexity of the medical case.

It relies almost entirely on the prompt wording and the structural format you use to ask the question. It exposes the illusion of reasoning, you know? We assume the AI, quote-unquote, understands medicine.

But it is highly sensitive to the grammatical wrapper of the question. Wait, a 25% to 98% swing? That is catastrophic in a clinical setting. It's completely unacceptable. It sounds like having a genius medical student who completely fails the board exam if you just use the wrong font.

How can hospitals possibly buy these tools with confidence if the system is that temperamental? Well, they shouldn't, not based on current metrics. I mean, if a doctor types a query using shorthand or abbreviations instead of clean academic prose, the AI might misdiagnose the patient simply because it didn't like the syntax.

That's terrifying. It is. And this is exactly the core argument that the Nature Medicine authors are making against fixed test sets, you know, static benchmarks. Vendors currently sell their AI by pointing to a 95% score on a standardized data set.

But that data set is a sterile, pristine environment. It doesn't reflect reality. Exactly. So the authors provide a methodological blueprint for hospitals to push back during procurement. They need to demand what they call clinical context shift testing.

Meaning you test the AI with the messy reality of like a Tuesday afternoon shift. Exactly. You test it with a rushed community clinic's poorly scanned PDF. Or with shorthand notes from an exhausted resident who hasn't slept in 20 hours.

Yeah, or external lab results formatted in some ancient legacy software system. Right. Hospitals need to demand a degradation curve. They should be asking the vendor, show us mathematically how fast your model's accuracy collapses when the prompt format deviates from your perfect training data.
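Editor's illustration: the degradation curve the hosts describe can be sketched in a few lines. This is a minimal sketch under loud assumptions, not the method from the Nature Medicine piece. The `toy_model`, the `perturb` function, the shorthand table, and the four sample reports are all hypothetical stand-ins; a real evaluation would call the vendor's model and use the hospital's own messy inputs.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical shorthand a rushed clinician might use (assumption, not from the source).
SHORTHAND = {"pneumonia": "PNA", "no acute": "NAD"}

def perturb(text: str, level: int) -> str:
    """Level 0: unchanged. Level 1: lowercased. Level 2: lowercased plus shorthand."""
    if level >= 1:
        text = text.lower()
    if level >= 2:
        for full, short in SHORTHAND.items():
            text = text.replace(full, short)
    return text

def toy_model(text: str) -> str:
    """Stand-in for a vendor model: a brittle keyword matcher."""
    return "pneumonia" if "pneumonia" in text.lower() else "normal"

def degradation_curve(
    model: Callable[[str], str],
    cases: Sequence[Tuple[str, str]],
    perturb_fn: Callable[[str, int], str],
    levels: Sequence[int],
) -> List[Tuple[int, float]]:
    """Accuracy at each perturbation level, from clean input to messy input."""
    curve = []
    for level in levels:
        correct = sum(model(perturb_fn(text, level)) == label for text, label in cases)
        curve.append((level, correct / len(cases)))
    return curve

# Invented sample reports with their ground-truth labels.
CASES = [
    ("Findings consistent with pneumonia.", "pneumonia"),
    ("Right lower lobe Pneumonia.", "pneumonia"),
    ("No acute findings.", "normal"),
    ("Clear lungs.", "normal"),
]

if __name__ == "__main__":
    for level, accuracy in degradation_curve(toy_model, CASES, perturb, [0, 1, 2]):
        print(f"perturbation level {level}: accuracy {accuracy:.2f}")
```

The shape of the resulting curve is the point: a model that holds up on clean prose but drops sharply once shorthand appears is showing exactly the fragility a static benchmark hides.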

So if we can't trust the vendor's sterile benchmarks and we know the prompts are volatile, hospitals are left in a dangerous spot. Very dangerous. They either fly blind or they have to build the radar themselves. And based on our next source from npj Digital Medicine, building it themselves is actually becoming the superior option.

Yeah, the Wayne Flanders paper. This is a highly technical study, but it outlines a massive shift in power dynamics back toward local hospital IT teams. Right. They demonstrated that hospitals can fix this accuracy problem by training smaller models locally.

Specifically using a technique called reinforcement learning from verifiable rewards or RLVR. Yes, RLVR. And the study showed it significantly outperforms standard supervised fine-tuning when classifying diseases from radiology reports.

But help me translate the mechanics here. Why is RLVR so much better? We can actually look at it through the lens of human training. So standard supervised fine-tuning is kind of like an actor memorizing a medical script for a television show.

Okay, I like that. The actor studies thousands of human radiology reports and learns to predict the next word so perfectly that their output sounds incredibly professional and eloquent, but they don't actually know any medicine. Right, they're just acting.

Exactly. If they hallucinate a tumor, the model's internal loss function doesn't actually care as long as the sentence structure mimics a real doctor. Wow. So it's optimizing for fluency, not factual reality. Spot on.

But reinforcement learning from verifiable rewards, RLVR, changes that fundamental incentive structure. How so? It's akin to a medical student being graded strictly on the final outcome. The AI only receives its mathematical reward if its final disease classification matches the verified ground truth of the patient.

Oh, I see. Yeah, it forces the model to optimize for logical accuracy, completely ignoring whether the prose sounds pretty. And the broader implication here is that a hospital doesn't need to sign a $100 million contract with a frontier AI vendor just to get top-tier results.
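Editor's illustration: the incentive difference can be made concrete with a toy reward function. This is a minimal sketch, not the reward used in the npj Digital Medicine study. The `Final label:` output convention and both function names are assumptions made purely for illustration.

```python
import re
from typing import Optional

# Assumed output convention: the model ends its answer with "Final label: <class>".
LABEL_PATTERN = re.compile(r"final label:\s*([a-z_ ]+)", re.IGNORECASE)

def extract_label(output: str) -> Optional[str]:
    """Pull the final classification out of the model's free-text answer."""
    match = LABEL_PATTERN.search(output)
    return match.group(1).strip().lower() if match else None

def verifiable_reward(output: str, ground_truth: str) -> float:
    """Reward 1.0 only if the extracted label matches the verified label.

    Fluency, length, and tone of the surrounding prose earn nothing.
    """
    return 1.0 if extract_label(output) == ground_truth.lower() else 0.0

if __name__ == "__main__":
    eloquent_but_wrong = (
        "The lungs demonstrate well-aerated fields bilaterally with no "
        "focal consolidation. Final label: normal"
    )
    terse_but_right = "opacity RLL. Final label: pneumonia"
    print(verifiable_reward(eloquent_but_wrong, "pneumonia"))
    print(verifiable_reward(terse_but_right, "pneumonia"))
```

Under a next-word objective the polished report would score well for sounding like a radiologist. Under this kind of reward only the terse, correct answer is reinforced.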

Exactly. A small, local informatics team can take an open-weights model, apply this RLVR technique to their own specific archive of patient reports, and suddenly they have a tool that inherently understands their local physicians' shorthand.

It bypasses the massive tech conglomerates entirely, which is huge. But it does introduce a secondary bottleneck. What's that? You can tune the algorithm perfectly, right? But it still outputs text onto a screen that a human being has to read.

And the human in the loop is the most unpredictable variable in any deployment. Oh, absolutely. Which brings us right to the human training aspect. Our next source is a medRxiv preprint proposing stackable, role-specific AI literacy micro-credentials for healthcare staff.

And the context for this proposal is rooted in a massive psychological vulnerability. Right, we have to talk about automation bias here. Yes, we have to reference a recent New England Journal of Medicine study on automation bias to really understand why these micro-credentials are being pitched in the first place.

And just for clarity, automation bias is that human tendency to blindly trust the machine, like the GPS effect, you know, where someone drives straight into a lake because the navigation app confidently told them to keep going straight.

Exactly that. And in the medical equivalent, the NEJM study found that even when physicians were provided with standard AI literacy training, they still fell victim to it. Wait, really? Even with training? Yes. When an AI generated a highly confident, but factually flawed medical recommendation, these trained human doctors frequently deferred to the machine.

They basically abandoned their own clinical judgment. Because the text the LLM generates is so fluid and authoritative, it acts as a bypass valve for our natural skepticism. So how do these micro-credentials attempt to fix a psychological bias that standard training couldn't fix?

Well, the preprint suggests a very pragmatic, targeted approach. Instead of sitting a nurse down for a grueling 20-hour theoretical course on neural network architecture, which, you know, is totally disconnected from their daily reality.

Right, they don't have time for that. Exactly. Instead, you package the training into short, role-specific modules, like getting a driver's license for AI in bite-sized chunks. But wait, let me push back on that. 15 minutes of training for an exhausted nurse working a 12-hour shift?

That sounds like a Band-Aid for a bullet wound. Is a quick micro-credential actually going to stop a doctor from trusting a hallucinating machine in a life-or-death moment at 2 in the morning? Look, it's a very fair criticism. And the authors of the preprint admit the evidence base proving these micro-credentials prevent automation bias is still quite thin.

OK. However, the alternative is alert fatigue. If you mandate massive generalized training blocks, clinicians just click through them as fast as possible to get back to their patients. Yeah, that makes sense. Everyone hates those mandatory compliance videos.

But by delivering a targeted module on, say, how to spot LLM hallucinations in a cardiology discharge summary, you are teaching highly specific skepticism. You aren't teaching them how the engine works.

You are teaching them how to spot when the engine is misfiring. It's an ongoing experiment in measuring human-AI collaboration. But, you know, if relying on flawed text output is dangerous when a doctor is writing a prescription, the stakes escalate exponentially when the AI takes physical form.

Oh, absolutely. We need to transition from digital text on a screen to our spotlight source: physical action in the real world. Yes, the surgical robots. This is where regulatory frameworks are currently fracturing under the weight of the technology.

The manifesto in Frontiers in Science by Dasgupta and Granados calls for a totally new regulatory category for adaptive surgical AI. And adaptive is the operative word here, right?

It is. It means the robot continues to learn and change its parameters after it has been approved for use in the operating room. Okay, let's unpack this. Because current device regulation, specifically the software as a medical device framework, SaMD, is fundamentally built on the assumption of a static tool, right?

Exactly. If a manufacturer wants to update the algorithm in a pacemaker, they freeze the code, submit the exact new version to the FDA, and wait for clearance. But a continuously learning surgical robot shatters that entire paradigm.

Because you can't logically approve a surgical device on a Monday if its machine learning model is going to invent a new, slightly altered tissue handling technique by Friday. Right. How does the FDA even regulate that? Well, the manifesto leans heavily into a concept the FDA has been piloting called the Predetermined Change Control Plan, or PCCP.

It functions like a pre-approved mathematical sandbox. The manufacturer agrees that the AI can learn and adapt independently, but only within highly specific predefined boundaries. And if the algorithm stays inside those boundaries, it doesn't require constant re-clearance.
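Editor's illustration: the "sandbox" idea can be pictured as a bounds check on each proposed model update. This is only a sketch. A real Predetermined Change Control Plan is a regulatory document, not code, and the parameter names, limits, and `within_change_plan` helper below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Bound:
    """Pre-agreed range for one tunable parameter (values are invented)."""
    low: float
    high: float

def within_change_plan(update: Dict[str, float], envelope: Dict[str, Bound]) -> List[str]:
    """Return the parameters that fall outside the pre-agreed envelope.

    An empty list means the update stays inside the sandbox. Any entry,
    including a parameter the plan never mentioned, would need fresh review.
    """
    return [
        name
        for name, value in update.items()
        if name not in envelope
        or not (envelope[name].low <= value <= envelope[name].high)
    ]

# Hypothetical envelope for a surgical robot's motion policy.
ENVELOPE = {
    "max_grip_force_n": Bound(0.0, 5.0),
    "max_tool_speed_mm_s": Bound(0.0, 20.0),
}

if __name__ == "__main__":
    print(within_change_plan({"max_grip_force_n": 4.5, "max_tool_speed_mm_s": 18.0}, ENVELOPE))
    print(within_change_plan({"max_grip_force_n": 5.5}, ENVELOPE))
```

A software check like this is the part that works for an imaging algorithm. It does not replace a physical limiter on a device that moves.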

That sounds reasonable enough for, say, an imaging algorithm analyzing pixels on a screen. I mean, if the statistical drift gets out of hand, a radiologist just ignores the red box on the X-ray. Yes, but an imaging algorithm experiencing statistical drift is fundamentally different from a surgical robot experiencing physical drift.

Oh, yeah. If a robot altering its motion policy decides to apply 10% more physical pressure, it could sever an artery. The physical translation of the data changes the entire risk profile. So how do you regulate a sandbox when the sandbox is holding a scalpel?

That's the terrifying part. The authors argue that extending the PCCP framework to robots requires post-market monitoring with what they call physical safety teeth. What does that look like? You cannot rely solely on software checks to monitor statistical drift.

You have to install physical hardware limiters, like hard-coded mechanical kill switches that absolutely cannot be overridden by the AI, regardless of what its continuously updating neural net decides is the optimal surgical path.

Wow. The device law itself has to evolve from regulating static software to regulating bounded physical autonomy. It's just mind-bending. Medicine is wrestling with these massive life-or-death physical stakes, yet the underlying engines driving all of this, models like Anthropic's Claude, are simultaneously restructuring the wider day-to-day business ecosystem.

You really can't look at the clinical applications in a vacuum. The commercial deployment of these models dictates their funding and their evolution. Exactly. Which brings us to the general AI updates. Anthropic just executed two massive maneuvers at opposite ends of the economic spectrum.

First, they launched Claude for Small Business. We're talking 15 pre-built agentic workflows, native integration with platforms like QuickBooks and PayPal, and they are pushing this via a multi-city roadshow directly to SMBs.

But then, a day later, PwC announced they are deploying Claude as their primary reasoning engine for hyperscale enterprise deal execution and back-office redesign. It is a textbook dual-lane strategy.

They are aiming to fundamentally rewire how work is executed at both the micro and macro levels. Let's break down the mechanics of the macro level first. When PwC says they are using an LLM for deal execution, what does that actually look like in an M&A context?

Traditionally, enterprise deal execution requires armies of junior lawyers and analysts sequestered in data rooms, right? They're reading thousands of pages of contracts to identify indemnification clauses, liability gaps, regulatory risks.

It takes weeks. Months, sometimes. PwC integrating Claude means the AI parses the unstructured data of an entire corporate merger in minutes. It maps out the risk surface across disparate legal documents instantly.

So it changes the firm's architecture from human-labor intensive to human-verification intensive. Exactly. And partnering with a Big Four accounting firm signals that Anthropic is aggressively challenging OpenAI for the most lucrative enterprise contracts on the planet.

So that's the top-down approach, reinventing corporate infrastructure. But the small business launch is completely different. How does an LLM change the daily operations of, like, a local plumbing business or a neighborhood bakery?

It connects fragmented systems. A small business owner usually relies on a patchwork of software. You know, an inbox for client requests, QuickBooks for accounting, maybe HubSpot for marketing. And none of those talk to each other. Right.

They don't communicate natively. Claude's new agentic workflows act as a connective cognitive layer. Okay, give me an example. So if a plumber receives a text message about a leak, the AI can theoretically parse that unstructured text, draft a formal quote, dispatch the team via a scheduling app, and then generate the invoice in QuickBooks automatically.

Wow. They are embedding the reasoning engine into the mundane fabric of the broader economy. They are selling automated operational capacity to people who do not care how a neural network functions. It's the path to ubiquitous deployment.

You secure the hyperscale enterprise capital to fund the model's development, and you build horizontal lane integrations into Main Street to ensure the model becomes as fundamental as electricity. The scale of integration across all these sectors is just staggering.

We've mapped out a landscape today where the exact same underlying architecture is being pushed to cure preeclampsia, guide a surgical robot, and balance a local bakery's ledger. It's everywhere. But, you know, if we pull the threads of this entire ecosystem together, it exposes a massive lingering contradiction.

Yes, a structural paradox between the digital capability and the physical reality. Right, because we started out talking about that massive $200 million Anthropic and Gates Foundation partnership, using frontier AI to hunt for drug targets for neglected diseases like HPV and preeclampsia.

Diseases that legacy pharmaceutical companies ignored because the commercial incentives simply weren't compelling enough. Right, we position the AI as the ultimate equalizer for global health. But we have to consider the physical supply chain. I mean, let's assume the absolute best-case scenario.

What if the AI does exactly what it is designed to do? Right, what if Claude sifts through millions of molecular structures and successfully discovers a brilliant, completely novel compound that cures preeclampsia?

It would be a historic triumph for computational biology. Digitally, yes. But the physical manufacturing facilities, the clinical trial infrastructure, the global distribution networks, sorry, the physical assets, those are still overwhelmingly controlled by the exact same legacy pharmaceutical conglomerates that neglected the disease in the first place.

Because it wasn't profitable. Exactly. So if the algorithm solves the biological puzzle but the physical world is still governed by the old economic incentives, we're left asking if AI will actually democratize the cure or if it's only gonna democratize the blueprint.

If you possess the recipe, but you don't own the kitchen, how much power do you actually have? Thanks for listening. Find us on YouTube and your favorite podcast app. See you tomorrow.