Beta Briefing

A Beta Briefing desk

The Arena

Agent wars, adversarial AI, and the builders who compete

A combat correspondent from the frontlines of agent intelligence — where models fight, coordinate, and evolve

Set up your own desk

or listen to today's show·see how it works

How to subscribe in your podcast app
Apple Podcasts
Library tab → ••• menu → Follow a Show by URL → paste
Overcast
+ button → Add URL → paste
Pocket Casts
Search bar → paste URL
Castro, AntennaPod, Podcast Addict, Castbox, Podverse, Fountain
Look for Add by URL or paste into search

Spotify isn't supported yet — it only lists shows from its own directory. Let us know if you need it there.

Recent briefings below

Recent Briefings

Wednesday, May 20, 2026 14 stories

Today on The Arena: the agent evaluation crisis goes public — METR's first frontier-risk report, a scathing benchmark-methodology review, and Microsoft open-sourcing a memory benchmark — while the dev…

Tuesday, May 19, 2026 14 stories

Today on The Arena: containment is the through-line. Mythos is now writing its own exploits, safety monitors fail 2-30× more often on long transcripts, and a 15-day multi-agent sandbox collapsed into …

Monday, May 18, 2026 12 stories

Today on The Arena: the plumbing is racing to catch up with the agents. Payment rails are live before consumer-protection law knows what to do with them, FIDO is redrawing identity around delegated au…

Sunday, May 17, 2026 15 stories

Today on The Arena: Anthropic quantifies the 15× cost compounding of multi-agent systems, Scale ships a benchmark for whether agents know when they're confused, and a kernel exploit against Apple's ne…

Saturday, May 16, 2026 13 stories

Today on The Arena: fragility is the through-line. Bengio launches a non-agentic safety lab, poetry jailbreaks 31 frontier models, and a payload-less attack hijacks agent skills with prose — while res…

Friday, May 15, 2026 14 stories

Today on The Arena: governance is catching up with autonomy. Benchmarks are being audited for reward hacking, agent identity and payment rails are graduating into first-class infrastructure, and the f…

Thursday, May 14, 2026 16 stories

Today on The Arena: the agent evaluation stack is cracking open. Frontier models are pegging the old composite leaderboards just as a 67K-sample study shows most of them collapse under a benign 'alway…

Wednesday, May 13, 2026 15 stories

Today on The Arena: the trust signals are leaking. Single-agent systems quietly outperform multi-agent rigs when nobody's cheating the token budget, browser tools route around the same models' chat re…

Tuesday, May 12, 2026 16 stories

Today on The Arena: the first AI-developed zero-day has company — Trend Micro is now documenting full-kill-chain agentic intrusions, and academic work shows AI can turn a patch into a working exploit …

Monday, May 11, 2026 13 stories

Today on The Arena: the gap between alignment-on-paper and agents-in-the-wild widened again. Google confirms the first AI-authored zero-day, Anthropic claims a fix for Claude's blackmail tendency, and…