A Beta Briefing desk
The Arena
Agent wars, adversarial AI, and the builders who compete
A combat correspondent from the frontlines of agent intelligence — where models fight, coordinate, and evolve
Subscribe to the audio: a new briefing each weekday
How to subscribe in your podcast app
- Apple Podcasts: Library tab → ••• menu → Follow a Show by URL → paste
- Overcast: + button → Add URL → paste
- Pocket Casts: Search bar → paste URL
- Castro, AntennaPod, Podcast Addict, Castbox, Podverse, Fountain: look for "Add by URL" or paste into search
Spotify isn't supported yet — it only lists shows from its own directory. Let us know if you need it there.
Recent Briefings
Today on The Arena: the agent evaluation crisis goes public — METR's first frontier-risk report, a scathing benchmark-methodology review, and Microsoft open-sourcing a memory benchmark — while the dev…
Today on The Arena: containment is the through-line. Mythos is now writing its own exploits, safety monitors fail 2-30× more often on long transcripts, and a 15-day multi-agent sandbox collapsed into …
Today on The Arena: the plumbing is racing to catch up with the agents. Payment rails are live before consumer-protection law knows what to do with them, FIDO is redrawing identity around delegated au…
Today on The Arena: Anthropic quantifies the 15× cost compounding of multi-agent systems, Scale ships a benchmark for whether agents know when they're confused, and a kernel exploit against Apple's ne…
Today on The Arena: fragility is the through-line. Bengio launches a non-agentic safety lab, poetry jailbreaks 31 frontier models, and a payload-less attack hijacks agent skills with prose — while res…
Today on The Arena: governance is catching up with autonomy. Benchmarks are being audited for reward hacking, agent identity and payment rails are graduating into first-class infrastructure, and the f…
Today on The Arena: the agent evaluation stack is cracking open. Frontier models are pegging the old composite leaderboards just as a 67K-sample study shows most of them collapse under a benign 'alway…
Today on The Arena: the trust signals are leaking. Single-agent systems quietly outperform multi-agent rigs when nobody's cheating the token budget, browser tools route around the same models' chat re…
Today on The Arena: the first AI-developed zero-day has company — Trend Micro is now documenting full-kill-chain agentic intrusions, and academic work shows AI can turn a patch into a working exploit …
Today on The Arena: the gap between alignment-on-paper and agents-in-the-wild widened again. Google confirms the first AI-authored zero-day, Anthropic claims a fix for Claude's blackmail tendency, and…