3 items

Anthropic launched Claude Code Security on Feb 20. WSJ validated the capability on Mar 6 with the Firefox bug bonanza -- 100+ bugs, 14 high-severity, Mozilla asking for more. Same day, OpenAI shipped Codex Security with broader access and harder evidence (15 named CVEs). The meta-pattern: security scanning is the enterprise wedge play -- the CISO budget is the Trojan horse for the engineering budget. Neither announced pricing. When two frontier labs offer free security scanning, they're not selling a security product; they're buying enterprise platform adoption.

Anthropic 2026-03-09-1

Making frontier cybersecurity capabilities available to defenders

Product announcement dressed as research disclosure. Claude Code Security uses multi-stage self-verification to scan codebases beyond pattern-matching SAST. The 500-vuln claim has no CVEs, no false positive rates, and no comparison to existing tools. Zero external validation in the announcement itself -- the WSJ/Firefox piece did that work. The real play: security scanning as a loss-leader wedge for enterprise platform deals. Neither lab announced pricing.

OpenAI 2026-03-09-2

Codex Security: now in research preview

Same-day competitive counter to Anthropic with stronger receipts: 15 named CVEs in the appendix (GnuTLS heap overflows, GnuPG stack buffer overflow, GOGS 2FA bypass), published improvement curves (84% noise reduction, 90%+ severity over-reporting reduction, 50%+ false positive reduction). The threat model architecture -- building an editable intermediate artifact before scanning -- is the most interesting pattern: it generalizes as "make the agent's understanding inspectable before execution." Broader tier access (Pro through Edu) weakens the dual-use containment narrative but maximizes adoption velocity.

Wall Street Journal 2026-03-09-3

Anthropic's AI Hacked the Firefox Browser. It Found a Lot of Bugs.

The independent credibility piece for Anthropic's security capabilities. Claude found 100+ Firefox bugs (14 high-severity) in two weeks -- more high-severity than the world reports to Mozilla in two months. The Curl counter-narrative is the buried lede: AI bug reports are 95% garbage (Stenberg data), making Claude's hit rate the real differentiator, not the volume. Most important detail: Claude is better at finding bugs than exploiting them -- the defender/attacker asymmetry currently favors defenders, but that gap is temporary.

3 items

Three domains, one pattern: AI compresses cost and increases volume, but the gap between "approximation" and "automation" persists. Writing gets slop, not singularity. Clean-room reimplementation gets legal ambiguity, not settled IP. Market research gets faster backtesting, not predictive intelligence. The ceiling question — does AI raise it or just raise the floor? — remains open and domain-dependent.

The Intrinsic Perspective 2026-03-08-1

Bits In, Bits Out

Hoel argues writing is the canary domain for AI capability — 6 years in, LLMs produced efficiency gains and slop, not a quality revolution. The Amazon book data is compelling (average worse, top 100 unchanged), but the extrapolation from writing to all domains is structurally weak: verifiable domains like code and math behave differently from taste-dependent ones. Best articulation of the "tools not intelligence" thesis, but cherry-picks the hardest domain for AI to show measurable ceiling gains.

Simon Willison's Weblog 2026-03-08-2

Can coding agents relicense open source through a "clean room" implementation of code?

Coding agents can now reimplement GPL codebases against test suites in hours, making copyleft economically unenforceable. The chardet LGPL→MIT relicensing dispute is the first clean test case, but the real bomb is training data contamination: if the model was trained on the original code, no "clean room" claim holds. Generalizes to any governance mechanism that relies on cost-of-reimplementation as friction.

Wall Street Journal 2026-03-08-3

Can AI Replace Humans for Market Research?

$100M Series A announcement dressed as trend piece. CVS's "95% accuracy" claim is backtested against known answers — the real test is predicting unknown findings, which nobody's shown. Digital twins for market research are a cost/speed optimization, not a new form of intelligence. The hard-to-reach population simulation (chronic disease patients from sparse data) is where overconfidence becomes actively dangerous.