evalrig-adjacent

9 items

The New Yorker 2026-05-31-1

The Despair of the Professor in the Age of A.I.

Twelve professors put AI use at 50 to 90 percent of student writing and read the loss as the end of thinking, but the one calm voice, a CS instructor, already moved his course from writing code to grading AI-written code that is correct or subtly wrong. Generation was always the proxy; judgment was the skill, and the essay just got unbundled from it. The same gap drives enterprise AI, where generation is solved and verification was never built, which puts the pricing power in AI-resistant assessment and evaluate-the-output training rather than in another tutoring app.

WIRED 2026-05-27-3

AI Agents Plunged the Tech World Into Chaos. Here's Exactly How That Happened

OpenClaw plus NemoClaw is Linux Foundation plus Red Hat compressed from decades to months: 366K GitHub stars in under six months, Jensen Huang allocating 10 minutes of GTC 2026 to it, Nvidia shipping a 'more secure' enterprise variant before the upstream OSS turned one year old, and OpenAI capturing the founder talent that Anthropic answered with legal notices. The new agent-strategy question for every enterprise is now binary: upstream OSS, enterprise hardener, or neither, with 'neither' the dead zone. WIRED's 4,000-word canonization names the verification gap in a single closing sentence, which is the signal: verification, governance, and FinOps are the 12-24 month accumulation window the celebration forgot.

Google DeepMind · 2026-05-20 2026-05-22-w1

DeepMind Co-Scientist: A multi-agent AI partner to accelerate research

The detail that reorients the entire Co-Scientist paper: the majority of system compute goes to verifying hypotheses, not generating them. DeepMind didn't build a research assistant on top of Gemini — it built a verifier corpus (AlphaFold, ChEMBL, UniProt, the full literature stack) and wrapped a generator around it. That architectural choice is the same bet surfacing in the Bloomberg litigation data and the BBC manipulation piece: generation is cheap and increasingly generic, and the organizations that accumulated verification infrastructure before the model layer commoditized are holding the durable position. Every 'AI for vertical X' startup that priced the model layer priced the wrong thing. The moat was always the corpus that tells you whether the output is true.

Bloomberg · 2026-05-22 2026-05-22-w3

Courts Are Swamped With AI-Powered Do-It-Yourself Lawsuits

Pro se employment filings grew 49% year-over-year (4,100 to 6,400) while attorney-led filings grew 15% — and Nippon Life burned roughly $300K defending one ChatGPT-assisted plaintiff trying to reopen a settled case. AI didn't make those plaintiffs more legally sophisticated; it flipped the cost asymmetry so that filing is nearly free and response is not. That's the same structural gap the BBC piece exposes in information distribution and Co-Scientist exposes in research: generation costs collapsed, verification costs didn't move. The unoccupied product surface here sits on the defense side, sanctions detection, AI-authorship forensics, response-cost triage, and it's the same category as the verifier corpus DeepMind built, just at the opposite end of the market from Harvey. Volume markets with high cost-to-respond are permanently changed; the firms that figure out verification tooling own the economics of what comes next.

Bloomberg 2026-05-22-1

Courts Are Swamped With AI-Powered Do-It-Yourself Lawsuits

Bloomberg's DIY-lawsuit lede buries the structural point: pro se employment filings grew 49% YoY (4,100 → 6,400) while attorney-led grew 15%, and Nippon Life burned ~$300K defending one ChatGPT-assisted plaintiff trying to reopen a settled case. That's the actual story — AI didn't make plaintiffs smarter, it flipped the litigation cost asymmetry. Volume markets with high cost-to-respond just became permanently uneconomic for defendants, and the unoccupied product surface is defense-side: adversarial-output verification (sanctions-detection, AI-authorship forensics, response-cost triage) — EvalRig-adjacent, opposite end of the market from Harvey.

The Handbasket 2026-05-22-2

Hating AI is good, actually

Pew clocking 53% pessimism vs 16% optimism on AI and creativity landed the same day WSJ put 'AI Rebellion' on the front page — sentiment confirmation, not signal. The actual signal is the Rosenbaum book (fabricated quotes, author unrepentant) and Granta using Claude.ai to evaluate AI-suspected prize submissions landing in the same week: legitimacy is collapsing precisely where output verification was never built. Every CMO reading the WSJ piece has the same question their CTO hasn't answered yet — where in our stack does a Rosenbaum incident happen to us.

Wall Street Journal 2026-05-22-3

WSJ/Mims — 'Vibe Slop Crisis': 75% AI-generated code at Google, GitHub policy response, and the IPO-window verification arbitrage

Pichai says 75% of Google's new code is AI-generated, up from 50% six months ago; Claude Code's median user went from 20 minutes a day to 20 hours a week. GitHub changing its policies to fight AI-generated coding garbage in the same week the Zechner/Ronacher critique surfaces in WSJ isn't coincidence — it's practitioner alarm graduating to institutional press at exactly the OpenAI/Anthropic IPO moment. The market is pricing generation; the cliff it hasn't priced is verification.

Digiday 2026-05-21-1

The Economist's two-track web: agent-readable B2B pages, embedded pods, and the wholesale/retail split

The Economist is building two parallel surfaces: stripped-down Q&A for the agents that B2B buyers now start their research in, and the glossy human-facing product where subscription pricing actually lives. De Zanche names it correctly: agent optimization is a defensive baseline, not differentiation, which means the agent-track is wholesale and the human-track is the only place premium pricing survives. The quieter story is the org-shape change underneath: six to eight cross-functional pods, editorial staff embedded next to engineers, science-desk editors vibe-coding journal-credibility utilities, and a productivity number revised from 8 percent to more-than-doubled in a single news cycle.

Google DeepMind 2026-05-20-1

DeepMind Co-Scientist: A multi-agent AI partner to accelerate research

DeepMind's Co-Scientist paper in Nature drops the actual bombshell in one sentence — the majority of system compute goes to verifying hypotheses, not generating them. The moat isn't Gemini; it's the verifier corpus that grounds each claim: AlphaFold, ChEMBL, UniProt, the literature stack Google has quietly accumulated. Every "AI for vertical X" startup pricing the model layer is pricing the wrong layer of the stack.