GPT-5.4 — tisram

OpenAI · 2026-05-01 2026-05-01-w1

Where the goblins came from

Reward signals shaped for a single personality bled into base behavior across 76.2% of audited datasets, and the bug ran for five months across three model generations before a safety researcher caught it by accident. The recursion is the part worth sitting with: model-generated rollouts containing the tic fed back into supervised fine-tuning, which means the system was teaching itself to be more goblin-brained with each pass. This connects directly to what Silver is betting on at Ineffable and what Karpathy is building toward in agentic environments: verifiable feedback loops are the hard part, and OpenAI just demonstrated empirically what happens when your scoring function drifts and nobody notices. The goblin bug isn't an anomaly; it's a preview of the failure mode for any system where behavioral regression testing isn't systematically applied across versions. Every custom GPT and fine-tune is a covert training run on the base model, and that just became a procurement question.

# tags

agentic-ai-viability ai-1.0-defensibility ai-safety alignment evalrig evaluation-infrastructure fine-tuning frontier-models gpt-5-4 gpt-5-5 interpretability openai reinforcement-learning reliability reward-hacking synthetic-media training-data

OpenAI 2026-05-01-2

Where the goblins came from

OpenAI's goblin postmortem buries the lede: reward signals applied to a single personality leaked into base behavior in 76.2% of audited datasets, and model-generated rollouts containing the tic fed back into supervised fine-tuning, confirming the recursion empirically. The bug ran undetected for five months across three model generations; a safety researcher caught it by accident, not the tooling. Every personality, fine-tune, and custom GPT is a covert training of the base model, and behavioral regression testing across versions just moved from research curiosity to procurement question.

# tags

alignment reward-hacking openai gpt-5-5 reinforcement-learning ai-safety ai-1.0-defensibility frontier-models evaluation-infrastructure evalrig agentic-ai-viability reliability gpt-5-4 interpretability training-data fine-tuning synthetic-media

Latent Space 2026-04-07-2

Extreme Harness Engineering for Token Billionaires: 1M LOC, 0% Human Code, 0% Human Review

OpenAI's Frontier team built a 1M-line Electron app with zero human-authored code: the competitive advantage wasn't the model, it was six skills encoding what "good" looks like as text. The real shift here isn't AI writing code; it's AI inheriting engineering culture. Ghost libraries (distributing specs instead of code) and Symphony (an Elixir orchestrator the model chose for its process supervision primitives) point to a future where the scarce resource is institutional knowledge distillation, not developer headcount.

# tags

ai-coding agentic-ai enterprise-ai developer-tools agentic-ai-viability reliability mcp ai-1.0-defensibility agent-gating multi-model-strategy