
AI Model Behaviour Drift: The Signal Enterprise Teams Are Not Reading Yet

May 3, 2026
TL;DR. Within 48 hours, on 29 and 30 April 2026, OpenAI published a post-mortem on GPT-5's goblin outputs, Anthropic updated its Responsible Scaling Policy, and Anthropic published a study of how people use Claude for personal guidance. The pattern is not coincidental: foundation model behaviour drifts after deployment. Organisations that freeze their governance at go-live are running risks they cannot see.

A Recurring Pattern: Model Behaviour Is Not Fixed at Deployment

Two major publications on the same day, 29 April 2026. OpenAI documented how unpredictable personality traits, called goblins, emerged in GPT-5 after deployment: a detailed timeline, an identified root cause, and fixes applied in post-production. Anthropic published an update to its Responsible Scaling Policy, revising its commitments as its models' actual capabilities become visible.

The signal is structural: foundation model behaviour is not static. It reconfigures under the effect of reinforcement learning from human feedback (RLHF), successive updates, and deployment at massive scale. Governance frameworks built at a given point in time do not cover what the model will do six months later.

Three Documented Cases That Illustrate the Pattern

GPT-5 and the goblins

On 29 April 2026, OpenAI published an analysis of how unpredictable personality traits proliferated in GPT-5. Per that publication, these quirks emerged from positive reinforcement signals that amplified unanticipated behaviours. Diagnosis and fixes came after deployment: the analytical effort is genuine, but the posture remains resolutely reactive.

Anthropic's Responsible Scaling Policy update

Published the same day, 29 April 2026, Anthropic's RSP update shows that even the sector's most formalised safety frameworks are continuously revised — not before deployment, but as the model's capabilities exceed initial projections. A static governance policy is, by design, behind the model it claims to govern.

How people actually use Claude for personal guidance

On 30 April 2026, Anthropic published a study on how individuals ask Claude for personal advice. What it reveals: actual usage patterns diverge systematically from what the designers anticipated. The model responds to needs nobody fully predicted — confirming that initial assumptions about expected behaviour are structurally insufficient.

The Root Cause: Behavioural Emergence That Static Governance Cannot Track

Large language models generate emergent behaviour — configurations that were not explicitly programmed, arising from the interaction of training data, human feedback loops, and large-scale deployment. What the goblins case illustrates, per OpenAI's 29 April 2026 publication, is that behavioural traits can reinforce non-linearly from signals that appeared entirely benign.

A second factor: governance policies are drafted based on capabilities known at a given moment. As soon as the model evolves — through an update, a shift in usage context, or a scaling event — the initial assumptions become obsolete. Anthropic's RSP update of 29 April 2026 demonstrates that even a leading lab must revise its own certainties mid-flight.

Three Levers to Move from Reactive to Continuous Monitoring

  1. Treat every model update as a new software release. Define documented behavioural regression tests and run them before and after each migration: an answer the model gave before an update is not guaranteed afterwards. Software qualification processes apply here with the same rigour.
  2. Establish behavioural baselines before deployment. Identify the most critical prompts for your business and document expected responses. That baseline becomes the reference for continuous monitoring — and the starting point for detecting any drift.
  3. Read vendor governance publications as early-warning signals. Anthropic's RSP update and OpenAI's goblins post-mortem are not isolated crisis communications: they are indicators of what your own internal monitoring systems should already be capable of detecting.
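
The first two levers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `BaselineCase`, `check_drift`, the sample prompt and both `model_*` functions are hypothetical names invented for this example, the model is represented by any callable that maps a prompt to a text answer (in practice, a wrapper around your vendor's API), and the lexical similarity from the standard library is only a stand-in for whatever comparison metric fits your use case. The 0.6 threshold is arbitrary.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, Dict, List


@dataclass
class BaselineCase:
    """One business-critical prompt with its documented expectations (hypothetical schema)."""
    prompt: str
    reference: str  # answer recorded before the update
    must_contain: List[str] = field(default_factory=list)      # phrases the answer must keep
    must_not_contain: List[str] = field(default_factory=list)  # phrases signalling unwanted behaviour


def similarity(a: str, b: str) -> float:
    """Rough lexical similarity in [0, 1]; an assumed stand-in for a semantic metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def check_drift(
    cases: List[BaselineCase],
    model_fn: Callable[[str], str],
    min_similarity: float = 0.6,  # arbitrary threshold, to be tuned per use case
) -> List[Dict[str, object]]:
    """Run every baseline prompt through the model and report deviations."""
    findings: List[Dict[str, object]] = []
    for case in cases:
        answer = model_fn(case.prompt)
        lowered = answer.lower()
        reasons: List[str] = []

        score = similarity(case.reference, answer)
        if score < min_similarity:
            reasons.append(f"similarity {score:.2f} below {min_similarity:.2f}")
        for phrase in case.must_contain:
            if phrase.lower() not in lowered:
                reasons.append(f"missing required phrase: {phrase!r}")
        for phrase in case.must_not_contain:
            if phrase.lower() in lowered:
                reasons.append(f"contains forbidden phrase: {phrase!r}")

        if reasons:
            findings.append({"prompt": case.prompt, "reasons": reasons})
    return findings


# Baseline documented before deployment (illustrative content only).
BASELINE = [
    BaselineCase(
        prompt="What is our refund window?",
        reference="Refunds are available within 30 days of purchase.",
        must_contain=["30 days"],
        must_not_contain=["goblin"],
    ),
]


def model_before(prompt: str) -> str:
    """Stub for the model version in production before the update."""
    return "Refunds are available within 30 days of purchase."


def model_after(prompt: str) -> str:
    """Stub for the updated model, with a deliberately drifted answer."""
    return "A goblin told me refunds last forever!"


print(check_drift(BASELINE, model_before))  # prints []
for finding in check_drift(BASELINE, model_after):
    print(finding["prompt"], finding["reasons"])
```

Run on a schedule as well as at each vendor update, the same check serves both as a regression test for migrations and as a continuous drift monitor. Because model outputs vary from call to call, a real setup would likely sample each prompt several times and compare distributions rather than single answers.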

Does your organisation know what its AI model is actually doing today — not at go-live, but right now?



Matthieu Pesesse