Safety Framework
Snapshot: based on the Preparedness Framework v2.0 (15 April 2025), the 2025–2026 blog updates, and the Safety & Security Committee disclosures; incorporates operational data from the framework’s first “High cyber” trigger (GPT-5.4, March 2026).
1. What is the Preparedness Framework?
First published by OpenAI in December 2023, the Preparedness Framework is an internal risk-management document designed to:
- Define frontier capability thresholds (what capabilities count as “frontier risks”)
- Map mitigations (what deployment or security actions must follow a trigger)
- Establish decision procedures (who judges and who signs off)
It is not the Usage Policy (which constrains user behaviour) and not the Model Spec (which specifies model behaviour); it is a self-regulatory document governing the company’s own training and deployment decisions, and one of the two principal exemplars of the “capability-threshold self-regulation” genre, alongside Anthropic’s RSP.
2. Version history
| Version | Date | Core structure | Key changes |
|---|---|---|---|
| v1.0 | 18 December 2023 | 4 tiers (Low / Medium / High / Critical) × 4 categories | Initial release; responding to the July 2023 White House Voluntary Commitments |
| v1.1 updates | May 2024, September 2024, multiple addenda | Same | Refinement of evaluation procedures for GPT-4o and o1 |
| v2.0 | 15 April 2025 | 2 tiers (High / Critical) × 3 tracked + 5 watched | Structural rewrite (see §3) |
| v2.1 addendum | October 2025 | Same as v2.0 | Added Sora 2 evaluation procedure |
| v2.2 addendum | March 2026 | Same as v2.0 | First trigger: GPT-5.4 “High cyber” determination |
3. Detailed v2 architecture
Three “Tracked Categories” (formally tracked)
- Biological & Chemical
  - High threshold: for actors with relevant disciplinary background, the model provides “meaningful uplift” toward executing large-scale biological / chemical harm
  - Critical threshold: for actors without relevant background, the model can independently guide execution
- Cybersecurity
  - High threshold: the model materially raises attack success rates for operators with baseline capability
  - Critical threshold: the model can independently discover and exploit critical-infrastructure vulnerabilities
- AI Self-improvement
  - High threshold: the model can independently execute substantial portions of contemporary AI-research workflows
  - Critical threshold: the model can materially accelerate frontier AI research, threatening human control over AI development
Five “Watched Categories” (monitored, without formal trigger thresholds)
- Long-Range Autonomy
- Sandbagging (intentional under-performance)
- Autonomous Replication & Adaptation
- Undermining Safeguards
- Nuclear & Radiological — tracked in v1, downgraded to watched in v2
Threshold determination and response
| Threshold | Official description | Response |
|---|---|---|
| High | Materially increases existing severe-harm pathways | Deployment restriction + Safety Advisory Group sign-off |
| Critical | Opens previously nonexistent severe-harm pathways | Suspend deployment pending independent safety evidence |
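Read as a decision rule, the v2 architecture reduces to a small mapping from (category, assessed threshold) to a required response. The sketch below is purely illustrative: the category names, the `Threshold` enum, and the `required_response` helper are hypothetical paraphrases of the two tables above, not OpenAI code.

```python
# Illustrative sketch only: the Preparedness Framework is a policy document, not code.
# All names below are hypothetical, chosen to mirror the tables in this section.
from enum import Enum


class Threshold(Enum):
    BELOW_HIGH = "below-high"   # not formally tiered in v2 (Low/Medium were removed)
    HIGH = "high"
    CRITICAL = "critical"


TRACKED_CATEGORIES = ["biological_chemical", "cybersecurity", "ai_self_improvement"]
WATCHED_CATEGORIES = [
    "long_range_autonomy", "sandbagging", "autonomous_replication",
    "undermining_safeguards", "nuclear_radiological",
]

# v2's threshold -> response mapping, paraphrased from the table above.
RESPONSES = {
    Threshold.HIGH: "deployment restriction + Safety Advisory Group sign-off",
    Threshold.CRITICAL: "suspend deployment pending independent safety evidence",
}


def required_response(category: str, assessed: Threshold) -> str | None:
    """Return the mitigation the framework associates with an assessed level."""
    if category not in TRACKED_CATEGORIES:
        return None  # watched categories carry no formal trigger in v2
    return RESPONSES.get(assessed)


# Example: GPT-5.4's March 2026 assessment (High in Cybersecurity)
print(required_response("cybersecurity", Threshold.HIGH))
```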
4. The v1 → v2 “dilution”: key deletions
| Item | v1 | v2 |
|---|---|---|
| Number of threshold tiers | 4 (Low / Medium / High / Critical) | 2 (High / Critical) |
| Number of risk categories | 4 (+ Persuasion) | 3 tracked + 5 watched |
| Persuasion category | Tracked (Medium trigger for targeted manipulation) | Downgraded (folded into the Model Spec’s behavioural rules) |
| Nuclear & Radiological | Tracked | Downgraded to Watched |
| Pause commitment | “Pause deployment if Critical triggers” | Retained but softened (“on consideration”) |
| External review conditions | Explicit | Loose (“the Safety Advisory Group may include external advisors”) |
Official narrative (openai.com/safety/preparedness):
- “Focus on genuinely high-risk categories”
- “Reduce the compliance burden of Low / Medium to avoid slowing routine iteration”
- “Persuasion is better covered by Model Spec and user-layer policy”
5. Academic critique: arXiv 2509.24394 and “no guaranteed practice”
The September 2025 paper arXiv 2509.24394, Does OpenAI’s Preparedness Framework Make Binding Safety Commitments? (authors include GovAI and SaferAI researchers), reaches the headline conclusion:
The 2025 OpenAI Preparedness Framework does not guarantee any AI risk mitigation practices.
The paper’s four-layer argument
- “Safeguard sufficiency” adjudication is entirely internal
  - The repeated phrasing “sufficient safeguards” is never externalised as an auditable standard
  - The full Safety Advisory Group membership is not publicly disclosed
- The “on consideration” clause allows ex post re-interpretation
  - The text’s “weighing considerations including capability, deployment scope, …” allows any mitigation decision to be legitimated
- The “capability–mitigation” mapping is not a hard binding
  - Reaching High does not automatically trigger a specific measure; the text uses “may include” rather than “shall”
- The text contains no “shall not deploy” hard prohibitions
  - In contrast to the original pause commitment in Anthropic RSP v2, Preparedness v2 lacks even that
Other academic and policy critique
- Zvi Mowshowitz (Don’t Worry About the Vase): version-by-version breakdown of v1 vs. v2 wording changes; argues that v2 reads closer to marketing than to an operative risk framework
- Stuart Russell (UC Berkeley): in repeated 2025 interviews, treats Preparedness v2 alongside RSP v3 as a systemic retreat in industry self-regulation
- FLI (Future of Life Institute) AI Safety Index: OpenAI’s Preparedness score was materially reduced after the v2 release
- Markus Anderljung (GovAI Director): repeated 2025 commentary noting that voluntary commitments without hard-constraint text are unsustainable under competitive pressure
6. v2’s first operation: GPT-5.4 “High cyber” case (March 2026)
On the release of GPT-5.4 in March 2026, the official OpenAI blog post “Deploying GPT-5.4 under Preparedness Framework v2” announced:
Following Preparedness evaluations, GPT-5.4 has been assessed at High capability in the Cybersecurity category.
Response measures:
- Tiered deployment (see the sketch after this list)
  - ChatGPT surface: blocks reverse-engineering and exploit-development requests
  - API: Trusted Access Program (TAP), open only to security researchers vetted by human review
  - GPT-5.4-Cyber (14 April 2026): stand-alone endpoint with full cyber capabilities
- Asynchronous monitoring
  - Zero Data Retention (ZDR) customers: asynchronous blocking + ex post audit
- Trusted Access vetting
  - Identity verification + employer verification + use-case declaration + NDA
  - Even the order of magnitude of approved applications is not officially disclosed (only informal press estimates exist)
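As a reading aid, the tiered deployment above can be thought of as routing logic that conditions handling on the surface and on TAP vetting status. The following is a minimal sketch under that assumption; the `Surface` enum, the flags, and `route_cyber_request` are invented for illustration and do not correspond to any published OpenAI interface.

```python
# Hypothetical illustration of the tiered-deployment logic described above.
# This is a reading of the public description, not the actual enforcement code.
from enum import Enum


class Surface(Enum):
    CHATGPT = "chatgpt"
    API = "api"
    CYBER_ENDPOINT = "gpt-5.4-cyber"   # stand-alone endpoint, 14 April 2026


def route_cyber_request(surface: Surface, tap_approved: bool, is_exploit_dev: bool) -> str:
    """Decide how a cyber-relevant request is handled under the tiered deployment."""
    if surface is Surface.CHATGPT:
        # Consumer surface: exploit-development and reverse-engineering requests are blocked.
        return "refuse" if is_exploit_dev else "answer"
    if surface is Surface.API:
        # API access to cyber capability is gated on Trusted Access Program vetting.
        return "answer (async monitoring)" if tap_approved else "refuse"
    # Dedicated endpoint: full capability, but reachable only by vetted TAP accounts.
    return "answer" if tap_approved else "no access"


print(route_cyber_request(Surface.API, tap_approved=False, is_exploit_dev=True))  # -> "refuse"
```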
Critique: triggering the threshold ≠ restricting the capability
- Apollo Research April 2026 blog post: TAP vetting is weaker than Anthropic’s equivalent “safeguarded deployment” mechanism for Claude Opus 4.7
- SaferAI: post-High mitigations are primarily at the access-control layer; the model’s underlying uplift has not been reduced
- UK CAISI (formerly UK AISI): recommended stricter deployment restrictions during pre-deployment testing; OpenAI opted for a weaker variant
7. Governance structure: Safety Advisory Group and Safety & Security Committee
Safety Advisory Group (SAG)
- Composition: internal safety team leads + a small number of external advisors (full membership not public)
- Function: advises on Preparedness evaluation conclusions and deployment decisions
- Authority boundary: no veto authority — final decisions rest with the CEO and the Safety & Security Committee
Safety & Security Committee (SSC, established May 2024)
| Item | Detail |
|---|---|
| Context of founding | Established after the dissolution of the Superalignment team (May 2024) and the departures of Jan Leike and Ilya Sutskever |
| First chair | Sam Altman (concurrent with CEO) |
| September 2024 chair rotation | Zico Kolter (CMU) assumes the chair; Altman shifts to “member” |
| Members | Kolter, Bret Taylor (board chair), Adam D’Angelo, Nicole Seligman |
| Authority | Review safety matters; report to the board |
| Controversy | Members are simultaneously board members → independence in question |
Structural critiques (Helen Toner 2024 TED talk, Tim O’Reilly, Gary Marcus):
- SSC’s “self-evaluation” problem: less an independent oversight body than an internal company safety-reporting mechanism
- After the November 2023 board episode, members with external independent judgement (Toner, McCauley) had departed
- By contrast, the Anthropic Long-Term Benefit Trust has, in formal authority at least, a “director-removal” pathway; OpenAI’s 2024–2025 restructuring has weakened the check that the non-profit body holds over the commercial entity
8. Comparison with peer frontier labs
| Dimension | OpenAI Preparedness v2 | Anthropic RSP v3 | Google DeepMind FSF v3 | xAI |
|---|---|---|---|---|
| Release date | 15 April 2025 | 24 February 2026 | April 2026 | None |
| Architecture | Risk categories × 2-tier thresholds | ASL capability levels | CCL + TCL | None |
| Pause commitment | Weak (“if necessary”) | Rescinded | No explicit commitment | — |
| External review | SAG (includes external members) | Risk Reports with external review | FSF-report publication | — |
| First trigger case | GPT-5.4 High cyber (March 2026) | Claude Opus 4 ASL-3 biochem (May 2025) | Gemini 3 Pro TCL manipulation (November 2025) | — |
| Core academic critique | arXiv 2509.24394 | Abandonment of pause | TCL thresholds ambiguous | No framework |
Common pattern: the “loosening” of 2025–2026
All three labs have retreated from commitments, in different directions:
- Anthropic: rescinded pause; shifted to “industry-shared obligation”
- OpenAI: consolidated thresholds; downgraded Nuclear / Persuasion priority
- DeepMind: deleted Gemini “military prohibition” (2024); expanded CCL (2025) but without pause
Structural implication: absent hard-law backstops, competitive pressure drives the weakest commitment to become the common floor (race-to-the-bottom dynamics). This is the consistent observation of Mowshowitz, Anderljung, and Toner, and the same reasoning behind California SB 53, Article 55 of the EU AI Act, and related legislative proposals.
9. Interface with hard law
| Regime | Relevant provisions | Role of Preparedness |
|---|---|---|
| EU AI Act | Art. 55 systemic-risk mitigation | Citable as “state-of-the-art” mitigation practice for compliance |
| EU GPAI Code of Practice | Safety & Security chapter | OpenAI has entered partial reservations (chapter obligations not accepted in full) |
| California SB 53 | §22757.11 frontier-developer protocol | Preparedness may serve as the required “written protocol” |
| White House Voluntary Commitments (July 2023) | Capability-evaluation commitments | Preparedness provides compliance evidence |
| Seoul Commitments (May 2024) | Frontier AI safety commitments | Preparedness is OpenAI’s corresponding document under this 16-company framework |
10. Industry practice: how Preparedness shapes deployment decisions
As far as can be inferred from public information, the actual decision chain is as follows (a compact sketch follows the list):
- Pre-training complete → internal eval + baseline capability benchmarks
- Preparedness Team evaluates → scoring the 3 tracked + 5 watched categories
- External red team evaluations (METR / Apollo / UK CAISI / US CAISI) → reports submitted
- SAG deliberation → advisory memorandum
- SSC approval + CEO sign-off → deployment scope decided
- Release of System Card + Preparedness tables → public disclosure of the conclusions
- Ongoing monitoring → post-deployment re-evaluation (GPT-5.4 → v2.2 addendum)
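For readers who prefer to see the chain as a single ordered structure, here is a minimal sketch; the stage labels and the `run_gating` helper are hypothetical paraphrases of the list above, not OpenAI tooling.

```python
# Hypothetical restatement of the publicly inferred deployment-gating chain.
# Stage names are invented labels, not OpenAI internals.
PIPELINE: list[tuple[str, str]] = [
    ("internal_evals",      "pre-training complete -> internal eval + baseline capability benchmarks"),
    ("preparedness_scores", "Preparedness Team scores the 3 tracked + 5 watched categories"),
    ("external_red_team",   "METR / Apollo / UK CAISI / US CAISI evaluations -> reports submitted"),
    ("sag_memo",            "SAG deliberation -> advisory memorandum"),
    ("ssc_ceo_signoff",     "SSC approval + CEO sign-off -> deployment scope decided"),
    ("public_disclosure",   "System Card + Preparedness tables released"),
    ("ongoing_monitoring",  "post-deployment re-evaluation (e.g. GPT-5.4 -> v2.2 addendum)"),
]


def run_gating(model: str) -> None:
    """Walk the inferred gating sequence; every step before sign-off is advisory."""
    for stage, description in PIPELINE:
        print(f"[{model}] {stage}: {description}")


run_gating("GPT-5.4")
```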
Gating cases observed:
- o1 release (December 2024): Apollo’s disclosed “scheming” results led to deployment adjustments (disabling certain tool-use, adding CoT monitoring)
- Sora 2 (September 2025): Safety Systems’ influence-operations and child-safety evaluations delayed release by weeks
- GPT-5.4 (March 2026): the High-cyber trigger delayed default ChatGPT roll-out by several weeks and moved distribution to TAP mode
These cases illustrate that Preparedness has affected the timing of releases but not whether to release. Compared with Anthropic’s delayed ASL-3 activation for Opus 4 or DeepMind’s staged CBRN roll-out for Gemini 3, OpenAI’s Preparedness remains the weakest on “stopping.”
11. Further reading
- Primary: Preparedness Framework v2 PDF; SSC announcement; Preparedness blog index
- Academic: arXiv 2509.24394; Anderljung et al., Frontier AI Regulation (2023, arXiv 2307.03718); Hendrycks et al., An Overview of Catastrophic AI Risks (2023, arXiv 2306.12001)
- Critique: Zvi Mowshowitz, OpenAI Delenda Est series; Helen Toner TED 2024; FLI AI Safety Index 2025; TIME, “Inside OpenAI’s Safety Shake-up” (June 2024)
- Cross-references: OpenAI overview, red-team disclosures, Anthropic safety framework, methodology