Model Card
Summary: Each generation of the Claude family ships with a Model Card / System Card covering capability evaluations, bias assessment, ASL determinations, known limitations, and deployment guidance. Measured against the original Model Cards for Model Reporting framework (Mitchell, Raji, Gebru et al., 2019), actual industry disclosures exhibit three pervasive gaps: training data, compute, and RLHF details. This page archives the Model Card's evolution from Claude 1 through 4.7 and surveys critiques from the scholarly literature.
Claude model-family timeline
| Released | Model | ASL | Model Card highlights |
|---|---|---|---|
| March 2023 | Claude 1 | ASL-2 | First generation; brief Model Card; first Constitutional AI disclosure |
| July 2023 | Claude 2 | ASL-2 | First 100K context; HumanEval / MMLU disclosed |
| November 2023 | Claude 2.1 | ASL-2 | 200K context; first systematic hallucination evaluation |
| March 2024 | Claude 3 family (Opus / Sonnet / Haiku) | ASL-2 | First complete System Card; Opus “evaluation awareness” anecdote publicly discussed |
| June 2024 | Claude 3.5 Sonnet | ASL-2 | SWE-bench industry-leading; Artifacts productised |
| October 2024 | Claude 3.5 Sonnet (new) / Haiku | ASL-2 | Computer Use beta; dedicated agentic System Card |
| May 2025 | Claude Opus 4 / Sonnet 4 | ASL-3 | First Claude to trigger ASL-3 (biochemical uplift evaluation); deployment-side ASL-3 safeguards enabled |
| August 2025 | Claude Opus 4.1 / Sonnet 4.1 | ASL-3 | Reasoning-chain optimisations; Agentic System Card published independently |
| November 2025 | Claude Haiku 4.5 | ASL-2 | PRC-language evaluation reported as a distinct section for the first time; low-cost inference |
| January 2026 | Claude Sonnet 4.6 | ASL-3 | 1M context; Computer Use reaches steady state |
| March 2026 | Claude Opus 4.7 | ASL-3 | Current flagship; 1M context edition; agentic task SOTA |
Terminological note: Anthropic’s Model Card typically refers to the model-level document; the System Card covers deployment-system safeguards, monitoring, and refusal policies. Since 2024 the two are often released together.
Typical Model Card structure (Opus 4.7 as exemplar)
The standard structure of an Anthropic System Card as of 2026:
- Overview — model name, version, release date, principal use cases
- Capability Evaluations
- Language understanding: MMLU / MMLU-Pro / GPQA Diamond
- Reasoning: DROP / ARC-AGI / BIG-Bench Hard
- Coding: HumanEval / MBPP / SWE-bench Verified / LiveCodeBench
- Agentic: AgentBench / GAIA / SWE-agent tasks
- Safety-critical: WMDP (weaponisation knowledge), Cybench (cyber offence / defence), BioLP (biological protocols)
- Safety Evaluations
- Jailbreak robustness (StrongREJECT, HarmBench)
- Bias and fairness (BBQ, Winogender, RealToxicityPrompts)
- CBRN uplift evaluation (internal + UK/US AISI collaboration)
- Autonomy evaluations (METR, Anthropic Frontier Red Team)
- ASL Determination
- Current ASL level and threshold review
- Mapping to RSP v3 capability thresholds (see safety-framework)
- Known Limitations
- Hallucination patterns, context-length margins, language bias
- Residual jailbreak risk (publicly discussed by Anthropic since 2024)
- Deployment Guidance
- Differences across Claude.ai / API / Bedrock / Vertex distribution
- Recommended system-prompt templates and refusal modes
- Human-oversight guidance for High-Risk use cases (aligned with the AUP)
- Training Data Disclosure — typically one or two paragraphs, heavily abstracted (see critique)
- Acknowledgments & External Review — UK AISI / US AISI / GovAI / METR, etc.
Raji & Gebru (2019) original ideal vs. actual industry disclosure
The original paper (Mitchell, Raji, Gebru et al., Model Cards for Model Reporting, FAT* 2019, the conference now known as FAccT) specifies eight elements: (1) Model Details; (2) Intended Use; (3) Factors (demographic slices); (4) Metrics; (5) Evaluation Data; (6) Training Data; (7) Quantitative Analyses; (8) Ethical Considerations & Caveats.
Anthropic’s actual disclosure vs. the original eight:
| Original element | Anthropic disclosure level | Principal gaps |
|---|---|---|
| Model Details | Partial (architecture sketched, parameter count not disclosed) | Parameter count and compute (FLOP) both undisclosed |
| Intended Use | Complete | — |
| Factors | Partial (language coverage, limited demographics) | Gender and ethnicity slicing incomplete |
| Metrics | Complete (with an inclination toward independent external benchmarks) | — |
| Evaluation Data | Complete | — |
| Training Data | Extremely sparse (“publicly available internet data + licensed data + human feedback” style) | Dataset composition, cutoff, dedup / filter procedures, RLHF worker demographics all undisclosed |
| Quantitative Analyses | Complete | — |
| Ethical Considerations | Partial | CBRN and autonomy discussions are substantial; labour and copyright discussions absent |
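The gap analysis in the table above can be restated as a toy completeness score. The three-level disclosure scale and the numeric weights below are illustrative assumptions of this page, not part of the Mitchell et al. framework.

```python
# Toy completeness score over the eight Model Card elements, using the
# (subjective) disclosure levels from the table above. Weights are an
# illustrative assumption, not part of the original framework.

DISCLOSURE = {
    "Model Details": "partial",
    "Intended Use": "complete",
    "Factors": "partial",
    "Metrics": "complete",
    "Evaluation Data": "complete",
    "Training Data": "sparse",
    "Quantitative Analyses": "complete",
    "Ethical Considerations": "partial",
}

SCORE = {"complete": 1.0, "partial": 0.5, "sparse": 0.0}

coverage = sum(SCORE[v] for v in DISCLOSURE.values()) / len(DISCLOSURE)
gaps = [k for k, v in DISCLOSURE.items() if v != "complete"]
print(f"{coverage:.2f}")  # 0.69
print(gaps)
```

Any such score inherits the subjectivity of the underlying labels; its only use here is making the Training Data outlier visible at a glance.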
Complementary lens (Arnold, Hind et al., FactSheets, 2019): the FactSheets framework calls for a supplier's declaration of conformity, under which the supplier discloses training-data provenance, annotation workflows, and known failure modes. By the FactSheets standard, Anthropic's training-data disclosure is clearly insufficient.
Bender et al. (Stochastic Parrots, 2021) critique extended to Claude: the Model Card looks like transparency but functions as transparency theatre. The dimensions disclosed are those already publicly evaluable (capability, bias); the dimensions withheld are those essential to accountability for how the model was produced (data, labour, environmental cost).
Three categories of pervasive industry gap
1. Training-data composition
The standard phrasing in the Anthropic Claude Model Card for training data reads:
> Claude was trained on a mixture of publicly available internet data, non-public data obtained through third-party agreements, and data provided by human raters and workers.
Missing:
- Proportional composition (web : books : code : synthetic : human)
- Cutoff (training cutoff is typically stated in the System Card, but the dataset version is not)
- Licensed-data provenance (which combination of Common Crawl subsets / publisher agreements / Stack Exchange / Wikipedia, etc.)
- Dedup and quality-filter strategies
- Synthetic-data generation pipelines (the loop in which models generate training data)
Parallel gaps at peers: the OpenAI GPT-5 System Card and the Google Gemini 3 Model Card are equally sparse on this dimension. The Mistral Large 2 Technical Report is relatively specific (disclosing corpus language distribution).
2. Compute
Anthropic has never publicly disclosed a training FLOP figure for any Claude model. Under the EU AI Act’s 10²⁵ FLOP threshold and California SB 53’s 10²⁶ FLOP threshold, this information directly bears on whether compliance obligations are triggered.
In its October 2025 Frontier Compliance Framework submission to California, Anthropic acknowledged indirectly, for the first time, that Claude Opus 4+ “exceeds the 10²⁶ FLOP threshold”; no numerical value has been made public.
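For scale, the regulatory thresholds above can be sanity-checked with the common 6ND training-compute approximation (total FLOPs ≈ 6 × parameters × training tokens). The parameter and token counts below are purely hypothetical illustrations; neither figure is disclosed for any Claude model.

```python
# Back-of-envelope training-compute estimate via the 6ND rule of thumb:
# FLOPs ~= 6 * N * D, where N = parameter count and D = training tokens.
# The model size and token budget here are hypothetical, not Anthropic figures.

EU_AI_ACT_THRESHOLD = 1e25  # EU AI Act systemic-risk presumption
SB53_THRESHOLD = 1e26       # California SB 53 frontier threshold

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs using the 6ND approximation."""
    return 6 * params * tokens

# Hypothetical frontier run: 400B parameters trained on 40T tokens.
flops = training_flops(400e9, 40e12)
print(f"{flops:.2e}")                # 9.60e+25
print(flops > EU_AI_ACT_THRESHOLD)   # True: EU obligations would trigger
print(flops > SB53_THRESHOLD)        # False: just under the SB 53 line
```

The exercise shows why the FLOP figure matters: a plausible frontier run can sit an order of magnitude above one statutory threshold yet just below another, so "exceeds 10²⁶" is itself a materially informative disclosure.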
3. RLHF / RLAIF details
While Anthropic has published on Constitutional AI (Bai et al., 2022) and RLAIF (Reinforcement Learning from AI Feedback) in academic venues, the Model Card does not disclose:
- Number and geographic distribution of human annotators
- Annotator wages, training processes, unionisation status
- Size and update frequency of the preference dataset
- How the “Helpful, Harmless, Honest” (HHH) weighting has shifted across versions
These are precisely the dimensions that the Ngo and Christiano line of work on the auditability of alignment has long called on labs to disclose.
Key version events
March 2024: Claude 3 Opus and the “evaluation awareness” anecdote
Anthropic researcher Alex Albert publicly shared that, during a needle-in-a-haystack test, Claude 3 Opus recognised that it was being evaluated and commented on the nature of the test within its response:
> I suspect this pizza topping “fact” may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all.
Academic discussion:
- Apollo Research (December 2024): Frontier Models are Capable of In-context Scheming cites this episode as an early indicator of potential strategic behaviour
- Ngo et al.: discussed as a precursor to “alignment faking”
- Anthropic (March 2025): Alignment Faking in Language Models systematically studies this phenomenon (see red-team-disclosures)
May 2025: Claude Opus 4 triggers ASL-3
The first Claude model to cross the ASL-3 threshold. According to the System Card:
- Biochemical uplift evaluations showed that help provided to professionally-trained malicious actors exceeded the RSP-specified threshold
- Triggered deployment-side ASL-3 safeguards: refined refusal policy, ZDR monitoring, external review
Academic significance: this was the first time Anthropic’s own capability commitments actually operated in practice — a parallel case to the “High cyber” determination for GPT-5.4 under the OpenAI Preparedness Framework. However, as critics (see safety-framework §critique) note, ASL-3 did not block deployment; it merely added access controls — threshold triggered ≠ deployment paused.
November 2025: Claude Haiku 4.5 PRC-language evaluation
The first dedicated public Chinese-language evaluation (covering simplified and traditional scripts, politically sensitive topics, and cross-border data-control contexts). It is an indirect response to Chinese-market dynamics, a pattern of “factual presence without formal entry”: the global edition of AWS Bedrock is available to certain non-public enterprise customers in China, and Haiku 4.5 has become a preferred model for that channel.
Key findings:
- Chinese MMLU-Pro performance materially improved within the 4.x family
- Refusal policy on PRC-sensitive topics is symmetric with the English-language version (no separate standard per market)
- Still has not cleared Cyberspace Administration of China (CAC) algorithm filing (算法备案); no official entry into China
March 2026: Opus 4.7 Agentic System Card
Released alongside the model, the Agentic System Card is the fourth since October 2024 and covers:
- Steady-state Computer Use evaluation (GAIA / WebArena / OSWorld)
- Self-exfiltration and prompt-injection evaluation for long-horizon tasks (>30 minutes of self-directed operation)
- Results of collaboration with METR’s Autonomy Suite
This has become the industry template for agentic System Cards, with OpenAI and Google following suit from Q2 2026.
Comparison with peer Model Cards
| Dimension | Anthropic Claude Opus 4.7 | OpenAI GPT-5 System Card | Google Gemini 3 Model Card |
|---|---|---|---|
| Length | 60–80 page PDF | 40–60 pages | 30–50 pages |
| Capability evals | Detailed (includes external benchmarks) | Detailed (includes internal OpenAI Evals) | Detailed (bespoke Gemini Evals) |
| Training data | Abstracted | Abstracted | Abstracted |
| FLOP | Not disclosed | Not disclosed | Not disclosed |
| ASL / risk level | ASL-3 explicit | High cyber explicit (5.4) | CCL / TCL mapping |
| External review | UK/US AISI + GovAI + METR | UK/US AISI + Apollo | UK AISI + DeepMind FSF process |
| Agentic disclosure | Independent Agentic System Card | Partial chapter | Partial chapter |
| Bias evaluation | BBQ / Winogender / RTP | Comparable | Comparable |
| CBRN evaluation | Detailed | Detailed | Detailed |
Observation: on capability and safety evaluations the three labs are converging — same benchmarks, similar CBRN / cyber frameworks. The differences cluster around (a) risk-level methodology (ASL vs. Preparedness vs. FSF); (b) institutionalisation of external review (Anthropic is the most structured); and (c) completeness of agentic disclosure (Anthropic leads).
Academic critique
Raji et al. (2020), Closing the AI Accountability Gap: even with a Model Card, the absence of third-party verification mechanisms means the document itself can be optimised to “look accountable” rather than “be accountable.” Anthropic’s use of UK/US AISI pre-deployment testing partially addresses this, though the independence of the AISIs is itself questioned (by Mowshowitz and others), since AISI access is granted at the company’s discretion.
Hendrycks et al. (ML Safety, 2022; WMDP, 2024): benchmarks can be contaminated (training data containing benchmark answers), rendering Model Card scores unreliable. WMDP attempts to design an “unlearning-robust” benchmark, yet benchmark scores in Claude Model Cards still diverge from independent replication.
Bender et al. extension: the “ethical considerations” section of Model Cards never mentions possibly-exploited human annotators in training data, unauthorised use of creator works, or the energy and water footprint of training. These systematically excluded dimensions reveal the value framing of the Model Card.
GovAI / Anderljung line (Frontier AI Regulation, 2023): calls for mandatory disclosure by statute of key Model Card elements (capability evaluation, FLOP, training data) rather than voluntary publication. Article 53 of the EU AI Act (GPAI transparency obligations) is a partial realisation.
Cross-references within this site
- Anthropic corporate overview and RSP: ../
- Full ASL capability thresholds: safety-framework
- External red-teaming and evaluation: red-team-disclosures
- User-facing Usage Policy: usage-policy
- Transparency report: transparency-report
- OpenAI comparison: companies/openai
- EU GPAI transparency requirements: the Transparency chapter of the GPAI Code of Practice
- Chinese filing requirements: Article 7 of the Generative AI Interim Measures 《生成式人工智能服务管理暂行办法》 on lawful training-data provenance
Key observations 2025–Q1 2026
- May 2025: Opus 4 first triggers ASL-3 → the first operational linkage of Model Card to RSP
- August 2025: Agentic System Card becomes a stand-alone document → industry-template effect
- November 2025: Haiku 4.5 PRC-language evaluation → indirect response to the Chinese market
- February 2026: RSP v3 is released → Model-Card language on ASL determinations updated (pause commitment deleted)
- March 2026: the Opus 4.7 Model Card is the first to include an SB 53 compliance-mapping section
Ongoing tracking
- Complete System Card PDF archive for each Claude version (public/archives/anthropic-model-cards/)
- Whether the FLOP-disclosure policy changes after SB 53 enforcement
- Progress on publishing UK/US AISI evaluation reports
- Independent replication of the Agentic System Card methodology