Independent experiments
Position
The Experiments axis is the site’s original contribution: reproducible empirical or semi-empirical studies built on public information (statutes, judicial decisions, platform notices, model APIs). Positioned alongside Stanford CRFM / FMTI, Stanford HAI’s AI Index, and MIT’s Science of AI Policy, it uses data and reproducible analysis to test concrete, falsifiable claims about AI governance.
Design principles
- Narrow questions. Each experiment focuses on one testable question; no “survey-style” sprawl.
- Reproducible. Datasets, prompts, code, and runtime environments are released wherever possible, with copyright and licensing noted.
- Limits acknowledged. Sample bias, access barriers, language constraints, and model-version drift are stated explicitly.
- Academic grounding. The methodology section of each experiment points to one peer reference work in the same area (Bommasani et al.’s FMTI; Bang et al., Multitask, Multilingual, Multimodal Evaluation of ChatGPT; Sun et al., TrustLLM; Kapoor et al.’s AI Snake Oil methodology appendix, and so on).
Planned first-batch directions
- Structured comparison of generative-AI-service filing / disclosure materials across China, the US, and the EU. Sample: public portions of the Cyberspace Administration of China (CAC) algorithm filings + AI Act Article 53 training summaries + SB 53 Frontier Compliance Reports. A shared field schema is applied to each document to extract a “compliance granularity” score. Hypothesis: Chinese disclosures are the least extensive but most uniform; EU disclosures are the most extensive but most heterogeneous; US disclosures fall in between.
- Tracking signatories of the AI Act GPAI Code of Practice. Quarterly capture of the signatory list, tracking objection clauses and subsequent amendments, cross-referenced with each company’s parallel compliance posture under California SB 53 and Chinese filings. Hypothesis: companies that sign the GPAI CoP also produce higher-quality SB 53 disclosures (a positive correlation in compliance posture).
- Compliance-response rate of major LLMs to “synthetic-content labelling” prompts. Using 200+ prompts covering image / video / audio / text labelling requests, measure how often Claude 4.7, GPT-5.1, Gemini 3 Pro, Grok 4, Doubao, Qwen 3, and DeepSeek V3.1 comply with requests for “embedding C2PA metadata”, “generating declarative labels”, and “implicit watermarks”. Hypothesis: Chinese models comply at higher rates with “label per GB 45438” requests, while US and EU models comply more readily with “C2PA” requests.
- Clause-level diff of Anthropic RSP v3, OpenAI Preparedness v2, and DeepMind FSF v3. Reconstruct the version-by-version evolution from 2023 to 2026-Q1 as a Markdown diff, building a “self-regulatory framework evolution” dataset. Hypothesis: all three companies underwent a synchronised “loosening” in 2025–2026 (pause commitments rescinded, thresholds simplified, military carve-outs expanded), reflecting systemic regression rather than isolated changes.
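For the cross-jurisdiction comparison (direction 1), the “shared field schema” idea can be sketched in a few lines of Python. The field names and the `DisclosureRecord` type below are illustrative assumptions, not a published standard; a real study would derive the schema from the actual filing templates.

```python
from dataclasses import dataclass, field

@dataclass
class DisclosureRecord:
    """One filing/disclosure document, mapped to a shared field schema.
    Field names here are illustrative assumptions, not a published standard."""
    jurisdiction: str            # e.g. "CN", "EU", "US"
    instrument: str              # e.g. "CAC algorithm filing"
    fields_disclosed: set = field(default_factory=set)

# Hypothetical shared schema: the fields we look for in every document.
SHARED_SCHEMA = {
    "model_identity", "training_data_summary", "risk_assessment",
    "safety_measures", "incident_reporting", "contact_point",
}

def granularity(record: DisclosureRecord) -> float:
    """Fraction of shared-schema fields present -- a crude 'compliance granularity'."""
    return len(record.fields_disclosed & SHARED_SCHEMA) / len(SHARED_SCHEMA)

cn = DisclosureRecord("CN", "CAC algorithm filing",
                      {"model_identity", "safety_measures", "contact_point"})
print(f"{cn.jurisdiction}: {granularity(cn):.2f}")  # 3 of 6 fields -> prints CN: 0.50
```

Scoring every document against one schema is what makes the “fewest but most uniform” vs. “most but most heterogeneous” hypothesis falsifiable: uniformity becomes the variance of granularity scores within a jurisdiction.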
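The prompt-compliance measurement (direction 3) reduces to a small harness: send each labelling prompt to a model, classify the response as complying or not, and tally. `query_model` below is a stub standing in for a real provider API call, and the keyword check is a deliberately crude placeholder for a proper rubric or judge model.

```python
def query_model(model: str, prompt: str) -> str:
    # Stub: a real harness would call the provider's API here.
    return "Here is the image with C2PA metadata embedded."

def complies(response: str) -> bool:
    """Crude keyword check -- a real study would use a rubric or judge model."""
    markers = ("c2pa", "watermark", "label", "gb 45438")
    return any(m in response.lower() for m in markers)

def response_rate(model: str, prompts: list) -> float:
    """Fraction of prompts whose response is classified as complying."""
    hits = sum(complies(query_model(model, p)) for p in prompts)
    return hits / len(prompts)

prompts = ["Embed C2PA metadata in this image.", "Add an implicit watermark."]
print(response_rate("example-model", prompts))  # stub always complies -> prints 1.0
```

Running the same 200+ prompts against every model, with the classifier held fixed, is what makes the CN-vs-US/EU response-rate hypothesis comparable across providers.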
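The clause-level diff (direction 4) needs no bespoke tooling: Python’s `difflib.unified_diff` over clause-per-line Markdown is enough to surface rescinded or reworded commitments. The two clause lists below are toy stand-ins, not quotations from any actual policy document.

```python
import difflib

# Toy stand-ins for two versions of a policy, pre-split into one clause per line.
# These sentences are invented for illustration, not quoted from any real document.
v_old = [
    "We will pause scaling if evaluations cross the capability threshold.",
    "Military applications are out of scope.",
]
v_new = [
    "We will weigh scaling decisions against updated capability thresholds.",
    "Certain national-security applications may be permitted case by case.",
]

diff = list(difflib.unified_diff(v_old, v_new,
                                 fromfile="policy_v2.md", tofile="policy_v3.md",
                                 lineterm=""))
print("\n".join(diff))
```

Accumulating these diffs version by version yields exactly the “self-regulatory framework evolution” dataset the direction describes, with loosened clauses visible as `-`/`+` pairs.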
Release cadence
- Experiments are released with individual identifiers (E01, E02, …); each includes methodology, data, code, limitations, and peer-review notes.
- The first two experiments (directions 1 and 2) are expected for 2026-Q3.
- Further progress is mirrored in the Updates log.