United States — Data and Training

Relevant rules

Rule	Relationship to training data
CCPA / CPRA (California)	State privacy law; obligations on personal information in AI training data
COPPA (federal)	Data of children under 13
Section 230	Platform-liability boundary for user-generated content
NIST AI RMF + GenAI Profile	Voluntary data-governance practice

The United States still has no comprehensive federal privacy law. Federal touchpoints on training data are sector-specific and minor-specific; there is no federal framework for general-purpose training data.

California (CCPA / CPRA): in effect since 2020; adds rights around automated decision-making (California ADMT regulations finalized in 2025).
Virginia, Colorado, Utah, Connecticut, Texas, Oregon, etc.: GDPR-style privacy laws.
Illinois BIPA: biometric-information law with strong constraints on facial training data.
Washington My Health My Data Act: covers consumer health data beyond the scope of HIPAA.
Tennessee ELVIS Act: protects an individual's voice and likeness against unauthorized AI cloning.

As of April 2026, more than 20 states have passed comprehensive privacy laws, which vary in scope and detail.

The New York Times v. OpenAI / Microsoft (filed 2023): copyright lawsuit over training data, ongoing in 2026.
Bartz v. Anthropic / Kadrey v. Meta / Tremblay v. OpenAI: parallel copyright cases brought by authors over LLM training.
Andersen v. Stability AI: copyright claims against text-to-image models.
Thaler v. Perlmutter: authorship of AI-generated works.

Fair use is the distinctive US defence, and its “transformative use” standard is still being shaped as applied to LLM training.

US law is more permissive about scraping “publicly accessible” data than the EU or China.
hiQ Labs v. LinkedIn (2019 / 2022): the Ninth Circuit held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act.
However, breach of a site's terms of service (TOS) can still be actionable.

COPPA: verifiable parental consent is required before collecting personal data from children under 13.
California, New York, and other states: extend protections to minors under 18.
Risk when training data contains minors’ faces or voices: exposure under BIPA and COPPA can stack.
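
The age thresholds and the stacking risk above can be sketched as a simple triage step in a data pipeline. This is a minimal illustration, not legal logic: the `Record` type, its field names, and the `regimes_to_review` function are all hypothetical, and the sketch does not model jurisdiction (for example, whether Illinois law applies at all).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical training-data record; field names are illustrative only."""
    subject_age: Optional[int]   # None when the subject's age is unknown
    has_face_or_voice: bool      # True if the record contains a face or voice


def regimes_to_review(record: Record) -> List[str]:
    """Return the rule families from the notes above that a record may touch.

    Assumption: this only flags records for human review. It does not decide
    whether any law actually applies.
    """
    flags: List[str] = []
    age = record.subject_age
    if age is not None and age < 13:
        flags.append("COPPA")              # children under 13
    if age is not None and age < 18:
        flags.append("state minor rules")  # state extensions to under-18s
    if record.has_face_or_voice:
        flags.append("BIPA")               # biometric identifiers
    return flags


# A 10-year-old's voice clip touches all three rule families at once.
print(regimes_to_review(Record(subject_age=10, has_face_or_voice=True)))
```

A record with an unknown age gets no age-based flag in this sketch; a real pipeline would more likely route such records to manual review.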

No dedicated synthetic-data rule.
De-identification: HIPAA has clear standards (Safe Harbor / Expert Determination); other domains remain vague.

Federal layer: both China and the EU have comprehensive data-protection laws; the US does not.
Predictability: the US is the least predictable of the three (state-law variation + unsettled fair-use litigation).
Enforcement: the US relies primarily on private litigation (BIPA, copyright, TCPA, etc.); China relies on CAC enforcement; the EU on DPAs.
Training-data summary: mandatory in the EU; not required in the US; not published in China (filing materials are not public).