United States — Data and Training

Rule | Relationship to training data
CCPA / CPRA (California) | State privacy law; obligations on personal information in AI training data
COPPA (federal) | Data of children under 13
Section 230 | Platform-liability boundary for user-generated content
NIST AI RMF + GenAI Profile | Voluntary data-governance practice

The United States still has no comprehensive federal privacy law. Federal touchpoints on training data:

  • COPPA: separate protection for children under 13.
  • HIPAA: medical data (directly relevant to medical-AI training data).
  • GLBA: financial information.
  • FERPA: education records.
  • FCRA / ECOA: data used in credit decisions.
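The sector list above can be expressed as a simple tagging step in a data pipeline. The following is a minimal sketch, assuming hypothetical category labels (`health`, `financial`, and so on) that an upstream classifier has already assigned to each record. The mapping only mirrors the bullets above; whether a statute actually applies depends on who holds the data and how it was collected.

```python
# Hypothetical category labels; the statute mapping mirrors the list above.
FEDERAL_TOUCHPOINTS = {
    "child_under_13": ["COPPA"],
    "health": ["HIPAA"],
    "financial": ["GLBA"],
    "education_record": ["FERPA"],
    "credit_decision": ["FCRA", "ECOA"],
}


def federal_flags(categories):
    """Return the sorted federal statutes worth reviewing for one record."""
    flags = set()
    for category in categories:
        # Unknown categories contribute no federal flag.
        flags.update(FEDERAL_TOUCHPOINTS.get(category, []))
    return sorted(flags)


print(federal_flags(["health", "credit_decision"]))  # ['ECOA', 'FCRA', 'HIPAA']
print(federal_flags(["web_text"]))                   # []
```

An empty result reflects the gap described here: general-purpose data falls under no federal statute in this list, so state law and litigation risk have to be assessed separately.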

Federal coverage is therefore sector-specific and minor-specific, with no federal framework for general-purpose training data. State laws fill part of the gap:

  • California (CCPA / CPRA): CCPA in effect since 2020 and amended by the CPRA; automated-decision rights under the California ADMT regulations finalized in 2025.
  • Virginia, Colorado, Utah, Connecticut, Texas, Oregon, etc.: GDPR-style privacy laws.
  • Illinois BIPA: biometric-information law with strong constraints on facial training data.
  • Washington My Health My Data Act: covers consumer health data beyond the scope of HIPAA.
  • Tennessee ELVIS Act: protects voice and likeness, reaching voice-cloning data.

As of April 2026, more than 20 states have passed comprehensive privacy laws, which vary in their level of detail.

Copyright on training data has been the most intense legal debate in US AI law from 2023 to 2026:

  • NYT v. OpenAI / Microsoft (filed 2023): copyright lawsuit over training data, still ongoing in 2026.
  • Bartz v. Anthropic / Kadrey v. Meta / Tremblay v. OpenAI: multiple parallel cases.
  • Andersen v. Stability AI: text-to-image copyright.
  • Thaler v. Perlmutter: authorship of AI-generated works.

Fair use is the distinctive US defence, and its “transformative use” standard is still being shaped as applied to LLM training.

1. The lawfulness of “publicly accessible” data

  • US law is more permissive about scraping “publicly accessible” data than EU or Chinese law.
  • hiQ Labs v. LinkedIn (Ninth Circuit, 2019 / 2022): held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act.
  • Breach of terms of service (TOS) can still be actionable, however.

  • COPPA: verifiable parental consent is required for children under 13.
  • California, New York, and other states: extend protections to minors under 18.
  • Training data that contains minors’ faces or voices risks stacked exposure under BIPA and COPPA.
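The age thresholds above can be sketched as a record-level flagging function. This is an illustration only, assuming a hypothetical record with a known `age` and a boolean saying whether it includes a face or voice sample; the flag names are made up for this example, and the under-18 flag stands in for state rules that differ from state to state.

```python
def minor_flags(age, has_face_or_voice=False):
    """Return review flags for one training record (hypothetical labels)."""
    flags = []
    if age < 13:
        # Federal layer: COPPA covers children under 13.
        flags.append("COPPA")
    if age < 18:
        # State layer: some states extend protections to minors under 18.
        flags.append("state-minor-rules")
    if has_face_or_voice:
        # Biometric content adds a BIPA review on top of any age-based flag.
        flags.append("BIPA-review")
    return flags


print(minor_flags(10, has_face_or_voice=True))
# ['COPPA', 'state-minor-rules', 'BIPA-review']
print(minor_flags(15))  # ['state-minor-rules']
```

The first call shows the stacking case: a single record of a young child's face or voice triggers every flag at once.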

  • Illinois BIPA has produced a surge in litigation over facial training data.
  • Clearview AI is the canonical case.
  • BIPA provides a private right of action, with liquidated damages of $1,000 per negligent violation and $5,000 per intentional or reckless violation.
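To see why per-violation damages matter at dataset scale, the arithmetic can be sketched as follows. The statutory amounts are $1,000 per negligent violation and $5,000 per intentional or reckless violation; how violations are counted is contested and is simply left as an input here, so the result is an illustrative ceiling rather than a prediction.

```python
NEGLIGENT_PER_VIOLATION = 1_000   # USD, negligent violation
RECKLESS_PER_VIOLATION = 5_000    # USD, intentional or reckless violation


def bipa_exposure(violations, intentional_or_reckless=False):
    """Multiply a caller-supplied violation count by the statutory amount."""
    if intentional_or_reckless:
        rate = RECKLESS_PER_VIOLATION
    else:
        rate = NEGLIGENT_PER_VIOLATION
    return violations * rate


# Hypothetical dataset with 10,000 counted violations.
print(bipa_exposure(10_000))                                # 10000000
print(bipa_exposure(10_000, intentional_or_reckless=True))  # 50000000
```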

  • No dedicated rule covers synthetic data.
  • De-identification: HIPAA has clear standards (Safe Harbor / Expert Determination); other domains remain vague.
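As an illustration of what a clear de-identification standard looks like in practice, the sketch below applies three Safe Harbor-style generalizations: dates reduced to the year, ages over 89 grouped as "90+", and ZIP codes cut to three digits (or "000" for sparsely populated areas). The field names, the short direct-identifier list, and the `restricted_zip3` set are assumptions for this example; Safe Harbor covers 18 identifier categories, of which this handles only a few.

```python
from datetime import date

# Hypothetical field names covering only a few of the identifier categories.
DIRECT_IDENTIFIER_FIELDS = {"name", "email", "phone", "ssn"}


def safe_harbor_generalize(record, restricted_zip3):
    """Drop direct identifiers and generalize date, age, and ZIP fields."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}
    # Dates tied to the individual keep only the year.
    if isinstance(out.get("admission_date"), date):
        out["admission_year"] = out.pop("admission_date").year
    # Ages over 89 are grouped into one category.
    if isinstance(out.get("age"), int) and out["age"] > 89:
        out["age"] = "90+"
    # ZIP codes keep three digits; low-population prefixes become "000".
    if out.get("zip"):
        zip3 = str(out["zip"])[:3]
        out["zip"] = "000" if zip3 in restricted_zip3 else zip3
    return out


record = {
    "name": "A. Patient",
    "age": 93,
    "zip": "03601",
    "admission_date": date(2024, 5, 17),
    "diagnosis_code": "J45",
}
# The restricted-prefix set is a placeholder supplied by the caller.
print(safe_harbor_generalize(record, restricted_zip3={"036"}))
```

The caller supplies the set of restricted three-digit prefixes because it depends on census population data, which this sketch does not embed.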

  • Federal layer: both China and the EU have comprehensive data-protection laws; the US does not.
  • Predictability: the US ranks lowest of the three (state-law variation plus unsettled fair-use litigation).
  • Enforcement: the US relies primarily on private litigation (BIPA, copyright, TCPA, etc.); China relies on CAC enforcement; the EU relies on DPAs.
  • Training-data summary: mandatory in the EU; absent in the US; absent in China (filing materials are not public).