
How to Enforce California’s AI Training Data Rights (CPRA) When Your Photos Were Scraped Without Consent

California consumers can demand AI companies stop using and delete unlawfully “shared” personal information—including photos—under the CPRA’s opt-out and deletion rights, enforced by the California Privacy Protection Agency (CPPA) and the Attorney General. In Los Angeles, San Francisco, San Diego, and statewide, photographers and everyday users are discovering their images were scraped into AI training sets without consent. This article explains how to use CPRA requests, evidence preservation, and regulatory complaints to enforce California’s AI training data rights.

When your photos end up inside an AI training dataset, the harm is rarely limited to “my picture is online.” A scraped image can be replicated, stylized, used to infer identity, or linked to sensitive attributes—all without notice or compensation. In California, the California Privacy Rights Act (CPRA) gives consumers powerful tools to limit and unwind this kind of data use, even when the AI developer claims it obtained the data “from the internet.”

This article focuses on how to enforce CPRA rights when your photos were scraped without consent, what to ask for in a CPRA request, how to preserve evidence, how to escalate to regulators, and when to consider related claims (copyright, right of publicity, unfair competition, and biometric/privacy theories). It is informational and not legal advice for any particular matter.

1) Why CPRA can apply to scraped photos used for AI training

The CPRA (California Civil Code § 1798.100 et seq., as amended) regulates many businesses that collect and use “personal information” of California residents. Photos can qualify as personal information if they identify, relate to, describe, are reasonably capable of being associated with, or could reasonably be linked—directly or indirectly—with a particular consumer or household. A headshot, a family photo, a candid street photo with recognizable faces, metadata-laden images, and images tied to usernames or accounts can all fit.

Key CPRA concepts in the AI training context

Collection includes “receiving” data. A company that ingests images scraped by a crawler, acquired from a third-party dataset, or downloaded from a public website may still be “collecting” personal information.

“Sharing” can trigger opt-out rights. CPRA draws a distinction between “selling” and “sharing” personal information, with “sharing” tied to cross-context behavioral advertising. Not every AI training use equals “sharing,” but many AI ecosystems involve downstream disclosures to affiliates, model hosting partners, ad-tech, analytics, or dataset vendors. Your enforcement strategy should be built to test those pathways with targeted requests.

Sensitive personal information may be implicated. Images can reveal health conditions, religious practice, union participation, precise geolocation (through embedded metadata), or biometric identifiers when used for face recognition. “Biometric information” is included in sensitive personal information under CPRA; whether a company created or used biometric templates from your photos is a critical factual question.

Businesses must honor consumer rights and limit use. CPRA provides rights to know/access, delete, correct, and opt out of sale/sharing. It also imposes data minimization and purpose limitation principles—businesses should not collect, use, retain, or share personal information beyond what is reasonably necessary and proportionate for disclosed purposes.

2) Threshold issue: does the AI company qualify as a CPRA “business”?

CPRA applies to a for-profit “business” doing business in California that meets at least one statutory threshold: annual gross revenue above the statutory amount ($25 million, subject to periodic inflation adjustment); buying, selling, or sharing the personal information of 100,000 or more consumers or households; or deriving 50 percent or more of annual revenue from selling or sharing personal information. Many AI developers and dataset vendors will meet one of these thresholds, but you should not assume. Your strategy should include:

Identifying the legal entity behind the model, dataset, API, or product (parent/subsidiary relationships matter).
Reviewing privacy policies and CPRA disclosures (look for a “California Privacy Notice,” “Do Not Sell or Share,” and methods to submit requests).
Mapping vendors and “service providers/contractors.” If the AI developer claims it is a service provider, it must have contractual restrictions and must process data only for specified business purposes. That claim can be tested through requests and follow-up questions.

3) Step 1 — Preserve evidence your photos were scraped

Before sending demands, preserve proof. Scraping disputes often devolve into “we never had your data” or “we can’t locate it.” Evidence preservation improves the odds of a meaningful response and future enforcement.

Practical evidence checklist

Source location: URL(s) where the photos were posted (profile pages, portfolio sites, social media posts, image CDNs).
Timestamps: When you posted the image; when you discovered AI use; keep screenshots with visible time/date where possible.
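Copies and hashes: Save the original files and compute file hashes (preferably SHA-256; MD5 is weaker) so you can show that a copy found elsewhere is byte-for-byte identical to your original. A resized or re-encoded copy will produce a different hash, so keep URLs and screenshots as well.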
Metadata: Preserve EXIF data if present (camera, location, date). Don’t overwrite originals.
AI outputs: If an AI tool generates images resembling your photo, capture prompts, outputs, and session logs.
Dataset references: If you found your image in a dataset index or LAION-style URL list, save the dataset entry and any associated identifiers.
Correspondence: Save emails, support tickets, and any prior DMCA takedowns.
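
The "copies and hashes" item in the checklist above can be sketched in a few lines of Python. This is a minimal illustration, not a forensic tool: the function names, the manifest fields, and the output file name are assumptions made for this example, and it uses only the Python standard library (3.9 or later).

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(files_to_urls: dict[str, str]) -> list[dict]:
    """Build one evidence record per original file.

    files_to_urls maps a local file path to the URL where you posted it.
    The field names below are hypothetical; use whatever your records need.
    """
    recorded_at = datetime.now(timezone.utc).isoformat()
    manifest = []
    for file_path, source_url in files_to_urls.items():
        path = Path(file_path)
        manifest.append({
            "file": path.name,
            "size_bytes": path.stat().st_size,
            "sha256": sha256_of_file(path),
            "source_url": source_url,
            "recorded_at_utc": recorded_at,
        })
    return manifest


def write_manifest(manifest: list[dict], out_path: str) -> None:
    """Save the manifest as JSON next to (not over) the original files."""
    Path(out_path).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

The files are opened read-only, so the originals and their EXIF data are left untouched. The recorded timestamp shows only when the manifest was created, not when the photo was taken or posted.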

Attorneys often send a litigation hold letter early, even before filing anything, particularly when there is a risk of routine deletion of training logs, ingestion pipelines, or vendor records.

4) Step 2 — Use CPRA to demand access/knowledge, deletion, and opt-out

CPRA requests are the core enforcement lever. The goal is to force the company to (1) confirm whether it has your images or derived data, (2) disclose sources and disclosures, and (3) delete and stop certain uses where required.

A. Right to know/access: what to request in an AI photo-scrape case

In addition to a standard “categories and specific pieces” request, tailor your ask to AI training realities:

Specific pieces of personal information: copies of your photos in their possession, and any associated metadata (URLs, timestamps, labels, captions, embeddings/feature vectors tied to you, faceprints/biometric templates if any).
Sources: the dataset name(s), vendors, scraping sources, and URLs where collected.
Purposes: training, fine-tuning, evaluation, safety filtering, marketing, product personalization, or other uses.
Disclosures: third parties or categories of recipients (model hosting platforms, affiliates, dataset vendors, ad-tech, analytics).
Retention: how long training data, embeddings, and logs are retained; whether deletion is propagated to backups and derived artifacts.
Automated decision-making: whether your images are used in profiling or automated decision-making technology (ADMT), and what logic/impacts are involved, if applicable under current regulations and guidance.

B. Deletion: be explicit about derivatives

A deletion request should be drafted to reach not just the raw image files, but also “derived data” and identifiers that can continue to affect you:

Delete copies of the photo(s) and associated metadata.
Delete or disassociate embeddings, feature vectors, labels, captions, and identity links tied to the photo(s) or to you.
Delete training records that associate your identifiers (name, handle, email, account ID) with the images.
Instruct contractors/service providers to delete downstream copies.

Expect the company to argue that deleting from a trained model is not feasible or that the request is subject to exceptions. That is precisely why the request should target all stored instances and derived datasets, and require a description of technical steps taken to comply.

C. Opt out of sale/sharing (and limit use of sensitive PI)

Even if the company disputes that “training” is a sale or sharing, your opt-out request creates a compliance record and forces the business to take a position in writing. Where images may be used for advertising-related purposes, opt-out rights become especially relevant.

If facial recognition, face clustering, or identity inference is involved, consider a request to limit the use and disclosure of sensitive personal information. Ask the company to confirm whether it created or used biometric identifiers or biometric information derived from your photos.

D. Identity verification and authorized agents

Companies can require verification before producing “specific pieces” of information. For photographers and creators, using an attorney as an authorized agent can streamline communications, set deadlines, and prevent scope creep. Keep verification proportional—do not provide unnecessary documents that increase identity theft risk.

5) Step 3 — Challenge common denials and “we can’t” responses

AI companies often respond with variations of:

“We don’t have your data.” Counter with file hashes, URLs, dataset entries, and screenshots; ask them to search by URL, hash, and any dataset identifiers.
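“We scraped only publicly available content.” Public availability is not a blanket CPRA exemption. The statute’s “publicly available” exclusion has limits; for example, it does not cover biometric information collected about a consumer without the consumer’s knowledge. Ask for the legal basis, notice provided, and whether they honor opt-out signals and deletion requests.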
“We can’t delete from a trained model.” Deletion obligations often focus on personal information in the business’s possession and data stores. Even if model “unlearning” is disputed, insist on deletion from datasets, logs, and derived tables, and on preventing re-ingestion.
“We’re a service provider.” Ask who the business is, what instructions they received, and whether the contract prohibits using the data to build or improve their own products.
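
To back up the URL-based counter to a “we don’t have your data” response, it can help to show exactly which rows of a dataset index point at your photos. The sketch below assumes you have a local copy of the index exported to CSV with a column named `url`. Both the file format and the column name are assumptions, since real datasets vary, and the function names are hypothetical.

```python
import csv
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path so trivial variants still compare equal.

    Drops the scheme, query string, and fragment, lowercases the host,
    and strips a trailing slash from the path.
    """
    parts = urlsplit(url.strip())
    return f"{parts.netloc.lower()}{parts.path.rstrip('/')}"


def find_matches(csv_path: str, my_urls: list[str], url_column: str = "url") -> list[dict]:
    """Return every index row whose URL matches one of your own URLs.

    Each result carries a "row" number (data rows start at 2, after the
    header line) plus the original columns, so the entry can be cited.
    """
    wanted = {normalize_url(u) for u in my_urls}
    matches = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row_number, row in enumerate(csv.DictReader(handle), start=2):
            candidate = row.get(url_column) or ""
            if normalize_url(candidate) in wanted:
                matches.append({"row": row_number, **row})
    return matches
```

Dropping the query string is a deliberate trade-off. It catches CDN variants of the same image path, but it can also produce false positives on sites that identify images only by query parameters, so check each match by hand before citing it.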

A useful follow-up is to ask for a sworn declaration (or at least a detailed written explanation) describing search methods, data stores queried, and deletion propagation steps.