Introduction: Discovery in the Age of AI#
Discovery in AI litigation presents challenges unlike any the legal system has previously faced. Traditional e-discovery concerns (email preservation, document production, metadata integrity) seem quaint compared to the complexities of preserving a machine learning model, obtaining training data that may encompass billions of data points, or compelling production of algorithms that companies claim as their most valuable trade secrets.
Yet effective discovery is the key to successful AI litigation. Plaintiffs cannot prove that an AI system was defectively designed without access to design documentation. They cannot establish bias without analyzing training data. They cannot demonstrate negligent deployment without operational logs. Defendants, meanwhile, need discovery to understand what plaintiffs actually allege and to develop evidence supporting their defenses.
This guide provides practical strategies for both sides of AI litigation, covering evidence preservation, discovery requests, trade secret challenges, and expert discovery.
Preserving AI Evidence#
The Litigation Hold Imperative#
Upon anticipation of litigation, parties have a duty to preserve relevant evidence. In AI cases, this obligation extends far beyond traditional documents to encompass the AI system itself and all associated data.
Triggering Events for AI Litigation Holds:
- Complaint or regulatory investigation concerning AI system
- Internal discovery of AI malfunction or harm
- Customer complaints alleging AI-caused injury
- Adverse media coverage of AI practices
- Receipt of litigation hold notice from potential adversary
Unique Challenges:
Unlike static documents, AI systems are dynamic: continuously learning, being updated, and potentially being retrained. A model that exists today may be fundamentally different tomorrow. This creates preservation obligations that traditional e-discovery frameworks struggle to address.
What to Preserve: The AI Evidence Inventory#
Effective AI preservation requires identifying and securing multiple categories of evidence:
1. The AI Model Itself
The trained model, the actual software artifact that makes predictions or decisions, is primary evidence:
- Model architecture (code defining the neural network structure or algorithm)
- Trained weights and parameters
- Version history showing model evolution
- Deployment configurations
Preservation Method: Create a complete copy of the model in its deployed state. Document the exact version, including any version numbers, commit hashes, or deployment timestamps. Store in a forensically sound manner.
2. Training Data
The data used to train the model is often central to AI disputes, particularly bias claims:
- Raw training data
- Data preprocessing and transformation records
- Data labeling and annotation (human or automated)
- Data quality assessments
- Data sourcing documentation
Preservation Challenges: Training datasets can be enormous: terabytes or petabytes of images, text, or transactions. Preservation may require significant storage resources. Document chain of custody carefully.
3. Testing and Validation Data
Separate from training data, testing data is used to evaluate model performance:
- Test datasets and their composition
- Validation methodology documentation
- Benchmark results
- Bias testing results
- Performance metrics across subpopulations
4. Operational Logs and Telemetry
Once deployed, AI systems generate operational data:
- Input data received by the system
- Predictions or decisions made
- Confidence scores or uncertainty metrics
- Error logs and exceptions
- User feedback and corrections
- A/B testing data
Retention Challenge: Many organizations purge operational logs after short periods. Immediate preservation is critical.
5. Development Documentation
The human decisions behind AI development are highly relevant:
- Design documents and specifications
- Model selection rationale
- Training decisions and hyperparameter choices
- Internal communications about model performance
- Risk assessments and ethical reviews
- Deployment approval documentation
6. Governance and Compliance Records
Evidence of AI oversight (or lack thereof):
- AI governance policies and procedures
- Model risk management documentation
- Regulatory submissions and approvals
- Audit reports and findings
- Incident reports and remediation
7. Human-in-the-Loop Evidence
For systems involving human oversight:
- Human review rates and outcomes
- Override documentation
- Escalation procedures and records
- Training materials for human reviewers
Preservation Best Practices#
Issue Targeted Holds
The litigation hold notice should specifically address AI-related evidence categories:
Sample Language: “You must preserve all data, models, code, documentation, and communications relating to [AI System Name], including but not limited to: the trained model and all prior versions; training, validation, and test datasets; model architecture and code; operational logs and telemetry; design and development documentation; and internal communications regarding system performance, errors, or bias.”
Engage Technical Personnel
Litigation hold implementation requires technical expertise:
- Involve ML engineers and data scientists
- Identify where AI artifacts are stored (cloud, on-premises, version control)
- Understand CI/CD pipelines that might overwrite evidence
- Document preservation procedures taken
Prevent Automatic Deletion
AI systems often have automated data lifecycle management:
- Suspend routine log rotation and purging
- Disable automatic model retraining
- Preserve cloud resources scheduled for deletion
- Document all deletion prevention measures
Create Forensic Copies
For critical AI artifacts:
- Use forensically sound copying methods
- Calculate and document hash values
- Maintain chain of custody records
- Store copies in secure, write-protected environments
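The hash-and-manifest step above can be sketched in a few lines of Python using only the standard library. The file layout and field names here are illustrative, not a prescribed forensic standard; real matters should follow the protocols of a qualified forensic vendor.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks to handle large artifacts."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(artifact_dir: Path) -> dict:
    """Record a hash, size, and preservation timestamp for every file in the preserved set."""
    return {
        "preserved_at": datetime.now(timezone.utc).isoformat(),
        "artifacts": [
            {
                "file": str(p.relative_to(artifact_dir)),
                "sha256": sha256_of(p),
                "bytes": p.stat().st_size,
            }
            for p in sorted(artifact_dir.rglob("*"))
            if p.is_file()
        ],
    }
```

Storing the resulting manifest alongside the chain-of-custody record lets either party later recompute the hashes and demonstrate that the preserved artifacts are bit-for-bit unaltered.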
Document the Dynamic System
If the AI system cannot be frozen entirely:
- Document the state at time of preservation
- Implement logging of all subsequent changes
- Preserve both pre-litigation and ongoing versions
- Maintain clear timestamps and version tracking
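For a system that cannot be frozen, the change-logging steps above amount to an append-only record of every subsequent version. A minimal sketch, assuming a JSON-lines log file and hypothetical field names chosen for illustration:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def log_change(log_path: Path, version: str, description: str,
               model_bytes: bytes) -> dict:
    """Append a timestamped, hash-stamped entry to an append-only change log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "version": version,
        "description": description,
        # Hashing the model at each change ties the log entry to a specific artifact state.
        "model_sha256": hashlib.sha256(model_bytes).hexdigest(),
    }
    with log_path.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Because entries are only ever appended, the log itself becomes evidence of what changed after the preservation date and when.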
Crafting Discovery Requests for AI Systems#
Interrogatories#
Interrogatories in AI litigation should seek foundational information about the AI system:
System Identification and Overview:
Identify each AI system that was involved in the decision(s) at issue in this litigation, including the system’s name, version, and date of deployment.
For each AI system identified, describe: (a) the system’s purpose and intended use; (b) the type of AI technology employed (machine learning, deep learning, rule-based, etc.); (c) the specific algorithm or model architecture used; and (d) whether the system was developed in-house, purchased from a vendor, or obtained through open source.
Identify all individuals involved in the design, development, training, testing, deployment, and ongoing monitoring of each AI system at issue.
Training and Data:
Describe the training data used for each AI system, including: (a) the source(s) of the training data; (b) the approximate quantity of data; (c) the time period covered by the data; (d) any data preprocessing, cleaning, or transformation performed; and (e) any known limitations or biases in the training data.
State whether any protected characteristics (race, gender, age, disability, etc.) were included in the training data, directly or through proxies, and if so, describe how such data was used.
Testing and Validation:
Describe all testing conducted on each AI system before deployment, including: (a) the methodology used; (b) the metrics evaluated; (c) the results obtained; and (d) any deficiencies identified.
State whether bias testing or disparate impact analysis was conducted, and if so, describe the methodology and results.
Operation and Monitoring:
Describe the role of human oversight in the AI system’s operation, including whether and when humans review, override, or approve AI decisions.
Identify all known errors, malfunctions, or complaints involving each AI system, and describe any remedial measures taken.
Requests for Production#
Document requests should target specific AI artifacts:
Models and Code:
All versions of the AI model(s) at issue, including source code, trained weights, and deployment configurations.
All documentation describing the architecture, algorithms, or logic of the AI system(s) at issue.
All version control histories, commit logs, or change documentation for the AI system(s) at issue.
Data:
Representative samples of training data used to develop the AI system(s) at issue, sufficient to evaluate data quality and potential bias.
All testing data used to validate the AI system(s) at issue.
All operational logs showing inputs to and outputs from the AI system(s) at issue for the time period relevant to this litigation.
Development Documentation:
All design documents, specifications, or requirements for the AI system(s) at issue.
All documents concerning the selection of algorithms, model architectures, or training approaches for the AI system(s) at issue.
All documents concerning known or potential risks, limitations, or biases of the AI system(s) at issue.
Testing and Performance:
All documents concerning testing, validation, or quality assurance of the AI system(s) at issue.
All documents concerning the accuracy, error rate, or performance of the AI system(s) at issue.
All bias audits, disparate impact analyses, or fairness assessments of the AI system(s) at issue.
Governance and Compliance:
All policies, procedures, or guidelines concerning the development, deployment, or monitoring of AI systems.
All AI governance, ethics review, or risk management documentation for the AI system(s) at issue.
All regulatory submissions, certifications, or approvals for the AI system(s) at issue.
Incidents and Complaints:
All documents concerning complaints, incidents, or errors involving the AI system(s) at issue.
All internal communications (emails, chat logs, etc.) concerning problems, concerns, or failures of the AI system(s) at issue.
Requests for Admission#
Requests for admission can establish foundational facts efficiently:
Admit that [Company] developed the AI system known as [System Name].
Admit that [System Name] was used to [make the decision at issue] on [date].
Admit that [Company] did not conduct bias testing on [System Name] before deployment.
Admit that [System Name] was trained on data that included [specific data type].
Admit that no human reviewed the AI’s recommendation before it was [acted upon/communicated to plaintiff].
Deposition Topics (30(b)(6))#
For corporate defendants, Rule 30(b)(6) depositions should cover:
The development history and technical architecture of the AI system(s) at issue.
The collection, curation, and use of training data for the AI system(s) at issue.
Testing and validation procedures for the AI system(s) at issue.
Known limitations, errors, or biases in the AI system(s) at issue.
Human oversight and review procedures for AI-generated decisions.
AI governance policies and practices.
Complaints, incidents, or errors involving the AI system(s) at issue.
Post-deployment monitoring and updates to the AI system(s) at issue.
Handling Proprietary Algorithms and Trade Secrets#
The Trade Secret Obstacle#
AI developers routinely resist discovery by claiming trade secret protection. Their arguments typically include:
- Model architecture represents proprietary innovation
- Training data includes competitively sensitive information
- Performance metrics reveal competitive advantages
- Production would enable reverse engineering
These claims are not frivolous: AI systems can represent billions of dollars in development investment. But they cannot categorically defeat legitimate discovery needs.
Legal Framework#
Trade Secret Privilege
Unlike attorney-client privilege, trade secret protection is not absolute. Courts balance the discovery need against potential harm from disclosure.
Factors courts consider:
- Relevance and importance of the requested information
- Availability of the information from other sources
- Safeguards that could protect against disclosure
- Competitive harm from disclosure
- Party’s willingness to offer alternatives
Protective Orders
The primary mechanism for balancing these interests is the protective order under Rule 26(c).
Protective order provisions for AI discovery:
- Confidentiality tiers: “Confidential” and “Highly Confidential - Attorneys’ Eyes Only” designations
- Expert access: Provisions for technical experts to review AI systems
- Source code protocols: Special procedures for reviewing source code
- No reverse engineering: Prohibitions on using produced information to recreate the AI system
- Limited retention: Return or destruction of materials after litigation
- Use limitations: Restrictions on using produced information for any purpose other than the litigation
Strategies for Plaintiffs#
Establish Relevance and Necessity
Before seeking AI discovery, build the record:
- Document why the AI system is central to claims
- Explain why general descriptions are insufficient
- Identify specific questions that only AI discovery can answer
- Show that less intrusive alternatives have been exhausted
Propose Protective Measures
Pre-empt objections by proposing safeguards:
- Offer to use a neutral expert who will sign confidentiality agreements
- Propose source code review protocols (secure room, no copies)
- Accept redaction of truly irrelevant proprietary elements
- Agree to return/destroy materials after litigation
Challenge Overbroad Claims
Trade secret protection has limits:
- Demand specificity about what exactly is claimed as trade secret
- Challenge blanket designations covering clearly non-secret materials
- Point out that information already in public domain cannot be trade secret
- Argue that protection was waived where the defendant disclosed the information to customers or regulators
Seek Alternative Access
If direct production is blocked, consider alternatives:
- Interrogatories requiring description of AI operation
- Inspection of AI system in controlled environment
- Demonstration of AI system processing sample inputs
- Production of outputs/logs without underlying model
Strategies for Defendants#
Document Trade Secret Status
Before litigation, establish trade secret protection:
- Maintain confidentiality (access controls, NDAs, etc.)
- Document trade secret value and competitive sensitivity
- Ensure consistent treatment across the organization
Object Specifically
Generic “trade secret” objections are disfavored. Instead:
- Identify exactly what is claimed as trade secret
- Explain why each element qualifies
- Describe the harm that would result from disclosure
- Propose specific protective measures
Offer Alternatives
Courts appreciate good faith efforts to accommodate discovery:
- Provide high-level descriptions or diagrams
- Allow expert inspection under controlled conditions
- Produce related documentation without revealing core secrets
- Create demonstrative summaries of AI operation
Negotiate Protective Orders
Work with opposing counsel to develop protective measures:
- Limit access to outside counsel and designated experts
- Use secure review rooms for sensitive materials
- Prohibit copies of source code or model weights
- Include clawback provisions for inadvertent disclosure
Source Code Review Protocols#
For particularly sensitive AI code, courts may order special procedures:
Secure Review Environment:
- Inspection in person at counsel’s office or neutral site
- Air-gapped computer with no network access
- No copying, photographing, or screen capture capability
- Limited note-taking (handwritten notes subject to review)
- Supervised by defendant’s counsel
Expert Qualifications:
- Expert must sign confidentiality agreement
- Expert may be barred from working for competitors for specified period
- Expert’s notes and opinions remain confidential
- Expert may prepare sanitized summary for use in proceedings
Time Limitations:
- Specified number of hours for review
- Extensions require showing of good cause
- No fishing expeditions; targeted review only
Expert Discovery#
Disclosures Under Rule 26(a)(2)#
AI expert disclosures require particular attention:
Expert Report Contents:
Under Rule 26(a)(2)(B), expert reports must include:
- Complete statement of opinions and basis/reasons
- Data or information considered
- Exhibits to be used
- Qualifications (including publications)
- Cases testified in (prior 4 years)
- Compensation
AI-Specific Considerations:
- Ensure expert clearly articulates methodology for evaluating AI systems
- Document all AI artifacts and data reviewed
- Identify any technical tools or analyses performed
- Address known limitations in available evidence
Deposing AI Experts#
Deposing opposing AI experts requires technical preparation:
Qualification Probing:
- Specific experience with the type of AI at issue
- Understanding of relevant technical standards
- Publication and peer review history
- Potential bias from consulting relationships
Methodology Examination:
- What methodology did you use?
- What is the scientific basis for that methodology?
- How was that methodology validated?
- What are the known error rates or limitations?
Opinion Testing:
- What facts did you assume?
- Did you verify those assumptions?
- What alternative explanations did you consider?
- What would change your opinion?
Daubert Preparation:
- Is this methodology generally accepted?
- Has it been peer-reviewed?
- Can it be tested?
- What is the error rate?
Expert Access to AI Systems#
Experts often need hands-on access to AI systems:
Types of Access:
- Static review: Examination of code, documentation, and data
- Black box testing: Running inputs through the system and observing outputs
- White box analysis: Full access to model internals
- Forensic analysis: Technical examination of system artifacts
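Black-box testing, in particular, requires no access to the model's internals: the expert simply runs a battery of inputs through the system and records the paired outputs. A minimal sketch of that workflow, where `toy_credit_model` is a trivial stand-in for the system under review (not any real scoring model):

```python
from typing import Callable, Iterable

def black_box_audit(model: Callable[[dict], dict],
                    test_cases: Iterable[dict]) -> list:
    """Run each input through the model and record the paired input/output,
    treating the model as an opaque function."""
    return [{"input": case, "output": model(case)} for case in test_cases]

# Illustrative stand-in for the system under review; the scoring rule is invented.
def toy_credit_model(applicant: dict) -> dict:
    score = 0.5 + 0.01 * min(applicant.get("income", 0) // 10_000, 40)
    return {"approved": score >= 0.7, "confidence": round(score, 2)}

records = black_box_audit(
    toy_credit_model,
    [{"income": 30_000}, {"income": 250_000}],
)
```

Systematically varying one input attribute at a time (income, zip code, and so on) across such a battery is also a common way for experts to probe for disparate treatment without ever seeing the source code.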
Negotiating Access:
- Specify scope of access (what systems, what data)
- Define permitted activities (observation only vs. active testing)
- Establish supervision requirements
- Address confidentiality obligations
Common Disputes:
- Defendant refuses any expert access → motion to compel
- Defendant allows only limited observation → argue necessity of fuller access
- Defendant demands excessive supervision → seek reasonable protocols
- Expert needs to copy data for analysis → address in protective order
Special Discovery Issues#
Third-Party AI Vendors#
When the AI system was provided by a third party:
- Subpoena the vendor for documents and testimony
- Seek discovery from defendant about vendor selection and contracts
- Determine whether vendor is indispensable party
- Address indemnification and finger-pointing issues
Multi-Jurisdictional Discovery#
AI systems often involve data and operations across jurisdictions:
- GDPR and international data protection may restrict certain discovery
- Consider Hague Convention procedures for foreign evidence
- Address choice of law for privilege claims
- Navigate cross-border data transfer restrictions
Regulatory Files#
If regulators have investigated the AI system:
- FOIA/state public records requests for non-exempt materials
- Subpoena regulatory agency for testimony
- Seek discovery of party’s regulatory submissions
- Address regulatory privilege claims
Preservation Sanctions#
Failure to preserve AI evidence can result in sanctions:
Spoliation Analysis for AI:
- Duty to preserve existed (anticipated litigation)
- Relevant AI evidence was destroyed or altered
- Destruction was with culpable state of mind
- Lost evidence was relevant to claims/defenses
Common AI Spoliation Scenarios:
- Model retrained, destroying prior version
- Training data deleted per retention policy
- Operational logs purged before preservation
- Cloud resources terminated
Potential Sanctions:
- Adverse inference instruction
- Exclusion of evidence
- Issue preclusion
- Monetary sanctions
- Default judgment (extreme cases)
Practical Checklists#
Plaintiff’s AI Discovery Checklist#
- Identify all AI systems potentially involved in claims
- Issue litigation hold to preserve plaintiff’s own AI-related evidence
- Research defendant’s AI technology (public information, patents, publications)
- Retain technical consultant to assist with discovery planning
- Draft targeted interrogatories covering system overview
- Draft document requests for models, data, and documentation
- Draft 30(b)(6) notice with AI-specific topics
- Prepare for trade secret objections with proposed protective order
- Plan expert discovery strategy
Defendant’s AI Preservation Checklist#
- Issue immediate litigation hold to all relevant personnel
- Inventory all AI systems potentially relevant to claims
- Preserve current model versions and all prior versions
- Suspend automatic data deletion for training data and logs
- Document AI system architecture and operation
- Identify and preserve all development documentation
- Collect and preserve internal communications about AI system
- Engage forensic vendor if needed for preservation
- Document all preservation steps taken
Protective Order Checklist#
- Define confidentiality tiers
- Specify who may access each tier
- Address expert access to confidential materials
- Include source code review protocols if needed
- Prohibit reverse engineering or competitive use
- Require return/destruction at conclusion
- Include clawback provisions
- Address inadvertent disclosure
- Specify challenge procedures for designations
Frequently Asked Questions#
How long do we need to preserve AI models and training data?#
Preserve until litigation concludes, including appeals. Some organizations retain AI artifacts longer for regulatory or business reasons. When in doubt, preserve: destruction after a litigation hold has been triggered can result in severe sanctions.
Can we require defendants to “run” their AI system on our test cases?#
Possibly. Courts have ordered parties to demonstrate AI systems in controlled settings. This can be valuable for understanding system behavior without requiring full source code access. Frame it as an inspection under Rule 34.
What if the AI system has been updated since the events at issue?#
This is common and doesn’t defeat discovery. Seek both the current version and historical versions from the relevant time period. Version control systems and deployment logs can help reconstruct historical states.
How do we handle training data that includes third-party personal information?#
Protective orders can address privacy concerns. Techniques include anonymization, sampling, and limiting access to designated experts. GDPR and state privacy laws may impose additional requirements.
Can defendants refuse to produce AI systems because production would enable competitors to copy their technology?#
No categorical refusal is permitted, but legitimate trade secret concerns deserve protection. Propose tiered access, expert-only review, or alternative forms of discovery. Courts will balance discovery need against competitive harm.
What constitutes adequate preservation of an AI model?#
Preserve the complete model (code, weights, configurations) in a form that could be redeployed if needed. Document the technical environment. For complex systems, engage technical experts to ensure preservation is complete.
How do we authenticate AI-generated evidence?#
Foundation testimony from a witness with knowledge of the AI system’s operation. This typically means technical personnel who can testify about system reliability, proper functioning, and accuracy of the specific output.
Can we use our own AI tools for e-discovery in AI litigation?#
Yes, with appropriate validation. Document the AI tools used, their accuracy rates, and quality control measures. Be prepared for challenges to AI-assisted review methodology. Human review of key documents remains advisable.
Conclusion#
Discovery in AI litigation requires adaptation of traditional e-discovery practices to address novel evidence types and unique challenges. The AI system itself (models, training data, operational logs) constitutes evidence that must be preserved and can be discovered. Trade secret claims, while legitimate, cannot defeat discovery entirely when appropriate protections are in place.
Successful AI litigation depends on mastering these discovery tools. Plaintiffs must craft targeted requests that will yield meaningful evidence while anticipating objections. Defendants must implement robust preservation practices while protecting legitimate confidentiality interests. Both sides need technical expertise to handle AI evidence effectively.
As AI litigation matures, discovery practices will continue to evolve. Attorneys who understand both the technology and the legal framework for accessing it will be best positioned to serve their clients.
This resource is updated regularly as AI discovery practices evolve. Last updated: January 2025.