E-Discovery

GenAI vs. TAR for Small Firm Document Review

Your next matter has 18,000 documents, a $300K exposure, and opposing counsel who will challenge whatever method you pick. Here is what the cost data and case law actually say.

Alexander Cohan, Ph.D.

Legal technology researcher and data scientist specializing in AI governance for litigation teams. Expertise in NLP and AI-assisted document review.

For matters under 25,000 documents, the GenAI vs. TAR decision comes down to cost, court precedent, and what you can defend.

What GenAI Document Review Actually Offers

The Promise

In the GenAI vs. TAR document review debate, small firms hear a lot of noise. Here’s what actually matters: GenAI review has real strengths. Zero-shot classification means no training set, no seed set, no 200-document ramp-up before the model starts producing useful results. You describe what you’re looking for in plain English, and the model applies that instruction across the collection. For a small firm that has never touched TAR, the onboarding difference is real.

Speed is the other genuine advantage. GenAI platforms process thousands of documents per hour; Hintyr handles roughly 5,000. For a firm facing an expedited discovery deadline with 15,000 documents, GenAI can classify the entire set in an afternoon. TAR 2.0, by contrast, needs roughly 200 coded documents before its model becomes effective, and manual review at 50 documents per hour would occupy two reviewers for nearly four weeks.

And GenAI provides something TAR never could: a written rationale for each coding decision. When the model tags a document as responsive, it explains why, citing specific passages. That per-document paper trail is a transparency advantage that traditional relevance scoring can’t match.

But speed and accessibility are only two variables. The third is cost. And the fourth is what happens when opposing counsel files a motion to compel.

The Real Cost of GenAI Document Review for Small Firms

Running the Numbers

There’s a common assumption that generative AI is automatically cheaper than every alternative. The numbers tell a more complicated story.

The ComplexDiscovery/EDRM Winter 2026 eDiscovery Pricing Survey pegs GenAI document review at $0.11 to $0.50 per document. Rob Robinson of EDRM put the upper bound even higher in August 2024, noting that GenAI review is “currently more expensive on a per-document basis (up to $0.60-$0.70/document) compared to TAR.”

Why? Because GenAI reviews every document individually. Each file gets its own LLM inference call, whether it’s a smoking-gun email or a lunch invitation. TAR 2.0 ranks the entire collection and only surfaces predicted-relevant documents for human review. In a collection with 10% richness, TAR effectively skips 90% of documents after a lightweight scoring pass, while GenAI charges you for all of them.

Run the numbers on a typical small-firm matter: 25,000 documents, 10% richness, 2,500 relevant documents.

GenAI at market rates: $0.11 to $0.50 per document across all 25,000 documents equals $2,750 to $12,500 in classification fees alone, before platform licensing, hosting, or the human QC overlay that any defensible workflow requires.

TAR 2.0/CAL: Software costs run below $75/GB for most providers (30.2% of Winter 2026 survey respondents reported sub-$75/GB pricing). Your subject matter expert codes 500 to 2,000 training documents, then reviewers examine the 5,000 to 7,500 documents the model flags as potentially relevant. The irrelevant majority is never individually processed at full cost.

Manual linear review: At $0.50 per document (contract attorneys at $25/hour reviewing 50 documents per hour), 25,000 documents cost $12,500. Expensive, but predictable, with no technology overhead.

Hintyr’s GenAI classification runs $0.045 to $0.134 per document, depending on document length and complexity. At those rates, the same 25,000-document collection costs $1,125 to $3,350 in classification fees: roughly a quarter to a tenth of the $12,500 manual review benchmark, and well below the $2,750 to $12,500 market GenAI range.
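These comparisons are easy to rerun for your own collection size. Here is a minimal sketch in Python using the per-document rates quoted above; TAR 2.0/CAL is left out because its pricing is per-GB rather than per-document:

```python
# Back-of-the-envelope classification fees for the 25,000-document
# scenario above. Rates are the figures quoted in this article; adjust
# DOCS and the rate ranges for your own matter.

DOCS = 25_000  # collection size; use 5_000 for the small-collection example

RATES_PER_DOC = {
    "GenAI (Winter 2026 survey range)": (0.11, 0.50),
    "GenAI (Hintyr published range)":   (0.045, 0.134),
    "Manual linear review":             (0.50, 0.50),
}

for method, (low, high) in RATES_PER_DOC.items():
    print(f"{method:34s} ${DOCS * low:>9,.0f} - ${DOCS * high:>9,.0f}")
# GenAI (Winter 2026 survey range)   $    2,750 - $   12,500
# GenAI (Hintyr published range)     $    1,125 - $    3,350
# Manual linear review               $   12,500 - $   12,500
```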

Those per-document fees only tell half the story. The other half is labor. Manual review of 25,000 documents at 50 per hour requires 500 reviewer-hours before any keyword culling. Two reviewers working full time need roughly six weeks. Even staffing up to four, you’re looking at roughly three weeks and $12,500 to $17,500 in reviewer wages at $25-$35/hour, on top of whatever the hosting platform costs.

GenAI collapses that timeline. Classification finishes in hours. A single legal assistant spends one to three days doing quality-control review of borderline and flagged documents. Total labor cost: $200 to $750 instead of $12,500+. The technology fee is real, but the labor savings dwarf it.
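The labor side scripts just as easily. A sketch under the same assumptions (50 documents per reviewer-hour, 40-hour weeks, $25-$35/hour contract rates):

```python
# Labor comparison under the assumptions in the text: 50 documents per
# reviewer-hour, 40-hour work weeks, contract reviewers at $25-$35/hour.

def manual_review_labor(docs, reviewers, docs_per_hour=50, week_hours=40,
                        rate_range=(25, 35)):
    reviewer_hours = docs / docs_per_hour
    elapsed_weeks = reviewer_hours / (reviewers * week_hours)
    wages = (reviewer_hours * rate_range[0], reviewer_hours * rate_range[1])
    return reviewer_hours, elapsed_weeks, wages

hours, weeks, (lo, hi) = manual_review_labor(25_000, reviewers=2)
print(f"{hours:,.0f} reviewer-hours, ~{weeks:.1f} weeks elapsed, "
      f"${lo:,.0f}-${hi:,.0f} in wages")
# 500 reviewer-hours, ~6.2 weeks elapsed, $12,500-$17,500 in wages
```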

That labor gap is the real story for small firms. A 25,000-document review means two people occupied full-time for weeks. A midsize firm absorbs that. A five-attorney practice cannot spare two legal assistants for a month and a half and still keep other matters moving. GenAI lets small firms take on document-intensive cases they would previously have declined or referred out. The per-document cost savings are real, but the staffing relief is what changes the calculation. One person, a few days, same result. And faster turnaround means your attorneys move to deposition, mediation, or trial sooner.

Even for small collections, the math favors GenAI if you already have a platform. Under 5,000 documents after keyword culling, manual review at $0.50 per document costs $2,500 and ties up two reviewers for a week. GenAI classification on Hintyr runs $225 to $670 for the same set and finishes in hours, freeing your team for substantive work. If you don’t have a GenAI-capable platform and don’t expect future document-intensive matters, manual review still works. But as the proportionality analysis under Rule 26 makes clear, the cost advantage has flipped.

Cost alone doesn’t settle this. The question a magistrate judge will ask is different from the question your accounting department asks.

GenAI Defensibility: No Da Silva Moore Moment Yet

The Precedent Gap

Technology-assisted review has fourteen years of case law. In Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012), Magistrate Judge Andrew Peck issued the first published opinion approving predictive coding, telling the bar that counsel “no longer have to worry about being the ‘first’ or ‘guinea pig’ for judicial acceptance of computer-assisted review.” Three years later, in Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125, 128 (S.D.N.Y. 2015), Judge Peck declared TAR “black letter law” and endorsed continuous active learning. In Hyles v. New York City, 2016 WL 4077114 (S.D.N.Y. Aug. 1, 2016), the court stated that “for most cases today, TAR is the best and most efficient search tool.”

That body of precedent is something you can cite in a brief. The Sedona Conference’s TAR Case Law Primer catalogs over eighty-five decisions. Judges and opposing counsel understand the framework.

Generative AI has none of this.

As Winston & Strawn observed in their January 2026 analysis, “the judiciary has not delivered the kind of watershed opinion for generative AI that Judge Andrew Peck’s decisions in Da Silva Moore and Rio Tinto provided for technology-assisted review over a decade ago.” No court has ruled on whether GenAI may be relied upon for final production decisions, how its error rates should be measured, or what validation protocols suffice.

The legal picture around AI is getting less certain, not more. In February 2026, two federal courts reached opposite conclusions on whether using AI waives privilege. In United States v. Heppner (S.D.N.Y. Feb. 17, 2026), Judge Rakoff held that content processed through a public AI tool was not protected by attorney-client privilege. One week earlier, in Warner v. Gilbarco (E.D. Mich. Feb. 10, 2026), a different court held that AI tools “are tools, not persons” and did not waive work product. If courts can’t agree on whether AI use waives privilege, they’re far from endorsing AI for production decisions.

To be fair: the defensibility framework for GenAI is developing, and Winston & Strawn predicted courts will “treat generative AI by analogy to prior TAR jurisprudence.” DISCO argues that defensibility comes from “transparency, traceability, and human oversight,” not the model itself. Both points have merit. But “courts will probably apply existing principles” is not the same as “courts have accepted this methodology.” For a small firm weighing AI privilege risk, that gap matters.

Judge Peck said in 2012 that counsel no longer have to worry about being the guinea pig for TAR. For GenAI, that worry is still very much alive.

How to Choose the Right AI Review Method for Your Matter

The Practical Test

There’s no single right answer. The right review method depends on four variables: document volume, case value, timeline pressure, and what your opposing counsel is likely to do about it.

Under 5,000 documents (after keyword culling). If your platform includes GenAI, use it. At $0.045 to $0.134 per document, classifying 5,000 documents costs $225 to $670 and finishes in hours, versus $2,500 and a week of manual review. If you don’t have a GenAI platform and don’t plan to subscribe for a single matter, manual review with keyword-assisted prioritization still works and avoids methodology disputes entirely.

5,000 to 25,000 documents. This is the contested zone where secondary factors drive the decision. If you need results today (emergency TRO, expedited deadline), GenAI classification can process the full set in hours. If you have a week, CAL-prioritized review gets you an effective model after roughly 200 coded documents and lets reviewers work through the collection in relevance-ranked order. For matters in the $100,000 to $500,000 range, either approach works so long as you budget for a human QC layer.

Case value drives technology investment. A $100,000 contract dispute cannot absorb $15,000 in e-discovery technology costs and remain economically rational. For matters under $500,000, limit technology spend to platforms you already subscribe to. For matters above $500,000, the full range of tools is justified and increasingly expected. Judge Peck’s proportionality standard from Da Silva Moore still governs: the method must produce results “at a cost proportionate to the ‘value’ of the case.”

Factor in opposing counsel. If opposing counsel is sophisticated on e-discovery, expect a meet-and-confer on methodology. TAR has established protocol frameworks and over a decade of case law to cite. GenAI has neither. If you anticipate a dispute, TAR gives you the Sedona Conference guidelines, Judge Peck’s opinions, and the validation template from In re Broiler Chicken Antitrust Litigation, No. 1:16-cv-08637 (N.D. Ill.). GenAI gives you vendor whitepapers.

Match the method to your team. Not all GenAI platforms are equal. Using a raw LLM API (OpenAI, Anthropic, Google) for document review requires prompt engineering skill, produces results that are difficult to defend in court, and has no built-in audit trail. Purpose-built legal review platforms handle prompt design and validation internally, so attorneys interact with review criteria in plain language, not model configuration. TAR 2.0 requires a subject matter expert available for iterative training batches. Keywords and manual review require a team but no specialized technical skill. Be honest about what your firm can execute well; the tool matters less than the workflow around it.

Timeline is where GenAI wins outright. If you have 72 hours before a production deadline and 15,000 documents, GenAI’s same-day classification is the only realistic option. TAR 2.0 needs days of iterative training. Manual review needs a staffed team. Under time pressure, GenAI’s speed premium is worth the per-document cost, and you can always run a validation test afterward to confirm quality before certifying production.
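Condensed into code, the framework in this section looks roughly like the sketch below. The thresholds are the practitioner rules of thumb described above, not validated cut-offs, and the function name and return strings are illustrative only:

```python
# A sketch of the decision thresholds in this section. These are
# practitioner rules of thumb, not empirically validated cut-offs.

def suggest_review_method(docs: int, case_value: float, deadline_hours: float,
                          has_genai_platform: bool,
                          expects_methodology_dispute: bool) -> str:
    if deadline_hours <= 72:
        # Under extreme time pressure, same-day GenAI classification plus
        # a post-hoc validation test is the only realistic option.
        return "GenAI first pass, validate before certifying production"
    if expects_methodology_dispute:
        # TAR has protocol frameworks and 85+ decisions to cite.
        return "TAR 2.0/CAL with an agreed validation protocol"
    if docs < 5_000:
        return ("GenAI" if has_genai_platform
                else "Manual review with keyword-assisted prioritization")
    if docs <= 25_000:
        return "GenAI or CAL-prioritized review, plus a human QC layer"
    # Above the contested zone, keep spend proportionate to case value.
    return ("Full TAR/CAL or GenAI workflow" if case_value > 500_000
            else "Keyword culling first, then reassess volume")

print(suggest_review_method(15_000, 300_000, 72, True, False))
# GenAI first pass, validate before certifying production
```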

One more thing worth acknowledging: only 7% of small firms (2-9 attorneys) currently use TAR, according to the ABA’s 2024 TechReport. The top barrier isn’t cost or case size; 76% cite unfamiliarity. GenAI may close that gap simply by being easier to start with. AI adoption among small and solo firms nearly doubled from 27% to 53% between 2023 and 2025 (Smokeball 2025 State of Law Report), and document review is the top use case. The adoption curve is accelerating regardless of the precedent gap.

These thresholds reflect practitioner consensus, not empirical studies. The most important factor for small firms is usually what platform you already have. The worst outcome is buying a new platform for a single matter.

The best answer for most small matters may not be choosing one over the other.

Statistical Validation Makes Any Review Method Defensible

Both, Not Either

The GenAI-vs.-TAR framing assumes you have to pick a side. You don’t. The real question isn’t which technology tags your documents. It’s whether you can prove the results are defensible.

TAR’s defensibility has never come from the classification algorithm itself. It comes from the statistical validation layer: control sets that measure precision, elusion tests that estimate what you missed, and confidence intervals courts have accepted since Da Silva Moore. That validation framework is independent of the classifier. It works on human-coded documents, TAR-ranked documents, and GenAI-tagged documents.

Here’s what that looks like in practice. After GenAI tags 20,000 documents as responsive or non-responsive, you draw a statistically valid random sample from the responsive set (an L1 Control Set) and have a senior reviewer independently grade each document. The system calculates precision: what share of documents tagged responsive actually are responsive. Then you draw a separate sample from documents tagged non-responsive (an L2 Elusion Test) and check how many responsive documents were missed. The result is a defensibility report with recall estimates and confidence intervals that you can attach to a Rule 26(g) certification or present at a proportionality hearing.
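The math behind those two tests is compact enough to show. Below is a simplified sketch using the normal approximation; production platforms typically use exact binomial or Wilson intervals, and the tagged split and graded-sample counts here are hypothetical:

```python
import math

# L1 control set -> precision; L2 elusion test -> what the review missed.
# Simplified normal-approximation math; the sample counts and the
# 6,000 / 14,000 tagged split below are hypothetical.

Z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def sample_size(confidence=0.95, margin_of_error=0.05, p=0.5):
    """Documents to grade for a proportion estimate (worst case p=0.5)."""
    z = Z[confidence]
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)

def proportion_ci(hits, n, confidence=0.95):
    """Point estimate plus a normal-approximation confidence interval."""
    p = hits / n
    half = Z[confidence] * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)

n = sample_size()  # 385 docs per sample at 95% confidence / +-5% margin

# Hypothetical grading results: 350/385 of the tagged-responsive sample
# confirmed responsive (L1); 8/385 of the tagged-non-responsive sample
# found responsive (L2).
precision, _, _ = proportion_ci(hits=350, n=n)
elusion, _, _ = proportion_ci(hits=8, n=n)

found = precision * 6_000    # estimated true positives produced
missed = elusion * 14_000    # estimated responsive docs left behind
recall = found / (found + missed)
print(f"precision ~{precision:.0%}, elusion ~{elusion:.1%}, recall ~{recall:.0%}")
# precision ~91%, elusion ~2.1%, recall ~95%
```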

Hintyr’s TAR validation workflow is built for exactly this. The platform is methodology-agnostic: you can apply L1 and L2 validation to any tag in the system, regardless of how that tag was generated. GenAI classification, keyword search, manual review, or CAL-prioritized coding all produce tags. The validation math works the same way. Creating a validation test takes minutes: pick a tag, set your confidence level and margin of error, and the system calculates the sample size, draws the sample, and starts the grading workflow.

Most review platforms force a choice: use our built-in AI, or handle validation yourself. Hintyr separates the classification step from the validation step. Use GenAI’s speed for first-pass coding. Then prove the results with the same statistical rigor that has been accepted in court for over a decade. You get GenAI’s accessibility without sacrificing the defensibility framework that matters when opposing counsel challenges your methodology.

That combination is what small firms actually need: GenAI speed without the defensibility gap, statistical rigor without the six-figure platform contract. You do not have to choose between fast and court-tested. GenAI does the heavy lifting; the validation gives you the case law.

This post is for informational purposes and does not constitute legal advice. Pricing data reflects the ComplexDiscovery/EDRM Winter 2026 survey and vendor-published rates as of April 2026; verify current rates before making technology decisions. Case citations are provided for reference and may not reflect the most recent developments in your jurisdiction.

Stop Guessing. Validate It.

Hintyr lets you run TAR-grade statistical validation on any review methodology, including GenAI. See exactly how your review is performing before you certify it to the court.