Email thread deduplication

Last updated: 2026-05-16

Email thread deduplication collapses redundant copies of the same conversation so you review the full thread once instead of reading the same exchange across five custodian mailboxes. Hintyr keeps the inclusive email (the most complete copy of the thread) and hides the earlier duplicate replies from review. The hidden records stay in the case database so the audit trail remains complete. Run a preview first, then check the confirmation box and click Apply. Apply is blocked on cases under legal hold.

Deduplicate dialog: Emails tab

Pick a policy, scope, and BCC mode, then preview the run. The dialog shows three controls: Policy, Scope, and BCC handling.

Inclusive emails, explained

An inclusive email is the most complete copy of a thread. It contains the latest reply at the top with all earlier messages quoted underneath. If you read the inclusive copy, you've read the conversation.

Worked example. A four-message chain between Robin, Jordan, and Casey leaves four separate messages in each person's mailbox. After collection that's 12 emails (three mailboxes, four messages each) for one conversation. The fourth message (Robin's reply quoting everything above it) is the inclusive copy. Email dedup keeps that one visible. The other 11 stay in the case record but drop out of the default review surface. Same coverage, less reading. Inclusive-email logic is the approach the EDRM Processing Standards describe for thread suppression.
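
The arithmetic in the worked example can be checked with a short sketch. It assumes Global scope and that no earlier message carries unique content; the `Copy` class and its fields are hypothetical names for illustration, not Hintyr's data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Copy:
    custodian: str  # whose mailbox the copy was collected from
    position: int   # 1 = first message in the chain, 4 = last reply

custodians = ["Robin", "Jordan", "Casey"]
# Each of the three mailboxes holds all four messages of the chain.
collected = [Copy(c, p) for c in custodians for p in range(1, 5)]

# The last message quotes everything before it, so it is the inclusive copy.
last = max(copy.position for copy in collected)
# Keep one copy of the inclusive message visible; everything else is hidden.
visible = [next(c for c in collected if c.position == last)]
hidden = [c for c in collected if c not in visible]

print(len(collected), len(visible), len(hidden))  # 12 1 11
```

The hidden list includes Jordan's and Casey's copies of the fourth message as well as every copy of messages one to three, which is how 12 collected emails become one visible and 11 hidden.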

How to deduplicate email threads

  1. Open the case menu from the top navigation and click Deduplicate.
  2. Switch to the Emails tab.
  3. Pick a policy, scope, and BCC mode. Defaults work for the common case: Last reply plus unique, Global, Default (ignore BCC). For the worked examples and trade-offs see the email dedup strategy page.
  4. Click Preview. Hintyr returns a summary card with threads analyzed, inclusive count, hidden count, and reduction ratio. Sample threads list the hidden-copy count per thread.
  5. Read the permanence warning, check "I understand this cannot be undone", and click Apply. The dialog reports how many copies were hidden across how many threads.

Emails tab: after preview

Summary card, sample threads, and the permanence warning.

Preview results

  • Threads analyzed: 412
  • Inclusive: 391
  • Hidden: 856
  • Reduction: 69%

Sample threads in this preview

  • Re: Contract review draft 3 (5 hidden): 7 messages, 2 inclusive. Robin Vasquez, Jordan Hsu, Casey Lin. Mar 12-15, 2025.
  • FW: Quarterly numbers (3 hidden): 4 messages, 1 inclusive. Morgan Patel, Casey Lin. Apr 02-04, 2025.
  • Re: Meeting prep, Smith deposition (3 hidden): 5 messages, 2 inclusive. Eli Brennan, Robin Vasquez, Sam Rivera. Apr 22-25, 2025.
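
One way to sanity-check the summary card is to recompute the reduction figure yourself. The sketch below assumes the ratio is hidden copies divided by hidden plus inclusive copies. This page does not define the formula, so treat it as an assumption that happens to reproduce the 69% shown in the sample preview.

```python
def reduction_ratio(inclusive: int, hidden: int) -> float:
    """Share of thread copies that drop out of review (assumed formula)."""
    total = inclusive + hidden
    return hidden / total if total else 0.0

# Figures from the sample summary card.
ratio = reduction_ratio(inclusive=391, hidden=856)
print(f"{ratio:.0%}")  # 69%

# Per-thread check from the sample list: messages minus inclusive equals hidden.
samples = [(7, 2, 5), (4, 1, 3), (5, 2, 3)]
print(all(messages - kept == hid for messages, kept, hid in samples))  # True
```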

Options and fields

Action

Three values:

  • Preview (default): compute the result without changing anything. Use this to sanity-check the reduction ratio before you commit.
  • Apply: commit the change. Duplicate copies are hidden from review after a confirmation checkbox.
  • Report only: same compute as Preview, but the numbers are logged for reporting. No copies are hidden. Use this when you want a record of the would-be reduction without acting on it (for example, a billing or proportionality report).

Policy, Scope, BCC handling

These three controls shape which copies count as duplicates. For worked examples and trade-offs see the email dedup strategy page. Quick summary: Last reply plus unique catches earlier messages that have unique attachments or unique content; Last reply only keeps just the latest reply. Global dedups across the case; Per custodian dedups within each custodian. Default (ignore BCC) collapses sender and BCC copies; Strict (include BCC) treats them as distinct messages.
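
To picture how Scope and BCC handling change what counts as a duplicate, think of each copy as having a grouping key. The sketch below is an illustration only: `EmailCopy`, `dedup_key`, and the option strings are hypothetical names, and Policy is left out because it governs which copies are kept rather than how they are grouped.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmailCopy:
    content_id: str            # hypothetical fingerprint of the message content
    custodian: str             # mailbox the copy was collected from
    is_bcc_copy: bool = False  # True when this copy was delivered via BCC

def dedup_key(copy: EmailCopy, scope: str = "global", bcc_mode: str = "default") -> tuple:
    """Copies that share a key are treated as duplicates of each other."""
    key = [copy.content_id]
    if scope == "per_custodian":
        key.append(copy.custodian)    # only dedup within one custodian
    if bcc_mode == "strict":
        key.append(copy.is_bcc_copy)  # keep BCC copies distinct
    return tuple(key)

sender = EmailCopy("msg-1", "Robin")
bcc = EmailCopy("msg-1", "Casey", is_bcc_copy=True)

print(dedup_key(sender) == dedup_key(bcc))  # True
print(dedup_key(sender, bcc_mode="strict") == dedup_key(bcc, bcc_mode="strict"))  # False
print(dedup_key(sender, scope="per_custodian") == dedup_key(bcc, scope="per_custodian"))  # False
```

With the defaults the sender and BCC copies collapse into one group. Switching to Strict or Per custodian splits them apart, which lowers the reduction ratio but preserves more distinctions.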

Permanence warning banner

The banner appears under the preview summary. It reads: "This action hides duplicate copies from review. The records stay in the case database. You can't reverse this from the review interface." The last sentence is the one to weigh before you commit: once applied, hidden copies can't be brought back into the review surface from the interface. The full record (custodian information, original storage paths, dates received) stays available for production-side reconciliation.

Confirm-irreversible checkbox

After the preview completes, an "I understand this cannot be undone" checkbox appears next to the Apply button. Apply stays disabled until the box is checked. The two-step confirmation guards against a misclick during a long review session.

What gets hidden, what stays

Hidden copies are removed from the review surface but their record is retained, so you never lose evidence. This is review-side suppression, not destruction. On the inclusive copy's record the system stamps the custodian list, original paths, and dates received for every subsumed copy so the underlying provenance survives. Hidden copies remain queryable for production-side audit, and they reappear in production exports when the protocol calls for full inclusion.
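
The provenance stamping described above can be sketched as follows. This is a minimal illustration, not Hintyr's schema: the `CopyRecord` class, its field names, and the sample paths and dates are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class CopyRecord:
    custodian: str
    original_path: str
    date_received: str
    hidden: bool = False
    # Provenance of every copy this record stands in for.
    subsumed: list = field(default_factory=list)

def hide_under(inclusive: CopyRecord, duplicates: list) -> CopyRecord:
    """Stamp each duplicate's provenance on the inclusive copy, then hide it."""
    for dup in duplicates:
        inclusive.subsumed.append({
            "custodian": dup.custodian,
            "original_path": dup.original_path,
            "date_received": dup.date_received,
        })
        dup.hidden = True  # suppressed from review; the record itself is kept
    return inclusive

# Illustrative records only.
keep = CopyRecord("Robin", "/mail/robin/sent/0004.eml", "2025-03-15")
dups = [
    CopyRecord("Jordan", "/mail/jordan/inbox/0004.eml", "2025-03-15"),
    CopyRecord("Casey", "/mail/casey/inbox/0004.eml", "2025-03-15"),
]
hide_under(keep, dups)
print([p["custodian"] for p in keep.subsumed])  # ['Jordan', 'Casey']
print(all(d.hidden for d in dups), len(dups))   # True 2
```

Nothing is deleted in this model: the duplicates are flagged rather than removed, so a production export that needs the full set can still read them.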

Hintyr identifies the most complete version

Hintyr automatically identifies the most complete version of each thread so you see the full conversation in one place. The algorithm reconstructs the reply tree from headers and quoted text, then picks the inclusive copy. If a mid-thread reply has a unique attachment or a one-off comment that isn't quoted in any later message, the default policy keeps that one too. You don't lose evidence to the dedup pass.
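
The header half of that reconstruction can be sketched with Python's standard `email` module. This is an illustration under simplifying assumptions, not Hintyr's algorithm: it links each message to its parent through In-Reply-To (falling back to the last References entry) and treats messages with no replies as candidates for the inclusive copy. The helper names and sample messages are hypothetical, and the quoted-text comparison is not modelled.

```python
from email import message_from_string

def parent_id(msg):
    """Return the Message-ID this message replies to, or None for a thread root."""
    in_reply_to = (msg.get("In-Reply-To") or "").split()
    if in_reply_to:
        return in_reply_to[0]
    references = (msg.get("References") or "").split()
    return references[-1] if references else None

def build_children(messages):
    """Map each parent Message-ID to the IDs of its direct replies."""
    children = {}
    for msg in messages:
        parent = parent_id(msg)
        if parent is not None:
            children.setdefault(parent, []).append(msg["Message-ID"].strip())
    return children

def leaves(messages):
    """Messages nobody replied to: candidates for the inclusive copy."""
    children = build_children(messages)
    ids = [m["Message-ID"].strip() for m in messages]
    return [mid for mid in ids if mid not in children]

raw = [
    "Message-ID: <1@example.com>\nSubject: Draft\n\nFirst message",
    "Message-ID: <2@example.com>\nIn-Reply-To: <1@example.com>\n"
    "References: <1@example.com>\nSubject: Re: Draft\n\nReply\n> First message",
    "Message-ID: <3@example.com>\nReferences: <1@example.com> <2@example.com>\n"
    "Subject: Re: Draft\n\nSecond reply\n> Reply\n>> First message",
]
msgs = [message_from_string(r) for r in raw]
print(leaves(msgs))  # ['<3@example.com>']
```

A leaf is only a candidate. A branch that ends in a short reply still needs the quoted-text check to confirm it contains everything above it.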

When the case is on legal hold

Preview is read-only and allowed during a hold. Apply is blocked until the hold is released. The banner in the dialog spells out why and points to the hold details. Release the hold first, then re-open the dialog.
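
Taken together with the confirmation checkbox, the rules for when Apply is available reduce to a small predicate. The function and parameter names below are hypothetical; the sketch only restates the behavior described on this page.

```python
def apply_enabled(preview_done: bool, confirmed: bool, on_legal_hold: bool) -> bool:
    """Apply needs a finished preview and a checked box, and never runs under a hold."""
    if on_legal_hold:
        return False  # legal hold blocks Apply outright
    return preview_done and confirmed

print(apply_enabled(preview_done=True, confirmed=True, on_legal_hold=False))  # True
print(apply_enabled(preview_done=True, confirmed=False, on_legal_hold=False))  # False
print(apply_enabled(preview_done=True, confirmed=True, on_legal_hold=True))  # False
```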

Confirm and legal-hold-blocked states

The confirmation checkbox gates Apply; legal hold blocks Apply outright.

  • Confirm state: "You're about to hide 856 duplicate copies across 412 threads. Confirm to proceed."
  • Case on legal hold: "Preview results stay visible while the hold is in effect. Apply is disabled."


Edge cases and limits

  • Apply can't be reversed from the review interface. Run Preview first.
  • Only emails are subject to thread dedup. Other file types use the Files tab.
  • You can run email dedup repeatedly. Subsequent runs only act on new threads introduced by later uploads.
  • Forwarded threads whose quoted text has been reformatted are still detected. The algorithm tolerates whitespace and quoting drift.
  • Hidden copies still appear in production exports when the protocol calls for inclusion of the full set. Review-side suppression doesn't propagate to production.
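
The tolerance for whitespace and quoting drift mentioned in the list above can be pictured as a normalization step before bodies are compared. The sketch below is a deliberately simple stand-in, not Hintyr's algorithm: it strips leading `>` quote markers, collapses whitespace, lowercases, and then checks whether every line of an earlier message appears in a later one. The function names are hypothetical.

```python
import re

def normalize(text: str) -> list:
    """Strip quote markers and collapse whitespace, one cleaned line per entry."""
    lines = []
    for line in text.splitlines():
        line = re.sub(r"^(\s*>)+\s?", "", line)    # drop leading '>' markers
        line = re.sub(r"\s+", " ", line).strip()  # collapse runs of whitespace
        if line:
            lines.append(line.lower())
    return lines

def subsumes(later_body: str, earlier_body: str) -> bool:
    """True when every normalized line of the earlier body appears in the later one."""
    later = set(normalize(later_body))
    return all(line in later for line in normalize(earlier_body))

earlier = "Can you send   the draft?\nThanks"
later = "Sure, attached.\n> Can you send the draft?\n>  Thanks"

print(subsumes(later, earlier))  # True
print(subsumes(earlier, later))  # False
```

Line-set containment like this ignores line order and rewrapped paragraphs, so a production matcher would need something more robust. The point here is only that cosmetic drift should not break the match.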

Frequently asked questions

Can I undo email deduplication after I apply it?
No. Apply is permanent from the review side. The hidden copies stay in the case record, so the evidence isn't lost, but you can't bring them back into the default review surface from the dialog. Run Preview first and confirm the reduction ratio looks reasonable before you click Apply.

Does deduplication run on cases under legal hold?
Preview runs. Preview is read-only and produces the same numbers you would get if you applied the change. Apply is blocked until the hold is released. The dialog shows a banner explaining the block.

What is an inclusive email?
The most complete copy of a thread. It contains the latest reply at the top with all earlier messages quoted underneath. Reading the inclusive copy is equivalent to reading the whole conversation, so Hintyr can hide the earlier duplicate replies and still leave the case reviewer with full coverage.

Will deduplication affect what shows up in a production export?
No. Email dedup is review-side suppression, not destruction. Production exports follow the protocol you negotiate with opposing counsel. If your protocol calls for inclusive-only production, that's a separate export choice. If it calls for the full set, the hidden copies are still available.

How does Hintyr know which copy is the most complete?
Hintyr reconstructs the reply tree from email headers (Message-ID, In-Reply-To, References) and from quoted text. The inclusive copy is the one whose body subsumes all earlier replies. If a mid-thread message has a unique attachment or a comment that doesn't appear in any later reply, the default policy keeps that one too.