File deduplication

Last updated: 2026-05-16

File deduplication finds files with byte-identical content across your case, even when filenames differ. Open the dialog from the case menu, switch to the Files tab, preview the duplicate groups, and remove the extras. The earliest upload in each group stays; the rest go. Useful after big imports, after multiple custodians contributed overlapping documents, and right before production.

Deduplicate dialog: Files tab

Review duplicate groups and confirm before removing.

Deduplicate

Found 3 duplicate groups (7 files) across 482 scanned files.

Group 1: Contract_MSA_2024.pdf
2 copies
Identical content, different filenames. Earliest copy kept; 1 file removed.
Group 2: Email_Thread_RE_Settlement.eml
3 copies
Identical content, collected from three custodians. Earliest copy kept; 2 files removed.
Group 3: Financial_Records_Q3.xlsx
2 copies
Identical content. Earliest copy kept; 1 file removed.

What counts as a duplicate

Hintyr compares file content, not filenames. Two files with different names but identical bytes are duplicates. For example, Contract_MSA_2024.pdf and Contract_MSA_2024 (1).pdf have the same content, so they land in the same group. Detection is exact: near-duplicates (one extra comma, a different scan resolution) are not grouped here. Use Emails mode for fuzzy thread-level overlap.
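Exact content matching of this kind is commonly done by fingerprinting each file's bytes. The sketch below is only an illustration of the idea, not Hintyr's documented implementation: it assumes SHA-256 as the fingerprint, and the function name group_by_content and the third sample filename are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_by_content(files: dict[str, bytes]) -> list[list[str]]:
    """Group filenames whose content is byte-identical.

    `files` maps a filename to its raw bytes (a stand-in for files in a case).
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, content in files.items():
        # The digest depends only on the bytes, never on the filename.
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    # Only digests shared by two or more files form a duplicate group.
    return [names for names in by_digest.values() if len(names) > 1]


sample = {
    "Contract_MSA_2024.pdf": b"%PDF-1.7 master services agreement",
    "Contract_MSA_2024 (1).pdf": b"%PDF-1.7 master services agreement",
    # Hypothetical near-duplicate: one extra comma, so a different digest.
    "Contract_MSA_2024_draft.pdf": b"%PDF-1.7 master services agreement,",
}
groups = group_by_content(sample)
```

In this sketch the two identically named-but-for-a-suffix copies form one group, while the copy with the extra comma is left out because a single changed byte produces a different digest.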

How to deduplicate files

  1. Open the case menu from the top navigation and click Deduplicate.
  2. Confirm the dialog opens on the Files tab. If it doesn't, switch to it.
  3. With dry run enabled (default), Hintyr scans the case and shows duplicate groups. Each group lists filenames, sizes, upload dates, and the copy flagged to keep.
  4. Review each group. By default the earliest upload is the keep copy. Click a different row to change which copy stays.
  5. Turn dry run off, then click Run to remove every copy not flagged to keep.

Options and fields

Dry run toggle

On by default. Dry run computes the groups without removing anything, so you can review the preview before any file leaves the case. Turning dry run off and clicking Run commits the change. Keep dry run on the first time you run dedup on a fresh case so you can sanity-check the count.
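The preview-then-commit behavior can be pictured with a small sketch. Everything here is an assumption made for illustration: the function run_dedup, the report keys, the convention that the keep copy is listed first in each group, and every filename other than the three group names from the sample dialog.

```python
def run_dedup(case: set[str], groups: list[list[str]], dry_run: bool = True) -> dict[str, int]:
    """Summarize duplicate groups and, unless dry_run is set, remove the extras.

    Each group lists filenames with the keep copy first (an assumption of
    this sketch). `case` is modified only when dry_run is False.
    """
    to_remove = [name for group in groups for name in group[1:]]
    if not dry_run:
        case.difference_update(to_remove)
    return {
        "groups": len(groups),
        "files_in_groups": sum(len(group) for group in groups),
        "would_remove": len(to_remove),
    }


# Group sizes mirror the sample dialog: 2, 3, and 2 copies.
groups = [
    ["Contract_MSA_2024.pdf", "Contract_MSA_2024 (1).pdf"],
    ["Email_Thread_RE_Settlement.eml", "settlement_copy_b.eml", "settlement_copy_c.eml"],
    ["Financial_Records_Q3.xlsx", "q3_records_copy.xlsx"],
]
case = {name for group in groups for name in group} | {"Unique_Memo.docx"}

preview = run_dedup(case, groups)            # dry run: report only
size_after_preview = len(case)               # unchanged by the preview
run_dedup(case, groups, dry_run=False)       # commit: extras are removed
size_after_commit = len(case)
```

With group sizes of 2, 3, and 2, the preview counts 3 groups and 7 files, of which 4 would be removed. The case itself changes only on the second call.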

Group display

Each duplicate group displays:

  • Filename: the name of each copy.
  • File size: the size on disk, useful as a quick visual check that the copies match.
  • Upload date: when the file landed in the case.
  • Keep badge: the copy flagged to retain. By default it's the earliest upload.

Choosing which copy to keep

Click a different file in a group to flip the keep selection. The kept file retains its full record: tags, redactions, custodian assignments, Bates numbers, comments, and folder location. Anything tied to a removed file is gone with that file. If a non-default copy carries the tags or redactions you want to preserve, switch the keep flag to that row before you run.
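The keep rule can be sketched as follows: the earliest upload is kept unless a different copy is chosen. This is a minimal illustration, not Hintyr's implementation; the Copy type, the plan_removal function, the upload dates, and the two custodian filenames are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Copy:
    filename: str
    uploaded: date


def plan_removal(group: list[Copy], keep: Optional[str] = None) -> tuple[Copy, list[Copy]]:
    """Return (kept copy, copies to remove) for one duplicate group.

    With no override the earliest upload is kept; `keep` names a different
    filename to retain instead.
    """
    if keep is None:
        kept = min(group, key=lambda c: c.uploaded)
    else:
        matches = [c for c in group if c.filename == keep]
        if not matches:
            raise ValueError(f"{keep!r} is not in this group")
        kept = matches[0]
    removed = [c for c in group if c is not kept]
    return kept, removed


group = [
    Copy("Email_Thread_RE_Settlement.eml", date(2026, 3, 2)),
    Copy("RE_Settlement_custodian_b.eml", date(2026, 3, 9)),
    Copy("RE_Settlement_custodian_c.eml", date(2026, 3, 14)),
]

default_kept, default_removed = plan_removal(group)
chosen_kept, chosen_removed = plan_removal(group, keep="RE_Settlement_custodian_c.eml")
```

Either way exactly one copy survives per group, which is why any tags or redactions you care about need to be on the kept copy before the run.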

When to run file dedup

Common scenarios:

  • After importing a large batch where the same document was collected from multiple sources.
  • When team members upload independently and some overlap.
  • Before production, so the same document isn't produced twice.
  • When consolidating collections from custodians who shared the same attachments.

Edge cases and limits

  • Removal is permanent. There's no trash. Re-upload if you need a removed copy back.
  • Tags, redactions, comments, Bates numbers, and custodian assignments on removed files go with them. Only the kept copy retains its metadata.
  • Cases on legal hold can't run file dedup. Release the hold first, then re-open the dialog.
  • You can run file dedup repeatedly. If no duplicates remain, the dialog reports none found.

Frequently asked questions

Can I undo file deduplication after removing files?
No. Removing duplicate files is permanent. Review each group carefully with dry run on before you commit. If you accidentally remove a file, re-upload it.
Does deduplication compare file content or just filenames?
Content. Files with different names but identical bytes are detected as duplicates. Filenames don't influence the match.
What happens to tags and redactions on removed files?
They go with the removed file. Tags, redactions, comments, Bates numbers, and custodian assignments are deleted along with the file. The kept copy retains its own metadata.
Can I run file deduplication multiple times?
Yes. Run it as often as you want. Later runs find only duplicates introduced by uploads since the previous run. If no duplicates remain, the tool reports none.
Why does the dialog show the earliest upload as the keep copy?
The earliest upload is the safest default because it has been in the case longest and so usually carries the most tags and review work. If a later copy has the metadata you care about, click that row to flip the keep flag before you run.