What Is a Load File? DAT, OPT, and Modern Production Formats
A load file is the structured text bundle that ties your TIFFs, natives, and metadata together. Get it wrong and the production bounces on first import. Get it right and nobody notices.

Legal technology researcher and data scientist specializing in AI governance for litigation teams. Expertise in NLP and AI-assisted document review.

What a Load File Actually Is
If you’ve never put a production together, “load file” sounds like one thing. It isn’t. A load file is a bundle that travels with your TIFFs and natives so opposing counsel’s platform knows what each page is, where it begins, where it ends, and what metadata belongs to it. EDRM puts the point cleanly: a load file makes the raw image come “alive” by tying every TIFF, every text file, and every native back to its document family. Without it, the receiving party gets a folder of disconnected pages.
The Sedona Conference Glossary, Fifth Edition, anchors the legal vocabulary. A load file “relates to a set of scanned images or electronically processed files, and that indicates where individual pages or files belong together as documents, to include attachments, and where each document begins and ends.” Sedona Conference Glossary, 5th ed., entry “Load File” (2020). Federal courts cite the Sedona Glossary as authority. See, e.g., Race Tires Am., Inc. v. Hoosier Racing Tire Corp., 674 F.3d 158, 161 (3d Cir. 2012). That’s the definition you reach for when writing your first ESI protocol.
You’re already familiar with the broader e-discovery workflow. Load files sit at the production boundary, where your processing tool hands off to the receiving review platform. That’s where most avoidable mistakes happen, and if you don’t know what’s in your load file, you can’t QC the production going out or the one coming in.
The Four Formats You’ll Actually See
You’ll meet four formats with any regularity, and the two dominant ones are DAT and OPT. A DAT (Concordance-style delimited text) carries document-level metadata, one row per document. An OPT (Opticon cross-reference) carries page-level image links, one row per page. Together they hand the review platform what it needs to reconstruct your production.
The other two are LFP and EDRM XML. LFP is IPRO’s image cross-reference format, similar in purpose to OPT but built from code-prefixed records (IM, VOLM, OF) rather than seven comma-separated fields. You’ll see LFP when the receiving party uses IPRO Eclipse, IPRO Allegro, or ADD. Almost everywhere else (Relativity, Everlaw, DISCO, Reveal, Logikcull, Nextpoint, Lexbe), OPT wins. EDRM XML 1.2 was supposed to replace the proprietary load-file zoo and didn’t. The dominant workflow remains DAT plus OPT plus folders of natives, single-page TIFFs, and extracted-text files.
A practical note: “Concordance load files” in a protocol means DAT plus OPT. “IPRO format” means DAT plus LFP. “EDRM XML” means ask which version, because half the platforms that nominally support it haven’t exercised that code path in years.
DAT Files: The Metadata Backbone
Open any DAT in a hex editor and the first thing you’ll see is three characters that don’t appear on a US keyboard. The pilcrow (¶, ASCII 020) separates fields, the thorn (þ, ASCII 254) qualifies text, and the registered symbol (®, ASCII 174) marks line breaks within a field. The 254 and 174 values are those characters’ positions in Windows-1252 (Latin-1). Some platforms count the thorn as 231, its position in the CP850 code page; the glyph is identical, the code page differs, and a misread can turn your delimiters into garbage. Real human content rarely contains those three glyphs, which is why they were chosen. They’re not bulletproof, and the delimiter-conflicts entry under Common Mistakes below covers what happens when one slips through.
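To see why the code page matters, here is a minimal Python sketch (standard library only) that prints the bytes each default delimiter becomes under Windows-1252, CP850, and UTF-8. The dictionary keys and the function name are illustrative, not any platform’s API.

```python
# Concordance-style default delimiters as Unicode code points. In Python,
# chr(254) is "þ" and chr(174) is "®" because Latin-1 uses those positions.
DELIMITERS = {
    "field": "\x14",        # ASCII 020, the field separator displayed as ¶
    "qualifier": "\u00fe",  # þ, the text qualifier
    "newline": "\u00ae",    # ®, the in-field line break
}


def delimiter_bytes(encoding: str) -> dict[str, str]:
    """Return each delimiter's byte sequence, as uppercase hex, in `encoding`."""
    return {name: ch.encode(encoding).hex(" ").upper() for name, ch in DELIMITERS.items()}


for enc in ("cp1252", "cp850", "utf-8"):
    print(enc, delimiter_bytes(enc))
```

The thorn comes out as FE (254) under cp1252, E7 (231) under cp850, and the two bytes C3 BE under UTF-8. A reader that guesses the wrong one never finds its text qualifier.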
The first row of a DAT is the header, which names every column. A typical header, with the þ and ¶ rendered for legibility:
þBegBatesþ¶þEndBatesþ¶þBegAttachþ¶þEndAttachþ¶þCustodianþ¶þFromþ¶þToþ¶þSubjectþ¶þDateSentþ¶þFileNameþ¶þMD5Hashþ¶þNativeLinkþ
Each subsequent row is one document. The District of Delaware codified the industry-consensus field list in its Default Standard for Discovery, Schedule A: Custodian, File Path, Email Subject, From, To, CC, BCC, Date Sent, Time Sent, Date Received, Time Received, Filename, Author, Date Created, Date Modified, MD5 Hash, File Size, File Extension, Control Number Begin, Control Number End, Attachment Range, Attachment Begin, and Attachment End. When a receiving party objects that your DAT is “incomplete,” they’re usually comparing to that list.
Two fields anchor each document: BegBates and EndBates, the inclusive Bates range. If you’ve never numbered a production before, the Bates numbering primer walks through the convention. BegAttach and EndAttach carry the family relationship.
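Reading a DAT under those conventions is mostly splitting and unwrapping. A minimal Python sketch, assuming a well-formed file (one record per physical line, no stray delimiters inside values); `parse_dat` is a hypothetical helper, and production readers handle far more edge cases.

```python
FIELD_SEP = "\x14"   # ASCII 020, displayed as ¶
QUOTE = "\u00fe"     # þ, ASCII 254 text qualifier
NEWLINE = "\u00ae"   # ®, ASCII 174 in-field line break


def parse_field(raw: str) -> str:
    """Strip the þ qualifiers and turn ® back into real line breaks."""
    if len(raw) >= 2 and raw.startswith(QUOTE) and raw.endswith(QUOTE):
        raw = raw[1:-1]
    return raw.replace(NEWLINE, "\n")


def parse_dat(text: str) -> list[dict[str, str]]:
    """Map each document row to {header name: value}; the first row is the header."""
    lines = [line.rstrip("\r") for line in text.split("\n") if line.strip()]
    if not lines:
        return []
    header = [parse_field(f) for f in lines[0].split(FIELD_SEP)]
    records = []
    for line in lines[1:]:
        values = [parse_field(f) for f in line.split(FIELD_SEP)]
        if len(values) != len(header):
            # A mismatch usually means a stray delimiter inside a value.
            raise ValueError(f"expected {len(header)} fields, got {len(values)}: {line!r}")
        records.append(dict(zip(header, values)))
    return records


sample = (
    "þBegBatesþ\x14þEndBatesþ\x14þSubjectþ\n"
    "þPRODABC00000001þ\x14þPRODABC00000003þ\x14þQ3 forecast®see attachedþ\n"
)
print(parse_dat(sample))
```

The field-count check is the cheapest QC you can run: a row that splits into the wrong number of fields almost always has a delimiter hiding in its content.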
OPT and LFP Files: The Image Cross-Reference
Where the DAT carries metadata, the OPT carries the image map. It’s comma-delimited, one row per page. The DOJ Antitrust Division’s Standard Specifications for the Production of ESI defines the structure: a page-level comma-delimited file with seven fields per line, namely PageID, VolumeLabel, ImageFilePath, DocumentBreak, FolderBreak, BoxBreak, PageCount. Field one is the page identifier (must equal the image filename minus its extension). Field three is the relative path from the delivery root. Field four is a Y flag marking the first page of a unique document. Fields five and six (folder and box markers) are rarely populated. Field seven is the page count on the first page of each document.
A short OPT for a four-page production:
PRODABC00000001,PRODABC001,\IMAGES\001\PRODABC00000001.tif,Y,,,3
PRODABC00000002,PRODABC001,\IMAGES\001\PRODABC00000002.tif,,,,
PRODABC00000003,PRODABC001,\IMAGES\001\PRODABC00000003.tif,,,,
PRODABC00000004,PRODABC001,\IMAGES\001\PRODABC00000004.tif,Y,,,1
Document one runs from PRODABC00000001 through 00000003; document two is one page. The Y tells the platform where one document ends and the next begins. Drop a Y and the platform glues two unrelated documents together. Add a Y to the wrong row and you split a document in half.
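Here is a minimal Python sketch of what the receiving platform does with those rows: a Y in field four opens a new document, and field seven is checked against the pages actually found. It assumes the seven-field layout described above and no commas inside image paths; `opt_documents` is a hypothetical helper, not a platform API.

```python
import csv
import io


def opt_documents(opt_text: str) -> list[tuple[str, int]]:
    """Group OPT page rows into documents, returned as (first page ID, page count)."""
    docs: list[list] = []  # each entry: [first_page_id, declared_count or None, pages_seen]
    for row in csv.reader(io.StringIO(opt_text)):
        if not row:
            continue  # skip blank lines
        if len(row) != 7:
            raise ValueError(f"expected 7 fields, got {len(row)}: {row}")
        page_id, _volume, _image_path, doc_break, _folder, _box, page_count = row
        if doc_break.strip().upper() == "Y":
            # A Y opens a new document; field seven declares its page count.
            declared = int(page_count) if page_count.strip() else None
            docs.append([page_id, declared, 0])
        elif not docs:
            raise ValueError(f"first row {page_id} lacks the Y document break")
        docs[-1][2] += 1
    for first, declared, seen in docs:
        if declared is not None and declared != seen:
            raise ValueError(f"{first}: page count says {declared}, found {seen} pages")
    return [(first, seen) for first, _declared, seen in docs]


SAMPLE_OPT = r"""PRODABC00000001,PRODABC001,\IMAGES\001\PRODABC00000001.tif,Y,,,3
PRODABC00000002,PRODABC001,\IMAGES\001\PRODABC00000002.tif,,,,
PRODABC00000003,PRODABC001,\IMAGES\001\PRODABC00000003.tif,,,,
PRODABC00000004,PRODABC001,\IMAGES\001\PRODABC00000004.tif,Y,,,1
"""

print(opt_documents(SAMPLE_OPT))  # [('PRODABC00000001', 3), ('PRODABC00000004', 1)]
```

Remove the Y from the last row and the function raises instead: document one declares three pages but now carries four. That is the glued-documents failure, caught before import rather than after.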
LFP is the IPRO equivalent and uses a code-prefixed schema: a VOLM record names the volume, and IM records describe each image. An IM record carries a D flag in its document-break field to mark the start of a unique document. It’s the same information as OPT, just different syntax. If your receiving platform takes both, send OPT.
Parent-Child Families and the BegAttach/EndAttach Rule
A document family is a parent (usually an email) plus its children (usually attachments). The DAT preserves the family through BegAttach and EndAttach, which together mark the contiguous Bates range belonging to one logical family. For a stand-alone document, BegAttach equals BegBates and EndAttach equals EndBates. For a parent email with three attachments numbered immediately after it, every record carries the same BegAttach (the parent’s BegBates) and the same EndAttach (the last child’s EndBates). The New Jersey Bureau of Securities rule states the convention plainly: “All attachments should sequentially follow the parent document/email. Parent email and attachment document families should be kept intact.”
This isn’t formatting fussiness. The Sedona Principles, Third Edition, anchor the framing: Principle 6 places search and production methodology with the responding party, and Principle 12 sets the form-of-production rule that family relationships travel with the production. Case law has been merciless when families fall apart: In re Seroquel Products Liability Litigation, 244 F.R.D. 650 (M.D. Fla. 2007), catalogued AstraZeneca’s deficient production practices, including missing attachments, blank pages, broken family relationships, and load files that weren’t searchable. Id. at 660 to 665. The court’s takeaway, summarized in the practitioner literature, was that without quality-control oversight, problems of that scale are inevitable.
The duty to preserve families connects to the broader duty to preserve ESI, and Rule 37(e) supplies the remedies when that duty is breached. A family that disintegrates between collection and production isn’t just sloppy. Depending on what fell out, the missing pieces can be evidence you should have produced.
The mechanical fix: number attachments sequentially after the parent (parent at PRODABC00000001 to 00000003, first attachment at 00000004 to 00000007, second at 00000008 to 00000010), and write the same BegAttach (00000001) and EndAttach (00000010) on every row. Verify with a QC tool before you ship.
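The mechanical fix above is easy to script. A minimal Python sketch, assuming each family arrives as ordered (BegBates, EndBates) pairs with the parent first, and that Bates numbers share one fixed prefix and width; `PRODABC`, the eight-digit width, and the helper names are illustrative.

```python
PREFIX = "PRODABC"  # hypothetical production prefix
WIDTH = 8           # hypothetical fixed digit width


def bates(n: int) -> str:
    """Format an integer as a padded Bates number, e.g. 4 -> PRODABC00000004."""
    return f"{PREFIX}{n:0{WIDTH}d}"


def number(b: str) -> int:
    """Recover the integer from a Bates number with the fixed prefix."""
    return int(b[len(PREFIX):])


def family_rows(members: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Stamp one BegAttach/EndAttach range on every member of a family.

    `members` holds (BegBates, EndBates) pairs, parent first, then children in
    order. Raises ValueError if the family is empty, a range is inverted, or
    the members are not numbered contiguously.
    """
    if not members:
        raise ValueError("empty family")
    for beg, end in members:
        if number(end) < number(beg):
            raise ValueError(f"{beg}: EndBates {end} precedes BegBates")
    for (_, prev_end), (next_beg, _) in zip(members, members[1:]):
        if number(next_beg) != number(prev_end) + 1:
            raise ValueError(f"gap or overlap between {prev_end} and {next_beg}")
    beg_attach, end_attach = members[0][0], members[-1][1]
    return [
        {"BegBates": beg, "EndBates": end, "BegAttach": beg_attach, "EndAttach": end_attach}
        for beg, end in members
    ]


# The example from the text: parent 1-3, attachments 4-7 and 8-10.
family = [(bates(1), bates(3)), (bates(4), bates(7)), (bates(8), bates(10))]
for row in family_rows(family):
    print(row)
```

All three rows print with BegAttach PRODABC00000001 and EndAttach PRODABC00000010. Start the second attachment at 00000009 instead and the function refuses to stamp the family.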
Modern Attachments: What No Load File Format Was Designed For
Modern attachments are the part of this story that’s genuinely unsettled. A modern attachment is a hyperlink inside an email or chat pointing to a file in SharePoint, OneDrive, Google Drive, Box, Dropbox, or Microsoft Teams. The file doesn’t travel with the message. The message holds a pointer URL, and the file lives in the cloud, where it may or may not still exist in the version originally shared.
Microsoft Purview’s eDiscovery documentation states the reality: “[O]nly the cloud attachment link and not the actual content in the shared document are returned in an eDiscovery search.” Microsoft Purview eDiscovery, Collect cloud attachments. Purview’s modern-attachment extraction pulls the live version at collection by default, but it skips encrypted mail, plain-text mail, non-clickable links, URLs over 2,048 characters, and anything past the first 50 links per message.
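Those documented limits suggest a simple QC pass: flag the links in a message that a Purview collection would not expand, so someone can chase them by hand. A minimal Python sketch; the two thresholds come from the documentation quoted above, while the function, its name, and the assumption that links arrive already extracted in message order are ours. It does not detect encrypted mail, plain-text mail, or non-clickable links, which need message-level metadata.

```python
# Limits as described in the Microsoft Purview documentation quoted above.
MAX_URL_LENGTH = 2048
MAX_LINKS_PER_MESSAGE = 50


def links_needing_follow_up(links: list[str]) -> list[tuple[int, str, str]]:
    """Return (position, reason, url) for links the collection would skip.

    `links` is assumed to be the message's cloud-attachment URLs, already
    extracted and in message order.
    """
    flagged = []
    for position, url in enumerate(links, start=1):
        if position > MAX_LINKS_PER_MESSAGE:
            flagged.append((position, "past the first 50 links", url))
        elif len(url) > MAX_URL_LENGTH:
            flagged.append((position, "URL over 2,048 characters", url))
    return flagged


# Hypothetical message with 52 SharePoint links.
message_links = [f"https://contoso.sharepoint.com/doc{i}.docx" for i in range(1, 53)]
for position, reason, url in links_needing_follow_up(message_links):
    print(position, reason, url)
```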
Three federal opinions frame the doctrine, and they don’t agree. Nichols v. Noom, Inc., 2021 WL 948646 (S.D.N.Y. Mar. 11, 2021), held that “[t]o the extent that hyperlinks are not exportable via Google Vault, those hyperlinks are not attachments.” Magistrate Judge Parker refused to compel re-collection where pulling each hyperlink would mean duplicating the same document hundreds of times and adding costs over $180,000. The district court affirmed. Nichols v. Noom, 2021 WL 12307293 (S.D.N.Y. Apr. 30, 2021) (Schofield, D.J.).
In re Uber Technologies, Inc., Passenger Sexual Assault Litigation, 2024 WL 1772832 (N.D. Cal. Apr. 23, 2024), went the other way. Magistrate Judge Cisneros adopted a broad definition of “Attachment(s)” including “modern attachments, pointers, internal or non-public documents linked, hyperlinked, stubbed or otherwise pointed to within or as part of other ESI.” Id. at *6. The order included a feasibility carve-out, and the litigation has since produced follow-on rulings refining when point-in-time reconstruction is required.
Between Noom and Uber sits In re StubHub Refund Litigation. The parties’ ESI Protocol required that “[h]yperlinked files must be produced as separate, attached documents.” 2023 WL 3092972 (N.D. Cal. Apr. 25, 2023). Magistrate Judge Hixson held StubHub to the bargain, then thirteen months later modified the order on a “good cause” showing that compliance was technologically impossible most of the time. In re StubHub Refund Litig., 2024 WL 2305604 (N.D. Cal. May 20, 2024).
The line keeps moving. Recent district-court orders compelling point-in-time reconstruction of hyperlinked documents on a sample basis (with fixed turnaround windows and protocol-bound feasibility carve-outs) signal partial enforcement, not categorical exclusion. The pattern that’s settling in across the trial courts: if the protocol bound you, you ship; if the technology won’t support the bargain, you renegotiate.
Slack and Teams chat-message production sits adjacent to this debate but follows its own family logic. Sedona’s twenty-four-hour unitization framework treats a day of channel messages as one logical document, not as parents and children, and traditional load files don’t carry that context cleanly. The Sedona Conference’s Commentary on Discovery of Collaboration Platforms Data (April 2025) remains in public-comment status, so treat its framework as guidance, not authority.
What Courts and Standards Expect
The rule that drives any first production is Rule 26(f)(3)(C). The discovery plan must state the parties’ views on “any issues about disclosure, discovery, or preservation of [ESI], including the form or forms in which it should be produced.” Translation: if you don’t negotiate form of production at the meet-and-confer, you live with whatever opposing counsel sends. Magistrate Judge Maas put the corollary in Aguilar v. ICE, 255 F.R.D. 350, 357 to 358 (S.D.N.Y. 2008): “[I]f a party wants metadata, it should ask for it. Up front. Otherwise, if the party asks too late or has already received the document in another form, it may be out of luck.”
Rule 34(b)(2)(E)(ii) sets the default: if the request doesn’t specify a form, the producing party must produce ESI “in a form or forms in which it is ordinarily maintained or in a reasonably usable form or forms.” Your ESI protocol, not your production letter, decides what you’ll actually have to send. The negotiation overlaps with the proportionality analysis under Rules 26(b)(1) and 26(b)(2)(C).
Two standards function as the defaults the rest of the industry copies. The District of Delaware Default Standard prescribes Concordance and Opticon load files for image productions, with native files only “for files not easily convertible to image format, such as Excel and Access files.” The DOJ Antitrust Division’s Standard Specifications are among the most exacting load-file specs a private practitioner is likely to see, and they’re a useful template even when you’re not in an antitrust matter.
Common Mistakes That Derail Your First Production
Quality control is where most of this gets caught or missed. A paralegal at a five-attorney firm walks the BegAttach/EndAttach ranges, hex-checks the delimiters, and test-loads into a throwaway workspace before the production goes out. Hintyr is an Agentic Document Review platform built for small and mid-size firms; it runs the same QC steps automatically alongside the review work, which helps firms running their first big production without dedicated litigation-support staff. With or without that help, the four categories below cover most of what goes wrong, and each one is preventable.
Encoding mismatches. UTF-8 vs. UTF-16 vs. Windows-1252. The thorn is the single byte FE in Windows-1252 and the two bytes C3 BE in UTF-8. UTF-8 with byte-order mark (BOM) is the safest default. If your receiving platform reads garbage where the þ should be, the encoding’s wrong.
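Before a test load, a quick byte-level sniff catches most encoding mismatches. A minimal Python sketch: it checks for a byte-order mark, then looks for the thorn’s byte pattern, and returns a Python codec name you could pass to `bytes.decode`. The function name is hypothetical, and this is a heuristic, not a substitute for the recipient’s spec (UTF-16 without a BOM, for instance, is not detected).

```python
import codecs
from typing import Optional


def sniff_dat_encoding(data: bytes) -> Optional[str]:
    """Guess a Python codec name for DAT bytes, or None if no clue is found.

    A byte-order mark wins; otherwise look for the thorn qualifier as UTF-8
    (C3 BE) or as a single Windows-1252 byte (FE).
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # UTF-8 with BOM; this codec strips the BOM on decode
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # the utf-16 codec reads the BOM to pick byte order
    if b"\xc3\xbe" in data:
        return "utf-8"      # þ encoded as UTF-8, no BOM
    if b"\xfe" in data:
        return "cp1252"     # þ as a single Windows-1252 byte
    return None


with_bom = codecs.BOM_UTF8 + "þBegBatesþ".encode("utf-8")
print(sniff_dat_encoding(with_bom))                       # utf-8-sig
print(sniff_dat_encoding("þBegBatesþ".encode("cp1252")))  # cp1252
```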
Delimiter conflicts. A delimiter character inside a field value, such as a þ in an Icelandic surname on a Subject line, tears the row in half. Use ASCII 020, 254, and 174 strictly as delimiters. Scrub or escape those characters in the content before export. Verify with a hex editor.
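One way to scrub at export time, sketched in Python: replace any delimiter character found inside a value, convert real line breaks to ®, then wrap the value in thorns. The substitutes below (a space for 020, “th” for þ, “(R)” for ®) are assumptions for illustration, not an industry standard; whatever you choose should be agreed with and disclosed to the receiving party.

```python
FIELD_SEP = "\x14"   # ASCII 020
QUOTE = "\u00fe"     # þ, ASCII 254
NEWLINE = "\u00ae"   # ®, ASCII 174

# Hypothetical substitutions for delimiter characters found in content.
SUBSTITUTES = {FIELD_SEP: " ", QUOTE: "th", NEWLINE: "(R)"}


def scrub(value: str) -> str:
    """Remove delimiter collisions, then encode real line breaks as ®."""
    for char, replacement in SUBSTITUTES.items():
        value = value.replace(char, replacement)
    # Normalize Windows and old Mac line endings before mapping them to ®.
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    return value.replace("\n", NEWLINE)


def dat_row(values: list[str]) -> str:
    """Build one DAT line: each value scrubbed, qualified with þ, joined by 020."""
    return FIELD_SEP.join(f"{QUOTE}{scrub(v)}{QUOTE}" for v in values)


print(repr(dat_row(["PRODABC00000001", "Acme® launch\r\nplan þ"])))
```

The order of operations matters: content ® has to be replaced before real line breaks become ®, or the two become indistinguishable.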
Family breaks and time zone drift. Run a script that walks the BegAttach/EndAttach ranges and confirms every Bates number is contiguous. Lock the time zone: Relativity’s docs are explicit that, by default, date and time metadata is produced in UTC with no abbreviated time-zone indicator. Recipients who read UTC as local time misalign every email by hours. Negotiate the time zone in your Rule 26(f) plan, then state it in the production cover letter.
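The walk described above can be sketched in Python over rows shaped like a parsed DAT (dicts with BegBates, EndBates, BegAttach, and EndAttach). It assumes one letters-then-digits Bates format and rows already in production order; the field names follow the sample header earlier in this article, and the functions are hypothetical.

```python
import re

# Assumed Bates format: uppercase letters followed by digits.
BATES = re.compile(r"^([A-Z]+)(\d+)$")


def num(bates: str) -> int:
    """Return the numeric part of a Bates number."""
    match = BATES.match(bates)
    if not match:
        raise ValueError(f"unrecognized Bates number: {bates!r}")
    return int(match.group(2))


def check_production(rows: list[dict[str, str]]) -> list[str]:
    """Walk DAT rows in production order; return problems (empty list = pass)."""
    problems: list[str] = []
    family_end: dict[str, str] = {}  # BegAttach -> EndAttach seen so far
    previous_end = None
    for row in rows:
        label = row["BegBates"]
        beg, end = num(row["BegBates"]), num(row["EndBates"])
        beg_att, end_att = num(row["BegAttach"]), num(row["EndAttach"])
        if previous_end is not None and beg != num(previous_end) + 1:
            problems.append(f"{label}: not contiguous with previous EndBates {previous_end}")
        if not beg_att <= beg <= end <= end_att:
            problems.append(f"{label}: outside its family range {row['BegAttach']}-{row['EndAttach']}")
        if row["BegAttach"] not in family_end and label != row["BegAttach"]:
            problems.append(f"{label}: family {row['BegAttach']} does not start with its parent")
        if family_end.setdefault(row["BegAttach"], row["EndAttach"]) != row["EndAttach"]:
            problems.append(f"{label}: EndAttach disagrees with the rest of its family")
        previous_end = row["EndBates"]
    return problems


def demo_row(beg: int, end: int, beg_att: int, end_att: int) -> dict[str, str]:
    """Build a row from integers (illustrative helper)."""
    fmt = "PRODABC{:08d}".format
    return {"BegBates": fmt(beg), "EndBates": fmt(end), "BegAttach": fmt(beg_att), "EndAttach": fmt(end_att)}


good = [demo_row(1, 3, 1, 10), demo_row(4, 7, 1, 10), demo_row(8, 10, 1, 10), demo_row(11, 11, 11, 11)]
print(check_production(good))                 # []
print(check_production(good[:1] + good[2:]))  # reports the gap before PRODABC00000008
```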
Bates numbering inconsistencies. Variable-length Bates numbers (ABC1 vs ABC0001) sort lexically rather than numerically and break attachment-range lookups. Pad to a fixed width. Always.
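A short Python illustration of the sorting problem and the fix. The `ABC` prefix and eight-digit width are assumptions for the example; use whatever width your protocol specifies.

```python
def pad_bates(bates: str, prefix: str = "ABC", width: int = 8) -> str:
    """Rewrite ABC12 as ABC00000012 (prefix and width are illustrative)."""
    digits = bates[len(prefix):]
    if not bates.startswith(prefix) or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a {prefix} Bates number: {bates!r}")
    n = int(digits)
    if n >= 10 ** width:
        raise ValueError(f"{bates!r} does not fit in {width} digits")
    return f"{prefix}{n:0{width}d}"


unpadded = ["ABC1", "ABC2", "ABC10", "ABC9"]
print(sorted(unpadded))                        # ['ABC1', 'ABC10', 'ABC2', 'ABC9']
print(sorted(pad_bates(b) for b in unpadded))  # ['ABC00000001', 'ABC00000002', 'ABC00000009', 'ABC00000010']
```

Lexical sorting puts ABC10 before ABC2 because it compares character by character; once every number has the same width, lexical and numeric order agree.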
Frequently Asked Questions
What’s the difference between a DAT file and an OPT file?
A DAT carries document-level metadata, one row per document (BegBates, Custodian, From, To, Subject, DateSent, MD5Hash). An OPT carries page-level image links, one row per page, seven fields per row. A standard production includes both, plus folders of single-page TIFFs, extracted-text files, and natives.
Why does a Concordance load file use symbols like ¶, þ, and ®?
Real text rarely contains those characters, so they make practical separators. The pilcrow (ASCII 020) separates fields, the thorn (ASCII 254) qualifies text, and the registered symbol (ASCII 174) marks in-field line breaks. Some platforms treat the thorn as 231, its position in the CP850 code page; check your sender’s specs.
What happens if BegAttach and EndAttach values are wrong?
Family integrity breaks at import. The platform treats the parent and children as unrelated, or it glues unrelated documents together. That’s the failure mode In re Seroquel, 244 F.R.D. 650 (M.D. Fla. 2007), catalogued: the kind of family-break problem that gets sanctioned when quality control is missing.
Are hyperlinked Google Drive or SharePoint documents attachments?
It depends on the court and the ESI protocol. Nichols v. Noom (S.D.N.Y. 2021) said no where the collection tool can’t export them. In re Uber Techs. (N.D. Cal. 2024) defined “Attachment(s)” broadly to include pointers. In re StubHub (N.D. Cal. 2023, modified 2024) enforced the parties’ agreement, then modified it when compliance proved impossible. Negotiate this in your Rule 26(f) plan.
What’s the most common encoding for a modern DAT file?
UTF-8 with byte-order mark (BOM). Some legacy Concordance Desktop installations expect Windows-1252 and will misread the þ if the file’s saved as plain UTF-8 without BOM. Test-load before shipping.
Can opposing counsel make me produce a load file in a specific format?
They can ask. Rule 34(b)(1)(C) lets the requesting party specify a form, and Rule 34(b)(2)(D) lets you object and state the form you intend to use instead; Sedona Principle 6 recognizes that the producing party is best situated to choose its production methods, and the Rule 26(f) conference is where the dispute usually gets settled. But once you agree on a format in an ESI protocol or court order, you’ve bound yourself. In re StubHub Refund Litig., 2024 WL 2305604 (N.D. Cal. May 20, 2024), is the cautionary tale.
This article is for general informational purposes only. It does not constitute legal advice and does not create an attorney-client relationship. Statements about case law and rules reflect publicly available sources as of May 2026 and may not address your jurisdiction or matter. Consult qualified counsel before acting on any of the topics discussed.
Run your first production with a load file that won’t bounce.
Hintyr is the Agentic Document Review platform built for small and mid-size firms. We keep your families intact, your Bates ranges contiguous, and your load files inside the spec, so the production you ship clears opposing counsel’s QC on first import. Always intuitive, always accurate, always cited.