AI Ethics

What Is an AI Agent? Agentic AI for Law Firms

Today, the agents your inbox keeps pitching claim to plan, run, and adjust on real legal tasks. Some of those claims do real engineering work. Some do marketing work. Here is how to tell the difference, and what Rule 5.3 has been quietly waiting to do about both kinds.

Alexander Cohan, Ph.D.

Legal technology researcher and data scientist specializing in AI governance for litigation teams. Expertise in NLP and AI-assisted document review.

Key Takeaways

  • An AI agent plans, calls tools, and runs until a task is done. A chatbot only answers.
  • The Lilian Weng formula, agent equals a language model plus memory plus planning plus tool use, is the cleanest engineering description and the one a procurement reviewer can verify.
  • Anthropic itself concedes there is no industry-standard definition. Every vendor claim has a footnote.
  • ABA Formal Opinion 512 supplies the supervision floor for any agent doing legal work, and Rule 5.3 supplies the duty to enforce it.
  • Federal sanctions for unverified AI citations continued through 2026; Damien Charlotin’s running database had logged more than 1,350 verified hallucination cases worldwide by April 2026.
[Illustration: stylized AI agent depicted as a luminous neural network branching outward]
"The Software That Decides Itself"

What Is an AI Agent in Legal Practice?

AI Agent (legal practice)
Software that takes a goal in plain language, drafts its own plan, and runs that plan by calling other tools. Per Lilian Weng’s formula, an AI agent equals a language model plus memory plus planning plus tool use. In a law firm, the agent is a Rule 5.3 nonlawyer assistant whose work the supervising lawyer remains fully responsible for under ABA Formal Opinion 512.

You have heard “AI agent” three times this month, at least once from a vendor email. The term is doing real engineering work and a lot of marketing work, and it is worth pinning down before the next pitch lands on your desk.

An AI agent, in the working sense this post will use, is software that takes a goal in plain language, drafts its own plan, and runs that plan by calling other tools until the work is finished. Anthropic, the lab that builds Claude, defines agents two ways. The first runs autonomously for long stretches, using whatever tools the task needs. The second is a more prescriptive implementation that follows a predefined workflow. Both are in production today.

The engineering version is cleaner. It comes from researcher Lilian Weng, who summarized it as a formula. An agent equals a language model plus memory plus planning plus tool use. The model handles language. Memory tracks what has happened. Planning breaks the goal into steps. Tool use does them. Strip any one of those four out and you have a chatbot.
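Weng's formula can be sketched in a few lines. Everything below is a hypothetical stand-in for illustration: the `language_model` stub, the `Agent` class, and the `search` tool are assumptions for this post, not any vendor's actual API.

```python
from dataclasses import dataclass, field

def language_model(prompt: str) -> str:
    """Stub LLM: a stand-in that just echoes the prompt."""
    return f"response to: {prompt}"

@dataclass
class Agent:
    memory: list = field(default_factory=list)   # memory tracks what has happened
    tools: dict = field(default_factory=dict)    # tool use does the steps

    def plan(self, goal: str) -> list:
        # Planning breaks the goal into steps (trivially, here).
        return [f"step 1 of {goal}", f"step 2 of {goal}"]

    def run(self, goal: str) -> list:
        results = []
        for step in self.plan(goal):
            thought = language_model(step)       # the model handles language
            self.memory.append(thought)          # record it for later turns
            tool = self.tools.get("search")
            if tool:
                results.append(tool(step))       # act on the world via a tool
        return results

agent = Agent(tools={"search": lambda q: f"found documents for: {q}"})
out = agent.run("summarize the Smith production")
```

Delete any one attribute, the memory list, the plan method, or the tools dict, and what remains is the stub model answering one prompt: a chatbot.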

The legal-industry version sits inside the same formula. Clio puts it plainly: when you give it an objective, it can determine the necessary steps, create a plan, and go through the process with minimal guidance. The operative word is minimal. Not unsupervised. The agent operates with less supervision per step than a chatbot, which is what triggers everything below.

Anthropic itself concedes there is no agreed-upon definition of what an agent is. We will use the working version above, because it lets us reason about supervision, ethics, and what to write into a vendor agreement.

"A Field Guide to Look-Alikes"

AI Agent vs Chatbot vs Assistant vs RAG

Vendors collapse the vocabulary on purpose. Chatbot, assistant, copilot, agent: all four words show up on the same product page, and procurement is supposed to know the difference. They mean four different things.

A chatbot answers. You type, the model replies, and the reply is a paragraph of text. The output is information; you decide what to do with it. Most consumer ChatGPT use is chatbot use.

An assistant suggests. It outlines a brief, drafts a clause, summarizes a deposition. The work product still lands in your hands; you decide what to keep, edit, and send. Microsoft 365 Copilot inside Word and Outlook is an assistant.

RAG, or retrieval-augmented generation, is a retrieval pattern any of the above can use. It grounds the answer in a curated document set before the model writes anything. The chatbot, assistant, or agent goes hunting through your case file first, then writes from what it finds. Research, then writing.
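The research-then-writing pattern can be sketched as a toy. The case-file contents, the word-overlap scorer, and the `answer` helper below are illustrative assumptions, not a production retrieval pipeline; a real system would embed the documents and hand the retrieved context to a model.

```python
# Hypothetical two-document case file, standing in for a real corpus.
CASE_FILE = {
    "dep_smith_p14.txt": "The witness stated the contract was signed on March 3.",
    "email_0042.txt": "Please forward the signed contract to opposing counsel.",
}

def retrieve(question: str, corpus: dict, k: int = 1) -> list:
    # Toy relevance score: count words shared between question and document.
    q_words = set(question.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda kv: len(q_words & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer(question: str) -> str:
    sources = retrieve(question, CASE_FILE)
    context = "\n".join(f"[{name}] {text}" for name, text in sources)
    # A real pipeline would now prompt a model with `context`;
    # here we just return the grounded material with its citation.
    return f"Answer grounded in: {context}"

result = answer("When was the contract signed?")
```

The point survives the toy: the answer carries the name of the source document, which is exactly the property a supervising lawyer needs to click through and verify.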

An agent does. It plans a sequence, calls tools to execute each step, watches the result, and decides what to do next. The output is a candidate work product the lawyer must verify: a draft brief whose citations the lawyer must still check, a privilege log the lawyer must still review, a redacted production the lawyer must still inspect before export. Different verbs entirely.

One usable formulation comes from the North Carolina Bar Association: agentic AI takes initiative, interpreting instructions and executing them across tools. Or, less formally, NetDocuments puts the operational distinction this way: an agent is software that is “not just giving answers, it’s getting things done.”

That doing separates agents from every previous wave. It also produces every supervision question in the second half of this post.

AI agent vs chatbot vs assistant vs RAG

Pattern   | What it does                       | Output                 | Example
Chatbot   | Answers questions                  | A paragraph of text    | Consumer ChatGPT
Assistant | Suggests and drafts                | Hand-edited content    | Microsoft 365 Copilot
RAG       | Grounds answers in your documents  | Cited paragraph        | Retrieval-augmented chatbots and assistants
Agent     | Plans, acts, watches, decides      | A finished work product| Spellbook Associate, Harvey Agents
"Plan, Act, Watch, Reflect"

How AI Agents Work: The Agent Loop in Plain Terms

Here is the loop in four moves. The agent reads the goal. It drafts a plan, executes step by step by calling tools, and watches the results. Then the agent decides whether to adjust or finish. Around it goes.

Each turn is small on its own. The lift is in the iteration. A modern agent runs for many turns before it returns a finished product, which is what separates it from a single-shot chatbot exchange.

How long is “many turns”? Anthropic measured this on its own product. Claude Code’s longest-running sessions roughly doubled in three months, from under twenty-five minutes to over forty-five. The session length keeps growing with each release, which suggests today’s models can already run longer than they currently do.

That iteration produces real benefit and real risk. Anthropic flags both: the autonomous nature of agents means higher costs, and the potential for compounding errors. Sandbox testing and guardrails are not optional. For a small firm, the translation is direct. The same loop that drafts a clean privilege log on the Smith production due Friday can drift on the third turn, and you will not see the drift unless your tooling shows you.

Three controls keep the loop honest. The first is scope: an agent should run inside one matter’s documents, not across your whole drive. The second is logging: every tool call should be visible after the fact, and every output should be checkable at the page level. The third is a stop condition: the agent should know when it is finished, and so should you. AI agents are safe for legal work only when those three controls hold; without all three, ABA Op. 512 supervision duties cannot be met. Those three controls travel into every workflow you point an agent at, and five of those workflows are already running at firms you have heard of.
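The three controls can be made concrete in a few lines. The matter name, turn limit, and `call_tool` helper below are hypothetical; this is a minimal sketch of scope, logging, and a stop condition, not any product's guardrail implementation.

```python
# Control 1: the agent runs inside one matter's documents only.
MATTER_SCOPE = {"smith_production"}
# Control 3: a hard stop condition, even if the agent never "decides" to finish.
MAX_TURNS = 10
# Control 2: every tool call is recorded for after-the-fact review.
tool_log = []

def call_tool(name: str, matter: str, arg: str) -> str:
    if matter not in MATTER_SCOPE:
        # Refuse anything outside the scoped matter rather than comply.
        raise PermissionError(f"{matter} is outside the agent's scope")
    tool_log.append({"tool": name, "matter": matter, "arg": arg})
    return f"{name} result for {arg}"

def run_agent(goal: str, matter: str) -> str:
    result = ""
    for turn in range(MAX_TURNS):
        result = call_tool("review", matter, f"{goal} / turn {turn}")
        if turn >= 2:  # stand-in for "the agent judges the task complete"
            break
    return result

final = run_agent("draft privilege log", "smith_production")
```

The log is the supervision artifact: the person your firm policy names can replay every call the agent made on the matter, which is what “reasonable assurance” looks like in practice.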

"Five Workflows on the Floor"

Five Legal Workflows Where Agents Are Already Running

This is not a clean taxonomy. Vendors are piloting legal AI agents in five overlapping workflows, and each one runs into a different ethics rule before it lands.

Client intake

An intake agent answers website chats, asks for matter facts, runs a conflict screen, and drafts a retainer letter. ABA Op. 506 allows nonlawyer client intake only when the lawyer trains and supervises under Rule 5.3 and the prospective client has the chance to consult with a lawyer. Florida Op. 24-1 adds that the program must not offer legal advice and must refer any legal question back to the lawyer. The agent inherits both.

Contract review

Wilson Sonsini’s Neuron platform pairs with Dioptra contract intelligence for fixed-fee commercial reviews. Simmons & Simmons partnered with Flank, a Berlin startup, whose marketing claims its agents “take whole tasks away from lawyers.” That is vendor language. The supervising firm still carries the output. Small firms will not buy Neuron or Flank, but the same loop now shows up in pitches at every size.

E-discovery

The agent ingests a production population, classifies for responsiveness, and surfaces hot documents. ABA Op. 512 recognizes that lawyers already use AI in e-discovery, citing Technology-Assisted Review (TAR) as a long-standing example. Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012), found computer-assisted review “an acceptable way to search for relevant ESI in appropriate cases.” Three years later, Rio Tinto v. Vale, 306 F.R.D. 125 (S.D.N.Y. 2015), observed that allowing TAR on the producing party’s request “is now black letter law.” One caveat: TAR is a court-blessed statistical classifier, and an agentic language model is not. As of April 2026, no published opinion has approved LLM agents for privilege review under the TAR analogy.

Privilege review and production QC

The agent flags privileged candidates, suggests redactions, and drafts the log entry. The risk profile is well-documented; see where AI redaction silently fails on privileged data. FRCP 26(g) still requires reasonable inquiry before signing a discovery response, and FRE 502(b) governs whether inadvertent disclosure waives privilege. Both apply to the agent’s output the way they apply to a paralegal’s. Hintyr is built around that inquiry: privilege answers point to a specific source page, and the supervising partner clicks through before signing.

Compliance and monitoring

Continuous-watch agents flag policy deviations across firm systems. The largest legal-tech vendors are announcing offerings here; small firms are not yet. Thomson Reuters frames its CoCounsel agentic workflows as moving from “one single-shot task, answering one question, to actually getting to a work output.”

Five workflows, five different ethics surfaces. Each surface has a rule, and the rule is older than the agent.

"Competence in the Loop"

Rule 1.1 Competence Applied to AI Agents

ABA Op. 512 does not prohibit AI agents; it imposes a verification duty when lawyers use them. Comment 8 to ABA Model Rule 1.1 has required lawyers to keep up with relevant technology since 2012. The standard is reasonable understanding, not expert mastery.

ABA Formal Opinion 512, issued July 2024, applied that competence duty to generative AI directly. The line that matters most for agents is this: lawyers may not abdicate their responsibilities by relying solely on a GAI tool to perform tasks that call for the exercise of professional judgment. Verification is the duty. In practice, supervision is the live question, not disclosure, and an agent puts that question on the front page.

The cases say the same thing in plainer language. Mata v. Avianca, Inc., 678 F. Supp. 3d 443 (S.D.N.Y. 2023), set the template. Steven Schwartz prepared the research, his colleague Peter LoDuca signed and filed the brief, and the brief rested on ChatGPT’s hallucinated cases, including a fictional Varghese v. China Southern Airlines. Judge Castel sanctioned all three jointly, finding that LoDuca had “consciously avoided learning the facts.”

Park v. Kim, 91 F.4th 610 (2d Cir. 2024), followed: counsel was referred to the Second Circuit’s Grievance Panel after citing a non-existent case generated by ChatGPT. People v. Crabill, No. 23PDJ067 (Colo. O.P.D.J. Nov. 22, 2023), captured the failure mode in the lawyer’s own text message: “I think all of my case cites from ChatGPT are garbage.” He filed the motion anyway. He was suspended.

Wadsworth v. Walmart, No. 2:23-cv-118-KHR (D. Wyo. Feb. 24, 2025), reaffirmed the duty: “the duty to check their sources and make a reasonable inquiry into existing law remains unchanged.” Mid Central Operating Engineers v. HoosierVac LLC, No. 2:24-cv-00326 (S.D. Ind. May 28, 2025), imposed a $6,000 sanction (down from a $15,000 magistrate recommendation) for citing fabricated authority, finding violations of Indiana RPC 1.1, 3.1, and 3.3. Damien Charlotin’s running database had logged more than 1,350 verified hallucination cases worldwide by April 2026. Courts now treat verification as the standard of care, and the malpractice exposure for skipping it is rising, not falling.

Two takeaways for any agent. You are competent only if you can describe what the agent reads, what tools it calls, and how to spot its drift. The agent’s draft gets the same Rule 11 inquiry as a paralegal’s, and you still sign the brief.

"The Nonlawyer in Your Server"

Rule 5.3 Supervision: Treating Agents as Nonlawyer Assistants

If Rule 1.1 says you must be competent in the technology, Rule 5.3 says you must supervise it. Partners and managerial lawyers must “make reasonable efforts to ensure that the firm has in effect measures giving reasonable assurance that the person’s conduct is compatible with the professional obligations of the lawyer.”

The agent is a nonlawyer assistant. Joseph, Hollander & Craft put the bridge plainly in 2023: “There seems to be little doubt that the use of an AI program to provide services to clients would constitute the use of a nonlawyer assistant under the Rules.” Comment 3 to Rule 5.3 already covers cloud services, document-management vendors, and printing third parties; the agent is the next entry.

Rule 5.3 imposes three concrete duties: reasonable efforts to set firm-wide policy on permissible use, reasonable assurance the agent’s conduct is compatible with the rules, and direct responsibility up the chain if a partner knew about misbehavior and did nothing in time. ABA Op. 512 made the policy duty explicit; managerial lawyers must establish clear policies on the firm’s permissible use of GAI. California’s 2023 Practical Guidance and Pa./Phila. Joint Formal Op. 2024-200 (2024) say the same in different words.

A working firm policy on agents has five elements. A list of what the agent may and may not do without lawyer review. A confidentiality screen naming the data the agent may touch. Mandatory citation checks before any filing. A log of every prompt and tool call on a client matter. And a specific person, by role, who reviews the log on a defined cadence. None of this is novel; it is what your firm already does for paralegals and outside vendors.

What the rules do not yet say. Three questions remain open. First, Rule 1.6 confidentiality when an agent’s memory persists across matters: does the storage itself trigger Op. 512’s self-learning disclosure risk, even without training on your data? Second, what “reasonable assurance” means under Rule 5.3 when tool calls run autonomously and the lawyer reviews only the output. Third, what counts as Rule 11 reasonable inquiry when the agent has read a corpus the lawyer did not. Bar guidance through April 2026 does not resolve any of the three.

For longer treatment, see your duty to supervise AI under Rules 5.1 and 5.3. The short version: an agent without a supervisor is a malpractice claim wearing a vendor logo.

"What a Small Firm Does Monday"

How Small Law Firms Should Adopt AI Agents Today

ABA Op. 512 ends with the line every Monday-morning manager needs taped to the monitor: regardless of the level of review the lawyer selects, the lawyer is fully responsible for the work on behalf of the client. Pa./Phila. Joint Formal Op. 2024-200 (2024) adds the floor: lawyers must be proficient in using technological tools to the same extent they are in employing traditional methods.

For a small or mid-size firm picking an agent vendor this quarter, three filters do most of the work. The first is scope. Does the agent run inside one matter at a time, or across your whole drive? An agent that crawls everything is a Rule 1.6 problem before it is a productivity one. The second is citations. Does every output point to a specific page in the source documents, or to a vague summary? An agent that cannot show its work is an agent you cannot supervise. The third filter is refusal. What does the agent do when the answer is not in the corpus? An agent that confabulates instead of refusing is the next sanctions case waiting on a docket.

Hintyr was built around those three filters. Hintyr is Agentic Document Review for small and mid-size firms. Files stay scoped to the case they were uploaded into. Every answer cites the exact page in the source, so accuracy can be checked one click at a time. Searches run across emails, PDFs, images, and audio in one place. Always intuitive, always accurate, always cited.

The doctrine has been on the books for years. The only open question is whether the tools on your desk match the duty already on your shoulders.

This post is for informational purposes only and is not legal advice. Agentic AI capabilities, vendor claims, and ethics guidance vary by jurisdiction and tool. Confirm your state bar’s current AI guidance and verify any AI-assisted citation against the underlying authority before relying on it in client work.

See Agentic Document Review in Your Case File

Hintyr is built for small and mid-size firms that need cited answers from real case files. Drop your discovery production in, ask the question that has been on your whiteboard, and read the answer with the page citation already attached.