AI tooling for small firms — what's worth adopting in 2026.

Most small firms are either over-buying AI (every fee-earner on a different tool, no measurable benefit) or under-buying it (waiting for the dust to settle while competitors quietly take five hours a week off matter prep). The real answer is a small, deliberate stack — picked against time-saved, not feature lists.

9 min read · Updated April 2026

AI tooling at small firms tends to follow one of two patterns. Pattern one: someone tries ChatGPT, likes it, the firm pays for ten Plus seats, six people use it once a fortnight, and nobody can say what it's done. Pattern two: the firm waits because “the regulation is going to change” or “the next version will be better,” and ends up watching one or two competitors quietly absorb work it used to win.

The middle path: a small, named stack adopted on a deliberate sequence, measured against time freed up. This piece covers where AI actually returns time at small firms, how to think about the stack, and the 90-day adoption pattern that produces results you can put in front of partners.

Where AI actually returns time at small firms

The honest list — work where small firms have measured real savings, not vendor case studies:

  • First-draft generation. Letters of advice, standard contract clauses, due-diligence checklists, retainer letters. Saves 30–60% of drafting time on standard work; saves nothing on novel work.
  • Summarisation. Pulling key terms out of long contracts, summarising disclosure bundles, condensing client emails into action lists. Saves 40–70% on summarisation tasks where the source is unstructured.
  • Intake triage. Routing new enquiries by matter type, flagging conflicts at first contact, drafting the qualification email. Saves reception/paralegal time, lifts conversion by reducing first-response delay.
  • Time-entry support. Generating narratives from calendar entries and email metadata. Saves the 30 minutes of end-of-day or end-of-week reconstruction — which often surfaces unbilled hours that weren't in the system at all (see reducing billing leakage); a minimal sketch of this follows the list.
  • Research support. First-pass research, identifying relevant authority, drafting research notes. Saves 20–40% on research tasks; the watch-out is hallucinated citations, which the team has to learn to spot.
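
To make the time-entry bullet concrete, here's a minimal sketch of the narrative-generation step: it takes a day's calendar entries and asks a model for draft narratives. The client call uses the real `anthropic` SDK, but the model name, event data, and matter codes are placeholders, and you'd only run something like this under a plan whose terms don't train on your data.

```python
# Minimal sketch: draft time-entry narratives from calendar metadata.
# Assumes ANTHROPIC_API_KEY is set and your data policy permits sending
# this metadata to the vendor (business/enterprise terms, not consumer tier).
import anthropic

events = [  # placeholder entries for one fee-earner, one day
    {"start": "09:00", "end": "09:45", "title": "Call re share purchase", "matter": "SPA-0042"},
    {"start": "11:00", "end": "12:30", "title": "Review draft NDA", "matter": "NDA-0117"},
]

lines = "\n".join(
    f'{e["start"]}-{e["end"]} [{e["matter"]}] {e["title"]}' for e in events
)
prompt = (
    "Draft one billing narrative per line below, past tense, "
    "maximum 20 words each, and keep the matter code:\n" + lines
)

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; use your plan's model
    max_tokens=400,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)  # a fee-earner reviews before anything is billed
```

The point is the shape, not the vendor: the input is metadata the firm already holds, and the output is a draft someone checks. That's the junior-assistant pattern described below.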

Notice what isn't on this list: client-facing comms without review, advice generation, complex drafting. That pattern holds — AI is genuinely useful as a junior assistant whose work a fee-earner reviews; it's not useful as a replacement for fee-earner judgement, and pretending otherwise is how firms end up writing apology letters.

The three-layer stack

Useful mental model: AI tooling at a small firm sits in three layers, and you need to be deliberate about each.

Layer 1: general-purpose (ChatGPT, Claude, Gemini, Copilot)

Cost: £20–30/seat/month. What they're good for: drafting, summarisation, brainstorming, research support, code/sheet formula help. The best paid plans cost about the same per month as a single billed-out hour, so the bar to net-positive is low.

Watch-outs: data confidentiality (do not paste client matter material into a free tier or a non-business plan), prompt discipline (results vary wildly with how you ask), and the creep of treating these as authoritative. They're not.
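
On prompt discipline, a quick illustration (the facts are invented): the same drafting task asked two ways. The second version is what a usable first draft tends to require.

```python
# Illustrative only: the same task, asked lazily and asked with discipline.
lazy = "Write a letter of advice about a deposit dispute."

disciplined = """You are drafting for a solicitor who will review and edit.
Task: first draft of a letter of advice to a residential landlord client.
Facts: tenant disputes deposit deductions for cleaning and redecoration.
Constraints: plain English, under 400 words, England & Wales context,
no case citations (the fee-earner adds authority after checking it).
Structure: summary of position, options, recommended next step."""
```

Role, facts, constraints, structure: that's most of what "prompt discipline" means in practice.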

Layer 2: legal-specific (Harvey, Spellbook, Henchman, Latch, etc.)

Cost: £50–300/seat/month, often with annual commitments. Specialised on legal work — contract review, redlining against firm playbooks, automated due diligence, citation checking. Higher per-seat price, but precision and legal-context-awareness are meaningfully better than general-purpose tools on the work they're built for.

Watch-out: don't buy at this layer until you know the general-purpose layer isn't enough. Many small firms leap to legal-specific tooling without a baseline measure of what general-purpose AI was actually doing for them — and end up paying premium prices for marginal improvement.

Layer 3: embedded (in your practice management, document, accounts systems)

Cost: usually bundled with the underlying product, sometimes a small uplift. Examples: AI summarisation inside a document management system, AI-assisted time-narrative generation inside a practice management tool, automated bank-feed categorisation inside the accounts system.

This is the layer where time savings compound, because the AI sits on data that already exists in the firm's system of record: no copy/paste, no context-switching. Most firms underestimate this layer because the vendors talk about it less loudly than standalone AI startups do.

The two questions before any procurement

Before paying for anything in any layer:

1. Where does the data go, and on what terms?

Three things to look for in the supplier's terms: (a) is your input used to train their models — yes/no, and is the default the right way round, (b) where is the data stored geographically, (c) what's the retention period and deletion mechanism. If a vendor can't answer these in writing in under an hour, that's your answer. The better legal-AI vendors have clear data-handling pages; the worse ones don't.

For most small firms, the practical position is: enterprise plans (which don't train on your data) for general-purpose AI, contract-reviewed legal-specific tools, and a clear policy with the team about what's not allowed in any consumer-tier AI.

2. What time would this save, measurably?

Pick the use case before the tool. “We'll save time on document review” is too vague to evaluate. “Each first-draft NDA review currently takes 45 minutes; we want to get it to 20 with the same accuracy” is testable. Run the test on five real matters before scaling. If the saving is real, fold it into your Utilisation Calculator and see what 25 minutes per NDA × your NDA volume looks like annualised. That's the number you're buying.
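
A few lines make that arithmetic concrete; the weekly volume here is a placeholder, not a benchmark:

```python
# Worked version of the NDA example: 45 minutes down to 20 per review.
minutes_saved_per_nda = 45 - 20   # 25 minutes, as in the test above
ndas_per_week = 6                 # placeholder: substitute your real volume
working_weeks = 46                # placeholder: allows for leave

hours_per_year = minutes_saved_per_nda * ndas_per_week * working_weeks / 60
print(f"{hours_per_year:.0f} hours/year")  # 6 NDAs/week -> 115 hours/year
```

Multiply that by a blended hourly rate and you have the figure to put next to the licence cost.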

The 90-day adoption sequence

Days 1–14: baseline measurement

Pick three use cases — drafting, summarisation, and one firm-specific candidate (intake triage, time-entry narratives, research notes — pick what hurts most). Measure current time spent: log every instance of the use case for two weeks across five fee-earners. You need a number to compare against.
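
The log can live in a shared spreadsheet; if it ends up as a CSV, a short script produces the comparison numbers. Column names here are assumptions, so match them to whatever the team actually records.

```python
# Summarise the two-week baseline log into per-use-case numbers.
# Assumed CSV columns: date, fee_earner, use_case, minutes
import csv
from collections import defaultdict
from statistics import mean

minutes = defaultdict(list)
with open("baseline_log.csv", newline="") as f:
    for row in csv.DictReader(f):
        minutes[row["use_case"]].append(float(row["minutes"]))

for use_case, times in sorted(minutes.items()):
    print(f"{use_case}: n={len(times)}, "
          f"mean {mean(times):.0f} min, total {sum(times):.0f} min")
```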

Days 15–30: pilot the general-purpose layer

One enterprise plan (Claude Team, ChatGPT Team, Copilot — your pick), three or four committed users, the same use cases from weeks 1–2. Same measurement. Two 20-minute debriefs, one at the end of each week: what worked, what didn't, what prompts to share. Output: a small library of the prompts that worked, and time-saved numbers per use case.
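
The prompt library doesn't need a product; a shared JSON file works. One possible shape (field names are assumptions):

```python
# One workable shape for the shared prompt library: the task, the prompt,
# and the measured saving that justified keeping it.
import json

library = [
    {
        "use_case": "summarise disclosure bundle",
        "prompt": "Summarise the attached correspondence as a dated action "
                  "list: one line per item, flag anything with a deadline.",
        "avg_minutes_saved": 35,  # from the pilot measurements
    },
]

with open("prompt_library.json", "w") as f:
    json.dump(library, f, indent=2)
```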

Days 31–60: roll out the wins, kill the losses

Where the pilot showed clear time savings, expand the seat count and write down the workflow (so the next person doesn't have to re-discover it). Where it didn't — be honest. Some use cases just don't pay back at the general-purpose layer. Note them as candidates for the legal-specific layer later, don't pretend they worked.

Days 61–90: evaluate the legal-specific layer

For the use cases that the general-purpose layer didn't crack — typically contract review against a firm playbook, automated DD, jurisdiction-specific citation checking — evaluate one or two legal-specific tools against the same baseline. The bar is whether they beat the general-purpose result by enough to justify the price gap. Sometimes yes, sometimes no. Either answer is fine.

The traps

  • Over-buying. Stacking three legal-AI tools that all do similar things because they each demoed well. Pick one in each layer, run it for a quarter, then decide.
  • Hallucination blindness. Trusting AI output that looks authoritative. The first time someone on the team cites a fake case in a piece of advice, you have a real problem. Build the “cite-check everything” rule in from day one.
  • The “AI redo” trap. Tearing up working processes to rebuild around AI. Layer AI in where it saves measurable time; don't redesign your whole stack around it.
  • Skipping the policy. If the team doesn't have a one-page AI usage policy (what tools, what data is allowed, what review is required) by month three, you're inviting a confidentiality incident.

What good looks like at six months

A defined stack — usually one general-purpose tool firm-wide, zero or one legal-specific tool for a specific high-volume use case, the embedded AI inside the practice management system turned on. A short prompt library shared across the team. Measurable time savings on three or four use cases, adding up to typically 3–6 hours per fee-earner per week — the real prize, because that's either more billable capacity or more breathing room.

Re-run the evaluation annually. The market moves fast enough that what wasn't worth buying in 2025 might be obvious in 2027 — and what looked essential in 2024 might be a commodity by now.
