Field notes

Insights from shipping agents, not slides about them.

What we've learned building production agents inside small and mid-sized companies. Practical, opinionated, no AI-influencer hype.

What an agent actually is — and what it isn't.

"Agentic AI" became a buzzword sometime in late 2024. Most people using the term mean something between a chatbot and a Zapier flow. The actual definition is more useful — and more demanding. Here's how we draw the line, and what it means for SMEs deciding whether to build one.

An agent perceives, reasons, acts, and reflects in a loop. The hard parts have not changed in forty years: knowing when to stop, knowing when you're wrong, and knowing when to ask a human.
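The loop is easier to reason about in code. Below is a minimal Python sketch of one way to structure it. It assumes the perceive, reason, act, and reflect steps are supplied as plain functions. The names (`run_agent`, `Decision`, `max_steps`, `min_confidence`) are hypothetical, and the thresholds are placeholders, not values from the essay. Each hard part becomes an explicit exit. A `done` flag and a step budget handle stopping. A failed reflection check (being wrong) and a confidence floor both escalate to a human.

```python
# Hypothetical sketch of a perceive/reason/act/reflect loop.
# All names and thresholds are illustrative assumptions.
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Outcome(Enum):
    DONE = "done"              # the agent believes the task is finished
    ESCALATED = "escalated"    # the agent handed off to a human
    STEP_LIMIT = "step_limit"  # the agent ran out of step budget


@dataclass
class Decision:
    action: str         # what the reasoning step wants to do next
    confidence: float   # self-reported confidence in [0, 1]
    done: bool = False  # True when the reasoner thinks the task is complete


@dataclass
class RunResult:
    outcome: Outcome
    steps: int
    history: list = field(default_factory=list)


def run_agent(
    perceive: Callable[[], dict],
    reason: Callable[[dict, list], Decision],
    act: Callable[[str], str],
    reflect: Callable[[Decision, str], bool],
    max_steps: int = 10,
    min_confidence: float = 0.7,
) -> RunResult:
    history: list[tuple[str, str]] = []
    for step in range(1, max_steps + 1):
        observation = perceive()                  # perceive
        decision = reason(observation, history)   # reason
        if decision.done:                         # knowing when to stop
            return RunResult(Outcome.DONE, step, history)
        if decision.confidence < min_confidence:  # knowing when to ask a human
            return RunResult(Outcome.ESCALATED, step, history)
        result = act(decision.action)             # act
        history.append((decision.action, result))
        if not reflect(decision, result):         # reflect: knowing when you're wrong
            return RunResult(Outcome.ESCALATED, step, history)
    return RunResult(Outcome.STEP_LIMIT, max_steps, history)


# Toy usage: a counter "environment" the agent increments until it reaches 3.
state = {"count": 0}


def toy_reason(obs: dict, history: list) -> Decision:
    if obs["count"] >= 3:
        return Decision(action="", confidence=1.0, done=True)
    return Decision(action="increment", confidence=0.9)


def toy_act(action: str) -> str:
    state["count"] += 1
    return f"count={state['count']}"


result = run_agent(
    perceive=lambda: {"count": state["count"]},
    reason=toy_reason,
    act=toy_act,
    reflect=lambda decision, outcome: True,
)
print(result.outcome, result.steps)  # Outcome.DONE 4
```

Every run returns an explicit outcome. The loop never simply continues until the model declares itself finished, so each run ends in a state a human can audit.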


The "tool-use harness" pattern: how we make agents reliable in production.

Why we wrap every tool call in a typed contract — and the eval suite we run before letting an agent write to a real system.
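A minimal sketch of a typed contract around a tool call follows. It assumes Python dataclasses serve as the schema and that fields use plain types only; generic hints like `list[str]` are not handled. `Tool`, `call_tool`, `LookupTicketArgs`, `CloseTicketArgs`, and the `writes` flag are hypothetical illustrations, not the harness the essay describes. Two checks run before anything executes. Model-proposed arguments are validated against the contract. Write-capable tools are refused unless writes are explicitly enabled.

```python
# Hypothetical sketch of a typed tool-call contract; all names are illustrative.
from dataclasses import MISSING, dataclass, fields
from typing import Any, Callable


class ContractError(ValueError):
    """Raised when a model-proposed tool call violates its contract."""


def parse_args(arg_type: type, raw: dict) -> Any:
    """Validate raw model output against a dataclass and construct it.

    Only plain classes (str, bool, ...) are checked; generic hints such
    as list[str] would need a real schema library.
    """
    known = {f.name: f for f in fields(arg_type)}
    unknown = set(raw) - set(known)
    if unknown:
        raise ContractError(f"unknown arguments: {sorted(unknown)}")
    for name, f in known.items():
        if name not in raw:
            if f.default is MISSING and f.default_factory is MISSING:
                raise ContractError(f"missing argument: {name}")
            continue  # fall back to the declared default
        if not isinstance(raw[name], f.type):
            raise ContractError(
                f"{name} must be {f.type.__name__}, got {type(raw[name]).__name__}"
            )
    return arg_type(**raw)


@dataclass(frozen=True)
class Tool:
    name: str
    arg_type: type                 # dataclass describing the arguments
    handler: Callable[[Any], Any]  # runs only after validation succeeds
    writes: bool = False           # True if the tool mutates an external system


def call_tool(tool: Tool, raw_args: dict, allow_writes: bool) -> Any:
    if tool.writes and not allow_writes:
        raise ContractError(f"{tool.name} writes, but writes are disabled")
    return tool.handler(parse_args(tool.arg_type, raw_args))


# Example contracts for a support-triage agent (hypothetical).
@dataclass(frozen=True)
class LookupTicketArgs:
    ticket_id: str
    include_history: bool = False


@dataclass(frozen=True)
class CloseTicketArgs:
    ticket_id: str


lookup = Tool("lookup_ticket", LookupTicketArgs, lambda a: f"ticket {a.ticket_id}")
close = Tool("close_ticket", CloseTicketArgs, lambda a: f"closed {a.ticket_id}", writes=True)

print(call_tool(lookup, {"ticket_id": "T-1"}, allow_writes=False))  # ticket T-1
```

A schema library would handle nested and generic types. Plain dataclasses keep the sketch dependency-free, and the read/write split still sits in one place.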

The first agent worth building in a 50-person company is almost always inbound triage.

A walkthrough of why support and lead triage have the right shape for early agentic deployment, and the three use cases you should NOT start with.

SMEs vs enterprise AI: why the playbooks don't transfer.

Five things every "AI transformation" framework gets wrong when applied to a company under 500 people, with worked examples.

A 90-day playbook for shipping your first production agent.

Week by week: discovery, baseline, build, eval, soft launch, ramp. The version we run with every new SME engagement.

Memory that doesn't lie: writing reflection loops for long-running agents.

Agents accumulate beliefs. Most of them are wrong. Here's the audit pattern we use to keep agent memory grounded and revisable.

The honest ROI math: what agents actually save vs what vendors promise.

A teardown of three real engagements with the actual labor cost displaced — and the costs no one mentions in the brochure.

Read-only first: how to deploy an agent without breaking trust on day one.

The phased access matrix we use across every engagement, and why "shadow mode" is non-negotiable for the first two weeks.

What private LLMs cost in 2026 — and when on-prem actually makes sense.

A field guide to running open-source models on your own hardware: the break-even calculation, the maintenance reality, and the compliance wins.

Eval-driven agent development: tests as the spec.

Why we treat eval suites as the contract for an agent's behavior, and the four eval categories every production agent needs from day one.