You're building on the Claude API or another LLM. Multi-agent orchestration, loop patterns and harnesses are poorly documented; we run them in production.
On demand: a half-day of debugging, a full architecture design day, or a monthly part-time CTO retainer.
These are the fundamental building blocks of any robust LLM system in production.
Orchestration: a central agent receives the task, decomposes it, delegates to specialized sub-agents and synthesizes their outputs. Each sub-agent gets minimal context, one precise tool and a schema-defined output. The orchestrator manages handoffs, shared memory and dependencies.
Result: complex tasks (legal analysis, report generation, large-scale code review) that previously resisted automation become reliable and repeatable.
Anti-pattern: putting everything into one agent with a 200k-token context. The model loses track, hallucinates, and every call is expensive.
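A minimal TypeScript sketch of the orchestrator pattern, assuming sub-agents are injected as async functions (in production each would wrap an LLM call). Everything named here is hypothetical: the SubtaskKind categories, the Finding shape, decompose and synthesize. It covers decomposition, minimal per-agent context, typed outputs and synthesis; handoffs, shared memory and dependency ordering are left out.

```typescript
// Hypothetical sub-task categories for a contract review.
type SubtaskKind = "legal" | "financial";

interface Subtask {
  kind: SubtaskKind;
  // Only the slice of the document this sub-agent needs, never the whole thing.
  context: string;
}

// Schema-defined output every sub-agent must return.
interface Finding {
  kind: SubtaskKind;
  summary: string;
  riskLevel: "low" | "medium" | "high";
}

// A sub-agent maps one minimal subtask to one typed finding (an LLM call in practice).
type SubAgent = (task: Subtask) => Promise<Finding>;

// Decompose: one subtask per pre-split section.
function decompose(sections: Record<SubtaskKind, string>): Subtask[] {
  return (Object.keys(sections) as SubtaskKind[]).map((kind) => ({
    kind,
    context: sections[kind],
  }));
}

// Synthesize: merge findings into a single report, highest risk first.
function synthesize(findings: Finding[]): string {
  const rank: Record<Finding["riskLevel"], number> = { high: 0, medium: 1, low: 2 };
  return findings
    .slice()
    .sort((a, b) => rank[a.riskLevel] - rank[b.riskLevel])
    .map((f) => `[${f.riskLevel}] ${f.kind}: ${f.summary}`)
    .join("\n");
}

// Orchestrator: decompose, delegate concurrently, then synthesize.
async function orchestrate(
  sections: Record<SubtaskKind, string>,
  agents: Record<SubtaskKind, SubAgent>,
): Promise<string> {
  const subtasks = decompose(sections);
  const findings = await Promise.all(subtasks.map((task) => agents[task.kind](task)));
  return synthesize(findings);
}
```

Because every sub-agent returns the same typed shape, the synthesis step is plain code rather than another free-text LLM call.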
Loop-until-dry: we re-run finder agents until K consecutive rounds surface nothing new. Ideal for exhaustive audits, bug hunting and edge-case coverage.
Loop-until-budget: we scale the number of agents dynamically to fit a defined token budget: more budget, more depth.
Result: an audit that stops when new findings genuinely dry up, not when an arbitrary counter says so. Or content generation that adapts to the allocated budget without manual intervention.
Anti-pattern: a for (i = 0; i < 10; i++) loop with an arbitrary iteration count. You either miss the long tail of rare cases or spin through empty rounds for nothing.
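Both loop patterns can be sketched in TypeScript as follows. The Finder type, the maxRounds safety cap and agentsForBudget are hypothetical names for illustration; a real finder would launch a round of LLM agents and return issue identifiers that can be compared across rounds.

```typescript
// Hypothetical finder: one round of agents returning issue identifiers.
type Finder = (round: number) => Promise<string[]>;

// Loop-until-dry: run rounds until `k` consecutive rounds surface nothing new.
// maxRounds is a safety cap added for this sketch, not part of the pattern.
async function loopUntilDry(find: Finder, k: number, maxRounds = 50): Promise<Set<string>> {
  const seen = new Set<string>();
  let dryStreak = 0;
  for (let round = 0; round < maxRounds && dryStreak < k; round++) {
    const results = await find(round);
    let fresh = 0;
    for (const id of results) {
      if (!seen.has(id)) {
        seen.add(id);
        fresh++;
      }
    }
    // Any new finding resets the streak; a round with nothing new extends it.
    dryStreak = fresh === 0 ? dryStreak + 1 : 0;
  }
  return seen;
}

// Loop-until-budget: derive the number of agents from the token budget,
// capped at maxAgents. More budget, more agents, more depth.
function agentsForBudget(budgetTokens: number, tokensPerAgent: number, maxAgents: number): number {
  if (tokensPerAgent <= 0) throw new Error("tokensPerAgent must be positive");
  return Math.max(0, Math.min(maxAgents, Math.floor(budgetTokens / tokensPerAgent)));
}
```

A round that only re-reports known issues counts as dry, so the loop ends on novelty rather than on a fixed count; maxRounds only guards against a finder that never converges.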
The harness is the framework that drives each agent call: structured output enforced via JSON Schema, Zod validation on receipt, automatic retry on mismatch, an abort signal that stops ghost agents (calls still running after the workflow no longer needs them), and a token budget shared across all agents in the workflow.
Result: no free-text parsing, no more "the model responded in the wrong format", no agent running in the background without our knowledge. In our projects, reliability goes from 70% to 99%+.
Anti-pattern: parsing the text response with a regex, or with JSON.parse and no validation. A single unexpected character crashes the pipeline.
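A minimal harness sketch in TypeScript, assuming a hypothetical CallModel client injected by the caller. In production the check would typically be a Zod schema, with the same schema also sent to the model as JSON Schema (for example as a tool's input_schema); to keep the sketch dependency-free, parseVerdict is a hand-written stand-in. The Verdict shape, TokenBudget and the retry wording are illustrative assumptions.

```typescript
// Hypothetical model client: returns raw text plus tokens consumed.
type CallModel = (prompt: string, signal: AbortSignal) => Promise<{ text: string; tokens: number }>;

// One budget object shared by every agent in the workflow.
class TokenBudget {
  private remaining: number;
  constructor(total: number) {
    this.remaining = total;
  }
  // Charge tokens; fail fast once the shared budget is exhausted.
  spend(tokens: number): void {
    this.remaining -= tokens;
    if (this.remaining < 0) throw new Error("token budget exhausted");
  }
  get left(): number {
    return this.remaining;
  }
}

// Expected output shape (a Zod schema in production).
interface Verdict {
  bug: boolean;
  file: string;
}

// Stand-in for schema.safeParse(): null on invalid JSON or wrong shape.
function parseVerdict(text: string): Verdict | null {
  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const v = data as Record<string, unknown>;
  if (typeof v.bug !== "boolean" || typeof v.file !== "string") return null;
  return { bug: v.bug, file: v.file };
}

async function runAgent(
  prompt: string,
  callModel: CallModel,
  budget: TokenBudget,
  signal: AbortSignal,
  maxRetries = 2,
): Promise<Verdict> {
  let currentPrompt = prompt;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    // A cancelled workflow never launches another call (no ghost agents).
    if (signal.aborted) throw new Error("aborted");
    const { text, tokens } = await callModel(currentPrompt, signal);
    // Every attempt, valid or not, is charged to the shared budget.
    budget.spend(tokens);
    const verdict = parseVerdict(text);
    if (verdict !== null) return verdict;
    // Mismatch: retry with a corrective instruction appended.
    currentPrompt = `${prompt}\n\nYour previous reply did not match the schema {"bug": boolean, "file": string}. Reply with JSON only.`;
  }
  throw new Error(`no valid output after ${maxRetries + 1} attempts`);
}
```

The caller creates one AbortController per workflow and passes its signal to every agent; aborting it stops all pending retries at once.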
Pipeline: each item moves through all steps on its own, without waiting for the others. Item A can be at step 3 while item B is still at step 1.
Barrier: wait until ALL items have finished a step before moving to the next. Use it only when step N needs the aggregated result of step N-1 (cross-item deduplication, early exit if there are 0 results).
Result: a workflow processing 50 documents can run 4x faster in pipeline mode than in sequential mode, without changing a single line of business logic.
Anti-pattern: using parallel() at every step "because it's cleaner". Each barrier makes every item wait for the slowest one, so needless barriers cancel the gain from parallelism.
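The difference between the two modes can be sketched with plain Promises in TypeScript. The Step type and the string items are simplifications for illustration; in practice each step would be an LLM call on one document.

```typescript
// One asynchronous step applied to one item (e.g. one LLM call per document).
type Step = (doc: string) => Promise<string>;

// Run a single item through every step in order.
async function runThrough(doc: string, steps: Step[]): Promise<string> {
  let value = doc;
  for (const step of steps) value = await step(value);
  return value;
}

// Pipeline: items advance independently; a fast item never waits for a slow one.
function pipeline(docs: string[], steps: Step[]): Promise<string[]> {
  return Promise.all(docs.map((d) => runThrough(d, steps)));
}

// Barrier: step N starts only after ALL items have finished step N-1.
async function barrier(docs: string[], steps: Step[]): Promise<string[]> {
  let current = docs;
  for (const step of steps) {
    current = await Promise.all(current.map((d) => step(d)));
  }
  return current;
}
```

Both return the same results in input order; only the scheduling differs. In barrier mode the whole batch is available between steps, which is exactly what cross-item deduplication or an early exit needs, and is the only reason to pay the extra latency.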
The right levers can cut your Anthropic bill by a factor of 5 to 20. We know them all.
Prompt caching: cut 60-90% of the cost of repetitive system prompts. Enabled in 2 lines.
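A sketch of what enabling prompt caching looks like on an Anthropic Messages API request body, plus a back-of-envelope cost model. The cache_control field on a system block is the documented mechanism; the model id, max_tokens and the relativeInputCost helper are placeholders. The multipliers assume the default 5-minute cache (writes billed at 1.25x the base input price, reads at 0.1x) and that every call lands within the cache lifetime; caching also only applies above a model-dependent minimum prompt length, so check the current docs and pricing before relying on these numbers.

```typescript
// Messages API request body with prompt caching on the system prompt.
// Model id and max_tokens are placeholders.
function buildCachedRequest(model: string, systemPrompt: string, userMessage: string) {
  return {
    model,
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: systemPrompt,
        // The two lines that matter: cache the prompt prefix up to this block.
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: userMessage }],
  };
}

// Input cost in base-price token units over `calls` requests.
// Assumed multipliers: first call writes the cache (1.25x), later calls read it (0.1x).
function relativeInputCost(systemTokens: number, userTokens: number, calls: number, cached: boolean): number {
  if (calls <= 0) return 0;
  if (!cached) return calls * (systemTokens + userTokens);
  const write = 1.25 * systemTokens;
  const reads = (calls - 1) * 0.1 * systemTokens;
  return write + reads + calls * userTokens;
}
```

Under these assumptions, a 10k-token system prompt reused across 100 calls with 500-token user messages costs about 85% less input with caching than without.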
Model routing: Haiku for classification, Sonnet for drafting, Opus for critical decisions. Per-token prices differ by roughly 10x between the cheapest and the most expensive tier.
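A minimal routing table in TypeScript. The task categories and the one-tier escalation on low confidence are assumed policies for illustration; the tier names would map to real model ids in your own configuration.

```typescript
// Hypothetical task categories and model tiers.
type TaskKind = "classification" | "drafting" | "critical_decision";
type Tier = "haiku" | "sonnet" | "opus";

// Default route: the cheapest tier that handles each kind of task well.
const ROUTES: Record<TaskKind, Tier> = {
  classification: "haiku",
  drafting: "sonnet",
  critical_decision: "opus",
};

// Assumed escalation policy: move up one tier when the cheaper model
// reports low confidence; opus is already the top tier.
function routeModel(kind: TaskKind, lowConfidence = false): Tier {
  const tier = ROUTES[kind];
  if (!lowConfidence) return tier;
  return tier === "haiku" ? "sonnet" : "opus";
}
```

Keeping the routing in one table makes the cost trade-off explicit and easy to revisit when prices or models change.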
Context trimming: input cost scales linearly with tokens, so resending 200k tokens on every call costs 20x more than a trimmed 10k-token context. Synthesize, summarize, keep only what matters.
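A sketch of budget-based context trimming in TypeScript: keep the newest contiguous messages that fit a token budget and hand the older ones to a summarizer. The ChatMessage type, trimContext and the 4-characters-per-token estimate are assumptions for illustration; use the provider's token counter when precision matters.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep the most recent messages that fit in maxTokens; everything older
// is returned as `dropped` so the caller can summarize it separately.
function trimContext(history: ChatMessage[], maxTokens: number): { kept: ChatMessage[]; dropped: ChatMessage[] } {
  let used = 0;
  let start = history.length;
  // Walk backwards from the newest message, keeping a contiguous recent suffix.
  while (start > 0) {
    const cost = estimateTokens(history[start - 1].content);
    if (used + cost > maxTokens) break;
    used += cost;
    start--;
  }
  return { kept: history.slice(start), dropped: history.slice(0, start) };
}
```

The dropped turns can then be condensed once (for example by a cheap model) and carried forward as a short summary instead of being resent verbatim on every call.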
Batch API: for non-real-time tasks, the Batch API halves costs in exchange for asynchronous results (up to 24 hours, often much sooner).
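A sketch of building a Message Batches request body in TypeScript (sent with POST /v1/messages/batches). The requests / custom_id / params structure follows the Anthropic API; the model id, max_tokens, buildBatch and the doc-N id scheme are placeholders. Each custom_id lets you match results back to inputs, since results are not guaranteed to come back in request order.

```typescript
interface BatchRequest {
  custom_id: string;
  // Same fields as a regular Messages API request.
  params: {
    model: string;
    max_tokens: number;
    messages: { role: "user"; content: string }[];
  };
}

// One batch entry per document; custom_id is the join key for the results.
function buildBatch(docs: string[], model: string, instruction: string): { requests: BatchRequest[] } {
  return {
    requests: docs.map(
      (doc, i): BatchRequest => ({
        custom_id: `doc-${i}`,
        params: {
          model,
          max_tokens: 1024,
          messages: [{ role: "user", content: `${instruction}\n\n${doc}` }],
        },
      }),
    ),
  };
}
```

Submitting the whole set as one batch trades latency for the discount, which suits nightly reports, backfills and bulk classification.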
We don't recommend tools we haven't used in production.
On demand, no mandatory subscription.
We listen to your project, stack and blockers, and tell you honestly whether we can help.
Architecture review, targeted debugging or pair programming on a specific problem.
Full agent system design, working prototype or refactor of an existing architecture.
Part-time technical CTO: regular reviews, async availability, LLM architecture ownership.
No. We know Claude well (it's our reference model), but we also work with GPT-4o, Gemini, Llama and hybrid stacks. Our advice is model-agnostic: we recommend what's best for your case.
Production LLM patterns (harness, orchestration, cost optimization) can't be improvised. Your developers may have 6 months of LLM experience; we have 3+ years on real multi-agent systems. One day of joint work can save three months of trial and error.
Both. Half-day and full-day engagements include deliverables (a PR, documented architecture). The monthly retainer includes hands-on technical work in your repo.
A harness is the framework that wraps each LLM call: it enforces structured output, validates the schema, handles retries, cuts off ghost agents and shares the token budget. Without a harness, a multi-agent pipeline is brittle. With a good harness, it runs in production without intervention.
Share your project, stack and blockers. We'll tell you honestly if and how we can help.
Book a call