CTO on Demand — LLM & Agents

Architecture, orchestration and harness for your Claude projects

You're building on the Claude API or another LLM. Multi-agent orchestration, loop patterns and harnesses aren't clearly documented anywhere. We run them in production.

On demand: half-day debugging, architecture design day, or monthly technical CTO retainer.

Free discovery session (1h)

All services

Who is this for?

Startup building a SaaS product with Claude API at its core

Dev team stuck on a multi-agent orchestration problem

Non-tech CTO who needs an expert eye before hiring or investment

Claude Code / autonomous agent project going to production

Anthropic bill exploding — LLM architecture to optimize

Migration from a GPT-based system to Claude, without breaking everything

The patterns we master

These are the fundamental building blocks of any robust LLM system in production.

Architecture

Orchestrator + sub-agents

How it works

A central agent receives the task, decomposes it, delegates to specialized sub-agents and synthesizes their outputs. Each sub-agent gets minimal context, a single well-defined tool and a schema-constrained output. The orchestrator manages handoffs, shared memory and dependencies between sub-tasks.
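As a rough illustration of the flow described above, here is a minimal sketch. Everything in it is hypothetical: the sub-agents are plain stub functions standing in for real LLM calls, `decompose` is hard-coded where a real orchestrator would ask the model for a plan, and the names `SubTask`, `subAgents` and `orchestrate` are invented for this example.

```typescript
// Hypothetical sketch: stubs stand in for real LLM calls.
type SubTask = { id: string; kind: "extract" | "summarize"; input: string; dependsOn: string[] };
type SubResult = { id: string; output: string };

// Each sub-agent sees only its own input (minimal context) and returns a plain string.
type SubAgent = (input: string) => Promise<string>;

const subAgents: Record<SubTask["kind"], SubAgent> = {
  // Stub "tool": keep only words longer than 6 characters.
  extract: async (input) => input.split(/\s+/).filter((w) => w.length > 6).join(","),
  // Stub "tool": wrap the input so the handoff is visible in the output.
  summarize: async (input) => `summary(${input})`,
};

// Hard-coded decomposition, listed in dependency order (an assumption of this sketch).
function decompose(task: string): SubTask[] {
  return [
    { id: "t1", kind: "extract", input: task, dependsOn: [] },
    { id: "t2", kind: "summarize", input: "", dependsOn: ["t1"] },
  ];
}

async function orchestrate(task: string): Promise<SubResult[]> {
  const memory = new Map<string, string>(); // shared memory, keyed by sub-task id
  const results: SubResult[] = [];
  for (const sub of decompose(task)) {
    // Handoff: a dependent task receives only the outputs it declared a dependency on.
    const handoff = sub.dependsOn.map((id) => memory.get(id) ?? "").join("\n");
    const output = await subAgents[sub.kind](sub.input || handoff);
    memory.set(sub.id, output);
    results.push({ id: sub.id, output });
  }
  return results;
}
```

The point of the shape is that no sub-agent ever receives the full task history: `t2` only sees what `t1` produced.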

Business impact

Result: complex tasks (legal analysis, report generation, large-scale code review) that you couldn't automate become reliable and repeatable.

Anti-pattern: putting everything in one agent with a 200k-token context. The model loses track, hallucinates and costs a lot.

Loop patterns

Loops: loop-until-dry, loop-until-budget

How it works

Loop-until-dry: we re-run finder agents until K consecutive rounds find nothing new. Ideal for exhaustive audits, bug detection and edge-case coverage. Loop-until-budget: we scale the number of agents dynamically from a defined token budget, so a larger budget buys more depth.
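The two stopping rules above can be sketched as follows. This is an illustration under assumptions, not a reference implementation: `Finder` is a hypothetical stand-in for one round of finder agents, `maxRounds` is a safety cap added for this sketch, and `tokensPerAgent` is an estimate you would have to measure yourself.

```typescript
// One round of finder agents; returns whatever items that round reported.
type Finder = (round: number) => Promise<string[]>;

// Loop-until-dry: stop after k consecutive rounds that add nothing new.
// maxRounds is a safety cap (an assumption of this sketch, not part of the pattern).
async function loopUntilDry(finder: Finder, k: number, maxRounds = 100): Promise<Set<string>> {
  const found = new Set<string>();
  let dryRounds = 0;
  for (let round = 0; round < maxRounds && dryRounds < k; round++) {
    const before = found.size;
    for (const item of await finder(round)) found.add(item);
    // A round is "dry" when the set did not grow; any new item resets the counter.
    dryRounds = found.size === before ? dryRounds + 1 : 0;
  }
  return found;
}

// Loop-until-budget: derive the number of agents from the token budget,
// clamped between 1 and maxAgents.
function agentsForBudget(tokenBudget: number, tokensPerAgent: number, maxAgents: number): number {
  return Math.max(1, Math.min(maxAgents, Math.floor(tokenBudget / tokensPerAgent)));
}
```

With K = 2, a run whose rounds report `[a, b]`, `[b]`, `[c]`, `[]`, `[]` stops after the fifth round: the duplicate `b` counts as a dry round, but `c` resets the counter.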

Business impact

Result: an audit system that stops when it's truly found everything, not when told to stop. Or content generation that adapts to the allocated budget without manual intervention.

Anti-pattern: a for(i=0; i<10; i++) loop with an arbitrary count. You either miss the tail of rare cases or spin for nothing.

Harness

Agent harness: structured output + retry

How it works

The harness is the framework that drives each agent call: structured output enforced via JSON Schema, Zod validation on receipt, automatic retry on mismatch, an abort signal to cut off ghost agents, and a token budget shared across all agents in the workflow.
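A minimal sketch of such a harness might look like the following. To stay dependency-free it uses a hand-written type guard where a real project would more likely use a Zod schema (for example `schema.safeParse(parsed)`). The names `Finding`, `TokenBudget`, `RawCall` and `callWithHarness`, as well as the default attempt count and timeout, are assumptions made for this illustration.

```typescript
type Finding = { severity: "low" | "high"; message: string };

// Hand-written guard standing in for a Zod schema (assumption: keeps the sketch dependency-free).
function isFinding(value: unknown): value is Finding {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (v.severity === "low" || v.severity === "high") && typeof v.message === "string";
}

// Token budget shared by every agent in the workflow.
class TokenBudget {
  constructor(private remaining: number) {}
  spend(tokens: number): void {
    if (tokens > this.remaining) throw new Error("token budget exhausted");
    this.remaining -= tokens;
  }
  get left(): number {
    return this.remaining;
  }
}

// Hypothetical raw agent call; it is expected to honor the abort signal.
type RawCall = (signal: AbortSignal) => Promise<{ text: string; tokens: number }>;

async function callWithHarness(
  call: RawCall,
  budget: TokenBudget,
  maxAttempts = 3,
  timeoutMs = 30_000,
): Promise<Finding> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const controller = new AbortController();
    // Abort the call if it outlives the timeout, so no agent keeps running unnoticed.
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const { text, tokens } = await call(controller.signal);
      budget.spend(tokens); // failed attempts still consume budget
      let parsed: unknown;
      try {
        parsed = JSON.parse(text);
      } catch {
        continue; // malformed JSON: retry
      }
      if (isFinding(parsed)) return parsed; // schema mismatch falls through to the next attempt
    } finally {
      clearTimeout(timer);
    }
  }
  throw new Error(`no valid output after ${maxAttempts} attempts`);
}
```

Charging the budget before validation is a deliberate choice in this sketch: a retry loop that does not pay for its failed attempts can silently overspend.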

Business impact

Result: zero free-text parsing, zero "the model responded in the wrong format", zero agent running in the background without our knowledge. Reliability goes from 70% to 99%+.

Anti-pattern: parsing the text response with a regex, or with JSON.parse and no validation. A single unexpected character crashes the pipeline.

Parallel orchestration

Pipeline vs barrier: when to parallelize

How it works

Pipeline: each item traverses all steps continuously, without waiting for others. Item A can be in step 3 while item B is still in step 1. Barrier: wait until ALL items of a step are done before moving to the next. Use only when step N needs the aggregated result of N-1 (cross-item deduplication, early-exit if 0 results).
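The difference between the two modes fits in a few lines. This is a sketch with hypothetical names (`Step`, `runPipeline`, `runWithBarriers`); it assumes every step is an async function from one item to one item.

```typescript
type Step = (item: string) => Promise<string>;

// Pipeline: each item runs through all steps on its own; items never wait for each other.
async function runPipeline(items: string[], steps: Step[]): Promise<string[]> {
  return Promise.all(
    items.map(async (item) => {
      let value = item;
      for (const step of steps) value = await step(value);
      return value;
    }),
  );
}

// Barrier: every item must finish step N before any item starts step N+1.
async function runWithBarriers(items: string[], steps: Step[]): Promise<string[]> {
  let values = items;
  for (const step of steps) {
    values = await Promise.all(values.map((value) => step(value)));
  }
  return values;
}
```

Both functions return the same results in the same order. Only the scheduling differs, which is why the choice can be made without touching the steps themselves.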

Business impact

Result: a workflow processing 50 documents can be 4x faster in pipeline than sequential mode, without changing a single line of business logic.

Anti-pattern: using parallel() everywhere "because it's cleaner". Each barrier adds latency, and wasted parallelism cancels the gain.

Reduce LLM costs without sacrificing quality

The Anthropic bill can be divided by 5 to 20 with the right levers. We know them all.

Prompt caching

Cut costs by 60-90% on repetitive system prompts. Enabled in two lines.
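As an illustration, the request body below marks the system prompt as cacheable. It is a sketch: the field names follow the shape of Anthropic's Messages API as we understand it (a `cache_control` marker on a system text block), so check them against the current API reference before relying on them. `buildRequest` and the `max_tokens` value are hypothetical, and the model id is left to the caller.

```typescript
type SystemBlock = { type: "text"; text: string; cache_control?: { type: "ephemeral" } };

type MessagesRequest = {
  model: string;
  max_tokens: number;
  system: SystemBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
};

// Hypothetical helper: builds a request whose system prompt is marked as cacheable.
function buildRequest(model: string, systemPrompt: string, userMessage: string): MessagesRequest {
  return {
    model,
    max_tokens: 1024,
    // The cache_control marker on the system block is the part that enables caching.
    system: [{ type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } }],
    messages: [{ role: "user", content: userMessage }],
  };
}
```

Caching only pays off when the marked prefix is byte-for-byte identical across calls, so anything that varies per request belongs in `messages`, not in the cached system block.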

Model routing

Haiku for classification, Sonnet for drafting, Opus for critical decisions. Up to a 10x cost difference depending on the model chosen.
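Routing by task type can be as simple as a lookup table. In the sketch below the task categories come from the text above; the model identifiers are deliberate placeholders (hypothetical), to be replaced with the ids you actually deploy.

```typescript
type TaskKind = "classification" | "drafting" | "critical-decision";
type Tier = "haiku" | "sonnet" | "opus";

// Cheapest tier that is good enough for each kind of task.
const TIER_FOR_TASK: Record<TaskKind, Tier> = {
  classification: "haiku",
  drafting: "sonnet",
  "critical-decision": "opus",
};

// Placeholder ids (hypothetical): substitute the real model identifiers.
const MODEL_ID: Record<Tier, string> = {
  haiku: "<haiku-model-id>",
  sonnet: "<sonnet-model-id>",
  opus: "<opus-model-id>",
};

function routeModel(kind: TaskKind): string {
  return MODEL_ID[TIER_FOR_TASK[kind]];
}
```

Keeping the table in one place means a pricing or model change is a one-line edit rather than a search through every call site.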

Context trimming

Passing 200k tokens on every call costs about 20x what a trimmed 10k-token context would, since input cost scales with token count. Synthesize, summarize, keep only what matters.
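One simple way to trim is to keep the most recent messages that fit a token budget and replace everything older with a summary. The sketch below makes several assumptions: `estimateTokens` is a crude 4-characters-per-token guess rather than a real tokenizer, `summarize` is a hypothetical callback (in practice probably a cheap model call), and the summary itself is not counted against the budget.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Crude estimate (assumption): about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep the newest messages that fit in maxTokens; older ones collapse into one summary message.
function trimContext(
  history: Message[],
  maxTokens: number,
  summarize: (dropped: Message[]) => string,
): Message[] {
  let used = 0;
  let cut = history.length; // index of the oldest message we keep
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > maxTokens) break;
    used += cost;
    cut = i;
  }
  const dropped = history.slice(0, cut);
  const recent = history.slice(cut);
  return dropped.length === 0
    ? recent
    : [{ role: "user", content: summarize(dropped) }, ...recent];
}
```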

Async batching

For non-real-time tasks, the Batch API halves costs in exchange for a delay of a few hours.
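A batch submission is essentially a list of ordinary requests, each tagged with an id so results can be matched back later. The sketch below builds such a payload. The `custom_id` and `params` field names reflect our understanding of Anthropic's Message Batches API and should be verified against the current reference; `buildBatch`, the `doc-N` id scheme and the `max_tokens` value are hypothetical.

```typescript
type BatchEntry = {
  custom_id: string;
  params: {
    model: string;
    max_tokens: number;
    messages: { role: "user"; content: string }[];
  };
};

// Hypothetical helper: one batch entry per prompt, with a stable id to match results later.
function buildBatch(model: string, prompts: string[]): { requests: BatchEntry[] } {
  return {
    requests: prompts.map((prompt, i): BatchEntry => ({
      custom_id: `doc-${i}`,
      params: {
        model,
        max_tokens: 1024,
        messages: [{ role: "user", content: prompt }],
      },
    })),
  };
}
```

Results do not necessarily come back in submission order, which is why each entry carries its own id instead of relying on position.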

Stack & tools we master

We don't recommend tools we haven't used in production.

Claude API (Anthropic) — Opus 4 / Sonnet 4 / Haiku 4 — task-based routing

Vercel AI SDK — Streaming, tool use, native structured output

MCP (Model Context Protocol) — Context servers, external tools, integrations

Zod / JSON Schema — Structured output validation, agent output typing

Supabase / pgvector — Persistent memory, RAG, conversation history

LangChain / LlamaIndex — When it makes sense — often overkill

Formats & pricing

On demand, no forced subscription.

Discovery session

1h — Free

We listen to your project, stack and blockers, then tell you honestly whether we can help.

Half-day

4h — €500

Architecture review, targeted debugging or pair programming on a specific problem.

FAQ

Do you work only with Claude?

No. We know Claude well (it's our reference model), but we also work with GPT-4o, Gemini, Llama and hybrid stacks. The advice is model-agnostic: we recommend what's best for your case.

We already have devs — why bring you in?

Production LLM patterns (harness, orchestration, cost optimization) aren't improvised. Your devs may have 6 months of LLM experience; we have 3+ years on real multi-agent systems. That's 1 day of joint work vs 3 months of trial and error.

Do you write code or just consult?

Both. Half-day and full-day include delivered code (PR, documented architecture). The monthly retainer includes direct technical work on your repo.

What exactly is a 'harness'?

A harness is the framework that wraps each LLM call: it enforces structured output, validates the schema, handles retries, cuts off ghost agents and shares the token budget. Without a harness, a multi-agent pipeline is brittle. With a good harness, it runs in production without intervention.

Let's start with a 30-minute call

Share your project, stack and blockers. We'll tell you honestly if and how we can help.

Book a call

WithClaude
