Orchestration · Agentic Systems · System Design

The orchestration layer that doesn't collapse under its own weight

Anirudh Voruganti
March 25, 2026 · 5 min read

It starts clean.

A planner, three tools, a simple loop. You can read the whole flow in five minutes. You're proud of it.

Six months later, you can't explain what it does without opening four files simultaneously. New team members are afraid to touch it. Every change breaks something upstream.

I have seen this happen at every scale — solo projects, funded startups, engineering teams of twenty. The collapse is predictable. And it is almost never caused by the thing people blame.

The real cause: mixed responsibilities

When you write if tool_a_fails: try tool_b, you have hardcoded a planning decision inside an execution step.

When you write if context_length > 3000: summarize_first, you have embedded a memory management decision inside your main loop.

When you write if user_is_premium: use_gpt4 else: use_gpt35, you have put a routing decision inside the agent that should be outside it.

These decisions seem small individually. They compound into a system that no single person fully understands — including the person who wrote it.
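The compounding is easy to see in code. Here is a minimal, deliberately self-contained sketch of the anti-pattern, with all three kinds of decisions inlined into one loop (every name here is illustrative, and the tools are stubbed out):

```python
from dataclasses import dataclass

@dataclass
class Result:
    failed: bool
    value: str = ""

def summarize(context: str) -> str:
    return context[:100]  # stand-in for a real summarizer

def call_tool(name: str, context: str, model: str) -> Result:
    # Stand-in: pretend tool_a always fails so the fallback fires.
    return Result(failed=(name == "tool_a"), value=name)

def run_agent(is_premium: bool, context: str) -> Result:
    if len(context) > 3000:                       # memory decision, inlined
        context = summarize(context)
    model = "gpt-4" if is_premium else "gpt-3.5"  # routing decision, inlined
    result = call_tool("tool_a", context, model)  # execution
    if result.failed:                             # planning decision, inlined
        result = call_tool("tool_b", context, model)
    return result
```

Three unrelated concerns now live in one function. Changing any one of them means re-reading, and risking, all of them.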

The three layers that need to stay separate

Every production orchestration system I have seen that holds up over time separates three things that most teams bundle together.

The planner

Decides what needs to happen. It knows about the current goal, the conversation context, what tools are available in principle, and what has been tried. It does not know about HTTP errors, retry counts, API rate limits, or token costs. Those are not its concerns.

The planner's output is a structured intent: what should happen next and why. Not code. Not a tool call. An intention that the next layer can act on.
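As a concrete sketch, an intent can be as simple as a frozen dataclass, and the planner a pure function over state. The tool names and state keys below are assumptions for illustration, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Intent:
    action: str                          # which tool, by name
    args: dict = field(default_factory=dict)
    reason: str = ""                     # why the planner chose it

def plan(state: dict) -> Intent:
    # Pure function of state: same state in, same intent out.
    # No HTTP errors, retry counts, or token costs in sight.
    if not state.get("evidence"):
        return Intent("web_search", {"query": state["goal"]},
                      reason="no evidence gathered yet")
    return Intent("answer", {"draft": state["evidence"]},
                  reason="evidence available; compose answer")
```

Because the planner is deterministic, its tests are table lookups: state in, intent out.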

The executor

Knows how to run one specific tool call. It handles timeouts, retries, error formatting, and cost tracking for that call. It does not decide whether to try a different tool. It does not know about the broader plan. It runs one thing and reports back: success with a result, or failure with a reason.
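A sketch of that contract, assuming a generic callable per tool (the retry policy and result shape are illustrative choices, not the only reasonable ones):

```python
import time
from dataclasses import dataclass

@dataclass
class ToolResult:
    ok: bool
    value: str = ""
    error: str = ""
    attempts: int = 0

def execute(tool_fn, args: dict, retries: int = 2,
            backoff: float = 0.0) -> ToolResult:
    # Owns timeouts/retries/error formatting for ONE call.
    # Never decides to try a *different* tool; that is the router's job.
    last_err = ""
    for attempt in range(1, retries + 2):
        try:
            return ToolResult(ok=True, value=tool_fn(**args),
                              attempts=attempt)
        except Exception as exc:
            last_err = f"{type(exc).__name__}: {exc}"
            time.sleep(backoff * attempt)
    return ToolResult(ok=False, error=last_err, attempts=retries + 1)
```

The executor reports back exactly one of two things: success with a value, or failure with a formatted reason.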

The router

Sits between planner and executor. When the executor reports failure, the router decides whether to retry the same tool, try an alternative, ask the planner to revise the plan, escalate to a human, or stop.

This is where your conditional logic lives — isolated, testable, changeable without touching the planner or the executor.
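A minimal sketch of what "isolated and testable" means here, with the thresholds and inputs chosen purely for illustration:

```python
from enum import Enum, auto

class Route(Enum):
    RETRY = auto()
    TRY_ALTERNATIVE = auto()
    REPLAN = auto()
    ESCALATE = auto()
    STOP = auto()

def route(failures: int, has_alternative: bool, replans: int,
          cancelled: bool = False, max_retries: int = 2,
          max_replans: int = 1) -> Route:
    # Every conditional branch lives here, in one testable place.
    if cancelled:
        return Route.STOP
    if failures <= max_retries:
        return Route.RETRY
    if has_alternative:
        return Route.TRY_ALTERNATIVE
    if replans < max_replans:
        return Route.REPLAN
    return Route.ESCALATE
```

Each branch is one assertion in a unit test, and changing a policy (say, `max_retries`) touches neither the planner nor any executor.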

The diagnostic question

There is one question that tells you whether your orchestration is well-designed:

Can you add a new tool without changing anything in the planner?

If the answer is no, you have coupled things that should be separate. The planner knows too much about the specific tools. Adding a tool means touching the planner, which means touching a component that was already working, which means regression risk.

In a well-designed system, adding a new tool means: registering it with the executor, describing it in the tool registry the planner reads from, and testing the executor in isolation. Nothing in the planner changes.
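One hypothetical shape for that registry (the names `register_tool` and `planner_view` are made up for this sketch): the executor holds the callables; the planner only ever reads names and descriptions.

```python
TOOL_REGISTRY: dict[str, dict] = {}

def register_tool(name: str, fn, description: str) -> None:
    # The only code that changes when a tool is added.
    TOOL_REGISTRY[name] = {"fn": fn, "description": description}

def planner_view() -> dict[str, str]:
    # The planner sees names and descriptions, never the callables.
    return {name: t["description"] for name, t in TOOL_REGISTRY.items()}

register_tool("web_search",
              lambda query: f"results for {query}",
              "Search the web for a query string.")
```

Registering a second tool is one more `register_tool` call; the planner's view grows by one entry without a line of planner code changing.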

What this looks like in practice

This is not about which framework you use. LangChain, LlamaIndex, custom code — the separation applies to all of them. The framework gives you primitives. The design is yours.

In practice, the separation looks like this:

The planner is a function that takes state and returns an intent object. It is purely functional — given the same state, it returns the same intent. It is easy to test, easy to reason about, and easy to change without side effects.

The executor is a collection of functions, one per tool, each independently testable. They know nothing about each other.

The router is a state machine — explicit about the states it can be in and the transitions between them. Every conditional branch lives here, visible and documented.

When you debug a failure in this system, you know immediately which layer to look at. The planner made a bad decision. The executor failed to call the tool. The router took the wrong path. The fault is localised.

When you debug a failure in a coupled system, you follow a chain of conditionals across six files and discover that a retry decision three levels deep is silently swallowing the error you need to see.

The cost of getting this wrong

Coupled orchestration is not just a maintenance problem. It is a scaling problem.

Every new capability you add to a coupled system increases the chance that something unrelated breaks. At some point — usually around the time you are trying to ship a feature that would take two days in a clean system — the cost of not refactoring becomes obvious.

The teams that refactor at that point spend two to four weeks rebuilding something they already built. The teams that designed the separation upfront spend those two to four weeks shipping.

The separation costs almost nothing at design time. It costs a great deal to retrofit.


Building something like this?

I work with a small number of technical teams to design and ship production-grade agentic systems. If you're dealing with a version of the problems above, let's talk.

Book a 30-min call →