Routing

How spawn picks the right model and harness for each task using a tiny local classifier.

spawn decides who does what. When you create a task, the router reads its description, classifies it once, and maps that classification to a concrete model and harness — all locally, before any session starts. Routing happens once per task, not per turn, so there's no thrashing between models mid-conversation.

The classifier

The router is the spawn-router (also called the tiny-coding-router): a text-only DeBERTa multi-head classifier. From a single task description it predicts several dimensions at once:

task_type — design, refactor, exploration, docs, and so on.
complexity — how hard the task is.
risk — how dangerous the change is.
Sub-dimensions such as security_surface and production_exposure.

It is built to run locally and fast. Classification completes in under 100ms; the quantized int8 ONNX model runs at a median of about 55ms on CPU (measured on Windows 11). Because classification happens once per task rather than on every turn, the chosen model stays stable for the life of the task.
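To make the multi-head output concrete, here is a minimal TypeScript sketch of what one prediction might look like. Only the dimension names come from the list above; the per-head score format, the label values, and the `topLabel` helper are assumptions made for illustration, not the router's actual output schema.

```typescript
// Hypothetical per-head output: one score per candidate label.
type HeadScores = Record<string, number>;

// Hypothetical shape of a multi-head prediction.
// Only the dimension names are taken from the documentation.
interface RawPrediction {
  task_type: HeadScores;
  complexity: HeadScores;
  risk: HeadScores;
  security_surface: HeadScores;
  production_exposure: HeadScores;
}

// Return the highest-scoring label for a single head.
function topLabel(scores: HeadScores): string {
  let best = "";
  let bestScore = -Infinity;
  for (const label of Object.keys(scores)) {
    if (scores[label] > bestScore) {
      best = label;
      bestScore = scores[label];
    }
  }
  return best;
}

// Made-up scores for a task such as "refactor the auth middleware".
const example: RawPrediction = {
  task_type: { design: 0.1, refactor: 0.7, exploration: 0.1, docs: 0.1 },
  complexity: { easy: 0.2, medium: 0.5, hard: 0.3 },
  risk: { low: 0.2, medium: 0.3, high: 0.5 },
  security_surface: { no: 0.1, yes: 0.9 },
  production_exposure: { no: 0.4, yes: 0.6 },
};

// Collapse every head to its most likely label in one pass.
const summary = {
  task_type: topLabel(example.task_type),
  complexity: topLabel(example.complexity),
  risk: topLabel(example.risk),
};
```

A single description yields one label per head, which is what the policy layer consumes.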

From prediction to decision

The raw prediction is mapped to a decision through a policy layer: a TypeScript map and config that turn the predicted dimensions into a concrete choice. The policy intent looks like:

task_type = design | refactor → prefer Claude (or Grok for UI work).
complexity = hard or risk = high → bump to a top-tier model.
exploration or docs → a cheaper, faster model.

The decision returned has this shape:

{
  driverKind: "claude" | "codex" | "grok" | ...,
  model: "...",
  rationale?: string
}
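The policy intent and decision shape above can be sketched as a small TypeScript function. This is an illustration under stated assumptions, not spawn's actual policy: the model names are placeholders, the `isUiWork` flag is hypothetical, using `codex` for everything other than design and refactor is a guess, and the rule that a hard or high-risk task overrides the cheaper tier is one possible precedence.

```typescript
type DriverKind = "claude" | "codex" | "grok";
type Tier = "top" | "standard" | "cheap";

// Mirrors the decision shape shown above.
interface RoutingDecision {
  driverKind: DriverKind;
  model: string;
  rationale?: string;
}

// Hypothetical flattened classifier output (one label per dimension).
interface Classification {
  task_type: string;
  complexity: string;
  risk: string;
  isUiWork?: boolean; // hypothetical flag, not a documented dimension
}

// Placeholder model names; the real ones would come from config.
const MODEL_BY_TIER: Record<Tier, string> = {
  top: "example-top-tier",
  standard: "example-standard",
  cheap: "example-cheap-fast",
};

function decide(c: Classification): RoutingDecision {
  const isDesignOrRefactor = c.task_type === "design" || c.task_type === "refactor";
  const isLightweight = c.task_type === "exploration" || c.task_type === "docs";

  // design | refactor: prefer Claude, or Grok for UI work.
  // Assumption: every other task type defaults to codex.
  let driverKind: DriverKind = "codex";
  if (isDesignOrRefactor) {
    driverKind = c.isUiWork ? "grok" : "claude";
  }

  // exploration | docs: a cheaper, faster model.
  let tier: Tier = isLightweight ? "cheap" : "standard";
  let rationale = `${c.task_type}: ${tier} tier`;

  // complexity = hard or risk = high: bump to a top-tier model.
  if (c.complexity === "hard" || c.risk === "high") {
    tier = "top";
    rationale = `${c.task_type}: bumped to top tier (complexity=${c.complexity}, risk=${c.risk})`;
  }

  return { driverKind, model: MODEL_BY_TIER[tier], rationale };
}

const refactor = decide({ task_type: "refactor", complexity: "medium", risk: "low" });
const riskyDocs = decide({ task_type: "docs", complexity: "easy", risk: "high" });
const uiDesign = decide({ task_type: "design", complexity: "easy", risk: "low", isUiWork: true });
```

Keeping the rules in one pure function makes the policy easy to test separately from inference.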

How it runs

Inference lives in the server's CodingRouter service. The service runs a Python predict script as a subprocess; the script executes the ONNX model and prints a JSON result, which the service parses. Results are cached with a TTL, so classifying the same task description again within that window does not re-run inference.
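The caching behaviour can be illustrated with a small TTL cache wrapped around a classify function. This is a sketch under assumptions: `TtlCache` and `cachedClassifier` are hypothetical names, the cache key is taken to be the raw task description, and the classifier is synchronous and stubbed out here, whereas the real service would call the Python subprocess at that point.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // timestamp in milliseconds
}

// Minimal TTL cache with an injectable clock so expiry is testable.
class TtlCache<V> {
  private entries = new Map<string, CacheEntry<V>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Wrap a classifier so repeated descriptions are served from the cache.
function cachedClassifier<V>(
  classify: (description: string) => V,
  cache: TtlCache<V>
): (description: string) => V {
  return (description) => {
    const hit = cache.get(description);
    if (hit !== undefined) return hit;
    const result = classify(description);
    cache.set(description, result);
    return result;
  };
}

// Stub classifier that counts how often "inference" actually runs.
let clock = 0;
let calls = 0;
const cache = new TtlCache<{ task_type: string }>(1000, () => clock);
const classify = cachedClassifier((_description: string) => {
  calls += 1;
  return JSON.parse('{"task_type":"docs"}') as { task_type: string };
}, cache);

classify("write the README");
classify("write the README"); // served from cache
const callsAfterRepeat = calls;

clock = 1000; // advance the fake clock past the TTL
classify("write the README"); // entry expired, inference runs again
const callsAfterExpiry = calls;
```

Injecting the clock keeps the expiry logic deterministic, which is useful when tuning the TTL.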

The router is consulted at task creation: spawn routes first, then creates the session or thread with the chosen provider and model already preselected. See CLI for how tasks are created.

Roadmap

The following are planned, not yet shipped:

Production feedback loop — log overrides and outcomes to retrain the classifier.
File-context augmentation — pass open file paths into classification for sharper decisions.
Session momentum — inherit recent tiers so related tasks stay on consistent models.

Routing and coordination work together: the router and coordinator share awareness of the dependency graph, so colliding tasks can be sequenced instead of racing. For how models are actually reached, see the gateway.
