Primitive 03

Policy Kernel

A small, fast, deterministic service that runs before every model call. Sub-10ms p99. No LLMs on the critical control path.

What the kernel decides

The policy kernel answers six questions before every model call:

  1. Route — which model tier (Haiku / Sonnet / Opus, or GPT equivalent) handles this tick?
  2. Mode — sync, async, or batched?
  3. Branching — spawn children or do the work inline?
  4. Tool gating — which tools is this tick allowed to call, given remaining quota?
  5. Termination — is the marginal value of continuing positive given the remaining budget?
  6. Escalation — should the next tick interrupt the researcher?
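The six answers above can be sketched as a single decision record. This is a hypothetical shape, not the kernel's actual schema; the field and enum names are illustrative, and the rationale field reflects the audit requirement described later in this section.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    HAIKU = "haiku"      # cheapest tier
    SONNET = "sonnet"
    OPUS = "opus"        # most capable tier

class Mode(Enum):
    SYNC = "sync"
    ASYNC = "async"
    BATCHED = "batched"

@dataclass
class Decision:
    route: Tier                # 1. which model tier handles this tick
    mode: Mode                 # 2. sync, async, or batched
    spawn_children: bool       # 3. fork, or do the work inline
    allowed_tools: list[str]   # 4. tools this tick may call
    terminate: bool            # 5. stop if marginal value is negative
    escalate: bool             # 6. interrupt the researcher next tick
    rationale: str             # human-readable audit-trail entry
```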

It is not an LLM

Three reasons, and they are all load-bearing:

  • Latency. The kernel runs before every model call, dozens of times per minute per researcher. A 5ms gradient-boosted-tree lookup beats a 500ms LLM call every time.
  • Determinism. Cargill compliance needs to know why a decision was made. A rules-plus-learned-priors cascade is auditable; an LLM is not.
  • Cost. Putting an LLM in front of every LLM call is a regress. The control plane has to be cheap.

The decision input

DecisionInput = {
  continuation:   { goal_frame, state, status, generation },
  budget:         { remaining_tokens_by_tier, remaining_dollars,
                    deadline, tool_quotas, human_interrupts_left },
  context_field:  { size_tokens, top_score, score_spread,
                    novelty_vs_prior_tick },
  history:        { last_N_ticks_costs, last_N_ticks_progress,
                    consecutive_no_progress_ticks },
  global:         { current_load, model_health, price_signals }
}

The layered cascade

Layer A: Hard gates

Non-negotiable rules. Budget exhausted? Deadline passed? No signal in the field? Terminate or escalate. Compliance freeze? Stop.

Layer B: Tier routing

Pick the cheapest model that is likely to make progress. A small capability estimator classifies the tick as extract/synthesize/plan. Hitting a soft budget cap triggers a downshift to a cheaper tier.
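A minimal sketch of Layer B, assuming the capability estimator has already labeled the tick and assuming a soft-cap fraction of 80% of the dollar budget; both the task-to-tier mapping and the cap value are illustrative, not the tuned production values.

```python
TIER_ORDER = ["haiku", "sonnet", "opus"]  # cheapest first
TASK_TO_TIER = {"extract": "haiku", "synthesize": "sonnet", "plan": "opus"}
SOFT_CAP = 0.8  # assumed: downshift once 80% of the dollar budget is spent

def route_tier(task_class: str, spent_fraction: float) -> str:
    """Layer B sketch: default tier for the task class, downshifted past the soft cap."""
    tier = TASK_TO_TIER[task_class]
    if spent_fraction >= SOFT_CAP and tier != TIER_ORDER[0]:
        tier = TIER_ORDER[TIER_ORDER.index(tier) - 1]  # one step cheaper
    return tier
```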

Layer C: Mode &amp; branching

Fork when subgoals are parallelizable and budget allows fan-out. Async when expensive and no human is waiting. Batch when work can defer to the next window.
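The three rules above can be sketched as one function. The "expensive" threshold and the budget check for fan-out are assumptions; the precedence (batch, then async, then sync) is one plausible ordering, not a statement of the production logic.

```python
EXPENSIVE = 1.0  # assumed dollar threshold for "expensive" work

def choose_mode(parallel_subgoals: int, est_cost: float, budget_left: float,
                human_waiting: bool, deferrable: bool) -> tuple[str, bool]:
    """Layer C sketch: returns (mode, spawn_children)."""
    # Fork when subgoals are parallelizable and the budget allows fan-out.
    spawn = parallel_subgoals > 1 and est_cost * parallel_subgoals <= budget_left
    if deferrable:
        return "batched", spawn           # work can wait for the next window
    if est_cost > EXPENSIVE and not human_waiting:
        return "async", spawn             # expensive and nobody is blocked on it
    return "sync", spawn
```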

Layer D: Tool gating

Cheapest eligible tool first, sorted by learned expected-value-per-dollar. Quota-aware: a tool with no remaining quota is invisible to the tick.
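Layer D reduces to a filter-then-sort, sketched below. The tool records and the `ev_per_dollar` field stand in for the learned expected-value-per-dollar scores; zero-quota tools are simply filtered out, which is what "invisible to the tick" means operationally.

```python
def gate_tools(tools: list[dict], quotas: dict[str, int]) -> list[str]:
    """Layer D sketch: drop tools with no remaining quota, then rank the
    survivors by learned expected value per dollar, best first."""
    eligible = [t for t in tools if quotas.get(t["name"], 0) > 0]
    eligible.sort(key=lambda t: t["ev_per_dollar"], reverse=True)
    return [t["name"] for t in eligible]
```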

Layer E: Escalation

Interrupt the human only when the continuation has a blocking question, when confidence is low and spend is high, or when an external signal contradicts the current state.
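The escalation rule is a three-clause disjunction, sketched here as a predicate. The confidence and spend thresholds are illustrative assumptions, not tuned values.

```python
CONF_FLOOR = 0.4   # assumed: below this, confidence counts as "low"
SPEND_CEILING = 10.0  # assumed dollars: above this, spend counts as "high"

def should_escalate(blocking_question: bool, confidence: float,
                    spend: float, external_contradiction: bool) -> bool:
    """Layer E sketch: interrupt the human only when one of the three clauses holds."""
    return (blocking_question
            or (confidence < CONF_FLOOR and spend > SPEND_CEILING)
            or external_contradiction)
```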

How the kernel learns

The kernel is initialized with hand-tuned rules and improves via offline supervised learning on the warm store (Redshift / Synapse).

  • Input: DecisionInput features for past ticks
  • Label: did the tick produce useful progress, measured by downstream outcomes (publish events, researcher approvals, brief quality scores)
  • Model: gradient-boosted trees per decision dimension (route, branching, tools)
  • Deployment: shadow-mode first, then gated rollout with a kill-switch

We do not use RL on production traffic. The kernel is too critical to be exploratory.
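The shadow-mode step of the rollout can be sketched as a thin wrapper: the learned model scores every tick, but only its disagreements with the live rules are logged, and production always follows the rules until the gated rollout flips. Function names here are hypothetical.

```python
def shadow_deploy(decision_input, rules_fn, learned_fn, disagreement_log: list):
    """Shadow-mode sketch: the learned model runs on every tick, but its
    output is only logged; the hand-tuned rules still make the live decision."""
    live = rules_fn(decision_input)
    shadow = learned_fn(decision_input)
    if shadow != live:
        disagreement_log.append(
            {"input": decision_input, "live": live, "shadow": shadow})
    return live  # production follows the rules throughout shadow mode
```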

Model routing across clouds

The kernel emits a tier, not a model id. Dispatch resolves the tier to a concrete model on a concrete provider, per tenant. AWS Bedrock and Azure AI Foundry Models are first-class peer surfaces — same Claude family available on both, plus each cloud's own catalog. Failover order and cost levers (Bedrock CRIS, service tiers, prompt caching; Foundry PTU, Intelligent Prompt Routing) are per-tenant configuration.

See Inference for the full tier→provider→model matrix and the per-cloud cost-lever table.
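Tier-to-model resolution amounts to walking a per-tenant failover list until a healthy provider is found. The routing table below is purely illustrative; tenant names, model ids, and the health-set interface are all assumptions, and the real matrix lives in the Inference section.

```python
# Hypothetical per-tenant routing table: the kernel emits a tier; dispatch
# resolves it to a concrete (provider, model) pair in failover order.
TENANT_ROUTES = {
    "acme": {
        "sonnet": [("bedrock", "claude-sonnet"), ("foundry", "claude-sonnet")],
    },
}

def resolve(tenant: str, tier: str, healthy: set[str]) -> tuple[str, str]:
    """Return the first (provider, model) whose provider is currently healthy."""
    for provider, model in TENANT_ROUTES[tenant][tier]:
        if provider in healthy:
            return provider, model
    raise RuntimeError(f"no healthy provider for tier {tier!r}")
```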

Delegating to managed agent runtimes

For bounded sub-tasks — a managed browser crawl, a code-interpreter session, an MCP-tool-heavy workload — the kernel may delegate to AWS Bedrock AgentCore or Azure AI Foundry Agent Service. Eligibility is decided by the same kernel layers (Layer A gates, Layer B tier match), plus three new checks: expected wall-clock fits the vendor's session ceiling, the sub-task does not need to fork/merge/sleep on a multi-day external signal, and the tenant's policy class allows external-runtime execution.
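The three extra checks compose into a single conjunction, sketched below under the assumption that Layers A and B have already passed; parameter names are illustrative.

```python
def delegation_eligible(est_wallclock_s: float, session_ceiling_s: float,
                        needs_fork_merge_sleep: bool,
                        external_runtime_allowed: bool) -> bool:
    """Sketch of the three delegation checks, assuming Layer A gates and the
    Layer B tier match have already passed."""
    return (est_wallclock_s <= session_ceiling_s   # fits the vendor's session ceiling
            and not needs_fork_merge_sleep         # no multi-day fork/merge/sleep
            and external_runtime_allowed)          # tenant policy permits it
```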

See Managed Agents for the full delegation rule and the AgentCore vs Foundry Agent Service comparison.

The rationale field

Every decision the kernel emits carries a human-readable rationale. This is what the audit trail uses, what the dashboards visualize, and what the researcher sees when they ask "why did Maestro do that." It is non-negotiable.