Primitive 03
Policy Kernel
A small, fast, deterministic service that runs before every model call. Sub-10ms p99. No LLMs on the critical control path.
What the kernel decides
The policy kernel answers six questions before every model call:
- Route — which model tier (Haiku / Sonnet / Opus, or GPT equivalent) handles this tick?
- Mode — sync, async, or batched?
- Branching — spawn children or do the work inline?
- Tool gating — which tools is this tick allowed to call, given remaining quota?
- Termination — is the marginal value of continuing positive given the remaining budget?
- Escalation — should the next tick interrupt the researcher?
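The six answers can be pictured as one immutable record per tick. A minimal sketch, assuming hypothetical field and tier names (the source specifies only the decision dimensions, not this shape):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KernelDecision:
    """One kernel verdict per tick. Field names are illustrative."""
    route: str            # "haiku" | "sonnet" | "opus"
    mode: str             # "sync" | "async" | "batched"
    branch: bool          # spawn child continuations?
    allowed_tools: tuple  # tools visible to this tick
    terminate: bool       # stop here?
    escalate: bool        # interrupt the researcher next tick?
    rationale: str        # human-readable audit string

decision = KernelDecision(
    route="haiku", mode="sync", branch=False,
    allowed_tools=("search",), terminate=False, escalate=False,
    rationale="budget healthy; extract-class tick; cheapest tier suffices",
)
```

Freezing the dataclass keeps the decision immutable once emitted, which is what an audit trail wants.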
It is not an LLM
Three reasons, and they are all load-bearing:
- Latency. The kernel runs before every model call, dozens of times per minute per researcher. A 5 ms gradient-boosted-tree decision beats a 500 ms LLM call every time.
- Determinism. Cargill compliance needs to know why a decision was made. A rules-plus-learned-priors cascade is auditable; an LLM is not.
- Cost. Putting an LLM in front of every LLM call is a regress. The control plane has to be cheap.
The decision input
DecisionInput = {
continuation: { goal_frame, state, status, generation },
budget: { remaining_tokens_by_tier, remaining_dollars,
deadline, tool_quotas, human_interrupts_left },
context_field: { size_tokens, top_score, score_spread,
novelty_vs_prior_tick },
history: { last_N_ticks_costs, last_N_ticks_progress,
consecutive_no_progress_ticks },
global: { current_load, model_health, price_signals }
}
The layered cascade
Hard gates
Non-negotiable rules. Budget exhausted? Deadline passed? No signal in the field? Terminate or escalate. Compliance freeze? Stop.
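These gates are pure predicate checks, which is why they can run first and fast. A minimal sketch, assuming invented field names and thresholds (the `top_score < 0.1` cutoff for "no signal" is an illustration, not a documented value):

```python
def hard_gates(budget, field, compliance_frozen):
    """Return a terminal (verdict, reason) pair, or None to fall
    through to tier routing."""
    if compliance_frozen:
        return ("terminate", "compliance freeze")
    if budget["remaining_dollars"] <= 0:
        return ("terminate", "budget exhausted")
    if budget["deadline"] <= 0:           # seconds remaining
        return ("escalate", "deadline passed")
    if field["top_score"] < 0.1:          # no signal in the field
        return ("terminate", "no signal in context field")
    return None                           # fall through to routing
```

Each gate returns its reason string directly, so the rationale field described later comes for free.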
Tier routing
Pick the cheapest model that is likely to make progress. A small capability estimator classifies the tick as extract/synthesize/plan. Soft cap triggers downshift.
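The routing rule is "cheapest tier likely to make progress, minus one under budget pressure." A sketch under stated assumptions: the class-to-minimum-tier mapping and the 80% soft cap are invented examples, not documented values:

```python
TIER_ORDER = ["haiku", "sonnet", "opus"]   # cheapest first
MIN_TIER = {"extract": "haiku", "synthesize": "sonnet", "plan": "opus"}

def route(tick_class, spend_fraction, soft_cap=0.8):
    """Map the estimator's class to a tier; downshift one tier
    when spend crosses the soft cap."""
    tier = MIN_TIER[tick_class]
    if spend_fraction >= soft_cap and tier != "haiku":
        tier = TIER_ORDER[TIER_ORDER.index(tier) - 1]
    return tier
```

So an extract tick always stays on the cheapest tier, while a plan tick degrades gracefully rather than stopping outright.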
Mode & branching
Fork when subgoals are parallelizable and budget allows fan-out. Async when expensive and no human is waiting. Batch when work can defer to the next window.
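The three conditions above reduce to a couple of predicates. A minimal sketch, with invented inputs (the cost threshold and the `independent` flag on subgoals are assumptions):

```python
def choose_mode(est_cost_dollars, human_waiting, deferrable):
    """Batch when the work can wait for the next window; async when
    it is expensive and nobody is blocked on it; otherwise sync."""
    if deferrable:
        return "batched"
    if est_cost_dollars > 1.0 and not human_waiting:
        return "async"
    return "sync"

def should_fork(subgoals, budget_headroom):
    """Fork only when there is real parallelism and budget to
    fan out one child per parallel subgoal."""
    parallel = [g for g in subgoals if g["independent"]]
    return len(parallel) >= 2 and budget_headroom >= len(parallel)
```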
Tool gating
Cheapest eligible tool first, sorted by learned expected-value-per-dollar. Quota-aware: a tool with no remaining quota is invisible to the tick.
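Quota-awareness plus EV ordering is a filter followed by a sort. A sketch, assuming the expected-value-per-dollar scores arrive as a precomputed lookup (the learned model that produces them is not shown):

```python
def eligible_tools(tools, quotas, ev_per_dollar):
    """Tools with exhausted quota are invisible to the tick; the
    rest are ordered by learned EV per dollar, best first."""
    visible = [t for t in tools if quotas.get(t, 0) > 0]
    return sorted(visible,
                  key=lambda t: ev_per_dollar.get(t, 0.0),
                  reverse=True)
```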
Escalation
Interrupt the human only when the continuation has a blocking question, when confidence is low and spend is high, or when an external signal contradicts the current state.
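The escalation rule is a disjunction of the three triggers, gated by the remaining interrupt budget. A sketch with assumed thresholds (the confidence floor and spend ceiling are illustrative defaults):

```python
def should_escalate(blocking_question, confidence, spend_dollars,
                    external_contradiction, interrupts_left,
                    conf_floor=0.4, spend_ceiling=5.0):
    """True when any of the three triggers fires and the tick still
    has human-interrupt quota."""
    if interrupts_left <= 0:
        return False
    return (blocking_question
            or (confidence < conf_floor and spend_dollars > spend_ceiling)
            or external_contradiction)
```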
How the kernel learns
The kernel is initialized with hand-tuned rules and improves via offline supervised learning on the warm store (Redshift / Synapse).
- Input: DecisionInput features for past ticks
- Label: did the tick produce useful progress, measured by downstream outcomes (publish events, researcher approvals, brief quality scores)
- Model: gradient-boosted trees per decision dimension (route, branching, tools)
- Deployment: shadow-mode first, then gated rollout with a kill-switch
We do not use RL on production traffic. The kernel is too critical to be exploratory.
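The shadow-then-gated deployment can be sketched as a thin wrapper: the learned model always runs and is logged, but its verdict only wins inside the rollout fraction, and the kill-switch snaps everything back to rules. Function names and the log shape are assumptions:

```python
import random

def decide(rules_fn, learned_fn, x, rollout_fraction, kill_switch):
    """Run rules and learned model side by side; log both, apply
    the learned verdict only for the gated rollout fraction."""
    rule_out = rules_fn(x)
    shadow_out = learned_fn(x)
    log = {"rules": rule_out, "shadow": shadow_out,
           "agree": rule_out == shadow_out}
    if kill_switch or random.random() >= rollout_fraction:
        return rule_out, log     # shadow mode: rules decide
    return shadow_out, log       # gated rollout: learned wins
```

At `rollout_fraction=0.0` this is pure shadow mode; the agree-rate in the logs is what justifies raising the fraction.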
Model routing across clouds
The kernel emits a tier, not a model id. Dispatch resolves the tier to a concrete model on a concrete provider, per tenant. AWS Bedrock and Azure AI Foundry Models are first-class peer surfaces — same Claude family available on both, plus each cloud's own catalog. Failover order and cost levers (Bedrock CRIS, service tiers, prompt caching; Foundry PTU, Intelligent Prompt Routing) are per-tenant configuration.
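Tier-to-model resolution is then a per-tenant table walk with failover. A minimal sketch: the tenant name, model ids, and failover order below are invented examples, not the real matrix (which the Inference page holds):

```python
TENANT_ROUTES = {
    "acme": {
        "sonnet": [("bedrock", "claude-sonnet"),
                   ("foundry", "claude-sonnet")],
    },
}

def resolve(tenant, tier, healthy):
    """Walk the tenant's failover list for a tier; return the first
    (provider, model) pair whose provider is healthy."""
    for provider, model in TENANT_ROUTES[tenant][tier]:
        if healthy(provider):
            return provider, model
    raise RuntimeError(f"no healthy provider for {tenant}/{tier}")
```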
See Inference for the full tier→provider→model matrix and the per-cloud cost-lever table.
Delegating to managed agent runtimes
For bounded sub-tasks — a managed browser crawl, a code-interpreter session, an MCP-tool-heavy workload — the kernel may delegate to AWS Bedrock AgentCore or Azure AI Foundry Agent Service. Eligibility is decided by the same kernel layers (Layer A gates, Layer B tier match), plus three new checks: expected wall-clock fits the vendor's session ceiling, the sub-task does not need to fork/merge/sleep on a multi-day external signal, and the tenant's policy class allows external-runtime execution.
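The three delegation checks compose into one predicate, applied only after the usual gates and tier match pass. A sketch with assumed parameter names:

```python
def can_delegate(est_wall_clock_s, session_ceiling_s,
                 needs_fork_merge_sleep, policy_allows_external):
    """The three extra checks for handing a sub-task to a managed
    agent runtime (AgentCore / Foundry Agent Service)."""
    return (est_wall_clock_s <= session_ceiling_s   # fits session ceiling
            and not needs_fork_merge_sleep          # no multi-day lifecycle
            and policy_allows_external)             # tenant policy class OK
```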
See Managed Agents for the full delegation rule and the AgentCore vs Foundry Agent Service comparison.
The rationale field
Every decision the kernel emits carries a human-readable rationale. This is what the audit trail uses, what the dashboards visualize, and what the researcher sees when they ask "why did Maestro do that." It is non-negotiable.