Capability · Substrate
The Maestro Pager
Maestro pages models on demand. Every call is a pure function over a goal frame, a context field, and tool definitions. The continuation store owns all session state — model providers own none of it. That decoupling is what lets a journal sleep for weeks, migrate between workers, fail over between providers mid-run, and downshift tiers without ever reconstructing a session.
Why not Foundry Agent threads
Foundry Agents, OpenAI Assistants, and Bedrock Agents all expose a thread lifecycle: createThread → addMessage → createRun → poll → getMessages → deleteThread. The lifecycle is chat-shaped: minutes long, state inside the provider, lost on provider outage. Maestro journals run for weeks. Binding session state to a model provider would mean no migration between workers, no failover, no budget-driven tier switching mid-run, and no replay from the event log.
The Foundry proxy endpoint at {resource}.services.ai.azure.com/anthropic/v1/messages exposes the raw Anthropic Messages API — tools, extended thinking, prompt caching, advisor strategy — with billing inside Azure Marketplace. That is the right integration point. Each tick is a fresh Messages call; the model has no memory of prior ticks because it does not need any.
The two architectures, side by side
- Model provider owns the session
- Thread created → messages accumulate → thread deleted
- Duration: minutes — state inside the provider
- Provider outage = session lost
- No migration, no failover, no replay
- Continuation store owns the session
- Each tick: assemble context → stateless model call → persist
- Duration: weeks — state in Cosmos / DynamoDB
- Provider outage = retry or failover on next tick
- Migrate workers, swap providers, replay any tick
The model call is a pure function:
f(goal_frame, field_manifest, state, tools) → response
The three deployed tiers
Three Claude models are deployed in Foundry as Global Standard, and equivalents are reachable on Bedrock. The kernel emits a tier name — never a model id — and dispatch resolves the tier per tenant.
| Tier | Foundry deployment | Priority | Purpose |
|---|---|---|---|
opus | claude-opus-4-6 | P0 — Critical | Planning, synthesis, complex reasoning, KB authoring |
sonnet | claude-sonnet-4-6 | P1 — Standard | Drafting, analysis, tool selection, evidence summaries |
haiku | claude-haiku-4-5 | P2 — Batch | Classification, tagging, field reranking, extraction |
The ModelDispatch protocol
Every substrate (Azure, AWS, direct) implements the same stateless contract. The continuation does not know which provider answered the call — that is the dispatch layer's job.
@runtime_checkable
class ModelDispatch(Protocol):
async def call(
self,
tier: str, # "opus" | "sonnet" | "haiku"
messages: list[dict], # assembled from continuation state + field
tools: list[dict] | None, # MCP tools available for this tick
budget: BudgetVector, # remaining allowances
mode: str, # "sync" | "async" | "batch"
use_advisor: bool = False, # invoke Opus as advisor on Sonnet/Haiku ticks
system: str | None = None, # system prompt from goal frame + policy
thinking: str = "adaptive", # extended thinking knob
) -> ModelCallResult: ...
async def health_check(self) -> dict[str, bool]: ...
def supported_tiers(self) -> list[str]: ... The result carries an explicit executor vs advisor token split, prompt-cache reads, dollar cost, latency, tier used, and stop reason. The audit trail reads identically regardless of which provider served the call.
The five-tier routing gradient
Layer B of the policy kernel sees a richer gradient than just three tiers. The Advisor strategy lets Sonnet or Haiku consult an Opus advisor mid-call — recovering most of the quality of a tier-up at a fraction of the cost.
| Gradient step | Advisor | When the kernel chooses it |
|---|---|---|
opus | — | Complex planning, multi-step reasoning, evidence evaluation |
sonnet+advisor | ✓ | Synthesis with strategic guidance — novel territory |
sonnet | — | Standard drafting, analysis, comparison |
haiku+advisor | ✓ | Extraction with a quality check from an Opus advisor |
haiku | — | Classification, tagging, field reranking |
Under budget pressure, the kernel downshifts and turns on the advisor as compensation: opus → sonnet+advisor recovers most of Opus's reasoning quality at roughly 30% of its cost; sonnet → haiku+advisor recovers most of Sonnet's at roughly 40%. Hard budget pressure drops everything to plain haiku.
The failover chain
The FailoverModelRouter wraps an ordered list of providers. The continuation does not know which one served any given tick.
Azure-resident tenants
- Foundry proxy primary
- Direct Anthropic API backup
- Bedrock cross-cloud last resort
AWS-resident tenants
- Bedrock primary
- Direct Anthropic API backup
- Foundry proxy cross-cloud last resort
Only recoverable errors (429, 5xx) trigger failover. Non-recoverable errors (auth, bad request, content policy) skip the chain and surface to the worker, which decides whether to retry on the next tick or escalate.
Metering and budget events
Every model call emits a budget_charge event with per-tier executor tokens, advisor tokens, cache reads / cache creations, dollars, latency, and an advisor_consulted flag. The warm store materializes fact_model_call for cost analytics. Continuations get an alert at 80% of the soft cap; at 60% the kernel begins downshifting; at 30% it drops to plain haiku.
Execution phases
- Phase 0
Foundry portal deployments
Deploy claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5 as Global Standard models in a supported region. Verify each tier returns HTTP 200. Manual, no code.
- Phase 1
Substrate interface
Extend the Substrate Protocol with a ModelDispatch contract. Define ModelCallResult and ModelUsage with explicit executor/advisor token splits. Pure types, no cloud SDKs.
- Phase 2
Azure Foundry implementation
AzureFoundryModelDispatch hits the Anthropic proxy endpoint (services.ai.azure.com/anthropic/v1). Stateless. Adds advisor tool injection for Sonnet/Haiku. Pricing + usage reported back through the result.
- Phase 3
AWS Bedrock implementation
BedrockModelDispatch through boto3 invoke_model. Advisor Strategy not native on Bedrock — falls back to direct Anthropic API for advisor calls. Same test suite as Azure must pass.
- Phase 4
Policy kernel routing
Layer B emits one of five tiers in the gradient. Budget-aware downshift compensates with the advisor when crossing tier boundaries.
- Phase 5
Failover chain
FailoverModelRouter wraps providers in preference order: Foundry → direct Anthropic → Bedrock (Azure path) or Bedrock → direct Anthropic → Foundry (AWS path).
- Phase 6
Metering + budget events
Every call emits a budget_charge event with per-tier tokens, cache hits, advisor tokens, dollars, and latency. Warm store materializes fact_model_call.
- Phase 7
Environment + IaC
Env vars for endpoint, key, three deployment IDs, advisor toggle, failover toggle. Key Vault / Secrets Manager provisioning in IaC.
The architectural bet
The model market commoditizes faster than control planes do. By making model calls stateless and the dispatch layer pluggable, Maestro's value lives where it belongs — in the continuation, the context field, and the policy kernel — not in any single model vendor's session state. Foundry Anthropic, Bedrock Claude, and direct Anthropic are interchangeable backends behind one protocol. The pager is the seam.
The full implementation spec — prerequisites, file inventory, env vars, test plan — lives in TASK-maestro-foundry-model-dispatch. This page is the architectural surface. The task doc is the build sheet.