The Maestro Pager — Stateless Model Dispatch

Why not Foundry Agent threads

Foundry Agents, OpenAI Assistants, and Bedrock Agents all expose a thread lifecycle: createThread → addMessage → createRun → poll → getMessages → deleteThread. The lifecycle is chat-shaped: minutes long, state inside the provider, lost on provider outage. Maestro journals run for weeks. Binding session state to a model provider would mean no migration between workers, no failover, no budget-driven tier switching mid-run, and no replay from the event log.

The Foundry proxy endpoint at {resource}.services.ai.azure.com/anthropic/v1/messages exposes the raw Anthropic Messages API — tools, extended thinking, prompt caching, advisor strategy — with billing inside Azure Marketplace. That is the right integration point. Each tick is a fresh Messages call; the model has no memory of prior ticks because it does not need any.
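As a rough illustration of what "each tick is a fresh Messages call" means on the wire, the sketch below builds the URL and JSON body for a single tick. The URL shape comes from the endpoint above; `build_tick_request` and its parameters are hypothetical names, the use of the Foundry deployment name as the `model` field is an assumption, and authentication headers are left out because this page does not specify them.

```python
ANTHROPIC_PATH = "/anthropic/v1/messages"  # path taken from the endpoint quoted above


def build_tick_request(resource, deployment, system, messages, tools=None, max_tokens=4096):
    """Return (url, body) for one stateless Messages call (hypothetical helper).

    No thread or session id appears anywhere: everything the model needs
    for this tick travels in the single request body.
    """
    url = f"https://{resource}.services.ai.azure.com{ANTHROPIC_PATH}"
    body = {
        "model": deployment,  # assumption: the Foundry deployment name is passed here
        "max_tokens": max_tokens,
        "system": system,
        "messages": messages,
    }
    if tools:
        # Tools are optional and only sent when the tick actually has some.
        body["tools"] = tools
    return url, body
```

Because the body is rebuilt from the continuation on every tick, the same request can be sent to any provider that speaks the Messages API.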

The two architectures, side by side

Traditional agent session

Model provider owns the session
Thread created → messages accumulate → thread deleted
Duration: minutes — state inside the provider
Provider outage = session lost
No migration, no failover, no replay

Maestro continuation

Continuation store owns the session
Each tick: assemble context → stateless model call → persist
Duration: weeks — state in Cosmos / DynamoDB
Provider outage = retry or failover on next tick
Migrate workers, swap providers, replay any tick

The model call is a pure function:
f(goal_frame, field_manifest, state, tools) → response
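The pure-function framing can be sketched in a few lines. This is a minimal in-memory illustration, not the Maestro implementation: `Continuation`, `run_tick`, and `replay_tick` are hypothetical names, and a plain list stands in for the Cosmos / DynamoDB event log.

```python
from dataclasses import dataclass, field


@dataclass
class Continuation:
    """Hypothetical stand-in for a continuation record in the store."""
    goal_frame: str
    state: dict = field(default_factory=dict)
    event_log: list = field(default_factory=list)


def run_tick(cont, field_manifest, tools, model_call):
    """One tick: assemble context -> stateless model call -> persist."""
    # 1. Assemble everything the model needs from the continuation store.
    context = {
        "goal_frame": cont.goal_frame,
        "field_manifest": field_manifest,
        "state": dict(cont.state),  # copy, so the logged input stays frozen
        "tools": tools,
    }
    # 2. Stateless call: the model sees only this context.
    response = model_call(**context)
    # 3. Persist inputs and outcome so the tick can be replayed later.
    cont.event_log.append({"context": context, "response": response})
    cont.state["last_response"] = response
    return response


def replay_tick(cont, index, model_call):
    """Re-run a logged tick against any provider using only the event log."""
    return model_call(**cont.event_log[index]["context"])
```

Since `model_call` receives its full input as arguments, swapping the provider or replaying an old tick needs nothing beyond the logged context.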

The three deployed tiers

Three Claude models are deployed in Foundry as Global Standard, and equivalents are reachable on Bedrock. The kernel emits a tier name — never a model id — and dispatch resolves the tier per tenant.

Tier	Foundry deployment	Priority	Purpose
`opus`	`claude-opus-4-6`	P0 — Critical	Planning, synthesis, complex reasoning, KB authoring
`sonnet`	`claude-sonnet-4-6`	P1 — Standard	Drafting, analysis, tool selection, evidence summaries
`haiku`	`claude-haiku-4-5`	P2 — Batch	Classification, tagging, field reranking, extraction
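A minimal sketch of per-tenant tier resolution follows. The default deployment names are taken from the table above; `resolve_tier` and the `tenant_overrides` mapping are hypothetical, shown only to illustrate how the kernel can stay unaware of model ids.

```python
# Defaults from the tier table; a tenant may override any of them.
DEFAULT_DEPLOYMENTS = {
    "opus": "claude-opus-4-6",
    "sonnet": "claude-sonnet-4-6",
    "haiku": "claude-haiku-4-5",
}


def resolve_tier(tier, tenant_overrides=None):
    """Map a tier name emitted by the kernel to a concrete deployment id."""
    if tier not in DEFAULT_DEPLOYMENTS:
        raise ValueError(f"unknown tier: {tier!r}")
    overrides = tenant_overrides or {}
    return overrides.get(tier, DEFAULT_DEPLOYMENTS[tier])
```

For example, `resolve_tier("sonnet", {"sonnet": "tenant-a-sonnet"})` returns the tenant's own deployment, while the kernel still only ever says `"sonnet"`.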

The ModelDispatch protocol

Every substrate (Azure, AWS, direct) implements the same stateless contract. The continuation does not know which provider answered the call — that is the dispatch layer's job.

from __future__ import annotations

from typing import Protocol, runtime_checkable

# BudgetVector and ModelCallResult are Maestro types defined elsewhere; the
# postponed annotations above let this module load without importing them.


@runtime_checkable
class ModelDispatch(Protocol):
    async def call(
        self,
        tier: str,                    # "opus" | "sonnet" | "haiku"
        messages: list[dict],         # assembled from continuation state + field
        tools: list[dict] | None,     # MCP tools available for this tick
        budget: BudgetVector,         # remaining allowances
        mode: str,                    # "sync" | "async" | "batch"
        use_advisor: bool = False,    # invoke Opus as advisor on Sonnet/Haiku ticks
        system: str | None = None,    # system prompt from goal frame + policy
        thinking: str = "adaptive",   # extended thinking knob
    ) -> ModelCallResult: ...

    # Per-tier availability, keyed by tier name.
    async def health_check(self) -> dict[str, bool]: ...

    # Tiers this substrate can serve.
    def supported_tiers(self) -> list[str]: ...

The result carries an explicit executor vs advisor token split, prompt-cache reads, dollar cost, latency, tier used, and stop reason. The audit trail reads identically regardless of which provider served the call.
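One way the result shape described above could look is sketched below. `ModelCallResult` and `ModelUsage` are the type names this page uses; every field name is an assumption chosen to mirror the prose (executor vs advisor tokens, cache reads, dollars, latency, tier, stop reason), not the actual definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelUsage:
    """Token accounting with the executor/advisor split kept explicit."""
    executor_input_tokens: int
    executor_output_tokens: int
    advisor_input_tokens: int = 0
    advisor_output_tokens: int = 0
    cache_read_tokens: int = 0

    @property
    def advisor_consulted(self) -> bool:
        # True when any tokens were spent on the advisor during this call.
        return (self.advisor_input_tokens + self.advisor_output_tokens) > 0


@dataclass(frozen=True)
class ModelCallResult:
    """Provider-neutral result; no field names a specific backend."""
    content: list
    usage: ModelUsage
    cost_usd: float
    latency_ms: float
    tier_used: str
    stop_reason: str
```

Keeping the provider out of the field list is what lets the audit trail read the same for Foundry, Bedrock, and direct calls.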

The five-tier routing gradient

Layer B of the policy kernel sees a richer gradient than just three tiers. The Advisor strategy lets Sonnet or Haiku consult an Opus advisor mid-call — recovering most of the quality of a tier-up at a fraction of the cost.

Gradient step	Advisor	When the kernel chooses it
`opus`	—	Complex planning, multi-step reasoning, evidence evaluation
`sonnet+advisor`	✓	Synthesis with strategic guidance — novel territory
`sonnet`	—	Standard drafting, analysis, comparison
`haiku+advisor`	✓	Extraction with a quality check from an Opus advisor
`haiku`	—	Classification, tagging, field reranking

Under budget pressure, the kernel downshifts and turns on the advisor as compensation: opus → sonnet+advisor recovers most of Opus's reasoning quality at roughly 30% of its cost; sonnet → haiku+advisor recovers most of Sonnet's at roughly 40%. Hard budget pressure drops everything to plain haiku.
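The downshift rule can be sketched as a small lookup. The two downshift pairs and the hard-pressure floor come from the paragraph above; the `pressure` labels, the function names, and the choice to leave the other three steps unchanged under soft pressure are assumptions made for illustration.

```python
# The five gradient steps, strongest first.
GRADIENT = ["opus", "sonnet+advisor", "sonnet", "haiku+advisor", "haiku"]

# Soft-pressure downshifts named in the text; other steps stay put (assumption).
DOWNSHIFT = {"opus": "sonnet+advisor", "sonnet": "haiku+advisor"}


def route(requested, pressure):
    """Pick a gradient step given budget pressure: "none", "soft", or "hard"."""
    if requested not in GRADIENT:
        raise ValueError(f"unknown gradient step: {requested!r}")
    if pressure == "hard":
        return "haiku"  # hard pressure drops everything to plain haiku
    if pressure == "soft":
        return DOWNSHIFT.get(requested, requested)
    return requested


def split_step(step):
    """Turn a gradient step into (tier, use_advisor) for a dispatch call."""
    tier, _, suffix = step.partition("+")
    return tier, suffix == "advisor"
```

`split_step(route("opus", "soft"))` yields `("sonnet", True)`, which maps directly onto the `tier` and `use_advisor` arguments of `ModelDispatch.call`.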

The failover chain

The FailoverModelRouter wraps an ordered list of providers. The continuation does not know which one served any given tick.

Azure-resident tenants

Foundry proxy primary
Direct Anthropic API backup
Bedrock cross-cloud last resort

AWS-resident tenants

Bedrock primary
Direct Anthropic API backup
Foundry proxy cross-cloud last resort

Only recoverable errors (429, 5xx) trigger failover. Non-recoverable errors (auth, bad request, content policy) skip the chain and surface to the worker, which decides whether to retry on the next tick or escalate.
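The failover rule above can be sketched as follows. `FailoverModelRouter` is the name used on this page, but its constructor, the `ProviderError` exception, and the representation of providers as `(name, async callable)` pairs are all assumptions made to keep the example self-contained.

```python
class ProviderError(Exception):
    """Hypothetical error type carrying the provider's HTTP status."""

    def __init__(self, status):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def is_recoverable(status):
    # Only throttling (429) and server errors (5xx) trigger failover.
    return status == 429 or 500 <= status <= 599


class FailoverModelRouter:
    def __init__(self, providers):
        # Ordered (name, async callable) pairs, most preferred first.
        self.providers = list(providers)

    async def call(self, **kwargs):
        last_error = None
        for name, provider in self.providers:
            try:
                return name, await provider(**kwargs)
            except ProviderError as err:
                if not is_recoverable(err.status):
                    raise  # auth, bad request, content policy: skip the chain
                last_error = err  # recoverable: try the next provider
        if last_error is None:
            raise RuntimeError("no providers configured")
        raise last_error  # every provider failed with a recoverable error
```

The router returns the provider name only for the audit trail; the continuation itself consumes the result and never branches on which backend answered.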

Metering and budget events

Every model call emits a budget_charge event with per-tier executor tokens, advisor tokens, cache reads / cache creations, dollars, latency, and an advisor_consulted flag. The warm store materializes fact_model_call for cost analytics. Continuations get an alert at 80% of the soft cap; at 60% the kernel begins downshifting; at 30% it drops to plain haiku.
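A minimal sketch of what a `budget_charge` event and a per-tier rollup might look like is shown below. Only the event name and the list of measured quantities come from the text; the field names, the `make_budget_charge` helper, and the in-memory `spend_by_tier` aggregation (standing in for the warm store's `fact_model_call`) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BudgetCharge:
    event_type: str
    tier: str
    executor_tokens: int
    advisor_tokens: int
    cache_read_tokens: int
    cache_creation_tokens: int
    dollars: float
    latency_ms: float
    advisor_consulted: bool


def make_budget_charge(tier, executor_tokens, advisor_tokens,
                       cache_read_tokens, cache_creation_tokens,
                       dollars, latency_ms):
    """Build one event per model call; the advisor flag is derived, not passed."""
    return BudgetCharge(
        event_type="budget_charge",
        tier=tier,
        executor_tokens=executor_tokens,
        advisor_tokens=advisor_tokens,
        cache_read_tokens=cache_read_tokens,
        cache_creation_tokens=cache_creation_tokens,
        dollars=dollars,
        latency_ms=latency_ms,
        advisor_consulted=advisor_tokens > 0,
    )


def to_json(event):
    """Serialize an event for the event log."""
    return json.dumps(asdict(event))


def spend_by_tier(events):
    """Toy rollup: total dollars per tier across a list of events."""
    totals = {}
    for event in events:
        totals[event.tier] = totals.get(event.tier, 0.0) + event.dollars
    return totals
```

Deriving `advisor_consulted` from the token count keeps the flag and the token split from drifting apart in the log.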

Execution phases

Phase 0

Foundry portal deployments

Deploy claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5 as Global Standard models in a supported region. Verify each tier returns HTTP 200. Manual, no code.

Phase 1

Substrate interface

Extend the Substrate Protocol with a ModelDispatch contract. Define ModelCallResult and ModelUsage with explicit executor/advisor token splits. Pure types, no cloud SDKs.

Phase 2

Azure Foundry implementation

AzureFoundryModelDispatch hits the Anthropic proxy endpoint (services.ai.azure.com/anthropic/v1). Stateless. Adds advisor tool injection for Sonnet/Haiku. Pricing + usage reported back through the result.

Phase 3

AWS Bedrock implementation

BedrockModelDispatch calls the models through boto3 invoke_model. The Advisor strategy is not native on Bedrock, so advisor calls fall back to the direct Anthropic API. The same test suite as Azure must pass.

Phase 4

Policy kernel routing

Layer B emits one of the five gradient steps. Budget-aware downshift compensates with the advisor when crossing tier boundaries.

Phase 5

Failover chain

FailoverModelRouter wraps providers in preference order: Foundry → direct Anthropic → Bedrock (Azure path) or Bedrock → direct Anthropic → Foundry (AWS path).

Phase 6

Metering + budget events

Every call emits a budget_charge event with per-tier tokens, cache hits, advisor tokens, dollars, and latency. Warm store materializes fact_model_call.

Phase 7

Environment + IaC

Env vars for endpoint, key, three deployment IDs, advisor toggle, failover toggle. Key Vault / Secrets Manager provisioning in IaC.
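As an illustration of the environment surface listed above, the sketch below loads an endpoint, a key, three deployment IDs, and the two toggles. Every variable name (`MAESTRO_MODEL_ENDPOINT` and so on) is invented for this example; the real names live in the task doc. The deployment defaults reuse the names from the tier table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DispatchConfig:
    endpoint: str
    api_key: str
    deployments: dict
    advisor_enabled: bool
    failover_enabled: bool


def _flag(env, name, default):
    """Parse a boolean toggle from common truthy spellings."""
    return env.get(name, default).strip().lower() in {"1", "true", "yes", "on"}


def load_config(env=None):
    """Build the config from a mapping; defaults to the process environment."""
    env = os.environ if env is None else env
    return DispatchConfig(
        endpoint=env["MAESTRO_MODEL_ENDPOINT"],  # hypothetical name, required
        api_key=env["MAESTRO_MODEL_API_KEY"],    # hypothetical name, required
        deployments={
            "opus": env.get("MAESTRO_DEPLOYMENT_OPUS", "claude-opus-4-6"),
            "sonnet": env.get("MAESTRO_DEPLOYMENT_SONNET", "claude-sonnet-4-6"),
            "haiku": env.get("MAESTRO_DEPLOYMENT_HAIKU", "claude-haiku-4-5"),
        },
        advisor_enabled=_flag(env, "MAESTRO_ADVISOR_ENABLED", "true"),
        failover_enabled=_flag(env, "MAESTRO_FAILOVER_ENABLED", "true"),
    )
```

Accepting a plain mapping makes the loader testable without touching the real environment; in deployment the key would be injected from Key Vault or Secrets Manager.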

The architectural bet

The model market commoditizes faster than control planes do. By making model calls stateless and the dispatch layer pluggable, Maestro's value lives where it belongs — in the continuation, the context field, and the policy kernel — not in any single model vendor's session state. Foundry Anthropic, Bedrock Claude, and direct Anthropic are interchangeable backends behind one protocol. The pager is the seam.

The full implementation spec — prerequisites, file inventory, env vars, test plan — lives in TASK-maestro-foundry-model-dispatch. This page is the architectural surface. The task doc is the build sheet.
