The Maestro Pager

Maestro pages models on demand. Every call is a pure function over a goal frame, a context field, the continuation state, and tool definitions. The continuation store owns all session state; model providers own none of it. That decoupling is what lets a journal sleep for weeks, migrate between workers, fail over between providers mid-run, and downshift tiers without ever reconstructing a session.

Why not Foundry Agent threads

Foundry Agents, OpenAI Assistants, and Bedrock Agents all expose a thread lifecycle: createThread → addMessage → createRun → poll → getMessages → deleteThread. The lifecycle is chat-shaped: minutes long, state inside the provider, lost on provider outage. Maestro journals run for weeks. Binding session state to a model provider would mean no migration between workers, no failover, no budget-driven tier switching mid-run, and no replay from the event log.

The Foundry proxy endpoint at {resource}.services.ai.azure.com/anthropic/v1/messages exposes the raw Anthropic Messages API — tools, extended thinking, prompt caching, advisor strategy — with billing inside Azure Marketplace. That is the right integration point. Each tick is a fresh Messages call; the model has no memory of prior ticks because it does not need any.
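
As a rough illustration of what "a fresh Messages call" looks like on the wire, the sketch below builds the URL and JSON body for one tick. The body fields (model, max_tokens, messages, system, tools) are those of the Anthropic Messages API; the helper name build_messages_request, the default max_tokens, and the choice to pass the Foundry deployment name as model are assumptions made for this example. Authentication headers are deliberately left out.

```python
from typing import Any, Optional


def build_messages_request(
    resource: str,
    deployment: str,
    messages: list[dict[str, Any]],
    system: Optional[str] = None,
    tools: Optional[list[dict[str, Any]]] = None,
    max_tokens: int = 1024,
) -> tuple[str, dict[str, Any]]:
    """Return (url, body) for one stateless Messages call (hypothetical helper)."""
    url = f"https://{resource}.services.ai.azure.com/anthropic/v1/messages"
    body: dict[str, Any] = {
        "model": deployment,        # assumption: deployment name is used as the model id
        "max_tokens": max_tokens,
        "messages": messages,       # the full context is resent on every tick
    }
    if system is not None:
        body["system"] = system
    if tools:
        body["tools"] = tools
    return url, body
```

Nothing in the request refers to an earlier call: no thread id, no run id. Whatever the model needs to know about prior ticks has to arrive in messages.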

The two architectures, side by side

Traditional agent session
  • Model provider owns the session
  • Thread created → messages accumulate → thread deleted
  • Duration: minutes — state inside the provider
  • Provider outage = session lost
  • No migration, no failover, no replay

Maestro continuation
  • Continuation store owns the session
  • Each tick: assemble context → stateless model call → persist
  • Duration: weeks — state in Cosmos / DynamoDB
  • Provider outage = retry or failover on next tick
  • Migrate workers, swap providers, replay any tick

The model call is a pure function:
f(goal_frame, field_manifest, state, tools) → response
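
The loop in the right-hand column (assemble context, stateless model call, persist) can be sketched as follows. This is a minimal illustration, not the Maestro implementation: Continuation, run_tick, echo_model, and the in-memory dict standing in for Cosmos / DynamoDB are all hypothetical names.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class Continuation:
    """Hypothetical stand-in for one record in the continuation store."""
    goal_frame: str
    state: dict[str, Any] = field(default_factory=dict)
    tick: int = 0


# A model is any async function of its four inputs; it holds no session.
ModelFn = Callable[[str, list[str], dict[str, Any], list[dict]], Awaitable[str]]


async def run_tick(
    cont: Continuation,
    field_manifest: list[str],
    tools: list[dict],
    model: ModelFn,
    store: dict[str, Continuation],
    key: str,
) -> str:
    # Assemble + call: everything the model sees is passed in explicitly.
    response = await model(cont.goal_frame, field_manifest, dict(cont.state), tools)
    # Persist: the store, not the provider, records what happened.
    cont.state[f"tick_{cont.tick}"] = response
    cont.tick += 1
    store[key] = cont
    return response


async def echo_model(goal_frame, field_manifest, state, tools) -> str:
    """Fake provider whose output depends only on its arguments."""
    return f"{goal_frame}|{len(field_manifest)}|{len(state)}|{len(tools)}"
```

Because the model function receives a copy of the state and returns only a response, any provider with the same signature can serve the next tick.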

The three deployed tiers

Three Claude models are deployed in Foundry as Global Standard, and equivalents are reachable on Bedrock. The kernel emits a tier name — never a model id — and dispatch resolves the tier per tenant.

Tier    Foundry deployment   Priority        Purpose
opus    claude-opus-4-6      P0 (Critical)   Planning, synthesis, complex reasoning, KB authoring
sonnet  claude-sonnet-4-6    P1 (Standard)   Drafting, analysis, tool selection, evidence summaries
haiku   claude-haiku-4-5     P2 (Batch)      Classification, tagging, field reranking, extraction
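
One way to picture "dispatch resolves the tier per tenant" is a lookup with per-tenant overrides. The default deployment names below come from the table; the override mechanism and the names resolve_deployment and tenant_overrides are assumptions made for illustration.

```python
from typing import Mapping, Optional

TIERS = ("opus", "sonnet", "haiku")

# Defaults taken from the deployment table.
DEFAULT_DEPLOYMENTS = {
    "opus": "claude-opus-4-6",
    "sonnet": "claude-sonnet-4-6",
    "haiku": "claude-haiku-4-5",
}


def resolve_deployment(
    tier: str, tenant_overrides: Optional[Mapping[str, str]] = None
) -> str:
    """Map a kernel-emitted tier name to a concrete deployment id."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    # Hypothetical: a tenant may pin a tier to its own deployment name.
    overrides = tenant_overrides or {}
    return overrides.get(tier, DEFAULT_DEPLOYMENTS[tier])
```

Keeping model ids out of the kernel means a deployment can be renamed or swapped without touching routing policy.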

The ModelDispatch protocol

Every substrate (Azure, AWS, direct) implements the same stateless contract. The continuation does not know which provider answered the call — that is the dispatch layer's job.

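from __future__ import annotations

from typing import Protocol, runtime_checkable

# BudgetVector and ModelCallResult are defined elsewhere in the codebase.

@runtime_checkable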
class ModelDispatch(Protocol):
    async def call(
        self,
        tier: str,                    # "opus" | "sonnet" | "haiku"
        messages: list[dict],         # assembled from continuation state + field
        tools: list[dict] | None,     # MCP tools available for this tick
        budget: BudgetVector,         # remaining allowances
        mode: str,                    # "sync" | "async" | "batch"
        use_advisor: bool = False,    # invoke Opus as advisor on Sonnet/Haiku ticks
        system: str | None = None,    # system prompt from goal frame + policy
        thinking: str = "adaptive",   # extended thinking knob
    ) -> ModelCallResult: ...

    async def health_check(self) -> dict[str, bool]: ...
    def supported_tiers(self) -> list[str]: ...

The result carries an explicit executor vs advisor token split, prompt-cache reads, dollar cost, latency, tier used, and stop reason. The audit trail reads identically regardless of which provider served the call.
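
A possible shape for the result types, sketched from the fields listed above. The class names ModelCallResult and ModelUsage appear in the task phases, but every field name here is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelUsage:
    """Token accounting with the executor/advisor split kept explicit."""
    executor_input_tokens: int
    executor_output_tokens: int
    advisor_input_tokens: int = 0
    advisor_output_tokens: int = 0
    cache_read_tokens: int = 0

    @property
    def advisor_consulted(self) -> bool:
        return (self.advisor_input_tokens + self.advisor_output_tokens) > 0

    @property
    def total_tokens(self) -> int:
        return (
            self.executor_input_tokens
            + self.executor_output_tokens
            + self.advisor_input_tokens
            + self.advisor_output_tokens
        )


@dataclass(frozen=True)
class ModelCallResult:
    """Provider-neutral result: no field names the provider that served it."""
    content: str
    tier_used: str
    stop_reason: str
    usage: ModelUsage
    cost_usd: float
    latency_ms: float
```

Leaving the provider name out of the result is one way to keep the audit trail identical across backends; a real implementation might record it in a separate operational log.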

The five-step routing gradient

Layer B of the policy kernel sees a richer gradient than just three tiers. The Advisor strategy lets Sonnet or Haiku consult an Opus advisor mid-call — recovering most of the quality of a tier-up at a fraction of the cost.

Gradient step    Advisor   When the kernel chooses it
opus             none      Complex planning, multi-step reasoning, evidence evaluation
sonnet+advisor   Opus      Synthesis with strategic guidance (novel territory)
sonnet           none      Standard drafting, analysis, comparison
haiku+advisor    Opus      Extraction with a quality check from an Opus advisor
haiku            none      Classification, tagging, field reranking

Under budget pressure, the kernel downshifts and turns on the advisor as compensation: opus → sonnet+advisor recovers most of Opus's reasoning quality at roughly 30% of its cost; sonnet → haiku+advisor recovers most of Sonnet's at roughly 40%. Hard budget pressure drops everything to plain haiku.
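
The downshift rule can be sketched as a small lookup. The two compensated moves (opus to sonnet+advisor, sonnet to haiku+advisor) and the hard-pressure drop to plain haiku come from the text; how a step that already uses an advisor behaves under soft pressure is not stated, so this sketch assumes it follows its base tier. The function names are hypothetical.

```python
GRADIENT = ("opus", "sonnet+advisor", "sonnet", "haiku+advisor", "haiku")

# Compensated downshifts named in the text, keyed by base tier.
COMPENSATED = {"opus": "sonnet+advisor", "sonnet": "haiku+advisor"}


def split_step(step: str) -> tuple[str, bool]:
    """Map a gradient step to the (tier, use_advisor) pair for ModelDispatch.call."""
    if step not in GRADIENT:
        raise ValueError(f"unknown gradient step: {step!r}")
    tier, _, suffix = step.partition("+")
    return tier, suffix == "advisor"


def downshift(step: str, hard_pressure: bool = False) -> str:
    """Return the step to use under budget pressure."""
    tier, _ = split_step(step)
    if hard_pressure:
        return "haiku"  # hard pressure drops everything to plain haiku
    # Assumption: steps on the haiku tier stay put under soft pressure.
    return COMPENSATED.get(tier, step)
```

For example, downshift("opus") yields "sonnet+advisor", which split_step turns into tier "sonnet" with use_advisor set to True.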

The failover chain

The FailoverModelRouter wraps an ordered list of providers. The continuation does not know which one served any given tick.

Azure-resident tenants

  1. Foundry proxy (primary)
  2. Direct Anthropic API (backup)
  3. Bedrock (cross-cloud last resort)

AWS-resident tenants

  1. Bedrock (primary)
  2. Direct Anthropic API (backup)
  3. Foundry proxy (cross-cloud last resort)

Only recoverable errors (429, 5xx) trigger failover. Non-recoverable errors (auth, bad request, content policy) skip the chain and surface to the worker, which decides whether to retry on the next tick or escalate.
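
The failover rule can be sketched as a loop over providers that moves on only for 429 and 5xx. This is an illustration under stated assumptions, not the actual FailoverModelRouter: ProviderError, is_recoverable, and the (name, async callable) provider shape are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable


class ProviderError(Exception):
    """Hypothetical error type carrying the provider's HTTP status."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(message or f"HTTP {status}")
        self.status = status


def is_recoverable(status: int) -> bool:
    # Only rate limits and server errors are worth trying elsewhere.
    return status == 429 or 500 <= status <= 599


Provider = Callable[..., Awaitable[Any]]


class FailoverModelRouter:
    def __init__(self, providers: list[tuple[str, Provider]]):
        self.providers = providers  # ordered by preference

    async def call(self, **kwargs: Any) -> tuple[str, Any]:
        last_error: Exception = RuntimeError("no providers configured")
        for name, provider in self.providers:
            try:
                return name, await provider(**kwargs)
            except ProviderError as err:
                if not is_recoverable(err.status):
                    raise  # auth, bad request, content policy: surface to the worker
                last_error = err  # recoverable: try the next provider
        raise last_error
```

The provider name is returned here only so the router's caller can log it; the continuation would receive just the result.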

Metering and budget events

Every model call emits a budget_charge event with per-tier executor tokens, advisor tokens, cache reads / cache creations, dollars, latency, and an advisor_consulted flag. The warm store materializes fact_model_call for cost analytics. Continuations get an alert at 80% of the soft cap; at 60% the kernel begins downshifting; at 30% it drops to plain haiku.
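
A sketch of the event payload and the threshold check follows. The text does not say whether the 80 / 60 / 30 figures count budget spent or budget remaining; this example assumes they are fractions of the soft cap still remaining, since that is the only reading in which the three thresholds are crossed in the order listed. The function names and event keys are hypothetical.

```python
from typing import Any


def budget_action(remaining_fraction: float) -> str:
    """Map remaining soft-cap fraction to a kernel action (assumed semantics)."""
    if not 0.0 <= remaining_fraction <= 1.0:
        raise ValueError("remaining_fraction must be within [0, 1]")
    if remaining_fraction <= 0.30:
        return "haiku_only"
    if remaining_fraction <= 0.60:
        return "downshift"
    if remaining_fraction <= 0.80:
        return "alert"
    return "none"


def budget_charge_event(
    tier: str,
    executor_tokens: int,
    advisor_tokens: int,
    cache_read_tokens: int,
    cache_creation_tokens: int,
    dollars: float,
    latency_ms: float,
) -> dict[str, Any]:
    """Build one budget_charge event; key names are illustrative."""
    return {
        "type": "budget_charge",
        "tier": tier,
        "executor_tokens": executor_tokens,
        "advisor_tokens": advisor_tokens,
        "cache_read_tokens": cache_read_tokens,
        "cache_creation_tokens": cache_creation_tokens,
        "dollars": dollars,
        "latency_ms": latency_ms,
        "advisor_consulted": advisor_tokens > 0,
    }
```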

Execution phases

  Phase 0: Foundry portal deployments

    Deploy claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5 as Global Standard models in a supported region. Verify each tier returns HTTP 200. Manual, no code.

  Phase 1: Substrate interface

    Extend the Substrate Protocol with a ModelDispatch contract. Define ModelCallResult and ModelUsage with explicit executor/advisor token splits. Pure types, no cloud SDKs.

  Phase 2: Azure Foundry implementation

    AzureFoundryModelDispatch hits the Anthropic proxy endpoint (services.ai.azure.com/anthropic/v1). Stateless. Adds advisor tool injection for Sonnet/Haiku. Pricing + usage reported back through the result.

  Phase 3: AWS Bedrock implementation

    BedrockModelDispatch goes through boto3 invoke_model. The advisor strategy is not native on Bedrock, so advisor calls fall back to the direct Anthropic API. The same test suite as Azure must pass.

  Phase 4: Policy kernel routing

    Layer B emits one of the five gradient steps. Budget-aware downshift compensates with the advisor when crossing tier boundaries.

  Phase 5: Failover chain

    FailoverModelRouter wraps providers in preference order: Foundry → direct Anthropic → Bedrock (Azure path) or Bedrock → direct Anthropic → Foundry (AWS path).

  Phase 6: Metering + budget events

    Every call emits a budget_charge event with per-tier tokens, cache hits, advisor tokens, dollars, and latency. Warm store materializes fact_model_call.

  Phase 7: Environment + IaC

    Env vars for endpoint, key, three deployment IDs, advisor toggle, failover toggle. Key Vault / Secrets Manager provisioning in IaC.
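
Phase 7 lists the settings but not their names, so the loader below is only a sketch: every MAESTRO_* variable name is invented for this example, as are DispatchConfig and load_dispatch_config.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names; the real ones live in the task spec.
REQUIRED = (
    "MAESTRO_FOUNDRY_ENDPOINT",
    "MAESTRO_FOUNDRY_API_KEY",
    "MAESTRO_DEPLOYMENT_OPUS",
    "MAESTRO_DEPLOYMENT_SONNET",
    "MAESTRO_DEPLOYMENT_HAIKU",
)


@dataclass(frozen=True)
class DispatchConfig:
    endpoint: str
    api_key: str
    deployments: dict[str, str]
    advisor_enabled: bool
    failover_enabled: bool


def _flag(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_dispatch_config(env: Mapping[str, str] = os.environ) -> DispatchConfig:
    """Read dispatch settings from an env-like mapping, failing fast on gaps."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing env vars: {', '.join(missing)}")
    return DispatchConfig(
        endpoint=env["MAESTRO_FOUNDRY_ENDPOINT"],
        api_key=env["MAESTRO_FOUNDRY_API_KEY"],
        deployments={
            "opus": env["MAESTRO_DEPLOYMENT_OPUS"],
            "sonnet": env["MAESTRO_DEPLOYMENT_SONNET"],
            "haiku": env["MAESTRO_DEPLOYMENT_HAIKU"],
        },
        # Assumption: both toggles default to off when unset.
        advisor_enabled=_flag(env.get("MAESTRO_ADVISOR_ENABLED", "false")),
        failover_enabled=_flag(env.get("MAESTRO_FAILOVER_ENABLED", "false")),
    )
```

Taking the environment as a parameter keeps the loader testable with a plain dict; in deployment the key would be injected from Key Vault or Secrets Manager rather than set by hand.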

The architectural bet

The model market commoditizes faster than control planes do. By making model calls stateless and the dispatch layer pluggable, Maestro's value lives where it belongs — in the continuation, the context field, and the policy kernel — not in any single model vendor's session state. Foundry Anthropic, Bedrock Claude, and direct Anthropic are interchangeable backends behind one protocol. The pager is the seam.

The full implementation spec — prerequisites, file inventory, env vars, test plan — lives in TASK-maestro-foundry-model-dispatch. This page is the architectural surface. The task doc is the build sheet.