Model routing across Bedrock and Foundry Models

The policy kernel emits a tier, not a model id. Dispatch resolves the tier to a concrete model on a concrete provider, per tenant. Bedrock and Foundry Models are peers; failover is a config knob.

Tier → provider → model

The same logical tier names route to different concrete models on each cloud. This table is the per-tenant default; tenant policy can override on a per-call basis.

| Tier | Bedrock model id | Foundry deployment | Failover order |
| --- | --- | --- | --- |
| haiku | anthropic.claude-haiku-4-5-20251001-v1:0 | claude-haiku-4-5 | Bedrock → Foundry → Anthropic direct |
| sonnet | anthropic.claude-sonnet-4-6 (inference profile) | claude-sonnet-4-6 | Bedrock → Foundry → Anthropic direct |
| opus | anthropic.claude-opus-4-7 (inference profile) | claude-opus-4-6 | Bedrock → Foundry → Anthropic direct |
| gpt-fast | openai.gpt-oss-120b-1:0 | gpt-4.1-mini · gpt-oss-120b | Foundry → OpenAI direct |
| gpt-reason | (none) | o4 · gpt-5 family | Foundry → OpenAI direct |
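
A minimal sketch of how dispatch might resolve a tier to a concrete provider and model, with a per-tenant override. The names (`resolve_tier`, `DEFAULT_ROUTES`, `TENANT_OVERRIDES`, the tenant `"acme"`) are illustrative, not the real dispatch API; the model ids mirror the table above.

```python
# Hypothetical routing table: tier -> per-provider deployment + failover order.
DEFAULT_ROUTES = {
    "haiku": {
        "bedrock": "anthropic.claude-haiku-4-5-20251001-v1:0",
        "foundry": "claude-haiku-4-5",
        "failover": ["bedrock", "foundry", "anthropic"],
    },
    "gpt-fast": {
        "foundry": "gpt-4.1-mini",
        "failover": ["foundry", "openai"],
    },
}

# Tenant policy overrides a route without touching code -- e.g. a tenant
# pinned Foundry-first for data-residency reasons.
TENANT_OVERRIDES = {
    "acme": {"haiku": {"failover": ["foundry", "bedrock", "anthropic"]}},
}

def resolve_tier(tier: str, tenant: str) -> tuple[str, str]:
    """Return (provider, model_id) for the first configured provider in failover order."""
    route = {**DEFAULT_ROUTES[tier], **TENANT_OVERRIDES.get(tenant, {}).get(tier, {})}
    for provider in route["failover"]:
        model = route.get(provider)
        if model is not None:  # providers without a configured deployment are skipped
            return provider, model
    raise LookupError(f"no provider configured for tier {tier!r}")
```

Because the override merges at the tier level, "swap providers for one tenant" is a one-line config change, which is the property the failover column is encoding.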

Two surfaces, peer capabilities

| Dimension | AWS Bedrock | Azure AI Foundry Models |
| --- | --- | --- |
| Surface | One control plane (bedrock) + one data plane (bedrock-runtime) | One Foundry resource (Microsoft.CognitiveServices/accounts, kind AIServices) |
| Unified API | Converse / ConverseStream normalizes prompts + tool schemas across providers | OpenAI-compatible /openai/v1/chat/completions and /embeddings |
| Tool schemas | Normalized via toolConfig.tools[].toolSpec across Claude, Cohere, Llama, Nova | Provider-native inside Responses/Chat Completions; Maestro adapts at the boundary |
| Pricing tiers | On-demand · Provisioned Throughput · Batch (−50%) · Priority (+75%) · Flex (−50%) · Reserved | MaaS pay-per-token · PTU (Global / Data Zone / Regional Provisioned) · Managed Compute (dedicated GPU) |
| Cross-region | Cross-Region Inference profiles: Geographic (us./eu./apac.) and Global (≈ −10% vs Geographic), no routing surcharge | Per-deployment regional config; Data Zone PTU spans a geography |
| Cost attribution | Application Inference Profiles for per-app tagging; IAM-principal allocation in CUR 2.0 | Azure Cost Management; partner models via Marketplace, MACC-eligible |
| Guardrails | Bedrock Guardrails inline via guardrailConfig; PII + denied topics + contextual grounding + Automated Reasoning | Azure AI Content Safety at the deployment; six categories, togglable severity |
| RAG primitive | Knowledge Bases over OpenSearch Serverless / Neptune / Kendra GenAI | Azure AI Search hybrid; Foundry Knowledge tools |
| Auth | IAM SigV4 · IRSA | Entra ID (DefaultAzureCredential) · Managed Identity · API key |
| Private network | PrivateLink (com.amazonaws.&lt;region&gt;.bedrock(-runtime)); IAM gates with aws:sourceVpce | Azure Private Link, VNet, public-access flag |
| Latency levers | Latency-Optimized Inference (Haiku 3.5, Llama 70B/405B, Nova Pro); Intelligent Prompt Routing within a family; prompt caching via cachePoint | PTU deployment routing; Foundry Intelligent Prompt Routing (OpenAI family) |
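
The two data-plane request shapes differ in small but load-bearing ways; a sketch of the payloads dispatch would build for each surface. The function names are illustrative, and the shapes follow the public APIs (Bedrock Converse takes typed content blocks, the OpenAI-compatible route takes flat string content):

```python
def bedrock_converse_request(model_id: str, prompt: str) -> dict:
    # Body for a bedrock-runtime Converse call: content is a list of typed blocks.
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 1024},
    }

def foundry_chat_request(deployment: str, prompt: str) -> dict:
    # Body for POST {endpoint}/openai/v1/chat/completions: flat string content,
    # deployment name in the "model" field.
    return {
        "model": deployment,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }
```

Keeping these shapes behind one builder per provider is what lets the tier table above stay a pure data problem.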

Cost levers the kernel can pull

  • Bedrock Cross-Region Inference (CRIS). Geographic and Global profiles. Global ≈ 10% cheaper than Geographic with no routing surcharge. Set per-call when latency budget allows.
  • Bedrock service tiers. default | priority (+75%) | flex (−50%) | reserved. Deferred ticks go to flex; user-waiting ticks go to priority.
  • Bedrock latency-optimized inference. Haiku 3.5, Llama 70B/405B, Nova Pro. Used for mode=sync ticks where p50 matters.
  • Bedrock prompt caching via cachePoint blocks. Field manifests that don't change tick-over-tick are obvious caching candidates.
  • Foundry PTU. Global / Data Zone / Regional Provisioned. For tenants with steady predictable load — Cargill at scale — PTU at the right zone amortizes well below per-token MaaS.
  • Foundry Intelligent Prompt Routing within the OpenAI family. Used when the tier is gpt-fast and the kernel hasn't committed to a specific deployment.
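
The lever choices above can be sketched as a pure function of a tick's mode and latency budget. The field names, thresholds, and the function itself are hypothetical; only the lever semantics (flex for deferred work, priority plus latency-optimized inference when a user is waiting, the Global CRIS discount) come from the bullets above.

```python
def pick_levers(mode: str, deferred: bool, latency_budget_ms: int) -> dict:
    """Map a tick's urgency to Bedrock cost levers (illustrative thresholds)."""
    levers = {"service_tier": "default", "cris_profile": None, "latency_optimized": False}
    if deferred:
        levers["service_tier"] = "flex"        # -50%: deferred ticks tolerate queueing
    elif mode == "sync" and latency_budget_ms < 500:
        levers["service_tier"] = "priority"    # +75%: a user is waiting on this tick
        levers["latency_optimized"] = True     # p50 matters, eligible models only
    if latency_budget_ms >= 2000:
        levers["cris_profile"] = "global"      # ~10% cheaper than Geographic routing
    return levers
```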

Tool-schema normalization

Bedrock's Converse API normalizes toolConfig.tools[].toolSpec across providers; Foundry's Responses API has a different shape. The dispatch layer (core/maestro/dispatch/) exposes a single internal tool-call ABI; to_provider_tool_schema() and from_provider_tool_call() adapt at the boundary. Every provider is conformance-tested against the same suite — same inputs, same expected toolUse envelope out.
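
A minimal sketch of what the two boundary adapters could look like, assuming an internal ABI where a tool is (name, description, JSON-schema params). The `Tool` dataclass and the exact internal shape are assumptions; the `toolSpec` / `toolUse` envelopes follow the Converse API, and the `function` envelope follows the OpenAI-compatible surface.

```python
import json
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    description: str
    params: dict  # JSON schema for the tool's input object

def to_provider_tool_schema(tool: Tool, provider: str) -> dict:
    """Project the internal tool ABI into a provider-native schema."""
    if provider == "bedrock":
        return {"toolSpec": {
            "name": tool.name,
            "description": tool.description,
            "inputSchema": {"json": tool.params},
        }}
    # OpenAI-compatible surface (Foundry)
    return {"type": "function", "function": {
        "name": tool.name,
        "description": tool.description,
        "parameters": tool.params,
    }}

def from_provider_tool_call(raw: dict, provider: str) -> tuple[str, dict]:
    """Normalize a provider tool call back into (tool_name, arguments)."""
    if provider == "bedrock":
        block = raw["toolUse"]            # Converse returns arguments pre-parsed
        return block["name"], block["input"]
    fn = raw["function"]                  # OpenAI-style arguments arrive as a JSON string
    return fn["name"], json.loads(fn["arguments"])
```

The conformance suite mentioned above then only has to assert one thing per provider: the same `Tool` in, the same `(name, arguments)` pair out.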

Governance and guardrails

  • Bedrock Guardrails applied inline on the Converse call (guardrailConfig) for Bedrock-routed traffic. PII, denied topics, contextual grounding, Automated Reasoning checks.
  • Azure AI Content Safety filters at the Foundry deployment for Foundry-routed traffic. Six categories with togglable severity.
  • Provider verdicts are normalized into a single safety_outcome enum on the model_call event so the audit trail reads the same regardless of route.
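
A sketch of that normalization, assuming simplified verdict shapes on each side. The enum values, the `"categories"` severity map, and the severity threshold are illustrative assumptions, not the real `model_call` event schema; `GUARDRAIL_INTERVENED` is the intervention signal Bedrock Guardrails reports.

```python
from enum import Enum

class SafetyOutcome(Enum):
    PASSED = "passed"
    BLOCKED = "blocked"
    MASKED = "masked"   # e.g. PII redacted but the call proceeded

def normalize_verdict(provider: str, verdict: dict) -> SafetyOutcome:
    """Collapse provider-specific safety verdicts into one enum for the audit trail."""
    if provider == "bedrock":
        action = verdict.get("action", "NONE")
        return {"GUARDRAIL_INTERVENED": SafetyOutcome.BLOCKED,
                "ANONYMIZED": SafetyOutcome.MASKED}.get(action, SafetyOutcome.PASSED)
    # Content Safety style: per-category severities; block above a threshold.
    severities = verdict.get("categories", {}).values()
    return SafetyOutcome.BLOCKED if any(s >= 4 for s in severities) else SafetyOutcome.PASSED
```

The point is that downstream audit queries filter on `safety_outcome` alone and never branch on which cloud served the call.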

Why this is the right shape

The kernel sits above the providers. Adding a third inference surface — or swapping Claude on Bedrock for Claude on Foundry for a given tenant — is a config change, not a code change. The architectural bet is that the model market commoditizes faster than control planes do; the value lives in continuations, the field, and the kernel — not in the model vendor.
