Capability
Model routing across Bedrock and Foundry Models
The policy kernel emits a tier, not a model id. Dispatch resolves the tier to a concrete model on a concrete provider, per tenant. Bedrock and Foundry Models are peers; failover is a config knob.
Tier → provider → model
The same logical tier names route to different concrete models on each cloud. This table is the per-tenant default; tenant policy can override on a per-call basis.
| Tier | Bedrock model id | Foundry deployment | Failover order |
|---|---|---|---|
| haiku | anthropic.claude-haiku-4-5-20251001-v1:0 | claude-haiku-4-5 | Bedrock → Foundry → Anthropic direct |
| sonnet | anthropic.claude-sonnet-4-6 (inference profile) | claude-sonnet-4-6 | Bedrock → Foundry → Anthropic direct |
| opus | anthropic.claude-opus-4-7 (inference profile) | claude-opus-4-6 | Bedrock → Foundry → Anthropic direct |
| gpt-fast | openai.gpt-oss-120b-1:0 | gpt-4.1-mini · gpt-oss-120b | Foundry → OpenAI direct |
| gpt-reason | — | o4 · gpt-5 family | Foundry → OpenAI direct |
Two surfaces, peer capabilities
| Dimension | AWS Bedrock | Azure AI Foundry Models |
|---|---|---|
| Surface | One control plane (bedrock) + one data plane (bedrock-runtime) | One Foundry resource (Microsoft.CognitiveServices/accounts kind AIServices) |
| Unified API | Converse / ConverseStream normalizes prompts + tool schemas across providers | OpenAI-compatible /openai/v1/chat/completions and /embeddings |
| Tool schemas | Normalized via toolConfig.tools[].toolSpec across Claude, Cohere, Llama, Nova | Provider-native inside Responses/Chat Completions; Maestro adapts at the boundary |
| Pricing tiers | On-demand · Provisioned Throughput · Batch (−50%) · Priority (+75%) · Flex (−50%) · Reserved | MaaS pay-per-token · PTU (Global / Data Zone / Regional Provisioned) · Managed Compute (dedicated GPU) |
| Cross-region | Cross-Region Inference profiles — Geographic (us./eu./apac.) and Global (≈ −10% vs Geo), no routing surcharge | Per-deployment regional config; Data Zone PTU spans Geo |
| Cost attribution | Application Inference Profiles for per-app tagging; IAM-principal allocation in CUR 2.0 | Azure Cost Management; partner models via Marketplace, MACC-eligible |
| Guardrails | Bedrock Guardrails inline via guardrailConfig; PII + denied topics + contextual grounding + Automated Reasoning | Azure AI Content Safety at the deployment; six categories, togglable severity |
| RAG primitive | Knowledge Bases over OpenSearch Serverless / Neptune / Kendra GenAI | Azure AI Search hybrid; Foundry Knowledge tools |
| Auth | IAM SigV4 · IRSA | Entra ID (DefaultAzureCredential) · Managed Identity · API key |
| Private network | PrivateLink (com.amazonaws.); IAM gates with aws:sourceVpce | Azure Private Link, VNet, public-access flag |
| Latency levers | Latency-Optimized Inference (Haiku 3.5, Llama 70B/405B, Nova Pro); Intelligent Prompt Routing within a family; prompt caching via cachePoint | PTU deployment routing; Foundry Intelligent Prompt Routing (OpenAI family) |
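The "Unified API" row can be made concrete with the request shapes each surface accepts. The builders below are illustrative helpers (not part of the codebase); the Bedrock shape is what `bedrock-runtime`'s Converse operation takes, and the Foundry shape is the OpenAI-compatible chat-completions body.

```python
def build_bedrock_converse(model_id: str, prompt: str) -> dict:
    # Request shape for the bedrock-runtime Converse operation.
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512},
    }

def build_foundry_chat(deployment: str, prompt: str) -> dict:
    # OpenAI-compatible /openai/v1/chat/completions body.
    return {
        "model": deployment,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
```

The key asymmetry dispatch has to absorb: Converse nests message content in a list of typed blocks, while the OpenAI-compatible surface takes a plain string.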
Cost levers the kernel can pull
- Bedrock Cross-Region Inference (CRIS). Geographic and Global profiles. Global ≈ 10% cheaper than Geographic with no routing surcharge. Set per-call when latency budget allows.
- Bedrock service tiers. `default | priority (+75%) | flex (−50%) | reserved`. Deferred ticks go to `flex`; user-waiting ticks go to `priority`.
- Bedrock latency-optimized inference. Haiku 3.5, Llama 70B/405B, Nova Pro. Used for `mode=sync` ticks where p50 matters.
- Bedrock prompt caching via `cachePoint` blocks. Field manifests that don't change tick-over-tick are obvious caching candidates.
- Foundry PTU. Global / Data Zone / Regional Provisioned. For tenants with steady, predictable load (Cargill at scale), PTU at the right zone amortizes well below per-token MaaS.
- Foundry Intelligent Prompt Routing within the OpenAI family. Used when the tier is `gpt-fast` and the kernel hasn't committed to a specific deployment.
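The tier-selection rule the first two Bedrock levers describe can be sketched as a pure function from tick mode to call options. Field names (`mode`, `service_tier`, `latency`) are assumptions about the kernel's schema, not its real API.

```python
def bedrock_call_options(mode: str) -> dict:
    """Pick cost/latency knobs for a tick, per the levers above (sketch)."""
    if mode == "deferred":
        # Batch-like work: flex trades latency for roughly half the price.
        return {"service_tier": "flex"}
    if mode == "sync":
        # User-waiting: pay the priority premium, prefer latency-optimized models.
        return {"service_tier": "priority", "latency": "optimized"}
    # Anything else runs on-demand at the default tier.
    return {"service_tier": "default"}
```

Keeping this a pure function of the tick makes the cost decision auditable: the same tick always maps to the same tier.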
Tool-schema normalization
Bedrock's Converse API normalizes `toolConfig.tools[].toolSpec` across providers; Foundry's Responses API has a different shape. The dispatch layer (`core/maestro/dispatch/`) exposes a single internal tool-call ABI; `to_provider_tool_schema()` and `from_provider_tool_call()` adapt at the boundary. Every provider is conformance-tested against the same suite — same inputs, same expected `toolUse` envelope out.
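A hedged sketch of what those boundary adapters might look like. The internal ABI shown here (`name`/`description`/`parameters` in, `id`/`name`/`args` out) is an assumption about Maestro's shape, not the actual one; the Bedrock `toolSpec`/`toolUse` shapes follow the Converse API, and the Foundry side uses the OpenAI-style function-tool shape.

```python
import json

def to_provider_tool_schema(tool: dict, provider: str) -> dict:
    """Project an internal tool definition onto a provider's wire shape."""
    if provider == "bedrock":
        # Converse toolConfig.tools[].toolSpec shape.
        return {"toolSpec": {
            "name": tool["name"],
            "description": tool["description"],
            "inputSchema": {"json": tool["parameters"]},
        }}
    if provider == "foundry":
        # OpenAI-style function tool shape.
        return {"type": "function", "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["parameters"],
        }}
    raise ValueError(f"unknown provider {provider!r}")

def from_provider_tool_call(call: dict, provider: str) -> dict:
    """Normalize a provider tool call into the internal envelope."""
    if provider == "bedrock":
        use = call["toolUse"]
        return {"id": use["toolUseId"], "name": use["name"], "args": use["input"]}
    if provider == "foundry":
        fn = call["function"]
        return {"id": call["id"], "name": fn["name"], "args": json.loads(fn["arguments"])}
    raise ValueError(f"unknown provider {provider!r}")
```

The conformance suite mentioned above would feed both adapters the same fixtures and assert identical internal envelopes regardless of provider.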
Governance and guardrails
- Bedrock Guardrails applied inline on the `Converse` call (`guardrailConfig`) for Bedrock-routed traffic. PII, denied topics, contextual grounding, Automated Reasoning checks.
- Azure AI Content Safety filters at the Foundry deployment for Foundry-routed traffic. Six categories with togglable severity.
- Provider verdicts are normalized into a single `safety_outcome` enum on the `model_call` event so the audit trail reads the same regardless of route.
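The normalization in the last bullet can be sketched as a small mapping function. The `SafetyOutcome` values and, especially, the provider response shapes below are assumptions for illustration — the real Bedrock guardrail trace and Content Safety filter results carry more detail than is checked here.

```python
from enum import Enum

class SafetyOutcome(Enum):
    PASSED = "passed"
    BLOCKED = "blocked"
    REDACTED = "redacted"  # e.g. PII masked rather than the call refused

def normalize_verdict(provider: str, response: dict) -> SafetyOutcome:
    """Collapse a provider-specific safety verdict into one enum (sketch)."""
    if provider == "bedrock":
        # Assumed shape: guardrail trace records whether it intervened.
        action = response.get("trace", {}).get("guardrail", {}).get("action")
        return SafetyOutcome.BLOCKED if action == "INTERVENED" else SafetyOutcome.PASSED
    if provider == "foundry":
        # Assumed shape: per-category filter results with a filtered flag.
        results = response.get("content_filter_results", {})
        if any(v.get("filtered") for v in results.values()):
            return SafetyOutcome.BLOCKED
        return SafetyOutcome.PASSED
    raise ValueError(f"unknown provider {provider!r}")
```

Whatever the exact shapes, the point stands: the `model_call` event stores only the normalized enum, so audit queries never branch on provider.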
Why this is the right shape
The kernel sits above the providers. Adding a third inference surface — or swapping Claude on Bedrock for Claude on Foundry for a given tenant — is a config change, not a code change. The architectural bet is that the model market commoditizes faster than control planes do; the value lives in continuations, the field, and the kernel — not in the model vendor.