Compute & Storage = Programs | AI + Data = Value
This roadmap outlines the evolution from a proof-of-value demonstration to a sustainable, enterprise-scale solution. It acknowledges architectural volatility while prioritizing rapid value realization.
┌─────────────────────────────────────────────────────────────────────────┐
│ POV: PROOF OF VALUE │
│ "This Works But Isn't Sustainable" │
└─────────────────────────────────────────────────────────────────────────┘
CURRENT STATE (Weeks 1-6)
┌──────────────────┐
│ SharePoint │ ◄─── Single Site, Manual Process
│ Online Archive │ • Poultry nutrition docs only
│ (Nested ZIPs) │ • ~10K-50K documents
└────────┬─────────┘ • Business hours only (9am-5pm)
│ Delta Query API
│ Every 15 min
▼
┌────────────────────────────────────────────────────────────────────┐
│ Azure Durable Functions (Consumption Plan - $5-20/mo) │
│ ┌──────────────┐ ┌─────────────┐ ┌─────────────────┐ │
│ │ Timer │───▶│ Orchestrator│───▶│ Parallel │ │
│ │ Trigger │ │ (Fan-out) │ │ Processing │ │
│ │ (15 min) │ │ │ │ (100 concurrent)│ │
│ └──────────────┘ └─────────────┘ └────────┬────────┘ │
│ │ │
│ Processing Pipeline: │ │
│ 1. Extract nested ZIPs (recursive, max depth 10)│ │
│ 2. Multi-format extraction (PDF/Excel/Word/PPT) │ │
│ 3. Chunk text (512 tokens, 50 overlap) │ │
│ 4. Generate embeddings (batch of 16) │ │
│ 5. Upload to search (batch of 100) │ │
└──────────────────────────────────────────────────┬─────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Azure AI Search - BASIC TIER ($75/mo) │
│ • Single index: "swine-protocols-data" │
│ • 50K-150K documents max │
│ • 15 indexes, 45GB storage │
│ • NO high availability (single replica) │
│ • 1024-dim vectors (text-embedding-3-large) │
│ • Hybrid search: Keyword (BM25) + Vector + Semantic Ranking │
└─────────────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ Azure AI Foundry / GPT-4o RAG │
│ • Simple Prompt Flow interface │
│ • Manual query enhancement │
│ • Basic citation extraction │
│ • 100 pilot users (R&D team only) │
└─────────────────────────┬───────────────────────────────────────────┘
│
▼
┌───────────────┐
│ Basic Web UI │ ◄─── Prompt Flow Demo Interface
│ (Pilot Only) │ • No authentication
└───────────────┘ • No usage tracking
• No API layer
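The processing pipeline steps above (nested ZIP extraction, chunking, batching) can be sketched roughly as follows. This is a minimal illustration, not the POV's actual code: the function names are hypothetical, whitespace-separated words stand in for real tokenizer tokens, and only the numeric settings (depth 10, 512/50, batches of 16 and 100) come from the diagram.

```python
import io
import zipfile
from typing import Iterator, List, Sequence, Tuple

MAX_ZIP_DEPTH = 10   # "recursive, max depth 10"
CHUNK_SIZE = 512     # tokens per chunk
CHUNK_OVERLAP = 50   # tokens shared by neighbouring chunks
EMBED_BATCH = 16     # chunks per embedding request
UPLOAD_BATCH = 100   # documents per search upload


def iter_zip_members(data: bytes, depth: int = 1, prefix: str = "") -> Iterator[Tuple[str, bytes]]:
    """Yield (path, content) for every non-ZIP file, descending into nested ZIPs."""
    if depth > MAX_ZIP_DEPTH:
        raise ValueError(f"ZIP nesting deeper than {MAX_ZIP_DEPTH} at {prefix!r}")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            payload = archive.read(info)
            path = prefix + info.filename
            if info.filename.lower().endswith(".zip"):
                # Recurse into the nested archive, keeping its path as a prefix.
                yield from iter_zip_members(payload, depth + 1, path + "/")
            else:
                yield path, payload


def chunk_tokens(tokens: Sequence[str], size: int = CHUNK_SIZE,
                 overlap: int = CHUNK_OVERLAP) -> List[List[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


def batched(items: Sequence, n: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most n items."""
    for i in range(0, len(items), n):
        yield items[i:i + n]
```

In this sketch, a caller would pass each extracted file through a format parser (not shown), chunk the resulting tokens, then send `batched(chunks, EMBED_BATCH)` to the embedding service and `batched(docs, UPLOAD_BATCH)` to the search index.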
═══════════════════════════════════════════════════════════════════════
LIMITATIONS & RISKS
⚠️ NOT INTEGRATED: Siloed from other ANH AI capabilities
⚠️ NOT SCALABLE: Basic tier limits prevent growth
⚠️ SINGLE SPECIES: Only poultry data, no swine/aqua/pet
⚠️ NO GOVERNANCE: No access controls, audit trails, or compliance
⚠️ MANUAL PROCESS: Requires intervention for new data sources
⚠️ FRAGILE: No disaster recovery, single point of failure
⚠️ LIMITED UI: Demo interface unsuitable for production use
⚠️ NO API: Other applications cannot leverage the data
═══════════════════════════════════════════════════════════════════════
VALUE DEMONSTRATED
✓ Search success rate: 72% (vs 45% SharePoint baseline)
✓ Time to information: 60% reduction (5 min → 2 min avg)
✓ Zero-result queries: 4% (vs 18% baseline)
✓ User satisfaction: 8.3/10 (pilot group)
✓ ROI: 5,436% (conservative: 1,089%)
✓ Payback period: 6.6 days
💰 Total POV Infrastructure Cost: $360 (3 months) | Value: $389,063 (quarterly savings)
┌─────────────────────────────────────────────────────────────────────────┐
│ MVP: MINIMUM VIABLE PRODUCT │
│ "Ground Work for Sustained, Scalable Capability" │
│ Timeline: Months 4-9 (6 months) │
└─────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ DATA SOURCES (EXPANDED) │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ SharePoint │ │ SharePoint │ │ SharePoint │ │
│ │ Poultry Site │ │ Swine Site │ │ Aqua Site │ │
│ └───────┬───────┘ └────────┬──────┘ └────────┬──────┘ │
│ │ │ │ │
│ └───────────────────┴──────────────────┘ │
│ │ │
│ Multi-Site Delta Query │
│ (Innovation Center Aware) │
└──────────────────────────────┬───────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ UNIFIED ETL ORCHESTRATION LAYER │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Azure Data Factory (OR) Durable Functions Premium Plan │ │
│ │ • Multi-site orchestration with site-specific configs │ │
│ │ • Automated schema detection per Innovation Center │ │
│ │ • Quality gates: validation, deduplication, metadata checks │ │
│ │ • Lineage tracking: source → processing → indexing │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────┐ ┌──────────────────────────┐ │
│ │ Processing Modules │ │ Intermediate Storage │ │
│ │ • ZIP extractor │───────▶│ Azure Blob Storage │ │
│ │ • Format parsers │ │ (Hot tier) │ │
│ │ • Chunking engine │ │ • Raw documents │ │
│ │ • Embedding service │ │ • Processed chunks │ │
│ │ • Metadata enricher │ │ • Audit logs │ │
│ └─────────────────────┘ └──────────────────────────┘ │
└──────────────────────────────┬───────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ Azure AI Search - STANDARD S1 ($500/mo) │
│ • 2 replicas (99.9% SLA for reads) │
│ • Multiple indexes by species/center │
│ • Cross-species federated search capability │
│ │
│ Index Structure: │
│ ├─ poultry-nutrition-data │
│ ├─ swine-nutrition-data │
│ ├─ aqua-nutrition-data │
│ └─ pet-nutrition-data (future) │
│ │
│ Features: │
│ • Hybrid search (keyword + vector + semantic) │
│ • Security trimming (user-level access control) │
│ • Custom analyzers for scientific nomenclature │
│ • Synonym maps for cross-species terminology │
└──────────────────────────────┬───────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ MANAGEMENT & API LAYER (NEW) │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Azure API Management ($140/mo Developer tier) │ │
│ │ • Rate limiting & throttling │ │
│ │ • API versioning & lifecycle management │ │
│ │ • Usage analytics & cost tracking per application │ │
│ │ • Authentication & authorization (Azure AD integration) │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ REST API Endpoints: │
│ ├─ /search/hybrid - Multi-species hybrid search │
│ ├─ /search/species/{id} - Species-specific queries │
│ ├─ /documents/upload - Manual document ingestion │
│ ├─ /documents/status - ETL pipeline monitoring │
│ ├─ /embeddings/generate - Embedding service for other apps │
│ ├─ /metadata/enrich - Metadata enhancement service │
│ └─ /analytics/usage - Usage metrics & cost attribution │
└──────────────────────────────┬───────────────────────────────────────┘
│
┌────────────┴────────────┐
│ │
▼ ▼
┌─────────────────────────────┐ ┌──────────────────────────────────┐
│ LLM APPLICATION LAYER │ │ MANAGEMENT UI/PORTAL │
│ │ │ │
│ ┌────────────────────────┐ │ │ ┌────────────────────────────┐ │
│ │ RAG Chatbot (GPT-4o) │ │ │ │ Admin Dashboard │ │
│ │ • Species-aware │ │ │ │ • ETL job monitoring │ │
│ │ • Multi-turn context │ │ │ │ • Index management │ │
│ │ • Citation tracking │ │ │ │ • User access control │ │
│ └────────────────────────┘ │ │ │ • Cost & usage analytics │ │
│ │ │ │ • Data quality dashboard │ │
│ ┌────────────────────────┐ │ │ └────────────────────────────┘ │
│ │ Document Analysis API │ │ │ │
│ │ • Batch processing │ │ │ ┌────────────────────────────┐ │
│ │ • Trend extraction │ │ │ │ End-User Search UI │ │
│ │ • Comparative analysis │ │ │ │ • Role-based views │ │
│ └────────────────────────┘ │ │ │ • Saved searches │ │
│ │ │ │ • Export capabilities │ │
│ ┌────────────────────────┐ │ │ │ • Feedback mechanisms │ │
│ │ Research Assistant │ │ │ └────────────────────────────┘ │
│ │ • Experiment summaries │ │ │ │
│ │ • Methodology finder │ │ │ Authentication: │
│ │ • Result aggregation │ │ │ Azure AD / SSO Integration │
│ └────────────────────────┘ │ └──────────────────────────────────┘
└─────────────────────────────┘
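One way the hybrid search and cross-species federated search shown above could combine rankings is Reciprocal Rank Fusion (RRF), which Azure AI Search documents as its method for fusing keyword and vector rankings in hybrid queries. The sketch below is an assumption about how a federated layer might reuse the same idea across the per-species indexes; the function names, the sample document IDs, and the choice of k = 60 are illustrative, not taken from the MVP design.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


def federated_merge(per_index: Dict[str, List[List[str]]],
                    top: int = 5) -> List[Tuple[str, str, float]]:
    """Fuse keyword/vector rankings inside each index, then merge across indexes.

    RRF scores depend only on ranks, so they are comparable between indexes
    in a way that raw BM25 or cosine scores are not.
    """
    merged: List[Tuple[str, str, float]] = []
    for index_name, rankings in per_index.items():
        for doc_id, score in rrf_fuse(rankings).items():
            merged.append((index_name, doc_id, score))
    # Highest score first; ties broken by index name, then document id.
    merged.sort(key=lambda row: (-row[2], row[0], row[1]))
    return merged[:top]


# Hypothetical rankings: [keyword ranking, vector ranking] per index.
results = {
    "poultry-nutrition-data": [["p1", "p2", "p3"], ["p2", "p1", "p4"]],
    "swine-nutrition-data": [["s1", "s2"], ["s3", "s1"]],
}
for index_name, doc_id, score in federated_merge(results):
    print(f"{index_name:24s} {doc_id}  {score:.4f}")
```

Documents that rank well in both the keyword and vector lists rise to the top, regardless of which species index they come from.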
═══════════════════════════════════════════════════════════════════════
NEW CAPABILITIES IN MVP
✓ MULTI-SPECIES: Poultry, Swine, Aqua in separate indexes
✓ API-FIRST: Other applications can leverage the data/AI
✓ GOVERNANCE: Role-based access, audit logs, compliance ready
✓ SCALABLE: Standard tier supports 500K docs, 3 species
✓ MANAGEABLE: Admin UI for monitoring, configuration, operations
✓ HIGH AVAILABILITY: 2 replicas, 99.9% SLA
✓ EXTENSIBLE: Plugin architecture for new data sources
✓ COST-AWARE: Usage tracking & attribution per department
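The role-based access item relies on security trimming at query time. A minimal sketch follows, assuming each indexed chunk carries a filterable collection field of permitted group IDs. The field name `group_ids` and the helper function are assumptions; the filter shape follows the `search.in` pattern that Azure AI Search documents for security filters.

```python
from typing import Iterable


def build_security_filter(user_groups: Iterable[str], field: str = "group_ids") -> str:
    """Build an OData filter that keeps only documents shared with the user's groups."""
    ids = sorted(set(user_groups))
    if not ids:
        # An empty filter would match everything, so refuse instead.
        raise ValueError("user has no groups; return no results instead of querying")
    for gid in ids:
        if "," in gid or "'" in gid:
            raise ValueError(f"unsupported character in group id: {gid!r}")
    joined = ",".join(ids)
    return f"{field}/any(g: search.in(g, '{joined}', ','))"


print(build_security_filter(["rd-poultry", "rd-swine"]))
```

The resulting string would be passed as the query's filter expression, so trimming happens inside the search service and not in the application after results return.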
═══════════════════════════════════════════════════════════════════════
ELEMENTS OF VOLATILITY ⚠️
🔄 Azure AI Strategy Evolution
• Azure AI Foundry vs standalone OpenAI services
• GPT model selection (4o vs 4.1 vs 5)
• Microsoft Copilot Studio integration path unclear
🔄 Search Technology Direction
• Azure AI Search vs potential ZFS native capabilities
• Vector database alternatives (Cosmos DB, Pinecone, custom)
• Semantic ranking model updates (L2 reranker changes)
🔄 ANH Enterprise AI Consolidation
• Risk of mandate to use centralized AI platform
• Potential integration with other nutrition tools
• Corporate AI governance requirements TBD
🔄 Data Source Changes
• SharePoint migration timeline uncertain
• Innovation Center workflow standardization pending
• New species requirements (Pet, Specialty) not scoped
⚡ MITIGATION: Loose coupling via the API layer enables technology swaps
without disrupting consuming applications. Incremental
value delivery means benefits accrue even if rework is needed.
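The loose-coupling mitigation can be illustrated with a small interface sketch. Every name here is hypothetical: `SearchBackend` stands for whatever contract the API layer settles on, and the in-memory backend is only a stand-in for Azure AI Search or a later replacement.

```python
from typing import Dict, List, Protocol


class SearchBackend(Protocol):
    """Contract any search technology must satisfy to sit behind the API layer."""

    def search(self, query: str, top: int) -> List[str]: ...


class InMemoryBackend:
    """Stand-in backend; a real one would call the search service."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self._docs = docs

    def search(self, query: str, top: int) -> List[str]:
        needle = query.lower()
        hits = [doc_id for doc_id, text in self._docs.items() if needle in text.lower()]
        return sorted(hits)[:top]


class SearchService:
    """What consuming applications call; they never see the backend type."""

    def __init__(self, backend: SearchBackend) -> None:
        self._backend = backend

    def swap_backend(self, backend: SearchBackend) -> None:
        self._backend = backend

    def hybrid_search(self, query: str, top: int = 10) -> List[str]:
        return self._backend.search(query, top)


service = SearchService(InMemoryBackend({"d1": "Broiler feed trial", "d2": "Swine gut health"}))
print(service.hybrid_search("feed"))
# Swapping the backend leaves the caller's code unchanged.
service.swap_backend(InMemoryBackend({"d9": "Feed conversion study"}))
print(service.hybrid_search("feed"))
```

Because callers depend only on `hybrid_search`, a change of vector store or search vendor is confined to one new backend class.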
═══════════════════════════════════════════════════════════════════════
💰 MVP Cost: $1,200/month ($14,400 annually)
📈 Expected Value: $1.5M-2M annually (200-300 users, 3 species)
⏱️ Timeline: 6 months to production (Months 4-9)
👥 Serves: 200-300 R&D staff across 3 species
┌─────────────────────────────────────────────────────────────────────────┐
│ FINAL PRODUCT: ENTERPRISE PLATFORM │
│ "Integrated, Global, Multi-Species AI Platform" │
│ Timeline: Months 10-18 (9 months) │
└─────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ GLOBAL DATA ECOSYSTEM │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ UNIFIED DATA LAYER │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ SharePoint │ │ ZFS Native │ │ External │ │ │
│ │ │ (Legacy) │ │ Storage │ │ Research DBs │ │ │
│ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ │
│ │ │ │ │ │ │
│ │ └─────────────────┴─────────────────┘ │ │
│ │ │ │ │
│ │ Unified Data Mesh │ │
│ │ (Data Catalog + Lineage + Quality) │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ Species Coverage: │
│ ✓ Poultry ✓ Swine ✓ Aqua ✓ Pet ✓ Specialty │
│ │
│ Innovation Centers: │
│ ✓ North America (3) ✓ Europe (2) ✓ Asia-Pacific (2) │
│ ✓ Latin America (1) │
└──────────────────────────────┬───────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ ENTERPRISE ETL & PROCESSING PLATFORM │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Azure Data Factory + Synapse Analytics │ │
│ │ • Real-time streaming for hot-path data │ │
│ │ • Batch processing for historical archives │ │
│ │ • Multi-region replication (US, EU, APAC) │ │
│ │ • Automated data quality & validation pipelines │ │
│ │ • Change Data Capture (CDC) from ZFS │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ AI Processing Pipeline │ │
│ │ ├─ Advanced document understanding (Azure Doc Intelligence) │ │
│ │ ├─ Multi-modal processing (text, images, tables, graphs) │ │
│ │ ├─ Entity extraction (compounds, organisms, measurements) │ │
│ │ ├─ Relationship mapping (studies → outcomes) │ │
│ │ ├─ Knowledge graph construction │ │
│ │ └─ Automated metadata tagging & enrichment │ │
│ └────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────┬───────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ MULTI-TIER INTELLIGENT SEARCH & KNOWLEDGE LAYER │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Azure AI Search - STANDARD S2/S3 (3 partitions, 3 replicas) │ │
│ │ 99.95% SLA | Multi-region | 1M+ documents │ │
│ │ │ │
│ │ Federated Search Architecture: │ │
│ │ ├─ Global cross-species index (unified queries) │ │
│ │ ├─ Species-specific indexes (optimized retrieval) │ │
│ │ ├─ Regional indexes (data residency compliance) │ │
│ │ └─ Temporal indexes (time-series research data) │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Knowledge Graph (Neo4j or Cosmos DB Gremlin) │ │
│ │ • Entity relationships: compounds → studies → outcomes │ │
│ │ • Temporal connections: research evolution over time │ │
│ │ • Cross-species insights: transferable learnings │ │
│ │ • Citation networks: methodology lineage │ │
│ └────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────┬───────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────┐
│ UNIFIED AI & API PLATFORM │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Azure API Management - PREMIUM ($3,000/mo) │ │
│ │ • Multi-region deployment (low latency globally) │ │
│ │ • Advanced throttling & quota management │ │
│ │ • Cost center attribution & chargeback │ │
│ │ • SLA monitoring & automatic failover │ │
│ │ • Developer portal for internal/external API consumers │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ Public API Surface (versioned, documented): │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ SEARCH APIs │ INTELLIGENCE APIs │ │
│ │ • /v2/search/unified │ • /v2/insights/trends │ │
│ │ • /v2/search/species/{id} │ • /v2/insights/comparative │ │
│ │ • /v2/search/semantic │ • /v2/insights/predictive │ │
│ │ • /v2/search/graph │ • /v2/insights/anomaly │ │
│ │ │ │ │
│ │ DATA APIs │ MANAGEMENT APIs │ │
│ │ • /v2/documents/ingest │ • /v2/admin/pipelines │ │
│ │ • /v2/documents/batch │ • /v2/admin/indexes │ │
│ │ • /v2/embeddings/generate │ • /v2/admin/costs │ │
│ │ • /v2/metadata/extract │ • /v2/admin/usage │ │
│ └──────────────────────────────────────────────────────────────┘ │
└──────────────────────────────┬───────────────────────────────────────┘
│
┌─────────────────────┴─────────────────────┐
│ │
▼ ▼
┌────────────────────────────┐ ┌─────────────────────────────┐
│ INTELLIGENT APPLICATIONS │ │ INTEGRATION ECOSYSTEM │
│ │ │ │
│ ┌──────────────────────┐ │ │ ┌───────────────────────┐ │
│ │ Advanced RAG Chatbot │ │ │ │ Microsoft Copilot │ │
│ │ • Multi-agent │ │ │ │ Integration │ │
│ │ • Context-aware │ │ │ └───────────────────────┘ │
│ │ • Voice interface │ │ │ │
│ │ • Mobile apps │ │ │ ┌───────────────────────┐ │
│ └──────────────────────┘ │ │ │ Power BI Dashboards │ │
│ │ │ │ (Research Analytics) │ │
│ ┌──────────────────────┐ │ │ └───────────────────────┘ │
│ │ Research Assistant │ │ │ │
│ │ • Lit review │ │ │ ┌───────────────────────┐ │
│ │ • Experiment design │ │ │ │ Teams Integration │ │
│ │ • Statistical tools │ │ │ │ (Embedded Search) │ │
│ │ • Report generation │ │ │ └───────────────────────┘ │
│ └──────────────────────┘ │ │ │
│ │ │ ┌───────────────────────┐ │
│ ┌──────────────────────┐ │ │ │ 3rd Party Apps │ │
│ │ Innovation Scout │ │ │ │ (External APIs) │ │
│ │ • Trend detection │ │ │ └───────────────────────┘ │
│ │ • Gap analysis │ │ └─────────────────────────────┘
│ │ • IP landscape │ │
│ │ • Competitor intel │ │
│ └──────────────────────┘ │
│ │
│ ┌──────────────────────┐ │
│ │ Formulation Advisor │ │
│ │ • Recipe optimization│ │
│ │ • Cost modeling │ │
│ │ • Regulatory check │ │
│ │ • Sustainability │ │
│ └──────────────────────┘ │
└────────────────────────────┘
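The knowledge graph's "compounds → studies → outcomes" relationships amount to a bounded graph traversal. The sketch below uses plain Python structures in place of Neo4j or Cosmos DB Gremlin; the node labels, relation names, and sample edges are invented purely for illustration.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)


def build_graph(edges: Iterable[Edge]) -> DefaultDict[str, List[Tuple[str, str]]]:
    """Index edges by source node for fast neighbour lookup."""
    graph: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
    for source, relation, target in edges:
        graph[source].append((relation, target))
    return graph


def outcomes_for(graph, start: str, max_hops: int = 2) -> List[str]:
    """Breadth-first search from a compound to outcome nodes within max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    found: List[str] = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, target in graph.get(node, []):
            if target in seen:
                continue
            seen.add(target)
            if target.startswith("outcome:"):
                found.append(target)
            queue.append((target, hops + 1))
    return sorted(found)


# Made-up sample data.
graph = build_graph([
    ("compound:A", "studied_in", "study:1"),
    ("compound:A", "studied_in", "study:2"),
    ("compound:B", "studied_in", "study:2"),
    ("study:1", "reported", "outcome:X"),
    ("study:2", "reported", "outcome:Y"),
])
print(outcomes_for(graph, "compound:A"))
```

A graph database would express the same query declaratively; the point is only that relationship questions become traversals instead of keyword searches.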
═══════════════════════════════════════════════════════════════════════
ENTERPRISE CAPABILITIES
✓ GLOBAL SCALE: 1M+ documents, 1,000+ users, 8 innovation centers
✓ MULTI-REGION: Low-latency access worldwide with data residency
✓ HIGH AVAILABILITY: 99.95% SLA with automatic failover
✓ ENTERPRISE SECURITY: SSO, MFA, RBAC, audit logs, compliance
✓ KNOWLEDGE GRAPH: Relationship-based insights beyond search
✓ ADVANCED AI: Multi-modal understanding, predictive analytics
✓ FULL INTEGRATION: Seamless with Microsoft 365, Teams, and Power BI
✓ EXTENSIBLE: Public APIs enable 3rd party innovation
✓ GOVERNED: Data catalog, lineage, quality, cost attribution
═══════════════════════════════════════════════════════════════════════
ASSUMPTIONS & DEPENDENCIES
📋 ASSUMPTIONS (What We Believe Will Happen):
• ZFS becomes primary data repository (18-24 month timeline)
• Microsoft Copilot Studio matures for SharePoint integration
• ANH establishes enterprise AI governance framework
• Innovation Centers standardize on common metadata schemas
• Budget approval for scale-up infrastructure
🔗 DEPENDENCIES (What Must Happen First):
• MVP demonstrates sustained value across 3 species
• IT approves multi-region deployment security model
• Legal completes data residency & compliance review
• Innovation Centers commit to workflow standardization
• Executive sponsorship for enterprise-wide rollout
⚠️ ADAPTABILITY ZONES (Likely to Change):
🔄 Technology Stack
• Vector database: May shift from Azure AI Search to
specialized solutions (Pinecone, Weaviate) or ZFS-native
• LLM Provider: OpenAI vs Anthropic vs open-source
• Embedding models: text-embedding-3 vs domain-specific
🔄 Data Architecture
• ZFS integration pattern undefined until platform stable
• Knowledge graph schema evolves with cross-species needs
• Real-time streaming requirements emerge from usage
🔄 Organizational
• Central AI team may consolidate all ML infrastructure
• Corporate mandate may require specific cloud vendors
• M&A activity could add new species/data sources
🔄 Business Model
• Chargeback model for API usage TBD
• Partnership opportunities with feed manufacturers
• Potential external monetization of anonymized insights
═══════════════════════════════════════════════════════════════════════
💰 Final Product Cost: $8,000-12,000/month ($96K-144K annually)
📈 Expected Value: $4M-6M annually (1,000 users, all species)
⏱️ Timeline: 9 months from MVP completion (Months 10-18)
👥 Serves: 1,000+ global R&D staff, external partners
🎯 ROI: 3,000-4,000% | Payback: <30 days
Technical Risks:
| Risk | Impact | Probability | Mitigation |
|---|---|---|---|
| ZFS migration delays | HIGH | MEDIUM | Maintain SharePoint connectors in parallel; abstract data source layer |
| Azure AI strategy shifts | HIGH | MEDIUM | API abstraction layer enables LLM provider swaps without app changes |
| Performance degradation at scale | HIGH | LOW | Incremental load testing; tiered architecture allows scaling |
| Knowledge graph complexity | MEDIUM | MEDIUM | Start with simple relationships; expand based on user needs |
| Multi-region latency | MEDIUM | LOW | CDN for static content; regional caching strategies |
Organizational Risks:
| Risk | Impact | Probability | Mitigation |
|---|---|---|---|
| Budget cuts during MVP | HIGH | LOW | Focus on quick wins; demonstrate ROI early and often |
| Innovation Center resistance | MEDIUM | MEDIUM | Co-design sessions; show time savings with their data |
| Corporate AI consolidation | HIGH | MEDIUM | Public API design facilitates integration with any platform |
| Resource availability | MEDIUM | MEDIUM | Phased approach allows team ramping; external contractors for peaks |
| Competing priorities | MEDIUM | HIGH | Executive sponsorship; tie to corporate OKRs |
Cost and ROI by Phase:
| Phase | Duration | Infrastructure | Implementation | Total | Value/Year | ROI |
|---|---|---|---|---|---|---|
| POV | 6 weeks | $360 | $30K (internal) | $30K | $1.56M | 5,436% |
| MVP | 6 months | $14K | $120K | $134K | $2.0M | 1,400% |
| Final | 9 months | $96-144K | $300K | $400K | $4-6M | 1,000-1,400% |
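The ROI column can be sanity-checked with a short calculation. The source does not state its formula, so this sketch assumes ROI is net first-year value divided by total cost, and payback is total cost divided by daily value. The inputs are the rounded Total and Value/Year figures from the POV and MVP rows.

```python
def roi_percent(annual_value: float, total_cost: float) -> float:
    """Assumed definition: net first-year value over total cost, as a percentage."""
    return (annual_value - total_cost) / total_cost * 100.0


def payback_days(annual_value: float, total_cost: float) -> float:
    """Days of value needed to cover the total cost, assuming even accrual."""
    return total_cost / (annual_value / 365.0)


# (total cost, annual value) taken from the rounded table cells.
phases = {
    "POV": (30_000, 1_560_000),
    "MVP": (134_000, 2_000_000),
}
for name, (cost, value) in phases.items():
    print(f"{name}: ROI {roi_percent(value, cost):,.0f}%, "
          f"payback {payback_days(value, cost):.1f} days")
```

With these rounded inputs the MVP row comes out close to the stated 1,400%, while the POV row lands near 5,100% and about 7 days. The stated 5,436% and 6.6 days are therefore presumably based on unrounded cost and value figures that the table does not show.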
Rationale:
Critical Success Factors:
Go/No-Go Criteria After MVP (Month 9):
We recognize that this roadmap contains multiple elements of uncertainty:
Technology Choices: The Azure AI landscape is evolving rapidly. We have designed for modularity so that components can be swapped without rewriting applications.
Corporate Strategy: ANH's broader AI strategy may mandate consolidation or specific platforms. Our API-first approach facilitates integration regardless of the underlying technology.
Data Sources: The ZFS timeline and capabilities are uncertain. We maintain the flexibility to work with SharePoint, ZFS, or a hybrid of the two.
Organizational Change: Innovation Center workflows and metadata standards are evolving. Our schema design accommodates variation while encouraging standardization.
Value Opportunity Exceeds Rework Risk: Even if significant architectural changes are required during the MVP or Final Product phases, the time savings and research quality improvements justify the investment. The POV showed an estimated 5,436% ROI from a 6-week effort, and the learning and value from the MVP will be retained regardless of future platform decisions.
Incremental Approach Limits Downside: By validating assumptions and demonstrating value at each phase, we minimize sunk costs if direction changes. Each phase delivers standalone value while building toward the long-term vision.
AD-001: Azure AI Search vs Alternatives
AD-002: Durable Functions vs Azure Data Factory
AD-003: GPT-4o for RAG
AD-004: Separate Indexes per Species
AD-005: API-First Architecture
Document Version v0.2 | Created: Oct 30, 2025 | Owner: CDS CloudOps Team
Next Review: After MVP Phase (Month 9)