Implementation Guide

Step-by-step instructions for implementing end-to-end observability with OpenTelemetry

0

Discovery & Baseline

Duration: 1 day

Establish the current state of your system and collect baseline measurements. This phase identifies all instrumentation points and documents where the time goes in the current 14-second optimize flow.

🌐 Analyze Frontend Stack frontend

Analyze the current React/Next.js setup and identify all instrumentation points for the optimize flow.

bash
# Find all optimize-related components
grep -r "optimize" src/components --include="*.tsx"

# Check for existing performance monitoring
grep -r "performance.mark\|performance.measure" src/

# List API call patterns (include .tsx so calls made inside components are not missed)
grep -r "fetch\|axios\|useSWR" src/ --include="*.ts" --include="*.tsx"

Key Areas to Document

  • Button click handler location and component tree
  • API call initiation point
  • State management approach (React Query, SWR, etc.)
  • Re-render triggers after optimization completes
  • Loading state management

Outputs

  • frontend_analysis.md
  • instrumentation_points.json

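
One possible shape for the entries in instrumentation_points.json is sketched below as a TypeScript type. The guide does not prescribe a format, so every field name and the example file paths are assumptions chosen to mirror the "Key Areas to Document" list.

```typescript
// Hypothetical schema for instrumentation_points.json. All field names are
// illustrative assumptions, not a format defined by this guide.
type InstrumentationKind =
  | 'click-handler'
  | 'api-call'
  | 'state-update'
  | 're-render'
  | 'loading-state';

interface InstrumentationPoint {
  kind: InstrumentationKind;
  file: string; // path relative to the repo root
  component: string; // component or hook that owns the code
  notes?: string;
}

// Returns the kinds that have no documented point yet, so gaps are obvious.
function missingKinds(points: InstrumentationPoint[]): InstrumentationKind[] {
  const required: InstrumentationKind[] = [
    'click-handler',
    'api-call',
    'state-update',
    're-render',
    'loading-state',
  ];
  const seen = new Set(points.map((p) => p.kind));
  return required.filter((kind) => !seen.has(kind));
}

// Example entries; the file and component names are made up.
const examplePoints: InstrumentationPoint[] = [
  { kind: 'click-handler', file: 'src/components/OptimizeButton.tsx', component: 'OptimizeButton' },
  { kind: 'api-call', file: 'src/hooks/useOptimize.ts', component: 'useOptimize' },
];

console.log(missingKinds(examplePoints));
```

Running a check like this before closing the phase gives a quick answer to the "All instrumentation points identified" gate.
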
☁️ Map Backend Code Paths backend

Map the CNC AppService and QSM code paths to identify trace injection points.

csharp
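// Key areas to map in CNC AppService.
// Illustrative sketch: the names below are placeholders to match against the real code.

[HttpPost("optimize")]
public async Task<IActionResult> Optimize(OptimizeRequest request)
{
    // 1. Controller entry point
    // REQUEST RECEIVED - first trace point

    // 2. Queue message creation
    // (assumes Azure.Messaging.ServiceBus, where the sending object is a ServiceBusSender)
    var correlationId = Guid.NewGuid().ToString();
    var message = new ServiceBusMessage(BinaryData.FromObjectAsJson(request))
    {
        CorrelationId = correlationId
    };
    await _serviceBusClient.SendMessageAsync(message);
    // QUEUE INSERTION - second trace point

    // 3. Response polling/waiting
    var response = await WaitForResponse(correlationId);
    // RESPONSE PICKUP - third trace point

    return Ok(response);
}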

Outputs

  • backend_analysis.md
  • trace_injection_points.json

📊 Establish Baseline Measurements qa

Measure the current 14-second user experience and identify where time is being spent.

javascript
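// Quick baseline measurement script (run in browser console)
const baseline = {
  buttonClick: 0,
  apiCallStart: 0,
  apiCallEnd: 0,
  stateUpdate: 0,
  renderComplete: 0
};

// Record the optimize button click. The selector is a placeholder; adjust it to
// your markup. A capture-phase listener is used because React attaches handlers
// through event delegation, so button.onclick is usually null.
const optimizeButton = document.querySelector('[data-testid="optimize-button"]');
optimizeButton.addEventListener('click', () => {
  baseline.buttonClick = performance.now();
}, { capture: true });

// Record the API round-trip by wrapping fetch
// (assumes the optimize request URL contains "optimize")
const originalFetch = window.fetch;
window.fetch = async (...args) => {
  const isOptimize = String(args[0]?.url ?? args[0]).includes('optimize');
  if (isOptimize) baseline.apiCallStart = performance.now();
  try {
    return await originalFetch(...args);
  } finally {
    if (isOptimize) baseline.apiCallEnd = performance.now();
  }
};

// Run this line after the optimization completes to log the results.
// stateUpdate and renderComplete stay 0 until the app itself is instrumented.
console.table(baseline);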

Expected Time Breakdown

Segment                      Expected time
Button Click → API Start     ~50-200ms
API Round-trip               ~2-8s (includes queue + Julia)
State Update                 ~100-500ms
Re-render (soft refresh?)    ~4-10s (suspected bottleneck)
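
To compare real measurements against the expected breakdown, the raw marks from the baseline script can be turned into per-segment durations and shares of the total. The sketch below assumes all five marks were captured with performance.now(); the sample numbers are made up for illustration and are not measurements.

```typescript
interface BaselineMarks {
  buttonClick: number;
  apiCallStart: number;
  apiCallEnd: number;
  stateUpdate: number;
  renderComplete: number;
}

interface SegmentSummary {
  segment: string;
  ms: number;
  share: number; // fraction of the total, between 0 and 1
}

// Converts absolute timestamps into the four segments of the breakdown table.
function summarizeBaseline(m: BaselineMarks): SegmentSummary[] {
  const total = m.renderComplete - m.buttonClick;
  const segments: Array<[string, number]> = [
    ['Button Click → API Start', m.apiCallStart - m.buttonClick],
    ['API Round-trip', m.apiCallEnd - m.apiCallStart],
    ['State Update', m.stateUpdate - m.apiCallEnd],
    ['Re-render', m.renderComplete - m.stateUpdate],
  ];
  return segments.map(([segment, ms]) => ({
    segment,
    ms,
    share: total > 0 ? ms / total : 0,
  }));
}

// Made-up marks adding up to a 14-second flow, for illustration only.
const sample: BaselineMarks = {
  buttonClick: 1000,
  apiCallStart: 1100,
  apiCallEnd: 6100,
  stateUpdate: 6400,
  renderComplete: 15000,
};

console.table(summarizeBaseline(sample));
```

Looking at shares rather than raw milliseconds makes it easier to see whether the re-render segment really dominates, as the table suspects.
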

Quality Gates

  • All instrumentation points identified
  • Baseline measurements collected
  • Gap analysis complete
  • Time breakdown documented

1

Quick Wins - Manual Timing

Duration: 1-2 days

Add immediate visibility with manual performance marks. This provides instant insight into where time is being spent without the overhead of setting up full OpenTelemetry infrastructure.

⏱️ Implement Frontend Timing frontend

Add Performance API timing to the optimize button flow.

typescript
// src/lib/performance-timing.ts

interface OptimizeTimings {
  buttonClick: number;
  apiStart: number;
  apiEnd: number;
  stateUpdate: number;
  renderComplete: number;
}

class OptimizeTimer {
  private timings: Partial<OptimizeTimings> = {};
  private requestId: string | null = null;

  start(requestId: string) {
    this.requestId = requestId;
    this.timings = {};
    this.mark('buttonClick');
    performance.mark(`optimize-start-${requestId}`);
  }

  mark(point: keyof OptimizeTimings) {
    this.timings[point] = performance.now();
    performance.mark(`optimize-${point}-${this.requestId}`);
  }

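  end() {
    if (!this.requestId) return;

    // Bail out if any mark is missing; otherwise the arithmetic below yields NaN
    const { buttonClick, apiStart, apiEnd, stateUpdate, renderComplete } = this.timings;
    if (
      buttonClick === undefined || apiStart === undefined || apiEnd === undefined ||
      stateUpdate === undefined || renderComplete === undefined
    ) {
      console.warn(`Optimize Timing [${this.requestId}]: incomplete marks`, this.timings);
      return;
    }

    const measurements = {
      total: renderComplete - buttonClick,
      preApi: apiStart - buttonClick,
      api: apiEnd - apiStart,
      postApi: stateUpdate - apiEnd,
      render: renderComplete - stateUpdate,
    };

    console.group(`Optimize Timing [${this.requestId}]`);
    console.table(measurements);
    console.groupEnd();

    // Send to telemetry endpoint if configured (fire-and-forget; report() handles its own errors)
    void this.report(measurements);
  }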

  private async report(measurements: Record<string, number>) {
    try {
      await fetch('/api/telemetry', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          requestId: this.requestId,
          measurements,
          timestamp: new Date().toISOString(),
        }),
      });
    } catch (e) {
      // Silent fail for telemetry
    }
  }
}

export const optimizeTimer = new OptimizeTimer();

typescript
// src/hooks/useOptimizeTimer.ts

import { useCallback, useRef } from 'react';
import { optimizeTimer } from '@/lib/performance-timing';

export function useOptimizeTimer() {
  const requestIdRef = useRef<string>('');

  const startOptimize = useCallback(() => {
    requestIdRef.current = crypto.randomUUID();
    optimizeTimer.start(requestIdRef.current);
    return requestIdRef.current;
  }, []);

  const markApiStart = useCallback(() => {
    optimizeTimer.mark('apiStart');
  }, []);

  const markApiEnd = useCallback(() => {
    optimizeTimer.mark('apiEnd');
  }, []);

  const markStateUpdate = useCallback(() => {
    optimizeTimer.mark('stateUpdate');
  }, []);

  const markRenderComplete = useCallback(() => {
    optimizeTimer.mark('renderComplete');
    optimizeTimer.end();
  }, []);

  return {
    startOptimize,
    markApiStart,
    markApiEnd,
    markStateUpdate,
    markRenderComplete,
  };
}

Acceptance Criteria

  • Marks at button click
  • Marks at API start/end
  • Marks at state update
  • Marks at render complete
  • Console table output visible
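
The order in which the marks need to fire can be sketched without any framework. In the example below, MarkRecorder is a simplified stand-in for the timer above, and callApi, applyResult, and waitForPaint are hypothetical placeholders for the application's own request, state update, and "next paint" logic.

```typescript
type MarkPoint = 'buttonClick' | 'apiStart' | 'apiEnd' | 'stateUpdate' | 'renderComplete';

// Simplified stand-in for the timer: anything with a mark() method works.
interface MarkRecorder {
  mark(point: MarkPoint): void;
}

// Hypothetical flow showing where each mark belongs relative to the async steps.
async function runTimedOptimize<T>(
  recorder: MarkRecorder,
  callApi: () => Promise<T>,
  applyResult: (result: T) => void,
  waitForPaint: () => Promise<void>,
): Promise<T> {
  recorder.mark('buttonClick');
  recorder.mark('apiStart');
  const result = await callApi();
  recorder.mark('apiEnd');
  applyResult(result); // e.g. a state setter or cache update
  recorder.mark('stateUpdate');
  await waitForPaint(); // resolve once the updated UI is on screen
  recorder.mark('renderComplete');
  return result;
}

// Demo with fake steps that records the order of the marks.
const order: MarkPoint[] = [];
const demo = runTimedOptimize(
  { mark: (point) => order.push(point) },
  async () => 42,
  () => {},
  async () => {},
);
demo.then(() => console.log(order.join(' > ')));
```

In a browser, one common way to approximate waitForPaint is to resolve after two nested requestAnimationFrame callbacks; whether that matches what users perceive as "done" is worth verifying against the profiler.
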
⏱️ Implement Backend Timing backend

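Add timing to CNC AppService endpoints by recording a UTC timestamp at each stage of the request. The class below uses DateTime for simplicity; System.Diagnostics.Stopwatch is the better choice when high-resolution intervals are needed.

csharp
// CncTiming.cs

using System;
using Microsoft.Extensions.Logging;

public class CncTiming
{
    public string RequestId { get; set; } = string.Empty;

    // Set each timestamp from DateTime.UtcNow so intervals are unaffected by time zone or DST changes
    public DateTime RequestReceived { get; set; }
    public DateTime? QueueInserted { get; set; }
    public DateTime? ResponseReceived { get; set; }
    public DateTime? ResponseSent { get; set; }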

    public void LogTimings(ILogger logger)
    {
        var queueTime = QueueInserted - RequestReceived;
        var processTime = ResponseReceived - QueueInserted;
        var totalTime = ResponseSent - RequestReceived;

        logger.LogInformation(
            "CNC Timing [{RequestId}]: Queue={QueueMs}ms, Process={ProcessMs}ms, Total={TotalMs}ms",
            RequestId,
            queueTime?.TotalMilliseconds,
            processTime?.TotalMilliseconds,
            totalTime?.TotalMilliseconds
        );
    }
}
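
Once both sides log timings, the frontend API duration can be split into backend and network portions by joining on the request ID. The sketch below assumes the browser's requestId is forwarded to the backend (for example in a header) and ends up as RequestId in the CNC Timing log line; the guide does not show that wiring, and the type names here are hypothetical.

```typescript
// apiEnd - apiStart as measured in the browser
interface FrontendTiming {
  requestId: string;
  apiMs: number;
}

// Values parsed from the "CNC Timing" log line
interface BackendTiming {
  requestId: string;
  queueMs: number;
  processMs: number;
  totalMs: number;
}

interface ApiBreakdown {
  requestId: string;
  queueMs: number;
  processMs: number;
  backendOtherMs: number; // backend time outside queue + process
  networkMs: number; // round-trip time the backend does not account for
}

// Joins the two data sets by request ID; unmatched frontend rows are skipped.
function joinTimings(frontend: FrontendTiming[], backend: BackendTiming[]): ApiBreakdown[] {
  const byId = new Map<string, BackendTiming>();
  for (const b of backend) byId.set(b.requestId, b);

  const rows: ApiBreakdown[] = [];
  for (const f of frontend) {
    const b = byId.get(f.requestId);
    if (!b) continue;
    rows.push({
      requestId: f.requestId,
      queueMs: b.queueMs,
      processMs: b.processMs,
      backendOtherMs: b.totalMs - b.queueMs - b.processMs,
      networkMs: f.apiMs - b.totalMs,
    });
  }
  return rows;
}

// Made-up values for illustration only.
const joined = joinTimings(
  [{ requestId: 'req-1', apiMs: 5000 }, { requestId: 'req-2', apiMs: 3000 }],
  [{ requestId: 'req-1', queueMs: 40, processMs: 4500, totalMs: 4600 }],
);
console.table(joined);
```

A large networkMs points at gateways, cold starts, or payload size rather than the queue or the optimizer itself.
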

Quality Gates

  • Timing data visible in browser console
  • All segments measured
  • Backend logs show timing breakdown

2

OpenTelemetry Foundation

Duration: 1 week

Implement the core distributed tracing infrastructure. This phase establishes the trace schema, deploys the collector, and instruments all services for end-to-end visibility.

📡 Deploy OpenTelemetry Collector devops

Deploy the OTel Collector with the Azure Monitor exporter.

yaml
# otel-collector-config.yaml

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - "https://*.yourdomain.com"
            - "http://localhost:*"

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

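exporters:
  # The azuremonitor exporter ships in the Collector contrib distribution
  azuremonitor:
    connection_string: ${env:APPLICATIONINSIGHTS_CONNECTION_STRING}
  # Logs every span to stdout; useful during rollout, noisy in production
  debug:
    verbosity: detailed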

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [azuremonitor, debug]
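
After deploying the collector, a hand-built span is a quick way to confirm the OTLP/HTTP receiver and the pipeline work before instrumenting any service. The sketch below builds a minimal OTLP JSON payload; the function names, the scope name, and the span name are made up for illustration, and the URL in the comment assumes the collector is reachable on localhost at the HTTP port from the config above.

```typescript
// Random lowercase hex string of the given length.
function randomHex(length: number): string {
  let out = '';
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// Builds a minimal OTLP/HTTP JSON body containing one 1 ms span.
function buildSmokeTestPayload(serviceName: string, nowMs: number = Date.now()) {
  return {
    resourceSpans: [
      {
        resource: {
          attributes: [{ key: 'service.name', value: { stringValue: serviceName } }],
        },
        scopeSpans: [
          {
            scope: { name: 'collector-smoke-test' },
            spans: [
              {
                traceId: randomHex(32), // 16 bytes, hex-encoded
                spanId: randomHex(16), // 8 bytes, hex-encoded
                name: 'smoke-test',
                kind: 1, // SPAN_KIND_INTERNAL
                // Nanoseconds as strings: milliseconds with six zeros appended
                startTimeUnixNano: `${nowMs}000000`,
                endTimeUnixNano: `${nowMs + 1}000000`,
              },
            ],
          },
        ],
      },
    ],
  };
}

// To send it (assumed local URL):
//   await fetch('http://localhost:4318/v1/traces', {
//     method: 'POST',
//     headers: { 'Content-Type': 'application/json' },
//     body: JSON.stringify(buildSmokeTestPayload('smoke-test-service')),
//   });
const payload = buildSmokeTestPayload('smoke-test-service', 1700000000000);
console.log(JSON.stringify(payload, null, 2));
```

With the debug exporter enabled, the span should appear in the collector's stdout shortly after the POST, which separates collector problems from SDK configuration problems.
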
🌐 Implement Frontend OpenTelemetry frontend

Set up the WebTracerProvider and auto-instrumentation.

typescript
// src/lib/otel/provider.ts

import { WebTracerProvider } from '@opentelemetry/sdk-trace-web';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { FetchInstrumentation } from '@opentelemetry/instrumentation-fetch';
import { UserInteractionInstrumentation } from '@opentelemetry/instrumentation-user-interaction';

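export function initTelemetry() {
  // Browser-only: skip when Next.js evaluates this module on the server
  if (typeof window === 'undefined') return undefined;

  // Note: new Resource(...) and addSpanProcessor(...) are the 1.x JS SDK API;
  // SDK 2.x changed both, so adapt these calls if you are on a newer release.
  const resource = new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'cnc-frontend',
    [SemanticResourceAttributes.SERVICE_VERSION]: process.env.NEXT_PUBLIC_VERSION,
    [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV,
  });

  // Fall back to the collector's OTLP/HTTP port from the config above
  const collectorUrl = process.env.NEXT_PUBLIC_OTEL_COLLECTOR_URL ?? 'http://localhost:4318';
  const exporter = new OTLPTraceExporter({
    url: `${collectorUrl}/v1/traces`,
    headers: {},
  });

  const provider = new WebTracerProvider({ resource });
  provider.addSpanProcessor(new BatchSpanProcessor(exporter));
  provider.register();

  // Escape regex metacharacters so the API URL is matched literally
  const apiUrl = process.env.NEXT_PUBLIC_API_URL ?? '';
  const apiUrlPattern = new RegExp('^' + apiUrl.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));

  registerInstrumentations({
    instrumentations: [
      new FetchInstrumentation({
        // Only attach trace headers to cross-origin calls that target the API
        propagateTraceHeaderCorsUrls: apiUrl ? [apiUrlPattern] : [],
      }),
      new UserInteractionInstrumentation({
        eventNames: ['click', 'submit'],
      }),
    ],
  });

  return provider;
}
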
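
The trace header that the fetch instrumentation attaches is the W3C traceparent header, and knowing its layout helps when checking propagation in the browser's network tab or in backend logs. The sketch below formats and parses version 00 of the header only; the function and type names are hypothetical helpers, not part of the OpenTelemetry API.

```typescript
interface TraceParent {
  traceId: string; // 32 lowercase hex chars
  spanId: string; // 16 lowercase hex chars
  sampled: boolean;
}

// Layout: version "00", trace ID, parent span ID, flags (bit 0 = sampled).
function formatTraceParent(tp: TraceParent): string {
  return `00-${tp.traceId}-${tp.spanId}-${tp.sampled ? '01' : '00'}`;
}

// Returns null for anything that is not a valid version-00 header.
function parseTraceParent(header: string): TraceParent | null {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (!match) return null;
  const traceId = match[1];
  const spanId = match[2];
  const flags = match[3];
  // All-zero IDs are invalid according to the specification
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

const parsed = parseTraceParent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(parsed);
```

If the trace ID seen on the frontend request matches the one on the backend span and on the queue message, context is propagating end-to-end; a mismatch shows at which hop it is lost.
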
☁️ Implement .NET OpenTelemetry backend

Configure ActivitySource and Azure SDK instrumentation.

csharp
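// Program.cs - OpenTelemetry Configuration
// Assumes `builder` is the WebApplicationBuilder created earlier in Program.cs.

using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
using Azure.Monitor.OpenTelemetry.Exporter;

// Azure SDK tracing is experimental in many client libraries and must be
// switched on before any Azure client is created.
AppContext.SetSwitch("Azure.Experimental.EnableActivitySource", true);

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService(
            serviceName: "cnc-appservice",
            serviceVersion: typeof(Program).Assembly.GetName().Version?.ToString()))
    .WithTracing(tracing => tracing
        // Auto-instrumentation
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddSource("Azure.*")

        // Custom ActivitySources
        .AddSource("CNC.Optimize")
        .AddSource("CNC.Queue")

        // Exporters. If the collector forwards to the same Application Insights
        // resource, keep only one of these to avoid recording every span twice.
        .AddAzureMonitorTraceExporter(options =>
        {
            options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
        })
        .AddOtlpExporter(options =>
        {
            var endpoint = builder.Configuration["OtelCollector:Endpoint"]
                ?? throw new InvalidOperationException("OtelCollector:Endpoint is not configured");
            options.Endpoint = new Uri(endpoint);
        }));

csharp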
// Observability/OptimizeActivitySource.cs

using System.Diagnostics;

public static class OptimizeActivitySource
{
    public static readonly ActivitySource Source = new("CNC.Optimize");

    public static Activity? StartOptimize(string requestId)
    {
        return Source.StartActivity("Optimize", ActivityKind.Server)
            ?.SetTag("cnc.request_id", requestId);
    }

    public static Activity? StartQueueSend(string correlationId)
    {
        return Source.StartActivity("QueueSend", ActivityKind.Producer)
            ?.SetTag("messaging.destination", "optimize-queue")
            .SetTag("messaging.correlation_id", correlationId);
    }
}

Quality Gates

  • Traces visible in Azure Monitor
  • Trace context propagates end-to-end
  • All services emit spans
  • Sampling configured appropriately

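
The sampling gate is not covered by the configuration in this phase, so the idea behind ratio-based sampling is sketched here. This is a simplified illustration of deterministic trace-ID sampling, not the algorithm any OpenTelemetry SDK uses internally; in practice the SDK's built-in ratio sampler is the thing to configure, and shouldSample is a hypothetical name.

```typescript
// Decides from the trace ID alone whether a trace is kept. Because the decision
// depends only on the ID, every service that sees the same trace agrees on it.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Treat the first 8 hex chars (32 bits) of the trace ID as a uniform number
  const bucket = parseInt(traceId.slice(0, 8), 16);
  return bucket < ratio * 0x100000000;
}

// Keep roughly half of all traces
console.log(shouldSample('4bf92f3577b34da6a3ce929d0e0e4736', 0.5));
```

Whatever ratio is chosen, it needs to be consistent across the frontend, the AppService, and the collector, otherwise traces arrive with missing spans.
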
3

Frontend Optimization

Duration: 1 week

Profile and optimize React rendering to eliminate the suspected "soft refresh" bottleneck. Target: 50% reduction in re-render time.

🔍 Profile Soft Refresh Behavior frontend

Use React DevTools Profiler to identify unnecessary re-renders and optimize the component tree.

  • Capture profiler session during optimize flow
  • Identify components re-rendering without prop changes
  • Map data refetch patterns after state updates
  • Document render cascade from root to leaf components
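
To back up what the profiler shows, it can help to log which props actually changed between two renders of a suspect component. The sketch below is a framework-free helper with hypothetical names; it compares by identity with Object.is, the same per-prop comparison React.memo's default shallow check applies, and uses a JSON comparison as a rough heuristic for "same content, new reference".

```typescript
type Props = Record<string, unknown>;

// Keys whose values differ by identity between two renders.
function changedProps(prev: Props, next: Props): string[] {
  const keys = Array.from(new Set([...Object.keys(prev), ...Object.keys(next)]));
  return keys.filter((key) => !Object.is(prev[key], next[key]));
}

// Keys that changed identity but serialize identically: likely unstable
// references (new array, object, or inline callback created on every render).
function unstableProps(prev: Props, next: Props): string[] {
  return changedProps(prev, next).filter(
    (key) => JSON.stringify(prev[key]) === JSON.stringify(next[key]),
  );
}

const handleClick = () => {};
const prevProps: Props = { items: [1, 2], onClick: handleClick, label: 'a' };
const nextProps: Props = { items: [1, 2], onClick: handleClick, label: 'b' };

console.log(changedProps(prevProps, nextProps)); // items and label changed
console.log(unstableProps(prevProps, nextProps)); // only items is an unstable reference
```

Props reported as unstable are the usual candidates for useMemo or useCallback, or for moving the value out of the render path.
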

Quality Gates

  • Render time reduced by 50%
  • No unnecessary re-renders
  • Web Vitals within targets

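
Checking the 50% target is simple arithmetic, but comparing medians over several runs is more robust than comparing single measurements. The sketch below assumes render times in milliseconds collected before and after the optimization work; the function names and the sample values are made up.

```typescript
function median(values: number[]): number {
  if (values.length === 0) throw new Error('median of empty list');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Compares median render time before and after, against a 50% reduction target.
function renderReduction(baselineMs: number[], currentMs: number[]) {
  const before = median(baselineMs);
  const after = median(currentMs);
  const reduction = (before - after) / before;
  return { before, after, reduction, meetsTarget: reduction >= 0.5 };
}

// Illustrative values only, not measurements.
const result = renderReduction([8600, 9100, 7900], [4100, 3900, 4400]);
console.log(result);
```
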
4

Dashboards & Alerting

Duration: 3-5 days

Create operational dashboards in Azure Monitor and Grafana for ongoing visibility and SLA monitoring.

Quality Gates

  • E2E latency dashboard operational
  • Alerting rules configured
  • User-facing dashboard accessible

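
Latency alerts are usually defined on a percentile rather than the average, so that a slow tail is not hidden by many fast requests. The alert itself would live in Azure Monitor or Grafana; the sketch below only illustrates the calculation, using the nearest-rank percentile method and a hypothetical threshold.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('percentile of empty list');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// True when the 95th percentile latency exceeds the threshold.
function breachesSla(latenciesMs: number[], thresholdMs: number): boolean {
  return percentile(latenciesMs, 95) > thresholdMs;
}

// Twenty made-up end-to-end latencies: 1000 ms, 2000 ms, ... 20000 ms
const latencies = Array.from({ length: 20 }, (_, i) => (i + 1) * 1000);

console.log(percentile(latencies, 95));
console.log(breachesSla(latencies, 14000)); // 14000 ms is a hypothetical threshold
```
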
5

Documentation Site

Duration: 2-3 days

Build the ardeshir.io/open documentation site with interactive visualizations and implementation guides.

Quality Gates

  • Site builds successfully
  • All phases documented
  • Interactive elements functional