---
name: tune-sampling
description: >
  Choose a sampling strategy for an autotel-instrumented service. Covers head
  sampling (per-span-kind rates, parent-based, ratio), tail sampling (keep
  errors, slow, AI-aware, debug-headers), cost vs cardinality tradeoffs, and the
  math for picking rates that hit a target spans/second budget. Includes recipes
  for low-volume admin services, high-volume APIs, AI agents, and Cloudflare
  Workers.
license: MIT
---

# Tune sampling

Untuned tracing is either expensive (100 % at scale costs money and drowns dashboards) or unhelpful (1 % loses the failure modes you need to see). The right answer is almost always **head sample most of the boring traffic, tail keep all the interesting traffic**, with explicit overrides for AI calls and customer escalations.

## When to use

- Hitting your observability budget
- Dashboards too sparse to spot anomalies
- "We have the trace IDs but the spans are gone" complaints
- New service launching at scale
- Long-running AI agents producing 50+ spans per request

## The mental model

```
Total cost = (spans/sec × $/span) + (storage_GB × $/GB-month)
              ↑
              Head sampling reduces this term directly.
```

Head sampling makes the decision **at span start**: fast, but coarse (it doesn't know whether the span will fail). Tail sampling makes the decision **when the trace ends**: slower and more memory-hungry, because spans are buffered until the trace completes, but precise.

The right mix:

- **Head sample at the entry point** to keep volume tractable.
- **Tail keep** the high-value subset (errors, slow, AI, debug-headered).
- **Don't sample audit spans**: route them through a separate processor, see [`build-audit-trails`](../build-audit-trails/SKILL.md).

## Head sampling recipes

### Default for a typical web service

```typescript
import { init } from 'autotel';

init({
  service: 'my-app',
  sampling: {
    rates: {
      server: 25, // server entry spans: sample 1/4
      client: 5, // outbound HTTP: sample 1/20
      internal: 5, // internal sub-spans: sample 1/20
    },
  },
});
```

Children of a sampled root are **all** kept (parent-based propagation is the default).
So `server: 25` means 25 % of _user requests_, each with its complete trace.

### High-volume API (>1 k req/s)

```typescript
sampling: {
  rates: { server: 5, client: 1, internal: 1 }, // 5 % → tail keeps errors anyway
  tail: keepInterestingTraces,
},
```

### Low-volume admin / internal service (<10 req/s)

100 % is fine. Don't penalise yourself for a service that produces 1 GB of traces a week.

### Cloudflare Workers (per-colo budget)

Workers run distributed and there's no central queue, so head sampling is your friend:

```typescript
defineWorkerFetch(
  {
    service: { name: 'edge' },
    sampling: { rates: { server: 10 } }, // 10 % per colo, scales naturally
  },
  handler,
);
```

## Tail sampling: keep interesting traces

Tail sampling looks at the full trace (root span + children) before deciding. autotel ships `TailSamplingProcessor`:

```typescript
import { TailSamplingProcessor } from 'autotel/processors';
import { SpanStatusCode } from '@opentelemetry/api';
import { hrTimeToMilliseconds } from '@opentelemetry/core';
import type { ReadableSpan } from '@opentelemetry/sdk-trace-base';

// Assumes the spans handed to `keep` are OpenTelemetry ReadableSpans.
const durationMs = (span: ReadableSpan): number => hrTimeToMilliseconds(span.duration);

const tail = new TailSamplingProcessor({
  keep: (trace) => {
    // 1. Always keep errors
    if (trace.localRootSpan.status?.code === SpanStatusCode.ERROR) return true;
    if (trace.spans.some((s) => s.status?.code === SpanStatusCode.ERROR)) return true;

    // 2. Always keep slow traces (configurable threshold)
    if (durationMs(trace.localRootSpan) > 1_000) return true;

    // 3. Always keep customer-marked traces
    if (trace.localRootSpan.attributes['debug.trace'] === true) return true;

    // 4. Always keep AI traces (rare + expensive: full visibility helps)
    if (trace.spans.some((s) => typeof s.attributes['gen_ai.system'] === 'string')) return true;

    // 5. Otherwise: respect head sampling decision
    return false;
  },
});
```

### Combining with multi-backend

```typescript
spanProcessors: composeSpanProcessors([
  // Drop nothing here: the tail processor needs to see the full trace
  new BatchSpanProcessor(localExporter),
  tail, // filters before remote export
  new BatchSpanProcessor(expensiveRemoteExporter),
]),
```

## AI / LLM-aware sampling

LLM calls produce 5–50 spans per request and are 100× more expensive than a typical handler call. Tradeoffs:

- **Don't head-sample AI handlers below 50 %**: debugging "why did the model loop" requires the full chain.
- **Always tail-keep AI traces**: the `gen_ai.*` attributes flag them.
- **Cost-aware sampling**: keep every trace above a $ threshold and sample the cheaper ones:

```typescript
keep: (trace) => {
  // Sum gen_ai.cost.usd across all spans; missing or non-numeric values count as 0
  const cost = trace.spans.reduce((acc, s) => {
    const usd = s.attributes['gen_ai.cost.usd'];
    return acc + (typeof usd === 'number' ? usd : 0);
  }, 0);

  if (cost > 0.1) return true; // any trace > $0.10 → keep
  if (cost > 0.01) return Math.random() < 0.5; // > $0.01 → 50 %
  return Math.random() < 0.1; // ≤ $0.01 → 10 %
},
```

## Customer-driven sampling (debug header)

Let support flip on full tracing per request:

```typescript
const tail = new TailSamplingProcessor({
  keep: (trace) =>
    trace.localRootSpan.attributes['x-debug-trace'] === '1' || /* … */,
});
```

In your middleware:

```typescript
if (request.headers.get('x-debug-trace') === '1') {
  useLogger().set({ 'x-debug-trace': '1' });
}
```

Now any user can mark a request as "trace this fully" by sending the header, which is invaluable for reproducing customer reports.
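The keep rules in this guide (errors, slow, debug marker, AI, cost tiers) are plain predicates, so they can be pulled out of the processor config into pure functions and unit-tested without an exporter. This is a minimal sketch under stated assumptions: `SpanView`, `TraceView`, `keepTrace`, and `keepByCost` are hypothetical names, not autotel exports, and an adapter would have to map real spans onto them (for example, `isError` from `status.code === SpanStatusCode.ERROR`, `durationMs` from the span's duration).

```typescript
// Hypothetical, library-agnostic shapes: NOT autotel's types.
type AttrValue = string | number | boolean | undefined;

interface SpanView {
  durationMs: number;
  isError: boolean; // e.g. status.code === SpanStatusCode.ERROR on a real span
  attributes: Record<string, AttrValue>;
}

interface TraceView {
  localRootSpan: SpanView;
  spans: SpanView[];
}

const SLOW_MS = 1_000;

// Force-keep rules: errors, slow, support-marked, AI.
function keepTrace(trace: TraceView): boolean {
  const root = trace.localRootSpan;
  // 1. Errors on the root or on any child span
  if (root.isError || trace.spans.some((s) => s.isError)) return true;
  // 2. Slow traces
  if (root.durationMs > SLOW_MS) return true;
  // 3. Support-marked traces (accepts both markers used in this guide)
  if (root.attributes['debug.trace'] === true || root.attributes['x-debug-trace'] === '1') {
    return true;
  }
  // 4. AI traces, flagged by a string gen_ai.system attribute
  if (trace.spans.some((s) => typeof s.attributes['gen_ai.system'] === 'string')) return true;
  // Nothing interesting: do not force-keep
  return false;
}

// Cost tiers; `random` is injectable so tests can pin the coin flip.
function keepByCost(trace: TraceView, random: () => number = Math.random): boolean {
  const cost = trace.spans.reduce((acc, s) => {
    const usd = s.attributes['gen_ai.cost.usd'];
    return acc + (typeof usd === 'number' ? usd : 0);
  }, 0);
  if (cost > 0.1) return true; // > $0.10: always keep
  if (cost > 0.01) return random() < 0.5; // > $0.01: keep 50 %
  return random() < 0.1; // <= $0.01: keep 10 %
}

// A fast, error-free, non-AI trace is not force-kept.
const boring: SpanView = { durationMs: 40, isError: false, attributes: {} };
console.log(keepTrace({ localRootSpan: boring, spans: [boring] })); // false
```

Injecting `random` is the only behavioural difference from the inline cost-aware snippet: production code can rely on the `Math.random` default, while tests pass a fixed value to exercise each tier.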
## Sizing the rate

Target volume:

```
spans/sec ≈ requests/sec × spans_per_request × (head_rate + (1 − head_rate) × tail_keep_rate)
```

Here `tail_keep_rate` is the fraction of traces the tail rules keep (errors + slow + AI + debug-headered) out of those the head sampler did not already keep.

Worked example for a 100 req/s API with 8 spans/req:

| Head rate | Tail keep                   | Result                                                                      |
| --------- | --------------------------- | --------------------------------------------------------------------------- |
| 100 %     | n/a (head keeps everything) | 800 spans/sec: expensive                                                    |
| 10 %      | errors + slow + AI ≈ 5 %    | ≈ 116 spans/sec: sweet spot                                                 |
| 1 %       | errors + slow + AI ≈ 5 %    | ≈ 48 spans/sec, but only 1 req/s of unbiased traffic: too sparse for p99 alerting |

For per-vendor pricing at the 10 % row's volume:

- **Honeycomb**: $0.000005 / event for paid plans. 116 spans/sec × 86 400 s ≈ 10 M events/day ≈ $50/day.
- **Datadog APM**: ~$1.27/M spans ingested (varies by region). Same volume → ~$13/day.
- **Grafana Cloud**: 100 GB free tier; 116 spans/sec ≈ 5 GB/day.

## Anti-patterns

| Anti-pattern                                        | Why it hurts / fix                                                   |
| --------------------------------------------------- | -------------------------------------------------------------------- |
| 100 % sampling at scale "to be safe"                | You're paying 10–100× without proportional value                     |
| 1 % sampling with no tail keep                      | You'll miss every interesting failure                                |
| Forgetting to tail-keep errors                      | Error traces get sampled out → silent customer pain                  |
| Same rate for `server` and `internal`               | Internal sub-spans are 5–20× more numerous; sample harder            |
| Independent ratio sampling at each service's entry point | Use parent-based sampling so children of a sampled trace stay together |
| Head-sampling AI calls below 50 %                   | Debugging tool loops requires the full chain                         |
| Audit spans subject to sampling                     | Route them to a separate processor (see `build-audit-trails`)        |
| Tail processor in front of every exporter (loses spans) | Put it only between the head sampler and the remote exporter     |
| Rate-by-route hand-coded in handlers                | Use head sampler + tail keep: declarative, one place                 |
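The sizing arithmetic is more useful run backwards: given a spans/sec budget, solve for the head rate. This is a minimal sketch under stated assumptions: `Traffic`, `spansPerSec`, and `headRateForBudget` are hypothetical helpers, not autotel APIs, and the model treats `tailKeepRate` as a fixed share of head-dropped traces that the tail rules rescue, so volume is the head-kept share plus that rescued share. Real error and latency rates drift, so treat the output as a starting point.

```typescript
// Hypothetical sizing helpers; all rates are fractions in [0, 1].
interface Traffic {
  requestsPerSec: number;
  spansPerRequest: number;
  tailKeepRate: number; // share of head-dropped traces the tail rules keep
}

// Spans/sec = all spans × (head-kept share + tail-rescued share of the rest).
function spansPerSec(t: Traffic, headRate: number): number {
  const keptFraction = headRate + (1 - headRate) * t.tailKeepRate;
  return t.requestsPerSec * t.spansPerRequest * keptFraction;
}

// Inverse of spansPerSec: the head rate that lands exactly on `budget`.
function headRateForBudget(t: Traffic, budget: number): number {
  const full = t.requestsPerSec * t.spansPerRequest; // spans/sec at 100 % head
  const floor = full * t.tailKeepRate; // cost of tail keep alone at 0 % head
  if (budget <= floor) return 0; // interesting traces alone already fill the budget
  if (budget >= full) return 1; // budget covers everything
  // budget = full × (h + (1 − h) × k)  ⇒  h = (budget / full − k) / (1 − k)
  return (budget / full - t.tailKeepRate) / (1 - t.tailKeepRate);
}

const api: Traffic = { requestsPerSec: 100, spansPerRequest: 8, tailKeepRate: 0.05 };
console.log(Math.round(spansPerSec(api, 0.1))); // 116
console.log(headRateForBudget(api, 200).toFixed(3)); // 0.211
```

With the worked example's traffic (100 req/s, 8 spans/req, ≈ 5 % interesting), a 200 spans/sec budget allows a head rate of about 21 %. When the budget falls below the tail-keep floor the helper returns 0: the interesting traces alone overshoot, so it's the keep rules, not the head rate, that need tightening.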