Docs
Install, configure, send telemetry, and operate a self-hosted Fanout.
Fanout is one binary you run yourself. It accepts OpenTelemetry over gRPC, stores the data on disks you control, and serves a UI, a chat investigator, an alert engine, and an MCP server — all from the same process.
This page covers everything you need to go from zero to a working install, send telemetry, and operate Fanout day-to-day.
Install
Pick whichever path matches how you already deploy. Fanout is a single self-contained executable — about 30 MB, no runtime dependencies beyond a recent libc.
Docker
docker run -d --name fanout \
-p 7520:7520 -p 4317:4317 \
-v $PWD/data:/var/lib/fanout/data \
ghcr.io/labstack/fanout:latest
| Port / path | Purpose |
|---|---|
| 7520 | HTTP: web UI, API, and the MCP endpoint. |
| 4317 | OTLP gRPC ingest. |
| ./data | Persistent storage: telemetry, application state, and saved reports. |
The container listens on all interfaces by default. For a host-only install, add -e OTLP_GRPC_ADDR=127.0.0.1:4317.
Pre-built binary
Download the artifact for your platform from the releases page and run it:
./fanout
Defaults: HTTP on :7520, OTLP gRPC on 127.0.0.1:4317, data under ./data.
Sizing
Guidelines, not hard limits. The binary is small; the data is what consumes resources.
| Resource | Recommended starting point |
|---|---|
| CPU | 2 vCPU |
| Memory | 1 GB (raise via DUCKDB_MEMORY for larger workloads) |
| Disk | 20 GB on fast local storage; budget ~1 GB / day per million spans at default retention |
First boot
Fanout refuses to start without JWT secrets, SMTP credentials (for email login codes), and an LLM API key (for the chat investigator). Everything else has a default.
Minimum viable command
docker run -d --name fanout \
-p 7520:7520 -p 4317:4317 \
-v $PWD/data:/var/lib/fanout/data \
-e JWT_SECRET=$(openssl rand -hex 32) \
-e JWT_REFRESH_SECRET=$(openssl rand -hex 32) \
-e SMTP_HOST=smtp.example.com \
-e SMTP_USER=<smtp-username> \
-e SMTP_PASS=<smtp-password> \
-e SMTP_FROM='"Fanout" <sender@example.com>' \
-e AI_API_KEY=<anthropic-or-openai-key> \
ghcr.io/labstack/fanout:latest
The JWT_* secrets must differ and each must be at least 32 characters. Generate fresh ones with openssl rand -hex 32.
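The two constraints above (different values, each at least 32 characters) can be checked before the container is started. The following is a minimal pre-flight sketch, not part of Fanout: `check_jwt_secrets` and its messages are hypothetical names introduced for illustration.

```python
import secrets


def check_jwt_secrets(jwt_secret: str, jwt_refresh_secret: str) -> list[str]:
    """Return a list of problems; an empty list means both rules hold."""
    problems = []
    if len(jwt_secret) < 32:
        problems.append("JWT_SECRET is shorter than 32 characters")
    if len(jwt_refresh_secret) < 32:
        problems.append("JWT_REFRESH_SECRET is shorter than 32 characters")
    if jwt_secret == jwt_refresh_secret:
        problems.append("JWT_SECRET and JWT_REFRESH_SECRET must differ")
    return problems


def new_secret() -> str:
    """64 hex characters, the same shape as `openssl rand -hex 32`."""
    return secrets.token_hex(32)
```

Calling `check_jwt_secrets(new_secret(), new_secret())` should return an empty list; reusing one value for both variables returns the "must differ" problem.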
Create the admin
On first boot Fanout logs a one-time setup token that authorises the admin-creation flow:
docker logs fanout 2>&1 | grep "setup token"
Open http://localhost:7520 and fill in the setup form with your name, email, and the token. Fanout creates the admin, signs you in, and prints the ingest token once — copy it now, it isn’t shown again. You can rotate it later from Settings → Ingest in the UI.
After this, the setup form is closed for the lifetime of the data directory. New users join via email invites; logins use one-time codes delivered via SMTP. No passwords are ever stored.
Send telemetry
Fanout speaks OTLP over gRPC on port 4317. Anything that can export OTLP — an SDK, a collector, a sidecar — will work without modification.
HTTP/protobuf and HTTP/JSON OTLP are not yet supported. If you need them, run an OpenTelemetry Collector in front and point its otlp exporter at Fanout.
Authentication
Every request must carry a valid ingest token. Two header forms are accepted, equivalently:
x-fanout-ingest-token: fo_<token>
Authorization: Bearer fo_<token>
A missing or invalid token returns Unauthenticated. The same token works for every signal type.
Direct from an SDK
export OTEL_EXPORTER_OTLP_ENDPOINT=https://fanout.example.com:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_HEADERS=x-fanout-ingest-token=fo_<token>
export OTEL_SERVICE_NAME=checkout
Use http:// if your endpoint isn’t TLS-terminated. The headers env var takes a comma-separated list of key=value pairs.
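When the headers variable is assembled by a deploy script, the comma-separated key=value format is easy to get wrong. Here is a small sketch of a formatter; `format_otlp_headers` is a hypothetical helper, and it deliberately refuses values that would need escaping rather than guessing at an encoding.

```python
def format_otlp_headers(headers: dict[str, str]) -> str:
    """Join headers into the comma-separated key=value form."""
    parts = []
    for key, value in headers.items():
        # Commas separate pairs and "=" separates key from value, so
        # anything containing them would need escaping (not handled here).
        if "," in key or "=" in key or "," in value:
            raise ValueError(f"header {key!r} needs escaping")
        parts.append(f"{key}={value}")
    return ",".join(parts)
```

For example, `format_otlp_headers({"x-fanout-ingest-token": "fo_example"})` gives `x-fanout-ingest-token=fo_example`, which can be exported as OTEL_EXPORTER_OTLP_HEADERS.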
Through an OpenTelemetry Collector
If you already run a collector (recommended for production — buffering, batching, sampling, per-tenant routing), add Fanout as another otlp exporter:
exporters:
  otlp/fanout:
    endpoint: fanout.example.com:4317
    headers:
      x-fanout-ingest-token: fo_<token>
service:
  pipelines:
    traces: { exporters: [otlp/fanout] }
    logs: { exporters: [otlp/fanout] }
    metrics: { exporters: [otlp/fanout] }
You can fan out to Fanout and an existing backend during a migration — exporters are list-typed.
Multi-product / multi-tenant namespaces
If a single Fanout serves more than one product or environment, set service.namespace in your OpenTelemetry resource attributes:
export OTEL_RESOURCE_ATTRIBUTES=service.namespace=product-a,service.name=checkout
The UI’s namespace picker (top-right of the header) filters every view; MCP tools accept namespace as an explicit argument. Payloads without a service.namespace attribute land in the namespace named by DEFAULT_NAMESPACE (default, unless overridden).
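The fallback rule can be pictured as follows. This is an illustrative sketch of the behaviour described above, not Fanout's code: both function names are hypothetical, and treating an empty service.namespace as missing is an assumption.

```python
def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Parse an OTEL_RESOURCE_ATTRIBUTES-style string of key=value pairs."""
    attrs = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        key, value = pair.split("=", 1)
        attrs[key.strip()] = value.strip()
    return attrs


def effective_namespace(raw_attributes: str, default_namespace: str = "default") -> str:
    """Namespace a payload lands in: its own, else the configured default."""
    attrs = parse_resource_attributes(raw_attributes)
    # Assumption: an empty namespace is treated like a missing one.
    return attrs.get("service.namespace") or default_namespace
```

With the export line above the result is `product-a`; with only `service.name=checkout` set it is `default`.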
TLS
Two options — pick one:
- Behind a reverse proxy (recommended). Caddy, nginx, or Traefik terminates TLS for fanout.example.com:4317 and proxies plaintext gRPC to Fanout on 127.0.0.1:4317. Your collector or SDK only ever sees the proxy.
- Direct termination. Set TLS_CERT_FILE and TLS_KEY_FILE. Both the HTTP and gRPC listeners use the same certificate. TLS 1.3 minimum.
Setting only one of the two TLS variables is a startup error — a guardrail to catch half-configured deployments.
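A deploy script can apply the same both-or-neither rule before launching Fanout. The sketch below is a hypothetical pre-flight check; `tls_mode` is not a Fanout API, and treating an empty string as unset is an assumption.

```python
from typing import Optional


def tls_mode(cert_file: Optional[str], key_file: Optional[str]) -> str:
    """Return "tls" or "plaintext"; raise if only one variable is set."""
    # Assumption: an empty string counts as "not set".
    if bool(cert_file) != bool(key_file):
        raise ValueError("set both TLS_CERT_FILE and TLS_KEY_FILE, or neither")
    return "tls" if cert_file else "plaintext"
```

A typical call would pass `os.environ.get("TLS_CERT_FILE")` and `os.environ.get("TLS_KEY_FILE")` and abort the deploy on `ValueError`.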
Use the UI
Open http://<fanout-host>:7520 after signing in.
- Home — health grid for every service in the current namespace. Incidents (unhealthy or degraded) surface at the top with inline context and an Investigate button that launches Chat with the service pre-scoped. Healthy services show traffic, p95, and error-rate numbers.
- Service detail — latency and error-rate timeseries, top endpoints, example failing traces, dependencies. Every chart and row has its own Investigate shortcut.
- Chat — full-page investigator. Ask in plain English; the assistant calls the MCP tools behind the scenes and renders charts, tables, and traces inline. Suggested prompts appear on the empty state.
- Alerts — firing / pending / resolved list plus an inline editor for rules.
- Settings (admin only) — rotate the ingest token.
The namespace picker in the top-right header filters every page. The New chat button (only on /chat) resets the conversation.
Alerts
Rules are written in expr-lang, evaluated every ALERT_EVAL_INTERVAL seconds (default 30), and delivered by webhook.
Anatomy of a rule
| Field | Description |
|---|---|
| name | Shown on the Alerts page and in webhook payloads. |
| expression | An expr-lang boolean evaluated per service per interval. |
| for_seconds | How long the expression must hold before the rule fires. 0 = fire immediately. |
| webhook_url | Where to POST the alert payload. |
| webhook_headers | Extra HTTP headers, typically for auth. |
| webhook_template | Override the default JSON payload. |
| notify_on_resolve | Send a follow-up POST when the condition clears. |
Available fields
Every rule has these fields in scope.
| Field | Type | Description |
|---|---|---|
| service | string | Service being evaluated. Useful for service == "checkout". |
| error_rate | float | Error rate in this window, 0.0 – 1.0. |
| p50 / p95 | float | Latency percentiles, milliseconds. |
| throughput | float | Requests per second over the window. |
| log_count | float | Log entries seen in the window. |
| z_score | float | Anomaly score against the historical baseline. |
| health_score | float | Composite score, lower is worse. |
| error_rate_delta / p95_delta / throughput_delta | float | Percentage change vs. baseline (e.g. 50 = +50%, -50 = halved). |
Example rules
# Sustained error rate — ignore spikes.
name: "error rate > 5% for 5 min"
expression: error_rate > 0.05
for_seconds: 300
# Latency regression — sustained only.
name: "p95 latency > 2s for 10 min"
expression: p95 > 2000
for_seconds: 600
# Throughput collapse — ignores naturally low-traffic services.
name: "sudden traffic drop"
expression: throughput_delta < -50 && throughput > 10
for_seconds: 120
# Anomaly score — "something looks off".
name: "anomaly: z-score > 3"
expression: z_score > 3
for_seconds: 180
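
To see what the "sudden traffic drop" rule matches, it helps to work through the numbers. The sketch below mirrors the expression in Python. The percent-change formula is an assumption consistent with "50 = +50%, -50 = halved" (the exact baseline calculation is not specified here), and both function names are hypothetical.

```python
def percent_delta(current: float, baseline: float) -> float:
    """Percentage change vs. baseline: 50 means +50%, -50 means halved."""
    if baseline == 0:
        return 0.0  # assumption: no baseline means no reported change
    return (current - baseline) / baseline * 100.0


def sudden_traffic_drop(throughput: float, baseline_throughput: float) -> bool:
    """Python rendering of `throughput_delta < -50 && throughput > 10`."""
    throughput_delta = percent_delta(throughput, baseline_throughput)
    return throughput_delta < -50 and throughput > 10
```

A service that falls from 100 to 40 requests per second matches (delta of -60). One that falls from 100 to 5 does not, because the `throughput > 10` guard excludes it.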
Lifecycle
A rule moves through three states:
- Pending: the expression just became true. The engine waits out for_seconds.
- Firing: the condition has held long enough. Webhooks deliver and a badge appears in the UI nav.
- Resolved: the expression returned false. If notify_on_resolve is set, a final webhook fires.
Resolved alerts stay queryable for ALERT_HISTORY_DAYS (default 7) — visible in the UI and via the alerts MCP tool.
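The three states above can be modelled as a small state machine. This is a sketch of the described behaviour, not the engine's implementation: the `RuleState` class is hypothetical, and the extra "inactive" state (a rule that has never fired, or that cleared while still pending) is an assumption introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleState:
    for_seconds: int
    state: str = "inactive"  # assumption: initial state before any match
    pending_since: Optional[float] = None

    def step(self, condition: bool, now: float) -> str:
        """Advance the rule for one evaluation at time `now` (seconds)."""
        if not condition:
            if self.state == "firing":
                self.state = "resolved"
            elif self.state == "pending":
                self.state = "inactive"  # a spike that never fired
            self.pending_since = None
            return self.state
        if self.pending_since is None:
            self.pending_since = now  # expression just became true
        held = now - self.pending_since
        self.state = "firing" if held >= self.for_seconds else "pending"
        return self.state
```

With `for_seconds=60` and evaluations at 0, 30, and 60 seconds, the rule reports pending, pending, then firing. With `for_seconds=0` it fires on the first true evaluation.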
Webhook payload
A firing rule POSTs JSON to webhook_url:
{
  "rule": "error rate > 5% for 5 min",
  "service": "checkout",
  "namespace": "default",
  "fired_at": "2026-04-20T14:22:08Z",
  "expression": "error_rate > 0.05",
  "values": {
    "error_rate": 0.082,
    "p50": 94,
    "p95": 412,
    "throughput": 1180
  }
}
Override the shape with webhook_template if your downstream expects a different schema (PagerDuty, Slack, OpsGenie, etc.).
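A receiver that consumes the default payload only needs to read the fields shown in the example above. Here is a minimal sketch of such a receiver-side helper; `summarize_alert` is a hypothetical name, and it assumes the payload keeps the field names from that example.

```python
import json


def summarize_alert(body: str) -> str:
    """Turn the default webhook JSON into a one-line message."""
    alert = json.loads(body)
    values = ", ".join(f"{k}={v}" for k, v in alert.get("values", {}).items())
    return f"[{alert['namespace']}/{alert['service']}] {alert['rule']} ({values})"
```

For the example payload this produces `[default/checkout] error rate > 5% for 5 min (error_rate=0.082, p50=94, p95=412, throughput=1180)`.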
MCP server
Fanout ships an MCP (Model Context Protocol) server at /mcp. Connect Claude Code — or any MCP-capable assistant — and these tools become available for investigation. The same server backs the chat investigator inside the Fanout UI.
Connect Claude Code
# Production
claude mcp add fanout --transport http https://fanout.example.com/mcp
# Local
claude mcp add fanout --transport http http://localhost:7520/mcp
The MCP endpoint accepts an ingest token the same way as OTLP — pass Authorization: Bearer fo_<token> if your transport supports custom headers, or rely on session-based auth through a logged-in browser.
Tools
| Tool | What it does |
|---|---|
| overview | System health, scores, top issues. |
| topology | Service dependency map with blast radius. |
| diagnose | Deep-dive on one service: latency, errors, saturation vs. baseline. |
| spans | Search and aggregate trace spans. |
| trace | Single distributed trace with root-cause analysis. |
| logs | Search and aggregate log entries. |
| metrics | Discover and query OTLP metric timeseries. |
| compare | Side-by-side: two services, two time windows, or two operations. |
| attributes | Discover filterable attribute keys for spans, logs, or metrics. |
| alerts | List firing, pending, or resolved alerts, filterable by service or rule. |
| alert_rules | Manage alert rules: list, create, update, delete. |
| query | Raw SQL against the underlying data. |
Claude (or whichever model you use) decides which tools to call. A typical incident loop looks like overview → topology → diagnose → trace → logs, but you don’t have to memorise the order.
Tokens that can ingest can also query — there’s no separate read/write split today. If you need stricter isolation, gate the endpoint at your reverse proxy.
Operate
Data layout
Everything Fanout persists lives under DATA_DIR (default ./data). That’s the only directory you need to back up, and the only one you need to move when relocating a host.
Backups
- Stop the process (docker stop fanout or systemctl stop fanout).
- Copy the whole DATA_DIR to your backup target.
- Start it back up.
Snapshotting a live directory can capture mid-flush state — safer to stop first. Flushes happen every FLUSH_SECONDS (default 15), so downtime for a backup is under a minute for most installs.
To restore on a new host: put the backup at the same DATA_DIR path and start Fanout. Ingest tokens, users, saved reports, and all telemetry come with it.
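The stop, copy, start sequence is straightforward to script. The sketch below is one possible wrapper, not an official tool: `backup_data_dir` is a hypothetical helper, the stop and start commands are passed in by the caller, and the tar.gz archive format is an arbitrary choice.

```python
import shutil
import subprocess
from pathlib import Path


def backup_data_dir(data_dir: str, archive_base: str,
                    stop_cmd: list[str], start_cmd: list[str]) -> str:
    """Stop Fanout, archive DATA_DIR, then start it again.

    Returns the path of the created .tar.gz archive.
    """
    src = Path(data_dir).resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"DATA_DIR not found: {src}")
    subprocess.run(stop_cmd, check=True)  # e.g. ["docker", "stop", "fanout"]
    try:
        # Archive the whole directory while nothing is writing to it.
        return shutil.make_archive(archive_base, "gztar",
                                   root_dir=str(src.parent), base_dir=src.name)
    finally:
        # Restart even if the copy failed, to keep downtime short.
        subprocess.run(start_cmd, check=True)
```

For a Docker install the call might look like `backup_data_dir("./data", "/backups/fanout", ["docker", "stop", "fanout"], ["docker", "start", "fanout"])`; the `/backups/fanout` path is only an example.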
Upgrades
Pull the new image (or binary) and restart. Schema migrations apply automatically at boot.
docker pull ghcr.io/labstack/fanout:latest
docker stop fanout && docker rm fanout
# re-run your original `docker run` command
Downgrading across a migration is not supported — back up before upgrading if you need an escape hatch.
Troubleshooting
A few common failure modes and what to check first.
- No services appear after sending telemetry. Confirm the token header reaches Fanout (some proxies strip custom headers), that the endpoint scheme is explicit (http:// or https://), and that port 4317 is reachable: nc -vz fanout.example.com 4317. Data takes up to FLUSH_SECONDS to appear, so wait ~15 s before debugging.
- Startup fails immediately. Check docker logs fanout. The most common cause is a missing JWT_*, SMTP_*, or AI_API_KEY variable. Setting only one of the two TLS files is also fatal by design.
- Login codes not arriving. Verify SMTP credentials and sender domain. Fanout uses STARTTLS on ports 587 and 25, and implicit TLS on 465.
- Queries slow. First check freshness: rollups update every ROLLUP_EVERY seconds (default 60). Raising DUCKDB_MEMORY can help with larger working sets. For very long time ranges, expect raw scans to take longer than rollup-backed queries.
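Where nc is not installed, the same port reachability check can be done from Python. This is a minimal sketch; `port_reachable` is a hypothetical helper. It confirms only that a TCP connection can be opened, not that gRPC or the ingest token works.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `port_reachable("fanout.example.com", 4317)` returns False when a firewall drops the connection or nothing is listening.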
Environment reference
Fanout is configured entirely through environment variables. A .env file next to the binary is loaded first; .env.${ENV} overrides it (ENV defaults to development).
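The layering of the two files can be sketched as a simple merge. This illustrates the described precedence and is not Fanout's loader: the function names are hypothetical, quoting rules are ignored, and how file values interact with variables already set in the process environment is not modelled.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def layered_config(base_text: str, env_specific_text: str) -> dict[str, str]:
    """Apply .env first, then let .env.${ENV} override matching keys."""
    merged = parse_env(base_text)
    merged.update(parse_env(env_specific_text))
    return merged
```

If .env sets FLUSH_SECONDS=15 and .env.development sets FLUSH_SECONDS=5, the merged result carries 5, while keys that appear only in .env are kept.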
Network
| Variable | Default | Description |
|---|---|---|
| HTTP_ADDR | :7520 | Web UI, API, and MCP endpoint listen address. |
| OTLP_GRPC_ADDR | 127.0.0.1:4317 | OTLP gRPC ingest address. The official Docker image overrides this to :4317 so off-host traffic is accepted. |
| DEFAULT_NAMESPACE | default | Namespace assigned to OTLP payloads without service.namespace. |
Storage
| Variable | Default | Description |
|---|---|---|
| DATA_DIR | ./data | Storage root for telemetry, query cache, and application state. |
| DUCKDB_MEMORY | 512MB | In-memory budget for the embedded query engine. |
| RETENTION_DAYS | 30 | Drop telemetry files older than N days. 0 keeps everything forever. |
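
The RETENTION_DAYS rule, including the special meaning of 0, can be expressed as a single predicate. The sketch below is an assumption-laden illustration: `is_expired` is a hypothetical name, and both the way a telemetry file's age is determined and the exact boundary handling are guesses, not documented behaviour.

```python
from datetime import date, timedelta


def is_expired(file_day: date, today: date, retention_days: int = 30) -> bool:
    """True if data for `file_day` is older than the retention window."""
    if retention_days == 0:
        return False  # 0 keeps everything forever
    # Assumption: data exactly `retention_days` old is still kept.
    return file_day < today - timedelta(days=retention_days)
```

With the default of 30 and today set to 2026-04-20, the cutoff is 2026-03-21, so data from 2026-03-01 is expired and data from 2026-04-01 is kept.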
Ingest tuning
| Variable | Default | Description |
|---|---|---|
| FLUSH_SECONDS | 15 | How often pending rows are flushed to disk. Lower = fresher UI; higher = less I/O. |
| FLUSH_BATCH_SIZE | 50000 | Cap on rows per flush, regardless of interval. |
| ROLLUP_EVERY | 60 | How often per-minute rollups are recomputed. |
Authentication (required)
| Variable | Description |
|---|---|
| JWT_SECRET | Required. HS256 signing key for short-lived access tokens. |
| JWT_REFRESH_SECRET | Required. HS256 signing key for refresh tokens. Must differ from JWT_SECRET. |
Each must be at least 32 characters. Generate with openssl rand -hex 32.
Email (required)
| Variable | Default | Description |
|---|---|---|
| SMTP_HOST | — | Required. SMTP server hostname. |
| SMTP_PORT | 587 | 465 uses implicit TLS; 587 and 25 use STARTTLS when offered. |
| SMTP_USER | — | Required. SMTP username. |
| SMTP_PASS | — | Required. SMTP password or API key. |
| SMTP_FROM | — | Required. From header, e.g. "Fanout" <sender@example.com>. |
AI provider (required)
| Variable | Default | Description |
|---|---|---|
| AI_PROVIDER | anthropic | anthropic or openai. |
| AI_API_KEY | — | Required. Provider API key. |
| AI_MODEL | (provider default) | Override the default model, e.g. claude-sonnet-4-6, gpt-4.1. |
| AI_BASE_URL | (provider default) | Override the API base URL; useful for proxies and gateways. |
Alerts
| Variable | Default | Description |
|---|---|---|
| ALERT_ENABLED | true | Set to false to disable the alert engine entirely. |
| ALERT_EVAL_INTERVAL | 30 | How often (seconds) rules are evaluated against fresh rollups. |
| ALERT_HISTORY_DAYS | 7 | How long resolved alerts stay queryable in the UI and via the alerts MCP tool. |
MCP
| Variable | Default | Description |
|---|---|---|
| MCP_ENABLED | true | Expose the MCP server at /mcp. Disable if you don’t want it reachable. |
TLS
| Variable | Default | Description |
|---|---|---|
| TLS_CERT_FILE | — | Path to the server certificate (PEM). |
| TLS_KEY_FILE | — | Path to the server private key (PEM). |
When both are set, HTTP_ADDR and OTLP_GRPC_ADDR listen with TLS 1.3. Setting only one is a startup error.