Docs

Install, configure, send telemetry, and operate a self-hosted Fanout.

Fanout is one binary you run yourself. It accepts OpenTelemetry over gRPC, stores the data on disks you control, and serves a UI, a chat investigator, an alert engine, and an MCP server — all from the same process.

This page covers everything you need to go from zero to a working install, send telemetry, and operate Fanout day-to-day. Use the "On this page" menu to jump around.

Install

Pick whichever path matches how you already deploy. Fanout is a single self-contained executable — about 30 MB, no runtime dependencies beyond a recent libc.

Docker

docker run -d --name fanout \
  -p 7520:7520 -p 4317:4317 \
  -v $PWD/data:/var/lib/fanout/data \
  ghcr.io/labstack/fanout:latest

Port / path	Purpose
`7520`	HTTP — web UI, API, and the MCP endpoint.
`4317`	OTLP gRPC ingest.
`./data`	Persistent storage — telemetry, application state, and saved reports.

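The container listens on all interfaces by default. For a host-only install, publish the ports on the host's loopback address instead, for example -p 127.0.0.1:7520:7520 -p 127.0.0.1:4317:4317. Setting -e OTLP_GRPC_ADDR=127.0.0.1:4317 inside a bridge-networked container binds the container's own loopback, which published ports do not reach.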

Pre-built binary

Download the artifact for your platform from the releases page and run it:

./fanout

Defaults: HTTP on :7520, OTLP gRPC on 127.0.0.1:4317, data under ./data.

Sizing

Guidelines, not hard limits. The binary is small; the data is what consumes resources.

Resource	Recommended starting point
CPU	2 vCPU
Memory	1 GB (raise via `DUCKDB_MEMORY` for larger workloads)
Disk	20 GB on fast local storage; budget ~1 GB / day per million spans at default retention

First boot

Fanout refuses to start without JWT secrets, SMTP credentials (for email login codes), and an LLM API key (for the chat investigator). Everything else has a default.

Minimum viable command

docker run -d --name fanout \
  -p 7520:7520 -p 4317:4317 \
  -v $PWD/data:/var/lib/fanout/data \
  -e JWT_SECRET=$(openssl rand -hex 32) \
  -e JWT_REFRESH_SECRET=$(openssl rand -hex 32) \
  -e SMTP_HOST=smtp.example.com \
  -e SMTP_USER=<smtp-username> \
  -e SMTP_PASS=<smtp-password> \
  -e SMTP_FROM='"Fanout" <fanout@example.com>' \
  -e AI_API_KEY=<anthropic-or-openai-key> \
  ghcr.io/labstack/fanout:latest

The JWT_* secrets must differ and each must be at least 32 characters. Generate fresh ones with openssl rand -hex 32.
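
The constraints above can be checked before the container is launched. The following is a minimal sketch in Python, assuming the required set is exactly the variables marked Required in the environment reference; `preflight` and `new_secret` are hypothetical helpers, not part of Fanout.

```python
import secrets

# Variables the environment reference marks as required at startup.
REQUIRED = (
    "JWT_SECRET", "JWT_REFRESH_SECRET",
    "SMTP_HOST", "SMTP_USER", "SMTP_PASS", "SMTP_FROM",
    "AI_API_KEY",
)
MIN_SECRET_LEN = 32


def preflight(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means the env looks OK."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    for name in ("JWT_SECRET", "JWT_REFRESH_SECRET"):
        value = env.get(name, "")
        if value and len(value) < MIN_SECRET_LEN:
            problems.append(f"{name} is shorter than {MIN_SECRET_LEN} characters")
    access, refresh = env.get("JWT_SECRET"), env.get("JWT_REFRESH_SECRET")
    if access and access == refresh:
        problems.append("JWT_SECRET and JWT_REFRESH_SECRET must differ")
    return problems


def new_secret() -> str:
    """64 hex characters, the same shape as `openssl rand -hex 32`."""
    return secrets.token_hex(32)
```

Calling `preflight(dict(os.environ))` in a deploy script (after `import os`) would surface the same mistakes that otherwise appear only as a failed startup.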

Create the admin

On first boot Fanout logs a one-time setup token that authorises the admin-creation flow:

docker logs fanout 2>&1 | grep "setup token"

Open http://localhost:7520 and fill in the setup form with your name, email, and the token. Fanout creates the admin, signs you in, and prints the ingest token once. Copy it immediately, because it isn’t shown again. You can rotate it later from Settings → Ingest in the UI.

After this, the setup form is closed for the lifetime of the data directory. New users join via email invites; logins use one-time codes delivered via SMTP. No passwords are ever stored.

Send telemetry

Fanout speaks OTLP over gRPC on port 4317. Anything that can export OTLP — an SDK, a collector, a sidecar — will work without modification.

HTTP/protobuf and HTTP/JSON OTLP are not yet supported. If you need them, run an OpenTelemetry Collector in front and point its otlp exporter at Fanout.

Authentication

Every request must carry a valid ingest token. Two header forms are accepted, equivalently:

x-fanout-ingest-token: fo_<token>
Authorization: Bearer fo_<token>

A missing or invalid token returns Unauthenticated. The same token works for every signal type.
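
To illustrate what "accepted equivalently" means, here is a sketch of how a server could read the token from gRPC request metadata. It is not Fanout's implementation: `extract_ingest_token`, the order in which the two headers are checked, and the rejection of values without the `fo_` prefix are all assumptions made for the example.

```python
from typing import Mapping, Optional

TOKEN_PREFIX = "fo_"


def extract_ingest_token(metadata: Mapping[str, str]) -> Optional[str]:
    """Return the ingest token from either accepted header, or None."""
    # gRPC metadata keys are lowercase on the wire; normalise defensively.
    headers = {key.lower(): value.strip() for key, value in metadata.items()}

    token = headers.get("x-fanout-ingest-token")
    if token is None:
        auth = headers.get("authorization", "")
        scheme, _, credentials = auth.partition(" ")
        if scheme.lower() == "bearer":
            token = credentials.strip()

    # Assumption: anything without the fo_ prefix is treated as invalid.
    if token and token.startswith(TOKEN_PREFIX):
        return token
    return None
```

A `None` result corresponds to the Unauthenticated response described above.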

Direct from an SDK

export OTEL_EXPORTER_OTLP_ENDPOINT=https://fanout.example.com:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_HEADERS=x-fanout-ingest-token=fo_<token>
export OTEL_SERVICE_NAME=checkout

Use http:// if your endpoint isn’t TLS-terminated. The headers env var takes a comma-separated list of key=value pairs.
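
If a deploy script assembles the headers variable from several values, the comma-separated format is easy to get wrong. A small sketch, assuming values are percent-encoded as the OpenTelemetry specification describes for this variable (worth confirming against the SDK you use); `format_otlp_headers` is a hypothetical helper.

```python
from urllib.parse import quote


def format_otlp_headers(headers: dict[str, str]) -> str:
    """Join headers as key=value pairs separated by commas."""
    # Percent-encode values so a comma or '=' inside one cannot break the list.
    return ",".join(
        f"{key}={quote(value, safe='')}" for key, value in headers.items()
    )
```

For the single-header case in the snippet above, the result is identical to the hand-written string.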

Through an OpenTelemetry Collector

If you already run a collector (recommended for production — buffering, batching, sampling, per-tenant routing), add Fanout as another otlp exporter:

exporters:
  otlp/fanout:
    endpoint: fanout.example.com:4317
    headers:
      x-fanout-ingest-token: fo_<token>

service:
  pipelines:
    traces:  { exporters: [otlp/fanout] }
    logs:    { exporters: [otlp/fanout] }
    metrics: { exporters: [otlp/fanout] }

You can fan out to Fanout and an existing backend during a migration — exporters are list-typed.

Multi-product / multi-tenant namespaces

If a single Fanout serves more than one product or environment, set service.namespace in your OpenTelemetry resource attributes:

export OTEL_RESOURCE_ATTRIBUTES=service.namespace=product-a,service.name=checkout

The UI’s namespace picker (top-right of the header) filters every view; MCP tools accept namespace as an explicit argument. Payloads without a service.namespace land in the namespace named by DEFAULT_NAMESPACE, which is the literal value default unless you override it.
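
The fallback rule can be sketched as follows. This is an illustration of the documented behaviour, not Fanout's code; `parse_resource_attributes` and `resolve_namespace` are hypothetical names, and the parser handles only the plain key=value form shown above.

```python
def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Parse the comma-separated key=value form of OTEL_RESOURCE_ATTRIBUTES."""
    attrs = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            attrs[key.strip()] = value.strip()
    return attrs


def resolve_namespace(raw_attributes: str, default_namespace: str = "default") -> str:
    """Use service.namespace when present, otherwise the configured default."""
    attrs = parse_resource_attributes(raw_attributes)
    return attrs.get("service.namespace") or default_namespace
```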

TLS

Two options — pick one:

Behind a reverse proxy (recommended). Caddy, nginx, or Traefik terminates TLS for fanout.example.com:4317 and proxies plaintext gRPC to Fanout on 127.0.0.1:4317. Your collector or SDK only ever sees the proxy.
Direct termination. Set TLS_CERT_FILE and TLS_KEY_FILE. Both the HTTP and gRPC listeners use the same certificate. TLS 1.3 minimum.

Setting only one of the two TLS variables is a startup error — a guardrail to catch half-configured deployments.
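
The both-or-neither rule is simple enough to mirror in a deployment check. A sketch under the assumption that an empty value counts as unset; `tls_mode` is a hypothetical helper.

```python
from typing import Optional


def tls_mode(cert_file: Optional[str], key_file: Optional[str]) -> str:
    """Return "tls" or "plaintext"; raise if exactly one variable is set."""
    has_cert, has_key = bool(cert_file), bool(key_file)
    if has_cert != has_key:
        missing = "TLS_KEY_FILE" if has_cert else "TLS_CERT_FILE"
        raise ValueError(f"{missing} must be set when the other TLS variable is set")
    return "tls" if has_cert else "plaintext"
```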

Use the UI

Open http://<fanout-host>:7520 and sign in. The main pages are:

Home — health grid for every service in the current namespace. Incidents (unhealthy or degraded) surface at the top with inline context and an Investigate button that launches Chat with the service pre-scoped. Healthy services show traffic, p95, and error-rate numbers.
Service detail — latency and error-rate timeseries, top endpoints, example failing traces, dependencies. Every chart and row has its own Investigate shortcut.
Chat — full-page investigator. Ask in plain English; the assistant calls the MCP tools behind the scenes and renders charts, tables, and traces inline. Suggested prompts appear on the empty state.
Alerts — firing / pending / resolved list plus an inline editor for rules.
Settings (admin only) — rotate the ingest token.

The namespace picker in the top-right header filters every page. The New chat button (only on /chat) resets the conversation.

Alerts

Rules are written in expr-lang, evaluated every ALERT_EVAL_INTERVAL seconds (default 30), and delivered by webhook.

Anatomy of a rule

Field	Description
`name`	Shown on the Alerts page and in webhook payloads.
`expression`	An expr-lang boolean evaluated per service per interval.
`for_seconds`	How long the expression must hold before the rule fires. `0` = fire immediately.
`webhook_url`	Where to POST the alert payload.
`webhook_headers`	Extra HTTP headers — typically auth.
`webhook_template`	Override the default JSON payload.
`notify_on_resolve`	Send a follow-up POST when the condition clears.

Available fields

Every rule has these fields in scope.

Field	Type	Description
`service`	string	Service being evaluated. Useful for `service == "checkout"`.
`error_rate`	float	Error rate in this window, `0.0` – `1.0`.
`p50` / `p95`	float	Latency percentiles, milliseconds.
`throughput`	float	Requests per second over the window.
`log_count`	float	Log entries seen in the window.
`z_score`	float	Anomaly score against the historical baseline.
`health_score`	float	Composite score, lower is worse.
`error_rate_delta` / `p95_delta` / `throughput_delta`	float	Percentage change vs. baseline (e.g. `50` = +50%, `-50` = halved).
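
The delta fields read as ordinary percentage change. A sketch of that arithmetic, assuming the delta is computed as (current - baseline) / baseline; how Fanout derives the baseline itself is not specified here, and `percent_delta` is a hypothetical name.

```python
def percent_delta(current: float, baseline: float) -> float:
    """Percentage change of current relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0
```

With a baseline of 100 requests per second, a current value of 150 gives 50 and a current value of 50 gives -50, matching the examples in the table.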

Example rules

# Sustained error rate — ignore spikes.
name:        "error rate > 5% for 5 min"
expression:  error_rate > 0.05
for_seconds: 300

# Latency regression — sustained only.
name:        "p95 latency > 2s for 10 min"
expression:  p95 > 2000
for_seconds: 600

# Throughput collapse — ignores naturally low-traffic services.
name:        "sudden traffic drop"
expression:  throughput_delta < -50 && throughput > 10
for_seconds: 120

# Anomaly score — "something looks off".
name:        "anomaly: z-score > 3"
expression:  z_score > 3
for_seconds: 180

Lifecycle

A rule moves through three states:

Pending — the expression just became true. The engine waits out for_seconds.
Firing — the condition has held long enough. Webhooks deliver and a badge appears in the UI nav.
Resolved — the expression returned false. If notify_on_resolve is set, a final webhook fires.

Resolved alerts stay queryable for ALERT_HISTORY_DAYS (default 7) — visible in the UI and via the alerts MCP tool.
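
The lifecycle above can be sketched as a small state machine. This is an illustration, not the engine's code. It assumes a rule that has never fired starts in the resolved state, that a pending rule which clears before for_seconds elapses sends nothing, and that the caller decides whether to act on "resolve" based on notify_on_resolve. `RuleState` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleState:
    for_seconds: int
    state: str = "resolved"  # "pending", "firing", or "resolved"
    pending_since: Optional[float] = None

    def step(self, condition: bool, now: float) -> Optional[str]:
        """Feed one evaluation result; return "fire" or "resolve" when a webhook is due."""
        if condition:
            if self.state == "resolved":
                self.state, self.pending_since = "pending", now
            if self.state == "pending" and now - self.pending_since >= self.for_seconds:
                self.state = "firing"
                return "fire"
            return None
        # Condition is false: only a rule that was firing produces a resolve event.
        was_firing = self.state == "firing"
        self.state, self.pending_since = "resolved", None
        return "resolve" if was_firing else None
```

With for_seconds set to 0, the first true evaluation moves straight through pending to firing, which matches the "fire immediately" behaviour described in the rule anatomy.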

Webhook payload

A firing rule POSTs JSON to webhook_url:

{
  "rule": "error rate > 5% for 5 min",
  "service": "checkout",
  "namespace": "default",
  "fired_at": "2026-04-20T14:22:08Z",
  "expression": "error_rate > 0.05",
  "values": {
    "error_rate": 0.082,
    "p50": 94,
    "p95": 412,
    "throughput": 1180
  }
}

Override the shape with webhook_template if your downstream expects a different schema (PagerDuty, Slack, OpsGenie, etc.).
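
On the receiving side, the default payload can be handled with any JSON library. A minimal sketch that reduces it to a one-line summary, assuming the field names shown in the sample payload; `summarize_alert` is a hypothetical helper.

```python
import json


def summarize_alert(payload_json: str) -> str:
    """Turn the default webhook payload into a single human-readable line."""
    alert = json.loads(payload_json)
    values = ", ".join(f"{key}={value}" for key, value in alert["values"].items())
    return (
        f"[{alert['namespace']}/{alert['service']}] {alert['rule']} "
        f"(fired at {alert['fired_at']}): {values}"
    )
```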

MCP server

Fanout ships an MCP (Model Context Protocol) server at /mcp. Connect Claude Code — or any MCP-capable assistant — and these tools become available for investigation. The same server backs the chat investigator inside the Fanout UI.

Connect Claude Code

# Production
claude mcp add fanout --transport http https://fanout.example.com/mcp

# Local
claude mcp add fanout --transport http http://localhost:7520/mcp

The MCP endpoint requires a user API key — generate one from Settings → API key and pass it as Authorization: Bearer fo_<token> if your transport supports custom headers, or rely on session-based auth through a logged-in browser. (The ingest token is for OTLP only; it is not accepted here.)

Tools

Tool	What it does
`overview`	System health, scores, top issues.
`topology`	Service dependency map with blast radius.
`diagnose`	Deep-dive on one service — latency, errors, saturation vs. baseline.
`spans`	Search and aggregate trace spans.
`trace`	Single distributed trace with root-cause analysis.
`logs`	Search and aggregate log entries.
`metrics`	Discover and query OTLP metric timeseries.
`compare`	Side-by-side: two services, two time windows, or two operations.
`attributes`	Discover filterable attribute keys for spans, logs, or metrics.
`alerts`	List firing, pending, or resolved alerts — filterable by service or rule.
`alert_rules`	Manage alert rules — list, create, update, delete.
`query`	Raw SQL against the underlying data.

Claude (or whichever model you use) decides which tools to call. A typical incident loop looks like overview → topology → diagnose → trace → logs, but you don’t have to memorise the order.

A user API key can call every tool, including alert_rules and raw query; there is no separate read/write split today. If you need stricter isolation, gate the endpoint at your reverse proxy.

Operate

Data layout

Everything Fanout persists lives under DATA_DIR (default ./data). That’s the only directory you need to back up, and the only one you need to move when relocating a host.

Backups

Stop the process (docker stop fanout or systemctl stop fanout).
Copy the whole DATA_DIR to your backup target.
Start it back up.

Snapshotting a live directory can capture mid-flush state — safer to stop first. Flushes happen every FLUSH_SECONDS (default 15), so downtime for a backup is under a minute for most installs.

To restore on a new host: put the backup at the same DATA_DIR path and start Fanout. Ingest tokens, users, saved reports, and all telemetry come with it.
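
The copy step can be scripted. A sketch that archives the directory into a gzip tarball, to be run only after the process has been stopped; `archive_data_dir` is a hypothetical helper, and the archive path should live outside DATA_DIR so the tarball does not try to include itself.

```python
import tarfile
from pathlib import Path


def archive_data_dir(data_dir: str, archive_path: str) -> int:
    """Write data_dir into a gzip tarball; return the number of files archived."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{data_dir} is not a directory")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." stores relative paths, so the archive unpacks at any DATA_DIR.
        tar.add(root, arcname=".")
    # Re-open the archive to confirm it is readable and count what went in.
    with tarfile.open(archive_path, "r:gz") as tar:
        return sum(1 for member in tar.getmembers() if member.isfile())
```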

Upgrades

Pull the new image (or binary) and restart. Schema migrations apply automatically at boot.

docker pull ghcr.io/labstack/fanout:latest
docker stop fanout && docker rm fanout
# re-run your original `docker run` command

Downgrading across a migration is not supported — back up before upgrading if you need an escape hatch.

Troubleshooting

A few common failure modes and what to check first.

No services appear after sending telemetry. Confirm the token header reaches Fanout (some proxies strip custom headers), that the endpoint scheme is explicit (http:// or https://), and that port 4317 is reachable: nc -vz fanout.example.com 4317. Data takes up to FLUSH_SECONDS to appear — wait ~15 s before debugging.
Startup fails immediately. Check docker logs fanout. The most common cause is missing JWT_*, SMTP_*, or AI_API_KEY. Setting only one of the two TLS files is also fatal by design.
Login codes not arriving. Verify SMTP credentials and sender domain. Fanout uses STARTTLS on ports 587 and 25, and implicit TLS on 465.
Queries slow. First check the freshness — rollups update every ROLLUP_EVERY seconds (default 60). Raising DUCKDB_MEMORY can help larger working sets. For very long time ranges, expect raw scans to take longer than rollup-backed queries.
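
Where nc is not installed, the reachability check from the first item can be done with a plain TCP connect. A sketch using only the standard library; `port_reachable` is a hypothetical helper, and it confirms only that the port accepts connections, not that gRPC or the token is working.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """TCP connect check, roughly what `nc -vz host port` does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```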

Environment reference

Fanout is configured entirely through environment variables. A .env file next to the binary is loaded first; .env.${ENV} overrides it (ENV defaults to development).
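
The layering order can be pictured as a dictionary merge in which the environment-specific file wins. A sketch, not Fanout's loader: it assumes simple KEY=value lines with optional matching quotes, and it ignores variables already set in the process environment because this page does not say how they rank against the files. `parse_env_file` and `load_layered_env` are hypothetical names.

```python
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Read KEY=value lines, skipping blanks and # comments."""
    values = {}
    if not path.is_file():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_layered_env(directory: str, env_name: str = "development") -> dict[str, str]:
    """Load .env first, then let .env.<env_name> override it."""
    base = Path(directory)
    merged = parse_env_file(base / ".env")
    merged.update(parse_env_file(base / f".env.{env_name}"))
    return merged
```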

Network

Variable	Default	Description
`HTTP_ADDR`	`:7520`	Web UI, API, and MCP endpoint listen address.
`OTLP_GRPC_ADDR`	`127.0.0.1:4317`	OTLP gRPC ingest address. The official Docker image overrides this to `:4317` so off-host traffic is accepted.
`DEFAULT_NAMESPACE`	`default`	Namespace assigned to OTLP payloads without `service.namespace`.

Storage

Variable	Default	Description
`DATA_DIR`	`./data`	Storage root for telemetry, query cache, and application state.
`DUCKDB_MEMORY`	`512MB`	In-memory budget for the embedded query engine.
`RETENTION_DAYS`	`30`	Drop telemetry files older than N days. `0` keeps everything forever.

Ingest tuning

Variable	Default	Description
`FLUSH_SECONDS`	`15`	How often pending rows are flushed to disk. Lower = fresher UI; higher = less I/O.
`FLUSH_BATCH_SIZE`	`50000`	Cap on rows per flush, regardless of interval.
`ROLLUP_EVERY`	`60`	How often, in seconds, per-minute rollups are recomputed.

Authentication (required)

Variable	Description
`JWT_SECRET`	Required. HS256 signing key for short-lived access tokens.
`JWT_REFRESH_SECRET`	Required. HS256 signing key for refresh tokens. Must differ from `JWT_SECRET`.

Each must be at least 32 characters. Generate with openssl rand -hex 32.

Email (required)

Variable	Default	Description
`SMTP_HOST`	—	Required. SMTP server hostname.
`SMTP_PORT`	`587`	`465` uses implicit TLS; `587` and `25` use STARTTLS when offered.
`SMTP_USER`	—	Required. SMTP username.
`SMTP_PASS`	—	Required. SMTP password or API key.
`SMTP_FROM`	—	Required. From header, e.g. `"Fanout" <fanout@example.com>`.
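
When login codes fail to arrive, it can help to test the same SMTP settings outside Fanout. A sketch using Python's standard smtplib that follows the port rules in the table; it is not Fanout's mail code, `smtp_security` and `check_smtp_login` are hypothetical names, and treating every port other than 465 as STARTTLS-when-offered is an assumption.

```python
import smtplib


def smtp_security(port: int) -> str:
    """Map a port to the TLS mode described in the table above."""
    return "implicit-tls" if port == 465 else "starttls-when-offered"


def check_smtp_login(host: str, port: int, user: str, password: str) -> None:
    """Raise smtplib.SMTPException (or OSError) if connecting or logging in fails."""
    implicit = smtp_security(port) == "implicit-tls"
    if implicit:
        client = smtplib.SMTP_SSL(host, port, timeout=10)
    else:
        client = smtplib.SMTP(host, port, timeout=10)
    with client:
        client.ehlo()
        # Upgrade the plaintext connection only when the server advertises STARTTLS.
        if not implicit and client.has_extn("starttls"):
            client.starttls()
            client.ehlo()
        client.login(user, password)
```

A call such as `check_smtp_login("smtp.example.com", 587, user, password)` either returns quietly or raises with the server's own error, which is usually more specific than a missing email.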

AI provider (required)

Variable	Default	Description
`AI_PROVIDER`	`anthropic`	`anthropic` or `openai`.
`AI_API_KEY`	—	Required. Provider API key.
`AI_MODEL`	(provider default)	Override the default model — e.g. `claude-sonnet-4-6`, `gpt-4.1`.
`AI_BASE_URL`	(provider default)	Override the API base URL — useful for proxies and gateways.

Alerts

Variable	Default	Description
`ALERT_ENABLED`	`true`	Set to `false` to disable the alert engine entirely.
`ALERT_EVAL_INTERVAL`	`30`	How often (seconds) rules are evaluated against fresh rollups.
`ALERT_HISTORY_DAYS`	`7`	How long resolved alerts stay queryable in the UI and via the `alerts` MCP tool.

MCP

Variable	Default	Description
`MCP_ENABLED`	`true`	Expose the MCP server at `/mcp`. Disable if you don’t want it reachable.

TLS

Variable	Default	Description
`TLS_CERT_FILE`	—	Path to the server certificate (PEM).
`TLS_KEY_FILE`	—	Path to the server private key (PEM).

When both are set, HTTP_ADDR and OTLP_GRPC_ADDR listen with TLS 1.3. Setting only one is a startup error.