Scaling and performance

This page summarizes operational knobs that affect capacity and safe deployment behind reverse proxies.

Reverse proxy and rate limiting

The Floh API uses @fastify/rate-limit with a per-IP limit. When the app runs behind a load balancer or ingress, set TRUST_PROXY=true so Fastify resolves the real client IP from X-Forwarded-For; without it, every request appears to come from the proxy's address and all clients share one limit. Only enable this when the proxy strips or overwrites client-supplied forwarded headers; otherwise clients can bypass rate limits by sending a forged X-Forwarded-For value.
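To make the spoofing risk concrete, the following is a simplified model of how a per-IP key could be derived once forwarded headers are trusted. It is a sketch for illustration, not Fastify's actual implementation; `resolveClientIp` and its parameters are hypothetical names, and the addresses are made-up examples.

```typescript
// Simplified, hypothetical model of client-IP resolution for rate limiting.
// It is NOT Fastify's implementation; it only illustrates the trust decision.
function resolveClientIp(
  socketAddress: string,
  xForwardedFor: string | undefined,
  trustProxy: boolean,
): string {
  // Without proxy trust, only the TCP peer address is used.
  if (!trustProxy || xForwardedFor === undefined) {
    return socketAddress;
  }
  // With full trust, the left-most entry is taken as the original client.
  const hops = xForwardedFor
    .split(",")
    .map((hop) => hop.trim())
    .filter((hop) => hop.length > 0);
  return hops[0] ?? socketAddress;
}

// A proxy that appends to a client-supplied header lets the forged value win.
const forgedIp = resolveClientIp("10.0.0.5", "1.2.3.4, 203.0.113.7", true);
// A proxy that overwrites the header yields the address it actually observed.
const overwrittenIp = resolveClientIp("10.0.0.5", "203.0.113.7", true);
// With trust disabled, every client appears as the proxy itself.
const untrustedIp = resolveClientIp("10.0.0.5", "203.0.113.7", false);
```

In the first call the client chose `1.2.3.4` itself, so a per-IP limit keyed on that value can be reset at will by changing the header.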

Workers and BullMQ

Queue concurrency is defined per queue in packages/server/src/modules/scheduler/queue-config.ts. Tune concurrency after load tests for workflow execution, escalations, and integration jobs. Horizontal scaling of worker processes (or replicas in Kubernetes) increases throughput as long as Postgres connection pools and Redis are sized for the total concurrent workers.
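As a rough sizing aid, the sketch below estimates peak concurrent jobs across replicas and checks whether the combined Postgres pools fit under the server's connection limit. The queue names, concurrency values, pool sizes, and the helper functions are all hypothetical; the real values live in queue-config.ts and in your database configuration.

```typescript
// Hypothetical shape; the actual queue-config.ts may be structured differently.
interface QueueSetting {
  name: string;
  concurrency: number;
}

// Peak jobs running at once = sum of per-queue concurrency times replica count.
function peakConcurrentJobs(queues: QueueSetting[], replicas: number): number {
  const perReplica = queues.reduce((sum, q) => sum + q.concurrency, 0);
  return perReplica * replicas;
}

// True when all replica pools together stay within the Postgres limit,
// leaving some connections reserved for the API, migrations, and admin use.
function poolsFitPostgres(
  replicas: number,
  poolSizePerReplica: number,
  maxConnections: number,
  reservedConnections: number,
): boolean {
  return replicas * poolSizePerReplica <= maxConnections - reservedConnections;
}

// Illustrative numbers only.
const exampleQueues: QueueSetting[] = [
  { name: "workflow-execution", concurrency: 10 },
  { name: "escalations", concurrency: 5 },
  { name: "integrations", concurrency: 5 },
];

const peakJobs = peakConcurrentJobs(exampleQueues, 3); // 20 per replica * 3 = 60
const threeReplicasFit = poolsFitPostgres(3, 20, 100, 10); // 60 <= 90
const sixReplicasFit = poolsFitPostgres(6, 20, 100, 10); // 120 > 90
```

Running a check like this before raising the replica count helps catch the point where extra workers would exhaust database connections instead of adding throughput.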

Puppeteer and PDF/report rendering

The server depends on Puppeteer for report rendering. Chromium can consume hundreds of megabytes of memory for each page it renders. In Kubernetes or Docker, set memory requests/limits high enough for peak concurrent PDF jobs (for example 1–2 GiB per worker if reports are heavy). If PDF throughput becomes a bottleneck, isolate report generation on dedicated worker pools or in a separate service.
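A back-of-the-envelope estimate can help pick a starting memory limit before load testing. The sketch below assumes a fixed baseline for the worker process plus a fixed cost per concurrent render; `estimateWorkerMemoryMiB` and all figures are hypothetical and should be replaced with measurements from your own reports.

```typescript
// Hypothetical estimate: process baseline plus a fixed cost per concurrent render.
// Real Chromium usage varies with report size, fonts, and images.
function estimateWorkerMemoryMiB(
  baselineMiB: number,
  perRenderMiB: number,
  concurrentRenders: number,
): number {
  if (concurrentRenders < 0) {
    throw new RangeError("concurrentRenders must be non-negative");
  }
  return baselineMiB + perRenderMiB * concurrentRenders;
}

// Illustrative numbers only: 512 MiB baseline, 300 MiB per render, 4 at once.
const heavyReportWorkerMiB = estimateWorkerMemoryMiB(512, 300, 4); // 512 + 1200 = 1712
```

With these assumed figures the result lands inside the 1–2 GiB range mentioned above; measure actual peak usage under load before settling on a limit.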

API documentation

Swagger UI is off in production unless ENABLE_API_DOCS=true. Avoid exposing /api/docs on the public internet without additional controls.
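The gating rule can be pictured as a small predicate. This is a sketch, not the server's actual code: it assumes production is detected through `NODE_ENV=production` and that the flag must equal the exact string `true`; `apiDocsEnabled` is a hypothetical helper name.

```typescript
// Hypothetical predicate mirroring the documented rule:
// docs are on outside production, and in production only when explicitly enabled.
// Assumes NODE_ENV marks production, which the source does not confirm.
function apiDocsEnabled(env: Record<string, string | undefined>): boolean {
  if (env["ENABLE_API_DOCS"] === "true") {
    return true;
  }
  return env["NODE_ENV"] !== "production";
}

const docsInProd = apiDocsEnabled({ NODE_ENV: "production" });
const docsInProdOptIn = apiDocsEnabled({ NODE_ENV: "production", ENABLE_API_DOCS: "true" });
const docsInDev = apiDocsEnabled({ NODE_ENV: "development" });
```

Even with the flag set, /api/docs is best kept behind authentication, an IP allowlist, or an internal-only ingress.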