NATS JetStream
NATS JetStream is available as an optional in-cluster event backbone in the charts/clawql-mcp Helm chart. It is intended for Ouroboros workflow events, multi-agent coordination, and edge worker synchronization.
Why this exists
For issue #127, the goal is a durable, lightweight event bus that can sit beside the existing ClawQL stack without introducing Kafka-level operational complexity.
JetStream provides:
- Durable streams and replay for long-running workflows
- Pull/push consumer patterns for mixed worker types
- Low-latency request/reply for orchestration control planes
- Small operational footprint for self-hosted clusters
Enable it in Helm
helm upgrade --install clawql ./charts/clawql-mcp -n clawql --create-namespace \
--set nats.enabled=true \
--set nats.persistence.enabled=true \
--set nats.persistence.size=20Gi
When nats.enabled=true, the chart deploys:
- A single NATS pod with JetStream configuration
- An internal ClusterIP service
- Optional PVC-backed JetStream storage
ClawQL gets CLAWQL_NATS_URL automatically (nats://<release>-nats:4222) unless you set nats.url explicitly for an external NATS cluster.
Chart architecture
When enabled, the chart renders:
- ConfigMap with `nats-server.conf` and JetStream settings
- Service exposing:
  - client: `4222` (app connections)
  - cluster: `6222` (future node clustering)
  - monitor: `8222` (health + metrics endpoint)
- Deployment with one NATS server pod (single-node default)
- PVC (optional) for persistent stream data
Environment wiring into ClawQL
The main clawql-mcp-http deployment receives:
- `CLAWQL_NATS_URL` from:
  - `nats.url` (if set; external/shared cluster), or
  - the in-cluster service DNS (if `nats.enabled=true`)
- `CLAWQL_NATS_JETSTREAM=1` when `nats.jetStream.enabled=true`
This makes the event backbone discoverable to application code without additional extraEnv wiring.
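As a sketch of how application code might consume this wiring — the fallback service name mirrors the chart's `nats://<release>-nats:4222` default, and the `resolve_nats_settings` helper itself is hypothetical, not part of ClawQL:

```python
import os

def resolve_nats_settings(release: str = "clawql") -> tuple[str, bool]:
    """Resolve NATS settings from the env vars the chart injects.

    Prefers an explicit CLAWQL_NATS_URL; falls back to the in-cluster
    service DNS name (nats://<release>-nats:4222). Illustrative sketch,
    not ClawQL's actual implementation.
    """
    url = os.environ.get("CLAWQL_NATS_URL") or f"nats://{release}-nats:4222"
    jetstream_enabled = os.environ.get("CLAWQL_NATS_JETSTREAM") == "1"
    return url, jetstream_enabled
```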
Key values
- `nats.enabled` — toggle the in-cluster NATS deployment
- `nats.url` — external NATS URL override
- `nats.jetStream.enabled` — toggle JetStream on/off
- `nats.jetStream.maxMemoryStore`, `nats.jetStream.maxFileStore` — retention sizing
- `nats.persistence.*` — PVC behavior (`enabled`, `size`, `existingClaim`, `storageClass`)
- `nats.service.*` — client/cluster/monitor ports
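The same keys can be kept in a values file instead of `--set` flags. A sketch (the values shown mirror the examples in this page; they are not the chart's shipped defaults):

```yaml
nats:
  enabled: true
  # url: nats://nats.shared.svc.cluster.local:4222   # external override
  jetStream:
    enabled: true
    maxMemoryStore: 512Mi
    maxFileStore: 40Gi
  persistence:
    enabled: true
    size: 50Gi
    # existingClaim: ""
    # storageClass: ""
  service:
    client: 4222
    cluster: 6222
    monitor: 8222
```

Apply it with `helm upgrade --install clawql ./charts/clawql-mcp -n clawql -f <your-values-file>.yaml`.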
Recommended values patterns
Minimal local testing
helm upgrade --install clawql ./charts/clawql-mcp -n clawql --create-namespace \
--set nats.enabled=true
Durable single-cluster production baseline
helm upgrade --install clawql ./charts/clawql-mcp -n clawql --create-namespace \
--set nats.enabled=true \
--set nats.persistence.enabled=true \
--set nats.persistence.size=50Gi \
--set nats.jetStream.maxMemoryStore=512Mi \
--set nats.jetStream.maxFileStore=40Gi
External/shared NATS cluster
helm upgrade --install clawql ./charts/clawql-mcp -n clawql --create-namespace \
--set nats.enabled=false \
--set-string nats.url='nats://nats.shared.svc.cluster.local:4222'
Subject taxonomy suggestion
Use stable subjects so multiple components can interoperate:
- `clawql.workflow.*` — workflow lifecycle and checkpoints
- `clawql.agent.*` — agent state, assignment, and handoffs
- `clawql.document.*` — document pipeline events
- `clawql.edge.*` — edge worker join/leave/status/completion
If you need tenant isolation, append namespace/tenant segments (for example clawql.workflow.team-a.*).
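To illustrate how these subjects interact with NATS wildcard semantics (`*` matches exactly one dot-separated token; `>` matches one or more trailing tokens), here is a minimal matcher sketch. `subject_matches` is a hypothetical helper for reasoning about the taxonomy, not a ClawQL or NATS client API:

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Check a concrete subject against a NATS-style pattern.

    '*' matches exactly one token; '>' (only valid as the final token)
    matches one or more remaining tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be last and must cover at least one token
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)
```

Note that a tenant-scoped subject like `clawql.workflow.team-a.checkpoint` no longer matches the flat `clawql.workflow.*` pattern — consumers must subscribe to `clawql.workflow.>` (or a tenant-specific pattern) once tenant segments are appended.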
Verify after rollout
kubectl -n clawql get deploy,svc | rg nats
kubectl -n clawql logs deploy/clawql-mcp-http-nats
kubectl -n clawql port-forward svc/clawql-mcp-http-nats 8222:8222
curl -s http://127.0.0.1:8222/healthz
Additional checks:
# Confirm ClawQL got NATS env vars
kubectl -n clawql get deploy clawql-mcp-http -o yaml | rg "CLAWQL_NATS_URL|CLAWQL_NATS_JETSTREAM" -n
# Show rendered chart resources before apply
helm template test ./charts/clawql-mcp -n clawql --set nats.enabled=true | rg "nats|jetstream" -n
Operations notes
- Keep `nats.service.type=ClusterIP` unless you explicitly need external client access.
- Enable persistence for any environment where replay/recovery matters.
- Size `maxFileStore` below actual PV capacity to leave filesystem headroom.
- Expose the monitor port (`8222`) to internal Prometheus scrape only, not public ingress.
- Back up PVC snapshots based on your RPO if streams are compliance-relevant.
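The headroom advice can be expressed as a small sizing helper. A sketch — the 20% default is an assumption for illustration, not a chart-enforced ratio:

```python
def suggested_max_file_store(pv_gi: int, headroom: float = 0.20) -> int:
    """Suggest a JetStream maxFileStore (in Gi) that leaves filesystem
    headroom below the PV capacity. 20% headroom is an illustrative
    default, not a value the chart enforces."""
    if not 0 < headroom < 1:
        raise ValueError("headroom must be a fraction between 0 and 1")
    return int(pv_gi * (1 - headroom))
```

With a 50Gi PV this yields 40Gi, matching the production baseline example above.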
Troubleshooting
NATS pods are up, but ClawQL is not publishing/consuming
- Verify `CLAWQL_NATS_URL` in the `clawql-mcp-http` env.
- Confirm DNS/service reachability from the pod:
kubectl -n clawql exec deploy/clawql-mcp-http -- sh -lc 'nc -vz clawql-mcp-http-nats 4222'
- Check for network policies denying pod-to-service traffic.
JetStream appears disabled
- Confirm `nats.jetStream.enabled=true`.
- Check the ConfigMap contents:
kubectl -n clawql get cm clawql-mcp-http-nats-config -o yaml
- Review NATS startup logs for config parse errors.
Stream storage fills too quickly
- Increase the PVC size and `nats.jetStream.maxFileStore`.
- Tighten stream retention/consumer ACK policies at the app layer.
- Add stream compaction/archival policy in your worker stack.
If you use an external NATS cluster, keep nats.enabled=false and set only nats.url.
