# Benchmarks
The ClawQL README highlights planning-context comparisons: the byte size of merged on-disk specs versus offline workflow outputs from multi-step search scenarios. These numbers illustrate how expensive it would be to paste entire OpenAPI corpora into a conversation thread; they are estimates, not guaranteed line items on a provider dashboard.
## What the highlights measure
- All-providers complex release-stack — broad scenario across Google top50 plus Bitbucket, Cloudflare, GitHub, Jira, n8n, Sentry, Slack. Artifacts: `docs/workflow-complex-release-stack-latest.json`, stats under `docs/benchmarks/all-providers-complex-workflow/`.
- Default multi-provider — Google top50 + Cloudflare + GitHub (older artifacts may reflect prior defaults). Artifacts: `docs/workflow-multi-provider-latest.json`, stats under `docs/benchmarks/multi-provider-complex-workflow/`.
Token estimates in docs often use ~4 characters per token as a heuristic.
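As a minimal sketch of that heuristic (the `estimateTokens` helper is hypothetical, not part of ClawQL):

```javascript
// Rough token estimate using the docs' ~4 characters-per-token heuristic.
// Real tokenizers vary by model; treat this as an order-of-magnitude guide.
function estimateTokens(text, charsPerToken = 4) {
  return Math.ceil(text.length / charsPerToken);
}

console.log(estimateTokens("a".repeat(4000))); // 1000
```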
## Reproduce
- Phase 1 / Phase 2 token repro: `npm run benchmark:tokens` → `docs/benchmarks/latest.json` and `latest.md`. Full steps: `docs/benchmarks/REPRODUCE.md`.
## Normal usage vs pasted specs
Under typical ClawQL usage, specs stay inside the server; the model issues `search` / `execute` calls and receives bounded tool outputs. Total billed tokens depend on model, history, and response sizes — see the README section "Planning-context numbers vs your API bill" for careful interpretation.
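To illustrate the difference this makes, here is a sketch comparing a pasted spec corpus against a bounded tool output. The byte sizes below are made-up placeholders, not the published numbers; the real figures live in the `docs/benchmarks/` stats files.

```javascript
// Hypothetical sizes: a merged multi-provider OpenAPI corpus pasted into a
// thread vs one bounded search/execute result returned by the server.
const pastedSpecBytes = 2_500_000; // placeholder, not a measured value
const toolOutputBytes = 6_000;     // placeholder, not a measured value

const CHARS_PER_TOKEN = 4; // heuristic from the docs
const toTokens = (bytes) => Math.ceil(bytes / CHARS_PER_TOKEN);

console.log(toTokens(pastedSpecBytes)); // 625000
console.log(toTokens(toolOutputBytes)); // 1500
```

The point of the sketch is the ratio, not the absolute numbers: keeping specs server-side replaces a corpus-sized planning context with a handful of small tool results per step.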
