Benchmarks
The ClawQL README highlights planning-context comparisons: the byte size of merged on-disk specs versus offline workflow outputs from multi-step search scenarios. These numbers illustrate how expensive it would be to paste entire OpenAPI corpora into a thread—not a guaranteed line item on a provider dashboard.
What the highlights measure
- All-providers complex release-stack — broad scenario across bundled Google Cloud APIs plus Bitbucket, Cloudflare, GitHub, Jira, n8n, Sentry, Slack. Artifacts:
docs/workflows/workflow-complex-release-stack-latest.json, stats underdocs/benchmarks/all-providers-complex-workflow/. - Default multi-provider — Google Cloud (bundled) + Cloudflare + GitHub + Slack + Paperless + Stirling + Tika + Gotenberg (older artifacts or blog copy may still describe older naming). Artifacts:
docs/workflows/workflow-multi-provider-latest.json, stats underdocs/benchmarks/multi-provider-complex-workflow/.
Token estimates in docs often use ~4 characters per token as a heuristic.
Reproduce
- Phase 1 / Phase 2 token repro:
npm run benchmark:tokens→docs/benchmarks/latest.jsonandlatest.md. Full steps:docs/benchmarks/REPRODUCE.md.
Normal usage vs pasted specs
Under typical ClawQL usage, specs stay inside the server; the model issues search / execute and receives bounded tool outputs. Total billed tokens depend on model, history, and response sizes—see the README section “Planning-context numbers vs your API bill” for careful interpretation.
