
ClawQL — Master Enablement Document

Unified Living Reference & Technical Bible — May 2026 Edition Public Document · Open for Community Review & Contribution · Apache 2.0 / MIT

Canonical vision. This file is the authoritative product and architecture reference for ClawQL. Older companions, clawql-modularization-v2.md (modularization + gateway notes) and clawql-modularization.md (v1.9 package matrix), may lag this document; when they disagree, this document takes precedence. For what ships in main today, pair with docs/clawql-ecosystem.md, docs/mcp/mcp-tools.md, and docs/readme/configuration.md.


Document Control

| Field | Value |
| --- | --- |
| Version | 2026.05 |
| Status | Living Document |
| Last Updated | May 15, 2026 |
| License | Apache 2.0 (core); MIT (clawql-pageindex); CC-BY-SA 4.0 (documentation) |

Version History

| Version | Summary |
| --- | --- |
| 2026.05 | Consolidated public master reference. Merged vision, architecture, deployment tiers, and compliance framework. |
| 2026.04 | Initial modular Effect-TS architecture baseline. |
| Earlier | Internal vision documents (April 2026). |

⚠️ Current Platform Status

Read this before anything else.

ClawQL is under active development. This document describes the target modular platform; today most capability lives in the monolithic clawql-mcp npm package while packages are extracted (#306).

Implementation today vs target architecture

| Topic | Today (main) | Target (this document) |
| --- | --- | --- |
| Code layout | TypeScript monorepo; clawql-mcp + clawql-ouroboros + mcp-grpc-transport | Turborepo packages (clawql-core, clawql-api, …) composed with Effect-TS Layers (§6) |
| MCP tools | search, execute, Core audit / cache; optional flags for memory, documents, sandbox, ouroboros, schedule, notify | Same search() / execute() surface via clawql-api; optional single-tool host profiles are not required |
| Ground truth for env/tools | docs/mcp/mcp-tools.md, docs/readme/configuration.md | Package boundaries and operator CRD in §4–§13 |

Package extraction status (modularization epic #306)

| Package | As standalone npm package |
| --- | --- |
| clawql-core | 🔨 In development |
| clawql-api | 🔨 In development |
| clawql-auth | 🔨 In development |
| clawql-documents | 🔨 In development |
| clawql-memory | 🔨 In development |
| clawql-pageindex | 🔨 In development |
| clawql-mcp | ✅ Shipped |
| clawql-ouroboros | ✅ Shipped |
| mcp-grpc-transport | ✅ Shipped |
| clawql-data | 📋 Planned |
| clawql-automation | 📋 Planned |
| clawql-telemetry | 📋 Planned |
| clawql-sandbox | 📋 Planned |
| clawql-printingpress | 📋 Planned |
| clawql-goose | 📋 Planned |
| clawql-lending | 📋 Planned |
| clawql-legal | 📋 Planned |
| clawql-healthcare | 📋 Planned |
| clawql-insurance | 📋 Planned |
| clawql-supplychain | 📋 Planned |
| clawql-government | 📋 Planned |
| clawql-manufacturing | 📋 Planned |
| clawql-education | 📋 Planned |
| clawql-engineering | 📋 Planned |
| clawql-blockchain | 📋 Planned |
| Kubernetes Operator | 📋 Planned |
| Natural Language Dashboard | 📋 Planned |

Capability in clawql-mcp today (may run before packages split)

| Capability | In clawql-mcp today | Notes |
| --- | --- | --- |
| Core search / execute | ✅ Always on | OpenAPI / Discovery + optional native GraphQL/gRPC |
| audit / cache | ✅ Always on | In-process; not separate packages |
| Vault memory | ✅ Default on | memory_ingest / memory_recall; CLAWQL_ENABLE_MEMORY=0 to hide |
| Document stack | ✅ Default on | Bundled providers + ingest_external_knowledge; CLAWQL_ENABLE_DOCUMENTS=0 |
| Automation | 🔶 Opt-in | schedule, notify behind CLAWQL_ENABLE_* |
| Sandbox | 🔶 Opt-in | sandbox_exec behind CLAWQL_ENABLE_SANDBOX |
| Ouroboros MCP | 🔶 Opt-in | ouroboros_* behind CLAWQL_ENABLE_OUROBOROS |
| Telemetry | 🔶 Partial | /metrics, Grafana dashboard JSON; full operator sidecar model planned |
| Printing Press / Goose | 📋 Planned | Specified in §14; not MCP tools today |
| Industry verticals | 📋 Planned | No clawql-lending etc. packages yet |

This document specifies the intended complete design. Implementation is phased and demand-driven; no fixed delivery dates are set.


Intended Audience & How to Use This Document

| Audience | Start here |
| --- | --- |
| Quick evaluators | §2, §4, §17 |
| Developers & contributors | §5, §6, §7, §19 |
| Platform operators | §13, §14, §16 |
| Architects | §6, §7, §8, §9 |
| Compliance & legal teams | §12, §11, §18 |

Cross-references are provided throughout. All YAML, schemas, and code examples are written against the reference implementation.



Table of Contents

  1. Executive Summary & Vision
  2. Core Principles
  3. Deployment Tiers & Resource Profiles
  4. Complete Package Ecosystem
  5. Architecture & Dependency Graph
  6. Effect-TS Foundation & Registry System
  7. Data & Infrastructure Stack
  8. Intelligent MCP Gateway (clawql-api)
  9. Horizontal Layers
  10. Vertical Workflows
  11. Security: Defense-in-Depth
  12. Kubernetes Operator & ClawQLInstance CRD
  13. Natural Language Interface & Dashboard
  14. Agent Runtime, Tool Generation & Self-Improvement
  15. Testing, Observability & Operations
  16. Deployment Guides & Quick-Starts
  17. Regulatory & Compliance Readiness
  18. Versioning, Contribution & Ecosystem
  19. Appendices

1. Executive Summary & Vision

1.1 What ClawQL Is

ClawQL is a single intelligent MCP gateway and modular orchestration platform. It lets any agent securely search, reason over, and execute across documents, persistent hierarchical memory, structured data, workflows, and optional on-chain actions — all through one unified, hardened surface with full Defense-in-Depth, natural language operations, and true opt-in modularity.

1.2 Vision Evolution

The original April 2026 concept centred on a unified MCP server as a natural-language gateway to documents and a knowledge graph. By May 2026 this evolved into a rigorous, production-hardened architecture:

  • Effect-TS as the foundational layer for typed, composable, resource-safe pipelines
  • Acyclic modular monorepo with clean Effect Layers
  • clawql-api as the sole intelligent gateway
  • Horizontal layers providing shared capabilities
  • 10+ opt-in vertical packages with zero runtime footprint when disabled
  • Full Defense-in-Depth integration (Kata Containers, Presidio redaction, Merkle auditing, ATR enforcement)
  • Kubernetes Operator with natural-language reconciliation (planned)
  • Persistent-first design including Goose runtime abstraction and Printing Press tool generation (planned)

1.3 Two Forks, One Core, Zero Drift

ClawQL maintains a single core codebase with two supported forks:

  • Public / Web3 Fork (ClawQL-Web3): Includes x402 micropayments, ERC-4337 agent wallets, The Graph, Chainlink, and open community verticals.
  • Regulated Enterprise Fork (ClawQL-MCP / SeeTheGreens): Enhanced compliance controls, full audit provenance, Hyperledger Fabric toggle, and flagship regulated workflows (e.g., lending LOS).

Shared assets make up 95%+ of code. Both forks use the same security baseline, Helm chart, and Operator.

1.4 The Problems ClawQL Solves

Fragmented tooling. Agent systems today bolt together disconnected document stores, vector DBs, workflow engines, and MCP servers with no coherent contract between them. ClawQL provides a single, consistent surface.

Context window explosion. Naively feeding documents into LLM context is expensive and often impossible at scale. ClawQL's PageIndex, GraphQL projection, and token-budgeted recall solve this structurally.

Institutional memory loss. Agent state and generated artefacts vanish on pod restart. ClawQL's persistent-first design — Merkle-rooted, bind-mounted, Memory 2.0-indexed — ensures nothing is lost.

Document intelligence gaps. Most platforms treat documents as raw text. ClawQL runs a full Tika → Gotenberg → Stirling-PDF → Presidio → Paperless pipeline with per-stage auditing.

Regulatory and provenance shortfalls. Regulated industries need chain-of-custody, redaction, and WORM audit trails, not just logs. ClawQL makes this the default, not an afterthought.

Production hardening deficits. LLM tooling is commonly research-grade. ClawQL's circuit breakers, chaos-tested failure isolation, Kata Containers, and ATR enforcement target genuine enterprise readiness.

1.5 Key Differentiators

| Comparison | How ClawQL differs |
| --- | --- |
| vs. base Goose | Adds persistent memory, document intelligence, compliance controls, and a unified MCP surface. Goose is the execution runtime; ClawQL is the full platform. |
| vs. Stripe Minions | ClawQL is self-hostable, open-source, and regulation-first. Minions targets cloud-native payment workflows; ClawQL targets any regulated domain. |
| vs. Hermes / OpenClaw | Hermes/OpenClaw are messaging and supervisor layers. ClawQL embeds them as components within a larger orchestration platform. |
| vs. LangGraph / Semantic Kernel | ClawQL is not an agent framework; it is the infrastructure layer beneath agents. It provides the tools those frameworks call, not the reasoning logic. |
| vs. generic MCP servers | Generic MCP servers are point integrations. ClawQL is a composable, audited, compliance-aware platform that hosts and manages many MCP surfaces under one gateway. |

1.6 Ultimate Goal

ClawQL enables any organisation to run production-grade autonomous agent swarms that maintain perfect long-term memory, operate under strict compliance boundaries, self-heal under load, evolve their own tool surface through on-demand Printing Press generation, and remain fully auditable, privacy-first, and insurable against AI operational risks.


2. Core Principles

These principles are non-negotiable. They are enforced across all packages, the Kubernetes Operator, CI pipelines, admission webhooks, and Effect-TS layer composition.

2.1 Natural Language as the Primary Interface

Every operation — configuration, scaling, debugging, governance, tool generation, vertical enablement, and workflow execution — must be reachable via natural language through the Hermes/OpenClaw Agent Chat or the Dashboard. Direct YAML edits or kubectl commands are emergency-only. The NL-to-tool-call pipeline (§13) translates intent into audited clawql-api.execute() calls.

2.2 Single Intelligent MCP Surface

All agent, human, and system interaction occurs exclusively through clawql-api via unified search() and execute() methods. No direct backend access is permitted. clawql-api owns intelligent routing, ATR enforcement, Presidio redaction, token optimisation via GraphQL projection, Merkle auditing, and dynamic tool registration.

2.3 Effect-TS as the Foundational Effect System

The entire platform is built on Effect-TS for type safety, composable Layers, structured concurrency with Fibers, typed errors, streaming pipelines, and resource management. Traditional async/await and OOP decorators are avoided.

2.4 Strict Separation of Concerns and Acyclic Dependency Graph

  • Primitives live in clawql-core.
  • Universal access lives in clawql-api.
  • Horizontal capabilities sit below verticals.
  • No vertical may import another vertical.
  • All cross-communication routes through clawql-api or gated Memory 2.0 recall.

The graph is enforced by ESLint rules, TypeScript project references, Turborepo, Effect-TS Layer validation, and CI checks.

2.5 Zero Operational Burden via Kubernetes Operator

The Operator performs continuous reconciliation, self-healing, secret rotation, volume management, dynamic MCP registration, and Effect Layer composition. Natural language commands are translated into safe CRD mutations. Day-to-day operations require no manual YAML or kubectl intervention.

2.6 Defense-in-Depth as Non-Negotiable Baseline

Security is implemented in overlapping layers under a "secure the capabilities, assume breach" mindset:

  • Kata Containers / gVisor runtime isolation
  • Panguard MCP proxy with real-time ATR enforcement
  • Presidio redaction at every data boundary (always before Merkle rooting)
  • Merkle trees + Cuckoo filters for immutable auditing
  • ATR claims with vertical RLS and cross-vertical gating
  • WORM audit tables and cryptographic erasure for GDPR

No component may bypass these controls.

2.7 True Opt-in Modularity with Zero Runtime Footprint

All vertical and non-essential horizontal packages are disabled by default. When disabled, they contribute zero runtime code, zero bundle size (tree-shaking), and zero Docker layers. This guarantee is enforceable at both compile time and deployment time.

2.8 Persistent-First Design

Nothing is ephemeral by default. Goose task outputs, Printing Press-generated binaries, document artefacts, and Memory 2.0 graph nodes all persist with full Merkle provenance. Bind-mounted volumes and Memory 2.0 ingestion ensure generated tools and agent state survive pod restarts and cluster upgrades.

2.9 Open, Community-Aligned, and Extensible

clawql-pageindex ships as a completely standalone MIT package with zero ClawQL dependencies. All core interfaces are fully documented. Goose integration uses a thin abstraction so alternative runtimes can be swapped without core changes. Community verticals follow a documented 12-step contribution checklist.

2.10 GDPR + WORM Compliance via Cryptographic Erasure

Personal data is encrypted at ingest with per-subject keys in Vault. Presidio redaction runs before Merkle rooting. WORM audit tables store only non-personal metadata and roots. On erasure request, the subject's key is destroyed, rendering personal data irrecoverable while preserving immutable audit integrity.
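The erasure flow described above can be sketched as follows. This is a minimal illustration, not the ClawQL implementation: an in-memory `Map` stands in for Vault, AES-256-GCM is an assumed cipher choice, and the names `encryptForSubject`, `decryptForSubject`, and `eraseSubject` are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// In-memory stand-in for the per-subject key store (assumption: real keys live in Vault).
const subjectKeys = new Map<string, Buffer>();

interface Sealed {
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

// Encrypts personal data under the subject's own key, creating the key on first use.
export function encryptForSubject(subjectId: string, plaintext: string): Sealed {
  let key = subjectKeys.get(subjectId);
  if (!key) {
    key = randomBytes(32);
    subjectKeys.set(subjectId, key);
  }
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Returns the plaintext, or undefined once the subject's key has been destroyed.
export function decryptForSubject(subjectId: string, sealed: Sealed): string | undefined {
  const key = subjectKeys.get(subjectId);
  if (!key) return undefined;
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}

// Erasure request: zero and drop the key. The ciphertext can stay where it is.
export function eraseSubject(subjectId: string): void {
  subjectKeys.get(subjectId)?.fill(0);
  subjectKeys.delete(subjectId);
}
```

Because the audit tables hold only non-personal metadata and roots, destroying the key leaves them untouched while the encrypted payload becomes unreadable.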


3. Deployment Tiers & Resource Profiles

ClawQL runs from a laptop to a large-scale enterprise cluster on the same codebase. The three tiers help operators right-size infrastructure.

Note on resource figures. The RAM numbers below are measured idle baselines on development hardware (modern x86-64, 2024-era Linux kernel). Active workloads — especially document processing (Tika, Presidio) and Goose tasks — will consume substantially more. Treat these as planning minimums, not ceilings. They will be updated as benchmarks on representative hardware are completed.

3.1 Tier 1 — Local Developer

Purpose: Local development, prototyping, personal use, evaluation.

Runtime: Docker Compose (preferred) or single-node k3s. No Operator required.

Included:

  • clawql-api (single replica)
  • clawql-memory with SQLite backend
  • clawql-pageindex (embedded)
  • clawql-auth in explicit noAuth mode
  • Paperless-ngx, Apache Tika, Gotenberg
  • Redis (Paperless broker only)

Not included: Sandbox/Kata, Presidio (optional), NATS JetStream, vertical packages, Printing Press, Goose, full Operator.

Approximate idle RAM: ~880 MB

Recommended hardware: 4 GB RAM, 2 CPU cores, 40 GB SSD

3.2 Tier 2 — Standard Self-Hosted

Purpose: Team use, early production, single-vertical workloads.

Runtime: 3-node Kubernetes cluster (k3s or kubeadm). Operator enabled.

Approximate baseline RAM (no active Goose pods): ~5.5 GB

Recommended hardware: 3 × (4-core, 8 GB RAM) nodes, or one 16 GB / 8-core VM for non-HA.

3.3 Tier 3 — Enterprise Production

Purpose: Multi-tenant, multi-vertical, regulated workloads at scale.

Runtime: 5+ node Kubernetes cluster with dedicated node pools, Kata Containers, Istio service mesh.

Approximate baseline RAM (no active Goose pods): ~20–28 GB

Recommended hardware: 5+ × (8-core, 32 GB RAM) nodes with fast NVMe and 10 Gbps networking.

3.4 Per-Component Resource Reference

These figures are measured at idle under minimal load. "Active" figures are approximate and vary widely with document size, concurrency, and model complexity.

| Component | Idle RAM | Active RAM | CPU at idle / active |
| --- | --- | --- | --- |
| clawql-api | ~150 MB | 300–500 MB | Low / Medium |
| Postgres + TimescaleDB | ~512 MB | 1–2 GB | Low / Medium |
| Apache Tika | ~200 MB | 800 MB–1.5 GB | Low / High |
| Presidio | ~400 MB | 1.5–2 GB | Medium / High |
| Goose (active task) |  | 512 MB–1 GB | Medium / High |
| NATS JetStream | ~256 MB | 512 MB–2 GB | Low / Medium |

3.5 Cost Model

  • Software: Fully open-source (Apache 2.0 / MIT). No licensing fees.
  • Infrastructure: Pay only for hardware, storage, and cloud resources you provision.
  • Managed offering (optional): Includes Operator management, SLAs, compliance consulting, and priority support.

3.6 Tier Selection Decision Matrix

RequirementTier 1Tier 2Tier 3
Solo developer / prototyping
Team production (single vertical)
Multi-tenant / regulated
Kata / full isolationOptional
Goose + Printing Press at scaleLimited
Full vertical ecosystem1–2Unlimited

4. Complete Package Ecosystem

ClawQL is organised as a Turborepo-managed monorepo with strict layering. All packages follow the principles in §2.

4.1 Always-Enabled Packages

clawql-core 🔨 In development

Foundational types: EntityNode, Edge, ATRClaims, AuditEvent, PageIndexNode, RecallMode, SpecKind, ProviderSpec, Plugin. Also: Merkle utilities, Cuckoo filter, audit ring buffer with WORM semantics, structured error factories, cache helpers, ULID/Snowflake ID generation, normalizeOperationId utility, and base Effect-TS Layers and Schemas.

clawql-api 🔨 In development

Universal intelligent MCP gateway and primary product surface. Implements createApi() factory, unified search() and execute() surface, protocol-aware routing, GraphQL projection, dynamic tool registration, ATR enforcement, redaction hooks, Merkle auditing, circuit breakers, and multi-transport support.

4.2 Default-Enabled Horizontal Layers

| Package | Status | Responsibilities |
| --- | --- | --- |
| clawql-auth | 🔨 In development | Multi-mode authentication, RBAC/ABAC, vertical RLS, ATR claim enrichment, session management |
| clawql-documents | 🔨 In development | Complete document intelligence pipeline (Tika, Gotenberg, Stirling-PDF, Presidio, Paperless-ngx) |
| clawql-memory | 🔨 In development | Memory 2.0 hybrid system (vault + graph + PageIndex + Onyx) |
| clawql-pageindex | 🔨 In development | Standalone MIT package: vectorless hierarchical indexing |

4.3 Default-Disabled (Opt-In) Horizontal Layers

| Package | Status | Responsibilities |
| --- | --- | --- |
| clawql-data | 📋 Planned | Pluggable data providers (Valkey, Postgres, DuckDB, SeaweedFS, etc.) |
| clawql-automation | 📋 Planned | NATS JetStream scheduling, HITL gates, notifications, workflows |
| clawql-telemetry | 📋 Planned | OpenTelemetry + Prometheus (Operator-injected sidecar) |
| clawql-sandbox | 📋 Planned | Kata Containers / gVisor secure execution |
| clawql-printingpress | 📋 Planned | On-demand generation of signed Go CLIs and MCP servers |
| clawql-goose | 📋 Planned | Management of Goose agent runtimes |

4.4 Vertical Packages

All verticals are planned and not yet shipped. They are specified here so that contributors, evaluators, and regulated customers can understand the intended scope and begin integration planning.

| Package | Domain |
| --- | --- |
| clawql-lending | Mortgage, auto, BNPL, commercial underwriting (flagship regulated workflow) |
| clawql-blockchain | Hyperledger Fabric, Chainlink, The Graph, ERC-4337, x402 |
| clawql-legal | Contract intelligence, clause extraction, privilege redaction |
| clawql-healthcare | FHIR, HL7, DICOM, HIPAA de-identification |
| clawql-insurance | Claims processing, fraud detection |
| clawql-supplychain | BOL, customs, invoice matching, tariff compliance |
| clawql-government | Permitting, FOIA, procurement |
| clawql-manufacturing | Work orders, BOM, traceability |
| clawql-education | LMS, syllabus generation, FERPA |
| clawql-engineering | MATLAB/Simulink integration |

4.5 Already Shipped Packages

  • clawql-mcp
  • clawql-ouroboros
  • mcp-grpc-transport

4.6 Internal modules (inside clawql-core)

Not separate npm packages — live as modules in clawql-core per rearchitecture plan §2:

  • Merkle — tamper-evident roots (merkle-tree today)
  • Cuckoo — ingest deduplication / filters
  • Utils — shared primitives (normalizeOperationId, IDs, etc.)
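To make "tamper-evident roots" concrete, here is a minimal sketch of a Merkle root over audit records. It is not the clawql-core Merkle module: SHA-256, the duplicate-last-node rule for odd levels, and the name `merkleRoot` are all assumptions chosen for illustration, and a production tree would normally also domain-separate leaf and interior hashes.

```typescript
import { createHash } from "node:crypto";

// SHA-256 digest of a string or buffer.
const sha256 = (data: string | Buffer): Buffer =>
  createHash("sha256").update(data).digest();

// Computes a hex Merkle root over audit records.
// Assumption: records have already been redacted, so the root never commits to raw personal data.
export function merkleRoot(records: readonly string[]): string {
  if (records.length === 0) {
    return sha256("").toString("hex"); // illustrative convention for an empty log
  }
  let level: Buffer[] = records.map((record) => sha256(record));
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      const right = level[i + 1] ?? left; // odd count: pair the last node with itself
      next.push(sha256(Buffer.concat([left, right])));
    }
    level = next;
  }
  return level[0].toString("hex");
}
```

Changing any single record changes the root, which is what lets a stored root act as evidence that the underlying log has not been altered.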

5. Architecture & Dependency Graph

ClawQL enforces a strict, unidirectional, acyclic architecture for long-term maintainability, compile-time safety, and zero-footprint modularity.

5.1 Canonical Layering Order

Primitives (Effect-TS base)

   clawql-core

   clawql-api  (Intelligent MCP Gateway + Layer Composition Root)

Horizontal Layers
   ├── clawql-auth
   ├── clawql-documents
   ├── clawql-memory (+ clawql-pageindex)
   ├── clawql-data
   ├── clawql-automation
   ├── clawql-telemetry   (Operator-injected sidecar)
   ├── clawql-sandbox
   ├── clawql-printingpress
   └── clawql-goose

Vertical Packages  (all planned, none shipped)
   ├── clawql-lending
   ├── clawql-blockchain
   ├── clawql-legal
   ├── clawql-healthcare
   ├── clawql-insurance
   ├── clawql-supplychain
   ├── clawql-government
   ├── clawql-manufacturing
   ├── clawql-education
   └── clawql-engineering

(Community verticals)

All arrows represent allowed import directions only. No upward or cross-layer imports are permitted except through explicit Effect-TS Layers or clawql-api.

5.2 Full Acyclic Dependency Graph

   clawql-core  (merkle · cuckoo · utils modules)

                   (exports to dependents)

           ┌────────────┴────────────────┐
           │                             │
      clawql-api            clawql-pageindex (MIT standalone)

    ┌──────┼──────────────────────┐
    │      │                      │
clawql- clawql-             clawql-
 auth  documents             memory

    ┌──────┼──────────────────────┼──────────────────────┐
    │      │                      │                      │
clawql- clawql-            clawql-               clawql-
sandbox printingpress       goose              automation

                                            [NATS JetStream]
    ┌──────────────────────────────────────────────────────┐
    │                                                      │
[All Vertical Packages]                         clawql-telemetry

5.3 Strict Dependency Rules & Enforcement

Rules:

  • No vertical package may import another vertical. Cross-vertical communication must route through clawql-api.execute() or gated clawql-memory recall using cross_vertical mode.
  • Horizontal layers may not import other horizontal layers directly (except through clawql-api).
  • clawql-telemetry is never imported; it is injected as an OpenTelemetry sidecar by the Operator.
  • clawql-lending declares clawql-blockchain as an optional peer dependency only.
  • All packages other than clawql-pageindex import shared types and utilities exclusively from clawql-core.
  • clawql-pageindex has zero dependencies on any other ClawQL package.

Enforcement mechanisms:

  • ESLint no-restricted-imports + custom architecture rules
  • TypeScript project references (tsconfig.json)
  • Turborepo dependency graph validation
  • Effect-TS Layer composition (compile-time dependency checks)
  • CI pipeline (madge + architecture diagram diff detection)
  • Operator admission webhooks
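As a rough illustration of how the import rules above could be checked mechanically, the sketch below assigns each package a layer rank and allows an import only when the target sits strictly lower. The rank table, the function name `isImportAllowed`, and the decision to rank clawql-pageindex alongside clawql-core are assumptions; the optional peer dependency from clawql-lending to clawql-blockchain and composition through Effect Layers are deliberately not modelled.

```typescript
// Layer rank per package: a package may import only packages of strictly lower rank.
const rank: Record<string, number> = {
  "clawql-core": 0,
  "clawql-pageindex": 0, // assumption: same rank as core, so it can import no ClawQL package
  "clawql-api": 1,
  "clawql-auth": 2,
  "clawql-documents": 2,
  "clawql-memory": 2,
  "clawql-telemetry": 2,
  "clawql-lending": 3,
  "clawql-legal": 3,
  "clawql-blockchain": 3,
};

// Packages that are injected at deploy time and must never appear in an import.
const neverImported = new Set<string>(["clawql-telemetry"]);

export function isImportAllowed(from: string, to: string): boolean {
  const source = rank[from];
  const target = rank[to];
  if (source === undefined || target === undefined) return false; // unknown package
  if (neverImported.has(to)) return false;
  // Equal rank covers vertical-to-vertical and horizontal-to-horizontal imports.
  return target < source;
}
```

A check of this shape could run in CI over the edges reported by a dependency-graph tool, failing the build on the first disallowed edge.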

5.4 Plugin Interface

Every vertical and major horizontal package implements this interface from clawql-core:

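import type { Effect } from 'effect'

// ClawQLApi, ClawQLError, EntityNode, and the other referenced types are assumed to be in scope from clawql-core
export interface Plugin {
  readonly id: string
  readonly version: string
  readonly vertical?: string

  // Registration hook: the plugin adds its tools and specs to the gateway here
  onRegister(api: ClawQLApi): Effect.Effect<void, ClawQLError, ClawQLApi>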

  onIngestHook?(
    node: EntityNode,
    context: IngestContext,
  ): Effect.Effect<EntityNode, ClawQLError, ClawQLApi>

  onRecallFilter?(
    claims: ATRClaims,
    options: RecallOptions,
  ): Effect.Effect<RecallOptions, ClawQLError, ClawQLApi>

  onComplianceReport?(): Effect.Effect<ComplianceReport, ClawQLError, ClawQLApi>

  requiredSpecs?: ProviderSpec[]
  recommendedSpecs?: ProviderSpec[]

  onTeardown?(api: ClawQLApi): Effect.Effect<void, ClawQLError, ClawQLApi>
}

Registration occurs exclusively via Effect Layers.

5.5 ProviderSpec & Registry System

export interface ProviderSpec {
  kind: SpecKind // e.g. "postgres", "duckdb", "valkey", "fabric"
  id: string // unique within instance
  enabled: boolean
  secretRef?: string
  url?: string
  capabilities?: string[]
  options?: Record<string, unknown>
}

The registry performs compile-time and runtime validation of required providers for each vertical. Missing providers are rejected by CI and Operator admission webhooks before deployment.

5.6 Cross-Vertical Communication Rules & Data Lineage Model

  • All cross-vertical operations must use explicit cross_vertical ATR claims with a required purpose field.
  • Results are stamped with lineage metadata: { sourceVertical, recallPath, purposeClaim, atrSnapshotAtRecall, timestamp }.
  • Compliance Center provides traceability queries for all decisions influenced by cross-vertical data.
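The lineage stamp can be pictured as a small wrapper around each cross-vertical result. The sketch below is illustrative only: the field names come from the list above, but their types, the `CrossVerticalRequest` shape, and the `stampLineage` helper are assumptions.

```typescript
// Lineage metadata attached to every cross-vertical result (field types are assumed).
export interface LineageStamp {
  sourceVertical: string;
  recallPath: string[];
  purposeClaim: string;
  atrSnapshotAtRecall: Readonly<Record<string, unknown>>;
  timestamp: string; // ISO 8601
}

// Hypothetical request shape for a cross_vertical recall.
export interface CrossVerticalRequest {
  sourceVertical: string;
  recallPath: string[];
  purpose?: string;
  atrClaims: Record<string, unknown>;
}

// Wraps a result with its lineage; rejects requests that omit the required purpose.
export function stampLineage<T>(
  result: T,
  request: CrossVerticalRequest,
  now: Date = new Date(),
): { result: T; lineage: LineageStamp } {
  const purpose = request.purpose?.trim();
  if (!purpose) {
    throw new Error("cross_vertical recall requires a purpose");
  }
  return {
    result,
    lineage: {
      sourceVertical: request.sourceVertical,
      recallPath: [...request.recallPath],
      purposeClaim: purpose,
      // Shallow frozen copy, so later claim changes do not rewrite the recorded snapshot.
      atrSnapshotAtRecall: Object.freeze({ ...request.atrClaims }),
      timestamp: now.toISOString(),
    },
  };
}
```

Snapshotting the claims at recall time is what would let a later traceability query answer "under which permissions was this data read?" even after the session's claims have changed.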

6. Effect-TS Foundation & Registry System

Implementation plan (phased migration from today’s clawql-mcp monolith): docs/design/effect-ts-modularization-rearchitecture-plan.md — Effect-TS, Turborepo + modularization (#306), and the plugin/gateway model are one coordinated program. Locked choices: Turborepo with clawql-core first; Merkle/Cuckoo inside core; clawql-ouroboros Effect rewrite; Panguard as proxy Plugin; minors + deprecations (majors only when breaks are required).

Effect-TS is the architectural foundation of ClawQL, providing compile-time guarantees for dependencies, errors, resources, and concurrency across dozens of verticals and providers.

6.1 Rationale for Effect-TS

Effect-TS was selected for:

  • Full type safety for errors, resources, and dependencies
  • Composable Layers for declarative dependency injection
  • Structured concurrency with Fibers (prevents leaks and races)
  • Native streaming and backpressure support
  • Excellent testability with in-memory layer substitution
  • Mature ecosystem (Schema, Context, Match, etc.)

Many failures that would otherwise surface at runtime (unhandled error cases, missing dependencies, leaked resources) become compile-time errors. Tangled async/await chains and OOP decorators are avoided.

6.2 Layer Composition Patterns

All functionality is expressed as composable, lazy Effect Layers:

export const DataLayer = Layer.mergeAll(
  ValkeyLive,
  PostgresLive,
  DuckDBLive,
  SeaweedFSLive,
  ProviderRegistryLive,
)

const AppLayer = Layer.mergeAll(
  CoreLayer,
  DataLayer,
  AuthLayer,
  DocumentsLayer,
  MemoryLayer,
  ...enabledVerticalLayers,
)

The Operator dynamically composes only enabled layers at startup. Disabled packages are excluded entirely from the bundle and runtime.

6.3 Provider Registry & Compile-Time Validation

export const LendingVertical = defineVertical({
  name: 'lending',
  requiredProviders: [
    createProviderSpec({ kind: 'postgres', id: 'operational' }),
    createProviderSpec({ kind: 'duckdb', id: 'analytics' }),
  ],
  layer: LendingLayer,
})

Validation occurs in CI and via Operator admission webhooks before CRD application.

6.4 Vertical & Horizontal Registration Pattern

export const LendingLayer = Layer.effectDiscard(
  Effect.gen(function* () {
    // Resolve the gateway from context (Layer.effect takes an Effect, not a callback)
    const api = yield* ClawQLApi
    yield* api.registerPlugin({
      id: 'lending',
      version: '1.0.0',
      onRegister: (api) =>
        Effect.gen(function* () {
          yield* api.registerTools(lendingTools)
          yield* api.registerSpecs(lendingSpecs)
        }),
      requiredSpecs: lendingSpecs,
      onIngestHook: redactionAndMerkleHook,
    })
  }),
)

6.5 Zero Runtime Footprint for Disabled Packages

  • Helm CRD toggles control inclusion
  • Effect lazy loading + tree-shaking removes disabled code at build time
  • No unused Docker layers or CRDs
  • A minimal Tier 1 deployment (core + memory + documents) idles at roughly 880 MB, against a baseline of roughly 20–28 GB for a full Tier 3 regulated deployment (§3), on the exact same codebase

6.6 In-Memory Test Layers

const TestLayer = Layer.mergeAll(
  ValkeyTestLayer,
  PostgresTestLayer,
  DuckDBInMemoryLayer,
  LendingTestLayer,
)

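it('creates deal room', () =>
  // Run the Effect so the test awaits it; returning an unrun Effect would pass vacuously
  Effect.runPromise(
    Effect.gen(function* () {
      const result = yield* createDealRoom(input)
      expect(result).toBeDefined()
    }).pipe(Effect.provide(TestLayer)),
  ))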

No external services required for most integration tests.

6.7 Effect Pipelines in clawql-api

Every execute() call flows through a typed, observable pipeline:

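const execute = (action, input) =>
  Effect.gen(function* () {
    // Validate the caller's ATR claims and obtain a session
    const session = yield* validateATR(input.atr)
    // Panguard capability scoping for this action
    yield* Panguard.enforce(session, action)
    // Redact before anything reaches a provider
    const redacted = yield* Presidio.redact(input)
    // Select a provider, then execute through it
    const route = yield* router.select(action, redacted)
    const result = yield* route.provider.execute(redacted)
    // Fold the result into the Merkle audit trail
    yield* updateMerkle(result)
    return result
  }).pipe(Effect.provide(SecurityLayer))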

All steps are instrumented with OpenTelemetry traces.

6.8 Natural Language Dashboard as Effect Triggers

Dashboard and agent commands are translated into the same Effect pipelines. This guarantees perfect consistency between human operators and autonomous agents.


7. Data & Infrastructure Stack

ClawQL uses a tiered, open-source infrastructure stack whose licences are intended to permit commercial bundling. All components register as ProviderSpec objects into clawql-api via Effect-TS Layers.

7.1 Tiered Storage Architecture

Hot / In-Memory     → Valkey
Transactional       → Postgres + TimescaleDB + pgvector
Analytical          → DuckDB + SeaweedFS (S3-compatible)
Knowledge           → Onyx (semantic) + clawql-memory (hierarchical graph + PageIndex)

Every boundary applies Presidio redaction (where applicable) and Merkle auditing for full provenance.

7.2 Component Details

Hot / In-Memory — Valkey

  • Redis-protocol compatible, BSD 3-clause licensed
  • Caching, rate limiting, session management, feature stores, vector similarity search
  • Primary hot path for ATR claims and transient operation state

Embedded / Local — SQLite

  • Zero-config local memory and per-agent state
  • Default for Tier 1 developer deployments and edge scenarios

Transactional / OLTP — Postgres

  • Core operational data, users, sessions, graph store for Memory 2.0
  • TimescaleDB extension for temporal queries
  • pgvector for hybrid relational + vector search

Analytical / Lakehouse

  • DuckDB (MIT): Embedded columnar analytics, zero-ETL Parquet/S3/Iceberg queries, ML features
  • SeaweedFS (Apache 2.0): S3-compatible object storage for raw files, documents, and generated binaries
  • Iceberg tables for schema evolution and transactional lakehouse semantics

Streaming & Real-Time

  • NATS JetStream (Apache 2.0): Durable messaging, event backbone, and workflow triggers
  • Apache Flink (Apache 2.0): Real-time ETL, document pipeline materialisation, and sync to DuckDB/Onyx

Knowledge & Documents

  • Onyx: Enterprise semantic search with real-time Flink synchronisation (optional)
  • clawql-documents pipeline: Tika → Gotenberg → Stirling-PDF → Presidio → Paperless-ngx with hierarchy extraction for PageIndex

7.3 Optional Specialised Layers

  • CouchDB (Apache 2.0): Hyperledger Fabric state DB or edge sync
  • ClickHouse: High-concurrency analytics or advanced vector workloads

Both are fully optional via ClawQLInstance CRD toggles.

7.4 Observability & Telemetry

  • SigNoz (self-hosted, OpenTelemetry-native, ClickHouse backend)
  • Automatic Effect-TS instrumentation for all major operations
  • Per-vertical, per-provider, and per-workflow visibility
  • Privacy-first with zero-egress enforcement by default

7.5 Unified Provider Registration

const infrastructureLayer = Layer.mergeAll(
  ValkeyLayer.live({ url: config.valkeyUrl }),
  PostgresLayer.live({ secretRef: "postgres-uri" }),
  DuckDBLayer.live({ s3: seaweedfsSpec }),
  NATSJetStreamLayer.live({ servers: [...] }),
  // vertical and provider layers added dynamically
);

8. Intelligent MCP Gateway (clawql-api)

clawql-api is the single surface that all agents, humans, and systems interact with. It is the heart of ClawQL.

8.1 Unified Surface

// Natural language discovery across all registered tools and verticals
const results = await clawql.search('latest lending deals for client ABC123')

// Execution with full security, routing, and auditing
const outcome = await clawql.execute('lending.createDealRoom', input, {
  atr: sessionToken,
  projection: `dealRoom { id title amount status counterparty { name } }`,
})

8.2 Protocol-Aware Routing & GraphQL Projection

  • GraphQL supergraph for discovery (search()) and query-style operations
  • Protocol-native handlers for stateful/imperative protocols (Postgres, Redis, NATS, Fabric)
  • Automatic GraphQL field projection for token optimisation and precise data shaping

Projection drastically reduces token usage and prevents over-fetching of sensitive fields.
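The effect of projection can be shown with a small stand-alone function that keeps only the selected fields of a result. This is a sketch of the idea, not the clawql-api implementation: it takes the selection as a nested object rather than parsing a GraphQL selection string, and the names `Selection` and `project` are hypothetical.

```typescript
// Nested field selection: `true` keeps a field as-is, an object recurses into it.
export type Selection = { [field: string]: true | Selection };

// Returns a copy of `value` containing only the selected fields.
export function project(value: unknown, selection: Selection): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => project(item, selection));
  }
  if (value === null || typeof value !== "object") {
    return value; // scalars pass through unchanged
  }
  const source = value as Record<string, unknown>;
  const out: Record<string, unknown> = {};
  for (const [field, sub] of Object.entries(selection)) {
    if (!(field in source)) continue; // unselected or absent fields never reach the caller
    out[field] = sub === true ? source[field] : project(source[field], sub);
  }
  return out;
}

// Example: only id and counterparty.name survive; taxId and notes are dropped.
const dealRoom = {
  id: "d1",
  notes: "internal only",
  counterparty: { name: "Acme", taxId: "00-0000000" },
};
const projected = project(dealRoom, { id: true, counterparty: { name: true } });
console.log(JSON.stringify(projected)); // {"id":"d1","counterparty":{"name":"Acme"}}
```

Because unselected fields are never copied, the same mechanism that trims tokens also keeps sensitive fields out of the response unless they are explicitly requested.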

8.3 Pipeline Inside Every execute() Call

  1. ATR claim validation and enrichment
  2. Panguard MCP proxy enforcement (real-time capability scoping)
  3. Presidio redaction (vertical-aware policies)
  4. Intelligent provider selection based on capabilities and health
  5. Execution through the chosen protocol adapter
  6. Merkle root generation and audit logging
  7. Post-execution hooks (including Ouroboros feedback)

All steps are typed Effect pipelines with full OpenTelemetry tracing.

8.4 Dynamic MCP Registration & Circuit Breakers

  • Printing Press-generated tools and external MCP servers register dynamically
  • Incremental supergraph updates (no full rebuild on registration)
  • Every external tool is wrapped in a circuit breaker (5 failures → open for 30 seconds)
  • Health-check gating: tools must pass /healthz before registration
  • Conflict resolution and quarantine for operationId collisions
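The circuit-breaker rule above (5 failures, then open for 30 seconds) can be sketched as a small state machine. This is an illustrative sketch, not the shipped wrapper: the `CircuitBreaker` class, its method names, and the half-open behaviour (one trial call after the open window) are assumptions.

```typescript
// Hypothetical sketch: names and half-open behaviour are illustrative assumptions.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly openMs: number;
  private readonly now: () => number;

  // Defaults follow §8.4: 5 failures, open for 30 seconds. The clock is injectable for tests.
  constructor(failureThreshold = 5, openMs = 30_000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.openMs = openMs;
    this.now = now;
  }

  state(): BreakerState {
    if (this.openedAt === null) return "closed";
    return this.now() - this.openedAt >= this.openMs ? "half-open" : "open";
  }

  // Runs `call` unless the breaker is open, and records the outcome.
  run<T>(call: () => T): T {
    if (this.state() === "open") {
      throw new Error("circuit open: tool temporarily unavailable");
    }
    try {
      const result = call();
      this.failures = 0; // any success closes the breaker
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call in half-open, or reaching the threshold, (re)opens the breaker.
      if (this.state() === "half-open" || this.failures >= this.failureThreshold) {
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Injecting the clock keeps the breaker deterministic under test; a real wrapper would also need to handle asynchronous tool calls.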

8.5 Security Hooks at Every Step

| Phase | Controls |
| --- | --- |
| beforeExecute | ATR + Panguard + Presidio redaction |
| duringExecution | Real-time monitoring, resource limiting |
| afterExecute | Merkle update + WORM audit write |
| onError | Structured error with recovery options |

Verticals may register additional domain-specific hooks.

8.6 MCP Tool Surface & Vertical Registration

Tools are dynamically registered using normalised operation IDs (kind__provider__operation). Verticals register via Effect Layers.
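As an illustration of the `kind__provider__operation` shape, the sketch below builds an ID from three segments. The function name `normalizeOperationId` appears elsewhere in this document, but the signature and the normalisation rules here (lowercasing, collapsing non-alphanumeric runs to a single underscore) are assumptions, not the clawql-core behaviour.

```typescript
// Hypothetical sketch: the real normalizeOperationId in clawql-core may differ.
function normalizeSegment(segment: string): string {
  return segment
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse any non-alphanumeric run into one underscore
    .replace(/^_+|_+$/g, ""); // strip leading and trailing underscores
}

function normalizeOperationId(kind: string, provider: string, operation: string): string {
  const parts = [kind, provider, operation].map(normalizeSegment);
  if (parts.some((part) => part.length === 0)) {
    throw new Error("kind, provider, and operation must all be non-empty");
  }
  // Segments never contain "__" after normalisation, so the separator stays unambiguous.
  return parts.join("__");
}
```

Under these assumed rules, `normalizeOperationId("OpenAPI", "GitHub", "repos/list-for-user")` yields `openapi__github__repos_list_for_user`.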

8.7 Failure Modes & Resilience

| Failure | Behaviour |
| --- | --- |
| clawql-api pod restart | Supergraph rebuilt from persisted specs |
| Protocol adapter failure | Degraded status in supergraph |
| Circuit breaker open | Tool temporarily unavailable with automatic recovery |
| Schema conflict on registration | Old tool preserved; conflict flagged in dashboard and audit log |
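The schema-conflict behaviour (old tool preserved, newcomer flagged) might look like the following sketch. `ToolRegistry`, `RegisteredTool`, and the `source` field are hypothetical names used only for illustration; the actual registry also performs health checks and supergraph updates that are omitted here.

```typescript
// Hypothetical sketch: registry shape and the "source" field are assumptions.
interface RegisteredTool {
  operationId: string;
  source: string; // for example "bundled" or "printing-press"
}

class ToolRegistry {
  private readonly tools = new Map<string, RegisteredTool>();
  readonly quarantined: RegisteredTool[] = [];

  register(tool: RegisteredTool): "registered" | "quarantined" {
    const existing = this.tools.get(tool.operationId);
    if (existing && existing.source !== tool.source) {
      // Collision: the existing tool stays live and the newcomer is held for review.
      this.quarantined.push(tool);
      return "quarantined";
    }
    this.tools.set(tool.operationId, tool);
    return "registered";
  }

  get(operationId: string): RegisteredTool | undefined {
    return this.tools.get(operationId);
  }
}
```

Re-registration from the same source is treated as an update in this sketch; only collisions across sources are quarantined.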

9. Horizontal Layers

Horizontal layers provide shared foundational capabilities used by all verticals. They sit below vertical packages and are composed via Effect-TS Layers.

9.1 Authentication & Authorization (clawql-auth) 🔨 In development

  • Multi-mode authentication: noAuth (explicit flag only), apiKey, OIDC, SAML, OAuth2, LDAP
  • RBAC + ABAC policy engine with natural language policy updates
  • ATR claim enrichment pipeline (full schema in §11.2)
  • Vertical-specific Row-Level Security (RLS) enforcement
  • Session management with Vault dynamic secret injection and hardware token support (YubiKey)
  • Task-scoped token refresh for long-running Goose workloads

noAuth mode is rejected by Operator admission webhooks in any multi-tenant deployment.

9.2 Document Intelligence Pipeline (clawql-documents) 🔨 In development

Full end-to-end pipeline with failure isolation:

  1. Apache Tika — extraction for 1,000+ formats
  2. Gotenberg — reliable PDF/HTML/Office conversion
  3. Stirling-PDF — OCR, merge, split, visual redaction
  4. Presidio Analyzer — PII, financial, medical, and privilege redaction (always runs before Merkle rooting)
  5. Paperless-ngx — long-term archive with auto-tagging and Onyx sync

Key design decisions:

  • Hierarchy tree extraction feeds PageIndex
  • Per-stage Merkle roots
  • Failure isolation: partial results returned with stageErrors array
  • Presidio failure policy: block — ingest never proceeds with unredacted content
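The two failure rules above (isolate ordinary stage failures, but block on a Presidio failure) can be sketched as a simple stage runner. This is a synchronous illustration under stated assumptions: the `Stage` and `PipelineResult` shapes and the `runPipeline` name are hypothetical, and only the `stageErrors` field name comes from this document.

```typescript
// Hypothetical sketch of failure isolation with a blocking redaction stage.
type StageName = "tika" | "gotenberg" | "stirling" | "presidio" | "paperless";

interface Stage {
  name: StageName;
  run: (content: string) => string; // returns transformed content or throws
}

interface StageError {
  stage: StageName;
  message: string;
}

interface PipelineResult {
  content: string | null; // null when ingest was blocked
  blocked: boolean;
  stageErrors: StageError[];
}

function runPipeline(stages: Stage[], input: string): PipelineResult {
  let content = input;
  const stageErrors: StageError[] = [];
  for (const stage of stages) {
    try {
      content = stage.run(content);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      stageErrors.push({ stage: stage.name, message });
      if (stage.name === "presidio") {
        // Failure policy "block": never continue with unredacted content.
        return { content: null, blocked: true, stageErrors };
      }
      // Other stages are isolated: keep the last good content and continue.
    }
  }
  return { content, blocked: false, stageErrors };
}
```

A caller can distinguish a partial result (non-empty `stageErrors`, `blocked: false`) from a blocked ingest (`blocked: true`, no content).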

9.3 Memory 2.0 (clawql-memory) 🔨 In development

Hybrid persistent memory combining multiple storage models:

| Layer | Backend | Purpose |
| --- | --- | --- |
| Vault | Filesystem (Obsidian-style) | Raw document and note storage |
| Graph | Postgres / SQLite | Adjacency-list store with temporal edges |
| PageIndex | SQLite (default) | Vectorless hierarchical tree |
| Onyx | Optional, Flink-synced | Semantic search |

Recall Modes: vault, graph, pageindex, hybrid (default), onyx, fabric, cross_vertical (ATR-gated)

Ingest Pipeline (9 steps): LLM extraction → confidence thresholding (default 0.78) → Cuckoo deduplication → Presidio redaction → Merkle rooting → PageIndex insertion → Graph linking → Onyx sync → Ouroboros hooks

Performance targets (specified for Tier 2 hardware: 4-core, 8 GB RAM node; dataset: <250,000 nodes; network: LAN):

| Operation | Target |
| --- | --- |
| Single-hop recall (≤50 nodes) | < 50 ms p99 |
| Hybrid recall (≤250 nodes, 5 hops) | < 500 ms p99 |
| cross_vertical recall | < 1 second p99 |

These are design targets, not measured production results. Benchmarks on representative hardware will be published as the platform matures.

Pruning scheduler runs daily, enforcing maxGraphNodes (default 250,000).

9.4 PageIndex — Vectorless Hierarchical Indexing (clawql-pageindex) 🔨 In development

Standalone MIT package. Designed for structural navigation of documents (contracts, patient records, BOMs, syllabi).

Capabilities:

  • Vectorless tree construction and weighted traversal (BFS/DFS)
  • Token-budgeted content synthesis for LLM context windows
  • Multiple storage adapters (SQLite default)
  • Builder, traversal, and MCP hook interfaces

Fully functional without any other ClawQL dependency. Complements Onyx semantic search.
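To illustrate weighted breadth-first traversal with a token budget, here is a minimal sketch. The `PageNode` shape, the `synthesize` name, and the four-characters-per-token estimate are all assumptions made for the example; they are not the clawql-pageindex API.

```typescript
// Hypothetical sketch: node shape and token estimate are assumptions.
interface PageNode {
  title: string;
  text: string;
  weight: number; // higher weight is visited first among siblings
  children: PageNode[];
}

// Rough estimate of about 4 characters per token (an assumption, not a tokenizer).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Returns the titles of the nodes whose text fits in the budget, in visit order.
function synthesize(root: PageNode, tokenBudget: number): string[] {
  const picked: string[] = [];
  const queue: PageNode[] = [root];
  let used = 0;
  while (queue.length > 0) {
    const node = queue.shift() as PageNode;
    const cost = estimateTokens(node.text);
    if (used + cost <= tokenBudget) {
      picked.push(node.title);
      used += cost;
    }
    // Breadth-first: enqueue children, heaviest first. Oversized nodes are
    // skipped, but their children are still considered.
    queue.push(...[...node.children].sort((a, b) => b.weight - a.weight));
  }
  return picked;
}
```

No embeddings are involved: ordering comes only from tree structure and per-node weights, which is the sense in which the index is vectorless.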

9.5 Remaining Horizontal Layers

| Package | Status | Summary |
| --- | --- | --- |
| clawql-data | 📋 Planned | Unified provider lifecycle for Valkey, Postgres, DuckDB, SeaweedFS |
| clawql-automation | 📋 Planned | NATS JetStream scheduling, HITL gates, notifications, workflow blueprints |
| clawql-telemetry | 📋 Planned | OpenTelemetry + Prometheus + Grafana; injected as sidecar, never imported |
| clawql-sandbox | 📋 Planned | Kata/gVisor execution with persistent volumes and resource quotas |

10. Vertical Workflows

All verticals are planned and not yet shipped. This section specifies the intended design for contributor planning, regulated customer evaluation, and integration design. Implementation is demand-driven with no fixed dates.

Vertical packages extend ClawQL with domain-specific logic while maintaining strict architectural isolation. They are implemented as opt-in Effect-TS plugins.

10.1 Vertical Plugin Philosophy

Verticals are first-class Effect Layers that:

  • Implement the Plugin interface from clawql-core
  • Register domain-specific MCP tools via onRegister
  • Integrate with the shared clawql-documents pipeline and Memory 2.0
  • Declare required providers and compliance matrices
  • Are disabled by default with zero runtime footprint
  • Never import other verticals — all cross-vertical communication routes exclusively through clawql-api or gated cross_vertical recall

10.2 clawql-lending 📋 Planned — Flagship Vertical

Scope: Mortgage, auto, BNPL, payday, and commercial lending workflows.

Planned capabilities:

  • Deal room automation with document pipeline + Presidio redaction
  • Credit analysis and risk scoring (DuckDB + Flink)
  • Underwriting decision engine with DiGiFi plugins
  • Tokenised asset (RWA) issuance on Hyperledger Fabric (optional)
  • Investor reporting and compliance workflows

Flagship use case: SeeTheGreens LOS — full regulated lending platform with end-to-end provenance, ATR controls, and Merkle-rooted audit trails.

10.3 clawql-legal 📋 Planned

Scope: Contract intelligence and legal operations.

Planned tools: clause_extract, risk_flag, precedent_search, redact_privilege, timeline_generate, brief_draft, motion_draft, filing_validate, ethical_wall_check, chain_of_custody_export

Special controls: Privilege and ethical wall enforcement at graph traversal level; cold storage retention aligned with statute of limitations; strict cross-vertical restrictions with explicit purpose claims.

10.4 clawql-healthcare 📋 Planned

Scope: Clinical and administrative healthcare workflows.

Planned tools: fhir_parse, hl7_extract, dicom_analyze, ehr_structure, deidentify_phi, medication_reconcile, phi_erasure_request

Compliance features: HIPAA-sensitive Presidio models; patient-level partitioning in Memory 2.0; cryptographic erasure for GDPR/HIPAA right-to-be-forgotten.

10.5 clawql-insurance 📋 Planned

Scope: Policy and claims lifecycle.

Planned tools: claim_extract, policy_analyze, loss_run_reconcile, fraud_flag, underwriting_score, reserve_calculate

Special policies: Fraud pattern nodes retained indefinitely; NAIC model law compliance matrix.

10.6 clawql-supplychain 📋 Planned

Scope: Procurement-to-payment and trade compliance.

Planned tools: bol_extract, customs_validate, invoice_match, tariff_check, supplier_onboard, esg_compliance_scan

Integrates with ERP systems via OpenAPI/gRPC; OFAC/SDN screening at ingest.

10.7 clawql-government 📋 Planned

Scope: Federal, state, and local agency workflows.

Planned tools: permit_classify, foia_route, tax_form_extract, procurement_validate, audit_generate

Built with FedRAMP-ready defaults and classification level enforcement.

10.8 clawql-manufacturing 📋 Planned

Scope: Digital thread and traceability.

Planned tools: work_order_extract, bom_validate, qc_report_analyze; traceability query ("which finished goods contain lot 4821-B?")

Maintains full forward and backward traceability in Memory 2.0.

10.9 clawql-education 📋 Planned

Scope: Learning management and adaptive content.

Planned tools: syllabus_generate, rubric_create, adaptive_path_recommend; LMS connectors (Canvas, Moodle, Blackboard)

FERPA-compliant student record partitioning.

10.10 clawql-engineering 📋 Planned

Scope: MATLAB/Simulink integration for engineering teams.

Planned tools: matlab_script_execute, simulink_simulate, controls_bode_plot

Graceful degradation: falls back to Python (SciPy/Control) with equivalent code when MATLAB license is unavailable.

10.11 Community & Future Verticals

Any organisation can contribute new verticals using the standardised template and 12-step checklist (§18). New verticals are merged into the unified Helm chart with toggles.

10.12 Cross-Vertical Capabilities

All verticals automatically inherit:

  • Presidio redaction profiles
  • Merkle auditing and Cuckoo deduplication
  • ATR + vertical RLS enforcement
  • Compliance matrix registration

Cross-vertical recall requires explicit elevated ATR claims and stamps results with full data lineage for auditability.


11. Security: Defense-in-Depth

Security in ClawQL is a foundational, overlapping set of controls enforced at every layer. The platform assumes breach and follows one principle: secure the capabilities, not just the language. Containment over prevention.

11.1 Core Philosophy

  • Treat every agent action as potentially malicious
  • Enforce explicit, verified capabilities at runtime
  • Multiple independent layers must fail simultaneously for a violation to occur
  • Full auditability and recoverability are mandatory
  • Presidio redaction always runs before Merkle rooting
  • No component may bypass core security middleware in clawql-api

11.2 ATR Claims Schema

All requests carry enriched ATRClaims (Actor–Tenant–Role):

interface ATRClaims {
  actorId: string
  actorType: 'human' | 'agent' | 'service'
  sessionId: string
  issuedAt: number
  expiresAt: number

  tenantId: string
  tenantTier: 'local' | 'standard' | 'enterprise'

  roles: string[]
  scopes: string[]

  verticals: string[]
  crossVertical: boolean
  crossVerticalPurpose?: string

  memoryPrivileges: {
    read: boolean
    write: boolean
    crossVerticalRead: boolean
    pruneAccess: boolean
  }

  classificationLevel?: 'unclassified' | 'cui' | 'secret' | 'top_secret'
  minimumNecessary?: boolean
  purpose?: string

  requestId: string
}

Claims are JWT-encoded, verified at every layer, and immutable once issued.
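A recall authorisation check over these claims could be sketched as follows. Only the fields needed for the check are repeated; `authorizeRecall` and `Decision` are hypothetical names, timestamps are assumed to be seconds since the epoch (as in JWT NumericDate), and the rule that cross-vertical recall needs the flag, the read privilege, and a stated purpose is an interpretation of §10.12 rather than the confirmed policy.

```typescript
// Hypothetical sketch: only the ATRClaims fields needed for this check are repeated here.
interface ClaimsSubset {
  issuedAt: number; // assumed: seconds since epoch, as in JWT NumericDate
  expiresAt: number;
  verticals: string[];
  crossVertical: boolean;
  crossVerticalPurpose?: string;
  memoryPrivileges: { crossVerticalRead: boolean };
}

type Decision = { allowed: true } | { allowed: false; reason: string };

function authorizeRecall(claims: ClaimsSubset, targetVerticals: string[], nowSeconds: number): Decision {
  if (nowSeconds < claims.issuedAt || nowSeconds >= claims.expiresAt) {
    return { allowed: false, reason: "token not valid at this time" };
  }
  const missing = targetVerticals.filter((v) => !claims.verticals.includes(v));
  if (missing.length > 0) {
    return { allowed: false, reason: `no claim for vertical(s): ${missing.join(", ")}` };
  }
  if (targetVerticals.length > 1) {
    // Assumed rule: cross-vertical recall needs the flag, the read privilege, and a purpose.
    const elevated =
      claims.crossVertical &&
      claims.memoryPrivileges.crossVerticalRead &&
      Boolean(claims.crossVerticalPurpose);
    if (!elevated) {
      return { allowed: false, reason: "cross-vertical recall requires elevated claims" };
    }
  }
  return { allowed: true };
}
```

Returning a reason alongside the denial makes the decision straightforward to record in the audit trail.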

11.3 Supply Chain & Build Hardening

  • Trivy + OSV-Scanner + Syft SBOM generation on every build
  • Cosign keyless signing for all container images
  • Kyverno image verification policies in the cluster
  • Reproducible builds for Printing Press artefacts
  • Gitleaks + TruffleHog in CI and pre-commit hooks

11.4 Immutability, Merkle Auditing & Cuckoo Filter

  • Every write (document, memory node, Goose output, Printing Press binary) generates a SHA-256 Merkle root
  • Cuckoo filter provides O(1) probabilistic deduplication at ingest
  • Ring buffer (90 days default) + cold storage bridge for long-term roots
  • WORM audit tables (Postgres rules + SQLite triggers) prevent tampering
  • Legal hold mode locks roots from eviction
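For readers unfamiliar with Merkle roots, the sketch below computes a SHA-256 root over a list of string leaves using Node's built-in `crypto` module. The leaf encoding (UTF-8) and the handling of odd levels (duplicating the last node) are assumptions for illustration; this document does not specify ClawQL's tree construction.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: leaf encoding and odd-node handling are assumptions.
const sha256 = (data: Buffer): Buffer => createHash("sha256").update(data).digest();

function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) throw new Error("at least one leaf is required");
  let level: Buffer[] = leaves.map((leaf) => sha256(Buffer.from(leaf, "utf8")));
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i] as Buffer;
      const right = (level[i + 1] ?? left) as Buffer; // duplicate the last node on odd levels
      next.push(sha256(Buffer.concat([left, right])));
    }
    level = next;
  }
  return (level[0] as Buffer).toString("hex");
}
```

Duplicating the last node means `["a", "b", "c"]` and `["a", "b", "c", "c"]` share a root, so a production design would likely add leaf/node domain separation (as RFC 6962 does) or encode the leaf count.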

11.5 Zero-Trust Identity & Runtime Containment

  • Short-lived ATR JWT tokens with dynamic Vault secrets
  • Default runtime: Kata Containers (strong isolation) with gVisor fallback
  • Default-deny NetworkPolicy + Istio mTLS
  • Strict seccomp profiles and resource quotas in sandbox

11.6 Panguard MCP Proxy & Presidio Redaction

  • Panguard: Real-time MCP proxy (<50ms target) enforcing ATR scoping and prompt/response scanning
  • Presidio: Runs on every document and memory ingest path with vertical-specific models. Failure policy is block — unredacted content is never stored or processed

11.7 Effect Layer Security Hooks

| Phase | Controls |
| --- | --- |
| beforeExecute | ATR + Panguard + redaction |
| duringExecution | Real-time monitoring |
| afterExecute | Merkle update + WORM audit |
| Domain hooks | Vertical-specific rules |

11.8 GDPR Right-to-Erasure + WORM Compliance

The tension between the right to erasure and write-once audit storage is resolved via cryptographic erasure:

  1. Personal data encrypted at ingest with per-subject Vault keys
  2. Presidio redaction before Merkle rooting
  3. WORM tables store only metadata and roots
  4. Erasure request destroys the subject's key → data becomes permanently undecipherable while audit records remain intact
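The steps above can be sketched with AES-256-GCM from Node's built-in `crypto` module. This is an illustration only: an in-memory `Map` stands in for Vault, and the function names (`encryptForSubject`, `decryptForSubject`, `eraseSubject`) and the `Sealed` shape are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical sketch: an in-memory Map stands in for Vault; not a production key store.
const subjectKeys = new Map<string, Buffer>();

interface Sealed {
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

function encryptForSubject(subjectId: string, plaintext: string): Sealed {
  let key = subjectKeys.get(subjectId);
  if (!key) {
    key = randomBytes(32); // one AES-256 key per data subject
    subjectKeys.set(subjectId, key);
  }
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

function decryptForSubject(subjectId: string, sealed: Sealed): string {
  const key = subjectKeys.get(subjectId);
  if (!key) throw new Error("key destroyed: data is no longer recoverable");
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]).toString("utf8");
}

// Erasure destroys only the key; ciphertext and audit metadata can stay in place.
function eraseSubject(subjectId: string): boolean {
  return subjectKeys.delete(subjectId);
}
```

Because the write-once store only ever holds ciphertext, metadata, and roots, deleting the key satisfies the erasure request without mutating any audit record.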

11.9 Multi-Tenancy Isolation

Enforced at four layers:

| Layer | Mechanism |
| --- | --- |
| Network | Istio NetworkPolicies + dedicated namespaces |
| Data | tenantId filter in every graph traversal |
| Compute | Per-tenant sandbox pods |
| Encryption | Per-tenant keys at rest |

11.10 Observability, Incident Response & Recovery

  • SigNoz with automatic Effect-TS spans
  • Structured audit trails with Merkle roots
  • Automated alerts on ATR violations, Presidio failures, or Cuckoo overfill
  • Point-in-time recovery via snapshots
  • Immutable logs for forensic reconstruction

11.11 Security Deliverables Matrix

| Control | clawql-api | Verticals | Sandbox | Documents | Memory |
Kata/gVisor
Panguard Proxy
Presidio Redaction
Merkle Auditing
ATR + RLS
WORM Audit Tables

11.12 Threat Model Coverage

| Threat | Primary Mitigation |
| --- | --- |
| Prompt injection / tool misuse | Blocked by Panguard + ATR |
| Supply-chain attack | SBOM + Cosign |
| Data exfiltration | Redaction + egress controls |
| Privilege escalation | RLS + immutable claims |
| Audit tampering | Merkle + WORM |
| Cross-tenant data leakage | Multi-tenancy isolation (§11.9) |

12. Kubernetes Operator & ClawQLInstance CRD

📋 Planned — not yet shipped

The ClawQL Kubernetes Operator is the autonomic control plane for the platform. Written in Go using controller-runtime, it continuously reconciles ClawQLInstance custom resources, provisions dependent services, composes Effect-TS Layers, and translates natural language commands into safe configuration changes.

12.1 Operator Responsibilities

  • Full declarative reconciliation with exponential backoff and leader election
  • Dynamic Effect-TS Layer composition at API startup
  • Provisioning and scaling of document pipeline services
  • Management of persistent volumes for Printing Press artefacts and Goose state
  • Goose workload pool scaling (default 0 idle replicas)
  • Secret rotation, cert-manager integration, and Istio mTLS sidecar injection
  • RBAC RoleBindings and vertical RLS policy injection per enabled vertical
  • Validation and mutation admission webhooks
  • Natural language to CRD patch translation for Hermes/OpenClaw commands
  • Merkle root consistency verification and Cuckoo filter warm-up jobs
  • Status reporting with detailed .status.conditions[]

Reconciliation interval defaults to 15 seconds and is configurable.
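Exponential backoff on repeated reconciliation failures can be illustrated with a capped doubling schedule. The Operator itself is described as written in Go; the sketch is in TypeScript only to match the other examples in this document. The `backoffDelayMs` name, the 1-second base, and the 5-minute cap are assumptions, not Operator defaults.

```typescript
// Hypothetical sketch: base delay and cap are assumptions, not Operator defaults.
function backoffDelayMs(attempt: number, baseMs = 1_000, capMs = 300_000): number {
  if (!Number.isInteger(attempt) || attempt < 0) {
    throw new Error("attempt must be a non-negative integer");
  }
  // Double the delay on every consecutive failure, up to the cap.
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Delays for the first six consecutive failures: 1s, 2s, 4s, 8s, 16s, 32s.
const backoffSchedule = Array.from({ length: 6 }, (_, attempt) => backoffDelayMs(attempt));
```

Real controllers usually add random jitter to the delay so that many failing resources do not retry in lockstep.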

12.2 Full ClawQLInstance CRD Specification

apiVersion: clawql.io/v1alpha1
kind: ClawQLInstance
metadata:
  name: clawql-production
  namespace: clawql
spec:
  tier: enterprise # local | standard | enterprise

  api:
    enabled: true
    replicas: 3
    minReplicas: 2
    maxReplicas: 12
    expose:
      rest: true
      grpc: true
    mcp:
      stdio: true
      http: true
      grpc: true
    bundledProviders:
      - github
      - slack
      - paperless
      - tika
      - gotenberg
    circuitBreaker:
      failureThreshold: 5
      halfOpenProbeIntervalSeconds: 30

  auth:
    enabled: true
    mode: oidc # noAuth requires explicit flag + webhook check
    oidc:
      issuer: https://auth.example.com
      clientId: clawql
      clientSecretRef:
        name: clawql-oidc-secret
        key: clientSecret
    rbac: { enabled: true }
    abac: { enabled: true }
    verticalRLS: true
    multiTenantIsolation: true

  documents:
    enabled: true
    failureIsolation: true
    tika:
      enabled: true
      replicas: 3
    gotenberg:
      enabled: true
      replicas: 3
    stirling: { enabled: true }
    paperless:
      enabled: true
      secretRef: paperless-api-key
    presidio:
      enabled: true
      models: [pii, financial, medical, privilege]
      failurePolicy: block # never skip redaction
      redactBeforeMerkle: true

  memory:
    hybrid: { enabled: true }
    storage:
      backend: postgres
      postgres:
        secretRef: memory-db
    layers:
      vault: true
      graph: true
      pageindex: true
      onyx: true
    ingest:
      confidenceThreshold: 0.78
      presidioEnabled: true
      failureIsolation: true
    recall:
      defaultMode: hybrid
      maxHops: 5
      maxNodes: 250
      tokenBudget: 32000
    pruning:
      enabled: true
      schedule: '0 4 * * *'
      maxGraphNodes: 250000

  sandbox:
    enabled: true
    runtimeClass: kata # or gVisor
    persistentVolumes:
      - name: generated-tools
        mountPath: /opt/clawql/generated-tools
        storageClass: standard
        size: 100Gi
      - name: goose-state
        mountPath: /opt/clawql/goose
        storageClass: standard
        size: 50Gi

  goose:
    enabled: true
    replicas: 0 # default: scale from 0
    maxReplicas: 50
    image: block/goose:v2026.05
    memoryIngest: true
    blueprintSupport: true
    checkpointOnOOM: true

  printingpress:
    enabled: true
    factoryBinaryPath: /usr/local/bin/pp
    outputDir: /opt/clawql/generated-tools
    autoRegisterMcp: true
    autoIngestMemory: true
    binarySigningEnabled: true

  automation:
    enabled: true
    nats: { enabled: true }
    hitl: { enabled: true }

  telemetry:
    enabled: true
    zeroEgress: true

  # Vertical toggles — all planned, none shipped
  lending: { enabled: false }
  blockchain: { enabled: false }
  legal: { enabled: false }
  healthcare: { enabled: false }
  # ... additional verticals follow the same pattern

12.3 Reconciliation, Admission Webhooks & Self-Healing

The Operator:

  • Validates CRD changes against version compatibility and security policies
  • Rejects unsafe configurations (e.g., noAuth in multi-tenant clusters)
  • Performs rolling updates with readiness gates
  • Automatically rolls back to the last known-good state on repeated reconciliation failures
  • Supports natural language rollback commands ("roll back the last two changes")
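The rejection of unsafe configurations can be sketched as a pure validation function over the relevant part of the spec. Field names follow the CRD example in §12.2, but `validateSpec` and the exact rule set are hypothetical; in particular, treating any non-`block` Presidio failure policy as a violation is an assumption drawn from the "never skip redaction" comment. The real webhook would live in the Go Operator.

```typescript
// Hypothetical sketch: field names follow §12.2, but the rule set is illustrative.
interface InstanceSpec {
  auth: { mode: string; multiTenantIsolation: boolean };
  documents?: { presidio?: { failurePolicy?: string } };
}

// Returns a list of violations; an empty list means the spec is admitted.
function validateSpec(spec: InstanceSpec): string[] {
  const violations: string[] = [];
  if (spec.auth.mode === "noAuth" && spec.auth.multiTenantIsolation) {
    violations.push("auth.mode=noAuth is not allowed in a multi-tenant deployment");
  }
  const policy = spec.documents?.presidio?.failurePolicy;
  if (policy !== undefined && policy !== "block") {
    // Assumed rule, based on the "never skip redaction" comment in the CRD example.
    violations.push("documents.presidio.failurePolicy must be 'block'");
  }
  return violations;
}
```

Collecting every violation, rather than failing on the first one, lets the webhook return a single complete rejection message.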

12.4 Natural Language → CRD Translation

Hermes/OpenClaw commands such as:

  • "scale goose replicas to 12 during business hours"
  • "enable duckdb analytics on seaweedfs lake"
  • "activate healthcare with presidio medical models"

…are parsed, validated, translated into atomic CRD patches, and applied safely through the Operator.


13. Natural Language Interface & Dashboard

📋 Planned — not yet shipped

Natural language is the primary interface for all human and agent interaction with ClawQL.

13.1 NL-to-Tool-Call Pipeline

User / Agent Input (natural language)
          ↓
Hermes Supervisor  (LLM with dynamic tool catalog)
          ↓
clawql-api.search(query)  →  ranked tools + schemas
          ↓
Intent classification + parameter extraction
          ├── Valid     → clawql-api.execute(operationId, params)
          └── Ambiguous → clarification request
          ↓
Result formatting  (token-budgeted synthesis)
          ↓
Audit log + Merkle entry

The Hermes system prompt is assembled at runtime from current ATR claims, the live tool catalog, relevant Memory 2.0 recall, and static behavioural instructions.

13.2 Hermes Supervisor & OpenClaw Messaging Gateway

  • Hermes: Conversational supervisor responsible for intent parsing, tool selection, and multi-turn dialogue
  • OpenClaw: Stateless messaging gateway handling WebSocket connections, queuing during reconnects, typing indicators, and streaming responses
  • Session state lives entirely in clawql-memory
  • Full gRPC streaming support for low-latency responses

Hallucinated operationIds are rejected by clawql-api.execute() with structured TOOL_NOT_FOUND errors and suggestions. Circular tool calls are automatically detected and halted.
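One way to produce a structured `TOOL_NOT_FOUND` error with suggestions is to rank catalog entries by edit distance from the requested ID. Only the `TOOL_NOT_FOUND` code comes from this document; the error shape, the `resolveTool` name, and the use of Levenshtein distance for ranking are assumptions for illustration.

```typescript
// Hypothetical sketch: error shape and ranking method are assumptions.
function editDistance(a: string, b: string): number {
  // Levenshtein distance using a single rolling row.
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prevDiag = row[0] as number; // value for (i-1, j-1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j] as number; // value for (i-1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, (row[j - 1] as number) + 1, prevDiag + cost);
      prevDiag = above;
    }
  }
  return row[b.length] as number;
}

interface ToolNotFound {
  code: "TOOL_NOT_FOUND";
  operationId: string;
  suggestions: string[];
}

type Resolution = { ok: true; operationId: string } | ToolNotFound;

function resolveTool(operationId: string, catalog: string[]): Resolution {
  if (catalog.includes(operationId)) return { ok: true, operationId };
  // Suggest the three closest registered IDs.
  const suggestions = catalog
    .map((id) => ({ id, distance: editDistance(operationId, id) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, 3)
    .map((entry) => entry.id);
  return { code: "TOOL_NOT_FOUND", operationId, suggestions };
}
```

Returning suggestions gives the supervising model a cheap way to correct itself on the next turn instead of retrying the same invalid ID.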

13.3 Dashboard Pages & Capabilities

The ClawQL Dashboard uses only clawql-api.search() and clawql-api.execute() calls. All pages are fully agent-accessible.

| Page | Key capabilities |
| --- | --- |
| Memory Explorer | Vault browser, force-directed graph, PageIndex tree, hybrid recall tester, provenance chains, pruning editor |
| Documents Pipeline | Ingestion queue, drag-and-drop upload, per-stage Merkle logs, before/after redaction preview, quarantine management |
| Agents & Execution | Live Goose monitor, task queue, blueprint library, Printing Press tool catalog, sandbox job history, HITL approvals |
| Tools Registry | All MCP tools, operationId browser, schemas, usage examples, projected token costs, circuit breaker state |
| Configuration & Verticals | One-click toggles, spec registration wizard, visual CRD editor, Effect Layer composition preview |
| Users & Access | Role/permission manager, ATR claim inspector and simulator, session audit viewer, vertical RLS matrix |
| Observability | Prometheus metrics, OpenTelemetry trace explorer, recall latency heatmaps, Cuckoo filter health |
| Compliance Center | Unified compliance matrices, audit report generator, chain-of-custody exporter, GDPR erasure workflow, data lineage viewer |

13.4 Example Natural-Language Commands

Configuration & Scaling:

  • "enable duckdb analytics on seaweedfs lake with Iceberg support"
  • "scale goose replicas to 20 during business hours and 5 at night"
  • "activate healthcare claims pipeline with presidio medical redaction"

Workflow & Operations:

  • "process this W-2.pdf for underwriting — extract, redact, validate, sign, archive"
  • "create a new lending deal room for client ABC123 and invite underwriters"
  • "run cross_vertical recall between lending and legal for matter XYZ with elevated claims"

Governance:

  • "generate compliance report for all active verticals with Merkle proofs"
  • "roll back the last two configuration changes"
  • "rotate all Presidio models to latest version and reprocess last 500 documents"

14. Agent Runtime, Tool Generation & Self-Improvement

📋 Planned — not yet shipped

ClawQL treats agent runtimes and tool generation as first-class, persistent platform citizens, not ephemeral scripts.

14.1 clawql-goose — Agent Runtime Abstraction

Manages Block's Goose instances as ephemeral or persistent workloads inside the secure sandbox.

Planned features:

  • Default 0 idle replicas; scales to 1+ on demand and returns to 0 on completion
  • Persistent volumes for Goose state that survive pod restarts
  • Automatic output capture and ingestion into Memory 2.0
  • Blueprint support and verification loops
  • Checkpointing on OOM or failure for resumable tasks

AgentRuntime abstraction (defined in clawql-core):

interface AgentRuntime {
  provision(config: AgentConfig): Promise<AgentHandle>
  execute(handle: AgentHandle, task: Task): Promise<TaskResult>
  getTools(handle: AgentHandle): Promise<MCPTool[]>
  captureOutputs(handle: AgentHandle): AsyncIterable<Output>
  teardown(handle: AgentHandle): Promise<void>
}

This abstraction allows swapping Goose for Hermes, a custom agent, or any other MCP-compatible runtime without modifying core ClawQL packages.

14.2 clawql-printingpress — On-Demand Tool Generation

Enables agents to create new agent-native tools on demand.

Planned capabilities:

  • Generates Go CLIs and full MCP servers from natural language descriptions or schemas
  • Builds occur in isolated Kubernetes Jobs with network egress disabled
  • Every binary is Cosign-signed before installation
  • Automatic registration into clawql-api after health-check and circuit-breaker gating
  • Version lifecycle management: old versions archived with supersededBy edges in Memory 2.0
  • Pre-installed high-value CLIs (flight-goat, shopify-goat, etc.)

Security controls:

  • Reproducible builds with pinned base images
  • Signature verification before registration
  • Persistent volume isolation per tenant/vertical

14.3 Ouroboros Evolutionary Loops (clawql-ouroboros) ✅ Shipped

Provides the self-improvement layer for extraction schemas, workflows, and tool quality.

Core mechanism:

  • Seed-based evolutionary loops with clear goals, acceptance criteria, and brownfield context
  • Automatic ingestion of HITL corrections, validation set performance, and agent feedback
  • Postgres-backed event store for lineage and experiment tracking
  • Integration hooks in document ingest and workflow completion paths

Example seed (W-2 extraction evolution):

{
  "seedId": "w2-extraction-v3",
  "goal": "Improve LangExtract schema accuracy for W-2 forms",
  "acceptanceCriteria": [
    "F1 > 0.97 on validation set",
    "no regression on edge cases"
  ],
  "maxGenerations": 8,
  "ontology": ["employer_name", "ein", "wages", "federal_tax_withheld"]
}

14.4 Sandbox Integration & Persistence

All Goose executions and Printing Press builds will run inside clawql-sandbox with:

  • Kata Containers or gVisor isolation
  • Resource quotas and default-deny network policy
  • Bind-mounted persistent volumes for generated artefacts and Goose state
  • Full Merkle auditing and Memory 2.0 ingestion of outputs

15. Testing, Observability & Operations

15.1 Testing Strategy

ClawQL employs a multi-layered testing approach enforced in CI.

Unit tests apply to clawql-core, clawql-pageindex, and internal utilities. Near-100% coverage required for pure functions (Merkle computation, Cuckoo operations, normalizeOperationId, ATR validation).

Integration tests run against live service containers via Docker Compose, using fixture documents (PDF, DOCX, contracts, W-2s, clinical notes). They verify the end-to-end document pipeline, Memory 2.0 ingest/recall, and Presidio redaction.

Contract tests use Pact-style consumer-driven contracts for clawql-api.search() and clawql-api.execute(). Any breaking change to these surfaces triggers a major version bump.

End-to-end tests spin up a minimal Tier 1 Docker Compose stack. Each vertical runs at least one complete workflow (ingest → redaction → recall → workflow execution).

Chaos engineering runs weekly on a staging Tier 2 cluster using Chaos Mesh:

  • Kill Tika/Gotenberg mid-ingest
  • Corrupt Merkle roots
  • Fill Cuckoo filter to capacity
  • Kill Vault or Presidio
  • OOMKill Goose pods
  • Exhaust NATS JetStream storage

All chaos scenarios must recover gracefully with proper alerts and partial result handling.

15.2 Observability Stack

Primary tools:

  • SigNoz — Unified traces, logs, metrics, and exceptions (OpenTelemetry-native, ClickHouse backend)
  • Prometheus + Grafana — Operational metrics and dashboards
  • Jaeger / Langfuse — Distributed tracing (especially for complex workflows)

Automatic instrumentation: Every Effect-TS pipeline emits spans. Per-vertical, per-provider, and per-workflow metrics (latency, error rate, token usage, HITL rate, Ouroboros convergence).

Key dashboards included in Helm chart:

  • IDP Pipeline Overview (documents processed, HITL rate, redaction coverage)
  • Memory 2.0 Health (recall latency, node count, pruning status)
  • Goose Execution (active tasks, checkpoint recovery rate)
  • Security Posture (ATR violations, Presidio failures, circuit breaker trips)

Zero-egress enforcement is on by default.

15.3 Day-2 Operations & Natural Language Admin

Common operational commands:

  • "scale goose replicas to 20 during business hours"
  • "enable duckdb analytics on seaweedfs lake with Iceberg support"
  • "rotate all Presidio models and reprocess last 500 documents"
  • "generate compliance report for lending vertical with Merkle proofs"
  • "roll back the last two configuration changes"

All changes are audited with Merkle roots and visible in the Compliance Center.

Self-healing features:

  • Automatic pod restarts on Layer composition failure
  • Cuckoo filter warm-up on pod restart
  • Queue draining after service recovery
  • Circuit breaker auto-recovery

16. Deployment Guides & Quick-Starts

16.1 Tier 1: Local Developer Quick-Start (Docker Compose)

# 1. Clone and bootstrap
git clone https://github.com/danielsmithdevelopment/ClawQL.git
cd ClawQL/examples/clawql-local-docker-compose
./bootstrap.sh

# 2. Start the stack
docker compose up -d

# 3. Verify
clawql status
# Dashboard: http://localhost:8080

Included: clawql-api, clawql-memory (SQLite), Paperless-ngx, Tika, Gotenberg, authentication disabled (noAuth mode).

Next step: Upload a document or run @hermes process this W-2.pdf in the chat.

Full docker-compose.yml and clawql.local.yaml are in the examples directory.

16.2 Tier 2 / Tier 3: Helm Chart Deployment

# Add repository
helm repo add clawql https://charts.clawql.com
helm repo update

# Install with Tier 2 config
helm upgrade --install clawql clawql/clawql-full-stack \
  --namespace clawql --create-namespace \
  --values values-tier2.yaml

Full Helm chart templates (including KEDA ScaledObjects, ServiceMonitors, and Kyverno policies) are in the repository.

16.3 Vertical Starters

📋 Vertical starters will be published as verticals ship.

Lending W-2 Pipeline (planned):

  1. Upload W-2 → Tika/Gotenberg → Presidio redaction
  2. LangExtract structured output
  3. HITL review in Label Studio (optional)
  4. Merkle audit + Memory 2.0 ingest
  5. Deal room creation via lending.createDealRoom

Trigger via Dashboard/Slack: @openclaw process this W-2.pdf for underwriting

16.4 Argo Workflows & LangGraph Integration

Example Argo Workflow templates and LangGraph node integrations are in the repository.

16.5 One-Command Starters

  • Tier 1 Full Stack: curl -fsSL https://get.clawql.com | sh
  • Regulated Fork: Clone regulated fork + enable Fabric toggle

All starters include observability, basic security policies, and natural language verification steps.


17. Regulatory & Compliance Readiness

ClawQL is engineered for production use in regulated environments. This section covers built-in compliance capabilities. For insurance coverage, see the community-maintained guidance in the repository.

17.1 Compliance Frameworks Supported

| Domain | Framework | Primary mechanism |
| --- | --- | --- |
| Healthcare | HIPAA, HITECH | clawql-healthcare + Presidio + cryptographic erasure |
| Legal / Finance | ABA Model Rules, NAIC model laws | Privilege enforcement, ethical walls |
| Government | FedRAMP-ready | Classification level handling, clawql-government |
| Education | FERPA | FERPA-compliant partitioning, clawql-education |
| Manufacturing | ITAR/EAR, ISO 9001, C-TPAT | clawql-manufacturing |
| General | GDPR | Cryptographic erasure, SOC 2 Type II controls |
| AI Transparency | EU AI Act | Audit trails, lineage, decision provenance |

Note: Vertical-specific compliance features are gated on those verticals shipping. See §4.4 for current status.

17.2 Compliance Center Features

  • Unified compliance matrix across enabled verticals
  • Automated audit report generation with Merkle proofs
  • Data lineage viewer for cross-vertical decisions
  • GDPR erasure request workflow with Vault key destruction
  • Every vertical registers its own compliance matrix entry, aggregated and queryable via natural language

17.3 Self-Hosted Operator Compliance Checklist

  1. Configure Presidio, Merkle auditing, and WORM tables
  2. Enable Kata/gVisor runtime class
  3. Maintain SigNoz audit trails and export procedures
  4. Document ATR and RLS controls for auditors
  5. Obtain appropriate Tech E&O / Cyber Liability / AI liability coverage for your deployment
  6. Review the community insurance guidance in the repository

ClawQL provides templates, evidence packs, and architecture decision records to accelerate regulatory audits and underwriter reviews.


18. Versioning, Contribution & Ecosystem

18.1 Versioning Policy

| Component | Scheme | Notes |
| --- | --- | --- |
| clawql-core + clawql-api | Strict SemVer | Major bump for any breaking change to public APIs, Effect Layer contracts, or ATR schema |
| Horizontal packages | Independent SemVer | Within the same major as core |
| Vertical packages | Independent SemVer | Declare compatible core version ranges in peerDependencies |
| Printing Press artefacts | Own SemVer | Inside persistent volumes; metadata stored in Memory 2.0 |
| Operator & Helm Charts | Calendar versioning (e.g., 2026.5.0) | Aligned with major feature releases |

Major version coordination: Any breaking change in clawql-core triggers simultaneous major version increases across all dependent packages. A compatibility shim is provided during the transition period.
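
To illustrate how a core major bump ripples through declared peer ranges, here is a small sketch that lists the packages whose core range no longer matches a given core version. It is an illustration only: it handles plain x.y.z versions and caret ranges with a major of 1 or higher, and the function names are hypothetical. Real tooling would more likely rely on the semver npm package.

```typescript
interface Version {
  major: number;
  minor: number;
  patch: number;
}

function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (!match) throw new Error(`not a plain x.y.z version: ${text}`);
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

// Negative when a < b, zero when equal, positive when a > b.
function compareVersions(a: Version, b: Version): number {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// True when `coreVersion` satisfies a caret range such as "^2.1.0".
// Sketch limitation: only caret ranges with major >= 1 are handled.
function satisfiesCaret(coreVersion: string, range: string): boolean {
  if (!range.startsWith("^")) throw new Error(`only caret ranges are handled: ${range}`);
  const base = parseVersion(range.slice(1));
  if (base.major === 0) throw new Error("0.x caret ranges are out of scope for this sketch");
  const actual = parseVersion(coreVersion);
  return actual.major === base.major && compareVersions(actual, base) >= 0;
}

// Packages whose declared core range no longer matches after a core release.
function incompatiblePackages(coreVersion: string, peerRanges: Record<string, string>): string[] {
  return Object.entries(peerRanges)
    .filter(([, range]) => !satisfiesCaret(coreVersion, range))
    .map(([name]) => name);
}
```

Run against a proposed core version, a check of this shape gives the list of packages that need a coordinated major release or the compatibility shim.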

18.2 Dependency & License Policy

  • All packages depend only on types and utilities from clawql-core
  • No circular dependencies (enforced by TypeScript project references and Turborepo)
  • Horizontal layers declare optional peer dependencies where appropriate
  • Vertical packages never depend directly on other verticals
  • External libraries must use permissive licenses (MIT, Apache 2.0, BSD)
  • Effect-TS version is pinned across the monorepo
  • Core platform: Apache 2.0; clawql-pageindex: MIT
  • All dependencies are scanned with FOSSA on every PR; licenses outside the permissive set above (for example, GPL-family copyleft) are blocked
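
Two of the rules above, no circular dependencies and no vertical-to-vertical dependencies, are simple graph checks. The sketch below shows one way to express them over an in-memory package graph. It is illustrative only: the document states that the real enforcement comes from TypeScript project references and Turborepo, and the PackageGraph shape and function names here are hypothetical.

```typescript
type Layer = "core" | "horizontal" | "vertical";

interface PackageNode {
  layer: Layer;
  dependsOn: string[];
}

type PackageGraph = Record<string, PackageNode>;

// Depth-first search with a "visiting" marker; returns one cycle as a path, or null.
function findCycle(graph: PackageGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const path: string[] = [];

  const visit = (name: string): string[] | null => {
    if (state.get(name) === "done") return null;
    if (state.get(name) === "visiting") {
      // Back edge found: the cycle is the path from the first visit of `name` to here.
      return [...path.slice(path.indexOf(name)), name];
    }
    state.set(name, "visiting");
    path.push(name);
    for (const dep of graph[name]?.dependsOn ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(name, "done");
    return null;
  };

  for (const name of Object.keys(graph)) {
    const cycle = visit(name);
    if (cycle) return cycle;
  }
  return null;
}

// Edges where one vertical package depends directly on another vertical.
function verticalToVerticalEdges(graph: PackageGraph): Array<[string, string]> {
  const edges: Array<[string, string]> = [];
  for (const [name, node] of Object.entries(graph)) {
    if (node.layer !== "vertical") continue;
    for (const dep of node.dependsOn) {
      if (graph[dep]?.layer === "vertical") edges.push([name, dep]);
    }
  }
  return edges;
}
```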

18.3 12-Step Vertical Contribution Checklist

  1. Fork the official clawql-vertical-template from the monorepo
  2. Implement the Plugin interface from clawql-core
  3. Define requiredSpecs and recommendedSpecs
  4. Register domain-specific tools using normalizeOperationId
  5. Integrate with clawql-documents and Memory 2.0 ingest hooks
  6. Declare compliance matrix entry
  7. Write unit + integration tests (≥80% coverage)
  8. Add end-to-end test in Tier 1 Docker Compose
  9. Update Operator CRD fragment and Helm values
  10. Provide documentation page and example natural-language commands
  11. Submit a PR with the architecture diagram diff check passing
  12. Community review → merged into unified Helm chart with toggle

Templates, CI validation scripts, and example PRs are in the repository.

18.4 Community & Ecosystem Growth

  • Public GitHub repository with templates, examples, and contribution guidelines
  • RFC process for major features and new verticals
  • Discord and GitHub Discussions for community support
  • Marketplace-ready structure for commercial vertical extensions

Phased priorities (no fixed dates — demand-driven):

  • Core horizontal package stabilisation
  • First vertical implementations (lending as flagship)
  • Kubernetes Operator and natural language dashboard
  • Additional verticals and provider adapters
  • Multi-cluster federation
  • Advanced governance and policy management

19. Appendices

19.1 Core Schemas

ATRClaims

interface ATRClaims {
  actorId: string
  actorType: 'human' | 'agent' | 'service'
  sessionId: string
  issuedAt: number
  expiresAt: number
  tenantId: string
  tenantTier: 'local' | 'standard' | 'enterprise'
  roles: string[]
  scopes: string[]
  verticals: string[]
  crossVertical: boolean
  crossVerticalPurpose?: string
  memoryPrivileges: {
    read: boolean
    write: boolean
    crossVerticalRead: boolean
    pruneAccess: boolean
  }
  classificationLevel?: 'unclassified' | 'cui' | 'secret' | 'top_secret'
  minimumNecessary?: boolean
  purpose?: string
  requestId: string
}
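
As an illustration of how these claims might gate a memory read, the sketch below evaluates a subset of the fields. The semantics are assumptions, not documented behaviour: expiresAt is treated as Unix epoch seconds, and a read outside the caller's own verticals is assumed to require crossVertical, memoryPrivileges.crossVerticalRead, and a non-empty crossVerticalPurpose. The names ClaimsSubset, Decision, and canReadMemory are hypothetical.

```typescript
// Subset of the ATRClaims fields that this sketch reads.
interface ClaimsSubset {
  expiresAt: number; // ASSUMPTION: Unix epoch seconds; the schema does not state the unit
  verticals: string[];
  crossVertical: boolean;
  crossVerticalPurpose?: string;
  memoryPrivileges: { read: boolean; crossVerticalRead: boolean };
}

type Decision = { allowed: true } | { allowed: false; reason: string };

// Decide whether a memory read against `targetVertical` should proceed.
function canReadMemory(claims: ClaimsSubset, targetVertical: string, nowSeconds: number): Decision {
  if (nowSeconds >= claims.expiresAt) return { allowed: false, reason: "claims expired" };
  if (!claims.memoryPrivileges.read) return { allowed: false, reason: "no memory read privilege" };
  if (claims.verticals.includes(targetVertical)) return { allowed: true };

  // Outside the caller's own verticals: ASSUMED to need all three cross-vertical signals.
  if (!claims.crossVertical || !claims.memoryPrivileges.crossVerticalRead) {
    return { allowed: false, reason: "cross-vertical read not granted" };
  }
  if (!claims.crossVerticalPurpose?.trim()) {
    return { allowed: false, reason: "cross-vertical purpose missing" };
  }
  return { allowed: true };
}
```

Returning a reason alongside each denial keeps the decision auditable, in line with the audit-trail goals stated elsewhere in this document.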

Plugin Interface

export interface Plugin {
  readonly id: string
  readonly version: string
  readonly vertical?: string
  onRegister(api: ClawQLApi): Effect.Effect<void, ClawQLError, ClawQLApi>
  onIngestHook?(
    node: EntityNode,
    context: IngestContext,
  ): Effect.Effect<EntityNode, ClawQLError, ClawQLApi>
  onRecallFilter?(
    claims: ATRClaims,
    options: RecallOptions,
  ): Effect.Effect<RecallOptions, ClawQLError, ClawQLApi>
  onComplianceReport?(): Effect.Effect<ComplianceReport, ClawQLError, ClawQLApi>
  requiredSpecs?: ProviderSpec[]
  recommendedSpecs?: ProviderSpec[]
  onTeardown?(api: ClawQLApi): Effect.Effect<void, ClawQLError, ClawQLApi>
}

ProviderSpec

export interface ProviderSpec {
  kind: SpecKind
  id: string
  enabled: boolean
  secretRef?: string
  url?: string
  capabilities?: string[]
  options?: Record<string, unknown>
}
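
A plugin's requiredSpecs and recommendedSpecs can be compared against the specs an operator has configured. The sketch below shows one possible check. It is an assumption-laden illustration: SpecKind is simplified to string, a spec is treated as satisfied only when a configured entry with the same kind and id is enabled, and the names SpecRef, SpecCheck, and checkSpecs are hypothetical.

```typescript
// Minimal stand-in for ProviderSpec; SpecKind is simplified to string for this sketch.
interface SpecRef {
  kind: string;
  id: string;
  enabled: boolean;
}

interface SpecCheck {
  missingRequired: string[]; // ASSUMED to block plugin registration
  missingRecommended: string[]; // ASSUMED to produce a warning only
}

const specKey = (spec: SpecRef): string => `${spec.kind}:${spec.id}`;

// Compare what a plugin declares against what is configured and enabled.
function checkSpecs(required: SpecRef[], recommended: SpecRef[], configured: SpecRef[]): SpecCheck {
  const enabled = new Set(configured.filter((spec) => spec.enabled).map(specKey));
  const missing = (specs: SpecRef[]): string[] =>
    specs.map(specKey).filter((key) => !enabled.has(key));
  return { missingRequired: missing(required), missingRecommended: missing(recommended) };
}
```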

19.2 Operation ID Convention (normalizeOperationId)

Format: kind__provider__operation (double-underscore separator)
Example: lending__underwriting__createDealRoom

  • Single underscores in original names are preserved
  • Internal double-underscores are escaped as __ESC__
  • Published for third-party MCP client compatibility
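
The encoding direction of this convention can be sketched in a few lines. The signature below is hypothetical: the real normalizeOperationId may take different arguments, and decoding is not shown.

```typescript
const SEPARATOR = "__";
const ESCAPED = "__ESC__";

// Escape any double underscore inside one segment; single underscores pass through.
function escapeSegment(segment: string): string {
  return segment.split(SEPARATOR).join(ESCAPED);
}

// Hypothetical signature, named "Sketch" to avoid implying it is the shipped function.
function normalizeOperationIdSketch(kind: string, provider: string, operation: string): string {
  return [kind, provider, operation].map(escapeSegment).join(SEPARATOR);
}

normalizeOperationIdSketch("lending", "underwriting", "createDealRoom");
// → "lending__underwriting__createDealRoom"
normalizeOperationIdSketch("lending", "credit_bureau", "pull_report");
// → "lending__credit_bureau__pull_report"
normalizeOperationIdSketch("lending", "core", "get__raw");
// → "lending__core__get__ESC__raw"
```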

19.3 Cuckoo Filter & Merkle Design Details

Cuckoo Filter:

  • Must declare capacity at creation (capacity: 500_000 recommended)
  • Default false-positive rate: 0.1%
  • Warm-up from audit table on pod restart
  • At 95% fill → warning; at 100% → fallback to audit table hash check
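
The fill thresholds above can be expressed as a small policy function. This is a sketch of the threshold logic only, not of the filter itself; the names FillStatus and fillStatus are hypothetical.

```typescript
type FillStatus = "ok" | "warning" | "fallback";

// Map the filter's fill level to the behaviour described above:
// at or above 95% emit a warning; at 100% fall back to the audit table hash check.
function fillStatus(itemCount: number, capacity: number): FillStatus {
  if (capacity <= 0) throw new RangeError("capacity must be positive");
  if (itemCount >= capacity) return "fallback";
  // Integer comparison avoids floating-point rounding at the 95% boundary.
  if (itemCount * 100 >= capacity * 95) return "warning";
  return "ok";
}

fillStatus(474_999, 500_000); // → "ok"
fillStatus(475_000, 500_000); // → "warning"
fillStatus(500_000, 500_000); // → "fallback"
```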

Merkle Auditing:

  • Ring buffer: 90 days default
  • Cold storage bridge for long-term retention and legal hold
  • Roots generated after Presidio redaction
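
A Merkle root over a batch of already-redacted records can be sketched as follows. The hash function (SHA-256), the concatenation of hex digests, and the handling of odd-sized levels by duplicating the last node are all assumptions of this sketch; the document does not specify ClawQL's tree layout. Production designs commonly also prefix leaf and internal hashes differently, as RFC 6962 does, which this sketch omits for brevity.

```typescript
import { createHash } from "node:crypto";

function sha256Hex(data: string): string {
  return createHash("sha256").update(data, "utf8").digest("hex");
}

// Compute a Merkle root over records that have ALREADY been redacted,
// so the root never commits to unredacted PII.
function merkleRoot(redactedRecords: string[]): string {
  if (redactedRecords.length === 0) throw new Error("at least one record is required");
  let level = redactedRecords.map(sha256Hex);
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      const right = level[i + 1] ?? left; // ASSUMPTION: duplicate the last node on odd levels
      next.push(sha256Hex(left + right));
    }
    level = next;
  }
  return level[0];
}
```

Because any change to a record changes its leaf hash and every hash above it, comparing a stored root with a recomputed one is enough to detect tampering within the batch.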

19.4 Comprehensive Failure Modes Catalog

Failure | Behaviour
Presidio unavailable | Ingest blocked; never skipped
Tika/Gotenberg timeout | Partial results with stageErrors
Goose OOM | Checkpoint + resume
Circuit breaker open | Tool temporarily unavailable; auto-recovery
Cuckoo filter full | Fallback to audit table hash check
Vault unavailable | Cached secrets used; alert triggered
Supergraph build failure | Previous version remains active

All failures are structured, observable, and auditable.
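
The "circuit breaker open" behaviour in the catalog (tool temporarily unavailable, then auto-recovery) follows the standard circuit breaker pattern. The sketch below is a generic version of that pattern, not ClawQL's implementation: failureThreshold and cooldownMs are hypothetical tuning parameters, and the clock is injected so the behaviour can be tested deterministically.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number; // hypothetical tuning parameter
  private readonly cooldownMs: number; // hypothetical tuning parameter
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // "open" during the cooldown, "half-open" once the cooldown has elapsed.
  state(): BreakerState {
    if (this.openedAt === null) return "closed";
    return this.now() - this.openedAt >= this.cooldownMs ? "half-open" : "open";
  }

  // Callers check this before invoking the tool.
  allow(): boolean {
    return this.state() !== "open";
  }

  // A success closes the breaker again (auto-recovery).
  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // Trip after enough failures, or re-trip when a half-open probe fails.
  recordFailure(): void {
    this.failures += 1;
    if (this.state() === "half-open" || this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}
```

A caller would wrap each tool invocation: skip the call when allow() is false, and report the outcome through recordSuccess() or recordFailure() otherwise.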

19.5 Glossary

Term | Definition
ATR | Actor–Tenant–Role. The claims schema carried by every request to enforce identity, tenancy, and role-based access at all layers.
Cuckoo filter | A probabilistic data structure providing O(1) deduplication at ingest with configurable false-positive rates. Used to prevent duplicate nodes entering Memory 2.0.
Effect-TS | A TypeScript library providing typed effects, composable Layers, structured concurrency, and resource management. The foundational runtime for ClawQL.
Goose | Block's open-source agent runtime. ClawQL manages Goose instances via clawql-goose as ephemeral or persistent workloads.
Hermes | The conversational supervisor LLM responsible for intent parsing, tool selection, and multi-turn dialogue in the natural language interface.
Kata Containers | A container runtime using lightweight VMs for strong hardware-level isolation. The default sandbox runtime in Tier 2 and Tier 3.
MCP | Model Context Protocol. The standard protocol ClawQL uses for all agent-to-tool communication.
Merkle tree | A hash tree used by ClawQL to produce tamper-evident roots for all writes (documents, memory nodes, generated binaries). Stored in WORM audit tables.
NATS JetStream | A durable messaging layer (Apache 2.0) used for event streaming, workflow triggers, and HITL gate notifications.
Onyx | An open-source enterprise semantic search system. Used as the optional semantic recall layer in Memory 2.0.
OpenClaw | The stateless WebSocket messaging gateway that sits in front of Hermes and handles connection management, queuing, and streaming.
Ouroboros | ClawQL's evolutionary self-improvement loop system. Evolves extraction schemas, workflows, and tool quality using seed-based iteration and HITL feedback.
PageIndex | Vectorless hierarchical document indexing. A standalone MIT package that builds and traverses tree structures for structural document navigation without vector embeddings.
Panguard | ClawQL's real-time MCP proxy. Enforces ATR scoping, scans prompts and responses, and operates in-line with sub-50ms latency targets.
Paperless NGX | An open-source document management system used as the long-term archive in the clawql-documents pipeline.
Presidio | Microsoft's open-source data anonymisation and PII detection library. Runs at every data boundary in ClawQL; failure policy is always block.
Printing Press | ClawQL's on-demand tool generation system. Produces signed Go CLIs and MCP servers from natural language descriptions or schemas.
RLS | Row-Level Security. Postgres-level data filtering enforced per vertical and per tenant throughout the platform.
SeaweedFS | An Apache 2.0-licensed distributed object storage system providing S3-compatible APIs. Used as the analytical lakehouse storage layer.
SeeTheGreens | The regulated enterprise fork of ClawQL, featuring enhanced compliance controls and the flagship lending LOS.
Stirling-PDF | An open-source PDF processing tool used in the clawql-documents pipeline for OCR, merging, splitting, and visual redaction.
Valkey | A BSD 3-clause-licensed, Redis-protocol-compatible key-value store. The hot-tier cache and rate-limiting layer in ClawQL.
WORM | Write Once Read Many. Audit tables that use Postgres rules and SQLite triggers to prevent any modification or deletion of audit records.

19.6 Key References


ClawQL Master Enablement Document · May 2026 Edition · Apache 2.0 / MIT / CC-BY-SA 4.0
Canonical vision document; companions: modularization v1.9 / v2.0. Implementation is phased; this document defines the intended design.