Vulnerability Management, Patch Cadence, and Cryptographic Agility

Hello and welcome to Module 26!

Modules 1–25 have given us trusted images, admission control, zero-trust networking, runtime enforcement, immutable memory, GPU isolation, automated response, secure development pipelines, living threat models, and continuous verification. But even a sound architecture runs on software that becomes known-vulnerable the day a CVE is disclosed against it.

Module 1 told us to pin everything for supply-chain integrity. Vulnerability management tells us we must move those pins forward when CVEs appear. These two requirements are not in conflict; they are two halves of the same disciplined process. In this module we build a repeatable vulnerability management program with reachability-based triage, strict patch SLOs, zero-downtime rolling updates, dependency automation that respects pins, and planned cryptographic agility for when algorithms must be retired. By the end you will have a system that reconciles pinning and patching while keeping exposure windows bounded by SLO and avoiding disruption to live sessions.

The Pinning/Patching Tension

Pinning (digest pinning, manifest hash pinning, lockfiles) gives us reproducibility and supply-chain integrity.
Patching requires us to move those pins forward when a CVE is disclosed.

The tension is real, but solvable. The failure modes are clear:

Never update → known-vulnerable stack.
Update without process → supply-chain risk.

We reconcile them with a single, auditable process that treats every CVE as a deliberate, gated change rather than a reactive scramble.

CVE Triage Criteria for MCP Components

CVSS score alone is not enough. We triage based on reachability in the tool-call dispatch path.

Four triage questions applied to every CVE:

Is the vulnerable component in the tool-call dispatch path?
Is the vulnerable code path reachable from agent input?
Does the exploit require network access from outside the cluster?
Is a public proof-of-concept exploit available?

Component priority ordering (highest to lowest):

Gateway and Panguard (Tier 1)
Vault integration
Memory store
Observability stack
ClawHub skill dependencies (lower priority because they run inside Kata sandboxes)

This reachability-first approach ensures that a medium-severity CVE in the gateway is patched faster than a critical CVE in a non-reachable observability component.
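The four questions and the component ordering can be turned into a sort key. The sketch below is one possible encoding, not the course's tooling: the CveFinding type, the component keys, the equal weighting of the four answers, and the reading of question 3 (an exploit that works from inside the cluster ranks higher than one that needs outside access) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Component tiers in the priority order listed above (1 = highest).
# The dictionary keys are hypothetical identifiers.
COMPONENT_PRIORITY = {
    "gateway": 1,
    "panguard": 1,
    "vault-integration": 2,
    "memory-store": 3,
    "observability": 4,
    "clawhub-skill-dependency": 5,
}


@dataclass(frozen=True)
class CveFinding:
    cve_id: str
    component: str
    cvss: float
    in_dispatch_path: bool            # triage question 1
    reachable_from_agent_input: bool  # triage question 2
    needs_external_network: bool      # triage question 3
    public_poc: bool                  # triage question 4


def triage_key(finding: CveFinding) -> tuple:
    """Sort key: reachability answers first, then component tier, then CVSS."""
    reachability = sum([
        finding.in_dispatch_path,
        finding.reachable_from_agent_input,
        not finding.needs_external_network,  # assumption: in-cluster exploitability is worse
        finding.public_poc,
    ])
    return (-reachability, COMPONENT_PRIORITY[finding.component], -finding.cvss)


def prioritize(findings):
    """Return findings ordered from most to least urgent."""
    return sorted(findings, key=triage_key)


# Placeholder CVE IDs, not real advisories.
findings = [
    CveFinding("CVE-EXAMPLE-0001", "observability", 9.8, False, False, True, False),
    CveFinding("CVE-EXAMPLE-0002", "gateway", 5.5, True, True, False, False),
]
ordered = prioritize(findings)
```

With this key the medium-severity gateway finding sorts ahead of the critical-CVSS observability finding, because CVSS is only the final tie-breaker.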

Patch SLOs by Severity

We define time-bound Service Level Objectives (SLOs) with clear escalation paths. Tiers are assigned after the reachability triage above, not from the raw CVSS score alone.

Critical (CVSS ≥9.0 or public PoC against a Tier 1 component)
→ Staging within 8 hours, production within 24 hours
→ Emergency CAB (Change Advisory Board) required
High (CVSS 7.0–8.9 and reachable with chaining)
→ Staging within 48 hours, production within 7 days
→ Standard CAB
Medium (CVSS 4.0–6.9 with limited reachability)
→ Within 30 days
→ Normal sprint cycle
Low (CVSS <4.0)
→ Within 90 days
→ Bundled with quarterly dependency update

Every exception must be documented with the CVE, reason, compensating control, expiry date (maximum 30 days beyond SLO), and security owner approval.
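The SLO windows and the exception rule lend themselves to a small calculator. This is a minimal sketch under stated assumptions: the clock is assumed to start at a caller-supplied timestamp (the text does not say whether that is disclosure or triage time), the classify function uses only the CVSS bands and the Tier 1 PoC flag and ignores the reachability qualifiers, and the exception field names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# (staging window, production window) per severity, taken from the SLO list above.
PATCH_SLO = {
    "critical": (timedelta(hours=8), timedelta(hours=24)),
    "high": (timedelta(hours=48), timedelta(days=7)),
    "medium": (None, timedelta(days=30)),
    "low": (None, timedelta(days=90)),
}
MAX_EXCEPTION_EXTENSION = timedelta(days=30)

# Hypothetical field names for the documented exception record.
REQUIRED_EXCEPTION_FIELDS = (
    "cve", "reason", "compensating_control", "expiry", "security_owner_approval",
)


def classify(cvss: float, public_poc_tier1: bool = False) -> str:
    """Map a score to an SLO tier (simplified: reachability qualifiers omitted)."""
    if cvss >= 9.0 or public_poc_tier1:
        return "critical"
    if cvss >= 7.0:
        return "high"
    if cvss >= 4.0:
        return "medium"
    return "low"


def deadlines(severity: str, clock_start: datetime):
    """Return (staging_due, production_due); staging_due is None when no staging SLO exists."""
    staging, production = PATCH_SLO[severity]
    staging_due = clock_start + staging if staging is not None else None
    return staging_due, clock_start + production


def exception_is_valid(severity: str, clock_start: datetime, exception: dict) -> bool:
    """An exception needs every field and may expire at most 30 days past the SLO."""
    if any(not exception.get(field) for field in REQUIRED_EXCEPTION_FIELDS):
        return False
    _, production_due = deadlines(severity, clock_start)
    return exception["expiry"] <= production_due + MAX_EXCEPTION_EXTENSION
```

A check like this could run in CI against the exception register so that an expired or incomplete exception fails the build rather than lingering unnoticed.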

Session-Drain Rolling Update

Patches must never disrupt live agent sessions.

We use a zero-downtime strategy:

preStop lifecycle hook: calls the gateway's /admin/drain endpoint so the pod stops accepting new sessions before it terminates.
Deployment strategy: maxSurge: 1, maxUnavailable: 0 — new patched pod starts before the old pod is terminated.
Existing sessions finish on the old pod; new sessions route to the patched pod.
terminationGracePeriodSeconds: 330 (5-minute drain window plus a 30-second buffer).
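The four settings above map onto a handful of Deployment fields. The sketch below builds that fragment as a Python dictionary that could be serialized and applied as a patch. The container name, the port, and the use of an httpGet handler for the preStop hook are assumptions. So is the reading of the 330-second grace period as a 300-second drain with 30 seconds of headroom. It also assumes /admin/drain does not return until draining is complete, since Kubernetes sends SIGTERM as soon as the preStop hook finishes.

```python
DRAIN_WINDOW_SECONDS = 300  # 5-minute drain window from the text
GRACE_PERIOD_SECONDS = 330  # terminationGracePeriodSeconds from the text


def gateway_rollout_patch(drain_port: int = 8080) -> dict:
    """Return a Deployment patch fragment for a session-drain rolling update.

    The container name "gateway" and the default port are hypothetical.
    """
    return {
        "spec": {
            "strategy": {
                "type": "RollingUpdate",
                # Start one patched pod before any old pod is taken away.
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
            },
            "template": {
                "spec": {
                    # Must exceed the drain window, because the preStop hook
                    # runs inside the grace period.
                    "terminationGracePeriodSeconds": GRACE_PERIOD_SECONDS,
                    "containers": [
                        {
                            "name": "gateway",
                            "lifecycle": {
                                "preStop": {
                                    "httpGet": {
                                        "path": "/admin/drain",
                                        "port": drain_port,
                                    }
                                }
                            },
                        }
                    ],
                }
            },
        }
    }


patch = gateway_rollout_patch()
```

Keeping the grace period strictly longer than the drain window matters because the kubelet kills the container when the grace period ends, whether or not sessions have finished.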

Before production promotion we run a pre-promotion smoke test: 10 critical tool calls against the patched gateway in staging.

Rollback criteria (defined before every deploy):

20% increase in Panguard block rate relative to the pre-deploy baseline
Any unhandled 500 error
Any Falco CRITICAL alert in the observation window
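Because the criteria are fixed before the deploy, they can be evaluated mechanically at the end of the observation window. The following is a sketch only: the WindowMetrics type and its field names are hypothetical, and the 20% threshold is interpreted as a relative increase over a pre-deploy baseline rather than as percentage points.

```python
from dataclasses import dataclass

BLOCK_RATE_INCREASE_LIMIT = 0.20  # relative increase that triggers rollback


@dataclass(frozen=True)
class WindowMetrics:
    panguard_block_rate: float  # blocked tool calls / total tool calls
    unhandled_500s: int
    falco_critical_alerts: int


def rollback_reasons(baseline: WindowMetrics, current: WindowMetrics) -> list:
    """Return the rollback criteria that fired; an empty list means keep the deploy."""
    reasons = []
    if baseline.panguard_block_rate > 0:
        increase = (
            current.panguard_block_rate - baseline.panguard_block_rate
        ) / baseline.panguard_block_rate
        if increase >= BLOCK_RATE_INCREASE_LIMIT:
            reasons.append("panguard block rate increased by 20% or more")
    elif current.panguard_block_rate > 0:
        # No relative increase can be computed from a zero baseline.
        reasons.append("panguard block rate rose from a zero baseline")
    if current.unhandled_500s > 0:
        reasons.append("unhandled 500 error")
    if current.falco_critical_alerts > 0:
        reasons.append("Falco CRITICAL alert")
    return reasons
```

Returning the list of reasons rather than a bare boolean gives the rollback record something concrete to cite.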

Dependency Automation Compatible with Pinning

We use Renovate to keep pins current without removing them.

Configuration:

Renovate updates digest pins, never removes them.
Lockfile-based updates (package-lock.json, Pipfile.lock, Helm Chart.lock).
Patch-level updates auto-merge when all CI gates pass (supply-chain verification, security test suite, skill lint).
Major and minor updates always require human review — never auto-merge.
Harbor allowlist update is a required check before any image digest change can merge.

Weekly batched updates for minor/patch; unbatched PRs for CVE-triggered updates with the appropriate SLO label.
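The merge rules above amount to a small decision table. The sketch below models that policy in Python to make the logic explicit. It is not Renovate configuration syntax, the gate names are hypothetical labels for the CI checks named in the text, and it assumes plain MAJOR.MINOR.PATCH version strings.

```python
# Hypothetical labels for the CI gates named in the text.
CI_GATES = ("supply_chain_verification", "security_test_suite", "skill_lint")
HARBOR_GATE = "harbor_allowlist"


def update_type(old: str, new: str) -> str:
    """Classify a version bump as major, minor, or patch."""
    old_parts = tuple(int(part) for part in old.split("."))
    new_parts = tuple(int(part) for part in new.split("."))
    if new_parts[0] != old_parts[0]:
        return "major"
    if new_parts[1] != old_parts[1]:
        return "minor"
    return "patch"


def merge_decision(old: str, new: str, gates: dict, digest_change: bool = False) -> str:
    """Return "blocked", "auto-merge", or "human-review" for a proposed update."""
    required = list(CI_GATES)
    if digest_change:
        # Image digest changes additionally need the Harbor allowlist update.
        required.append(HARBOR_GATE)
    if not all(gates.get(gate, False) for gate in required):
        return "blocked"
    # Only patch-level updates may merge without a human.
    return "auto-merge" if update_type(old, new) == "patch" else "human-review"
```

For example, a patch bump with all three CI gates green returns "auto-merge", while the same bump that also changes an image digest stays "blocked" until the Harbor allowlist check passes.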

Cryptographic Agility: JWT Algorithm Migration

When an algorithm must be retired (e.g., RS256 → ES256):

Transition period where both algorithms are accepted for verification.
New tokens are issued with the new algorithm from day one of the transition.
Config: acceptedAlgorithms: [RS256, ES256] during the window.
Hard end date is set in advance; after that date RS256 is rejected.

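No session disruption occurs, provided the transition window is longer than the maximum token lifetime: by the hard end date, every unexpired token was issued after the switch and is signed with ES256.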
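The timing logic of the transition can be sketched without a JWT library. The function names and the idea of passing the hard end date as a parameter are assumptions for illustration, and signature verification itself is deliberately left out. The point is only that the allowlist is chosen by the verifier from the clock, never taken on trust from the token header.

```python
from datetime import datetime, timedelta, timezone

OLD_ALG = "RS256"
NEW_ALG = "ES256"


def accepted_algorithms(now: datetime, hard_end: datetime) -> list:
    """Both algorithms verify during the window; only the new one afterwards."""
    return [OLD_ALG, NEW_ALG] if now < hard_end else [NEW_ALG]


def hard_end_is_safe(switch_at: datetime, hard_end: datetime,
                     max_token_ttl: timedelta) -> bool:
    """True if every token issued with the old algorithm expires before the hard end.

    The last old-algorithm token is issued just before switch_at and lives
    for at most max_token_ttl.
    """
    return hard_end >= switch_at + max_token_ttl


def check_header_alg(header_alg: str, now: datetime, hard_end: datetime) -> None:
    """Reject a token whose declared algorithm is outside the current allowlist."""
    if header_alg not in accepted_algorithms(now, hard_end):
        raise ValueError(f"algorithm {header_alg!r} is not accepted")
```

Checking hard_end_is_safe when the migration is scheduled catches a window that is shorter than the longest-lived token before it can cut off an active session.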

mTLS CA Algorithm Migration

Dual-CA (dual-trust) strategy:

Issue a new ECDSA intermediate CA alongside the existing RSA CA.
Both CAs are trusted during the transition window.
New leaf certificates are issued from the ECDSA CA from day one.
RSA CA is removed from trusted roots only after all RSA leaf certificates have expired naturally.

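Pointing the cert-manager ClusterIssuer at the ECDSA CA routes new issuance and renewals across the mesh to the new CA; existing RSA leaf certificates are replaced as they come up for renewal.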
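The condition for dropping the RSA CA can be checked against the certificate inventory. This is a minimal sketch assuming a hypothetical LeafCert record that carries the issuing CA's algorithm and the certificate's expiry. It does not query cert-manager or the mesh.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LeafCert:
    name: str
    issuer_algorithm: str  # "RSA" or "ECDSA", the algorithm of the issuing CA
    not_after: datetime


def rsa_ca_removable(inventory, now: datetime) -> bool:
    """True once every leaf issued by the RSA CA has expired."""
    return all(
        cert.not_after <= now
        for cert in inventory
        if cert.issuer_algorithm == "RSA"
    )


def earliest_removal_date(inventory):
    """Latest expiry among RSA-issued leaves, or None if there are none left."""
    return max(
        (cert.not_after for cert in inventory if cert.issuer_algorithm == "RSA"),
        default=None,
    )
```

The earliest removal date also gives a natural end for the transition window, since both CAs must stay trusted until then.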

Memory Merkle Hash Function Migration

When we need to upgrade the hash algorithm (e.g., SHA-256 → SHA3-256):

clawql memory reroot --from sha256 --to sha3-256

Both old and new roots are stored during transition.
A transition record is written to the WORM audit trail.
Historical entries remain verifiable with the old algorithm; new entries use the new algorithm.
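To make the dual-root idea concrete, the sketch below computes a Merkle root over the same entries with both hash functions and packages them into a transition record. It is an illustration only and does not describe what the clawql command does internally. The tree construction (leaf and node prefixes, duplicating the last node on odd levels) and the record's field names are assumptions.

```python
import hashlib


def merkle_root(leaves, algorithm: str) -> str:
    """Hex Merkle root of a list of byte strings using a hashlib algorithm name."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    # Distinct prefixes keep leaf hashes and interior-node hashes apart.
    level = [hashlib.new(algorithm, b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # simplification: duplicate the odd node
        level = [
            hashlib.new(algorithm, b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def transition_record(leaves, old: str = "sha256", new: str = "sha3_256") -> dict:
    """Build the record that would be written to the audit trail (hypothetical shape)."""
    return {
        "event": "merkle-reroot",
        "from": old,
        "to": new,
        "entry_count": len(leaves),
        "old_root": merkle_root(leaves, old),
        "new_root": merkle_root(leaves, new),
    }


entries = [b"entry-1", b"entry-2", b"entry-3"]
record = transition_record(entries)
```

Storing both roots over the same entry set lets an auditor confirm that the re-rooting covered exactly the history the old root attested to.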

Certificate Lifecycle Management

We maintain a central certificate inventory with:

Name, issuer, algorithm, expiry, cert-manager renewal config, owner.
renewBefore: 720h (30 days) — cert-manager renews automatically.
Alert at <14 days remaining to catch any cert-manager failures early.

Annual forced rotation drill in staging: manually expire a certificate and verify that cert-manager renews without interruption.
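The two thresholds form a simple ladder: renewal is due 30 days before expiry, and an alert fires if a certificate is still unrenewed inside 14 days. A minimal sketch of that classification follows. The status labels and function name are hypothetical, and in practice the expiry would come from the certificate inventory.

```python
from datetime import datetime, timedelta, timezone

RENEW_BEFORE = timedelta(hours=720)  # 30 days, matching renewBefore: 720h
ALERT_THRESHOLD = timedelta(days=14)


def cert_status(not_after: datetime, now: datetime) -> str:
    """Classify a certificate by the time remaining until it expires."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining < ALERT_THRESHOLD:
        # Renewal should have happened 16+ days ago; automation has likely failed.
        return "alert"
    if remaining <= RENEW_BEFORE:
        return "renewal-due"
    return "ok"
```

The gap between the two thresholds gives automated renewal roughly 16 days to succeed before anyone is paged.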

Key Takeaways (Memorize These!)

Pinning and patching are complementary disciplines; the process that reconciles them is the vulnerability management program.
Triage on reachability in the tool-call path, not just CVSS score: a critical-CVSS CVE in an unreachable observability component is less urgent than a medium-CVSS CVE in the gateway.
Session-drain rolling updates mean patches can be applied without service interruption.
Cryptographic agility must be planned before the algorithm is deprecated — a migration under pressure produces outages.

You now have a mature vulnerability management program that keeps the platform current without ever sacrificing supply-chain integrity or operational stability. Pinning and patching work together, cryptographic migrations are planned and painless, and every CVE is handled with speed appropriate to its actual risk. This is the operational discipline that keeps the entire security stack fresh and effective for the long term.