Automated Response and Incident Recovery: Talon, Quarantine, PICERL, and WORM Audits
Hello and welcome to Module 20!
Modules 1–19 have built a complete defense-in-depth stack that prevents, detects, and contains threats. Now we close the final gap: the minutes between detection and human response. In an agentic platform, an attacker (or a compromised agent) can execute hundreds of tool calls in the time it takes an on-call engineer to acknowledge a PagerDuty page.
Automated containment must fire first — not as a supplement to human response, but as Phase 1 of it. In this module we introduce Talon for instant quarantine, circuit breakers for self-protection, the full PICERL incident response lifecycle, and the immutable WORM audit trail that makes every investigation forensic-grade. By the end you will have a system that responds faster than any human can, while preserving evidence for thorough recovery and learning.
The Detection-to-Response Window
Most damage in a security incident happens in the narrow window between detection and human action.
An automated agent can make hundreds of tool calls per minute. Waiting for a human to read an alert and click “quarantine” is no longer acceptable.
Automated containment (Talon) must act immediately on high-confidence signals. Humans then take over for investigation and remediation. This is not “set it and forget it” automation — it is Phase 1 of a structured incident response process.
Talon Automated Quarantine
Talon is the automated first responder. It is triggered by any of these high-confidence signals:
- Panguard block threshold exceeded in a single session
- Falco critical alert (e.g., unexpected exec or binding to 0.0.0.0)
- Memory integrity check failure
Quarantine actions (executed in seconds):
- Suspend all active sessions for the affected agent
- Apply a deny-all egress NetworkPolicy to the agent pod
- Freeze Vault lease renewal (leases remain valid for the current session but cannot be renewed)
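To make the second action concrete, here is a minimal sketch of the manifest such a policy could use, built as a Python dictionary. The function name, the policy name prefix, and the `agent` pod label are assumptions made for illustration; the module does not specify how Talon labels or selects pods. The manifest fields themselves follow the standard Kubernetes `networking.k8s.io/v1` NetworkPolicy schema.

```python
def deny_all_egress_policy(agent_name: str, namespace: str) -> dict:
    """Build a NetworkPolicy manifest that blocks all egress from one agent's pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            # Hypothetical naming scheme for quarantine policies.
            "name": f"quarantine-{agent_name}",
            "namespace": namespace,
        },
        "spec": {
            # Hypothetical label; use whatever label identifies the agent pod.
            "podSelector": {"matchLabels": {"agent": agent_name}},
            # Declaring the Egress policy type with no egress rules
            # denies all outbound traffic from the selected pods.
            "policyTypes": ["Egress"],
            "egress": [],
        },
    }


policy = deny_all_egress_policy("billing-agent", "agents")
```

Because the policy is a separate object rather than a change to the pod itself, deleting it lifts the network restriction, which fits the requirement that quarantine be reversible.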
Key properties:
- Quarantine is fully reversible and does not destroy forensic evidence.
- The quarantine event itself is written to the WORM audit trail before any containment action is taken, so the trigger and exact timestamp are permanently recorded.
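The audit-before-action ordering described above can be sketched as follows. This is an illustration under stated assumptions, not Talon's actual interface: `AuditLog`, `quarantine`, and the action callables are hypothetical names, and the in-memory list only stands in for real WORM storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class AuditLog:
    """In-memory stand-in for the WORM trail: append-only, no update or delete."""
    _events: List[dict] = field(default_factory=list)

    def append(self, event: dict) -> None:
        self._events.append(dict(event))

    def events(self) -> List[dict]:
        # Return copies so callers cannot mutate recorded events.
        return [dict(e) for e in self._events]


def quarantine(agent_id: str, trigger: str, audit: AuditLog,
               actions: List[Callable[[str], None]]) -> List[str]:
    """Record the quarantine event first, then run each containment action."""
    audit.append({
        "type": "quarantine",
        "agent": agent_id,
        "trigger": trigger,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    completed = []
    for action in actions:
        action(agent_id)  # e.g. suspend sessions, apply NetworkPolicy
        completed.append(action.__name__)
    return completed
```

Writing the event before acting means that even if a containment step fails partway through, the trail still shows what fired and when.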
Circuit Breakers
Talon provides per-agent containment. Circuit breakers provide self-protection at the gateway and tool level.
- Gateway-level circuit breaker: if a single agent session triggers >X Panguard blocks in Y seconds, the entire session is terminated automatically.
- Tool-level circuit breaker: if a specific tool produces >Z errors in W minutes, the tool is temporarily disabled for all sessions.
Circuit breaker state is logged and alerted. A tripped breaker is a signal that requires human investigation. Reset requires explicit human action — never automatic recovery.
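A minimal sketch of the breaker logic described above, assuming a sliding time window: the breaker trips when more than `max_events` events fall inside `window_seconds`, and it stays tripped until a named operator resets it. The class and parameter names are hypothetical, and the thresholds are left to the caller because the module does not fix values for X, Y, Z, or W.

```python
from collections import deque
from typing import Deque


class CircuitBreaker:
    """Trips when more than max_events occur within window_seconds."""

    def __init__(self, max_events: int, window_seconds: float) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        self.tripped = False
        self._timestamps: Deque[float] = deque()

    def record(self, now: float) -> bool:
        """Record one event at time `now` (seconds); return True if tripped."""
        if self.tripped:
            return True  # never recovers on its own
        self._timestamps.append(now)
        # Drop events that have fallen out of the sliding window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) > self.max_events:
            self.tripped = True
        return self.tripped

    def reset(self, operator: str) -> None:
        """Manual reset only; an operator identity is required."""
        if not operator:
            raise ValueError("reset requires a named human operator")
        self._timestamps.clear()
        self.tripped = False
```

The same class could back either breaker: one instance per session counting Panguard blocks at the gateway, or one instance per tool counting errors.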
PICERL Incident Response Lifecycle
Once Talon has contained the immediate threat, the structured PICERL process takes over:
- Prepare: Runbooks are documented, the on-call rotation is defined, forensic tools are pre-staged, and WORM access is confirmed.
- Identify: Talon has already fired; the on-call engineer is notified, an incident commander is assigned, and severity is classified.
- Contain: Verify that quarantine is in effect; expand it if needed (full namespace isolation, pipeline halt); preserve a forensic snapshot.
- Eradicate: Identify the root cause; remove the malicious or compromised component; patch or reconfigure as needed.
- Recover: Restore service in a clean state; verify Merkle root continuity; verify audit trail completeness; run smoke tests before traffic is restored.
- Learn: The post-incident review is completed within 48 hours; the STRIDE model and red-team test cases are updated; new Panguard or Falco rules are authored if a gap was exposed.
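One way to keep an incident from skipping a phase is to model the lifecycle as an ordered state machine. The sketch below is an assumption about how a tracking tool might enforce the order; the `Phase` and `Incident` names are hypothetical, and a real process may need to loop back (for example, from Eradicate to Contain), which this strict version does not allow.

```python
from enum import IntEnum
from typing import List


class Phase(IntEnum):
    PREPARE = 0
    IDENTIFY = 1
    CONTAIN = 2
    ERADICATE = 3
    RECOVER = 4
    LEARN = 5


class Incident:
    """Tracks one incident and only allows moving to the next phase."""

    def __init__(self, incident_id: str) -> None:
        self.incident_id = incident_id
        # Prepare happens before any incident exists, so tracking starts at Identify.
        self.phase = Phase.IDENTIFY
        self.history: List[Phase] = [Phase.IDENTIFY]

    def advance(self, target: Phase) -> None:
        if target != self.phase + 1:
            raise ValueError(
                f"cannot move from {self.phase.name} to {target.name}"
            )
        self.phase = target
        self.history.append(target)
```

For example, calling `advance(Phase.RECOVER)` on an incident still in Contain raises an error, which forces Eradicate to be completed and recorded first.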
WORM Audit Trail in Incident Response
The WORM audit trail is the single source of truth for every investigation.
- Every tool call, token exchange, memory write, and admin action is recorded with timestamps and actor identities.
- The Merkle root chain makes the trail tamper-evident: any change to a recorded event alters the recomputed root, so a matching root shows the trail has not been modified since it was written.
- Given a sessionId, the full session timeline can be reconstructed directly from the WORM trail.
- Evidence package: a signed, time-bounded export of all audit events related to the incident can be delivered to legal or external investigators without exposing other sessions.
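The tamper-evidence and timeline properties above can be illustrated with a simplified linear hash chain, which is a stand-in for the platform's Merkle root chain rather than a reproduction of it. Each record's hash covers the previous hash plus the event, so editing any event breaks every later link. The function names and the event fields other than `sessionId` are assumptions made for this sketch.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder hash for the first record


def event_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous link together with a canonical encoding of the event."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_event(trail: List[dict], event: dict) -> None:
    prev = trail[-1]["hash"] if trail else GENESIS
    trail.append({"event": event, "hash": event_hash(prev, event)})


def verify_chain(trail: List[dict]) -> bool:
    """Recompute every link; any modified event makes verification fail."""
    prev = GENESIS
    for record in trail:
        if record["hash"] != event_hash(prev, record["event"]):
            return False
        prev = record["hash"]
    return True


def session_timeline(trail: List[dict], session_id: str) -> List[dict]:
    """Return one session's events in timestamp order."""
    events = [r["event"] for r in trail if r["event"]["sessionId"] == session_id]
    return sorted(events, key=lambda e: e["timestamp"])
```

In an investigation, `verify_chain` would run before `session_timeline`, since a timeline built from an unverified trail proves nothing.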
Forensic Preservation Sequence
Evidence preservation always comes before revocation:
- Snapshot memory store, NATS message log, Vault lease audit log, and pod filesystem before any revocation.
- Snapshot is written to a separate forensics bucket with access restricted to the IR team only.
- Revocation (certificate, Vault leases, NATS subscriptions) happens only after the snapshot is confirmed complete.
Never delete the memory store or audit trail during an active investigation. Rushed cleanup trades short-term containment for long-term blindness.
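The snapshot-then-revoke ordering can be enforced in code rather than left to the runbook. In this sketch, every snapshot callable must return True (confirmed complete) before any revocation runs; if one fails, the function raises and nothing is revoked. The function name and the callable-based interface are assumptions for illustration, not part of any named tool.

```python
from typing import Callable, Dict, List


def preserve_then_revoke(
    snapshots: Dict[str, Callable[[], bool]],
    revocations: Dict[str, Callable[[], None]],
) -> List[str]:
    """Run all snapshots, confirm each one, and only then run revocations."""
    log: List[str] = []
    for name, take_snapshot in snapshots.items():
        if not take_snapshot():
            # Stop before any revocation so no evidence is lost.
            raise RuntimeError(
                f"snapshot of {name} not confirmed; revocation skipped"
            )
        log.append(f"snapshot:{name}")
    for name, revoke in revocations.items():
        revoke()
        log.append(f"revoke:{name}")
    return log
```

A caller would pass one snapshot callable per evidence source (memory store, NATS message log, Vault lease audit log, pod filesystem) and one revocation callable per credential type.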
Post-Incident Review Requirements
Every CRITICAL incident requires a mandatory review within 48 hours.
Required outputs:
- Complete timeline
- Root cause
- How the attacker gained access
- Which controls failed or were bypassed
- Which controls limited the blast radius
- Specific changes made to prevent recurrence
Any finding that reveals a Panguard or Falco gap results in a new rule being authored, tested, and deployed before the review is signed off. The signed post-incident report is stored in WORM and referenced in the next quarterly review (Module 25).
Key Takeaways (Memorize These!)
- Automated containment is Phase 1 of incident response, not a supplement to it: human response begins after quarantine is already in effect.
- Talon quarantine can be reversed and a tripped circuit breaker can be reset, but revocation cannot be undone, so forensic snapshots must be confirmed before anything is revoked. The sequence matters.
- The WORM audit trail is only useful if the Merkle chain is intact, so integrity verification is the first action in every investigation.
- A post-incident review that doesn’t result in a new rule, a runbook update, or a control change is not a post-incident review; it is a post-incident report.
You now have automated response that acts faster than any human responder and a structured incident lifecycle that turns every event into measurable improvement. The detection-to-response window is no longer an unguarded gap; it is a controlled, forensic-grade process. This completes the operational security layer that makes the entire platform resilient in the face of real incidents.
