Agentic AI security curriculum · Security overview
Data Classification and PII Redaction: Never Let Sensitive Data Hit Logs
Module 10 of 20 · Agentic AI Security Curriculum · May 2026
How to use this module
Use it as self-paced study or as instructor-led training. YAML, commands, and policy excerpts are illustrative; map them to your cloud, mesh, identity provider, and agent runtime—substitute your own names, namespaces, and tools while preserving the control intent.
Estimated time: ~30 minutes reading; add time for linked standards and team discussion.
Learning objectives
By the end of this module, you should be able to:
- Distinguish data classification from redaction and logging policy.
- Design redaction-before-write pipelines for SIEM and long-term retention.
- Balance privacy obligations with forensic usefulness.
Prerequisites
Suggested discussion / lab: Pick one diagram in your environment (build, deploy, runtime) and mark where this module’s controls apply; note gaps versus the checklist in the body.
Even with strong runtime protection and sandboxing (Modules 8–9), sensitive data inevitably flows through agent sessions, documents, and tool calls. This module explains how to prevent PII, financial data, and other sensitive information from ever reaching persistent log stores.
Classification vs Redaction
Data classification and redaction are distinct but complementary controls:
- Classification tells you what data is sensitive and how it should be handled.
- Redaction ensures sensitive data is removed or masked before it is written to any queryable or long-term storage.
Both are required. Classification without redaction leaves raw PII in logs. Redaction without classification leaves you unable to reason about your data holdings.
Organizations should maintain a formal data classification policy whose tiers (Public, Internal, Confidential, Restricted) each map to redaction rules.
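As a minimal sketch of how tiers can map to redaction rules, the table below pairs each of the four tiers with a rule and looks fields up in a small inventory. The action names ("pass", "mask", "tokenize"), the field names, and the fail-closed default are illustrative assumptions, not part of any standard or of the reference stack.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """The four classification tiers, ordered by sensitivity."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class RedactionRule:
    action: str        # hypothetical action names: "pass", "mask", "tokenize"
    may_log_raw: bool  # whether the raw value may reach a log store


# Hypothetical policy table: exactly one redaction rule per tier.
POLICY = {
    Tier.PUBLIC: RedactionRule("pass", True),
    Tier.INTERNAL: RedactionRule("pass", True),
    Tier.CONFIDENTIAL: RedactionRule("mask", False),
    Tier.RESTRICTED: RedactionRule("tokenize", False),
}

# Hypothetical field inventory: this is the classification half of the control.
FIELD_TIERS = {
    "request_id": Tier.INTERNAL,
    "email": Tier.CONFIDENTIAL,
    "ssn": Tier.RESTRICTED,
}


def rule_for(field_name: str) -> RedactionRule:
    """Look up the rule for a field; unclassified fields fail closed."""
    tier = FIELD_TIERS.get(field_name, Tier.RESTRICTED)
    return POLICY[tier]
```

Treating unclassified fields as Restricted is one possible design choice: a field missing from the inventory is redacted by default rather than logged raw.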
Presidio in the Fluent Bit Pipeline
Reference stacks often run Microsoft Presidio as a single redaction stage in the Fluent Bit pipeline rather than as per-pod sidecars.
Why pipeline-level redaction?
- One consistent redaction engine for all log sources.
- Fewer failure modes and surfaces to maintain.
- Redaction happens before logs reach Loki.
Presidio identifies and redacts PII (names, SSNs, credit cards, medical records, etc.) and financial data in real time as logs are collected.
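To make the shape of such a pipeline stage concrete, here is a standard-library stand-in that redacts every string value in a parsed log record. It is not Presidio and does not use Presidio's API: the three regular expressions, the Luhn check, and the function names are assumptions chosen for illustration. A real deployment would rely on Presidio's recognizers, which also use NER models and context scoring.

```python
import re

# Illustrative patterns only; this stand-in covers far fewer entity types
# than Presidio and performs no NER or context scoring.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to cut false positives on long digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_text(text: str) -> str:
    """Replace detected entities with typed tokens."""
    def card_token(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return "[REDACTED_CREDIT_CARD]" if luhn_ok(digits) else match.group()

    text = CARD.sub(card_token, text)
    text = SSN.sub("[REDACTED_SSN]", text)
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def redact_record(record):
    """Recursively redact every string value in a parsed log record."""
    if isinstance(record, str):
        return redact_text(record)
    if isinstance(record, dict):
        return {key: redact_record(value) for key, value in record.items()}
    if isinstance(record, list):
        return [redact_record(value) for value in record]
    return record  # numbers, booleans, None pass through unchanged
```

Because the function walks the whole record, a sensitive value nested in a metadata field is handled the same way as one embedded in the message text.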
Redaction-Before-Write for WORM Compliance
All security-relevant logs are written to WORM (write once, read many) storage. Because redaction occurs before write:
- No raw sensitive data ever lands in persistent stores.
- WORM compliance is maintained without needing record deletion (which defeats WORM).
- Forensic value is preserved: enough context remains for investigation while PII is removed.
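The ordering described above (redact first, write second) can be sketched at application level with Python's standard logging module, where a filter attached to a handler runs before that handler emits the record. The logger name, the single SSN pattern, and the in-memory sink standing in for an append-only store are assumptions made for the example.

```python
import io
import logging
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # single illustrative pattern


class RedactBeforeWrite(logging.Filter):
    """Rewrite each record in place before the handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so values passed as %-args are covered.
        # Note: exception tracebacks (exc_info) are not redacted here.
        record.msg = SSN.sub("[REDACTED_SSN]", record.getMessage())
        record.args = None
        return True  # keep the record; only its content changes


def build_logger(sink: io.TextIOBase) -> logging.Logger:
    """Return a logger whose only handler redacts before writing to `sink`."""
    handler = logging.StreamHandler(sink)
    handler.addFilter(RedactBeforeWrite())
    logger = logging.getLogger("agent.audit")  # hypothetical logger name
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep raw records away from root handlers
    return logger
```

For example, `build_logger(io.StringIO()).info("lookup for ssn %s", "123-45-6789")` writes the tokenized message to the sink, so there is no raw record to delete later.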
Forensic-Friendly Logging Design
Redaction rules are tuned to balance privacy and usability:
- Entities are replaced with tokens (e.g., [REDACTED_SSN]) rather than removed outright.
- Context around redacted fields is retained where possible.
- Full unredacted logs (if ever needed for incident response) are available only through strict break-glass procedures with multi-party approval.
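The multi-party approval requirement for break-glass access could be modeled along the following lines. The class name, the default threshold of two approvers, and the rule that a requester cannot approve their own request are assumptions for illustration; an actual procedure would live in your access-management tooling and be audited there.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class BreakGlassRequest:
    """One request for unredacted access, tied to a specific incident."""
    incident_id: str
    requester: str
    approvals: Set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # Hypothetical rule: self-approval is never allowed.
        if approver == self.requester:
            raise PermissionError("requester cannot approve their own request")
        # A set means a repeated approval from one person counts once.
        self.approvals.add(approver)

    def is_granted(self, required: int = 2) -> bool:
        """Access is granted once enough distinct people have approved."""
        return len(self.approvals) >= required
```

Counting distinct approvers rather than approval events is the point of the sketch: one person approving twice does not satisfy a two-party rule.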
Key Takeaways
- Redaction must happen before data reaches any persistent log store, never after.
- Pipeline-level Presidio integration provides consistent, maintainable coverage across the entire platform.
- Classification policy + redaction-before-write satisfies both privacy regulations and forensic requirements.
- This approach ensures sensitive data never becomes a liability in logs, even during full incident investigations.
Proper data handling completes the protection of information in motion and at rest, enabling safe monitoring and response in the following modules.
Next module: Model Integrity – Verifying Weights Before Inference.
Further reading (vendor-neutral)
These resources are independent of any single product; use them to deepen the topic for audits, architecture reviews, or procurement discussions.
Commercial training use
You may reuse this curriculum internally or in paid consulting / training engagements. Keep examples aligned to the customer’s actual stack; substitute your own runbooks, tool names, and compliance frameworks (SOC 2, ISO 27001, sector regulators) where cited examples use a reference architecture only.
