Model Weight Integrity: Verifying Authenticity Before Every Load
Hello and welcome to Module 16!
Modules 1–15 have secured everything from images and skills to data classification and multi-agent pipelines. Now we protect the model weights themselves — the executable code that determines what the model actually computes.
A model weight file is not just data. Altering even a few bytes can change the model’s behavior as surely as changing the application source code. Because weights are large binary blobs that are impractical to review by hand, they make an attractive target for supply-chain attacks.
In this module we treat model weights with the same cryptographic rigor we apply to container images. By the end you will have a system that verifies authenticity on every single load, backed by HSM signing keys, immutable storage, and behavioral monitoring for the rare case where a backdoor was inserted during training.
Why Model Weights Are Executable Code
Think of a model weight file as compiled machine code for the neural network. Changing the weights changes the computation.
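To make the "compiled code" analogy concrete, here is a toy sketch in plain Python (not a real checkpoint format). It stores three weights as little-endian float32 values, as many formats do, and flips a single bit in the serialized bytes. The one-neuron model and its values are purely illustrative.

```python
import struct

def dot(weights: list[float], inputs: list[float]) -> float:
    """A one-neuron toy "model": the output is the weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

weights = [0.5, -1.25, 2.0]
inputs = [1.0, 2.0, 3.0]

# Serialize the weights as three little-endian float32 values (12 bytes).
blob = bytearray(struct.pack("<3f", *weights))

# Flip the lowest bit of the last byte, which holds the third weight's sign bit
# and upper seven exponent bits: 2.0 (0x40000000) becomes 8.0 (0x41000000).
blob[11] ^= 0x01
tampered = list(struct.unpack("<3f", bytes(blob)))

print(dot(weights, inputs))   # 4.0
print(dot(tampered, inputs))  # 22.0
```

One bit out of 96 changes the output from 4.0 to 22.0, while the file's size and structure stay identical. This is why visual inspection cannot substitute for a cryptographic hash.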
Real threats include:
- Supply-chain compromise of the weight hosting infrastructure (Hugging Face, internal model registry, S3 bucket).
- Man-in-the-middle attack during download.
- Insider modification after training but before deployment.
A weight file that can be silently replaced between training and inference means every security control downstream of the model (Panguard, ATR rules, sandboxing) is built on a compromised foundation. We must close this window completely.
Cryptographic Hash Verification
Every approved weight file has a SHA-256 hash (or a SHA-3 family digest, if policy requires it) recorded in a Vault-backed manifest.
The process:
- At the end of training, the pipeline computes the hash of the final weight file.
- The training pipeline authenticates with its OIDC workload identity, has the hash signed with the HSM-held key, and writes the signed entry to the manifest in Vault.
- At load time, the serving infrastructure verifies the manifest signature, recomputes the hash of the file it just downloaded, and compares it to the signed manifest value.
A hash mismatch aborts the load immediately, fires an alert, and keeps the serving pod from starting.
Verification runs on every load — not just at deployment time. This closes the post-deployment modification window.
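A minimal sketch of the load-time check, using only the Python standard library. It assumes the manifest has already been fetched from Vault and its signature verified, and that it maps file names to hex SHA-256 digests. `verify_before_load` and `WeightIntegrityError` are hypothetical names, not part of any existing tool.

```python
import hashlib
import hmac
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # stream in 1 MiB chunks; weight files are often many GB

class WeightIntegrityError(RuntimeError):
    """Raised when a weight file does not match its approved manifest entry."""

def sha256_file(path: Path) -> str:
    """Compute the hex SHA-256 digest of a file without reading it all into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_before_load(path: Path, manifest: dict[str, str]) -> None:
    """Abort unless the file's digest equals its manifest value.

    `manifest` maps file name -> expected hex digest and is assumed to have
    already passed signature verification.
    """
    expected = manifest.get(path.name)
    if expected is None:
        raise WeightIntegrityError(f"{path.name}: not in the approved manifest")
    actual = sha256_file(path)
    if not hmac.compare_digest(actual, expected.lower()):
        raise WeightIntegrityError(f"{path.name}: hash mismatch, refusing to load")
```

Run as the first step of the serving process (for example, in a Kubernetes init container), an uncaught `WeightIntegrityError` exits non-zero, so the pod never starts serving. Streaming the file in chunks keeps memory use flat regardless of model size.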
HSM-Backed Signing Keys
The signing key that approves weight manifests is stored in the same HSM used for Vault unsealing (Module 8).
Properties:
- The training pipeline authenticates to the HSM via its OIDC workload identity before signing.
- The private key can never be extracted from the HSM, so even a fully compromised training pipeline cannot copy the key and forge signatures outside it; every signature requires an authenticated request to the HSM.
- Annual key rotation, with a transition period during which both the old and new keys are trusted.
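The rotation transition period can be expressed as a simple trust-window check. This sketch assumes each HSM key has a published validity window whose end extends one month past the next key's start; the overlap length is an assumption, since the text does not specify it. Actual signature verification is delegated to a hypothetical `verify` callable, standing in for whatever HSM or Vault API checks the signature with the key's public half.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable

@dataclass(frozen=True)
class SigningKey:
    key_id: str
    not_before: datetime  # first moment signatures from this key are accepted
    not_after: datetime   # last moment, including the rotation overlap

def trusted_key_ids(keys: Iterable[SigningKey], now: datetime) -> set[str]:
    """Return the IDs of all keys whose validity window contains `now`."""
    return {k.key_id for k in keys if k.not_before <= now <= k.not_after}

def accept_manifest_signature(
    key_id: str,
    verify: Callable[[str], bool],  # hypothetical: checks the signature with key `key_id`
    now: datetime,
    keys: Iterable[SigningKey],
) -> bool:
    """Reject signatures from retired or unknown keys before doing any crypto."""
    if key_id not in trusted_key_ids(keys, now):
        return False
    return verify(key_id)

UTC = timezone.utc
# Hypothetical yearly keys; January 2025 is the overlap where both are trusted.
KEYS = [
    SigningKey("weights-2024", datetime(2024, 1, 1, tzinfo=UTC), datetime(2025, 1, 31, tzinfo=UTC)),
    SigningKey("weights-2025", datetime(2025, 1, 1, tzinfo=UTC), datetime(2026, 1, 31, tzinfo=UTC)),
]
```

On 15 January 2025 both keys are accepted; after 31 January 2025 a manifest signed with `weights-2024` is rejected even if the signature itself is valid.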
Weight Storage and Access Control
Approved weights live in a locked S3 bucket (or equivalent) with these controls:
- Object Lock in COMPLIANCE mode, with versioning enabled.
- IAM policy: only the signing principal can write new weight versions; serving infrastructure is strictly read-only.
- All downloads occur over TLS with certificate pinning to the storage endpoint.
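Certificate pinning on the download path can be sketched with Python's standard `ssl` module: perform the normal CA and hostname validation, then additionally require that the SHA-256 fingerprint of the server's leaf certificate be on an allow-list. The function names and the pin set are assumptions for illustration.

```python
import hashlib
import hmac
import socket
import ssl

class PinMismatchError(Exception):
    """The server presented a certificate that is not on the pin list."""

def cert_fingerprint(der_cert: bytes) -> str:
    """Hex SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()

def check_pin(der_cert: bytes, allowed_pins: set[str]) -> None:
    fingerprint = cert_fingerprint(der_cert)
    if not any(hmac.compare_digest(fingerprint, pin.lower()) for pin in allowed_pins):
        raise PinMismatchError(f"certificate {fingerprint} is not pinned")

def open_pinned_connection(host: str, port: int, allowed_pins: set[str],
                           timeout: float = 10.0) -> ssl.SSLSocket:
    """Open a TLS connection that must pass both CA validation and the pin check."""
    context = ssl.create_default_context()  # CA-chain and hostname checks stay enabled
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        tls = context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
    try:
        check_pin(tls.getpeercert(binary_form=True) or b"", allowed_pins)
    except Exception:
        tls.close()
        raise
    return tls
```

Pinning the whole leaf certificate means the pin changes whenever the certificate is renewed, so the next certificate's fingerprint would need to ship as a backup pin before rotation. Pinning the public key (SPKI) avoids this but requires an X.509 parser beyond the standard library.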
Serving infrastructure never stores weights locally beyond the current serving session: they are downloaded, verified, loaded, and then released.
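On AWS, the write and read restrictions could be expressed roughly as follows. This is a sketch: the bucket name, account ID, and role ARNs are hypothetical, and the 365-day retention period is an assumption. The dictionaries could be applied with boto3's `put_bucket_policy` and `put_object_lock_configuration`; Object Lock requires a versioned bucket.

```python
import json

BUCKET = "approved-model-weights"                               # hypothetical
SIGNER_ARN = "arn:aws:iam::111122223333:role/weight-signer"     # hypothetical
SERVING_ARN = "arn:aws:iam::111122223333:role/model-serving"    # hypothetical

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Nobody except the signing role may write or delete weight objects.
            "Sid": "DenyWritesExceptSigner",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:PutObject", "s3:DeleteObject", "s3:DeleteObjectVersion"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {"StringNotEquals": {"aws:PrincipalArn": SIGNER_ARN}},
        },
        {   # Serving infrastructure gets read-only access.
            "Sid": "ServingReadOnly",
            "Effect": "Allow",
            "Principal": {"AWS": SERVING_ARN},
            "Action": ["s3:GetObject", "s3:GetObjectVersion"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {   # Reject any request not made over TLS (pinning is enforced client-side).
            "Sid": "DenyNonTLS",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        },
    ],
}

# COMPLIANCE mode: no principal, including the account root, can shorten or remove retention.
object_lock_configuration = {
    "ObjectLockEnabled": "Enabled",
    "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 365}},  # retention is an assumption
}

print(json.dumps(bucket_policy, indent=2))
```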
Behavioral Monitoring for Compromised Weights
Cryptographic verification catches post-training tampering, but what about a backdoor inserted during training?
We add behavioral monitoring:
- At training completion, establish a baseline distribution of model outputs on a fixed evaluation set.
- In production, monitor the serving output distribution (token distribution, unexpected output categories).
- Weekly canary evaluation: run the fixed evaluation set against the live model and alert on any benchmark score regression of more than 5% from the baseline.
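One simple way to quantify the output-distribution drift from the second bullet is the total variation distance between baseline and live frequencies of tokens or output categories. This is one choice among several (KL divergence and the population stability index are common alternatives), and the 0.1 threshold below is a hypothetical placeholder to calibrate against normal week-to-week variation, not a recommended value.

```python
from collections import Counter

DRIFT_THRESHOLD = 0.1  # hypothetical placeholder; calibrate on historical traffic

def to_distribution(counts: Counter) -> dict[str, float]:
    """Convert raw counts (tokens or output categories) into probabilities."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a distribution from zero observations")
    return {key: n / total for key, n in counts.items()}

def total_variation(baseline: Counter, live: Counter) -> float:
    """Half the L1 distance between two distributions: 0 = identical, 1 = disjoint."""
    p, q = to_distribution(baseline), to_distribution(live)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in p.keys() | q.keys())

def has_drifted(baseline: Counter, live: Counter,
                threshold: float = DRIFT_THRESHOLD) -> bool:
    return total_variation(baseline, live) > threshold
```

Because categories absent from one side count as probability 0.0, a brand-new output category appearing in production contributes directly to the distance.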
Any significant drift triggers an immediate investigation and weight rollback.
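The weekly canary gate can be sketched as a comparison of each benchmark score against its training-time baseline. The sketch reads "regression >5 %" as a relative drop (a score of 0.80 falling below 0.76); if the policy means absolute points, `relative_regression` would change accordingly. The benchmark names and scores are invented for illustration.

```python
from dataclasses import dataclass

MAX_RELATIVE_REGRESSION = 0.05  # "regression >5 %", read as a relative drop

@dataclass(frozen=True)
class CanaryResult:
    benchmark: str
    baseline: float  # score recorded at training completion
    current: float   # score from this week's run against the live model

    @property
    def relative_regression(self) -> float:
        if self.baseline <= 0:
            raise ValueError(f"{self.benchmark}: baseline must be positive")
        return (self.baseline - self.current) / self.baseline

def failing_benchmarks(results: list[CanaryResult],
                       limit: float = MAX_RELATIVE_REGRESSION) -> list[str]:
    """Names of benchmarks whose score dropped by more than `limit` (as a fraction)."""
    return [r.benchmark for r in results if r.relative_regression > limit]

results = [
    CanaryResult("refusal-eval", baseline=0.80, current=0.78),   # 2.5% drop: passes
    CanaryResult("tool-use-eval", baseline=0.80, current=0.74),  # 7.5% drop: fails
]
failing = failing_benchmarks(results)
if failing:
    # Hypothetical hook: page the on-call engineer and start the weight rollback.
    print(f"canary regression in {failing}; investigate and roll back")
```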
Multi-Provider Weight Management
When using weights from external providers (Hugging Face, public model registries):
- Never download directly to serving infrastructure.
- Pull to a staging environment and verify the provider’s published hash against the downloaded file.
- Cross-check the provider’s hash against a secondary source (for example, GPG-signed release notes).
- Sign the verified weights with our own HSM key and promote them to the approved store.
This applies the same verification pipeline used for internally trained models.
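The staging steps for external weights can be sketched as a single promotion function. Assumptions: the provider's published hash and the hash from the secondary source have already been fetched as hex strings, a local directory stands in for the approved store, and `sign` is a hypothetical callable wrapping the HSM signing request.

```python
import hashlib
import hmac
import shutil
from pathlib import Path
from typing import Callable

class PromotionError(RuntimeError):
    """External weights failed a staging check and must not be promoted."""

def sha256_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def promote_external_weights(
    staged: Path,
    provider_sha256: str,
    secondary_sha256: str,
    sign: Callable[[str], bytes],  # hypothetical wrapper around the HSM signing call
    approved_dir: Path,            # stand-in for the approved weight store
) -> dict[str, str]:
    actual = sha256_file(staged)
    # 1. The bytes we downloaded must match what the provider published.
    if not hmac.compare_digest(actual, provider_sha256.lower()):
        raise PromotionError(f"{staged.name}: download does not match provider hash")
    # 2. The provider's hash must be confirmed by an independent source.
    if not hmac.compare_digest(provider_sha256.lower(), secondary_sha256.lower()):
        raise PromotionError(f"{staged.name}: provider hash not confirmed by secondary source")
    # 3. Sign with our own key, then copy into the approved store.
    signature = sign(actual)
    approved_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(staged, approved_dir / staged.name)
    # Manifest entry for the newly approved file.
    return {"file": staged.name, "sha256": actual, "signature": signature.hex()}
```

The promoted copy is hashed again by the serving-side check on every load, so a file modified after promotion is still caught.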
Key Takeaways (Memorize These!)
- Weight files are executable code and must be treated with the same supply-chain controls as container images.
- Hash verification at every load, not just deployment, closes the post-deployment modification window.
- HSM-backed signing keys cannot be extracted, so a compromised training pipeline cannot steal the key to forge weight approvals.
- Behavioral monitoring catches backdoored weights that pass hash verification because the backdoor was introduced during training, before the hash was computed.
You now have model weights that are cryptographically verified on every load, stored immutably, signed by HSM-backed keys, and continuously monitored for behavioral anomalies. The foundation on which your agents reason is now as trustworthy as every other layer in the platform. This closes the last major supply-chain gap for the intelligence layer itself.
