Content Moderation & PII Guardrails
Enterprise data-safety platform: dual-path PII detection (Microsoft Presidio + LLM judge), Lakera-style bidirectional guardrails, and SOC 2 / GDPR / HIPAA-ready audit trails.
Digital-bank guardrail review profile for AI support
This composite profile models 14 branches and 52 business scenarios where LLM agents must safely handle national IDs, card numbers, addresses, and health insurance IDs. The SkyAIApp guardrail replay benchmark shows:
“We used to keep 12 engineers around two homegrown DLP stacks. SkyAIApp folds Presidio + LLM judge + audit trail into one SDK — and policies are versioned and rolled out gradually.”
— Composite profile, Head of AI Platform
Challenge
Four unavoidable compliance gaps when shipping AI to production
Regex-only PII misses
Pure-regex pipelines miss 30%+ of non-Latin names, mixed addresses, and insurance IDs.
Prompt injection & privilege escalation
Agents get tricked into calling send_email or execute_sql, or into making unbounded transfers. WAFs don't see semantic attacks.
Concurrent multi-jurisdiction compliance
GDPR, CCPA, PIPL, HIPAA, PCI-DSS each have their own retention and response windows. Hand-rolled glue rots fast.
Audit trail can't be trusted
Replaying an incident can't prove which model and policy version were active at the time, which slows evidence preparation for security audits.
System architecture
Input → detection → policy engine → redaction & tool access → audited output. Every hop writes an immutable trace, replayable for 90 days.
Six-layer guardrail stack
Prompt-injection defense
Lakera-style classifier · prompt-shield v3
<8 ms classifier scores every incoming message; threshold hits get routed to a read-only model or queued for human review.
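The routing step for this layer can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not the platform's implementation: the function name `routeByInjectionScore`, the route labels, and the idea of two separate thresholds (one for the read-only model, a higher one for human review) are hypothetical. The source only says that threshold hits go to one of those two paths.

```typescript
// Hypothetical route labels; the source names the two fallback paths only.
type Route = "default" | "read-only-model" | "human-review";

interface InjectionPolicy {
  readOnlyThreshold: number; // assumed: scores at or above this lose tool access
  reviewThreshold: number;   // assumed: scores at or above this wait for a human
}

function routeByInjectionScore(score: number, policy: InjectionPolicy): Route {
  // Fail closed: a missing or out-of-range score is treated as suspicious.
  if (Number.isNaN(score) || score < 0 || score > 1) {
    return "human-review";
  }
  if (score >= policy.reviewThreshold) return "human-review";
  if (score >= policy.readOnlyThreshold) return "read-only-model";
  return "default";
}

// Example thresholds are illustrative, not calibrated values.
const injectionPolicy: InjectionPolicy = { readOnlyThreshold: 0.7, reviewThreshold: 0.9 };
const benign = routeByInjectionScore(0.12, injectionPolicy);     // "default"
const suspicious = routeByInjectionScore(0.75, injectionPolicy); // "read-only-model"
const hostile = routeByInjectionScore(0.97, injectionPolicy);    // "human-review"
```

Failing closed on a malformed score keeps a broken classifier from silently granting full tool access.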
Real-time PII detection (dual-path)
Microsoft Presidio Analyzer + LLM judge
Presidio NER + regex + checksums run on every token. Ambiguous high-impact cases get a fast second opinion from Claude Haiku 4.5 / Gemini 3 Flash.
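To illustrate the "regex + checksum, then escalate" idea for one entity type, here is a minimal sketch for payment card numbers. It is not Presidio's code and not the platform's triage logic: `passesLuhn` is the standard Luhn check, while `triageCardCandidate`, the verdict labels, and the rule that a card-shaped number failing the checksum counts as "ambiguous" are assumptions made for illustration.

```typescript
// Luhn checksum: the standard check-digit test for payment card numbers.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  let doubleNext = false; // the rightmost digit is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Hypothetical verdicts for the dual-path flow.
type Verdict = "pii" | "not-pii" | "ask-llm-judge";

// Assumed triage rule: the regex finds a card-shaped candidate, the checksum
// confirms it, and a checksum failure is escalated instead of being dropped.
function triageCardCandidate(text: string): Verdict {
  const match = text.match(/\b(?:\d[ -]?){12,18}\d\b/);
  if (!match) return "not-pii";
  return passesLuhn(match[0]) ? "pii" : "ask-llm-judge";
}

const confirmed = triageCardCandidate("card 4111 1111 1111 1111 please"); // "pii"
const ambiguous = triageCardCandidate("ref 4111 1111 1111 1112");         // "ask-llm-judge"
const clean = triageCardCandidate("no numbers here");                     // "not-pii"
```

Escalating checksum failures rather than discarding them reflects the assumption that a mistyped card number is still worth a second look.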
Configurable redaction
Mask · hash · FPE · tokenize · remove
Policy maps each entity type to a strategy: national IDs → format-preserving encryption, emails → domain-only mask, addresses → city-level downgrade.
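The entity-to-strategy mapping can be sketched as a lookup table plus one function per strategy. This is an illustrative sketch only: the `redact` function, the `redactionPolicy` table, the `mask-domain` label, and the `[REDACTED]` placeholder are hypothetical names, and FPE and tokenization are left out because they depend on a key service that is outside the scope of a short example.

```typescript
type Strategy = "mask-tail4" | "mask-domain" | "remove";

// Hypothetical policy table: entity type -> redaction strategy.
const redactionPolicy: { [entity: string]: Strategy | undefined } = {
  PHONE: "mask-tail4",
  EMAIL: "mask-domain",
  PERSON: "remove",
};

function redact(entity: string, value: string): string {
  // Unknown entity types fail closed and are removed entirely.
  const strategy: Strategy = redactionPolicy[entity] ?? "remove";
  switch (strategy) {
    case "mask-tail4": {
      const totalDigits = value.replace(/\D/g, "").length;
      // Too short to mask safely: showing four digits would reveal everything.
      if (totalDigits <= 4) return "[REDACTED]";
      let seen = 0;
      // Star out every digit except the last four; keep separators as-is.
      return value.replace(/\d/g, (d) => (++seen <= totalDigits - 4 ? "*" : d));
    }
    case "mask-domain": {
      // Keep only the domain part of an email address.
      const at = value.lastIndexOf("@");
      return at === -1 ? "[REDACTED]" : "***" + value.slice(at);
    }
    case "remove":
      return "[REDACTED]";
  }
}

const phone = redact("PHONE", "+1 415-555-0132");   // "+* ***-***-0132"
const email = redact("EMAIL", "alice@example.com"); // "***@example.com"
const other = redact("UNKNOWN_TYPE", "anything");   // "[REDACTED]"
```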
Output-side content safety
Toxicity · bias · hallucination · policy-tag
Model replies pass through 4 classifiers before return. Sensitive topics (self-harm, political, violent) trigger rewrite or refusal templates.
Zero-trust tool access
MCP-native scopes · per-call OPA policy
Every MCP tool call carries scopes and an OPA check. High-risk actions (writes, transfers, email send) require dual-approval or step-up auth.
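The per-call decision can be approximated in a few lines. This sketch stands in for the OPA check and is not a Rego policy or the platform's API: a real deployment would presumably express the same rules in Rego and have OPA evaluate them. The `authorize` function, the `ToolCall` and `Caller` shapes, and the "two distinct approvers" reading of dual approval are all assumptions.

```typescript
interface ToolCall {
  tool: string;
  requiredScopes: string[];
  highRisk: boolean; // e.g. writes, transfers, email send
}

interface Caller {
  grantedScopes: string[];
  approvals: string[]; // assumed field: IDs of humans who approved this call
}

type Decision = "allow" | "deny" | "needs-approval";

function authorize(call: ToolCall, caller: Caller): Decision {
  const granted = new Set(caller.grantedScopes);
  // Every required scope must be present; anything missing is a hard deny.
  if (!call.requiredScopes.every((scope) => granted.has(scope))) return "deny";
  // Dual approval is read here as two distinct approvers.
  const distinctApprovers = new Set(caller.approvals).size;
  if (call.highRisk && distinctApprovers < 2) return "needs-approval";
  return "allow";
}

const readCall: ToolCall = { tool: "lookup_customer", requiredScopes: ["customer:read"], highRisk: false };
const transferCall: ToolCall = { tool: "transfer_funds", requiredScopes: ["payments:write"], highRisk: true };

const agent: Caller = { grantedScopes: ["customer:read", "payments:write"], approvals: ["alice"] };
const readDecision = authorize(readCall, agent);         // "allow"
const transferDecision = authorize(transferCall, agent); // "needs-approval"
```

Counting distinct approvers rather than approval events prevents one person from approving the same call twice.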
Tamper-proof audit & replay
Append-only ledger · WORM storage · 90-day replay
Each trace pins policy version, model, PII hits, and actions taken. One-click export for regulators and DPOs feeds SOC 2 / DPIA evidence.
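One common way to make an append-only ledger tamper-evident is a hash chain, where each entry commits to the hash of the previous one. The sketch below shows that idea only; the source does not say how the ledger is built, so the chain design and the names `appendEntry`, `verifyLedger`, and `TraceRecord` are assumptions. To stay dependency-free it uses 32-bit FNV-1a, which is NOT cryptographically secure. A real ledger would use a cryptographic hash such as SHA-256.

```typescript
// Non-cryptographic FNV-1a (32-bit), a stand-in used only to keep the sketch
// dependency-free. Do not use it for real tamper evidence.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Fields mirror what the text says each trace pins.
interface TraceRecord {
  traceId: string;
  policyVersion: string;
  model: string;
  piiHits: string[];
  actions: string[];
}

interface LedgerEntry {
  record: TraceRecord;
  prevHash: string;
  hash: string;
}

const GENESIS = "genesis";

// Returns a new array; existing entries are never modified in place.
function appendEntry(ledger: readonly LedgerEntry[], record: TraceRecord): LedgerEntry[] {
  const last = ledger[ledger.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const hash = fnv1a(prevHash + JSON.stringify(record));
  return [...ledger, { record, prevHash, hash }];
}

// Recomputes every hash and checks each link back to the genesis marker.
function verifyLedger(ledger: readonly LedgerEntry[]): boolean {
  let expectedPrev = GENESIS;
  for (const entry of ledger) {
    if (entry.prevHash !== expectedPrev) return false;
    if (fnv1a(entry.prevHash + JSON.stringify(entry.record)) !== entry.hash) return false;
    expectedPrev = entry.hash;
  }
  return true;
}

let ledger: LedgerEntry[] = [];
ledger = appendEntry(ledger, { traceId: "t1", policyVersion: "pol_a", model: "m1", piiHits: ["SSN"], actions: ["fpe"] });
ledger = appendEntry(ledger, { traceId: "t2", policyVersion: "pol_a", model: "m1", piiHits: [], actions: [] });
const intact = verifyLedger(ledger); // true

// Rewriting the policy version of the first entry breaks verification.
const tampered = ledger.map((entry, i) =>
  i === 0 ? { ...entry, record: { ...entry.record, policyVersion: "pol_b" } } : entry
);
const tamperDetected = !verifyLedger(tampered); // true
```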
SDK integration
Bidirectional guardrails are on by default — you don't hand-wire Presidio and classifiers yourself.
import { SkyAI } from "@skyaiapp/sdk";

const sky = new SkyAI({ apiKey: process.env.SKYAIAPP_API_KEY });

const response = await sky.route({
  goal: "stability",
  messages: [{ role: "user", content: userInput }],
  // 1. Input-side guardrails
  guardrails: {
    promptInjection: { action: "block", threshold: 0.7 },
    pii: {
      detector: ["presidio", "llm-judge"], // dual-path
      entities: ["PERSON", "SSN", "CREDIT_CARD", "PHONE", "ADDRESS"],
      action: "redact", // redact | mask | hash | fpe
      strategy: { SSN: "fpe", CREDIT_CARD: "fpe", PHONE: "mask-tail4" },
    },
    // 2. Output-side guardrails
    output: {
      toxicity: { action: "rewrite", threshold: 0.5 },
      hallucination: { action: "warn", citations: "required" },
    },
    // 3. Audit
    audit: { policyVersion: "pol_2026_05_q2", retentionDays: 90 },
  },
  // 4. Tool access (MCP-native)
  tools: [
    {
      name: "lookup_customer",
      mcpServer: "crm.internal",
      scopes: ["customer:read"], // OPA check injected automatically
    },
  ],
});

console.log(response.guardrails.piiHits); // entities found
console.log(response.guardrails.actionsTaken); // how each was handled
console.log(response.routing.traceId); // audit-trail id
PII coverage (60+ entity types)
Identity
- National ID
- Passport
- Driver's license
- SSN
- TIN
Financial
- Card (Luhn)
- IBAN
- SWIFT/BIC
- Crypto wallet
- Phone
Health
- Insurance ID
- Medical record #
- NHS Number
- MRN
- Prescriptions
Credentials
- API keys
- JWT
- Private keys
- AWS / Azure / GCP creds
- SSH keys
Compliance framework coverage
| Framework | Coverage | Evidence |
|---|---|---|
| SOC 2 readiness | CC6.1 access control; CC7.2 monitoring; CC8.1 change mgmt | Immutable traces + policy-version diffs |
| GDPR / PIPL / CCPA | Minimization; purpose limitation; DSAR rights | DSAR auto-lookup + 30-day export / delete flow |
| HIPAA | §164.312 technical safeguards; §164.514 de-identification | PHI fields FPE + BAA template |
| PCI-DSS v4.0 | Req 3 storage protection; Req 10 logging | PAN tokenize + WORM audit |
Modeled results
Dual-path detection beats pure regex by 30+ percentage points.
Guardrails run in parallel with the LLM — no perceptible end-to-end overhead.
Every model + tool call lands in an immutable ledger.
No national-ID / card-number egress in the replay sample.
Rollout cadence
Discovery
Catalog sensitive fields, compliance scope, and current traffic shape.
Shadow deploy
Mirror 5% traffic, calibrate PII / injection thresholds.
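A deterministic way to pick the mirrored 5% is to hash a stable request or session ID into 100 buckets and mirror the lowest five. The source does not say how the sample is chosen, so treat this as one possible approach: `bucketOf`, `shouldMirror`, and the use of FNV-1a as the bucketing hash are assumptions.

```typescript
// Map an ID to a bucket in [0, 100) using 32-bit FNV-1a (a bucketing hash only).
function bucketOf(requestId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < requestId.length; i++) {
    hash ^= requestId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % 100;
}

// The same ID always lands in the same bucket, so a given request is either
// always mirrored or never mirrored at a fixed percentage.
function shouldMirror(requestId: string, percent: number): boolean {
  return bucketOf(requestId) < percent;
}

const SHADOW_PERCENT = 5;
const mirrored = shouldMirror("req-12345", SHADOW_PERCENT);
```

Hashing the ID instead of calling a random number generator keeps the decision reproducible, which helps when comparing shadow results against production traces later.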
Gradual rollout
Tenant- or surface-scoped ramp with false-positive monitoring.
Audit & renewals
DPO export templates + SOC 2 / DPIA evidence packs auto-generated.
Enterprise integrations
SSO + SCIM
DLP signal exchange
SIEM streaming
KMS / FPE keys