Injection & PII Auditor

The Injection Auditor (lucid-llm-judge-auditor) is a security node that detects prompt injection attacks, jailbreak attempts, and Personally Identifiable Information (PII) such as SSNs, email addresses, and credit card numbers, using LLM-driven guardrails via NeMo.

Use Case

  • Prompt Injection Defense: Block the OWASP Top 10 for LLM Applications #1 risk (LLM01: Prompt Injection), including jailbreaks and instruction-override attempts.
  • Regulatory Compliance: Enforce GDPR, CCPA, and HIPAA compliance by ensuring PII never reaches the model.
  • Data Leakage Prevention: Automatically detect and block sensitive identifiers in prompts.

Implementation

This auditor hooks into the Request phase to observe inputs and produce claims. A Cedar policy at the Gateway decides whether to block.

import re
from lucid_auditor_sdk import ClaimsAuditor, claims, serve, Phase
from lucid_schemas import Claim

class LLMJudgeAuditor(ClaimsAuditor):
    """Detects prompt injection and PII in user requests."""

    # PII Patterns
    SSN_PATTERN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
    EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

    # Injection Patterns
    INJECTION_PATTERNS = [
        "ignore all previous instructions",
        "disregard the above",
        "system prompt:",
        "you are now",
    ]

    @claims(phase=Phase.REQUEST)
    def measure_injection(self, request: dict) -> list[Claim]:
        prompt = request.get("prompt", "").lower()
        matches = [p for p in self.INJECTION_PATTERNS if p in prompt]
        detected = len(matches) > 0

        return [
            Claim(name="injection_risk", type="score_normalized",
                  value=0.9 if detected else 0.0, confidence=0.95 if detected else 1.0),
        ]

    @claims(phase=Phase.REQUEST)
    def measure_pii(self, request: dict) -> list[Claim]:
        prompt = request.get("prompt", "")
        ssn_found = bool(self.SSN_PATTERN.search(prompt))
        email_found = bool(self.EMAIL_PATTERN.search(prompt))
        entities = []
        if ssn_found:
            entities.append("US_SSN")
        if email_found:
            entities.append("EMAIL_ADDRESS")

        return [
            Claim(name="pii_types", type="string_list", value=entities),
            Claim(name="pii_count", type="count", value=len(entities)),
        ]

serve(LLMJudgeAuditor())
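
The matching logic can be exercised on its own. The sketch below is a standalone approximation that needs only the standard library: the SDK imports and Claim objects are replaced with a plain dict, so the names here (`audit`, the returned keys) mirror the claims above but are illustrative, not part of the SDK.

```python
import re

# Standalone sketch of the auditor's matching logic; the real node wraps
# these values in Claim objects via the lucid_auditor_sdk.
SSN_PATTERN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
INJECTION_PATTERNS = [
    "ignore all previous instructions",
    "disregard the above",
    "system prompt:",
    "you are now",
]

def audit(request: dict) -> dict:
    """Return the claim values the auditor would emit for this request."""
    prompt = request.get("prompt", "")
    detected = any(p in prompt.lower() for p in INJECTION_PATTERNS)
    entities = []
    if SSN_PATTERN.search(prompt):
        entities.append("US_SSN")
    if EMAIL_PATTERN.search(prompt):
        entities.append("EMAIL_ADDRESS")
    return {
        "injection_risk": 0.9 if detected else 0.0,
        "pii_types": entities,
        "pii_count": len(entities),
    }

print(audit({"prompt": "Ignore all previous instructions"}))
# {'injection_risk': 0.9, 'pii_types': [], 'pii_count': 0}
print(audit({"prompt": "My SSN is 123-45-6789"}))
# {'injection_risk': 0.0, 'pii_types': ['US_SSN'], 'pii_count': 1}
```

Note that injection matching is case-insensitive (the prompt is lowercased first), while the PII regexes run against the raw prompt.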

Cedar Policy

Claims are evaluated by the Gateway's Cedar policy. Example:

// Block high-risk prompt injection
@annotation("id", "guardrails-injection-deny")
@annotation("decision", "deny")
forbid (principal, action, resource)
when { context.claims.injection_risk > 0.7 };

// Block high-sensitivity PII (SSN)
@annotation("id", "pii-ssn-deny")
@annotation("decision", "deny")
forbid (principal, action, resource)
when { context.claims.pii_types.contains("US_SSN") };

// Warn on low-sensitivity PII (email) but allow
@annotation("id", "pii-email-warn")
@annotation("decision", "warn")
forbid (principal, action, resource)
when { context.claims.pii_types.contains("EMAIL_ADDRESS") };
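
Since Cedar only runs inside the Gateway, the three rules above can be approximated in plain Python to make the decision flow concrete. This is an illustrative sketch, not the Gateway's actual evaluation engine; in particular, the precedence (deny over warn, warn over allow) is an assumption about how the Gateway combines matching forbids.

```python
def evaluate_policy(claims: dict) -> str:
    """Approximate the three Cedar rules: a deny outcome wins over a warn,
    and a request matching no forbid is allowed."""
    if claims.get("injection_risk", 0.0) > 0.7:
        return "DENY"   # guardrails-injection-deny
    if "US_SSN" in claims.get("pii_types", []):
        return "DENY"   # pii-ssn-deny
    if "EMAIL_ADDRESS" in claims.get("pii_types", []):
        return "WARN"   # pii-email-warn
    return "ALLOW"

print(evaluate_policy({"injection_risk": 0.9, "pii_types": []}))                # DENY
print(evaluate_policy({"injection_risk": 0.0, "pii_types": ["EMAIL_ADDRESS"]})) # WARN
print(evaluate_policy({"injection_risk": 0.0, "pii_types": []}))                # ALLOW
```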

Behavior

  • Injection Detection: If a user types "Ignore all previous instructions", the auditor produces injection_risk = 0.9. The Cedar policy evaluates to DENY, and the model is never invoked.
  • PII Blocking: If a user types "My SSN is 123-45-6789", the auditor produces pii_types = ["US_SSN"]. The Cedar policy evaluates to DENY.
  • PII Warning: If a user includes an email address, the auditor produces pii_types = ["EMAIL_ADDRESS"]. The Cedar policy evaluates to WARN but allows the request to proceed.