Stop Blocking Prompt Injections. Start Trapping Attackers.
The AI security market is where web application security was in 2005. Everyone knows there’s a problem, but most solutions are stuck in the “block and pray” mindset.
Here’s the uncomfortable truth: blocking prompt injections gives attackers instant feedback. They see the wall, adjust their technique, and try again. Within a few iterations, they’re through.
What if instead of blocking, you lied to the attacker?
The Problem with Block-and-Move-On
Most AI firewalls work like this:
- User sends input
- Firewall checks input against rules
- If suspicious, return an error: “I can’t help with that”
- Attacker notes the detection, modifies their prompt, tries again
The attacker gets a binary signal: detected or not detected. That’s all they need to iterate.
Studies show unprotected LLMs have a 70-85% attack success rate. The most dangerous attack — roleplay jailbreaking (ATK-006) — succeeds 89.6% of the time against undefended systems.
Enter Oubliette: Block, Deceive, Gather Intelligence
Oubliette Shield takes a fundamentally different approach with three pillars:
Detection
A 5-stage tiered pipeline that catches 85-90% of attacks:
Input → Sanitize (<1ms) → Pre-Filter (~10ms) → ML Classifier (~2ms) → LLM Judge → Session Update
Most attacks are obvious — a pattern match catches them in 10 milliseconds. Only the truly ambiguous inputs (5-15%) need the full LLM judge. This saves massive compute costs.
Deception
Three modes that waste attackers’ time:
- Honeypot: Returns convincing fake credentials, fake API keys, fake database entries
- Tarpit: Generates verbose stalling responses to waste the attacker’s time
- Redirect: Steers the conversation back without revealing detection
The attacker thinks they succeeded. They waste hours exploring fake data. Meanwhile, you’re watching everything.
Intelligence
Every detected attack generates structured threat intelligence:
- STIX 2.1 export for MISP, OpenCTI, ThreatConnect
- MITRE ATLAS technique mapping (13 techniques)
- IOC extraction for automated response
- CEF logging for SIEM integration
Getting Started: 3 Lines of Python
from oubliette_shield import Shield
shield = Shield(provider="openai", mode="honeypot")
result = shield.analyze(user_input)
That’s it. Install with extras for your framework:
pip install oubliette-shield[langchain,fastapi,litellm]
The ML Classifier
At the heart of the pipeline is a LogisticRegression + TF-IDF classifier:
| Metric | Value |
|---|---|
| F1 Score | 0.98 |
| AUC-ROC | 0.99 |
| Inference Time | ~2ms |
| False Positive Rate | Low |
| Training Samples | 1,365 |
| Feature Dimensions | 733 |
Why LogisticRegression over transformers? Because 2ms inference means you can run it on every single request without adding noticeable latency. The pre-filter handles the obvious attacks; the ML model catches the sophisticated ones; the LLM judge handles the truly ambiguous edge cases.
9 Framework Integrations
Drop-in security for every major LLM framework:
- LangChain —
OublietteCallbackHandler - FastAPI —
ShieldMiddleware - LiteLLM —
OublietteCallback - LangGraph —
create_shield_node - CrewAI —
ShieldTaskCallback+ShieldTool - Haystack —
ShieldGuard - Semantic Kernel —
ShieldPromptFilter - DSPy —
shield_assert+ShieldModule - LlamaIndex —
OublietteCallbackHandler
Why We Built This
Oubliette Security is a veteran-owned cybersecurity company with roots in purple teaming — cyber deception, threat hunting, and threat intelligence. We’ve spent years bridging offense and defense, and we know that giving attackers feedback is the worst thing you can do.
We build tools that don’t just defend — they fight back.
The core is open source under Apache 2.0 because we believe every AI system deserves protection, not just enterprise deployments.
What’s Next
- GitHub — Star the repo
- PyPI — Install the package
- Documentation — Full API reference
Every AI system deserves a firewall that fights back.