Stop Blocking Prompt Injections. Start Trapping Attackers.

The AI security market is where web application security was in 2005. Everyone knows there’s a problem, but most solutions are stuck in the “block and pray” mindset.

Here’s the uncomfortable truth: blocking prompt injections gives attackers instant feedback. They see the wall, adjust their technique, and try again. Within a few iterations, they’re through.

What if instead of blocking, you lied to the attacker?

The Problem with Block-and-Move-On

Most AI firewalls work like this:

User sends input
Firewall checks input against rules
If suspicious, return an error: “I can’t help with that”
Attacker notes the detection, modifies their prompt, tries again

The attacker gets a binary signal: detected or not detected. That’s all they need to iterate.

Studies show unprotected LLMs have a 70-85% attack success rate. The most dangerous attack — roleplay jailbreaking (ATK-006) — succeeds 89.6% of the time against undefended systems.

Enter Oubliette: Block, Deceive, Gather Intelligence

Oubliette Shield takes a fundamentally different approach with three pillars:

Detection

A 5-stage tiered pipeline that catches 85-90% of attacks:

Input → Sanitize (<1ms) → Pre-Filter (~10ms) → ML Classifier (~2ms) → LLM Judge → Session Update

Most attacks are obvious — a pattern match catches them in 10 milliseconds. Only the truly ambiguous inputs (5-15%) need the full LLM judge. This saves massive compute costs.

Deception

Three modes that waste attackers’ time:

Honeypot: Returns convincing fake credentials, fake API keys, fake database entries
Tarpit: Generates verbose stalling responses to waste the attacker’s time
Redirect: Steers the conversation back without revealing detection

The attacker thinks they succeeded. They waste hours exploring fake data. Meanwhile, you’re watching everything.

Intelligence

Every detected attack generates structured threat intelligence:

STIX 2.1 export for MISP, OpenCTI, ThreatConnect
MITRE ATLAS technique mapping (13 techniques)
IOC extraction for automated response
CEF logging for SIEM integration

Getting Started: 3 Lines of Python

from oubliette_shield import Shield

shield = Shield(provider="openai", mode="honeypot")
result = shield.analyze(user_input)

That’s it. Install with extras for your framework:

pip install oubliette-shield[langchain,fastapi,litellm]

The ML Classifier

At the heart of the pipeline is a LogisticRegression + TF-IDF classifier:

Metric	Value
F1 Score	0.98
AUC-ROC	0.99
Inference Time	~2ms
False Positive Rate	Low
Training Samples	1,365
Feature Dimensions	733

Why LogisticRegression over transformers? Because 2ms inference means you can run it on every single request without adding noticeable latency. The pre-filter handles the obvious attacks; the ML model catches the sophisticated ones; the LLM judge handles the truly ambiguous edge cases.

9 Framework Integrations

Drop-in security for every major LLM framework:

LangChain — OublietteCallbackHandler
FastAPI — ShieldMiddleware
LiteLLM — OublietteCallback
LangGraph — create_shield_node
CrewAI — ShieldTaskCallback + ShieldTool
Haystack — ShieldGuard
Semantic Kernel — ShieldPromptFilter
DSPy — shield_assert + ShieldModule
LlamaIndex — OublietteCallbackHandler

Why We Built This

Oubliette Security is a veteran-owned cybersecurity company with roots in purple teaming — cyber deception, threat hunting, and threat intelligence. We’ve spent years bridging offense and defense, and we know that giving attackers feedback is the worst thing you can do.

We build tools that don’t just defend — they fight back.

The core is open source under Apache 2.0 because we believe every AI system deserves protection, not just enterprise deployments.

What’s Next

GitHub — Star the repo
PyPI — Install the package
Documentation — Full API reference

Every AI system deserves a firewall that fights back.