
Stop Blocking Prompt Injections. Start Trapping Attackers.

Oubliette Security

The AI security market is where web application security was in 2005. Everyone knows there’s a problem, but most solutions are stuck in the “block and pray” mindset.

Here’s the uncomfortable truth: blocking prompt injections gives attackers instant feedback. They see the wall, adjust their technique, and try again. Within a few iterations, they’re through.

What if instead of blocking, you lied to the attacker?

The Problem with Block-and-Move-On

Most AI firewalls work like this:

  1. User sends input
  2. Firewall checks input against rules
  3. If suspicious, return an error: “I can’t help with that”
  4. Attacker notes the detection, modifies their prompt, tries again

The attacker gets a binary signal: detected or not detected. That’s all they need to iterate.

Studies show unprotected LLMs have a 70-85% attack success rate. The most dangerous attack — roleplay jailbreaking (ATK-006) — succeeds 89.6% of the time against undefended systems.

Enter Oubliette: Block, Deceive, Gather Intelligence

Oubliette Shield takes a fundamentally different approach with three pillars:

Detection

A 5-stage tiered pipeline that catches 85-90% of attacks:

Input → Sanitize (<1ms) → Pre-Filter (~10ms) → ML Classifier (~2ms) → LLM Judge → Session Update

Most attacks are obvious — a pattern match catches them in 10 milliseconds. Only the truly ambiguous inputs (5-15%) need the full LLM judge. This saves massive compute costs.
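The tiered dispatch can be sketched as follows. This is a minimal illustration of the idea, not Oubliette's internals: the patterns, thresholds, and stage implementations here are all hypothetical.

```python
import re

def sanitize(text: str) -> str:
    # Stage 1 (<1ms): strip control characters and normalize whitespace.
    return re.sub(r"[\x00-\x08\x0b-\x1f]", "", text).strip()

OBVIOUS_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"you are now in developer mode", re.I),
]

def pre_filter(text: str) -> bool:
    # Stage 2 (~10ms): cheap regex matching catches the obvious attacks.
    return any(p.search(text) for p in OBVIOUS_PATTERNS)

def ml_score(text: str) -> float:
    # Stage 3 (~2ms): stand-in for the TF-IDF + LogisticRegression model.
    suspicious = ("system prompt", "jailbreak", "roleplay as")
    return min(1.0, sum(w in text.lower() for w in suspicious) / 2)

def llm_judge(text: str) -> bool:
    # Stage 4: expensive LLM call, reached only for ambiguous inputs.
    raise NotImplementedError("call out to a judge model here")

def analyze(user_input: str) -> str:
    text = sanitize(user_input)
    if pre_filter(text):
        return "attack"                 # cheap, confident verdict
    score = ml_score(text)
    if score >= 0.8:
        return "attack"
    if score <= 0.2:
        return "benign"
    return "attack" if llm_judge(text) else "benign"  # the ambiguous 5-15%
```

The point of the ordering: each stage is an order of magnitude more expensive than the last, so most traffic never reaches the judge.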

Deception

Three modes that waste attackers’ time:

  • Honeypot: Returns convincing fake credentials, fake API keys, fake database entries
  • Tarpit: Generates verbose stalling responses to waste the attacker’s time
  • Redirect: Steers the conversation back without revealing detection

The attacker thinks they succeeded. They waste hours exploring fake data. Meanwhile, you’re watching everything.
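A dispatch over the three modes might look like this. The mode names come from the list above; the response bodies are invented for illustration:

```python
import secrets

def honeypot_response() -> str:
    # Convincing but fake credentials; values are generated, never real.
    fake_key = "sk-" + secrets.token_hex(16)
    return f"Sure - the admin API key is {fake_key} (user: svc_backup)"

def tarpit_response() -> str:
    # Verbose stalling text to burn the attacker's time and tokens.
    filler = "Let me walk through the full configuration history first. "
    return filler * 40

def redirect_response() -> str:
    # Steer back to the task without revealing that detection happened.
    return "Happy to help - could you clarify what you'd like to accomplish?"

MODES = {
    "honeypot": honeypot_response,
    "tarpit": tarpit_response,
    "redirect": redirect_response,
}

def deceive(mode: str) -> str:
    return MODES[mode]()
```

In every mode the attacker gets a 200-OK-shaped answer, never the binary detected/not-detected signal that a blocking firewall leaks.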

Intelligence

Every detected attack generates structured threat intelligence:

  • STIX 2.1 export for MISP, OpenCTI, ThreatConnect
  • MITRE ATLAS technique mapping (13 techniques)
  • IOC extraction for automated response
  • CEF logging for SIEM integration
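For a sense of what STIX 2.1 export looks like, here is a minimal Indicator built with the standard library only. The field values (technique ID, hash pattern, names) are illustrative, not what Oubliette actually emits:

```python
import json
import uuid
from datetime import datetime, timezone

def make_stix_indicator(attack_prompt_hash: str, technique: str) -> dict:
    # Minimal STIX 2.1 Indicator for a detected prompt-injection attempt.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": "Prompt injection attempt",
        "description": f"Mapped to MITRE ATLAS technique {technique}",
        "pattern": f"[file:hashes.'SHA-256' = '{attack_prompt_hash}']",
        "pattern_type": "stix",
        "valid_from": now,
    }

bundle = {
    "type": "bundle",
    "id": f"bundle--{uuid.uuid4()}",
    "objects": [make_stix_indicator("ab" * 32, "AML.T0051")],
}
print(json.dumps(bundle, indent=2))
```

Because the output is plain STIX JSON, any consumer that speaks the standard (MISP, OpenCTI, ThreatConnect) can ingest it without a custom connector.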

Getting Started: 3 Lines of Python

from oubliette_shield import Shield

shield = Shield(provider="openai", mode="honeypot")
result = shield.analyze(user_input)

That’s it. Install with extras for your framework:

pip install oubliette-shield[langchain,fastapi,litellm]

The ML Classifier

At the heart of the pipeline is a LogisticRegression + TF-IDF classifier:

  Metric                 Value
  F1 Score               0.98
  AUC-ROC                0.99
  Inference Time         ~2ms
  False Positive Rate    Low
  Training Samples       1,365
  Feature Dimensions     733

Why LogisticRegression over transformers? Because 2ms inference means you can run it on every single request without adding noticeable latency. The pre-filter handles the obvious attacks; the ML model catches the sophisticated ones; the LLM judge handles the truly ambiguous edge cases.
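To see why inference is so cheap: once trained, a logistic regression over sparse text features is just a sparse dot product and a sigmoid. A toy stand-in (these weights are invented, not the shipped 733-feature model):

```python
import math

# Toy weight vector standing in for the trained TF-IDF feature weights.
WEIGHTS = {"ignore": 2.1, "instructions": 1.7, "jailbreak": 3.0, "weather": -1.2}
BIAS = -2.5

def predict_proba(text: str) -> float:
    # Inference is a sparse dot product plus a sigmoid - microseconds of
    # work, so scoring every request adds no noticeable latency.
    z = BIAS + sum(WEIGHTS.get(tok, 0.0) for tok in text.lower().split())
    return 1.0 / (1.0 + math.exp(-z))
```

A transformer would run the same decision through millions of parameters per token; the accuracy gain rarely justifies the latency at this stage of the pipeline.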

9 Framework Integrations

Drop-in security for every major LLM framework:

  • LangChain: OublietteCallbackHandler
  • FastAPI: ShieldMiddleware
  • LiteLLM: OublietteCallback
  • LangGraph: create_shield_node
  • CrewAI: ShieldTaskCallback + ShieldTool
  • Haystack: ShieldGuard
  • Semantic Kernel: ShieldPromptFilter
  • DSPy: shield_assert + ShieldModule
  • LlamaIndex: OublietteCallbackHandler
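All of these integrations share one pattern: screen the input, serve a deceptive response on detection, otherwise pass through to the model. A framework-agnostic sketch of that pattern (the `analyze` return shape and the stub shield are assumptions for illustration, not the real Shield API):

```python
from typing import Callable

def shielded(shield, llm_call: Callable[[str], str]) -> Callable[[str], str]:
    # Generic wrapper: screen first, deceive on detection, else pass through.
    def wrapper(user_input: str) -> str:
        verdict = shield.analyze(user_input)   # assumed analyze() contract
        if verdict.get("is_attack"):
            return verdict.get("deception", "")
        return llm_call(user_input)
    return wrapper

# Stub shield for illustration; the real Shield wires in the full pipeline.
class StubShield:
    def analyze(self, text: str) -> dict:
        attack = "ignore previous instructions" in text.lower()
        return {"is_attack": attack, "deception": "The admin key is sk-fake123"}

safe_chat = shielded(StubShield(), lambda prompt: f"LLM says: {prompt}")
```

Each framework integration is essentially this wrapper expressed in that framework's extension point (callback, middleware, node, or filter).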

Why We Built This

Oubliette Security is a veteran-owned cybersecurity company with roots in purple teaming — cyber deception, threat hunting, and threat intelligence. We’ve spent years bridging offense and defense, and we know that giving attackers feedback is the worst thing you can do.

We build tools that don’t just defend — they fight back.

The core is open source under Apache 2.0 because we believe every AI system deserves protection, not just enterprise deployments.

What’s Next


Every AI system deserves a firewall that fights back.