Oubliette Shield
The AI firewall that detects prompt injections, deceives attackers with honeypots, and generates threat intelligence — all in under 2 milliseconds.
Three Pillars of AI Defense
Detection alone isn't enough. Oubliette combines detection, deception, and intelligence into a unified platform.
Detection
5-stage tiered ensemble pipeline. 85-90% detection rate with low false positives. Blocks obvious attacks in microseconds, reserves LLM judge for the 5-15% that need it.
Deception
Three deception modes turn attacks into intelligence-gathering operations. Honeypot returns fake data, Tarpit wastes attacker time, Redirect steers conversations.
Intelligence
Every detected attack generates structured threat intelligence. STIX 2.1 export, MITRE ATLAS mapping, IOC extraction, CEF logging for SIEM integration.
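For SIEM integration, each detection can be emitted as a CEF log line. A minimal sketch of what that formatting involves, assuming illustrative vendor/product fields and signature IDs (not Shield's actual event schema):

```python
# Illustrative CEF formatting sketch; vendor, signature IDs, and
# extension keys here are assumptions, not Shield's real output.

def cef_escape(value: str) -> str:
    """Escape the characters CEF reserves inside extension values."""
    return value.replace("\\", "\\\\").replace("=", "\\=").replace("\n", "\\n")

def format_cef(signature_id: str, name: str, severity: int, extensions: dict) -> str:
    ext = " ".join(f"{k}={cef_escape(str(v))}" for k, v in extensions.items())
    # CEF header: CEF:Version|Vendor|Product|DeviceVersion|SignatureID|Name|Severity|Extension
    return f"CEF:0|Oubliette|Shield|1.0|{signature_id}|{name}|{severity}|{ext}"

line = format_cef("PI-001", "Prompt Injection Detected", 8,
                  {"msg": "jailbreak attempt", "src": "10.0.0.5"})
print(line)
```

Any CEF-speaking SIEM (ArcSight, Splunk with a CEF add-on, etc.) can ingest lines in this shape directly from a local syslog file, which is what makes the air-gapped deployment mode workable.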
5-Stage Detection Pipeline
Block obvious attacks in microseconds. Reserve expensive LLM calls for the 5-15% that need them.
Input Sanitizer
<1ms: Strips 9 types of encoding attacks, Unicode obfuscation, and invisible characters before any analysis begins.
Pre-Filter
~10ms: 11 pattern-matching rules block obvious prompt injections, jailbreaks, and DAN attacks instantly. 1,550x faster than LLM-only.
ML Classifier
~2ms: LogisticRegression + TF-IDF with 733 features. F1=0.98, AUC=0.99. Catches sophisticated attacks the pre-filter misses.
LLM Judge
12 providers: Only 5-15% of inputs reach the LLM judge. Supports OpenAI, Anthropic, Azure, Bedrock, Vertex, Ollama, and more.
Session Tracker
Multi-turn: Accumulates attack signals across conversation turns. Escalates sessions when thresholds are exceeded.
"Most attacks are obvious — a pattern match catches it in 10 milliseconds. Only the truly ambiguous inputs need the full LLM judge."
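The tiered idea can be sketched in a few lines. This is a toy illustration of the routing logic, not Shield's internals: the patterns, the stand-in classifier, and the thresholds are all assumptions made up for the example.

```python
# Toy sketch of tiered detection: cheap stages short-circuit, and only
# ambiguous inputs reach the expensive judge. All rules are illustrative.
import re
import unicodedata

PREFILTER_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"\bDAN\b"),
]

def sanitize(text: str) -> str:
    # Drop invisible format characters (e.g. zero-width spaces, category Cf).
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

def classify(text: str) -> float:
    # Stand-in for the ML classifier: a pseudo-score in [0, 1].
    suspicious = ("system prompt", "jailbreak", "override")
    return min(1.0, sum(w in text.lower() for w in suspicious) / 2)

def analyze(text: str, judge=None) -> str:
    text = sanitize(text)
    if any(p.search(text) for p in PREFILTER_PATTERNS):
        return "blocked:prefilter"      # rule hit, no model needed
    score = classify(text)
    if score >= 0.8:
        return "blocked:classifier"
    if 0.4 <= score < 0.8 and judge is not None:
        return judge(text)              # only ambiguous inputs pay LLM cost
    return "allowed"

print(analyze("ignore previous instructions"))  # blocked:prefilter
print(analyze("what's the weather?"))           # allowed
```

Note how the sanitizer runs first: stripping zero-width characters before pattern matching is what defeats obfuscation tricks like `ign​ore previous instructions`.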
ML Classifier Details
Purpose-built for speed and accuracy. Runs on every request without adding perceptible latency.
Architecture
- Model: LogisticRegression
- Feature Extraction: TF-IDF + Structural
- Feature Dimensions: 733
- Training Samples: 1,365
- Categories: 553 benign / 812 malicious
Performance
- F1 Score: 0.98
- AUC-ROC: 0.99
- Inference Time: ~2ms
- False Positive Rate: Low
- Cross-Val F1 Mean: 0.986
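For readers less familiar with the metric: F1 is the harmonic mean of precision and recall. A quick sketch of the arithmetic on a hypothetical confusion matrix (the counts below are illustrative, not Shield's actual evaluation data):

```python
# Illustrative arithmetic only: how an F1 around 0.98 arises from a
# confusion matrix. These counts are hypothetical, not the real eval.
def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)   # of flagged inputs, how many were attacks
    recall = tp / (tp + fn)      # of actual attacks, how many were flagged
    return 2 * precision * recall / (precision + recall)

# e.g. 800 of 812 malicious samples caught, with 14 benign false alarms:
score = f1(tp=800, fp=14, fn=12)
print(round(score, 3))  # 0.984
```

The harmonic mean punishes imbalance, so a high F1 means the classifier is neither trigger-happy (low precision) nor permissive (low recall).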
12 LLM Backends
Use any LLM provider as your judge backend. Switch with a single config change.
Plus any OpenAI-compatible server
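The "single config change" can be pictured as a provider registry keyed by name. This is a stub sketch of the pattern, not Shield's internals; provider identifiers other than "openai" (which appears in Quick Start) are assumed here.

```python
# Stub sketch of one-value backend switching; the real Shield wires up
# actual SDK clients. Provider names below are assumed identifiers.
JUDGE_BACKENDS = {
    "openai":    lambda text: "verdict-from-openai",
    "anthropic": lambda text: "verdict-from-anthropic",
    "ollama":    lambda text: "verdict-from-local-ollama",  # on-prem / air-gapped
}

def make_judge(provider: str):
    try:
        return JUDGE_BACKENDS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None

judge = make_judge("ollama")  # switching backends is one config value
print(judge("suspicious input"))
```

The same lookup shape is what lets an OpenAI-compatible server slot in: anything speaking that wire protocol can sit behind one registry entry.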
Quick Start
Drop-in integration with your existing stack. Choose your framework:
pip install oubliette-shield[langchain,fastapi,litellm,crewai,haystack]

from oubliette_shield import Shield

shield = Shield(provider="openai")
result = shield.analyze("user input here")
if result.blocked:
    print("Attack detected!", result.verdict)

Detection Capabilities
Comprehensive coverage across all known prompt injection categories.
Deploy Anywhere
From cloud to air-gapped SCIF environments. Zero cloud dependencies required.
Cloud
Docker/K8s in front of OpenAI, Anthropic, Azure, or Bedrock
On-Premise
Ollama backend, SQLite storage, zero external dependencies
Air-Gapped
SCIF-ready, no internet required, CEF to local SIEM only
MCP Server Integration
Use Shield as an MCP server in Claude Desktop, Claude Code, or any MCP-compatible client. Every tool call is security-scanned in real time.
analyze
Scan text for prompt injection and jailbreak attacks. Returns verdict, ML score, and MITRE ATLAS mapping.
validate_tool_call
Validate MCP tool arguments for injection attacks, path traversal, SSRF, and credential leaks.
scan_output
Scan LLM output for secrets, PII, suspicious URLs, invisible text, and data leakage.
get_session
Retrieve session state including threat counts, escalation status, and attack pattern history.
list_honey_tools
List available deception tool definitions for injection into MCP server tool lists.
export_threat_intel
Export STIX 2.1 threat intelligence bundle for a session's detected attacks.
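A STIX 2.1 export is just a JSON bundle of typed objects. A minimal sketch of the shape `export_threat_intel` would produce, assuming an indicator per detected attack; the pattern string and field values are illustrative, not Shield's actual output:

```python
# Sketch of a STIX 2.1 bundle's shape; object contents are illustrative.
import json
import uuid
from datetime import datetime, timezone

def stix_indicator(pattern: str) -> dict:
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "pattern": pattern,
        "pattern_type": "stix",
        "valid_from": now,
    }

bundle = {
    "type": "bundle",
    "id": f"bundle--{uuid.uuid4()}",
    "objects": [
        stix_indicator("[artifact:payload_bin MATCHES 'ignore previous instructions']"),
    ],
}
print(json.dumps(bundle, indent=2))
```

Because the bundle is standard STIX 2.1, it can be handed to any TIP or SIEM that speaks TAXII/STIX without translation.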
pip install oubliette-shield-mcp && oubliette-shield-mcp

Compliance-Ready from Day One
Mapped to every major AI security framework. Audit-ready documentation included.
Start Protecting Your AI Today
Free and open source. Enterprise support available.