AI Sec Reviews
Circuit board technology — illustrating an article on Rebuff Open-Source Prompt Injection Defense in Production
reviews

Rebuff: Open-Source Prompt Injection Defense in Production

Rebuff is a self-hosted prompt injection defense with a multi-layer architecture. The heuristics layer is fast; the LLM-based detection adds coverage. Here's the production configuration that made it viable.

By Marcus Reid · · 8 min read

Rebuff (from ProtectAI) is an open-source prompt injection defense library. The project was released in 2023 and has maintained active development. For teams with compliance requirements that preclude sending user inputs to external APIs, or teams with tight cost constraints, it’s the most production-ready open-source option in the space — the self-hosted counterpart to a managed detector like Lakera Guard.

This is not a typical “here’s the GitHub project” review. We deployed it, tuned it, hit its limits, and found the configuration that made it viable, following our AI security tool evaluation framework.

Architecture

Rebuff’s defense has four layers, each providing different coverage:

1. Heuristics detection: Regular expression and string matching against a curated set of known injection patterns. Fast (~1ms), no external calls, catches the most obvious attacks.

2. Vector similarity detection: Compares the input against a database of known injection attempts using semantic similarity. Catches paraphrased variants of known attacks.

3. LLM-based detection: Passes the input to an LLM (you configure which one) with a meta-prompt asking it to evaluate whether the input is a prompt injection attempt. Most thorough, highest latency.

4. Canary token detection: Embeds a token in the system prompt and checks whether the LLM’s output contains that token (which would indicate the system prompt was leaked or the context was compromised).

You can run any combination of layers. The default runs all four.

Performance with the default configuration

Default configuration detection rates on our test set:

Default configuration latency:

The full-stack latency is too high for synchronous use in interactive applications. This is where the configuration work pays off.

The production configuration that worked

We deployed Rebuff in a tiered configuration:

Tier 1 (all inputs, synchronous): Heuristics + vector similarity only. Latency ~30ms. Catches ~75% of attacks.

Tier 2 (inputs that pass Tier 1 but have risk signals, asynchronous): LLM-based detection runs after the response has already been delivered. If Tier 2 flags, the session gets additional scrutiny on subsequent turns and the input is added to the vector database.

Tier 3 (periodic): Canary token monitoring runs on a sample of sessions to detect context leakage.

This configuration gives us:

The trade-off: Tier 2 doesn’t prevent the first harmful response if Tier 1 misses it. This is acceptable if the application’s harm model allows a one-turn window before detection.

Operational requirements

Rebuff requires:

The infrastructure overhead is real. Budget one engineer-day for initial setup and one engineer-day per quarter for maintenance.

Where it falls short

Encoding-based attacks: The heuristics layer doesn’t decode Base64, ROT13, or other encodings before pattern matching. Adding a preprocessing normalization step before the heuristics layer is worth implementing.

Multilingual: The vector database and LLM detection work across languages, but the heuristics patterns are primarily English. Non-English injection attempts may bypass the heuristics layer more easily.

Maintenance of the pattern database: The heuristics patterns need updating as new attack patterns emerge. The repo updates periodically, but you’re responsible for keeping your deployed version current.

Verdict

Rebuff is the right choice for:

The tiered deployment configuration makes it viable in production. Out of the box, the latency profile requires configuration work before it’s production-appropriate.

For teams comparing self-hosted vs. managed injection detection, the benchmark data at bestllmscanners.com covers the coverage gap between the two approaches.

For more context, AI defense strategies covers related topics in depth.

Sources

  1. Rebuff GitHub Repository
  2. ProtectAI
#rebuff #open-source #prompt-injection #llm-security #self-hosted #production
Subscribe

AI Sec Reviews — in your inbox

Reviews of AI security products and platforms. — delivered when there's something worth your inbox.

No spam. Unsubscribe anytime.

Related

Comments