Item: Rebuff
Rating: 3
Author: AI Sec Reviews

Rebuff (from ProtectAI) is an open-source prompt injection detector, first released in 2023 under the Apache-2.0 license. For teams whose compliance requirements preclude routing user inputs through third-party detection APIs, or that want full control over the detection stack, it remains one of the more frequently referenced open-source options: the self-hosted counterpart to a managed detector like Lakera Guard.

This is a documentation- and architecture-based review: what Rebuff’s layers do according to its README and repository, what it takes to run, and where the project stands today. It follows our AI security tool evaluation framework. Note up front: the repository was archived (made read-only) on May 16, 2025, so what follows describes a project whose code is frozen and no longer under active development.

Architecture

Per its README, Rebuff employs a four-layer defense strategy. The project describes each layer as follows:

1. Heuristics: “Filter out potentially malicious input before it reaches the LLM.” This is a fast, local pass that runs before any model call — no external request required.

2. LLM-based detection: “Use a dedicated LLM to analyze incoming prompts and identify potential attacks.” A separate model is given the input and asked to judge whether it looks like a prompt injection attempt. This layer is the most thorough but adds the latency of a model call.

3. Vector database: “Store embeddings of previous attacks in a vector database to recognize and prevent similar attacks in the future.” By comparing new inputs against embeddings of known attacks, this layer is intended to catch paraphrased variants that string-level heuristics would miss.

4. Canary tokens: “Add canary tokens to prompts to detect leakages, allowing the framework to store embeddings about the incoming prompt in the vector database and prevent future attacks.” A marker is embedded in the prompt; if it later surfaces in output, that signals the prompt was leaked, and the offending input is recorded for future matching.
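To make layers 1 and 3 more concrete, here is a minimal sketch of a pattern-based heuristic pass and a similarity lookup against known attacks. It is not Rebuff's implementation: the pattern list, the function names (heuristic_score, embed, cosine, max_similarity), and the bag-of-words "embedding" are all assumptions chosen so the example runs with the standard library alone. A real deployment would use a learned embedding model and a vector store, which is what lets the similarity layer recognize paraphrases rather than just shared words.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for illustration only; not Rebuff's real heuristics.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"disregard (the )?(system|previous) prompt",
    r"reveal (your|the) (system )?prompt",
]


def heuristic_score(text: str) -> float:
    """Fraction of suspicious patterns that match the input (0.0 to 1.0)."""
    lowered = text.lower()
    hits = sum(1 for pattern in SUSPICIOUS_PATTERNS if re.search(pattern, lowered))
    return hits / len(SUSPICIOUS_PATTERNS)


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def max_similarity(text: str, known_attacks: list[str]) -> float:
    """Highest similarity between the input and any stored attack."""
    query = embed(text)
    return max((cosine(query, embed(attack)) for attack in known_attacks), default=0.0)
```

Both functions return a score rather than a verdict, so the caller decides the thresholds. That choice is deliberate in this sketch: it keeps the blocking policy separate from the detection logic.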

The README frames these as complementary, with the vector database and canary tokens forming a self-hardening loop — detected attacks feed back into the embedding store so similar future inputs are recognized. The project is explicit that this is not a complete solution: its own disclaimer states that “Rebuff is still a prototype and cannot provide 100% protection against prompt injection attacks.”
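The canary-and-feedback loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not Rebuff's SDK: add_canary, canary_leaked, check_response, and the in-memory AttackStore are hypothetical names, and the list of strings stands in for the vector database that would hold embeddings in a real deployment.

```python
import secrets


def add_canary(prompt_template: str) -> tuple[str, str]:
    """Prepend a random marker to the prompt; return (prompt, marker)."""
    canary = secrets.token_hex(8)  # 16 hex characters, unlikely to occur by chance
    return f"<!-- {canary} -->\n{prompt_template}", canary


def canary_leaked(model_output: str, canary: str) -> bool:
    """True if the marker surfaced in the model's output."""
    return canary in model_output


class AttackStore:
    """In-memory stand-in for the vector database of known attacks."""

    def __init__(self) -> None:
        self.inputs: list[str] = []

    def record(self, user_input: str) -> None:
        self.inputs.append(user_input)


def check_response(user_input: str, model_output: str, canary: str, store: AttackStore) -> bool:
    """Detect a leak and, if found, remember the input that caused it."""
    leaked = canary_leaked(model_output, canary)
    if leaked:
        store.record(user_input)
    return leaked
```

Note that the check runs after the model has already responded, so this layer detects a leak rather than preventing it; its value is in blocking the response from reaching the user and in feeding the store for future matching.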

What it takes to run

Based on the repository’s setup documentation, a working Rebuff deployment needs:

An LLM for the detection layer. The examples default to OpenAI (GPT-3.5-turbo), configurable to other models.
A vector database. The documented options are Pinecone or Chroma, used to store and query attack embeddings.
The Rebuff server / SDK. Rebuff ships a Python SDK, a JavaScript/TypeScript SDK, and a self-hostable server (the docs reference Supabase for the self-hosted database and credit/billing management).

The practical implication is that Rebuff is not a single drop-in library: the LLM-based and vector-similarity layers each pull in an external dependency (a model endpoint and a vector store) that you have to provision and operate. Teams choosing it for data-residency reasons should confirm that their configured LLM and vector store also meet those residency requirements, since both see the user input or its embedding.
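One lightweight way to keep that residency check from being forgotten is to list the external endpoints in one place and compare them against an approved set at startup. The sketch below is hypothetical throughout: DetectorDependencies, residency_violations, and the example hostnames are invented for illustration and are not part of Rebuff. A hostname allow-list is also only a proxy; it does not prove where a provider actually processes data.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class DetectorDependencies:
    """Hypothetical record of the external services a deployment calls."""

    llm_endpoint: str
    vector_store_endpoint: str


def residency_violations(deps: DetectorDependencies, allowed_hosts: set[str]) -> list[str]:
    """Return the configured endpoints whose hostname is not approved."""
    endpoints = [deps.llm_endpoint, deps.vector_store_endpoint]
    return [url for url in endpoints if urlparse(url).hostname not in allowed_hosts]
```

Failing fast on a non-empty result at deploy time is cheaper than discovering later that user inputs or their embeddings were sent somewhere they should not have been.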

Design trade-offs to weigh

These follow directly from the architecture rather than from any single benchmark:

Per-layer latency profile. The heuristics layer is local and cheap; the LLM-based layer carries the round-trip cost of a model call. Running every layer on every request inline will be noticeably slower than running heuristics alone. Teams sensitive to interactive latency typically separate the cheap local checks from the model-dependent ones — for example, running heuristics and vector lookup inline and reserving the LLM judgment for higher-risk inputs or asynchronous review.
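The inline-versus-escalated split might look like the following sketch. The function name tiered_detect, the two thresholds, and the shape of the result are assumptions for illustration; the scorers are passed in as plain callables so the example does not depend on any particular detector or on Rebuff's API.

```python
from typing import Callable

Scorer = Callable[[str], float]  # returns a risk score between 0.0 and 1.0


def tiered_detect(
    text: str,
    cheap_scorers: list[Scorer],
    llm_scorer: Scorer,
    block_at: float = 0.9,     # assumed threshold: block without asking the LLM
    escalate_at: float = 0.4,  # assumed threshold: below this, allow without the LLM
) -> dict:
    """Run cheap local checks first; call the LLM only for borderline inputs."""
    cheap = max((scorer(text) for scorer in cheap_scorers), default=0.0)
    if cheap >= block_at:
        return {"blocked": True, "score": cheap, "llm_called": False}
    if cheap < escalate_at:
        return {"blocked": False, "score": cheap, "llm_called": False}
    # Borderline: pay for the model call only here.
    llm = llm_scorer(text)
    return {"blocked": llm >= block_at, "score": llm, "llm_called": True}
```

With this shape, clearly benign and clearly malicious inputs never incur the model round trip. The trade-off is that an attack scoring below escalate_at on the cheap layers is never seen by the LLM, so the lower threshold needs to be set conservatively.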

Encoding-aware attacks. Heuristic and embedding matching operate on the input as presented. Inputs obfuscated via Base64, ROT13, or similar encodings can evade pattern- and similarity-based layers unless you add a normalization/decoding step before detection. This is a general limitation of input-pattern detection, not unique to Rebuff.
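A normalization step of the kind mentioned above can be as simple as generating decoded candidates and running the detector on each. This is a minimal sketch, not something Rebuff provides: decoded_variants and is_flagged are hypothetical names, only whole-input Base64 and ROT13 are handled, and payloads that are nested, partial, or embedded inside other text would need more work.

```python
import base64
import codecs
from typing import Callable


def decoded_variants(text: str) -> list[str]:
    """Return the input plus any plausible decodings of it."""
    variants = [text]

    # ROT13 is its own inverse, so decoding is always possible.
    variants.append(codecs.decode(text, "rot_13"))

    # Base64: only accept input that is strictly valid and decodes to UTF-8.
    try:
        raw = base64.b64decode(text.strip(), validate=True)
        variants.append(raw.decode("utf-8"))
    except ValueError:
        # binascii.Error and UnicodeDecodeError are both ValueError subclasses.
        pass

    return variants


def is_flagged(text: str, detector: Callable[[str], bool]) -> bool:
    """Flag the input if the detector fires on any decoded variant."""
    return any(detector(variant) for variant in decoded_variants(text))
```

Each added decoder multiplies the number of detector calls, so this kind of expansion is best placed in front of the cheap layers rather than in front of the LLM call.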

Language coverage. The LLM-based and embedding layers can generalize across languages depending on the model and embeddings used, but any English-centric heuristic patterns will be weaker against non-English injection attempts.

Maintenance posture. With the repository archived since May 2025, there are no upstream updates to the heuristics or detection logic. Anyone deploying it today is responsible for maintaining their own fork, keeping the attack corpus current, and tracking the security of its dependencies.

Verdict

Rebuff is worth evaluating for:

Teams that want a self-hosted, inspectable injection detector rather than a managed API
Deployments where the self-hardening vector-feedback design (detected attacks improving future detection) is appealing
Engineers comfortable operating the supporting LLM and vector-store dependencies

The important caveats: it is a prototype by the maintainers’ own description, it is no longer actively maintained as of the May 2025 archive, and getting an acceptable latency profile requires deciding which layers run inline versus asynchronously rather than running all four on every request. As with any single detector, treat it as one layer of defense, not a guarantee.

For teams comparing self-hosted vs. managed injection detection, the benchmark data at bestllmscanners.com covers the trade-offs between the two approaches.

For more context, AI defense strategies covers related topics in depth.

Rebuff: Open-Source Prompt Injection Defense, Layer by Layer

