AI Sec Reviews
Python code on a screen orchestrating automated red-team prompts
reviews

PyRIT Deep Dive: Microsoft's AI Red Teaming Framework in Practice

A long-form review of PyRIT, Microsoft's open-source AI red teaming framework. Its orchestrator/target/converter/scorer/memory architecture, multi-turn attack support, result persistence, and where it fits versus garak — described from the project's own docs.

By Marcus Reid · · 8 min read

We’ve covered PyRIT at a high level. This deep dive goes through its architecture component by component, explains what makes it a security-team tool rather than a research scanner, and is precise about what the project actually claims. All capabilities below are drawn from PyRIT’s repository and documentation.

What PyRIT is, precisely

PyRIT — the Python Risk Identification Tool for generative AI — is an open-source framework whose stated purpose is to “empower security professionals and engineers to proactively identify risks in generative AI systems.” It came out of Microsoft’s AI Red Team, a group with real production AI-security experience, and the design reflects that origin: it’s built around the workflow of a security engineer running repeatable assessments, not around one-shot research breadth.

A note on provenance that matters for anyone pinning dependencies or following links: PyRIT now lives at microsoft/PyRIT. The older Azure/PyRIT repository has been archived and redirects to the Microsoft org. It is released under the MIT license.

The architecture

PyRIT composes red-team runs from a small set of well-defined building blocks. Understanding these five is understanding PyRIT:

Prompt targets

A target is the thing being tested — the model or endpoint that receives prompts. PyRIT abstracts targets so the same attack logic can point at different backends, including OpenAI-style chat endpoints, Azure OpenAI, and other API endpoints. Targets can also serve secondary roles (for example, a target used by a scorer to perform an LLM-as-judge evaluation).

Orchestrators (attacks)

Orchestrators are the engine that drives an assessment: they take a set of prompts (or a strategy for generating them), send them to a target through any configured converters, and route the responses to scorers. The orchestrator is where single-turn versus multi-turn behavior is decided — it owns the loop. This is the component that turns “I have some attack prompts” into “I ran a structured campaign and collected scored results.”

Prompt converters

Converters transform prompts before they reach the target. This is one of PyRIT’s most distinctive ideas: instead of hand-writing every variant of an attack, you apply converters to mutate a base prompt — encoding it, translating it, rephrasing it, or otherwise transforming it to probe whether the transformation slips past a model’s defenses. Converters can be chained, and they expand a small seed set into broad coverage automatically.

Scorers

Scorers decide whether an attack succeeded. PyRIT supports programmatic evaluation of responses — including LLM-as-judge scoring (using a model to assess another model’s output against a rubric) and simpler matching approaches. Because scoring is a first-class, pluggable component, you can define what “success” means for your specific risk and get consistent, machine-readable verdicts rather than eyeballing transcripts.

Memory

PyRIT persists what happens. Its memory subsystem stores prompts, responses, and scores so that runs are durable and comparable over time. This is the feature that most clearly separates PyRIT from a throwaway script: you can run an assessment, change the model, run it again, and compare — did a model update change behavior on a specific attack class? Memory also underpins multi-turn attacks, because the conversation context has to be tracked across turns.

Multi-turn attacks

This is a genuine differentiator. Because orchestrators own the interaction loop and memory tracks conversation state, PyRIT can represent multi-turn attack patterns — assessments where the attack unfolds over several exchanges rather than a single prompt-and-response. Many real-world failures only emerge across a conversation (an agent that’s slowly steered, a model coaxed over several turns), and single-turn scanners structurally cannot represent them. PyRIT can.

Converters in depth — the force multiplier

It’s worth dwelling on converters, because they’re the component that most distinguishes PyRIT’s philosophy. The naive way to build an attack corpus is to hand-write every variation: the base request, the base64-encoded version, the translated version, the leetspeak version, the politely-reframed version. That doesn’t scale, and it goes stale the moment a model’s defenses shift.

PyRIT’s converter model inverts this. You write (or pick) a base prompt and apply transformations programmatically. A converter takes a prompt and returns a mutated prompt; because converters are composable and chainable, a handful of them multiply a small seed set into broad coverage. Encode it, then translate it, then reframe it — each combination is a distinct test, generated rather than authored. This matters for two reasons:

  • Coverage per unit of effort. A small, well-chosen seed set plus a converter chain explores far more of the attack space than the same effort spent writing static variants, and it’s maintainable: improve the seed or add a converter, and every downstream combination updates.
  • Probing the boundary between “instruction” and “obfuscation.” Many real bypasses work by obfuscating an otherwise-blocked request so a classifier or the model itself doesn’t recognize it as prohibited. Converters operationalize exactly that class of probe, which is why they’re a natural fit for a tool built by people who actually run these engagements.

The tradeoff to be honest about: converter chains can generate a large volume of prompts quickly, so you scope them deliberately and lean on scorers (below) to triage the results, rather than reading every transcript by hand.

Single-turn vs multi-turn, in practice

The phrase “multi-turn support” gets thrown around loosely, so it’s worth being concrete about what it buys you. A single-turn test sends one prompt and judges one response — fine for “does this jailbreak string work.” But a meaningful fraction of real failures only appear across a conversation: a model that refuses a direct request but complies after being incrementally reframed over several exchanges, or an agent that’s slowly steered off its task.

Because PyRIT’s orchestrators own the interaction loop and its memory tracks conversation state across turns, the framework can represent these conversational attack patterns rather than only one-shot prompts. In practice that means you can build an orchestrator that adapts subsequent prompts based on prior responses, and the whole exchange — every turn, with its scores — lands in memory for later review. A single-turn scanner structurally cannot model this; it has no notion of state between requests. For teams red-teaming conversational assistants or agents specifically, this is the capability that justifies reaching for PyRIT over a simpler tool.

Why it’s a security-team tool, not a research scanner

The recurring theme is that PyRIT is engineered for a process, not a one-off scan:

  • Repeatability. The orchestrator/target/converter/scorer composition is reusable. You build an assessment once and run it on a schedule or before releases.
  • Trackability. Memory makes results durable and comparable, which is what regression detection requires.
  • Programmatic success criteria. Pluggable scorers mean “did it work” is defined explicitly and evaluated consistently, not judged ad hoc.
  • Integration-friendliness. A focused assessment on one attack category is a normal Python run that fits into a pipeline, rather than a multi-hour research sweep.

That orientation costs breadth. PyRIT’s curated attack/converter set is narrower than a maximalist scanner’s probe library. The tradeoff is deliberate: focused, trackable, repeatable testing over exhaustive one-time coverage.

Coverage versus garak

The comparison with garak is the one most teams care about:

  • garak has the broader probe library and is oriented toward comprehensive research scanning. If the question is “what known attack classes does my model fall to, across the widest possible set,” garak wins on breadth.
  • PyRIT has better workflow integration, result persistence, multi-turn representation, and a converter model that expands coverage from seed prompts. If the question is “let me run repeatable, scored, trackable assessments as part of my security process,” PyRIT wins on fit.

They are complementary, not competitive. A mature program can use garak for periodic breadth sweeps and PyRIT for the ongoing, integrated assessment loop.

Enterprise and Azure context

PyRIT works against any compatible API endpoint, so it is not Azure-locked. That said, for organizations already running LLM applications on Azure, the Azure OpenAI target support and the natural fit with Microsoft’s broader security tooling are real conveniences — shared identity and logging paths reduce setup friction. For non-Azure deployments, the Azure-specific integrations are simply irrelevant rather than an obstacle.

Practical adoption notes

  • Start with one orchestrator and one scorer. The component model is powerful but can feel abstract; build a single working PromptSendingOrchestrator-style run with a clear scorer before composing converters and multi-turn flows.
  • Treat memory as an asset. Point it at a durable store and keep your run history; the comparative value compounds over releases.
  • Define success per risk. The default scorers are a starting point; the payoff comes from scorers that encode what failure actually means for your application.

Verdict

PyRIT is the right choice for security teams running regular, repeatable assessments of LLM applications — especially where multi-turn attacks, result tracking, and CI integration matter. Its architecture is coherent, its memory and scoring make it a process tool rather than a one-off, and its converter model is an elegant way to expand coverage from a small seed set.

It is the less natural fit if you want maximal one-time probe breadth with minimal setup — that’s garak’s lane. For most teams the answer isn’t either/or: PyRIT for the assessment loop, garak for breadth sweeps, and a validation layer like Giskard for application- and RAG-level testing. See our AI security tool evaluation framework for how we weigh them.

Sources

  1. PyRIT GitHub Repository (Microsoft)
  2. PyRIT Documentation
  3. Microsoft AI Red Team
Subscribe

AI Sec Reviews — in your inbox

Reviews of AI security products and platforms. — delivered when there's something worth your inbox.

No spam. Unsubscribe anytime.

Related

Comments