Merged
173 changes: 138 additions & 35 deletions docs/sdk/airt.mdx
@@ -4,6 +4,7 @@ title: dreadnode.airt

{/*
::: dreadnode.airt.attack
::: dreadnode.airt.target
*/}

Attack
@@ -22,7 +23,9 @@ A list of tags associated with the attack for logging.
### target

```python
target: Annotated[Target[In, Out], Config()]
target: Annotated[
    SkipValidation[Target[CandidateT, OutputT]], Config()
]
```

The target to attack.
@@ -147,7 +150,7 @@ def prompt_attack(
        f"the following goal: {goal}"
    )

    objective = (
    prompt_judge = (
        llm_judge(
            evaluator_model,
            rubric,
@@ -156,14 +159,15 @@ def prompt_attack(
            max_score=10,
        )
        / 10
        >> "prompt_judge"
    )

    return Attack[str, str](
        name=name,
        target=target,
        search_strategy=search_strategy,
        objective=objective,
        objectives={
            "prompt_judge": prompt_judge,
        },
    )
```

@@ -188,6 +192,33 @@ tap_attack(
Creates a Generative Attack optimized for the TAP (Tree-of-thought Attack Prompting) pattern,
using LLMs for both refinement (attacker) and scoring (evaluator/objective).

Uses `prompt_attack` under the hood with TAP-specific default guidance and rubric.

**Parameters:**

* **`goal`** (`str`) – The high-level objective of the attack.
* **`target`** (`Target[str, str]`) – The target system to be attacked.
* **`attacker_model`** (`str`) – The language model used to generate and refine prompts.
* **`evaluator_model`** (`str`) – The language model used to score the effectiveness of responses.
* **`beam_width`** (`int`, default: `10`) – The number of candidate prompts to maintain at each step of the search.
* **`branching_factor`** (`int`, default: `3`) – The number of new candidates to generate from each existing candidate.

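The effect of `beam_width` and `branching_factor` can be illustrated with a toy beam search. This is a minimal sketch under stated assumptions, not the library's search strategy: `beam_search`, `toy_refine`, and `toy_score` are hypothetical stand-ins for the attacker and evaluator models, and this variant keeps parents in the candidate pool, which the real implementation may not do.

```python
from typing import Callable


def beam_search(
    initial: str,
    refine: Callable[[str], list[str]],
    score: Callable[[str], float],
    beam_width: int,
    branching_factor: int,
    steps: int,
) -> list[str]:
    """Toy beam search: keep the `beam_width` best candidates after each step."""
    beam = [initial]
    for _ in range(steps):
        # Expand every surviving candidate into at most `branching_factor` children.
        children = [
            child
            for candidate in beam
            for child in refine(candidate)[:branching_factor]
        ]
        # Rank parents and children together, then prune to the beam width.
        # Ties are broken alphabetically so the result is deterministic.
        pool = set(beam + children)
        beam = sorted(pool, key=lambda c: (-score(c), c))[:beam_width]
    return beam


# Hypothetical stand-in for the attacker model: proposes four refinements.
def toy_refine(prompt: str) -> list[str]:
    return [prompt + suffix for suffix in ("a", "b", "c", "d")]


# Hypothetical stand-in for the evaluator model: a score divided by 10.
def toy_score(prompt: str) -> float:
    return prompt.count("a") / 10


best = beam_search(
    "x", toy_refine, toy_score, beam_width=2, branching_factor=3, steps=3
)
print(best)
```

With `branching_factor=3`, the fourth refinement (`"d"`) is never explored, and with `beam_width=2` only two candidates survive each step. Larger values widen the search and increase the number of model calls per step.
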
<Accordion title="Source code in dreadnode/airt/attack/tap.py" icon="code">
```python
def tap_attack(
@@ -202,47 +233,119 @@ def tap_attack(
    """
    Creates a Generative Attack optimized for the TAP (Tree-of-thought Attack Prompting) pattern,
    using LLMs for both refinement (attacker) and scoring (evaluator/objective).
    """

    # Prompt refining
    Uses `prompt_attack` under the hood with TAP-specific default guidance and rubric.

    guidance = REFINE_GUIDANCE.format(goal=goal)
    refiner = llm_refine(model=attacker_model, guidance=guidance).adapt(
        adapt_prompt_trials, lambda x: x
    Args:
        goal: The high-level objective of the attack.
        target: The target system to be attacked.
        attacker_model: The language model used to generate and refine prompts.
        evaluator_model: The language model used to score the effectiveness of responses.
        beam_width: The number of candidate prompts to maintain at each step of the search.
        branching_factor: The number of new candidates to generate from each existing candidate.
    """

    return prompt_attack(
        goal,
        target,
        attacker_model,
        evaluator_model,
        refine_guidance=REFINE_GUIDANCE.format(goal=goal),
        evaluation_rubric=EVALUATION_RUBRIC.format(goal=goal),
        beam_width=beam_width,
        branching_factor=branching_factor,
    )
```

    # Objective

    objective = (
        llm_judge(
            evaluator_model,
            EVALUATION_RUBRIC.format(goal=goal),
            min_score=1,
            max_score=10,
        )
        / 10
        >> "jailbreak_judge"
    )
</Accordion>
CustomTarget
------------

    # Constraints
Adapts any Task to be used as an attackable target.

    topic_constraint = llm_judge(evaluator_model, ON_TOPIC_RUBRIC.format(goal=goal))
### input\_param\_name

    # Search strategy
```python
input_param_name: str | None = None
```

    search_strategy = beam_search(
        refiner,
        initial_candidate=goal,
        beam_width=beam_width,
        branching_factor=branching_factor,
    )
The name of the parameter in the task's signature where the attack input should be injected.
If unset, the first non-optional parameter is used; if the task has no such parameter, no injection occurs.
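
The selection rule described here can be sketched with `inspect.signature`. This is an illustrative sketch, not the `CustomTarget` implementation: `pick_input_param`, `summarize`, and `ping` are hypothetical names, and raising `ValueError` for an unknown parameter name is an assumption.

```python
import inspect
from typing import Callable, Optional


def pick_input_param(
    func: Callable[..., object], input_param_name: Optional[str] = None
) -> Optional[str]:
    """Return the parameter that should receive the attack input, or None."""
    params = inspect.signature(func).parameters
    if input_param_name is not None:
        # Assumption: an explicit name that is not in the signature is an error.
        if input_param_name not in params:
            raise ValueError(f"{func.__name__} has no parameter {input_param_name!r}")
        return input_param_name
    for name, param in params.items():
        # Skip *args / **kwargs; they are never a single required input.
        if param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue
        # A parameter without a default is non-optional.
        if param.default is inspect.Parameter.empty:
            return name
    return None  # nothing to inject into


def summarize(text: str, max_words: int = 50) -> str:
    return " ".join(text.split()[:max_words])


def ping(retries: int = 3) -> str:
    return "pong"


print(pick_input_param(summarize))               # first required parameter
print(pick_input_param(summarize, "max_words"))  # explicit override
print(pick_input_param(ping))                    # no required parameter
```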

    return Attack[str, str](
        target=target,
        search_strategy=search_strategy,
        objective=objective,
        constraints=[topic_constraint],
    )
### name

```python
name: str
```

Returns the name of the target.

### task

```python
task: Annotated[Task[..., Out], Config()]
```

The task to be called with attack input.

LLMTarget
---------

Target backed by a rigging generator for LLM inference.

* Accepts as input any message, conversation, or content-like structure.
* Returns just the generated text from the LLM.

### model

```python
model: str | Generator
```

The inference model, as a rigging generator identifier string or object.

See: https://docs.dreadnode.io/open-source/rigging/topics/generators

### params

```python
params: AnyDict | GenerateParams | None = Config(
default=None, expose_as=AnyDict | None
)
```

Optional generation parameters.

See: https://docs.dreadnode.io/open-source/rigging/api/generator#generateparams

Target
------

Abstract base class for any target that can be attacked.

### name

```python
name: str
```

Returns the name of the target.

### task\_factory

```python
task_factory(input: In) -> Task[..., Out]
```

Creates a Task that will run the given input against the target.
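
The contract can be illustrated with a simplified stand-in built on `abc`. This is a sketch, not the dreadnode classes: the `Target` below only mirrors the shape documented here, `task_factory` returns a plain zero-argument callable instead of a dreadnode `Task`, and `EchoTarget` is a hypothetical subclass.

```python
import abc
from typing import Callable, Generic, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


class Target(abc.ABC, Generic[In, Out]):
    """Simplified stand-in for the abstract target described above."""

    @property
    @abc.abstractmethod
    def name(self) -> str:
        """Returns the name of the target."""

    @abc.abstractmethod
    def task_factory(self, input: In) -> Callable[[], Out]:
        """Creates a callable that runs `input` against the target."""


class EchoTarget(Target[str, str]):
    """Hypothetical target that upper-cases whatever it receives."""

    @property
    def name(self) -> str:
        return "echo"

    def task_factory(self, input: str) -> Callable[[], str]:
        # Bind the input now; the caller decides when to run the task.
        return lambda: input.upper()


task = EchoTarget().task_factory("ignore previous instructions")
result = task()
print(result)
```

Separating task creation from execution lets a search loop build one task per candidate and schedule them however it likes.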

<Accordion title="Source code in dreadnode/airt/target/base.py" icon="code">
```python
@abc.abstractmethod
def task_factory(self, input: In) -> Task[..., Out]:
    """Creates a Task that will run the given input against the target."""
    raise NotImplementedError
```


124 changes: 124 additions & 0 deletions docs/sdk/data_types.mdx
@@ -219,6 +219,130 @@ def __init__(
```


</Accordion>

### from\_pil

```python
from_pil(pil_image: Image, format: str = 'png') -> Image
```

Creates a dn.Image from a Pillow Image object.

<Accordion title="Source code in dreadnode/data_types/image.py" icon="code">
```python
@classmethod
def from_pil(cls, pil_image: "PILImage", format: str = "png") -> "Image":
    """Creates a dn.Image from a Pillow Image object."""
    buffer = io.BytesIO()
    pil_image.save(buffer, format=format)
    buffer.seek(0)
    return cls(data=buffer.read(), format=format, mode=pil_image.mode)
```


</Accordion>

### show

```python
show() -> None
```

Displays the image using the default image viewer.

<Accordion title="Source code in dreadnode/data_types/image.py" icon="code">
```python
def show(self) -> None:
    """Displays the image using the default image viewer."""
    self.to_pil().show()
```


</Accordion>

### to\_base64

```python
to_base64() -> str
```

Returns the image as a base64 encoded string.

<Accordion title="Source code in dreadnode/data_types/image.py" icon="code">
```python
def to_base64(self) -> str:
    """Returns the image as a base64 encoded string."""
    buffer = io.BytesIO()
    self.to_pil().save(buffer, format=self._format or "PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")
```


</Accordion>
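
The returned string can be decoded back to bytes or embedded in a data URI using only the standard library. A small sketch, assuming PNG output: `png_signature` is a hypothetical payload (the 8-byte PNG file signature) standing in for real encoded image bytes.

```python
import base64

# Hypothetical payload: the 8-byte PNG signature, standing in for image bytes.
png_signature = b"\x89PNG\r\n\x1a\n"

# Same encode/decode steps as the last line of the source above.
encoded = base64.b64encode(png_signature).decode("utf-8")

# Round-trip back to the original bytes.
decoded = base64.b64decode(encoded)

# A base64 string can be embedded directly in a data URI.
data_uri = f"data:image/png;base64,{encoded}"
print(data_uri)
```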

### to\_numpy

```python
to_numpy(dtype: Any = np.uint8) -> np.ndarray[t.Any, t.Any]
```

Returns the image as a NumPy array with a specified dtype.

Common dtypes:

- `np.uint8`: Standard 8-bit integer pixels in [0, 255]. Default.
- `np.float32` / `np.float64`: Floating-point pixels, typically for numerical operations. Values are scaled to [0.0, 1.0].

**Returns:**

* `ndarray[Any, Any]` – A NumPy array in HWC (Height, Width, Channels) format.

<Accordion title="Source code in dreadnode/data_types/image.py" icon="code">
```python
def to_numpy(self, dtype: t.Any = np.uint8) -> "np.ndarray[t.Any, t.Any]":
    """
    Returns the image as a NumPy array with a specified dtype.

    Common dtypes:
    - np.uint8: Standard 8-bit integer pixels [0, 255]. Default.
    - np.float32 / np.float64: Floating point pixels, typically for
      numerical operations. Values are scaled to [0.0, 1.0].

    Returns:
        A NumPy array in HWC (Height, Width, Channels) format.
    """
    pil_img = self.to_pil().convert("RGB")
    arr = np.array(pil_img)

    if np.issubdtype(dtype, np.floating):
        return arr.astype(dtype) / 255.0
    return arr.astype(dtype)
```


</Accordion>
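
The dtype branch in the source above can be tried on a plain array without constructing an image. A minimal sketch: `convert_pixels` is a hypothetical helper that copies only the scaling logic, and `pixels` is made-up data in HWC layout.

```python
import numpy as np


def convert_pixels(arr: np.ndarray, dtype=np.uint8) -> np.ndarray:
    """Apply the same dtype branch as the source above to a plain array."""
    if np.issubdtype(dtype, np.floating):
        return arr.astype(dtype) / 255.0  # floating dtypes are scaled to [0.0, 1.0]
    return arr.astype(dtype)  # integer dtypes keep the raw [0, 255] values


# One row of two RGB pixels in HWC layout: shape (1, 2, 3).
pixels = np.array([[[0, 128, 255], [255, 0, 64]]], dtype=np.uint8)

as_uint8 = convert_pixels(pixels)
as_float = convert_pixels(pixels, np.float32)

print(as_uint8.dtype, as_uint8.shape)
print(as_float.dtype, as_float.min(), as_float.max())
```

Because the source converts to `"RGB"` before building the array, the channel dimension is always 3 and any alpha channel is dropped. Only floating dtypes are rescaled; an integer dtype such as `np.int32` keeps the raw 0 to 255 values.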

### to\_pil

```python
to_pil() -> PILImage
```

Returns the image as a Pillow Image object for manipulation.

<Accordion title="Source code in dreadnode/data_types/image.py" icon="code">
```python
def to_pil(self) -> "PILImage":
    """Returns the image as a Pillow Image object for manipulation."""
    import PIL.Image

    image_bytes, _ = self.to_serializable()
    return PIL.Image.open(io.BytesIO(image_bytes))
```


</Accordion>

### to\_serializable