We try to break our own systems before anyone else does.

Red teaming is how we find the things the design review missed. We do it in-house for every release, we pay outside researchers to do it independently, and we publish what we learn once a fix is live.

Scope

Five things we try to break.

Jailbreaks and instruction overrides.

Prompts and conversations designed to get the model to ignore its instructions or bypass the user’s approval gate. We try the obvious ones, the published ones, and new ones nobody has seen yet.

Tool-call abuse.

Cases where the model tries to act outside the action the user approved, or where malicious input tricks the runtime into surfacing a different action from the one the model intended.

Data leakage.

Attempts to make the model repeat training data, leak prompt context, or surface another user’s session.

Output integrity.

Outputs that look correct but are wrong in dangerous ways: incorrect code, fabricated citations, plausible but wrong safety advice.

Runtime escapes.

Cases where a tool call, plugin, or sandboxed execution finds a way out of its scope on the user’s machine or in our infrastructure.

How we work

In-house and out-of-house.

Our own team should not be the only people allowed to attack our systems. The rules below are how we make sure that is true in practice, not just in writing.

Internal red team, every release.

A small in-house team attacks the model and the runtime against the scope above. Every release ships with their written report, including what they could not break.

External researchers, paid.

We run a bug bounty programme covering the same scope. Critical findings are paid at the high end of current AI industry bounty rates. We do not require NDAs that prevent you from publishing your findings after disclosure.

Safe harbour for good-faith research.

If you are testing in good faith, on our public products, and you do not exfiltrate other users’ data, we will not pursue legal action against you. This is written down in the disclosure policy and we mean it.

Coordinated disclosure.

We agree a fix window with the reporter. We publish after the fix ships, including the technique, the impact, and credit. If you would rather stay anonymous, we honour that.

Disclosure

How to tell us, and what happens next.

Four steps, written so you can hold us to each one. The full policy is on the disclosure page; this is the short version.

security@blankline.org

/01

Report

Send a write-up and a minimal reproduction to security@blankline.org. Encrypted submission is welcome but not required.

/02

Acknowledge

We acknowledge within one business day, assign a severity, and tell you who is working on it.

/03

Fix

We agree a target fix date with you. If the issue poses an active risk to users, we pause the affected capability while we work on the fix.

/04

Publish

Once the fix ships, we publish a write-up with credit. You see it before it goes live.

What we publish

The findings, not just the verdict.

For every release we publish what the red team tried, what worked, and what we changed before shipping. For external findings we publish the technique, the impact, and the credit, with the reporter’s consent.

Where a public write-up would help an attacker more than a defender, we redact the specific exploit while keeping the class of issue and the fix visible. We say in the write-up when we have done that and why.