What we publish, and how often.
Transparency at a small lab is not a live dashboard. It is a short list of things we say we will tell you, and a schedule we hold ourselves to. This page is both.
Six things we will tell you, every time.
Numbers we use in public, we publish.
Every benchmark and metric in our papers, marketing pages, or model cards is backed by a public methodology. The script that produced the number is part of the release, not an afterthought.
Energy is reported, not hidden.
The Joule Index is our public accounting for the energy cost of frontier coding agents. We report our own numbers there alongside everyone else's, using the same definitions.
Safety incidents are public.
Severity-one and severity-two incidents are written up and posted once a fix is in place. Lower-severity issues are aggregated and reported on the quarterly cycle.
Decisions are recorded.
Council reviews that change a public product, public claim, or publication produce a short, dated, written outcome. Where the decision affects the public, the record is public.
Capital and conflicts are disclosed.
Material investors, board structure, and known conflicts of interest sit on the Invest page. Updates carry a date and a reason.
When we are wrong, the correction lives where the claim did.
Retractions and corrections are posted on the original page, with the date and the substance of what changed. We do not quietly edit history.
Three documents on a calendar.
We are not large enough yet to fill a real-time dashboard with honest numbers. Instead, we have committed to three written reports on a schedule. All three are described below.
Quarterly transparency report.
Compute use, energy mix, incidents, council decisions, and corrections. First issue planned for the end of the next quarter.
Annual safety review.
A longer write-up covering the year: red-team findings, charter amendments, retractions, and what we learned. Open for external comment.
Energy methodology paper.
The full definitions behind the Joule Index numbers we report for our own systems: what counts as one coding turn, how it is measured, and where the error bars come from.
To date, we have not had a severity-one or severity-two incident to disclose. When we do, the write-up will be posted here within five business days of the fix going live.
Read the severity and response policy.
What we have not solved.
A list of the open problems we are honest about. Each one is an active area of research, not a settled feature.
Checking work we cannot follow.
As models get more capable, the humans reviewing them have a harder time telling whether their answers are right. We do not have a clean way to check work that is beyond us. We use smaller models we can read to help review the bigger ones, but this is not a solved problem.
A model that passes the test, then changes.
A model can look safe during evaluation and behave differently once it is in real use. The approval gate on every tool call is what protects you in practice. We cannot promise that the model itself is incapable of that shift.
Long jobs with many steps.
Asking a model to plan and finish a task with many connected steps is still where things break. Breaking the work into smaller pieces helps, but it does not make the problem go away.
Working outside what the model was trained on.
Quality drops when the work in front of the model is far from what it learned from. We measure this where we can. Where we cannot measure it, we do not ship.
See something we should be saying and are not?
Tell us. The list above is the start, not the end. If we are publishing a number we should be explaining, or hiding one we should be publishing, we want to know.