Dropstone Heavy 1.6

The strongest open-weight model we measured: #1 open on SWE-bench Pro, ahead of GPT-5.5. Built on GLM-5.2, hosted in the US.

Try Dropstone | Read the docs

Built on: GLM-5.2
Hosted in: US (SOC 2)
Approval gate: Every tool call
Refresh cadence: Monthly

Announcements

New: Dropstone Heavy 1.6 (22 Jun 2026)

Heavy moves to Z.ai GLM-5.2 (744B MoE, ~40B active, 1M context). Highest open-weight score on SWE-bench Pro (62.1%), ahead of GPT-5.5; first open model past 80% on Terminal-Bench 2.1. Behind the Claude frontier, shown honestly on the same axis.

Refresh cadence (Ongoing)

Heavy tracks the strongest open-weight model on SWE-bench Pro. Z.ai moved its score from 58.4% to 62.1% in a single step, GLM-5.1 to GLM-5.2; re-baselining is how that gain reaches you without a migration.
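As a rough illustration of re-baselining, the Python sketch below picks the candidate with the highest SWE-bench Pro score and points a fixed tier name at it. It is a sketch under stated assumptions, not Dropstone's selection process: the `Candidate` type, the `rebaseline` function, and the `dropstone-heavy` alias are hypothetical names, and only the two scores come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    swe_bench_pro: float  # percent of tasks resolved


def rebaseline(candidates: list[Candidate]) -> Candidate:
    """Pick the open-weight candidate with the highest SWE-bench Pro score."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=lambda c: c.swe_bench_pro)


# Hypothetical tier identifier: the name callers use stays fixed,
# and only the model behind it changes at each refresh.
TIER_ALIAS = "dropstone-heavy"

candidates = [
    Candidate("GLM-5.1", 58.4),
    Candidate("GLM-5.2", 62.1),
]
routing = {TIER_ALIAS: rebaseline(candidates).name}
```

Because callers only ever reference the alias, swapping the underlying model does not require them to change anything, which is the sense in which a refresh avoids a migration.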

Research

Dropstone 1.6: Technical Report (Jun 22, 2026)

Choosing, hosting, and improving open-weight coding models

Read the research

Dropstone 1.5: Technical Report (Jun 2, 2026)

Inference, Pricing, and Runtime Safety Architecture for a Versioned, Model-Agnostic Coding Agent

Read the research

The Dropstone D3 Neuro-Symbolic Architecture (Dec 19, 2025)

Moving beyond "Monolithic Context" to a deterministic, state-managed runtime for high-assurance engineering agents.

Read the research

Benchmarks

Measured head-to-head.

Standard public coding benchmarks, latest published runs.

Overview

Everything Heavy ships with.

Dropstone Heavy is the tier for the hardest, longest-horizon work. It runs Z.ai's GLM-5.2, a 744B-param Mixture-of-Experts model (roughly 40B active per token) with a 1M-token context. It scores 62.1% on SWE-bench Pro, the highest of any open-weight model and ahead of GPT-5.5 at 58.6%, and is the first open model past 80% on Terminal-Bench 2.1 (81.0%). It does not catch the closed Claude frontier (Opus 4.8 at 69.2%, Fable 5 at 80.3%), and we show that gap on the same axis as the win. For the large majority of production coding, the best open model, US-hosted at a fraction of frontier-closed cost, is now a reasonable default.
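The figures in this overview can be put side by side with a few lines of arithmetic. The Python sketch below uses only numbers quoted above; it assumes the Opus 4.8 and Fable 5 percentages are SWE-bench Pro scores, which the sentence implies but does not state outright. The function names are illustrative.

```python
# Figures quoted in the overview; nothing here is newly measured.
TOTAL_PARAMS_B = 744   # total parameters, in billions
ACTIVE_PARAMS_B = 40   # approximate active parameters per token, in billions

# Assumption: all four scores are on SWE-bench Pro.
SWE_BENCH_PRO = {
    "GLM-5.2 (Heavy)": 62.1,
    "GPT-5.5": 58.6,
    "Opus 4.8": 69.2,
    "Fable 5": 80.3,
}


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active_b / total_b


def gap_to(baseline: str, other: str, scores: dict[str, float]) -> float:
    """Percentage-point difference; positive means `baseline` is ahead."""
    return round(scores[baseline] - scores[other], 1)


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share per token: {share:.1%}")
    for name in SWE_BENCH_PRO:
        if name != "GLM-5.2 (Heavy)":
            print(name, gap_to("GLM-5.2 (Heavy)", name, SWE_BENCH_PRO))
```

Under these assumptions, roughly 5.4% of the parameters are active per token, Heavy leads GPT-5.5 by 3.5 points, and it trails Opus 4.8 by 7.1 points and Fable 5 by 18.2 points.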

Long-horizon refactors

Changes that span many files and hundreds of steps. GLM-5.2's 1M-token context and Terminal-Bench lead keep it on track where weaker open models drift.

Hard SWE-bench-shaped issues

Hand Heavy a failing test and a stack trace and let it read beyond the immediate file, edit, and re-run until green. Its margin shows on the harder instances.

Terminal-driven debugging

Iterative command-line work, the workload Terminal-Bench was built to measure, where GLM-5.2 leads the open field.

Trust & safety

Security comes from the runtime, not the weights.

Read the full system card

The security boundary is the approval gate.

Every Dropstone request is treated as if the model could be adversarial. The CLI requires explicit user approval before any action that writes to disk, runs a shell command, or fetches a URL. No model output is ever auto-executed.
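The gate described above can be pictured with a minimal Python sketch. It is not Dropstone's implementation: the `ToolCall` type, the action kinds, and the `approve` callback are hypothetical names chosen for illustration, and denying unknown action kinds is an added assumption. The only behavior taken from the text is that writes, shell commands, and URL fetches proceed only after explicit user approval.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action kinds, mirroring the three cases named in the text.
GATED_KINDS = {"write_file", "run_shell", "fetch_url"}


@dataclass(frozen=True)
class ToolCall:
    kind: str      # e.g. "run_shell"
    argument: str  # the path, command, or URL the model proposed


def gate(call: ToolCall, approve: Callable[[ToolCall], bool]) -> bool:
    """Return True only if the call may run.

    Model output is treated as untrusted: a gated call is allowed only
    when the user-supplied `approve` callback says yes. Unknown kinds
    are denied rather than passed through (an assumption of this sketch).
    """
    if call.kind not in GATED_KINDS:
        return False
    return bool(approve(call))


def run_all(calls: list[ToolCall], approve: Callable[[ToolCall], bool]) -> list[str]:
    """Record the verdict for each call; nothing is executed in this sketch."""
    log = []
    for call in calls:
        verdict = "approved" if gate(call, approve) else "denied"
        log.append(f"{verdict}: {call.kind} {call.argument}")
    return log
```

In an interactive CLI, `approve` would typically prompt the user and return their answer; passing it in as a callback keeps the decision outside anything the model controls.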

US-hosted inference. You control your data.

Heavy runs on SOC 2-certified, US-based inference providers. Enterprise and commercial data is never stored and never trained on, regardless of any setting. Consumer sessions help improve the open model we run; you can turn this off anytime in settings, and only text is used, never your raw images.

Honest about what we cannot prove.

Heavy is built on GLM-5.2, an open-weight foundation model. Goldwasser et al. (2022) showed that a backdoor can be planted in a machine-learning model in a way that is computationally infeasible to detect, so no party can prove that a foundation model, closed ones included, is free of embedded behaviors. That applies to Anthropic for Claude and to OpenAI for GPT. We say this out loud. The runtime is why model origin does not matter for your code.

Pick your tier

Three tiers. One CLI.

Same approval gate, same US-hosted inference, same zero-retention guarantee across all three. Pick the smallest model that meets the task.

Dropstone Fast

Dropstone Fast 1.6

Low-latency agentic coding for edits, refactors, and high-throughput inline completion. Built on DeepSeek V4 Flash, hosted in the US.

Best for

Inline completion
File-scoped edits
Test scaffolding

Security

Approval gate on every tool call
US-hosted inference
Zero retention
Monthly model refresh

Explore Fast

Dropstone Pro

Dropstone Pro 1.6

The everyday workhorse, built for tool use and day-to-day coding. Built on Kimi K2.7 Code, hosted in the US.

Best for

Full-stack feature work
Tool-driven workflows
Code review at scale

Security

Approval gate on every tool call
US-hosted inference
Zero retention
Monthly model refresh

Explore Pro

You are here

Dropstone Heavy

Dropstone Heavy 1.6

The strongest open-weight model we measured: #1 open on SWE-bench Pro, ahead of GPT-5.5. Built on GLM-5.2, hosted in the US.

Best for

Long-horizon refactors
Hard SWE-bench-shaped issues
Terminal-driven debugging

Security

Approval gate on every tool call
US-hosted inference
Zero retention
Monthly model refresh

Fast is included on every plan. Pro and Heavy are unlocked on the Pro plan ($15/mo) and Max plan ($75/mo).

FAQ

Questions you'd ask in a security review.

When should I use Dropstone Heavy 1.6?

Is my code sent to a Chinese model provider?

No. Inference runs on SOC 2-certified, US-based providers. The model weights are open and are loaded into those providers' US data centers. Your prompts never touch a foreign network.

What changes when Dropstone Heavy 1.6 gets refreshed?

The version number and benchmarks. The CLI surface, the pricing structure, the approval-gate behavior, and the security model do not change. Existing scripts and CI pipelines continue working without modification.

Do you offer enterprise deployment?

Yes. Dropstone Enterprise extends the same audited-tier platform with VPC, on-premises, and air-gapped deployments, SSO, audit logs, and custom SLAs. Pricing is seat plus usage at API rates, on annual commitments. Read the Enterprise plan overview or contact enterprise@blankline.org.

Ship with Dropstone Heavy 1.6.

Install the CLI, authenticate, and start running approval-gated agentic workflows. No credit card to start.

Try Dropstone | Read the docs