Kian Kyars

Mythos

April 25, 2026

The funny thing about the Claude Mythos incident is that it sounds like the plot of a cyberpunk novel and then, on inspection, seems to have been mostly a permissions bug with better branding.

Anthropic had a model it did not want to release. Not “did not want to release” in the normal product-management sense, where someone is still deciding how much to charge. It said Mythos Preview was a general-purpose frontier model whose coding skill had crossed into a new regime of cybersecurity competence. In Anthropic’s own telling, Mythos could find and exploit zero-days in every major operating system and every major web browser. It had found thousands of high-severity vulnerabilities. It had autonomously written exploit chains that would normally be the province of very good security researchers. So Anthropic created Project Glasswing: give the model to a small set of trusted companies and critical-infrastructure maintainers, let defenders patch first, and keep everyone else waiting outside.

Then a private Discord group got in on launch day.

This has been described vaguely as “using a Mercor leak” or “guessing the URL”, which makes it sound either more magical or more incompetent than it was. The precise public version is this:

  1. A member of the group was working at a third-party contractor for Anthropic.
  2. That person already had legitimate access to Anthropic model-evaluation systems.
  3. The group was interested in unreleased AI models and apparently used bots and other ordinary OSINT tools to scrape public or accidentally exposed clues, including from places like GitHub.
  4. The recent Mercor breach exposed information about Anthropic’s model naming and formatting conventions.
  5. Combining the contractor’s access with those naming conventions, the group inferred the endpoint for Claude Mythos Preview.
  6. The third-party vendor environment accepted requests to that endpoint under the contractor’s existing credentials, so the group could run Mythos.

That is the whole scandal, as far as the public record supports it. Not a Mythos-generated super exploit. Not a nation-state cracking Anthropic’s crown jewels. Not proof that Anthropic’s core systems were penetrated. A contractor had a key to a building. Someone else had a map showing that rooms were named in a predictable way. The locked room everyone cared about had a predictable name, and the key worked.

This distinction matters because “they guessed the URL” is too cute. On a properly designed system, guessing a sensitive endpoint should get you a 404, a 403, or a long conversation with the security team. It should not get you the model. The failure was not that the string was guessable. Lots of internal strings are guessable. The failure was that authorization appears to have been bound too loosely to the vendor environment or evaluation account, so that knowing the model’s address was enough to make the existing access useful.
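The distinction can be made concrete. A minimal sketch, with entirely invented names and nothing drawn from Anthropic's actual stack: the difference between authorization bound to a specific (principal, resource) pair and authorization that only asks whether the caller can reach the environment at all.

```python
# Hypothetical sketch: binding authorization to (principal, resource),
# so a guessable endpoint name is worthless without an explicit grant.
# All principals and endpoint names here are illustrative only.

ALLOWLIST = {
    # principal -> endpoints that principal is explicitly granted
    "launch-partner-acme": {"models/mythos-preview"},
    "eval-contractor-42": {"models/public-eval-model"},
}

def authorize(principal: str, endpoint: str) -> int:
    """Return an HTTP-style status for a model request."""
    granted = ALLOWLIST.get(principal, set())
    if endpoint in granted:
        return 200  # explicit grant: serve the model
    return 403      # recognized caller, no grant: knowing the URL is not enough

# The failure mode the public record suggests: the check only asked
# "can this caller reach the vendor environment?" -- after which the
# endpoint name was the last secret standing.
def broken_authorize(principal: str, endpoint: str) -> int:
    if principal in ALLOWLIST:  # any recognized caller passes
        return 200
    return 403
```

Under the first scheme, a contractor who learns the Mythos endpoint still gets a 403; under the second, the same request succeeds, which is the shape of the incident as reported.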

It is also worth saying what is not known. I could not find any credible public source that discloses the literal endpoint, model slug, host name, dashboard URL, or API route. The responsible version of the story therefore cannot be “they went to X.” It is “they reconstructed the location format from leaked and scraped clues, then used a contractor-accessible Anthropic vendor environment to reach it.” If someone claims more precision than that, they should show their evidence.

The source chain is unusually clean. Anthropic’s own Project Glasswing announcement says Mythos Preview is unreleased, general-purpose, and restricted to launch partners plus additional critical-software organizations. Anthropic’s red-team writeup says the model could identify and exploit zero-days across major operating systems and browsers, and gives examples including browser exploit chaining and FreeBSD remote code execution. Bloomberg, summarized by The Verge, The CyberWire, TechRadar, and The Next Web, says the access came through a third-party contractor/vendor environment, using the contractor’s access plus ordinary internet sleuthing. The Verge and TechRadar add the Mercor-link detail: information from the Mercor breach helped the group infer Anthropic’s model-location conventions. Cybernews’ Mercor breach report gives the upstream context: Mercor confirmed impact from the LiteLLM supply-chain incident, while attackers claimed terabytes of source code, databases, and VPN-related data.

There is a small temptation to make this into an Anthropic-specific morality play. This is understandable. Anthropic spends a lot of time talking about safety; here, the dangerous model escaped the rope line before the press cycle had cooled. The irony writes itself, which is usually a warning sign that the first draft is too satisfying.

The less satisfying version is more useful. Frontier labs are becoming access-control companies. The thing they are guarding is no longer just a set of weights in a datacenter. It is a web of API routes, evaluation dashboards, vendor accounts, contractors, cloud deployments, customer allowlists, temporary experiments, GitHub traces, Slack screenshots, leaked source code, and humans who are not quite sure which of these count as production. A model can be “not publicly released” and still be reachable from enough semi-public edges that a determined hobbyist group can triangulate it.

This is the same old security lesson, but with a stranger payload. Do not protect the sensitive system by keeping its address secret. Protect it by making sure the wrong badge cannot open the door, even if the visitor knows exactly where the door is.

The Mythos incident is not primarily evidence that AI cyber models are uncontrollable. It is evidence that the industry’s access systems are being asked to handle objects whose risk profile changed faster than the surrounding bureaucracy. Yesterday’s model-preview workflow was: let contractors poke the new model, collect evals, make sure the dashboard works. Today’s model-preview workflow is, apparently: defend an exploit-capable system whose mere presence in a vendor environment makes that environment a target.

Anthropic says it has found no evidence that the access reached beyond the third-party vendor environment or affected core systems. That may be true, and it matters. But the tighter lesson survives even if the blast radius was small. If a model is too dangerous to release, then “who can discover that it exists?”, “who can route to it?”, and “who can spend tokens against it?” are not three separate questions. They are one question, and the answer has to be enforced by authorization, not obscurity.
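That collapse of three questions into one can also be sketched. In the toy gate below (again, invented names, not any real system's API), a single authorization decision governs discovery, routing, and spend, and unauthorized callers receive the same 404 whether the model exists or not, so existence itself is never disclosed by the response.

```python
# Hypothetical sketch: one authorization decision gates "does it exist
# for you", "can you route to it", and "can you spend against it".
# Registry contents and principal names are illustrative only.

REGISTRY = {
    "mythos-preview": {"grants": {"launch-partner-acme"}},
}

def handle(principal: str, model: str, action: str) -> int:
    """Return an HTTP-style status for a request against a model."""
    entry = REGISTRY.get(model)
    authorized = entry is not None and principal in entry["grants"]
    if not authorized:
        # Identical answer for "no such model" and "not your model":
        # the response leaks nothing about what exists.
        return 404
    # Discovery, routing, and billing all hang off the same decision.
    if action in ("list", "invoke", "bill"):
        return 200
    return 400  # unknown action
```

A caller without a grant cannot even confirm the model is there; a caller with one can do everything the grant covers, and nothing turns on keeping the name secret.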