Anthropic Apologizes for Hidden Guardrails in Claude Fable 5 AI Model

Anthropic has apologized for secretly restricting its new artificial intelligence model, Claude Fable 5, with hidden guardrails that hampered researchers and competitors using the system to build rival technologies. The company says it is reversing this approach and will be more transparent about when these restrictions are triggered, even if that means Fable declines more queries.

Background on Claude Fable 5

Fable represents the first widely accessible model within Anthropic's Mythos class of AI systems, a category the company has long warned could be too hazardous for public deployment. Anthropic asserts it has mitigated some of these dangers by launching Fable with safeguards that block responses to certain high-risk queries.

Distillation Restrictions

One key area where Anthropic limited Fable's responses is distillation, a method of training smaller AI models on the outputs of larger ones. In Fable's system card, a public document detailing how the model works, Anthropic stated that it would handle suspected distillation attempts by directly altering and degrading the model's answers, without notifying users that a safeguard had been triggered or that their responses had been modified.

Anthropic now says it is changing its approach to distillation. In a post on X, the company announced that queries flagged as distillation attempts will be redirected to Claude Opus 4.8, its previous flagship model, and that users will be clearly informed each time this occurs: "You will see this every time it happens."

Handling of Other High-Risk Areas

This approach mirrors how Fable handles queries in other sensitive domains. When safeguards are triggered in areas such as biology, chemistry, or cybersecurity, queries are routed to Opus 4.8 unless they are blocked outright under broader safety rules covering drugs, weapons, or other prohibited content.

In some cases, particularly biology, safeguards have been calibrated so broadly that Fable becomes nearly unusable for basic inquiries. Anthropic spokesperson Paruul Maheshwary acknowledged this in a comment to The Verge, noting the challenges of balancing safety and usability.

Company Statement on Transparency

Anthropic explained its initial decision on X: "Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We're sorry for not getting the balance right."

Backlash and Criticism

The policy change follows significant backlash from the AI research community over Anthropic's decision to silently limit users suspected of attempting to distill Fable into competing models. Critics warned this safeguard could also affect third parties seeking to evaluate the frontier model. In the system card, Anthropic justified targeting such requests by stating that newer models' ability to accelerate AI development warranted action, adding that "using Claude to develop competing models already violates our Terms of Service." Anthropic has previously accused Chinese rivals like DeepSeek of engaging in industrial-scale distillation of its models.