Officials from the Trump administration have informed WIRED that Anthropic, the artificial intelligence company, must guarantee that the safety guardrails on its forthcoming Fable 5 model cannot be bypassed before the model can be released. Security researchers, however, argue that such a requirement may be technically impossible to fulfill.
Background of the Dispute
The demand centers on jailbreaks. In AI systems, a jailbreak is a method of circumventing built-in safety restrictions so that the model produces content it was designed to block. The White House has expressed concern that without absolute protection against these exploits, the model could be misused to generate misinformation, hate speech, or other dangerous outputs.
Anthropic paused the release of Fable 5 after internal testing revealed vulnerabilities that could be exploited. The administration now insists that only a version with fully impermeable guardrails will be permitted to launch.
Expert Reactions and Technical Hurdles
Multiple independent cybersecurity and AI safety experts contacted by WIRED said that no current AI model has achieved complete immunity to jailbreaking. Modern large language models operate using probabilistic pattern matching rather than fixed rules, so subtle changes in how a request is phrased can sometimes trigger unintended responses.
Dr. Lina Chen, a specialist in adversarial machine learning at the University of California, explained that the nature of these systems makes absolute security elusive. She noted that as guardrails become more restrictive, malicious users adapt their techniques, creating a continuous arms race between safety measures and circumvention efforts.
Timothy Rojas, a former Pentagon AI policy advisor, added that the White House directive reflects a fundamental misunderstanding of current AI capabilities. He described the demand as unrealistic given the open nature of model weights and the decentralized research community that constantly probes for weaknesses.
Implications for Anthropic and the AI Industry
For Anthropic, the ultimatum creates a significant commercial and reputational dilemma. The company has positioned itself as a leader in responsible AI development, and failure to comply with federal requests could affect its standing with regulators and investors.
However, committing to an unattainable standard may expose the firm to legal liability if a future jailbreak leads to real-world harm. Some insiders suggest Anthropic may seek a compromise, proposing enhanced monitoring, usage logging, or voluntary industry standards instead of an absolute guarantee against circumvention.
The broader AI sector is watching closely. If the administration enforces this requirement for Anthropic, similar conditions could be applied to other major developers including OpenAI, Google DeepMind, and Meta. That scenario would force the entire industry to confront the gap between policy expectations and technical reality.
What Comes Next
The White House has not yet set a public deadline for compliance but is expected to issue formal guidance within 30 days. Anthropic has confirmed ongoing discussions with federal agencies but declined to provide specifics. Security experts continue to advise that any workable solution will likely require a combination of technological safeguards, human oversight, and clear legal frameworks, rather than a promise of absolute invulnerability.