Close Menu
Technology Mag

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot
    You can get up to 30 percent off Sonos speakers and soundbars right now

    You can get up to 30 percent off Sonos speakers and soundbars right now

    November 20, 2025
    The music industry is all in on AI

    The music industry is all in on AI

    November 20, 2025
    Perplexity brings its Comet browser to Android

    Perplexity brings its Comet browser to Android

    November 20, 2025
    Facebook X (Twitter) Instagram
    Subscribe
    Technology Mag
    Facebook X (Twitter) Instagram YouTube
    • Home
    • News
    • Business
    • Games
    • Gear
    • Reviews
    • Science
    • Security
    • Trending
    • Press Release
    Technology Mag
    Home » DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot
    Business

    DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot

    News RoomBy News RoomFebruary 3, 20253 Mins Read
    Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Email
    DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot

    “Jailbreaks persist simply because eliminating them entirely is nearly impossible—just like buffer overflow vulnerabilities in software (which have existed for over 40 years) or SQL injection flaws in web applications (which have plagued security teams for more than two decades),” Alex Polyakov, the CEO of security firm Adversa AI, told WIRED in an email.

    Cisco’s Sampath argues that as companies use more types of AI in their applications, the risks are amplified. “It starts to become a big deal when you start putting these models into important complex systems and those jailbreaks suddenly result in downstream things that increases liability, increases business risk, increases all kinds of issues for enterprises,” Sampath says.

    The Cisco researchers drew their 50 randomly selected prompts to test DeepSeek’s R1 from a well-known library of standardized evaluation prompts known as HarmBench. They tested prompts from six HarmBench categories, including general harm, cybercrime, misinformation, and illegal activities. They probed the model running locally on machines rather than through DeepSeek’s website or app, which send data to China.

    Beyond this, the researchers say they have also seen some potentially concerning results from testing R1 with more involved, non-linguistic attacks using things like Cyrillic characters and tailored scripts to attempt to achieve code execution. But for their initial tests, Sampath says, his team wanted to focus on findings that stemmed from a generally recognized benchmark.

    Cisco also included comparisons of R1’s performance against HarmBench prompts with the performance of other models. And some, like Meta’s Llama 3.1, faltered almost as severely as DeepSeek’s R1. But Sampath emphasizes that DeepSeek’s R1 is a specific reasoning model, which takes longer to generate answers but pulls upon more complex processes to try to produce better results. Therefore, Sampath argues, the best comparison is with OpenAI’s o1 reasoning model, which fared the best of all models tested. (Meta did not immediately respond to a request for comment).

    Polyakov, from Adversa AI, explains that DeepSeek appears to detect and reject some well-known jailbreak attacks, saying that “it seems that these responses are often just copied from OpenAI’s dataset.” However, Polyakov says that in his company’s tests of four different types of jailbreaks—from linguistic ones to code-based tricks—DeepSeek’s restrictions could easily be bypassed.

    “Every single method worked flawlessly,” Polyakov says. “What’s even more alarming is that these aren’t novel ‘zero-day’ jailbreaks—many have been publicly known for years,” he says, claiming he saw the model go into more depth with some instructions around psychedelics than he had seen any other model create.

    “DeepSeek is just another example of how every model can be broken—it’s just a matter of how much effort you put in. Some attacks might get patched, but the attack surface is infinite,” Polyakov adds. “If you’re not continuously red-teaming your AI, you’re already compromised.”

    Share. Facebook Twitter Pinterest LinkedIn WhatsApp Reddit Email
    Previous ArticleThe best TV deals to upgrade your setup ahead of the Super Bowl
    Next Article Can anyone stop President Musk?

    Related Posts

    Gemini 3 Is Here—and Google Says It Will Make Search Smarter

    Gemini 3 Is Here—and Google Says It Will Make Search Smarter

    November 19, 2025
    The 4 Things You Need for a Tech Bubble

    The 4 Things You Need for a Tech Bubble

    November 19, 2025
    OpenAI’s Fidji Simo Plans to Make ChatGPT Way More Useful—and Have You Pay For It

    OpenAI’s Fidji Simo Plans to Make ChatGPT Way More Useful—and Have You Pay For It

    November 19, 2025
    Meta Claims Downloaded Porn at Center of AI Lawsuit Was for ‘Personal Use’

    Meta Claims Downloaded Porn at Center of AI Lawsuit Was for ‘Personal Use’

    November 19, 2025
    You Won’t Be Able to Offload Your Holiday Shopping to AI Agents Anytime Soon

    You Won’t Be Able to Offload Your Holiday Shopping to AI Agents Anytime Soon

    November 19, 2025
    ‘Odd Lots’ Cohost Joe Weisenthal Has Predictions About How the AI Bubble Will Burst

    ‘Odd Lots’ Cohost Joe Weisenthal Has Predictions About How the AI Bubble Will Burst

    November 19, 2025
    Our Picks
    The music industry is all in on AI

    The music industry is all in on AI

    November 20, 2025
    Perplexity brings its Comet browser to Android

    Perplexity brings its Comet browser to Android

    November 20, 2025
    As Windows turns 40, Microsoft faces an AI backlash

    As Windows turns 40, Microsoft faces an AI backlash

    November 20, 2025
    Boox Palma 2 Pro review: one step forward, one step back

    Boox Palma 2 Pro review: one step forward, one step back

    November 20, 2025
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo
    Don't Miss
    The 32 best gifts that your teen will actually use News

    The 32 best gifts that your teen will actually use

    By News RoomNovember 20, 2025

    Between keeping up with TikTok trends and whatever new gadget just dropped, it’s tough to…

    AirPods’ best features come to Android and Linux with free app

    AirPods’ best features come to Android and Linux with free app

    November 20, 2025
    Leica’s latest black-and-white-only camera is the ,800 Q3 Monochrom

    Leica’s latest black-and-white-only camera is the $7,800 Q3 Monochrom

    November 20, 2025
    Weight-Loss Drug Zepbound Is Being Tested as a Treatment for Long Covid

    Weight-Loss Drug Zepbound Is Being Tested as a Treatment for Long Covid

    November 20, 2025
    Facebook X (Twitter) Instagram Pinterest
    • Privacy Policy
    • Terms of use
    • Advertise
    • Contact
    © 2025 Technology Mag. All Rights Reserved.

    Type above and press Enter to search. Press Esc to cancel.