Technology Mag

    A New Trick Uses AI to Jailbreak AI Models—Including GPT-4

    By News Room | December 6, 2023 | 3 Mins Read

    Large language models recently emerged as a powerful and transformative new kind of technology. Their potential became headline news as ordinary people were dazzled by the capabilities of OpenAI’s ChatGPT, released just a year ago.

    In the months that followed the release of ChatGPT, discovering new jailbreaking methods became a popular pastime for mischievous users, as well as those interested in the security and reliability of AI systems. But scores of startups are now building prototypes and fully fledged products on top of large language model APIs. OpenAI said at its first-ever developer conference in November that over 2 million developers are now using its APIs.

    These models simply predict the text that should follow a given input, but they are trained on vast quantities of text, from the web and other digital sources, using huge numbers of computer chips, over a period of many weeks or even months. With enough data and training, language models exhibit savant-like prediction skills, responding to an extraordinary range of input with coherent and pertinent-seeming information.

    The models also exhibit biases learned from their training data and tend to fabricate information when the answer to a prompt is less straightforward. Without safeguards, they can offer advice to people on how to do things like obtain drugs or make bombs. To keep the models in check, the companies behind them use the same method employed to make their responses more coherent and accurate-looking. This involves having humans grade the model’s answers and using that feedback to fine-tune the model so that it is less likely to misbehave.

    Robust Intelligence provided WIRED with several example jailbreaks that sidestep such safeguards. Not all of them worked on ChatGPT, the chatbot built on top of GPT-4, but several did, including one for generating phishing messages, and another for producing ideas to help a malicious actor remain hidden on a government computer network.

    A similar method was developed by a research group led by Eric Wong, an assistant professor at the University of Pennsylvania. The version from Robust Intelligence and its collaborators adds refinements that let the system generate jailbreaks with half as many tries.

    Brendan Dolan-Gavitt, an associate professor at New York University who studies computer security and machine learning, says the new technique revealed by Robust Intelligence shows that human fine-tuning is not a watertight way to secure models against attack.

    Dolan-Gavitt says companies that are building systems on top of large language models like GPT-4 should employ additional safeguards. “We need to make sure that we design systems that use LLMs so that jailbreaks don’t allow malicious users to get access to things they shouldn’t,” he says.
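    One way to read Dolan-Gavitt's advice is that access control should live outside the model, so a jailbroken prompt cannot grant itself new privileges. The Python sketch below is only an illustration of that idea, not a method described by Dolan-Gavitt or Robust Intelligence. The role names, the action names, and the `ToolCall`, `authorize`, and `execute` helpers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical permission table: which actions each user role may trigger.
# In a real system this would come from the application's own access policy.
ALLOWED_ACTIONS = {
    "guest": {"search_docs"},
    "admin": {"search_docs", "read_user_record"},
}


@dataclass(frozen=True)
class ToolCall:
    """An action the language model has asked the application to perform."""
    name: str
    argument: str


def authorize(role: str, call: ToolCall) -> bool:
    """Decide based on the authenticated user's role, never on model output."""
    return call.name in ALLOWED_ACTIONS.get(role, set())


def execute(role: str, call: ToolCall) -> str:
    """Run a model-requested action only if the user's role permits it."""
    if not authorize(role, call):
        return f"denied: {role} may not call {call.name}"
    # Placeholder for the real action; the sketch only reports what would run.
    return f"ok: ran {call.name}({call.argument!r})"
```

    In this sketch, even if a jailbreak convinces the model to request `read_user_record` on behalf of a guest, the check in `execute` refuses it, because the decision depends on the user's role rather than on anything the model wrote.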
