    AI Is a Black Box. Anthropic Figured Out a Way to Look Inside

    By News Room · May 23, 2024 · 3 Mins Read

    Last year, the team began experimenting with a tiny model that uses only a single layer of neurons. (Sophisticated LLMs have dozens of layers.) The hope was that in the simplest possible setting they could discover patterns that designate features. They ran countless experiments with no success. “We tried a whole bunch of stuff, and nothing was working. It looked like a bunch of random garbage,” says Tom Henighan, a member of Anthropic’s technical staff. Then a run dubbed “Johnny”—each experiment was assigned a random name—began associating neural patterns with concepts that appeared in its outputs.

    “Chris looked at it, and he was like, ‘Holy crap. This looks great,’” says Henighan, who was stunned as well. “I looked at it, and was like, ‘Oh, wow, wait, is this working?’”

    Suddenly the researchers could identify the features a group of neurons were encoding. They could peer into the black box. Henighan says he identified the first five features he looked at. One group of neurons signified Russian texts. Another was associated with mathematical functions in the Python computer language. And so on.
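
    The article doesn't name the method here, but Anthropic's published research on this project describes it as dictionary learning with a sparse autoencoder: a small auxiliary network trained to rewrite a layer's activations as a sparse combination of a much larger set of candidate "feature" directions, so that each direction tends to line up with one human-readable concept. A minimal sketch of that idea, with hypothetical dimensions and names, might look like this:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Rewrites a layer's activations as a sparse mix of learned feature directions."""
    def __init__(self, d_model=512, n_features=16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)  # activations -> feature scores
        self.decoder = nn.Linear(n_features, d_model)  # feature scores -> reconstruction

    def forward(self, acts):
        features = torch.relu(self.encoder(acts))      # most entries stay near zero
        reconstruction = self.decoder(features)
        return features, reconstruction

def sae_loss(acts, reconstruction, features, sparsity_coeff=1e-3):
    # Reconstruct the original activations while keeping only a few features
    # active at a time, which pushes each feature toward a single concept.
    reconstruction_error = (acts - reconstruction).pow(2).mean()
    sparsity_penalty = features.abs().mean()
    return reconstruction_error + sparsity_coeff * sparsity_penalty
```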

    Once they showed they could identify features in the tiny model, the researchers set about the hairier task of decoding a full-size LLM in the wild. They used Claude Sonnet, the medium-strength version of Anthropic’s three current models. That worked, too. One feature that stuck out to them was associated with the Golden Gate Bridge. They mapped out the set of neurons that, when fired together, indicated that Claude was “thinking” about the massive structure that links San Francisco to Marin County. What’s more, when similar sets of neurons fired, they evoked subjects that were Golden Gate Bridge-adjacent: Alcatraz, California governor Gavin Newsom, and the Hitchcock movie Vertigo, which was set in San Francisco. All told the team identified millions of features—a sort of Rosetta Stone to decode Claude’s neural net. Many of the features were safety-related, including “getting close to someone for some ulterior motive,” “discussion of biological warfare,” and “villainous plots to take over the world.”
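
    One way to attach a label like "Golden Gate Bridge" to a feature, in the spirit of the mapping described above, is to scan a large body of text and keep the snippets on which that feature fires hardest. A hypothetical sketch: the model, sae, and corpus objects, and the residual_activations helper, are assumptions for illustration, not real Anthropic APIs.

```python
import heapq

def top_activating_snippets(model, sae, corpus, feature_idx, k=20):
    """Return the k corpus snippets that most strongly activate one feature."""
    best = []  # min-heap of (activation, snippet)
    for snippet in corpus:
        acts = model.residual_activations(snippet)     # hypothetical helper
        features, _ = sae(acts)
        score = features[:, feature_idx].max().item()  # strongest firing in the snippet
        heapq.heappush(best, (score, snippet))
        if len(best) > k:
            heapq.heappop(best)                        # drop the weakest so far
    return sorted(best, reverse=True)
```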

    The Anthropic team then took the next step, to see if they could use that information to change Claude’s behavior. They began manipulating the neural net to augment or diminish certain concepts—a kind of AI brain surgery, with the potential to make LLMs safer and augment their power in selected areas. “Let’s say we have this board of features. We turn on the model, one of them lights up, and we see, ‘Oh, it’s thinking about the Golden Gate Bridge,’” says Shan Carter, an Anthropic scientist on the team. “So now, we’re thinking, what if we put a little dial on all these? And what if we turn that dial?”
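
    The "dial" Carter describes corresponds to what interpretability researchers call feature steering: during a forward pass, nudging the model's activations along a feature's decoder direction, scaled by how far the dial is turned. A hedged sketch building on the autoencoder above, with the hook point left as an assumption:

```python
def steer_feature(acts, sae, feature_idx, dial=5.0):
    """Nudge activations along one feature's direction.

    dial > 0 amplifies the concept, dial < 0 suppresses it,
    dial = 0 leaves the model unchanged.
    """
    direction = sae.decoder.weight[:, feature_idx]  # the feature's write direction
    return acts + dial * direction

# Hypothetical usage: register this as a forward hook on the layer the
# autoencoder was trained on, then generate text. Cranking the Golden Gate
# Bridge feature up pulls conversations toward the bridge; turning a
# dangerous-behavior feature down pushes the model the other way.
```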

    So far, the answer to that question seems to be that it's very important to turn the dial the right amount. For instance, the team found several features that represented dangerous practices, like unsafe computer code, scam emails, and instructions for making dangerous products. By suppressing those features, Anthropic says, the model can produce safer computer programs and reduce bias.
