Technology Mag

    Business

    Anthropic’s Claude Is Good at Poetry—and Bullshitting

    By News Room | March 31, 2025 | 4 Mins Read

    The researchers of Anthropic’s interpretability group know that Claude, the company’s large language model, is not a human being, or even a conscious piece of software. Still, it’s very hard for them to talk about Claude, and advanced LLMs in general, without tumbling down an anthropomorphic sinkhole. Between cautions that a set of digital operations is in no way the same as a cogitating human being, they often talk about what’s going on inside Claude’s head. It’s literally their job to find out. The papers they publish describe behaviors that inevitably court comparisons with real-life organisms. The title of one of the two papers the team released this week says it out loud: “On the Biology of a Large Language Model.”

    Like it or not, hundreds of millions of people are already interacting with these things, and our engagement will only become more intense as the models get more powerful and we get more addicted. So we should pay attention to work that involves “tracing the thoughts of large language models,” which happens to be the title of the blog post describing the recent work. “As the things these models can do become more complex, it becomes less and less obvious how they’re actually doing them on the inside,” Anthropic researcher Jack Lindsey tells me. “It’s more and more important to be able to trace the internal steps that the model might be taking in its head.” (What head? Never mind.)

    On a practical level, if the companies that create LLMs understand how the models think, they should have more success training those models in a way that minimizes dangerous misbehavior, like divulging people’s personal data or giving users information on how to make bioweapons. In a previous research paper, the Anthropic team discovered how to look inside the mysterious black box of LLM-think to identify certain concepts (a process analogous to interpreting human MRIs to figure out what someone is thinking). It has now extended that work to understand how Claude processes those concepts as it goes from prompt to output.

    It’s almost a truism with LLMs that their behavior often surprises the people who build and research them. In the latest study, the surprises kept coming. In one of the more benign instances, the researchers elicited glimpses of Claude’s thought process while it wrote poems. They asked Claude to complete a poem starting, “He saw a carrot and had to grab it.” Claude wrote the next line, “His hunger was like a starving rabbit.” By observing Claude’s equivalent of an MRI, they learned that even before beginning the line, it was flashing on the word “rabbit” as the rhyme at the end of the line. It was planning ahead, something that isn’t in the Claude playbook. “We were a little surprised by that,” says Chris Olah, who heads the interpretability team. “Initially we thought that there’s just going to be improvising and not planning.” Speaking to the researchers about this, I am reminded of passages in Stephen Sondheim’s artistic memoir, Look, I Made a Hat, where the famous composer describes how his unique mind discovered felicitous rhymes.

    Other examples in the research reveal more disturbing aspects of Claude’s thought process, moving from musical comedy to police procedural, as the scientists discovered devious thoughts in Claude’s brain. Take something as seemingly anodyne as solving math problems, which can sometimes be a surprising weakness in LLMs. The researchers found that under certain circumstances where Claude couldn’t come up with the right answer it would instead, as they put it, “engage in what the philosopher Harry Frankfurt would call ‘bullshitting’—just coming up with an answer, any answer, without caring whether it is true or false.” Worse, sometimes when the researchers asked Claude to show its work, it backtracked and created a bogus set of steps after the fact. Basically, it acted like a student desperately trying to cover up the fact that they’d faked their work. It’s one thing to give a wrong answer—we already know that about LLMs. What’s worrisome is that a model would lie about it.

    Reading through this research, I was reminded of the Bob Dylan lyric “If my thought-dreams could be seen / they’d probably put my head in a guillotine.” (I asked Olah and Lindsey if they knew those lines, presumably arrived at by benefit of planning. They didn’t.) Sometimes Claude just seems misguided. When faced with a conflict between goals of safety and helpfulness, Claude can get confused and do the wrong thing. For instance, Claude is trained not to provide information on how to build bombs. But when the researchers asked Claude to decipher a hidden code where the answer spelled out the word “bomb,” it jumped its guardrails and began providing forbidden pyrotechnic details.

    © 2025 Technology Mag. All Rights Reserved.
