Technology Mag

    A New Benchmark for the Risks of AI

    By News Room | December 4, 2024 | 2 Mins Read

    MLCommons, a nonprofit that helps companies measure the performance of their artificial intelligence systems, is launching a new benchmark to gauge AI’s bad side too.

    The new benchmark, called AILuminate, assesses the responses of large language models to more than 12,000 test prompts in 12 categories including inciting violent crime, child sexual exploitation, hate speech, promoting self-harm, and intellectual property infringement.

    Models are given a score of “poor,” “fair,” “good,” “very good,” or “excellent,” depending on how they perform. The prompts used to test the models are kept secret to prevent them from ending up as training data that would allow a model to ace the test.

    Peter Mattson, founder and president of MLCommons and a senior staff engineer at Google, says that measuring the potential harms of AI models is technically difficult, leading to inconsistencies across the industry. “AI is a really young technology, and AI testing is a really young discipline,” he says. “Improving safety benefits society; it also benefits the market.”

    Reliable, independent ways of measuring AI risks may become more relevant under the next US administration. Donald Trump has promised to get rid of President Biden’s AI Executive Order, which introduced measures aimed at ensuring that companies use AI responsibly, along with a new AI Safety Institute to test powerful models.

    The effort could also provide more of an international perspective on AI harms. MLCommons counts a number of international firms, including the Chinese companies Huawei and Alibaba, among its member organizations. If these companies all used the new benchmark, it would provide a way to compare AI safety in the US, China, and elsewhere.

    Some large US AI providers have already used AILuminate to test their models. Anthropic’s Claude model, Google’s smaller model Gemma, and a model from Microsoft called Phi all scored “very good” in testing. OpenAI’s GPT-4o and Meta’s largest Llama model both scored “good.” The only model to score “poor” was OLMo from the Allen Institute for AI, although Mattson notes that this is a research offering not designed with safety in mind.

    “Overall, it’s good to see scientific rigor in the AI evaluation processes,” says Rumman Chowdhury, CEO of Humane Intelligence, a nonprofit that specializes in testing or red-teaming AI models for misbehaviors. “We need best practices and inclusive methods of measurement to determine whether AI models are performing the way we expect them to.”
