Technology Mag

    Open-source AI must reveal its training data, per new OSI definition

    By News Room, October 28, 2024

    The Open Source Initiative (OSI) has released its official definition of “open” artificial intelligence, setting the stage for a clash with tech giants like Meta — whose models don’t fit the rules.

    OSI has long set the industry standard for what constitutes open-source software, but AI systems include elements that aren’t covered by conventional licenses, like model training data. Now, for an AI system to be considered truly open source, it must provide:

    • Access to details about the data used to train the AI so others can understand and re-create it
    • The complete code used to build and run the AI
    • The settings and weights from the training, which help the AI produce its results

    This definition directly challenges Meta’s Llama, widely promoted as the largest open-source AI model. Llama is publicly available for download and use, but its license restricts commercial use (products with more than 700 million monthly active users need separate permission from Meta), and Meta does not provide access to the training data. Both gaps leave it short of OSI’s standards for unrestricted freedom to use, modify, and share.

    Meta spokesperson Faith Eischen told The Verge that while “we agree with our partner OSI on many things,” the company disagrees with this definition. “There is no single open source AI definition, and defining it is a challenge because previous open source definitions do not encompass the complexities of today’s rapidly advancing AI models.”

    “We will continue working with OSI and other industry groups to make AI more accessible and free responsibly, regardless of technical definitions,” Eischen added.

    For 25 years, OSI’s definition of open-source software has been widely accepted by developers who want to build on each other’s work without fear of lawsuits or licensing traps. Now, as AI reshapes the landscape, tech giants face a pivotal choice: embrace these established principles or reject them. The Linux Foundation has also made a recent attempt to define “open-source AI,” signaling a growing debate over how traditional open-source values will adapt to the AI era.

    “Now that we have a robust definition in place maybe we can push back more aggressively against companies who are ‘open washing’ and declaring their work open source when it actually isn’t,” Simon Willison, an independent researcher and creator of the open-source multi-tool Datasette, told The Verge.

    Hugging Face CEO Clément Delangue called OSI’s definition “a huge help in shaping the conversation around openness in AI, especially when it comes to the crucial role of training data.”

    OSI executive director Stefano Maffulli says the initiative spent two years refining the definition through a collaborative process, consulting experts around the world. They included academic experts in machine learning and natural language processing, philosophers, content creators from the Creative Commons world, and others.

    While Meta cites safety concerns for restricting access to its training data, critics see a simpler motive: minimizing its legal liability and safeguarding its competitive advantage. Many AI models are almost certainly trained on copyrighted material; in April, The New York Times reported that Meta internally acknowledged there was copyrighted content in its training data “because we have no way of not collecting that.” There’s a litany of lawsuits against Meta, OpenAI, Perplexity, Anthropic, and others for alleged infringement. But with rare exceptions — like Stable Diffusion, which reveals its training data — plaintiffs must currently rely on circumstantial evidence to demonstrate that their work has been scraped.

    Meanwhile, Maffulli sees open-source history repeating itself. “Meta is making the same arguments” as Microsoft did in the 1990s when it saw open source as a threat to its business model, Maffulli told The Verge. He recalls Meta telling him about its intensive investment in Llama, asking him “who do you think is going to be able to do the same thing?” Maffulli saw a familiar pattern: a tech giant using cost and complexity to justify keeping its technology locked away. “We come back to the early days,” he said.

    “That’s their secret sauce,” Maffulli said of the training data. “It’s the valuable IP.”
