Technology Mag


    Open-source AI must reveal its training data, per new OSI definition

    By News Room | October 28, 2024 | 4 min read

    The Open Source Initiative (OSI) has released its official definition of “open” artificial intelligence, setting the stage for a clash with tech giants like Meta — whose models don’t fit the rules.

    OSI has long set the industry standard for what constitutes open-source software, but AI systems include elements that aren’t covered by conventional licenses, like model training data. Now, for an AI system to be considered truly open source, it must provide:

    • Access to details about the data used to train the AI so others can understand and re-create it
    • The complete code used to build and run the AI
    • The settings and weights from the training, which help the AI produce its results

    This definition directly challenges Meta’s Llama, widely promoted as the largest open-source AI model. Llama is publicly available for download and use, but it has restrictions on commercial use (for applications with over 700 million users) and does not provide access to training data, causing it to fall short of OSI’s standards for unrestricted freedom to use, modify, and share.

    Meta spokesperson Faith Eischen told The Verge that while “we agree with our partner OSI on many things,” the company disagrees with this definition. “There is no single open source AI definition, and defining it is a challenge because previous open source definitions do not encompass the complexities of today’s rapidly advancing AI models.”

    “We will continue working with OSI and other industry groups to make AI more accessible and free responsibly, regardless of technical definitions,” Eischen added.

    For 25 years, OSI’s definition of open-source software has been widely accepted by developers who want to build on each other’s work without fear of lawsuits or licensing traps. Now, as AI reshapes the landscape, tech giants face a pivotal choice: embrace these established principles or reject them. The Linux Foundation has also made a recent attempt to define “open-source AI,” signaling a growing debate over how traditional open-source values will adapt to the AI era.

    “Now that we have a robust definition in place maybe we can push back more aggressively against companies who are ‘open washing’ and declaring their work open source when it actually isn’t,” Simon Willison, an independent researcher and creator of the open-source multi-tool Datasette, told The Verge.

    Hugging Face CEO Clément Delangue called OSI’s definition “a huge help in shaping the conversation around openness in AI, especially when it comes to the crucial role of training data.”

    OSI’s executive director Stefano Maffulli says the organization spent two years refining this definition through a collaborative process. That process drew on experts from around the world, including academics in machine learning and natural language processing, philosophers, content creators from the Creative Commons world, and more.

    While Meta cites safety concerns for restricting access to its training data, critics see a simpler motive: minimizing its legal liability and safeguarding its competitive advantage. Many AI models are almost certainly trained on copyrighted material; in April, The New York Times reported that Meta internally acknowledged there was copyrighted content in its training data “because we have no way of not collecting that.” There’s a litany of lawsuits against Meta, OpenAI, Perplexity, Anthropic, and others for alleged infringement. But with rare exceptions — like Stable Diffusion, which reveals its training data — plaintiffs must currently rely on circumstantial evidence to demonstrate that their work has been scraped.

    Meanwhile, Maffulli sees open-source history repeating itself. “Meta is making the same arguments” as Microsoft did in the 1990s when it saw open source as a threat to its business model, Maffulli told The Verge. He recalls Meta telling him about its intensive investment in Llama and asking him, “who do you think is going to be able to do the same thing?” Maffulli saw a familiar pattern: a tech giant using cost and complexity to justify keeping its technology locked away. “We come back to the early days,” he said.

    “That’s their secret sauce,” Maffulli said of the training data. “It’s the valuable IP.”
