Technology Mag


    The Race to Block OpenAI’s Scraping Bots Is Slowing Down

    By News Room | October 7, 2024 | 3 Mins Read

    It’s too soon to say how the spate of deals between AI companies and publishers will shake out. OpenAI has already scored one clear win, though: Its web crawlers aren’t getting blocked by top news outlets at the rate they once were.

    The generative AI boom sparked a gold rush for data—and a subsequent data-protection rush (for most news websites, anyway) in which publishers sought to block AI crawlers and prevent their work from becoming training data without consent. When Apple debuted a new AI agent this summer, for example, a slew of top news outlets swiftly opted out of Apple’s web scraping using the Robots Exclusion Protocol, or robots.txt, the file that allows webmasters to control bots. There are so many new AI bots on the scene that it can feel like playing whack-a-mole to keep up.
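As a rough illustration of how such an opt-out works, the sketch below parses a small robots.txt and asks whether a given crawler may fetch a page. The file contents, the `example.com` URL, the `SomeOtherBot` name, and the `is_allowed` helper are all made up for illustration; only the `GPTBot` user agent comes from the reporting. It uses Python's standard `urllib.robotparser` module and is not a description of how any particular publisher or crawler implements the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative news site (not a real publisher's file).
# It blocks GPTBot everywhere and leaves all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed example URL; any path on the site gives the same answer for this file.
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/story")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/story")

print("GPTBot allowed:", gptbot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

Because each crawler is matched by its own user-agent string, a site has to add a new rule for every new bot it wants to exclude, which is where the whack-a-mole feeling comes from.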

    OpenAI’s GPTBot has the most name recognition and is also blocked more frequently than competitors like Google AI. The number of high-ranking media websites using robots.txt to “disallow” GPTBot increased dramatically from its August 2023 launch until that fall, then rose steadily (but more gradually) from November 2023 to April 2024, according to an analysis of 1,000 popular news outlets by Ontario-based AI detection startup Originality AI. At the peak, just over a third of the websites blocked it; that share has now dropped closer to a quarter. Within a smaller pool of the most prominent news outlets, the block rate is still above 50 percent, but it is down from highs of almost 90 percent earlier this year.
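A block rate of this kind is just the share of sampled sites whose robots.txt disallows the crawler. The sketch below shows one way such a figure could be computed. It is not Originality AI's method, which the article does not describe. The four sites and their robots.txt contents are invented sample data, the `blocks_agent` and `block_rate` helpers are hypothetical, and a site is treated as blocking only if the crawler may not fetch the root path.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt: str, user_agent: str) -> bool:
    """Assumption: a site 'blocks' an agent if the agent may not fetch the root path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


def block_rate(robots_by_site: dict[str, str], user_agent: str) -> float:
    """Fraction of sites in the sample whose robots.txt blocks user_agent."""
    if not robots_by_site:
        return 0.0
    blocked = sum(blocks_agent(text, user_agent) for text in robots_by_site.values())
    return blocked / len(robots_by_site)


# Invented sample data for illustration only; these are not real outlets.
SAMPLE = {
    "a.example": "User-agent: GPTBot\nDisallow: /\n",      # blocks GPTBot outright
    "b.example": "User-agent: *\nDisallow: /private/\n",   # only hides one directory
    "c.example": "",                                       # no rules at all
    "d.example": "User-agent: GPTBot\nAllow: /\n",         # explicitly permits GPTBot
}

rate = block_rate(SAMPLE, "GPTBot")
print(f"{rate:.0%} of sampled sites block GPTBot")
```

A real survey would also have to decide how to count partial blocks, missing files, and sites that block by other means, so the exact definition affects the reported percentage.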

    But this May, after Dotdash Meredith announced a licensing deal with OpenAI, that number dipped significantly. It dipped again at the end of May, when Vox announced its own arrangement, and once more this August, when WIRED’s parent company, Condé Nast, struck a deal. The trend toward increased blocking appears to be over, at least for now.

    These dips make obvious sense. When companies enter into partnerships and give permission for their data to be used, they no longer have an incentive to barricade it, so it follows that they would update their robots.txt files to permit crawling; make enough deals and the overall percentage of sites blocking crawlers will almost certainly go down. Some outlets, like The Atlantic, unblocked OpenAI’s crawlers on the very same day that they announced a deal. Others took a few days to a few weeks, like Vox, which announced its partnership at the end of May but did not unblock GPTBot on its properties until toward the end of June.

    Robots.txt is not legally binding, but it has long functioned as the standard that governs web crawler behavior. For most of the internet’s existence, people running webpages expected each other to abide by the file. When a WIRED investigation earlier this summer found that the AI startup Perplexity was likely choosing to ignore robots.txt commands, Amazon’s cloud division launched an investigation into whether Perplexity had violated its rules. It’s not a good look to ignore robots.txt, which likely explains why so many prominent AI companies—including OpenAI—explicitly state that they use it to determine what to crawl. Originality AI CEO Jon Gillham believes that this adds extra urgency to OpenAI’s push to make agreements. “It’s clear that OpenAI views being blocked as a threat to their future ambitions,” says Gillham.
