AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica

Started by rcjordan, April 04, 2025, 10:12:28 PM

rcjordan


https://arstechnica.com/information-technology/2025/04/ai-bots-strain-wikimedia-as-bandwidth-surges-50/

I've noticed more sites using Cloudflare to "check to see if you're human" and also an uptick in "Click all the squares with" CAPTCHAs (ugh).

Brad

I've been encountering the Cloudflare thingy more.

I just looked: WordPress has several bot-blocking plugins, but many just edit the robots.txt file, which isn't enough because robots.txt is only a request that a crawler is free to ignore.

Like anti-spam, there is a business opportunity here for anti-AI-bot tools.
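To show why a robots.txt edit on its own doesn't stop anything, here is a minimal sketch using Python's standard urllib.robotparser. The rules and the example.com URL are made up for illustration, and GPTBot is just used as the example crawler name. The thing to notice is that the check runs on the crawler's side, so it only helps if the bot bothers to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind a bot-blocking plugin might write.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits user_agent to fetch url.

    This is the check a *polite* crawler performs before requesting a page.
    Nothing on the server enforces the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-page"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "GPTBot", page))       # False
    print(is_allowed(ROBOTS_TXT, "Mozilla/5.0", page))  # True
```

A crawler that skips this step, or sends a different user-agent string, gets the page anyway, which is why the plugins that only touch robots.txt fall short.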

ergophobe

My experience is that proof-of-work solutions work best. For something like OpenAI with their massive crawl, that chews up resources.

The new Cloudflare approach is cool. It combines Honeypot concepts (send them down the wrong path) with Proof of Work (send them down lots of wrong paths). It must eat a ton of CF resources too though. Most Proof of Work succeeds by requiring *client* side work, so it pushes the work from the server to the client.

[edit: my experience with bots, not specifically experience with AI bots, of which I have none]
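For anyone who hasn't seen client-side proof of work, here is a rough hashcash-style sketch in Python. It is not how Cloudflare does it; the function names, the challenge format, and the difficulty value are all assumptions for illustration. The idea is that the server hands out a random challenge, the client burns CPU finding a nonce, and the server verifies with a single hash.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random challenge string (hypothetical format)."""
    return secrets.token_hex(16)


def _hash_value(challenge: str, nonce: int) -> int:
    """SHA-256 of 'challenge:nonce', interpreted as a big integer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash to check that the digest has enough leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    return _hash_value(challenge, nonce) < target


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force the smallest nonce that passes verify().

    Expected cost is about 2**difficulty_bits hashes, all paid by the client.
    """
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty_bits=16)  # assumed difficulty
    print("solved:", verify(challenge, nonce, 16))
```

One visitor barely notices a cost like this, but a crawler making tens of thousands of requests pays it on every one, and the server's share stays at a single hash per check.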

rcjordan

So, what happens when the bulk of users start using AI search?  I noticed 1min.ai plow through a bunch of sites (in about 3 seconds) when I was doing medical research.  It seems to me many/most sites will see bandwidth skyrocket -or- they'll exclude their site from user-directed AI searches.  ....Not that it matters much to the site, as I do not foresee much likelihood of user visits due to their search.

ergophobe

>>  I noticed 1min.ai plow through a bunch of sites

I haven't used that feature. Are you feeding it the URLs or is it "researching" and going to the sites?

I think for most chatbot queries, though, it is pulling from training data, right? It's rarely actually visiting the sites during a search. Isn't it more when you feed it a URL and ask it to summarize a page or feed it a set of URLs and ask it to summarize?

rcjordan

Onsite instructions are vague. Here's what I did:

Multi AI Chat > Settings Icon > Mix AI Models 'on' > Web Search 'on' > Input # of sites > Input words per site > Input Prompts

>is it "researching" and going to the sites?

Yes, once I entered the prompts and clicked 'Send', a list of urls scrolled very quickly off the top of the page.  I may have prompted the bots with a general listing of authoritative med sites (ex: PubMed) but I don't really recall doing that. I know I did not specify any urls.

In the above case, I sicced 3 bots on the query at once and then had them jointly summarize.

>chatbot queries, though, it is pulling from training data, right?

I'm guessing that general public use is currently largely dominated by that kind of query, but I don't think it will stay as dominant in the long term as people become more familiar with AI search.

I'd bet even money, though, that researchers, commercial users, and even SEOers are feeding it urls, domains, & categories.

ergophobe

>> feeding it the URLs or is it "researching" and going to the sites?

https://openai.com/index/introducing-operator/

A research preview of an agent that can use its own browser to perform tasks for you. Available to Pro users in the U.S.

rcjordan

#webdev #hosting

AI crawlers and fetchers are blowing up websites, with Meta and OpenAI the worst offenders

One fetcher bot seen smacking a website with 39,000 requests per minute

https://www.theregister.com/2025/08/21/ai_crawler_traffic/


Travoli

>more sites using Cloudflare
>AI crawlers and fetchers are blowing up websites

This finally happened here this week. Took the sites offline several times. We're looking into the free Cloudflare system now.

rcjordan

>looking into

I think I have posted this elsewhere here, but you can see one version of Cloudflare's "checking to see if you're a human" by visiting yeggi.com

It's a minor pita

ergophobe

Quote from: rcjordan on August 22, 2025, 12:08:29 PM
>> It's a minor pita

I think that might be because you are running blockers and other privacy protection. For most users, CF is transparent and a quick proof of work gets you through. Once you start doing things to block tracking, though, the system requires human action.

I'm not 100% sure on the AI thing, but that is how it has always worked in general.

In my case, running just Privacy Badger now, I did not get a challenge for yeggi.com

rcjordan

>running blockers and other privacy protection

I get the challenge in FF Incognito with only uBlock & Tampermonkey running.  But it is not UNlikely that my browser drops yellow flags.

That said, I sent a foundation manager to yeggi to see the challenge and he saw it.

Maybe you're so plain vanilla that nobody worries about you. Sad. hhh

ergophobe

>> FF Incognito

That will trigger it almost all the time.

This is the web equivalent of your response to my whining about tortillas or yogurt being considered UP food. You have to remember, almost everyone is on a UP browser that is ultra-processing their data.

I don't consider Yoplait to be actual yogurt. You consider FF Incognito to be your pared down "normal person" browser :-)

Travoli

"checking to see if you're a human"

Happens to me also, and I agree it's annoying. I run incognito often too, though.

I'm told there is a toggle to turn that interstitial off. We're digging into that project tomorrow, so we'll see.