AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica

Started by rcjordan, April 04, 2025, 10:12:28 PM


rcjordan


https://arstechnica.com/information-technology/2025/04/ai-bots-strain-wikimedia-as-bandwidth-surges-50/

I've noticed more sites using Cloudflare to "check to see if you're human" and also an uptick in "Click all the squares with" Captcha (ugh).

Brad

I've been encountering the Cloudflare thingy more.

I just looked: WordPress has several bot-blocking plugins, but many just edit the robots.txt file, which isn't enough.
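To illustrate why a robots.txt edit alone isn't enough, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and the example.com URL are made up for illustration; GPTBot is the user-agent token OpenAI publishes for its crawler. The point is that the check runs in the crawler's own code, so it only stops bots that choose to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content of the kind a plugin might write.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/some-page"  # placeholder URL

# A well-behaved crawler asks before fetching and honors the answer.
gptbot_allowed = parser.can_fetch("GPTBot", url)

# A bot using a different name is not covered by the rule above.
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("GPTBot allowed:", gptbot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

Nothing on the server enforces either answer. A bot that never consults robots.txt, or that sends a different user-agent string, gets the page anyway, which is why the blocking has to happen at the server or CDN level.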

As with anti-spam, there is a business opportunity here for anti-AI-bot tools.

ergophobe

My experience is that proof-of-work solutions work best. For something like OpenAI with their massive crawl, that chews up resources.

The new Cloudflare approach is cool. It combines Honeypot concepts (send them down the wrong path) with Proof of Work (send them down lots of wrong paths). It must eat a ton of CF resources too though. Most Proof of Work succeeds by requiring *client* side work, so it pushes the work from the server to the client.

[edit: my experience with bots, not specifically experience with AI bots, of which I have none]
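For anyone unfamiliar with client-side proof of work, here is a rough hashcash-style sketch in Python. It is not Cloudflare's method or any particular product's; the function names, the challenge string, and the difficulty setting are all assumptions for illustration. The server hands out a challenge, the client has to brute-force a nonce, and the server checks the answer with a single hash.

```python
import hashlib
import itertools


def _digest_value(challenge: str, nonce: int) -> int:
    """SHA-256 of 'challenge:nonce', read as a big integer."""
    data = f"{challenge}:{nonce}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: try nonces until the hash has enough leading zero bits.

    Expected cost is about 2**difficulty_bits hash computations.
    """
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        if _digest_value(challenge, nonce) < target:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash computation, however hard the puzzle was."""
    target = 1 << (256 - difficulty_bits)
    return _digest_value(challenge, nonce) < target


if __name__ == "__main__":
    challenge = "example-session-token"  # hypothetical per-request challenge
    nonce = solve(challenge, difficulty_bits=16)
    print("nonce:", nonce, "valid:", verify(challenge, nonce, 16))
```

The asymmetry is the whole idea: a single human visitor barely notices the work, while a crawler requesting millions of pages pays it on every request, and the server only pays for one hash per check.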

rcjordan

So, what happens when the bulk of users start using AI search? I noticed 1min.ai plow through a bunch of sites (in about 3 seconds) when I was doing medical research. It seems to me many/most sites will either see bandwidth skyrocket or exclude themselves from user-directed AI searches. Not that it matters much to the site, as I do not foresee much likelihood of user visits resulting from those searches.

ergophobe

>>  I noticed 1min.ai plow through a bunch of sites

I haven't used that feature. Are you feeding it the URLs or is it "researching" and going to the sites?

I think for most chatbot queries, though, it is pulling from training data, right? It's rarely actually visiting the sites during a search. Isn't it more when you feed it a URL and ask it to summarize a page or feed it a set of URLs and ask it to summarize?

rcjordan

Onsite instructions are vague. Here's what I did:

Multi AI Chat > Settings Icon > Mix AI Models 'on' > Web Search 'on' > Input # of sites > Input words per site > Input Prompts

>is it "researching" and going to the sites?

Yes, once I entered the prompts and clicked 'Send', a list of urls scrolled very quickly off the top of the page. I may have prompted the bots with a general listing of authoritative med sites (e.g., PubMed) but I don't really recall doing that. I know I did not specify any urls.

In the above case, I sicced 3 bots on the query at once and then had them jointly summarize.

>chatbot queries, though, it is pulling from training data, right?

I'm guessing that general public use is currently largely dominated by that kind of query, but I don't think it will stay as dominant in the long term as people become more familiar with AI search.

I'd bet even money, though, that researchers, commercial users, and even SEOers are feeding it urls, domains, & categories.

ergophobe

>> feeding it the URLs or is it "researching" and going to the sites?

https://openai.com/index/introducing-operator/

A research preview of an agent that can use its own browser to perform tasks for you. Available to Pro users in the U.S.