The Core

Why We Are Here => Traffic => Topic started by: rcjordan on April 04, 2025, 10:12:28 PM

Title: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on April 04, 2025, 10:12:28 PM

https://arstechnica.com/information-technology/2025/04/ai-bots-strain-wikimedia-as-bandwidth-surges-50/

I've noticed more sites using Cloudflare to "check to see if you're human" and also an uptick in "Click all the squares with" Captcha (ugh).
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: Brad on April 05, 2025, 06:40:13 AM
I've been encountering the Cloudflare thingy more.

I just looked, WordPress has several bot blocking plugins but many just edit the robots.txt file which isn't enough.
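
To illustrate why a robots.txt edit alone isn't enough, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content and the example.com URL are made up for illustration; GPTBot is used only as an example user-agent token. The point is that the check runs on the crawler's side, so a bot that never asks is never stopped.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: turn away one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return what a *polite* crawler would conclude from robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some-post"  # placeholder URL
    print("GPTBot allowed: ", is_allowed("GPTBot", url))
    print("Browser allowed:", is_allowed("Mozilla/5.0", url))
    # Nothing here blocks a request. A crawler that skips this check,
    # or sends a different user agent, still gets served by the site.
```

Anything that actually refuses the request has to happen on the server or at a proxy in front of it.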

Like anti-spam there is a business opportunity here for anti-AIbots.
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: ergophobe on April 05, 2025, 07:18:55 PM
My experience is that proof-of-work solutions work best. For something like OpenAI with their massive crawl, that chews up resources.

The new Cloudflare approach is cool. It combines Honeypot concepts (send them down the wrong path) with Proof of Work (send them down lots of wrong paths). It must eat a ton of CF resources too though. Most Proof of Work succeeds by requiring *client* side work, so it pushes the work from the server to the client.
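
A rough sketch of the client-side proof-of-work idea, in the hashcash style. This is not how Cloudflare implements its challenge; the function names, the SHA-256 choice, and the 16-bit difficulty are all assumptions for illustration. The client has to grind through many hashes to find a nonce, while the server checks the answer with a single hash.

```python
import hashlib
import itertools
import os

DIFFICULTY_BITS = 16  # assumed difficulty: about 65,000 hashes on average


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty: int = DIFFICULTY_BITS) -> int:
    """Client side: brute-force a nonce that meets the difficulty."""
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty: int = DIFFICULTY_BITS) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = os.urandom(16)  # the server would issue this per visitor
    nonce = solve(challenge)
    print("nonce:", nonce, "valid:", verify(challenge, nonce))
```

One visitor barely notices the delay, but a crawler making millions of requests pays that cost every time, which is the asymmetry described above.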

[edit: my experience with bots, not specifically experience with AI bots, of which I have none]
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on April 05, 2025, 08:12:41 PM
So, what happens when the bulk of users start using AI search?  I noticed 1min.ai plow through a bunch of sites (in about 3 seconds) when I was doing medical research.  It seems to me many/most sites will see bandwidth skyrocket -or- they'll exclude their site from user-directed AI searches.  ....Not that it matters much to the site, as I do not foresee much likelihood of user visits due to their search.
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: ergophobe on April 05, 2025, 11:40:45 PM
>>  I noticed 1min.ai plow through a bunch of sites

I haven't used that feature. Are you feeding it the URLs or is it "researching" and going to the sites?

I think for most chatbot queries, though, it is pulling from training data, right? It's rarely actually visiting the sites during a search. Isn't it more when you feed it a URL and ask it to summarize a page or feed it a set of URLs and ask it to summarize?
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on April 06, 2025, 12:15:57 AM
Onsite instructions are vague; here's what I did:

Multi AI Chat > Settings Icon > Mix AI Models 'on' > Web Search 'on' > Input # of sites > Input words per site > Input Prompts

>is it "researching" and going to the sites?

Yes, once I entered the prompts and clicked 'Send', a list of urls scrolled very quickly off the top of the page.  I may have prompted the bots with a general listing of authoritative med sites (ex: PubMed) but I don't really recall doing that. I know I did not specify any urls.

In the above case, I sicced 3 bots on the query at once and then had them jointly summarize.

>chatbot queries, though, it is pulling from training data, right?

I'm guessing that general public use is currently largely dominated by that kind of query, but I don't think it will stay as dominant in the long term as people become more familiar with AI search.

I'd bet even money, though, that researchers, commercial users, and even SEOers are feeding it urls, domains, & categories.
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: ergophobe on April 08, 2025, 01:37:29 AM
>> feeding it the URLs or is it "researching" and going to the sites?

https://openai.com/index/introducing-operator/

A research preview of an agent that can use its own browser to perform tasks for you. Available to Pro users in the U.S.
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on August 21, 2025, 03:43:58 PM
#webdev #hosting

AI crawlers and fetchers are blowing up websites, with Meta and OpenAI the worst offenders

One fetcher bot seen smacking a website with 39,000 requests per minute

https://www.theregister.com/2025/08/21/ai_crawler_traffic/
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: ergophobe on August 21, 2025, 08:46:22 PM
>> Cloudflare

https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: Travoli on August 22, 2025, 02:39:58 AM
>more sites using Cloudflare
>AI crawlers and fetchers are blowing up websites

This finally happened here this week. Took the sites offline several times. We're looking into the free Cloudflare system now.
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on August 22, 2025, 12:08:29 PM
>looking at

I think I have posted this elsewhere here, but you can see one version of Cloudflare's "checking to see if you're a human" by visiting yeggi.com

It's a minor pita
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: ergophobe on August 22, 2025, 07:58:07 PM
Quote from: rcjordan on August 22, 2025, 12:08:29 PM
It's a minor pita

I think that might be because you are running blockers and other privacy protection. For most users, CF is transparent and a quick proof of work gets you through. Once you start doing things to block tracking, though, the system requires human action.

I'm not 100% sure on the AI thing, but that is how it has always worked in general

In my case, running just Privacy Badger now, I did not get a challenge for yeggi.com
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on August 22, 2025, 08:17:54 PM
>running blockers and other privacy protection

I get the challenge in FF Incognito with only ublock & tampermonkey running.  But it is not UNlikely that my browser drops yellow flags.

That said, I sent a foundation manager to yeggi to see the challenge and he saw it.

Maybe you're so plain vanilla that nobody worries about you. Sad. hhh
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: ergophobe on August 22, 2025, 08:26:08 PM
>> FF Incognito

That will trigger it almost all the time.

This is the web equivalent of your response to my whining about tortillas or yogurt being considered UP food. You have to remember, almost everyone is on a UP browser that is ultra-processing their data.

I don't consider Yoplait to be actual yogurt. You consider FF Incognito to be your pared down "normal person" browser :-)
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: Travoli on August 24, 2025, 10:03:36 PM
"checking to see if you're a human"

Happens to me also, and I agree it's annoying. I run incognito often too, though.

I'm told there is a toggle to turn that interstitial off. We're digging into that project tomorrow, so we'll see.
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on August 24, 2025, 10:40:43 PM
>incognito

Happens to me when not using incognito. Here's what I'm running.
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on August 24, 2025, 10:42:26 PM
I doubt many of the average retail users will trip it.
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on August 25, 2025, 12:18:10 PM
"Bro, ban me at the IP level if you don't like me!" - The Boston Diaries - Captain Napalm

https://boston.conman.org/2025/08/21.1
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: ergophobe on August 25, 2025, 08:08:32 PM
I think Privacy Badger, uBlock Origin and AdNauseam might all trip up Cloudflare.

If you think about it, most bot crawlers are not going to be able to handle third-party cookies, so that's a signal to Cloudflare.

I have noticed for the last few years I have spikes where I exceed CPU usage limits despite not having a huge number of page views. I know it's bots, but I wonder how much is just your old school probes and how much is AI crawlers.
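
One rough way to get at that question is to tally access-log lines by user agent. The sketch below assumes the common "combined" log format, where the user agent is the last quoted field. The token list is a small illustrative sample, not a complete registry, and the log path in the usage line is a placeholder. Bots that spoof a browser user agent will land in "other".

```python
import re
from collections import Counter

# Illustrative sample of user-agent tokens used by AI crawlers (not exhaustive).
AI_CRAWLER_TOKENS = (
    "GPTBot",
    "ClaudeBot",
    "CCBot",
    "Bytespider",
    "meta-externalagent",
    "PerplexityBot",
)

# In combined log format the user agent is the last double-quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def classify(line: str) -> str:
    """Label one log line with a crawler token, 'other', or 'unparsed'."""
    match = UA_PATTERN.search(line)
    if not match:
        return "unparsed"
    user_agent = match.group(1).lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in user_agent:
            return token
    return "other"


def tally(lines) -> Counter:
    """Count requests per label across an iterable of log lines."""
    return Counter(classify(line) for line in lines)


if __name__ == "__main__":
    # Placeholder path; point this at the real access log.
    with open("access.log", encoding="utf-8", errors="replace") as handle:
        for label, count in tally(handle).most_common():
            print(f"{count:>10}  {label}")
```

Comparing the counts for the hours around a CPU spike against a quiet period would show whether the named AI crawlers or the "other" bucket is doing the damage.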

I use CF for many sites, but not all since sometimes it caused an issue here or there... but I can't remember what
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on August 26, 2025, 12:02:35 PM
stolen from bsky;

Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: Rumbas on August 28, 2025, 03:33:11 PM
Amen. These AI bots are going nuts atm. 2.7M requests on one site in a week.
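
For scale, a quick back-of-the-envelope conversion of the two figures in this thread to requests per second. It assumes the 2.7M requests were spread evenly over the week, which real bot traffic almost certainly is not.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800


def per_second(requests: float, window_seconds: float) -> float:
    """Average request rate over a time window."""
    return requests / window_seconds


# 2.7M requests in a week, averaged (assumes an even spread).
weekly_average = per_second(2_700_000, SECONDS_PER_WEEK)

# The 39,000 requests per minute fetcher figure quoted earlier in the thread.
burst = per_second(39_000, SECONDS_PER_MINUTE)

print(f"2.7M/week  = {weekly_average:.1f} req/s on average")  # 4.5
print(f"39,000/min = {burst:.0f} req/s")                      # 650
```

The weekly average looks modest, but crawlers tend to arrive in bursts, so the peaks a small server actually has to absorb can be far higher than the average suggests.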
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: ergophobe on August 28, 2025, 07:49:14 PM
Finally went into Cloudflare and turned on
 - AI Labyrinth on non-revenue sites
 - Block AI Crawlers on all sites
 - Bot Fight Mode on all sites

It's kind of astounding what CF offers for free if you have a low-traffic site (or, in my case, several which are in some cases just parked domains now but which can still get blasted by bots)
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on November 16, 2025, 05:03:39 PM
The Internet Is No Longer A Safe Haven | Brain Baking

https://brainbaking.com/post/2025/10/the-internet-is-no-longer-a-safe-haven/
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on November 18, 2025, 01:14:54 PM
There is an internal server error on Cloudflare's network.

 Internal server error Error code 500
Visit cloudflare.com for more information.
2025-11-18 13:09:02 UTC

www.theregister.com



+

Cloudflare timed out on Yeggi just prior to trying theregister link
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on November 18, 2025, 04:28:05 PM
Debbie said this was likely a widespread failure.  /r agreed. has meme.

that was a nice 20-minute apocalypse! : memes
https://old.reddit.com/r/memes/comments/1p0agab/that_was_a_nice_20minute_apocalypse/
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on November 18, 2025, 04:51:15 PM
memes running hot now

This one is good

Cloudflare is down. : meme
https://old.reddit.com/r/meme/comments/1p0b372/cloudflare_is_down/
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: ergophobe on November 21, 2025, 02:36:31 AM
Service Outage
https://xkcd.com/3170/
Title: Re: AI bots strain Wikimedia as bandwidth surges 50% - Ars Technica
Post by: rcjordan on May 21, 2026, 04:22:49 PM
Sites blocking spiders are crippling previously good aggregators. Add enshittification of sites that used to be good sources for an aggregator and there's little reason to visit anymore.

I've lost a really good source for 3d printer models.