The Core

Why We Are Here => Traffic => Topic started by: ergophobe on November 10, 2024, 12:26:10 AM

Title: GPTBot crawl
Post by: ergophobe on November 10, 2024, 12:26:10 AM
I was just looking at a few raw server logs and noticed that GPTBot was crawling like mad in October. A site with 100 to 200 pages got these hit counts from GPTBot:

5,700 so far in November
97,000 in October
2,068 in September
1,700 in August

I looked at some other mini sites with maybe a dozen pages each, and they got 37,000 to 47,000 hits in October and just a handful in September.

I can't imagine the crawl budget OpenAI must have.
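
For anyone who wants to pull the same kind of monthly tally out of their own raw logs, here is a minimal sketch. It assumes the common "combined" access log format (timestamp in square brackets, user agent as the last quoted field) and simply looks for the token "GPTBot" in the user agent. The function name `monthly_bot_hits` and the regex are illustrative, not taken from any particular tool, and user agents can be spoofed, so treat the counts as approximate.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed format: combined access log, e.g.
# 1.2.3.4 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "UA string"
# Captures the bracketed timestamp and the last quoted field (the user agent).
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<ua>[^"]*)"\s*$')


def monthly_bot_hits(lines, bot_token="GPTBot"):
    """Count hits per 'YYYY-MM' for lines whose user agent contains bot_token."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that do not look like combined-format entries
        if bot_token.lower() not in m.group("ua").lower():
            continue  # some other client
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        counts[ts.strftime("%Y-%m")] += 1
    return counts
```

To run it against a file, something like `with open("access.log") as f: print(monthly_bot_hits(f))` should work, where `access.log` is a placeholder for wherever your host keeps the raw logs.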
Title: Re: GPTBot crawl
Post by: rcjordan on November 10, 2024, 01:34:42 AM
Debbie says they may be running full blast before the copyright lawsuits throttle them.

https://www.shacknews.com/article/141313/openai-needs-copyright-material

OpenAI insists it can't sufficiently train AI models without copyrighted material | Shacknews

---

Take a look....

https://www.google.com/search?q=ai+bots+courts+copyright

ai bots courts copyright - Google Search


Title: Re: GPTBot crawl
Post by: ergophobe on November 10, 2024, 02:22:02 AM
Well... they have my content. I wonder how hard it would be to seed an AI, like the SEO competitions where people would compete to rank for some previously unique phrase.
Title: Re: GPTBot crawl
Post by: rcjordan on November 10, 2024, 11:56:52 AM
>seed

related:

Annoyed Redditors tanking Google Search results illustrates perils of AI scraper

https://th3core.com/talk/traffic/annoyed-redditors-tanking-google-search-results-illustrates-perils-of-ai-scraper/msg86053/#msg86053
Title: Re: GPTBot crawl
Post by: ergophobe on November 10, 2024, 07:04:41 PM
Ah yes. I thought we had had some discussion of that.