The Core

Why We Are Here => Traffic => Topic started by: Gurtie on November 30, 2013, 09:06:48 PM

Title: the whitelist approach to robots.txt
Post by: Gurtie on November 30, 2013, 09:06:48 PM
Has anyone successfully done this? I'm not sure how workable it is. The proposal is to block everything unless specifically whitelisted (with the acknowledgement that some bots will ignore it, of course).

thoughts?
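
For anyone unfamiliar with the idea, here is a minimal sketch of what a whitelist-style robots.txt could look like, checked with Python's standard `urllib.robotparser`. The bot names and the `example.com` URL are illustrative assumptions, not a vetted list of good bots, and the helper `is_allowed` is hypothetical. An empty `Disallow:` means "nothing is disallowed" for that bot, while the final `User-agent: *` group blocks everyone not named above it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative whitelist only: the bots named here are examples,
# not a recommended or complete list.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str = "https://example.com/page.html") -> bool:
    """Return True if the rules above let `user_agent` fetch `url`.

    Hypothetical helper for illustration; example.com is a placeholder.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("Googlebot", "bingbot", "SomeScraper"):
        print(bot, is_allowed(bot))
```

As noted in the question, this only affects bots that choose to honour robots.txt; anything that ignores the file is untouched by it.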
Title: Re: the whitelist approach to robots.txt
Post by: Rooftop on November 30, 2013, 11:04:10 PM
Sounds like a recipe for future disaster to me. Perfect on paper though.
Title: Re: the whitelist approach to robots.txt
Post by: rcjordan on November 30, 2013, 11:14:59 PM
Ain't but two guys to ask, IMO, and I don't know where Ralph is right now. So that leaves incredibill, and it looks like he's posted recently on this:

https://www.google.com/search?q=site:incredibill.com+whitelist
Title: Re: the whitelist approach to robots.txt
Post by: buckworks on December 01, 2013, 04:13:19 AM
I'm not much of a techie so consider this endorsement with great caution, but I've been using Incredibill's whitelisting approach for two or three years with no known problems.
Title: Re: the whitelist approach to robots.txt
Post by: Gurtie on December 01, 2013, 09:36:23 AM
Aaah, good to hear - thanks for the link, RC. Does anyone know if he ever got around to part 2, or expanded/published a suggested approved bot list etc. elsewhere? Or does anyone have an up-to-date list of good bots?

I'm concerned about stuff like the Google product crawlers (I suspect they crawl to check price validity etc., and if we block them we might have issues with product feed approval), whether Yahoo still uses SLURP, and what the smaller engines use. While DDG may not be sending us much traffic, it seems wrong to exclude it: not only for our sake, but they're never going to rule the world if we all block them! I don't have a lot of time to hunt this stuff down and double-check it at the moment!

<edit> Ralph of course, but I don't really know him well enough to hit him up for a random freebie!
Title: Re: the whitelist approach to robots.txt
Post by: rcjordan on December 01, 2013, 02:13:15 PM
Unless he's mellowed, or muddled his brain on those damn cigs, you won't get a freebie from Ralph.
Title: Re: the whitelist approach to robots.txt
Post by: Rumbas on December 01, 2013, 02:35:01 PM
>Unless he's mellowed, or muddled his brain on those damn cigs, you won't get a freebie from Ralph.

Hahahah! Probably got that right, however he is a nice guy, lol.

Title: Re: the whitelist approach to robots.txt
Post by: JasonD on December 05, 2013, 11:50:33 AM
I used to selectively deliver differing IP addresses via DNS requests.

It's effectively normal IP cloaking but at a lower level and I found it much easier to control / manage.
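
The post gives no implementation details, so the following is only a rough sketch of the selection step such a setup would need: given the address a DNS query came from, decide which A record to hand back. Everything named here (`CRAWLER_NETWORKS`, the two answer addresses, `pick_a_record`) is hypothetical, and the addresses are documentation-only ranges rather than real crawler networks. Bear in mind that an authoritative DNS server normally sees the address of the requester's resolver, not of the end client.

```python
import ipaddress

# Hypothetical values for illustration only. These are reserved
# documentation ranges, not real crawler or server addresses.
CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
ADDRESS_FOR_LISTED_NETWORKS = "198.51.100.10"
ADDRESS_FOR_EVERYONE_ELSE = "203.0.113.10"


def pick_a_record(requester_ip: str) -> str:
    """Choose which IPv4 address to answer with, based on who is asking.

    `requester_ip` is the source address of the DNS query, which is
    usually a resolver rather than the end client.
    """
    addr = ipaddress.ip_address(requester_ip)
    if any(addr in net for net in CRAWLER_NETWORKS):
        return ADDRESS_FOR_LISTED_NETWORKS
    return ADDRESS_FOR_EVERYONE_ELSE


if __name__ == "__main__":
    print(pick_a_record("192.0.2.55"))   # query from a listed network
    print(pick_a_record("203.0.113.7"))  # query from anywhere else
```

Wiring this into an actual DNS server is a separate job and is not shown here.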