The Core

Why We Are Here => Water Cooler => Topic started by: Rooftop on April 19, 2017, 08:55:44 AM

Title: Anyone use 80legs?
Post by: Rooftop on April 19, 2017, 08:55:44 AM
I've got a small project that 80legs.com looks like a great match for.  It will need a simple custom "80app" though and I'm not sure this is something worth my time to learn.  Anyone have any experience or know someone who might be able to do this?
Title: Re: Anyone use 80legs?
Post by: JasonD on April 20, 2017, 12:45:36 PM
What is it that you need done?
Title: Re: Anyone use 80legs?
Post by: Rooftop on April 20, 2017, 01:12:00 PM
I suspect you can read nicely between the lines here, but I'm thinking we give 80legs a big list of URLs and return


The yes/no flags are based on string matching.  For example, one flag might be "sharing widgets" which is Y if any of the following strings are found:
sharethis.com
addthis.com
addtoany.com

(Not actually doing sharing widgets - but you get the drift).

I suspect this is 20 lines of JavaScript, but I'm crap at it!
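
For anyone following along, here is a rough sketch of that yes/no flag logic in plain JavaScript. It is not wired into the 80legs app format (that wrapper is left out on purpose), and the FLAGS object and flagPage function are made-up names for illustration. The only real data in it is the "sharing widgets" example from above.

```javascript
// Hypothetical flag table: flag name -> substrings that switch the flag on.
// Only the "sharing widgets" example from the post is filled in.
const FLAGS = {
  "sharing widgets": ["sharethis.com", "addthis.com", "addtoany.com"],
};

// Takes one page's HTML as a string and returns { flagName: "Y" | "N" }.
// Matching is a plain case-insensitive substring test, nothing smarter.
function flagPage(html, flags = FLAGS) {
  const haystack = html.toLowerCase();
  const result = {};
  for (const [name, needles] of Object.entries(flags)) {
    const hit = needles.some((n) => haystack.includes(n.toLowerCase()));
    result[name] = hit ? "Y" : "N";
  }
  return result;
}
```

Calling flagPage on a page's source gives one Y/N per flag, so adding another flag is just another entry in the table.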
Title: Re: Anyone use 80legs?
Post by: JasonD on April 20, 2017, 06:09:34 PM
I am pretty confident I know where you're coming from.

Quantity of URLs?
Title: Re: Anyone use 80legs?
Post by: Rooftop on April 20, 2017, 09:10:41 PM
I have about 2.6 million domains in the overall list. Most of that is shite though. There is a section of about 250k domains that is far more interesting.

So, this is basically (as I am sure JD realised) prospecting. Our market is English-language websites running at least one of a number of on-site technologies. We've been doing this on a small scale periodically using ScrapeBox and similar, but it feels like we should start building this into an uber-list to work on for the long term.

Plan is:
- Check sites for the existence of one of our target technologies
- Store a text sample from those that do
- Language-check the matches (currently looking at paying for the Google Translate API)
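
The three steps above could be sketched roughly like this in JavaScript. Everything named here is an assumption for illustration: TARGET_MARKERS holds placeholder strings rather than real target technologies, and looksEnglish is a crude stopword count swapped in as a cheap pre-filter, not the Google Translate API. It might cut down how many samples need a paid check, but it is not a substitute for one.

```javascript
// Placeholder markers; swap in the real target-technology strings.
const TARGET_MARKERS = ["example-widget.js", "cdn.example-tech.com"];

// Tiny English stopword list, used only for a rough guess.
const ENGLISH_STOPWORDS = new Set([
  "the", "and", "of", "to", "in", "is", "that", "for", "with", "this",
]);

// Step 1: does the page source mention any target technology?
function hasTargetTech(html, markers = TARGET_MARKERS) {
  const haystack = html.toLowerCase();
  return markers.some((m) => haystack.includes(m.toLowerCase()));
}

// Step 2: strip scripts, styles and tags, then keep a short text sample.
function textSample(html, maxChars = 500) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim()
    .slice(0, maxChars);
}

// Step 3 (rough stand-in): share of words that are common English stopwords.
function looksEnglish(sample, threshold = 0.15) {
  const words = sample.toLowerCase().match(/[a-z']+/g) || [];
  if (words.length === 0) return false;
  const hits = words.filter((w) => ENGLISH_STOPWORDS.has(w)).length;
  return hits / words.length >= threshold;
}

// Returns null for non-matching pages, otherwise a record worth storing.
function prospect(url, html) {
  if (!hasTargetTech(html)) return null;
  const sample = textSample(html);
  return { url, sample, maybeEnglish: looksEnglish(sample) };
}
```

The 0.15 threshold is a guess and would need tuning against a hand-checked batch before trusting it.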

I want to build a big list, but we don't need it quickly. If we were dripping through a few dozen English-language "hits" a week, that is actually enough to keep us busy. No idea what proportion of the total list that represents though.