Author Topic: Anyone use 80legs?  (Read 2127 times)

Rooftop

  • Inner Core
  • Hero Member
  • Posts: 1915
Anyone use 80legs?
« on: April 19, 2017, 08:55:44 AM »
I've got a small project that 80legs.com looks like a great match for.  It will need a simple custom "80app" though, and I'm not sure it's worth my time to learn how to write one.  Does anyone have any experience with this, or know someone who might be able to do it?

JasonD

  • Inner Core
  • Hero Member
  • Posts: 1420
  • Look at THAT!!!!
Re: Anyone use 80legs?
« Reply #1 on: April 20, 2017, 12:45:36 PM »
What is it that you need done?

Rooftop

Re: Anyone use 80legs?
« Reply #2 on: April 20, 2017, 01:12:00 PM »
I suspect you can read nicely between the lines here, but I'm thinking we give 80legs a big list of URLs and get back, for each one:

  • Title
  • First 20 words of text
  • Five yes/no flags

The yes/no flags are based on string matching.  For example, one flag might be "sharing widgets" which is Y if any of the following strings are found:
sharethis.com
addthis.com
addtoany.com

(Not actually doing sharing widgets - but you get the drift).

I suspect this is 20 lines of JavaScript, but I'm crap at it!
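
For illustration only, here is a rough sketch of the per-page logic described above (title, first 20 words, yes/no flags from string matching). It is plain Python using only the standard library, not an actual 80app, and it does not use the 80legs framework at all. The names `FLAGS` and `summarise` are made up for this example, and the one flag shown is just the placeholder from the post, not a real target.

```python
from html.parser import HTMLParser

# Hypothetical flag table: flag name -> strings that turn the flag on.
# "sharing_widgets" is only the placeholder example from the thread.
FLAGS = {
    "sharing_widgets": ["sharethis.com", "addthis.com", "addtoany.com"],
}


class _TextExtractor(HTMLParser):
    """Collects the <title> text and the page text, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth:
            self.text_parts.append(data)


def summarise(html, flags=FLAGS, n_words=20):
    """Return title, first n_words of text, and one boolean per flag."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.text_parts).split()
    haystack = html.lower()  # match against the raw source, case-insensitively
    return {
        "title": " ".join("".join(parser.title_parts).split()),
        "sample": " ".join(words[:n_words]),
        "flags": {
            name: any(needle.lower() in haystack for needle in needles)
            for name, needles in flags.items()
        },
    }
```

Each flag comes back as True/False, so writing it out as Y/N is a one-line mapping. Matching against the raw HTML rather than the visible text is deliberate here, since widget hostnames normally live in script tags.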

JasonD

Re: Anyone use 80legs?
« Reply #3 on: April 20, 2017, 06:09:34 PM »
I am pretty confident I know where you're coming from.

Quantity of URLs?

Rooftop

Re: Anyone use 80legs?
« Reply #4 on: April 20, 2017, 09:10:41 PM »
I have about 2.6 million domains in the overall list. Most of that is shite though. There is a section of about 250k domains that is far more interesting.

So, this is basically (as I am sure JD realised) prospecting. Our market is English-language websites running at least one of a number of on-site technologies. We've been doing this on a small scale periodically using ScrapeBox and similar, but it feels like we should start building this into an uber-list to work on over the long term.

Plan is:
- Check sites for the existence of one of our target technologies
- Store a text sample from those that do
- Language-check the matches (currently looking at paying for the Google Translate API)

I want to build a big list, but we don't need it quickly.  If we were dripping through a few dozen English-language "hits" a week, that would actually be enough to keep us busy.  No idea what proportion of the total list that represents though.
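
As a rough sketch of how the three-step plan above might hang together: the snippet below filters pages by target strings, keeps the text sample, and then applies a crude stopword-ratio check as a stand-in for the language step. That check is an assumption swapped in for illustration; it is not the Google Translate API and will be less accurate. `TARGET_MARKERS`, `looks_english`, `prospects` and the 0.2 threshold are all hypothetical, and the markers are placeholders rather than the real target technologies.

```python
import re

# Placeholder markers only; the real target technologies are not listed in the thread.
TARGET_MARKERS = ("sharethis.com", "addthis.com", "addtoany.com")

# Small illustrative stopword list; a real check would want a longer one.
ENGLISH_STOPWORDS = frozenset(
    "the and of to in is that for with on this are was it as be at by "
    "from or an have not you".split()
)


def has_target_technology(html, markers=TARGET_MARKERS):
    """Step 1: does the page source mention any target string?"""
    page = html.lower()
    return any(marker in page for marker in markers)


def looks_english(sample, threshold=0.2):
    """Step 3 (heuristic): share of words that are common English stopwords."""
    words = re.findall(r"[a-z']+", sample.lower())
    if not words:
        return False
    hits = sum(word in ENGLISH_STOPWORDS for word in words)
    return hits / len(words) >= threshold


def prospects(pages, markers=TARGET_MARKERS):
    """pages: iterable of (domain, html, text_sample) tuples.

    Yields (domain, text_sample) for pages that match a target string
    and whose stored sample looks like English.
    """
    for domain, html, sample in pages:
        if not has_target_technology(html, markers):
            continue
        if looks_english(sample):
            yield domain, sample
```

Because the technology check runs first, only matching pages ever reach the language step, which matters if that step ends up being a paid API call.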