I have about 2.6 million domains in the overall list. Most of that is shite though. There is a section of about 250k domains that is far more interesting.
So, this is basically (as I am sure JD realised) prospecting. Our market is English-language websites running at least one of a number of on-site technologies. We've been doing this on a small scale periodically using scrapebox and similar, but it feels like we should start building this into an uber-list to work on for the long term.
Plan is:
- Check sites for the existence of one of our target technologies
- Store a text sample from those that do
- Language-check the matches (currently looking at paying for the Google Translate API)
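
To make the three steps concrete, here is a rough Python sketch of how I picture them fitting together. Everything in it is an assumption for illustration: the signature strings are made-up placeholders (not our real target technologies), the page HTML is assumed to have been fetched already, and the language check is just a function you pass in, so it could wrap a paid API or anything else.

```python
import re
from html.parser import HTMLParser

# Hypothetical placeholders: marker substrings that would suggest a
# technology is present on the page. Not the real target list.
SIGNATURES = {
    "example-cms": ["/example-cms/assets/"],
    "example-tracker": ["cdn.example-tracker.test/"],
}


class _TextExtractor(HTMLParser):
    """Collects visible text, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def detect_technologies(html, signatures=SIGNATURES):
    """Step 1: return the names of signatures whose markers appear in the HTML."""
    lowered = html.lower()
    return sorted(
        name
        for name, markers in signatures.items()
        if any(marker in lowered for marker in markers)
    )


def text_sample(html, max_chars=500):
    """Step 2: extract a short plain-text sample from the page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = re.sub(r"\s+", " ", " ".join(parser.parts)).strip()
    return text[:max_chars]


def process_page(domain, html, is_english):
    """Run all three steps; is_english is a caller-supplied check (step 3)."""
    techs = detect_technologies(html)
    if not techs:
        return None  # no target technology, so nothing to store
    sample = text_sample(html)
    return {
        "domain": domain,
        "technologies": techs,
        "sample": sample,
        "english": is_english(sample),
    }
```

Only pages that match a technology ever reach the language check, which matters if that check is a paid per-request service.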
I want to build a big list, but we don't need it quickly. If we were dripping through a few dozen English-language "hits" a week, that would actually be enough to keep us busy. No idea what proportion of the total list that represents though.
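
One way to get a handle on that proportion would be to run a random sample of the list through the checks and extrapolate. A minimal sketch, assuming a simple random sample and using a Wilson score interval (my choice of method, nothing we use today); the sample numbers in the usage lines are invented purely to show the call:

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Approximate 95% confidence interval for a hit rate, given hits out of n checked."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def projected_hits(hits, n, list_size):
    """Scale the interval up to the full list to get a low/high estimate of total hits."""
    low, high = wilson_interval(hits, n)
    return round(low * list_size), round(high * list_size)


# Invented example: 30 hits in a random sample of 1,000 from the 250k section.
low, high = projected_hits(30, 1000, 250_000)
print(f"Estimated hits in the full section: between {low} and {high}")
```

Dividing that range by a few dozen hits a week would give a rough idea of how long the list would keep us busy.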