Thank you Aaron,

I'll take screenshots as the aesthetic change says a lot in itself, very much appreciated.

The baked-in Google estate is something I'm aware of, at least the examples you cited. I feel people generally confuse the invasiveness of many of the things you reference with the knowledge graph stuff that's generally available from open sources. Still, when you remove the open source stuff, there are a lot of commercial motives at play.

It's just interesting, especially for someone younger, to know there was a time when it was simply organic + ads, and organic was purely organic.

Separately, I do speak to a lot of people in the tech industry (server / programming people) who have many years online but pretty much take search engine results as gospel and 'despise' anything that involves the concept of SEO. I feel that is largely down to naivety and a belief that manipulating results poisons the well. Given the philosophical conversations about information retrieval over the years on webmasterworld and elsewhere, I'm sure everyone just has to take a neutral stance in the end.

Those are great Rich, cheers

Any more would be great - just building up context of how things appear the way they are now.

Plenty of you, I'm sure, have read those mega long blog posts that cover the general timeline of search engine evolution. I remember reading some but never bothered to bookmark them or remember who wrote them; they just seemed cool to read at the time. They tended to be written by SEOs who'd seen the day-to-day news and the gradual changes being implemented.

I'm interested in one that covers the general timeline of engines, mainly Google, and how the real estate for organic has shrunk and how privacy has been eroded.

It'll be used by someone who's relatively inexperienced and can glean some context and facts from those articles.

Do you recall any good ones (in the sense that they covered the points you felt were most important)? Might need to check the Wayback Machine if it's older but still relevant.

>80's ANYTHING from the US would be the most awesome thing

Definitely for me when growing up in the 80s in Scotland. Michael Jackson, WWF, Hollywood... what wasn't to like.

Eye rolls when Western leaders talk about the free world, though. Protectionism seems to rear its head now and again, pretty sure there were similar tariffs in the early 2000s by Bush.

I'm probably not in the best position to talk about it, but the owner might be interested in stopping by sometime.

Privacy and no 3rd party tracking is a cornerstone of all future plans though. He's pointed out some fundamental issues with the way DDG serve their results...

I'm sure there will be ads at some point, but nothing that would involve your browsing history.

I have to plug here. Known the owner for 10+ years.

2 billion page index with room for growth. Privacy orientated. A crawler engine with its own results, not just another Bing clone.

I'm helping out with the knowledge based stuff.

Some of the results are a bit dated and some things definitely need updated in that regard, but definitely one to watch.

(Having studied their results, I think a bigger index and fresher results solves a lot of the relevancy issues, their ranking algo is pretty sophisticated)

There could be useful info here for anyone who wants to dig further into how it affects page views and whatnot -

Same data with a UI -|Dog

Had a quick look myself, at the least it contains every article and how many page views that article got on an hourly basis.

From a technical PoV, I know of implementations that have the Wiki summaries held in memory and it takes up several GB; iirc that's just for the English version though.

>Die Antwoord

Edgy stuff, I see there's a reference to Aphex Twin (song starts at 4:00)

Reminds me how middle of the road I am

Shame about the boat race

Watched a fair bit of Tony Clifton lately on YouTube, what a legend.

At least in the near future the 'Last updated' can be mapped to historical scrapes of WHOIS

Agree about open WHOIS adding legitimacy to some enquiries. I found it useful for filtering out fly-by-night web hosts when looking for cheap hosting.

Thanks Littleman, those both look very helpful. Having read around, it seems detection accuracy can get as high as 99%, at least for the major languages. I'll dig around to see how those compare.

For the template extraction problem, I have an idea of how a solution would work, but there doesn't seem to be much (public) code out there, and it is much more of an SE-specific problem than language detection.

Puts a New York minute to shame

I'd bet a huge chunk of that information generated is unavailable or hard to get at after a year

Some folks just need something to believe in

I don't hold it against them until they disagree :)

I jest though. Plenty of believers in many things seem very intelligent but have a particular belief system.

I don't know Vic, but he doesn't answer his door much

I'm in need of two tools, or at least some inspiration for best practice.

1st is detecting the languages used on a web page. Some tests show that lang attributes are accurate only about 80% of the time, so I need something more robust that actually looks at the content. I'm aware of a technique that looks at two- and three-character combos, which apparently works well, and perhaps also popular words from each language. Anyone seen an implementation (with code or explanation) that works well?

2nd is somewhat related: evaluating 1 or more web pages from a domain and being able to detect the main content area of a page. Seen anything that claims to work well? (Code or explanation would be great.)
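
For the first item, the character-combo technique can be sketched roughly as below. This is only an illustration under heavy assumptions: the SAMPLES texts are tiny made-up training snippets (a real profile would be built from a large corpus per language), and cosine similarity over trigram counts is just one plausible way to score a match. The names trigrams, cosine and detect_language are hypothetical.

```python
from collections import Counter
import math

# Tiny illustrative training samples (assumption: real language profiles
# would be built from far larger corpora).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the way "
          "that people write when they are using the english language",
    "de": "der schnelle braune fuchs springt durch den wald und das ist die "
          "art wie menschen schreiben wenn sie die deutsche sprache benutzen",
    "fr": "le renard brun rapide saute dans le jardin et voici comment les "
          "gens parlent quand ils utilisent la langue de leur pays",
}


def trigrams(text):
    """Count overlapping three-character sequences, padded with spaces."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def detect_language(text):
    """Return (best_language, scores) for the given page text."""
    grams = trigrams(text)
    scores = {lang: cosine(grams, profile) for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get), scores


best, scores = detect_language("this is the content of the page")
print(best, scores)
```

The text passed in would be the visible page content rather than the raw HTML, and very short inputs will be unreliable with profiles this small.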

