Marginalia hits 106 Million Documents

Started by Brad, October 21, 2022, 05:04:48 PM

Brad

https://search.marginalia.nu

The Marginalia search engine specializes in text-heavy, non-commercial web pages. It has several different algos you can try.

https://twitter.com/MarginaliaNu/status/1583464144686104576

Quote: But get this: Marginalia now indexes 106 million documents! Off a single PC. This is kinda bonkers. Previous record was barely above 60 million. Turns out modern computers are kinda powerful.

Crawling took 2 weeks. The index is 1.1 Tb.

rcjordan

>specializes in text heavy, non-commercial

I wonder how they differentiate between commercial & non-commercial? Graphics? Logos? Topics? Keywords? All of the above?

Brad

>differentiate

Dunno. There is some human screening going on. I suspect the number of ads plays a part, and probably graphics too.