The Core

Why We Are Here => Traffic => Topic started by: Brad on October 21, 2022, 05:04:48 PM

Title: Marginalia hits 106 Million Documents
Post by: Brad on October 21, 2022, 05:04:48 PM
https://search.marginalia.nu

The Marginalia search engine specializes in text-heavy, non-commercial web pages. It has several different algos you can try.

https://twitter.com/MarginaliaNu/status/1583464144686104576

Quote: But get this: Marginalia now indexes 106 million documents! Off a single PC. This is kinda bonkers. Previous record was barely above 60 million. Turns out modern computers are kinda powerful.

Crawling took 2 weeks. The index is 1.1 Tb.
Title: Re: Marginalia hits 106 Million Documents
Post by: rcjordan on October 21, 2022, 07:04:18 PM
>specializes in text heavy, non-commercial

I wonder how they differentiate between commercial & non-commercial? Graphics? Logos? Topics? KWs? All of the above?
Title: Re: Marginalia hits 106 Million Documents
Post by: Brad on October 21, 2022, 07:27:15 PM
>differentiate

Dunno. There is some human screening going on. I suspect the number of ads plays a part, and probably graphics too.