Topic: Marginalia hits 106 Million Documents

Brad

  • Inner Core
  • Hero Member
  • *
  • Posts: 4174
  • What, me worry?
Marginalia hits 106 Million Documents
« on: October 21, 2022, 05:04:48 PM »
https://search.marginalia.nu

The Marginalia search engine specializes in text-heavy, non-commercial web pages. It has several different algos you can try.

https://twitter.com/MarginaliaNu/status/1583464144686104576

Quote
But get this: Marginalia now indexes 106 million documents! Off a single PC. This is kinda bonkers. Previous record was barely above 60 million. Turns out modern computers are kinda powerful.

Crawling took 2 weeks. The index is 1.1 Tb.

rcjordan

  • I'm consulting the authorities on the subject
  • Global Moderator
  • Hero Member
  • *****
  • Posts: 16492
  • Debbie says...
Re: Marginalia hits 106 Million Documents
« Reply #1 on: October 21, 2022, 07:04:18 PM »
>specializes in text heavy, non-commercial

I wonder how they differentiate between commercial & non-commercial? Graphics? Logos? Topics? Keywords? All of the above?

Brad

Re: Marginalia hits 106 Million Documents
« Reply #2 on: October 21, 2022, 07:27:15 PM »
>differentiate

Dunno.  There is some human screening going on.  I suspect the amount of ads plays a part and probably graphics too.