Topic: Roll your own search engine?  (Read 4451 times)

Brad

  • Inner Core
  • Hero Member
  • Posts: 4137
  • What, me worry?
Roll your own search engine?
« on: August 17, 2018, 11:27:59 PM »
I'm still fascinated with the idea of the https://asciimoo.github.io/searx/ script.  What bothers me is that it is a scraping metasearch engine, and sooner or later the search engines will shut an instance down.

I'm wondering if a paid-subscription, roll-your-own search service could be put together.  Not really a metasearch, but more of a blended hybrid search.  I'm not sure where I'm going with this.  I've got all these different parts and theories and I'm trying to piece them all together.

If you could blend search feeds from Yandex, Mojeek and Gigablast together, you would have very good backfill-quality search results, maybe better than most.  If Yahoo is still building an index independent of Bing, one might throw them into the mix.
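For the blending part, one well-known option is reciprocal rank fusion (RRF), which merges ranked lists using only each result's position in each list. A minimal sketch, assuming you already have each source's ranked URLs in hand; the engine names and results are made up, and nothing here fetches or scrapes anything:

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Blend several ranked result lists into one.

    ranked_lists maps a source name to a list of URLs, best first.
    Each URL scores sum(1 / (k + rank)) over every list it appears in,
    so results that several sources agree on float to the top.
    k=60 is the constant commonly used with RRF; treat it as tunable.
    """
    scores = defaultdict(float)
    for urls in ranked_lists.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken by URL so the output is deterministic.
    return sorted(scores, key=lambda u: (-scores[u], u))


if __name__ == "__main__":
    # Hypothetical results for one query from three sources.
    results = {
        "yandex": ["a.example", "b.example", "c.example"],
        "mojeek": ["b.example", "d.example", "a.example"],
        "gigablast": ["b.example", "a.example", "e.example"],
    }
    for url in reciprocal_rank_fusion(results):
        print(url)
```

RRF is attractive for a hybrid like this because it needs no access to each engine's internal scores, only the order of the results.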

You have https://www.curlie.org/ if they ever open the doors again.

Wikipedia

Others that we don't know about.

We had Rollyo and Eurekster, which were sort of make-your-own search engines.  I'm just wondering if a person could cobble together different sources into a DIY search engine service, and whether there would be any demand.

All hypothetical.


littleman

  • Administrator
  • Hero Member
  • Posts: 6531
Re: Roll your own search engine?
« Reply #1 on: August 18, 2018, 05:27:31 AM »
Brad, you really should just build yourself a search engine.  You've visited this topic many times over the years.

Added:
http://th3core.com/talk/traffic/to-start-a-search-engine/
https://th3core.com/talk/water-coolerextra/i-kinda-miss-those-5000-search-engines/
« Last Edit: August 18, 2018, 05:48:32 AM by littleman »

Brad

  • Inner Core
  • Hero Member
  • Posts: 4137
  • What, me worry?
Re: Roll your own search engine?
« Reply #2 on: August 18, 2018, 12:15:18 PM »
Point taken. I won't approach the subject again. Thanks.

littleman

  • Administrator
  • Hero Member
  • Posts: 6531
Re: Roll your own search engine?
« Reply #3 on: August 18, 2018, 05:50:31 PM »
No, I didn't mean to close down the topic!  I was being serious, starting a search engine seems to be something that has occupied your thoughts a lot over the years.  The technology to build and run one has never been easier or less expensive to obtain.  You really should build one!

My own research into the topic showed me that, surprisingly, the size of the database is not what drives the expense up sharply; rather, that comes from traffic.  So, if the business model is sound, your cost grows with your revenue.

Edit: fat thumbs on a phone
« Last Edit: August 18, 2018, 05:57:25 PM by littleman »

Rumbas

  • Global Moderator
  • Hero Member
  • Posts: 2103
  • Viking Wrath
Re: Roll your own search engine?
« Reply #4 on: August 19, 2018, 01:03:23 PM »
Go for it Brad. Maybe a Th3Core Super Search Thingy?

rcjordan

  • I'm consulting the authorities on the subject
  • Global Moderator
  • Hero Member
  • Posts: 16268
  • Debbie says...
Re: Roll your own search engine?
« Reply #5 on: August 19, 2018, 01:36:13 PM »
It would be particularly nice if there were a way to generate semi-curated static pages from the SERPs.  I see some highly specialized sites (food safety, medical research) that I think are using meta-search and selected RSS feed search as source material.
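A minimal sketch of the RSS half of that idea: pull items from a feed, keep the ones that match hand-picked keywords, and write them out as a static HTML page that a human can then review before publishing. The feed content and keywords are hypothetical, it assumes a plain RSS 2.0 feed, and it does not fetch anything over the network.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample feed, inlined so the sketch runs offline.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example feed</title>
<item><title>Listeria recall announced</title><link>https://example.com/1</link>
<description>Food safety notice</description></item>
<item><title>Unrelated post</title><link>https://example.com/2</link>
<description>Nothing to see</description></item>
</channel></rss>"""


def curated_items(rss_xml, keywords):
    """Return (title, link) pairs for RSS 2.0 items whose title or
    description contains any keyword (case-insensitive substring match)."""
    root = ET.fromstring(rss_xml)
    wanted = [kw.lower() for kw in keywords]
    picked = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        text = f"{title} {item.findtext('description', default='')}".lower()
        if any(kw in text for kw in wanted):
            picked.append((title, link))
    return picked


def render_static_page(heading, items):
    """Render a minimal static HTML page; all feed text is HTML-escaped."""
    rows = "\n".join(
        f'<li><a href="{html.escape(link)}">{html.escape(title)}</a></li>'
        for title, link in items
    )
    return (
        f"<!DOCTYPE html>\n<html><head><title>{html.escape(heading)}</title></head>\n"
        f"<body><h1>{html.escape(heading)}</h1>\n<ul>\n{rows}\n</ul>\n</body></html>\n"
    )


if __name__ == "__main__":
    picks = curated_items(SAMPLE_FEED, ["food safety", "recall"])
    print(render_static_page("Food safety roundup", picks))
```

The "semi" in semi-curated would be the keyword list plus a human look at the output; merged SERP results could be fed into the same page renderer.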

Brad

  • Inner Core
  • Hero Member
  • Posts: 4137
  • What, me worry?
Re: Roll your own search engine?
« Reply #6 on: August 19, 2018, 01:57:51 PM »
The truth is my capabilities end around HTML 3.  I'd love to make a search engine but it's beyond me.

I look at these posts as Idea Virus posts: maybe somebody with more money or more skill will come along, think about this and follow through.  With this one, the goal was to point out some off-the-shelf resources that seem to be lying around, plus a few thoughts about what they might be used for.

My second goal with the DIY search service, which I didn't explain very well, is part of "Brad's Ongoing Guerrilla Insurgency Against Google" BOGIAG*.

Webmasters helped Google get started by putting Google search boxes on their websites.  My thought was: "What if every blogger/webmaster put up a search box for a non-Google engine?"  Then, progressing: "What if every webmaster could put together their own modular search engine (combined spider, directory, RSS engine?), with few being exactly alike, and put it on their website?"  Then: "Well, what if one could provide a service that makes it easy for webmasters to put together a modular search engine, a bit like Rollyo and a bit like Eurekster but better, and would anyone use it?"

Hence the post.

>>size of the database

I think you are right, LM.  I'm in the middle of a three-week test of Mojeek.com as my default engine.  On long multi-word queries it sometimes fails, but it keeps surprising me.  With DuckDuckGo and Bing-based engines, I pretty much know which trusted sites Bing will bring up for reviews and best-of tech lists.  But with Mojeek, I'm getting some real gems out of what would be "the long tail" on a major search engine.  I'm kinda amazed.

Aside

*BOGIAG is about utilizing lots of tiny elements to get around Google as gatekeeper, mainly for blogs.  These include: Indieweb.org elements like webmentions, syndication to social media for traffic, RSS, old-time blogrolls, curated micro-directories, maybe webrings, site searches to link several of our domains together, search boxes from anyone other than Google, search feeds, maybe one exclusive subject category on our blogs that Google, and only Google, is excluded from in robots.txt, etc.  Anything that is cheap, easy, off the shelf, low risk and kinda fun.  It sounds a little bit crazy to us, but to the younger set it's not so crazy.  They like the idea of reviving a retro web of many search engines, many directories and blogrolls: anything to break the monopolies.
/Aside
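For the robots.txt item, a group that names only Googlebot can keep Google out of one category while leaving every other crawler alone. A sketch, assuming a hypothetical /category/no-google/ path, with Python's standard urllib.robotparser used to check how the rules read. Keep in mind robots.txt is a request that well-behaved crawlers honor, not access control, and Google also runs other crawlers under other user-agent tokens.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only Googlebot is asked to stay out of
# /category/no-google/; every other crawler (empty Disallow) may fetch everything.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /category/no-google/

User-agent: *
Disallow:
"""


def build_parser(robots_txt):
    """Parse robots.txt text directly, without fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("Googlebot", "MojeekBot"):
        url = "https://example.com/category/no-google/post.html"
        print(agent, rp.can_fetch(agent, url))
```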

BoL

  • Inner Core
  • Hero Member
  • Posts: 1205
Re: Roll your own search engine?
« Reply #7 on: August 21, 2018, 01:14:44 PM »
>Mojeek

Bit OT but a little insight:
https://www.mojeek.co.uk/search?q=scotland&moo=0 - default
https://www.mojeek.co.uk/search?q=scotland&moo=1 - on-page factors only
https://www.mojeek.co.uk/search?q=scotland&moo=2 - off-page factors only

Site search and custom ranking factors are two things that are more or less available (I've seen a page that spits out stats on dozens of ranking factors, and you can change the weights via the URL). Gauging interest in, and funding, those kinds of things is something that may get looked into. Certainly, competing in pure organic search on an international scale is a big task.
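To illustrate the general idea of URL-adjustable weights (this is not Mojeek's code or parameter scheme, which isn't shown here), a sketch in which each result carries per-factor scores and the weights can be overridden with hypothetical w_onpage / w_offpage query parameters. Zeroing one weight gives something like the "on-page only" and "off-page only" views in the moo links above.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical factor names and default weights, invented for this sketch.
DEFAULT_WEIGHTS = {"onpage": 0.5, "offpage": 0.5}


def weights_from_url(url, defaults=DEFAULT_WEIGHTS):
    """Copy the defaults, then override them with ?w_<factor>=<number> params."""
    weights = dict(defaults)
    params = parse_qs(urlparse(url).query)
    for name in weights:
        values = params.get(f"w_{name}")
        if values:
            try:
                weights[name] = float(values[0])
            except ValueError:
                pass  # ignore malformed values and keep the default
    return weights


def rank(docs, weights):
    """Sort docs (dicts of factor -> score in [0, 1]) by weighted sum, best first."""
    def score(doc):
        return sum(w * doc.get(name, 0.0) for name, w in weights.items())
    return sorted(docs, key=score, reverse=True)


if __name__ == "__main__":
    docs = [
        {"url": "a.example", "onpage": 0.9, "offpage": 0.1},
        {"url": "b.example", "onpage": 0.2, "offpage": 0.8},
    ]
    w = weights_from_url("https://search.example/?q=scotland&w_onpage=1&w_offpage=0")
    print([d["url"] for d in rank(docs, w)])
```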


Brad

  • Inner Core
  • Hero Member
  • Posts: 4137
  • What, me worry?
Re: Roll your own search engine?
« Reply #8 on: August 21, 2018, 09:32:55 PM »
>Mojeek

First, it takes a lot of guts to rely only on your own index and algo for a search engine in 2018.  I admire Mojeek for their moxie.  Qwant may be making their own index but they have Bing to do the heavy lifting right now.  Mojeek is out there all alone with only their own resources.

Second, thank you for those examples.  It makes the role of those factors so plain in contrast.

I've become a bit of a Mojeek fan since you tipped us off to it a couple of months ago.  Their algo is pretty good too.

Remember those parallel search forms we used to have years ago?  The ones that showed Google, Yahoo and ATW results side by side for comparison on the same search.  I'd love to have one of those today for DDG, Startpage and Mojeek.


BoL

  • Inner Core
  • Hero Member
  • Posts: 1205
Re: Roll your own search engine?
« Reply #9 on: August 22, 2018, 01:12:56 PM »
Indeed. I spoke to Marc (the Mojeek founder) in 2006 on WmW before it got started, just talking about algos and ranking ideas. It's such a huge task that you think anyone trying it on their own would give up, but he's done an amazing job.

The parallel search forms wouldn't be too hard to code up with some iframes and JavaScript. Interesting that Startpage uses POST vars; that makes their SERPs harder to link to.
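A rough sketch of such a parallel form, generating a static page in Python rather than using JavaScript. The engine list is an assumption: the Mojeek base follows the URLs quoted earlier in this thread and DuckDuckGo uses its ordinary ?q= URL. Startpage is left out because it uses POST. Some engines may also refuse to be displayed inside a frame, so each pane heading doubles as a plain link.

```python
import html
from urllib.parse import urlencode

# Assumed search URL bases; either engine may decline to load inside a frame.
ENGINES = {
    "Mojeek": "https://www.mojeek.co.uk/search",
    "DuckDuckGo": "https://duckduckgo.com/",
}


def search_url(base, query):
    """Build a GET search URL with the query safely encoded as ?q=..."""
    return f"{base}?{urlencode({'q': query})}"


def parallel_page(query, engines=ENGINES):
    """Build one HTML page showing each engine's results side by side."""
    panes = []
    for name, base in engines.items():
        src = html.escape(search_url(base, query))
        label = html.escape(name)
        panes.append(
            f'<div class="pane"><h2><a href="{src}">{label}</a></h2>'
            f'<iframe src="{src}" title="{label} results"></iframe></div>'
        )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>Parallel search: {html.escape(query)}</title>\n"
        "<style>body{display:flex;margin:0}.pane{flex:1}"
        "iframe{width:100%;height:90vh;border:0}</style></head>\n"
        "<body>" + "".join(panes) + "</body></html>\n"
    )


if __name__ == "__main__":
    with open("parallel.html", "w", encoding="utf-8") as f:
        f.write(parallel_page("free perl webring hosting script"))
```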

Also, maybe you'd be interested in yacy.net if you've not seen it.

Do pass on your experiences with Mojeek and your trial via the feedback option; it's invaluable to hear about real-world experience.
« Last Edit: August 22, 2018, 04:28:19 PM by BoL »

Brad

  • Inner Core
  • Hero Member
  • Posts: 4137
  • What, me worry?
Re: Roll your own search engine?
« Reply #10 on: August 23, 2018, 12:32:29 PM »
>algos

I remember when Microsoft first launched their search engine, Windows Live Gates Search or whatever, and it was derided, rightly, on WmW.  They thought that a glorified document search would work.  The web is different, and if you become big, people will try to spam it.  MS eventually got it to be pretty good.  (Now they don't know what to do with it.)

>>Yacy.net 
I have mixed feelings about that one.  They are building a commercial index, but with other people's work.  And we don't know how they are going to use it in the end.  I'll look into it some more, but they really need a competent crawler. 

>>Mojeek

I will give them some feedback.  It's good to get that from outside eyes.


Brad

  • Inner Core
  • Hero Member
  • Posts: 4137
  • What, me worry?
Re: Roll your own search engine?
« Reply #11 on: September 20, 2018, 05:06:00 PM »

>Do pass on your experiences with Mojeek and your trial via the feedback option, it's invaluable to hear about real-world experience

BoL,

I used Mojeek as my default every day for 3 weeks and blogged a review. Mojeek noticed my review via Twitter. They linked to it on their blog. So they got my feedback.

rcjordan

  • I'm consulting the authorities on the subject
  • Global Moderator
  • Hero Member
  • Posts: 16268
  • Debbie says...
Re: Roll your own search engine?
« Reply #12 on: September 20, 2018, 05:18:34 PM »
I read your review.  So DDG/Bing is still your primary, but only because of index size?

Good idea about parallel.

<added>
Any idea as to how they plan to monetize? 
« Last Edit: September 20, 2018, 05:21:56 PM by rcjordan »

BoL

  • Inner Core
  • Hero Member
  • Posts: 1205
Re: Roll your own search engine?
« Reply #13 on: September 20, 2018, 07:50:14 PM »
Brad, your feedback was received and greatly appreciated, cheers. It's all taken on board. One of the things I'll be helping out with is trying to squeeze a bit more room out of the indexes to increase overall index size...

Did you notice the knowledge graph stuff? I'd put a bit of that together.

Marc (the founder/owner) has a fair idea of the value of traffic, he's looking simply to get the search numbers up before thinking more about monetisation.

Brad

  • Inner Core
  • Hero Member
  • Posts: 4137
  • What, me worry?
Re: Roll your own search engine?
« Reply #14 on: September 20, 2018, 07:56:26 PM »
>>DDG/Bing

Mainly.  Sometimes even DDG just does not have enough deep, long tail stuff and I have to resort to StartPage.

Where Mojeek started coming up short was on searches like "free perl webring hosting script", "free php webring hosting script" and "bomis ring hosting script" (I'm going from memory here).  When you get into five-word searches about unpopular topics like webrings, you can start to see Mojeek struggling to produce.

On DDG/Bing, if you do searches like "best linux note software", "How to install ubuntu linux" or "best windows 10 calendar software", or any of those kinds of hardware and software reviews, lists and comparisons, I can almost predict the sites that will come up on the first page: Lifehacker, makeuseof, techworld, computerworld, etc.  Nothing wrong with those; it's just that Bing has its favorites.

Which made Mojeek refreshing: for those same searches I'd be getting second-tier sites and some UK sites that actually have better-written, more in-depth articles.  Those were the gems I talked about.

>Monetize

No idea.  DDG started out using Amazon alone, then brought in Bing Ads; that seems to work okay as long as they don't track more than the query and don't overload the SERPs with ads.  Qwant uses Bing Ads, I think.