... but I haven't finished working my way through the detail and planning it out in my head...
https://patents.google.com/patent/US9165040B1/en
>planning it out in my head
Well, if anybody could do it...
>>the detail
Well, that's a lot of detail. More than I have time for this morning, but from reading the beginning, I'm guessing the essence is this:
Quote: One possible variation of PageRank that would reduce the effect of these techniques is to select a few "trusted" pages (also referred to as the seed pages) and discovers other pages which are likely to be good by following the links from the trusted pages....
Generally, it is desirable to use large number of seed pages to accommodate the different languages and a wide range of fields which are contained in the fast growing web contents. Unfortunately, this variation of PageRank requires solving the entire system for each seed separately. Hence, as the number of seed pages increases, the complexity of computation increases linearly, thereby limiting the number of seeds that can be practically used.
So solving that problem makes it much easier to expand the number of seed pages and to update the ranking data frequently.
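To make the "solving the entire system for each seed separately" point concrete, here is a rough sketch of my own, not anything taken from the patent. It assumes a plain personalized PageRank (the surfer always teleports back to a single seed) and a tiny made-up link graph; the function names, page names and the 0.85 damping value are all assumptions for illustration. The only thing it is meant to show is that the number of full solves equals the number of seeds.

```python
def personalized_pagerank(links, seed, damping=0.85, iterations=50):
    """Power iteration where the surfer always teleports back to `seed`."""
    pages = set(links) | {t for targets in links.values() for t in targets} | {seed}
    rank = {p: 0.0 for p in pages}
    rank[seed] = 1.0
    for _ in range(iterations):
        nxt = {p: 0.0 for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split the damped rank evenly across outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling page: hand its damped rank back to the seed.
                nxt[seed] += damping * rank[page]
        # Teleport mass; total rank stays at 1.0 on every iteration.
        nxt[seed] += 1.0 - damping
        rank = nxt
    return rank


def trust_from_seeds(links, seeds):
    """Average the per-seed scores and count how many full solves were needed."""
    combined = {}
    solves = 0
    for seed in seeds:
        scores = personalized_pagerank(links, seed)  # one whole-graph solve per seed
        solves += 1
        for page, score in scores.items():
            combined[page] = combined.get(page, 0.0) + score / len(seeds)
    return combined, solves


# Hypothetical toy graph: page -> pages it links to.
toy_links = {"seed_a": ["x"], "seed_b": ["x", "y"], "x": ["y"], "y": []}
combined, solves = trust_from_seeds(toy_links, ["seed_a", "seed_b"])
print(solves)  # one solve per seed
```

Double the seed list and `solves` doubles with it, which is the linear growth the quoted passage describes.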
What are you thinking is the significance?
- harder to spam?
- drop in brand authority?
- ????
> What are you thinking is the significance?
> - harder to spam?
> - drop in brand authority?
Realtime black and white bird related.
I've not gone beyond the abstract, but this sounds much like what we've known as TrustRank for the last 10 years or so. Is that an oversimplification?
Interesting...
Quote: Filing date 2006-10-12
Publication date 2015-10-20
Grant date 2015-10-20
Long time from filing to granting, which might mean that they designed this some time ago and now have a use for it (Jason's real-time b&w bird).
>> sounds much like what we've known as TrustRank
That was my thought too.
If that's true I don't think it would change much of what I've already been doing / trying to do.
> Is that an oversimplification ?
No, I don't believe it is. There are a lot of similarities with Yahoo's TrustRank and even Majestic's patent, but there are also subtle differences too.
The principles though..... TrustRank through and through, at least that's what I believe so far...
Quote from: Adam C on November 11, 2015, 11:21:30 AM
I've not gone beyond the abstract, but this sounds much like what we've known as TrustRank for the last 10 years or so. Is that an oversimplification?
As they say in the Prior Art section, they have known for quite some time that you can have seed pages and use the distance from those seed pages to establish trust/authority. This does not change that (and as you point out, this is math that is 10 years old anyway).
But doing so requires solving the entire system separately for each seed, which means that complexity increases linearly as you add seed pages (per the patent text quoted earlier in the thread). So the number of seed pages has always been small.
This method allows for a much larger set of seed pages and for calculating on the fly. That's why I thought it would
a) make things harder to spam - more seed pages means that getting a good link on the graph is harder because you're being looked at from more angles. So if you manage to sneak a link into the neighborhood of one seed page, this makes that link count less.
b) reduce importance of brand. My idea there was that with a broader set of seed pages, you might be able to return some balance to long tail searches. You're not as dependent on a small number of seed pages, so you should widen your bell curve.
I would guess this would be related to Hummingbird in the sense that to take advantage of this new technique, you need a more agile system that is better at updating on the fly. So I would guess it was math like this that drove the need for a system like that.
So it may well be that though the math has existed for 10 years, it wasn't until Hummingbird they were able to roll it out. Who knows?
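For anyone who wants to picture the "distance from seed pages" idea, here is a minimal sketch. To be clear, this is my own illustration and not the patent's actual algorithm: it assumes a multi-source Dijkstra over a made-up graph, and the link lengths, page names and function name are all hypothetical. The point is only that all seeds can go into one pass, so adding seeds does not mean re-running the whole thing per seed.

```python
import heapq


def distances_from_seeds(links, seeds):
    """Shortest link distance from the nearest seed to every reachable page."""
    dist = {seed: 0 for seed in seeds}
    heap = [(0, seed) for seed in seeds]  # every seed starts at distance 0
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for target, length in links.get(page, {}).items():
            nd = d + length
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                heapq.heappush(heap, (nd, target))
    return dist


# Hypothetical graph: page -> {linked page: assumed link length}.
toy_graph = {
    "seed_a": {"news": 1, "blog": 2},
    "seed_b": {"blog": 1},
    "news": {"shop": 1},
    "blog": {"shop": 3},
    "spam": {"shop": 1},  # links out, but no seed links to it
}
dist = distances_from_seeds(toy_graph, ["seed_a", "seed_b"])
for page in sorted(dist, key=dist.get):
    print(page, dist[page])
```

In this toy graph "spam" never gets a distance at all, because nothing reachable from a seed links to it. That is roughly the "harder to spam" intuition from point a).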
I'm a bit lost on the black and white bird comment.... all the black and white birds I know are woodpeckers...
http://images.clipartpanda.com/penguin-clip-art-yckB78ocE.png
Penguin... obviously... mind not working this morning.
As a naturalist, my mind goes to the birds and bears I see every day, not pandas and penguins.
Yes, I would think that this patent has a lot to do with penguin updates.
> my mind goes to the birds and bears I see every day
I wish I had an office there, it sounds glorious :)
>10 years
back when DMOZ would have been the seed list
> 10 years / DMOZ
Exactly and I completely agree.
I think the overwhelming question will be (and I'm still reading it on and off and trying to get my head completely around it) what would be the list today working on the basis that it could always be in a state of flux and constant change? ...
The secondary question would be, How to get on the list?
I'm pretty sure these are going to be tough questions to answer, but... you never know, and luckily I feel that BoL and I could be in a pretty good position to give a very good answer to them that, although it may not be perfect, could be close enough to be practically useful.
>>what would be the list today working on the basis that it could always be in a state of flux and constant change?
>>The secondary question would be, How to get on the list?
Hmm... food for thought...
If we start with an assumption that to rank for something meaningful today you need a relatively short connection to trusted seed(s), then today's high ranking sites are our route to becoming tomorrow's high ranking sites
OR, maybe more specifically
The pool of sites (that themselves rank in their niches) that link to high ranking sites in my niche is the pool of sites I want to link to me
---
or conversely - if it ranks for nothing, it's no good to me.
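The pool described above can be sketched as a simple filter. This is only a rough illustration under my own assumptions: the site names are placeholders, and where you would get the link data and the "ranks for something" list from is left open.

```python
def link_prospects(links, top_sites_in_niche, sites_that_rank):
    """Sites that rank for something and link to a top site in the niche."""
    prospects = set()
    for site, targets in links.items():
        links_to_top_site = bool(top_sites_in_niche & set(targets))
        if site in sites_that_rank and links_to_top_site:
            prospects.add(site)
    return prospects


# Hypothetical data: site -> sites it links to.
toy_links = {
    "hub.example": ["top1.example"],
    "dead.example": ["top1.example", "top2.example"],
    "other.example": ["unrelated.example"],
}
top_sites = {"top1.example", "top2.example"}
ranking_sites = {"hub.example", "other.example"}

pool = link_prospects(toy_links, top_sites, ranking_sites)
print(pool)
```

Here "dead.example" links to both top sites but ranks for nothing, so the "no good to me" rule drops it. Remove the `sites_that_rank` check and you get the broader pool instead.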
> to rank for something meaningful today you need a relatively short connection to trusted seed(s),
Agreed
> then today's high ranking sites are our route to becoming tomorrow's high ranking sites
Agreed
> The pool of sites (that themselves rank in their niches) that link to high ranking sites in my niche is the pool of sites I want to link to me
Agreed, but hasn't G always shown us the way in that manner? Sites that rank are the best sites to gain links from.
> if it ranks for nothing, it's no good to me.
Not agreed. I don't believe that this is true. A seed site is likely to be a site that links to others prodigiously and based on amazing editorial intent. It may not in and of itself deliver content worthy of ranking.
DMOZ wouldn't rank nowadays but Wikipedia does. The major difference, if we consider DMOZ to still be at its old dizzying heights, is that Wikipedia itself has (overall) extremely high editorial standards with content. DMOZ didn't and hence wouldn't rank today.
I see the links alone from Wikipedia as being a version of DMOZ back then. The links are still valid on their own but without the content it won't rank.
I also believe each vertical would have its own seed sites. When I say vertical I mean quite broad and not a specific keyword niche.
Keywords feed into industries that feed into verticals.
EG.
Texas Holdem Poker --> poker --> table games --> gambling --> entertainment.
Where would the seed sites be in the above example?
I see them at all the locations in the mix, but the ones at gambling and entertainment are most likely to be closest to the ultimate seed lists that used to be used. Nowadays, though, Texas Holdem Poker may have a seed list of licensed operators for each country. So the operators and gambling companies become the seed list for the ultimate phrases of poker and Texas Holdem, whereas if you go up one level it becomes the Gambling Commission.
EG in the UK it would be located at:
https://secure.gamblingcommission.gov.uk/gccustomweb/PublicRegister/PRSearch.aspx
Being listed there gives you the chance to rank, as you are close to, or in actual fact linked to from, the seed list for the UK in that key phrase. But it is only a prerequisite to ranking.
Good old fashioned link love itself then takes over to deliver the rankings. The seed links simply allow the opportunity to rank...
hmmmm, interesting, thanks Jason!
Quote: I believe this to be extremely important....
I believe this to be extremely relevant....
Quote: The biggest advancement with RankBrain, though, is in how it deals with the quantity of content it analyzes in order to create the vectors. It seems bigger than the classic "link anchor text and surrounding text" that we always considered when discussing, for instance, how the Link Graph works.... In the patent, huge importance is attributed to context and "concepts," and the fact that RankBrain uses vectors (again, "vast amounts of written language embedded into mathematical entities"). This is likely because those vectors are needed to secure a higher probability of understanding context and detecting already-known concepts, thus resulting in a higher probability of positively matching those unknown concepts it's trying to understand in the query.
https://moz.com/blog/rankbrain-unleashed
Got a flashback to Teoma there.
Somewhat related, remember a good few years back G released a dump of all uni, bi, tri and quadgram words. Since then they've incorporated Freebase (a large chunk of the knowledge graph).
Theming was always a 'year away' wasn't it. Seems fairly safe to say Google has a fairly good handle on contextual relatedness (for topics there's lots of data for).