How did Google blow all other search engines out of the water almost overnight? PageRank.
You’ve heard of it. It is one of the most influential innovations in information retrieval and organization. It is also one of the most widely misunderstood.
It’s a big part of how we ended up in this mess of Google being our search monopoly. It’s also ushered in a new way of sorting information (objects and relationships) in a variety of contexts.
PageRank has been repurposed for all sorts of network applications. But we’ll get to that below.
First, what PageRank is, and then, why you should maybe care.
The equation started with two computer science PhD students at Stanford [original-ish paper]. The one-sentence summary: the authors, Sergey Brin and Larry Page, developed a mix of methods to return better search results across very large numbers of pages by combining weighted values of incoming links (PageRank) with indicators of relevance (like anchor text, big bold fonts, URLs, and titles).
In that paper linked above, Sergey and Larry give a perfect description for understanding it as a layperson:
PageRank can be thought of as a model of user behavior. We assume there is a “random surfer” who is given a web page at random and keeps clicking on links, never hitting “back” but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the “random surfer” will get bored and request another random page.
If there were an army of zombies surfing the web, randomly clicking links on pages, they would eventually pile up on the most prominent pages, the ones that the most links point to. Links from prominent pages count for more, so prominent pages that link to other prominent pages reinforce each other. Those most prominent pages then get ranked with the highest scores, assuming they meet a threshold of relevance to the searcher.
PageRank is a way to quickly and accurately model the importance of pages as determined by their context in a network of pages (what we call the web).
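To make the random surfer concrete, here is a minimal sketch of the calculation in Python. It is not Google’s implementation. It uses the commonly taught normalized form of the formula, in which d is the probability that the surfer keeps clicking (so 1 - d is the chance of getting bored and jumping to a random page), with the 0.85 value the paper suggests. The tiny `toy_web` graph, its page names, and the choice to spread the rank of pages with no outgoing links evenly across all pages are assumptions made for illustration.

```python
def pagerank(links, d=0.85, iterations=100, tol=1e-10):
    """Estimate PageRank by repeated redistribution of rank.

    links: dict mapping each page to the list of pages it links to.
    d: probability the surfer follows a link instead of jumping at random.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with every page equally likely

    for _ in range(iterations):
        # Assumption: rank held by pages with no outgoing links is shared by all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - d) / n + d * dangling / n for p in pages}

        # Each page passes its rank on, split evenly across its outgoing links.
        for page, targets in links.items():
            if targets:
                share = d * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share

        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # scores have stopped moving
            break
    return rank


# Hypothetical four-page web: nothing links to "orphan".
toy_web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}

scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:6s} {score:.3f}")
```

In this toy graph, "home" comes out on top because every other page links to it, while "orphan" only ever receives the small share handed out by random jumps. The scores sum to 1, which is what lets you read them as the probability of finding the surfer on each page.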
This is why you always hear variations of advice around “more links, more better.”
The modeling part is key. It is a way to quantify value in the network patterns that emerge between pages using links on the web.
Think about it:
- As an expert you have your target subject matter.
- Experts you look up to paved the way for your progress.
- You do the same for the experts that come after you.
- You also have expert peers.
All of these relationships and influences are reflected in the structure of the web. That’s why, as an SME, you should have a strong grasp of both the real-world context and the online context of your expertise.
But, as you know, more often than we’d like, credit does not get given where it’s due. Edison with the lightbulb. Franklin with electricity. Even PageRank itself was heavily influenced by citation analysis, the approach to measuring scientific influence that Eugene Garfield began developing in the 1950s.
Research journal citations are a good model for network effects in academia:
A peer writes about something, citing an important source and putting their spin on it. You are prompted to conduct similar research, possibly with a different hypothesis. You add to the literature.
The more influential some research is, the more it is cited and influences further research. All of those citations add up to determine who wields the most influence in a field.
Why this matters for you
Real world influence leaves a footprint online, one that is an accurate representation of that influence.
Or it doesn’t.
When it doesn’t, which is often, it creates a gap between an important idea and the dissemination of that idea.
The ability to exploit that gap through arbitrage, by making an idea sticky and portable or by increasing its utility, is a big part of how SEO was approached in the 2000s.
This principle still holds, but the value created from that arbitrage, the extendibility of a good message, needs to be real.
Food for thought: The winners are not those who invent things or develop important concepts; the winners are those who popularize and maximize the impact of those things.
Understand power laws, understand authority
A scale-free network is one in which the number of connections per node follows a power law distribution: a handful of nodes have an enormous number of connections, and the vast majority have very few. Lots of real-world phenomena follow power law curves.
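One way to get a feel for that shape is to grow a network yourself. The sketch below uses preferential attachment (the idea behind the Barabási–Albert model), which is a standard way to produce scale-free networks but is brought in here purely as an illustration. It is not something PageRank or the paper relies on. The function name `grow_network`, the one-link-per-newcomer rule, the network size, and the fixed random seed are all assumptions.

```python
import random
from collections import Counter


def grow_network(n_nodes, seed=0):
    """Grow a network one node at a time.

    Each newcomer links to one existing node, chosen with probability
    proportional to how many links that node already has.
    """
    rng = random.Random(seed)
    # Every node appears in `endpoints` once per link it has, so a uniform
    # draw from this list is a draw weighted by number of links.
    endpoints = [0, 1]  # start with a single link between nodes 0 and 1
    degree = Counter({0: 1, 1: 1})

    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        degree[new_node] += 1
        degree[target] += 1
        endpoints.extend([new_node, target])
    return degree


degree = grow_network(10_000)
histogram = Counter(degree.values())  # number of links -> how many nodes have that many

for k in (1, 2, 4, 8, 16):
    print(f"nodes with {k:2d} links: {histogram.get(k, 0)}")
print("largest hub:", max(degree.values()), "links")
```

Most nodes end up with a single link while a few early nodes become hubs. That is the rich-get-richer pattern the rest of this piece keeps coming back to.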
We have Google because Larry Page recognized that links on the web follow a power law curve similar to the one found in the academic citations Eugene Garfield’s work measured.
Now PageRank as a concept gets repurposed for dozens of brilliant applications, from identifying security vulnerabilities to word-ranking algorithms used in natural language processing.
How relationships shape our world and, in turn, how links shape the web
As mentioned, the web is, to an extent, a reflection of reality.
You link to your friend’s content. Your customers write you testimonials online. Large companies with lots of content get more traffic. Consolidation of power and wealth in the real world over time gets reflected online.
This also happens on social. You share and like things.
Those actions leave a stream of data in your wake that can be reverse engineered to understand your passions, preferences, relationships, or personality.
We find the most influential people in social networks by looking at who has the most friends, whose content gets the most engagement and shares, and so on.
These are all instances of power laws in action.
The nature of search engines has also strengthened the curve. A search engine must return a first result for every query, so by definition there is a winner, a king of the hill. It’s a winner-takes-all thing.
Understand how your subject matter “network” looks and evolves online
Here’s where it gets even more interesting: link patterns vary significantly between industries, verticals, types of content, formats, whatever.
The rules apply but some industries are much more collaborative. Some are too boring for easy link-getting. B2B toilet widgets do not get linked to by bloggers. Visually appealing content is a link magnet. People love sharing pretty things. Interior design photos get linked like crazy and shared across social channels, even by competitors.
An Eric Ward example: as of ~2010, there were about three blogs dedicated to the study of bats. All other sites about bats linked to all three of those sites, and those three mostly only linked to each other.
The way you, your audience, and your peers “link” these days is still the primary indicator of credibility, influence, reputation, and pervasiveness that Google relies on.
The link structure of the web follows this distribution naturally. Do nothing but write content, and maybe someone of importance will discover you and share your content in a meaningful enough way that you get credit and reap the rewards.
Or have an answer for how you make your content portable, amplify it, and extend its reach, so you move up your power curve without sitting around and hoping for it to happen to you.