Ranking basically means that other important, credible websites believe that you are one of them – an important, credible website – by linking to you. To harness the power of the community, you need to first contribute to the community, before they’ll take you seriously and tell their friends about you.
In this sub-section of the article on Link Graph and Information Retrieval, I’ll cover connectivity-based ranking, query dependencies, neighborhood subgraphs and the HITS algorithm.
What is Connectivity-Based Ranking?
Now that you’re familiar with graphs, their types and their application in structuring the web, let’s look at how the graph of the web is utilized by Google’s search algorithm.
The web is a massive graph where millions of documents are connected with each other via hyperlinks. We’ll start by looking at the bigger picture and later take a deeper look at PageRank.
This is what a Connectivity-Based Ranking system means:
The value (importance) of a document on the web is determined by the number of links (directed and undirected) pointing towards it from other documents on the web.
The Connectivity-Based Ranking system uses both Link Graph and the Co-Citation Graph in indexing, categorizing and ranking web pages.
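To make the link-counting idea concrete, here is a minimal sketch in Python. The page names and the `links` dictionary are hypothetical toy data, and the two helper functions are illustrative only, not a description of how any search engine actually stores or scores its graph. The first function counts inbound links (the Link Graph view); the second counts how often two pages are linked to from the same page (the Co-Citation Graph view).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy link graph: each key links to the pages in its list.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}

def indegree_scores(links):
    """Count how many distinct pages link to each page."""
    scores = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):   # ignore duplicate links from one page
            if target != source:      # ignore self-links
                scores[target] += 1
    return scores

def cocitation_counts(links):
    """For each unordered pair of pages, count the pages that link to both."""
    pairs = Counter()
    for source, targets in links.items():
        distinct = sorted(set(targets) - {source})
        for u, v in combinations(distinct, 2):
            pairs[(u, v)] += 1
    return pairs

scores = indegree_scores(links)
# Highest indegree first; ties broken alphabetically.
ranking = sorted(scores, key=lambda page: (-scores[page], page))
cocited = cocitation_counts(links)
```

In this toy graph, c.example collects the most inbound links and so ranks first, while b.example and c.example are co-cited once because a.example links to both.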
Search Queries and Connectivity-Based Ranking
The concept of a Connectivity-Based Ranking system is intuitive and simple – that’s why it’s the foundation of almost every online search engine in existence. Now we can start making our way towards understanding PageRank.
The first stop along the way is to understand how Google determines the value and relevance of web pages based on search queries using the Connectivity-Based Ranking system.
Query-Independent Connectivity-Based Ranking
First, we need to determine the overall quality of a page in relation to all the other pages available in an index. PageRank is a Query-Independent Connectivity Ranking system that assigns a quality score to a page based purely on the quality (PageRank) of the web pages that link to it.
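The recursive definition above (a page's score depends on the scores of the pages linking to it) can be approximated by repeated passes over the graph. The sketch below follows the commonly published PageRank formulation. The damping factor of 0.85, the iteration count and the toy `links` data are assumptions for illustration; Google's production system is not public and certainly differs.

```python
# Hypothetical toy link graph: each key links to the pages in its list.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}

def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration (assumed textbook form)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

ranks = pagerank(links)
best = max(ranks, key=ranks.get)
```

Note that the query never appears anywhere in this computation, which is exactly what query-independent means: the scores can be calculated once for the whole index and reused for every search.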
Being query-independent means that PageRank alone is not a complete measure of the relevance of a page to a search query.
Think of PageRank as quality control that sifts through documents available on the internet to ensure that every webpage that’s indexed by Google and made available to be displayed in search results meets Google’s quality standards.
Query-Dependent Connectivity-Based Ranking
Time to revisit elementary school mathematics. An index is a group of documents or a Set of objects organized based on their individual PageRanks. In order to deliver documents based on specific queries, this set is divided into small portions or subsets based on their relevance to the keywords (or phrases) used by the user.
This is an indegree-based (backlink-based) ranking model that connects the query-independent, link-analysis-based ranking system with user queries by building subgraphs that include only those documents relevant to the user query. In geek-lingo, these subgraphs are called neighborhood graphs.
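Here is a rough sketch of how a neighborhood graph might be assembled. It assumes the approach usually described alongside HITS: start from the pages that match the query, then add the pages that link to them or are linked from them. The `pages` and `links` data, the simple word-matching rule and the function name are all hypothetical.

```python
# Hypothetical index: URL -> page text.
pages = {
    "a.example": "guide to trail running shoes",
    "b.example": "trail running shoes review",
    "c.example": "shoe store directory",
    "d.example": "chocolate cake recipe",
    "e.example": "baking tips",
}

# Hypothetical link graph: each key links to the pages in its list.
links = {
    "a.example": ["b.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example", "b.example"],
    "d.example": ["e.example"],
    "e.example": ["d.example"],
}

def neighborhood_graph(query, pages, links):
    """Return (root set, subgraph) for a query."""
    terms = query.lower().split()
    # Root set: pages whose text contains every query word.
    root = {
        url for url, text in pages.items()
        if all(term in text.lower().split() for term in terms)
    }
    # Expand by one link step in both directions.
    nodes = set(root)
    for source, targets in links.items():
        for target in targets:
            if source in root:
                nodes.add(target)
            if target in root:
                nodes.add(source)
    # Keep only the links that stay inside the neighborhood.
    subgraph = {
        url: [t for t in links.get(url, []) if t in nodes]
        for url in sorted(nodes)
    }
    return root, subgraph

root, subgraph = neighborhood_graph("running shoes", pages, links)
```

For the query "running shoes", the two shoe articles form the root set and the directory page joins the neighborhood because it links to them, while the baking pages are left out entirely.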
In simple terms: “you’re known by the company you keep”. The only difference is that if you’re not proactive, the company you keep will be decided for you (and this is like being picked last in gym class; nobody wants to be that guy).
Hub Score and Authority Score: The HITS Algorithm
Let me summarise what we know so far about connectivity-based ranking: PageRank is a quality control system that initially determines whether the webpage you’ve published qualifies to be displayed in search results. What’s left is to determine where, and for which queries, that document should be displayed in search results.
The shortcoming of a purely indegree-based ranking over the neighborhood subgraph is that it assigns a score to a webpage solely based on the number of links pointing towards it, without considering the quality of those links.
The HITS algorithm is the next step, which ensures that only the most relevant and high-quality pages are displayed for a user query.
A Short Description of the HITS Algorithm
For any given query, the HITS algorithm determines the Hub Score and the Authority Score of each document available in the neighborhood subgraph. These documents are then ranked based on their respective Hub and Authority scores.
A document has a high Hub Score if it contains a high number of links leading towards high quality and relevant documents. Hub Score does not consider internal links.
So a webpage is a high-quality Hub if it recommends, via links, that users visit other high-quality content.
A document has a high Authority Score if many other high-quality documents point towards it. Your webpage is considered to contain relevant and useful content if more and more web pages recommend users to visit it by linking to it.
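The two definitions above feed each other: good Hubs point to good Authorities, and good Authorities are pointed to by good Hubs. The sketch below shows the iterative update usually given for HITS. The node names, the iteration count and the normalisation step are assumptions for illustration, and the sketch assumes internal (same-site) links have already been removed from the subgraph.

```python
import math

# Hypothetical neighborhood subgraph: each key links to the pages in its list.
subgraph = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1", "auth2"],
    "hub3": ["auth1"],
    "auth1": [],
    "auth2": [],
}

def normalise(scores):
    """Scale scores so that their squares sum to 1."""
    norm = math.sqrt(sum(value * value for value in scores.values())) or 1.0
    return {node: value / norm for node, value in scores.items()}

def hits(subgraph, iterations=50):
    """Return (hub scores, authority scores) for a neighborhood subgraph."""
    nodes = set(subgraph) | {t for targets in subgraph.values() for t in targets}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking to you.
        auth = {node: 0.0 for node in nodes}
        for source, targets in subgraph.items():
            for target in targets:
                auth[target] += hub[source]
        auth = normalise(auth)
        # Hub: sum of the authority scores of the pages you link to.
        hub = normalise({
            node: sum(auth[target] for target in subgraph.get(node, []))
            for node in nodes
        })
    return hub, auth

hub, auth = hits(subgraph)
top_authority = max(auth, key=auth.get)
```

In this toy subgraph, auth1 ends up as the strongest Authority because all three hubs link to it, and hub3 scores lower as a Hub than hub1 and hub2 because it links to only one of the two Authorities.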
The Real Reason Why Good SEOs Are Needed
A ranking system based on Hub and Authority scores means that, over time, web pages that consistently link to a large number of authoritative pages will start ranking higher as high-quality Hubs. The same thing happens to the Authority Score of web pages when a large number of high-quality Hubs start linking to them.
What this means is that, as an SEO, your job is to ensure both that the information on your site is easily accessible to search crawlers and that the content published on your site is valuable and earns high Hub or Authority scores.
Having said that, the neighborhood subgraph that the HITS algorithm builds for a query-dependent analysis can only include a limited number of web pages. This means that your site, even with high-quality content, might not be included in that subgraph if its Authority or Hub scores are low.
That’s why a good SEO understands that in order to produce the best results every single element of Google’s link analysis and processing algorithm needs to be taken into account:
- Number of Inbound Links
- Number of Outbound Links
- HITS Authority Score (Quality of inbound links)
- HITS Hub Score (Quality of outbound links)