Clarification and Functioning of the Google PageRank Algorithm

# Comprehending the Google PageRank Algorithm: An Easy-to-Understand Overview

Whenever we employ a search engine such as Google to obtain responses to our questions, a sophisticated algorithm functions behind the scenes to deliver the most pertinent information. A fundamental element of Google’s search mechanism is **PageRank**, which was unveiled in 1998. This algorithm evaluates the **significance, dependability, and credibility** of web pages by scrutinizing their links. In this article, we will simplify the workings of PageRank.

---

## **What Is PageRank?**

PageRank is a pioneering algorithm created by Google’s founders, Larry Page and Sergey Brin. It gauges the importance of web pages by examining the links leading to and from them. Within the search engine, it works alongside **querying and indexing**: those steps score each page for relevance to the search, while PageRank contributes a score for the page’s authority.

### **Querying**
Querying involves interpreting a user’s search query. The search engine refines the query through these steps:

1. **Parsing** – Retrieving the text from the input.
2. **Tokenizing** – Splitting the text into individual words and discarding extraneous punctuation and whitespace.
3. **Removing Stop Words** – Discarding common words like “the,” “like,” and “that.”
4. **Stemming** – Converting words to their root form (e.g., “running” → “run”).

As an illustration, a query with “The Running Dog” would be distilled into **{‘run’, ‘dog’}** for enhanced search precision.
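The four querying steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list and the `naive_stem` helper are hypothetical stand-ins, and a real search engine would use a much larger word list and a proper stemming algorithm.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "like", "that", "a", "an", "of"}

def naive_stem(word):
    """Toy stemmer (an assumption, not a real stemming algorithm):
    strips a trailing 'ing' and undoes a doubled final consonant."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]                      # "running" -> "runn"
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]                  # "runn" -> "run"
    return word

def process_query(query):
    # Parsing and tokenizing: lowercase, keep only runs of letters.
    tokens = re.findall(r"[a-z]+", query.lower())
    # Stop-word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Stemming; a set is returned because word order is not needed here.
    return {naive_stem(t) for t in tokens}

print(sorted(process_query("The Running Dog")))  # ['dog', 'run']
```

The toy stemmer will mangle many words (it only handles the “running” pattern), which is why production systems rely on established stemmers instead.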

---

## **Indexing: Assessing Keyword Relevance**

After processing the user’s query, the search engine scores the pages in its **index** by keyword relevance. This involves two primary calculations, which are then combined into a single score:

### 1. **Term Frequency (TF)**
This indicates how frequently a keyword appears on a page compared to the most common word.

\[
TF = \frac{x}{y}
\]

Where:
- **x** = Number of times the term appears on a page
- **y** = Frequency of the most prevalent term on the page

For instance, if the term **"run"** shows up **once** in a document where the most common term appears **4 times**, then:

\[
TF = \frac{1}{4} = 0.25
\]
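As a rough sketch, the same term-frequency calculation can be done over a list of tokens. The function name `term_frequency` and the sample token list are made up for illustration; the only thing taken from the text is the formula, the term count divided by the count of the most frequent term.

```python
from collections import Counter

def term_frequency(term, tokens):
    """TF = x / y, where x is the count of `term` and y is the count
    of the most frequent term in `tokens`."""
    counts = Counter(tokens)
    if not counts:
        return 0.0  # assumption: an empty page scores zero
    most_common_count = counts.most_common(1)[0][1]
    return counts[term] / most_common_count

# Hypothetical page: "run" appears once, "dog" appears four times.
tokens = ["run", "dog", "dog", "dog", "dog", "park"]
print(term_frequency("run", tokens))  # 0.25
```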

### 2. **Inverse Document Frequency (IDF)**
The IDF gauges how widespread or rare a term is across all documents. The less common a term is, the more significant it becomes.

\[
IDF = \log\left(\frac{n}{n(i)}\right)
\]

Where:
- **n** = Total number of documents in the collection
- **n(i)** = Count of documents that include the term

For the term **"run"**, if there are **4 documents** and "run" is present in **2 of them**, the IDF (using a base-10 logarithm) would be:

\[
IDF = \log\left(\frac{4}{2}\right) = \log 2 \approx 0.301
\]
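The IDF example can be reproduced with a short sketch. It assumes a base-10 logarithm, since that is what yields 0.301 for log 2, and it represents each document as a set of already-processed terms; the function name and the sample documents are hypothetical.

```python
import math

def inverse_document_frequency(term, documents):
    """IDF = log10(n / n(i)), where n is the number of documents and
    n(i) is the number of documents containing `term`."""
    n = len(documents)
    n_i = sum(1 for doc in documents if term in doc)
    if n_i == 0:
        return 0.0  # assumption: a term found nowhere gets no weight
    return math.log10(n / n_i)

# Hypothetical collection: "run" occurs in 2 of 4 documents.
documents = [{"run", "dog"}, {"run", "park"}, {"cat"}, {"bird", "tree"}]
print(round(inverse_document_frequency("run", documents), 3))  # 0.301
```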

### 3. **Relevance Score**
Multiplying TF by IDF yields a **relevance score for a word** on a page.

\[
\text{Relevance} = TF \times IDF
\]

For **"run"**:

\[
\text{Relevance} = 0.25 \times 0.301 \approx 0.075
\]

The greater the relevance score, the more vital that term is to the page.
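Putting the two factors together, a minimal sketch of the relevance score might look like the following. The function and parameter names are illustrative, and the base-10 logarithm is an assumption inferred from the 0.301 value in the worked example.

```python
import math

def relevance(term_count, max_term_count, total_docs, docs_with_term):
    """Relevance = TF * IDF, with TF = x / y and IDF = log10(n / n(i))."""
    tf = term_count / max_term_count
    idf = math.log10(total_docs / docs_with_term)
    return tf * idf

# The worked example: "run" appears once, the top term 4 times,
# and "run" occurs in 2 of 4 documents.
print(round(relevance(1, 4, 4, 2), 3))  # 0.075
```

Note that a term present in every document gets an IDF of 0, so its relevance is 0 no matter how often it appears on a page.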

---

## **Calculating the PageRank Score**

After words have been scored for keyword relevance, PageRank evaluates how **authoritative** a page is according to its **link structure**. The ranking procedure follows **four guidelines**:

1. A page with a larger number of incoming links possesses higher authority.
2. If a significant page (Page P) connects to another page (Page X), Page X becomes more significant.
3. If Page P links to fewer pages, its impact on each linked page is amplified.
4. Pages that are accessible within fewer clicks are assigned a superior ranking score.
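To see the first three guidelines in action, here is a small sketch of the commonly published iterative formulation of PageRank. It is not the exact weight formula given in this article, and it does not model the fourth guideline about click distance. The `links` graph, the iteration count, and the handling of pages without outgoing links are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to.
    Assumes every linked page also appears as a key in `links`."""
    pages = list(links)
    t = len(pages)
    rank = {p: 1.0 / t for p in pages}  # start with equal rank
    for _ in range(iterations):
        # Every page gets a small baseline share (random browsing).
        new_rank = {p: (1.0 - damping) / t for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # Page with no outgoing links: spread its rank evenly
                # over all pages (a common convention, assumed here).
                for p in pages:
                    new_rank[p] += damping * rank[page] / t
            else:
                # Fewer outgoing links means a larger share per link.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: C has the most incoming links.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, C ranks highest because three pages link to it, and A comes second because it receives all of C's authority through C's single outgoing link.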

### **PageRank Score Calculation**

To establish the **weight of a link**, the algorithm employs:

\[
w(pm) = \frac{0.85}{n(pm)} + \frac{0.15}{t}
\]

Where:
- **w(pm)** = Weight page **P** receives from page **M**
- **n(pm)** = Count of unique links between pages **P** and **M**
- **t** = Total number of pages in the collection
- **0.85** is the damping factor and **0.15** is its complement (1 − 0.85); together they account for random browsing patterns.

If **Page M** connects to **Page P**, this weight determines the level of authority conveyed.
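The weight formula translates directly into a small function. This sketch simply evaluates the formula as stated above; the function name, the parameter names, and the sample values (2 links, 4 pages) are illustrative assumptions, not figures from the article.

```python
def link_weight(n_pm, t, damping=0.85):
    """w(pm) = damping / n(pm) + (1 - damping) / t."""
    return damping / n_pm + (1.0 - damping) / t

# Hypothetical values: 2 unique links, 4 pages in the collection.
print(round(link_weight(n_pm=2, t=4), 4))  # 0.4625
```

With the same collection size, a smaller `n_pm` gives a larger weight, which matches the guideline that a page linking to fewer pages has more impact on each of them.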

For example, suppose the page **“The Running Dog”** is
