# Comprehending the Google PageRank Algorithm: An Easy-to-Understand Overview
Whenever we use a search engine such as Google to answer a question, a sophisticated algorithm works behind the scenes to surface the most relevant information. A fundamental element of Google’s search system is **PageRank**, which was introduced in 1998. This algorithm estimates the **importance, reliability, and credibility** of web pages by analyzing their links. This article explains how PageRank works in simple terms.
***
## **What Is PageRank?**
PageRank is a pioneering algorithm developed by Google’s founders, Larry Page and Sergey Brin. It gauges the importance of web pages by examining the links leading to and from them. Within a search engine it works alongside **querying and indexing**, and together these steps give each page a score based on its relevance and authority.
### **Querying**
Querying involves interpreting a user’s search query. The search engine refines the query in four steps:
1. **Parsing** – Extracting the text from the input.
2. **Tokenizing** – Splitting the text into individual words (tokens) and discarding extraneous punctuation and whitespace.
3. **Removing Stop Words** – Discarding common words like “the,” “like,” and “that.”
4. **Stemming** – Converting words to their root form (e.g., “running” → “run”).
For example, the query “The Running Dog” would be reduced to **{‘run’, ‘dog’}**, which makes matching against pages more precise.
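The four steps above can be sketched in a few lines of Python. This is only a minimal illustration: the `STOP_WORDS` set and the `toy_stem` helper are assumptions made for this example (a real search engine would use a far larger stop-word list and a proper stemming algorithm), and the function names are hypothetical.

```python
import re

# Assumed, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "like", "that", "a", "an", "of"}

def toy_stem(word):
    """Very rough stemmer: strips a trailing 'ing' and undoubles a final
    repeated consonant. Not a real stemming algorithm; it will get many
    English words wrong."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]
    return word

def process_query(query):
    text = query.lower()                      # parsing: normalise the raw input
    tokens = re.findall(r"[a-z0-9]+", text)   # tokenizing: drop punctuation and whitespace
    kept = [t for t in tokens if t not in STOP_WORDS]  # remove stop words
    return {toy_stem(t) for t in kept}        # stemming

print(sorted(process_query("The Running Dog")))  # ['dog', 'run']
```

Returning a set mirrors the **{‘run’, ‘dog’}** notation used above: word order and duplicates are discarded.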
***
## **Indexing: Assessing Keyword Relevance**
After processing the user’s query, the search engine **indexes** pages by keyword relevance. This involves two primary calculations, which are then combined into a single relevance score:
### 1. **Term Frequency (TF)**
This indicates how frequently a keyword appears on a page compared to the most common word.
\[
TF = \frac{x}{y}
\]
Where:
- **x** = Number of times the term appears on a page
- **y** = Frequency of the most prevalent term on the page
For instance, if the term **“run”** shows up **once** in a document where another term appears **4 times**, then:
\[
TF = \frac{1}{4} = 0.25
\]
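As a rough sketch, the TF ratio can be computed with Python’s `collections.Counter`. The function name `term_frequency` and the sample token list are assumptions made for this example.

```python
from collections import Counter

def term_frequency(term, tokens):
    """TF = x / y: count of `term` divided by the count of the most frequent token."""
    counts = Counter(tokens)
    if not counts:
        return 0.0  # assumption: an empty page gets a TF of 0
    most_common_count = counts.most_common(1)[0][1]
    return counts[term] / most_common_count

# Hypothetical page: "run" appears once, "dog" appears four times.
tokens = ["run", "dog", "dog", "dog", "dog", "park"]
print(term_frequency("run", tokens))  # 0.25
```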
### 2. **Inverse Document Frequency (IDF)**
The IDF gauges how widespread or rare a term is across all documents. The less common a term is, the more significant it becomes.
\[
IDF = \log\left(\frac{n}{n(i)}\right)
\]
Where:
- **n** = Total number of documents in the collection
- **n(i)** = Count of documents that include the term
For the term **“run”**, if there are **4 documents** and “run” appears in **2 of them**, the IDF (taking the logarithm in base 10) would be:
\[
IDF = \log\left(\frac{4}{2}\right) = \log 2 \approx 0.301
\]
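The same calculation can be sketched in Python. The worked value of 0.301 corresponds to a base-10 logarithm, so the sketch uses `math.log10`. The function name and the four sample documents are assumptions made for this example.

```python
import math

def inverse_document_frequency(term, documents):
    """IDF = log10(n / n(i)), where each document is a set of terms."""
    n = len(documents)
    n_i = sum(1 for doc in documents if term in doc)
    if n_i == 0:
        return 0.0  # assumption: a term found in no document gets an IDF of 0
    return math.log10(n / n_i)

# Hypothetical collection of 4 documents; "run" appears in 2 of them.
documents = [{"run", "dog"}, {"run", "park"}, {"cat"}, {"bird", "tree"}]
print(round(inverse_document_frequency("run", documents), 3))  # 0.301
```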
### 3. **Relevance Score**
Multiplying TF by IDF yields a **relevance score for a word** on a page.
\[
\text{Relevance} = TF \times IDF
\]
For **“run”**:
\[
\text{Relevance} = 0.25 \times 0.301 \approx 0.075
\]
The greater the relevance score, the more vital that term is to the page.
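Putting the two measures together, a relevance score might be computed as follows. This is a sketch under the same assumptions as the formulas above (TF normalised by the most frequent term, base-10 logarithm for IDF). The `relevance` function and the sample corpus are hypothetical.

```python
import math
from collections import Counter

def relevance(term, page_tokens, corpus):
    """Relevance = TF * IDF for `term` on one page within a corpus of pages."""
    counts = Counter(page_tokens)
    n_i = sum(1 for doc in corpus if term in doc)
    if not counts or n_i == 0:
        return 0.0  # assumption: no tokens or no matching document means no relevance
    tf = counts[term] / max(counts.values())
    idf = math.log10(len(corpus) / n_i)
    return tf * idf

# Hypothetical corpus of 4 pages; "run" appears on 2 of them.
page = ["run", "dog", "dog", "dog", "dog"]
corpus = [page, ["run", "park"], ["cat"], ["bird"]]
print(round(relevance("run", page, corpus), 3))  # 0.075
```

Note that a term appearing on every page gets an IDF of 0 with this formula, so its relevance is 0 no matter how often it occurs.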
***
## **Calculating the PageRank Score**
After scoring words by keyword relevance, PageRank evaluates how **authoritative** a page is based on its **link structure**. The ranking procedure follows **four guidelines**:
1. A page with more incoming links has higher authority.
2. If an important page (Page P) links to another page (Page X), Page X becomes more important.
3. If Page P links to fewer pages, its influence on each linked page is greater.
4. Pages that can be reached in fewer clicks receive a higher ranking score.
### **PageRank Score Calculation**
To determine the **weight of a link**, the algorithm uses:
\[
w(pm) = \frac{0.85}{n(pm)} + \frac{0.15}{t}
\]
Where:
- **w(pm)** = Weight page **P** receives from page **M**
- **n(pm)** = Count of unique links between pages **P** and **M**
- **t** = Total number of pages in the collection
- **0.85** is the damping factor and **0.15** is its complement (1 − 0.85); together they account for random browsing behavior.
If **Page M** connects to **Page P**, this weight determines the level of authority conveyed.
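The link-weight formula above translates directly into a small function. This sketch only evaluates the formula as stated in this article; the function name `link_weight`, the `damping` parameter, and the sample values (2 links, 4 pages) are assumptions made for illustration.

```python
def link_weight(n_pm, t, damping=0.85):
    """w(pm) = damping / n(pm) + (1 - damping) / t, as given in the article."""
    if n_pm <= 0 or t <= 0:
        raise ValueError("n_pm and t must be positive")
    return damping / n_pm + (1 - damping) / t

# Hypothetical values: n(pm) = 2 links, t = 4 pages in the collection.
print(round(link_weight(2, 4), 4))  # 0.4625
```

With the default damping factor of 0.85, the first term carries the authority passed through the link, while the second spreads the remaining 0.15 evenly across all `t` pages.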
For example, suppose the page **“The Running Dog”** is