We compare PageRank to an idealized random web surfer as a point of reference. We show how to compute PageRank efficiently for a large number of pages, and we show how PageRank can be applied to search and to user navigation.
This content is a translation of the PageRank article published by Stanford University in 1998.
1. Introduction and Motivation
The World Wide Web presents many new challenges for information retrieval. It is very large and heterogeneous. Current estimates suggest that there are over 150 million web pages, a number that has doubled in less than a year. More importantly, web pages range from “What is Joe having for lunch today?” to journals about information retrieval. In addition to these major challenges, search engines on the Web must also cope with inexperienced users and with pages engineered to manipulate search engine ranking functions.
However, unlike “flat” document collections, the World Wide Web is hypertext and provides considerable auxiliary information on top of the text of the web pages, such as link structure and link text. In this article, we take advantage of the link structure of the Web to produce a global “importance” ranking of every web page. This ranking, called PageRank, helps search engines and users quickly make sense of the vast heterogeneity of the World Wide Web.

1.1 Variety of Web Pages
Although there is already a large literature on academic citation analysis, there are a number of significant differences between web pages and academic publications. Unlike scrupulously reviewed academic publications, web pages proliferate free of quality control or publishing costs. With a simple program, huge numbers of pages can be created easily, and with these pages citation counts can be artificially inflated. Because the Web is home to a host of competing profit-seeking ventures, attention-getting strategies evolve over time in response to search engine algorithms. For this reason, any evaluation strategy that counts replicable features of web pages is also prone to manipulation. Further, academic papers are roughly similar in quality and number of citations, as well as in their well-defined purpose. Web pages vary on a much wider scale than academic papers in quality, usage, citations, and length. A randomly archived message posting an obscure question about an IBM computer is very different from the IBM home page. A research article on the effects of cellular phone use on driver attention is very different from an advertisement for a particular cellular provider. The average web page quality experienced by a user is higher than the quality of the average web page. This is because the ease of creating and publishing web pages results in a large fraction of low-quality pages that users are unlikely to read.
There are many axes along which web pages can be differentiated from one another. In this article, we focus on one in particular: an approximate, overall measure of the relative importance of web pages.
1.2 PageRank
To measure the relative importance of web pages, we propose PageRank, a method for computing a ranking of web pages based on the overall link structure of the Web. PageRank has applications in search, crawling, and traffic estimation.
Section 2 gives a mathematical definition of PageRank and provides some intuitive justification. In Section 3, we show how PageRank can be computed efficiently over 518 million hyperlinks. To test the utility of PageRank for search, we built a web search engine called Google. We also show how PageRank can be used as a browsing aid in Section 7.3.
2. Ranking for All Pages on the Web
2.1 Related Studies
There have been numerous studies on academic citation analysis [Gar95]. Goffman [Gof71] published an interesting theory of how information flow in a scientific community is an epidemic process.
There has been a fair amount of work on exploiting the rich link structures of hypertext systems such as the Web. Pitkow recently completed his Ph.D. thesis, “Characterizing World Wide Web Ecologies,” with extensive link-based analyses [Pit97, PPR96]. Weiss discusses clustering methods that take the link structure into account [WVS+96]. Spertus [Spe97] discusses the information that can be gleaned from link structures for a variety of applications. Good visualization demands added structure on the hypertext, as discussed in [MFH95, MF95]. Most recently, Kleinberg [Kle98] has developed an interesting model of the Web in terms of Hubs and Authorities, based on an eigenvector calculation on the co-citation matrix of the Web.
Finally, the library community has shown interest in what constitutes “quality” on the Web [Til].
It is an obvious step to try to apply standard citation analysis techniques to the hypertextual citation structure of the Web. One can simply think of every link as an academic citation. By this measure, the fact that a major site like the Yahoo homepage has a large number of backlinks suggests that it is quite important. Indeed, many web search engines have used backlink counts to bias their databases toward higher-quality or more important pages. However, using the backlink count alone raises some problems on the Web. Some of these problems are inherent to the Web and do not arise in ordinary academic citation databases.
2.2 Link Structure of the Web
Although estimates vary, the current picture is that the crawlable Web contains about 150 million nodes (pages) and 1.7 billion edges (links). Every page has some number of forward links (outedges) and backlinks (inedges) (Figure 1). We can never know whether we have found all the backlinks of a particular page, but once we have downloaded a page, we know all of its forward links at that time.
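Since a crawler observes forward links directly but must infer backlinks, the structure above can be sketched as a forward-link adjacency list that is then inverted. This is an illustrative sketch, not the paper's implementation; the page names are hypothetical, chosen to mirror Figure 1:

```python
# Represent (a tiny fragment of) the Web as a mapping from each page
# to its forward links (outedges). Page names are hypothetical.
forward_links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
}

def invert(graph):
    """Derive backlinks (inedges) from the forward-link structure."""
    backlinks = {page: [] for page in graph}
    for source, targets in graph.items():
        for target in targets:
            backlinks.setdefault(target, []).append(source)
    return backlinks

backlinks = invert(forward_links)
# As in Figure 1: pages A and B are backlinks of C.
print(backlinks["C"])
```

At Web scale this inversion is a heavy sort over billions of edges, which is why, as the text notes, a page's backlinks are never known with certainty while its forward links are known the moment it is downloaded.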
Figure 1: A and B are backlinks of C.
Web pages vary greatly in the number of backlinks they have. For example, the Netscape homepage has 62,804 backlinks in our current database, while most pages have only a few. Generally, highly linked pages are "more important" than pages with few links. It has even been speculated that simple citation counting could be used to predict future Nobel Prize winners [San95]. PageRank provides a more sophisticated method for doing citation counting.
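PageRank's precise definition comes later in the paper; purely as an illustration of how it refines raw backlink counting, the following sketch runs the widely known power-iteration form on a hypothetical three-page graph. The damping factor of 0.85 is a conventional choice assumed here, not a value taken from this section:

```python
def pagerank(forward_links, d=0.85, iters=50):
    """Power-iteration sketch of PageRank (simplified; assumes every
    page has at least one forward link, so no dangling-page handling).

    forward_links: dict mapping each page to the pages it links to.
    d: damping factor; 0.85 is a common convention, assumed here.
    """
    pages = list(forward_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new_rank = {p: (1.0 - d) / n for p in pages}
        for source, targets in forward_links.items():
            share = d * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical web: both A and B cite C, and C cites A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
```

On this graph, C ranks highest (two backlinks), A second (a single backlink, but it comes from the important page C), and B last (no backlinks) — rank depends not just on how many pages cite you, but on how important those citing pages are.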