Talk:PageRank

From Wikipedia, the free encyclopedia

This is the talk page for discussing improvements to the PageRank article.

This article is within the scope of WikiProject Internet, an attempt to better organise information in articles related to the Internet. For more information, visit the project page.
This article has been rated as B-class on the class scale.
This article has been rated as high-importance on the importance scale.

Criticism section is missing

The title should say it all. Not everybody thinks Google's page ranking system is a good thing. KSM-2501ZX, IP address:= 200.155.188.4 (talk) 23:30, 10 March 2008 (UTC)

I don't see how any "criticism" can come about from the PageRank system; in theory, that should be posted directly into the summary of PageRank on Google or Google Search. For future reference, new discussion entries go at the bottom. If you "insist" on adding a criticism section to this article, please provide it here before altering the article. SDSandecki (talk) 23:56, 10 March 2008 (UTC)
Certainly there is criticism of PageRank: some argue its use of inlinks is misleading (a link doesn't necessarily equal a "vote"), others argue it "makes the popular become more popular" and suppresses new voices, etc. I'll work on finding proper sources. --ZimZalaBim talk 15:54, 11 March 2008 (UTC)
Yes, but this is why they introduced the rel=nofollow attribute. This way you can provide an outbound link and not pass on any PageRank or SEO value towards that link. I understand criticism exists, just not in the form I think would be appropriate for the article. However, I could be wrong; I'm interested in seeing what comes from this. If you need any help finding references or citations, let me know via my talk page. SDSandecki (talk) 17:24, 11 March 2008 (UTC)
If you do not understand why "not everybody thinks Google's page ranking system is a good thing" it's because you do not know how Altavista worked before the birth and rise of Google. As for editing any Wikipedia articles, no thanks, I'm satisfied with merely suggesting possible improvements to them. You see, I don't wanna take the risk of seeing a well-documented and well-sourced argumentation be promptly deleted only because it won't fit on the overbiased agendas of certain guardians of Wikipedia. KSM-2501ZX, IP address:= 200.155.188.4 (talk) 15:51, 11 March 2008 (UTC)
Just because people don't think it's a "good thing" doesn't mean you need to add a criticism section about it. In theory you can make a criticism section about 90% of all the articles on Wikipedia. You shouldn't assume what someone does or does not know; Wikipedia is about providing encyclopedic information, not biased "i don't like this because" information. I'll wait on User:ZimZalaBim to see his contributions. SDSandecki (talk) 17:24, 11 March 2008 (UTC)
wikipedia is about providing encyclopedic information, not bias "i don't like this because" information. That's just and only the "official speech" of course. Not all readers of Wikipedia are as ignorant as many of its editors would like us to be. KSM-2501ZX, IP address:= 200.155.188.4 (talk) 18:53, 11 March 2008 (UTC)
You seem to have zero "neutral point of view" on this subject for some strange reason. When someone can follow the rules and create a criticism section, I'm all for it, as long as it is not based on their own unique biased opinion (Wikipedia:NPOV). Any reader can be an editor. To claim the editors want readers to be ignorant is far from the truth. SDSandecki (talk) 19:11, 11 March 2008 (UTC)

Opening comments

talk:Wikipedia Announcements says that wikipedia has a PageRank of 0.7. How does one discover this PageRank?

If you have a Google toolbar add-on to your browser the PageRanks are indicated within it.
Since the Google toolbar is Windows/IE only, another trick is to search the Google Directory.
But there is a Google toolbar for Firefox, although I don't know if it's just for Win...
You can also go to http://www.mygooglepagerank.com and put in the url for which you want the rank.
What is the source for the second formula, where PageRank is divided by N? It seems to be incorrect. —Preceding unsigned comment added by Itman (talk • contribs) 13:52, 12 March 2008 (UTC)
Ah, sorry, it is simply obtained by dividing by N. Perhaps, that should be noted. —Preceding unsigned comment added by Itman (talk • contribs) 14:22, 12 March 2008 (UTC)

More

A common theory for why this is is because the Wikipedia is very interconnected, with each article having many internal links from other articles, which in turn have links from many other sites on the Web pointing to them. Compared to Wikipedia, and similar high quality content-rich sites, the rest of the World Wide Web is relatively loosely connected.

This cannot be correct. Simply being more tightly woven with links cannot increase the total flux of links, which determines the PageRank of pages within a site. More plausible explanations would be:

  1. Wikipedia has a high ratio of internal site links to external links
  2. Wikipedia as a whole has a very high PageRank
  3. Some of Wikipedia's highly ranked competitors (eg xrefer) have broken Google indexing of their entries (actually, xrefer is no longer free as in beer at all)
  4. One of Google's other algorithms (a lot more goes into their rankings than just PageRank) is favouring Wikipedia -- maybe they simply decided "let's boost Wikipedia" :)

--Pde 18:51 17 Jun 2003 (UTC)


I think it probably is correct - depending on how Google seeds its PageRank. In the PageRank paper, they mention two possible seeds: a uniform allocation to all top-level webpages, and a uniform allocation to all pages. The former makes sense, because it bootstraps off the domain name system, making attacks on PageRank costly. (To increase your PageRank, you'd have to buy lots of domain names.) The latter means link farms (which Wikipedia closely resembles ;) get a high PageRank.

In any case, most links are internal, so not much "energy" leaks out. So in either seeding system, wikipedia would do quite well. For a more detailed discussion, see the work by Monica Bianchini and friends.

Clausen 01:33 18 Jun 2003 (UTC)

It's definitely not correct. A high level of internal interlinking will do nothing except "average out" the pagerank amongst the linked pages. If a set of pages has a higher than normal average pagerank then it's either because it has a higher number of incoming than outgoing links and/or those incoming links have high rank. See the examples starting at "Fully Meshed" at http://www.ianrogers.net/google-page-rank#ex7

SirGroane (talk) 02:45, 31 December 2007 (UTC)



The link to "PowerPoint HTML presentation by Larry Page" is dead

epsalon 15:22, May 5, 2004 (UTC)


I've deleted the list of sites that have at some point had a high PageRank. It's subjective, highly non-comprehensive, will never be up to date, and we've got external links to sites that do the same, only better. -- ALargeElk | Talk 10:15, 28 Jul 2004 (UTC)

New Link?

Hey all, I just wrote an article on PageRank and I was wondering if I could get it added here.

http://plainbeta.com/2007/11/10/the-quest-for-pagerank-how-it-works/

//brianpurkiss —Preceding unsigned comment added by 208.117.69.100 (talk) 04:01, 11 November 2007 (UTC)

Is my request going to be answered? Or ignored? //brianpurkiss

Not sure I see what your article adds that the Wikipedia entry doesn't already explain. --ZimZalaBim talk 02:20, 29 November 2007 (UTC)

Add a link to my paper: The Cost of Attack of PageRank?

Hi all,

Does anyone want to add a link to my work on PageRank:

members.optusnet.com.au/clausen/reputation

I think adding it myself would be too much self-promotion. Contributions: survey of all the different PageRank formulas around (lots of mistakes!), proof of convergence, analysis of cost of attack. So, if you think the interested reader should know about it, add it yourself...

--Clausen 21:50, 20 Sep 2004 (UTC)

How many?

If the "PageRank" as it's shown in the Google toolbar is 5/10, is there a way to figure out how many of the currently 8,058,044,651 pages link to that page?--Jerryseinfeld 22:32, 3 Jan 2005 (UTC)

No. It could have one link from a page with PageRank of ~6, or it could have 100000 links from pages of PageRank 1. It also depends on how many outgoing links it has. (Outgoing links reduce PageRank.) Read my thesis! http://members.optusnet.com.au/clausen/reputation --Clausen 12:25, 29 May 2005 (UTC)

Contrary to what Clausen posted, it is possible to see how many links a page has with the Google query: link:http://www.example.org/page.html On the right of the blue bar at the top of the page it gives an approximate number of links the page has. Scott 18:51, 25 October 2006 (UTC)

expected number of clicks

Removed "This happens to equal t^{-1} where t is the expectation of the number of clicks (or random jumps) required to get from the page back to itself." This would be true only if the algorithm were seeded with the unit vector of the page. --Tgr 10:51, 29 May 2005 (UTC)

Actually, you can interpret non-uniform seed vectors as a probability distribution over the web pages the random surfer "jumps" to when it "gets bored". In this interpretation, the "random surfer" interpretation of t still holds. --Clausen 12:24, 29 May 2005 (UTC)

Yeah, I misunderstood the point the article made. Reverted to original. --Tgr 22:51, 29 May 2005 (UTC)


Google PageRank Checkers

I note that someone recently added a link to a web-based Google PageRank Checker, and... I, personally, think such links ought to be banned from the parent article. For one, since the source code is easily available and since more than a few checkers already exist, linking to them constitutes little more than an attempt at vanity, imho. For two, if anything about web-based Google PageRank checkers is to be mentioned at all, I think the related link ought to be to the source code - not to some script that someone wrote just to promote themselves. TerraFrost 05:04, 13 October 2005 (UTC)

I agree. Also wouldn't mind seeing PageRank Explained For Idiots and A good list of pagerank 10 sites go. Aapo Laitinen 16:10, 13 October 2005 (UTC)
  • I removed one page rank checker today (along with a cull of other external links a few days ago). We still have a lot of external links here that I think could go.
  • I agree, but disagree. I would say that it's stupid to have a bunch of links to PageRank Checkers. As TerraFrost said, the source code for that is easily accessible - however, not everyone knows how to make use of it. I think there should be links to the source code (and possibly an explanation on how to use it), and then a link to one or two PageRank Checkers.
Here's my favorite: http://www.iwebtool.com/pagerank_checker
//brianpurkiss —Preceding unsigned comment added by 208.117.69.100 (talk) 04:13, 11 November 2007 (UTC)

Warning

I've noticed this page gets link spam a lot. Would an HTML comment help things at all?--Eugman

What is patented? What is public knowledge?

The article says, near the beginning, that "The exact details of this scale are not public knowledge", and "The PageRank process has been patented (U.S. Patent 6,285,999). The patent is not assigned to Google but to Stanford University."

However, later on the article explains in detail how the pagerank is calculated. As a joe reader I ask myself: what is patented? How can this not be public knowledge if it's described below?

Is it possible to state clearly if the "PageRank process", which is patented, is something different than the "PageRank algorithm", which seems public knowledge, and where the differences are?

From what I've read, we know the basic idea behind it (this has been patented by Stanford), but Google has undoubtedly changed some stuff around and we don't know the details of that.--Eugman 17:03, 14 January 2006 (UTC)

It should be noted that there are numerous derivations of the PageRank algorithm which have been published in various academic papers (a search on scholar.google.com for expressions including or related to pagerank will reveal many of these papers, or their abstracts). It is my understanding (but I am no patent attorney) that the patent applies to the ranking mechanism itself, and not to specific algorithms derived from or based upon it (but that those algorithms may be governed by the patent to some extent).

Yahoo! has invested a great deal of resources into researching and refining PageRank in conjunction with Stanford University. It should be understood that PageRank, while closely associated with Google in the public view (or at least the general SEO community view), is being or has been adapted or studied by more than one search service. Michael Martinez 19:36, 14 February 2006 (UTC)

bobmutch edits

Hi all. I am an SEO consultant and write on PageRank. I have added a "Real PageRank" and "Google Directory PageRank" section to the PageRank entry. I also have added a number of quality links to the "External Links" section. I am open to feedback on any of the edits I have made. I plan on making some more edits to this section and adding some more links in the future.--bobmutch 14:55, 11 February 2006 (UTC)

Please don't do that again. Your changes should be discussed here first, as you introduced a lot of confusing information that doesn't bring any value to the article. It looks like you threw in a lot of undesirable links, too (such as the one to Phil Craven's paper, which does a very poor job of explaining PageRank). Michael Martinez 04:19, 13 February 2006 (UTC)

I notice today that someone posting from an Australian IP address added two commercial links to the External Links section and also removed the name of Bob Mutch's company (seocompany.ca) from his comment above. Michael Martinez 19:32, 14 February 2006 (UTC)

I think Mark's, Ian's, Phil's, Chris's and Wakfer's articles are good articles on PageRank. I would like to see the links added in the link section.

I also would like to add the following paragraph on Google Directory Pagerank.

Google Directory PageRank

Google Directory PageRank is an 8-unit measurement. These values can be viewed in the Google Directory. Unlike the Google Toolbar, which shows the PageRank value in a popup, the Google Directory doesn't show the PageRank values directly. You can see the PageRank scale values by looking at the source and wading through the HTML code.

These 8 positions are displayed next to each Website in the Google Directory. cleardot.gif is used for a zero value, and a combination of two graphics, pos.gif and neg.gif, is used for the other 7 values. The pixel widths of the 7 values are 5/35, 11/29, 16/24, 22/18, 27/13, 32/8 and 38/2 (pos.gif/neg.gif).

I am posting these 5 links and 1 section for discussion for 2 weeks before I go and add them to the PageRank section. If anyone else making additions to the PageRank section has any objections, or can see how the Google Directory PageRank section can be made better, please post and let's discuss. --bobmutch 13:43, 01 March 2006 (UTC)

OK, I put this section up for discussion for 2 weeks and then some. Are there any objections to me adding this to the PageRank section? Speak now or forever hold your peace. --bobmutch 18:11, 18 March 2006 (UTC)

I would like to add a page that shows the history of PageRank updates for the last 6 years. Does anyone object to me doing this? Pagerank Update History

--bobmutch 19:00, 23 October 2006 (UTC)

In my opinion there are two arguments against it. First, seocompany.ca is not the original source of Google's update history. The original (as far as I know) was posted on WebmasterWorld. Unfortunately, access to WebmasterWorld isn't free anymore. Second, there might have been some interest in the PageRank update history years ago, when PageRank, PageRank shown in the toolbar, backlinks and SERPs were updated at the same time. However, today PageRank is calculated continuously. The benefit of the update history of PageRank shown in the toolbar (in fact it isn't a history of PageRank updates any more) is negligible. --Doc z 10:11, 24 October 2006 (UTC)

Pagerank 10 sites list link

Someone keeps adding this link back in. The site purports to track Web sites with Toolbar PageRank values of 10, but it is a commercial site offering Web promotion services. The link should not be allowed as it is not significant and clearly violates the "no commercial links" standard which has been enforced for this article. Michael Martinez 19:32, 14 February 2006 (UTC)

Hey Michael,

You should know that this link has been here for more than a year; some people occasionally remove it and it automatically gets replaced. Search Engine Genie is an 80% non-commercial site; read the blog section here httpblog. I respect Wikipedia and have worked for its growth a lot. Please don't mistake me: searchenginegenie's link is nowhere else in Wikipedia. We respect this place and never ever thought of inserting our links.

The problem is that the list encourages PR chasing and the presence of the link encourages SEOs to link drop. Neither of which has anything to do with explaining what PageRank actually is or does. That the link has been dropped repeatedly over the past year doesn't make it more vital. Michael Martinez 15:52, 15 February 2006 (UTC)

OK Michael, I appreciate your post and your willingness to keep Wikipedia clean, but I bet you are a long way away from it. There are thousands of pages spammed across Wikipedia every day; even when I make some major edits, those links are replaced immediately. I just saw your post at seomoz. I am not new to the internet, nor new to blogging or to SEO; I have been in this business for more than 4 years now. i feel you are not happy with someone in a forum probably in highrankings forum?? Rand has given a nice place to rant, go on.

In fact, this PageRank 10 list was one of the first two lists available on the internet. Looking at our logs, people searching for PageRank 10 are topping the list. We get about 30 uniques just for PR 10 related keyword combinations from Google alone, and we get a lot of traffic from Wikipedia too. So people are actually interested in looking at this article; why do you feel it's irrelevant?

RESPONSE: You feel incorrectly, then, when you say "i feel you are not happy with someone in a forum probably in highrankings forum??" I'm not editing articles here because of anything anyone says or does in an SEO forum.

People who pay attention to the Google Toolbar PR are not only wasting their time, they are chasing what Mike Grehan has aptly called "green fairy dust". Including a link to a PR 10 chase site isn't helping anyone understand the PageRank algorithm. Yes, people are interested in all sorts of SEO myths and nonsense. That doesn't mean every one of those kooky ideas should be advocated by an open source site like Wikipedia.

I'm very critical of Wikipedia precisely because of these kinds of low standards. I do my best to help educate people, but I cannot prevent people from hanging on to bad ideas. PR chasing, fortunately, has taken on a bad reputation over the past couple of years. That's a positive change in the SEO community. Michael Martinez 22:39, 15 February 2006 (UTC)

Response:

People interested in the list of PageRank 10 sites cannot be equated with PR chasing. So you mean everyone looking at Alexa ratings is wrong?? People are always on the lookout for various means to gauge the quality of a page or a site. Google's PageRank is one of them; at least PR8 and above cannot be abused easily. Only very good sites can reach PR9 or PR10, and there are only about 30 unique PR 10 sites from millions of sites around the web.

Response: First, please sign your posts with your name if you're not going to log in.

Secondly, there is no SEO benefit to looking at a list of pages with PR 10 ratings and the raw numbers of backlinks.

Third, Alexa rankings are as bogus as PageRank. Both can be manipulated and only SEOs are interested in either, and only SEOs who have spent too much time following the wrong forum discussions, reading the wrong FAQs, tutorials, etc.

Neither toolbar PR nor an Alexa ranking is any indication of the quality or value of a page. There are a growing number of academic/technical papers which address the problem of falsely inflated high PR. Look at TrustRank and its follow-on concepts such as spam mass estimation. High PR is just not taken seriously. So there is no value to the PR 10 page. Michael Martinez 04:46, 16 February 2006 (UTC)

I suggest that the list of PR10 pages be added in the list of links. It fosters an understanding of the Toolbar Pagerank 0-10 scale by showing how few Website home pages or Website pages reach PR10. It also shows how the relationship scale between Real Pagerank and Toolbar Pagerank is changing.

I also think it would be helpful to write a short paragraph on the relationship between Real Pagerank and Toolbar Pagerank and how this relationship is changing.--bobmutch 13:53, 01 March 2006 (UTC)

  • I'm not sure exactly what the point of the above discussion about external links is. Are you saying the article should not link to any site listing PR10 sites? Or that the particular site(s) being added are overly commercial? There is a page here which lists PR10/9 sites and is relatively clean (only a few ads off to the side). Actually it is based on a page that used to be on Wikipedia... would it be okay/useful to add this page? Scott 01:01, 14 March 2007 (UTC)

"Sink pages" description should be removed?

When calculating PageRank, pages with no outbound links are assumed to link out to all other pages in the collection. Their PageRank scores are therefore divided evenly among all other pages. In other words, to be fair with pages that are not sinks, these random transitions are added to all nodes in the Web, with a residual probability of usually d = 0.85, estimated from the frequency that an average surfer uses his or her browser's bookmark feature.
So, the equation is as follows:
PR(p_i) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR (p_j)}{L(p_j)}
where p_1, p_2, ..., p_N are the pages under consideration, M(p_i) is the set of pages that link to p_i, L(p_j) is the number of outbound links on page p_j, and N is the total number of pages.

That bit seems wrong, as L(p_j) equals 0 for "sink pages"!

Also, the phrase "are added to all nodes in the Web" is proposing that the full equation is

P\!R_i = \frac {1-d} {N} + d \, \sum_{\forall j \in \{(j,i)\}} {\frac {P\!R_j} {C_j}} + d \, \frac {\text{number of sink pages}} {N}

That paragraph, and everything that depends on it, should be removed!

SirGroane (talk) 04:13, 31 December 2007 (UTC)
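
The division-by-zero concern above can be made concrete with a minimal sketch. It assumes one common convention, which may or may not be what the article intends: a sink page is treated as if it linked to every page, so its rank is shared evenly and L(p_j) is never 0. The function name `pagerank` and the three-page example graph are hypothetical, chosen only for illustration.

```python
def pagerank(links, d=0.85, iterations=100):
    """Power iteration for PR(p_i) = (1-d)/N + d * sum(PR(p_j) / L(p_j)).

    links maps each page to the list of pages it links to.
    Assumption: a sink page (no outbound links) is treated as linking
    to every page, so its rank is spread evenly over all N pages.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Total rank currently held by sink pages.
        sink_mass = sum(rank[p] for p in pages if not links[p])
        # Every page receives the (1-d)/N term plus an even share of the sink rank.
        new_rank = {p: (1.0 - d) / n + d * sink_mass / n for p in pages}
        for p in pages:
            for target in links[p]:
                new_rank[target] += d * rank[p] / len(links[p])
        rank = new_rank
    return rank


example = {"A": ["B", "C"], "B": ["C"], "C": []}  # C is a sink page
ranks = pagerank(example)
print(ranks, sum(ranks.values()))
```

With this convention the ranks keep summing to 1 on every iteration; without some such rule the rank held by C would simply be lost at each step.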

Mathematical points to clarify/add

There are some points in the mathematical parts which should be changed/added. In the current article PageRank calculation is described as an eigenvalue problem ("The PageRank values are the entries of the dominant eigenvector of the modified adjacency matrix.") - not only in the case of a damping factor of one (q=1) but also for a lower value. This point of view is used very often in the context of PageRank calculation. However, the simpler point of view is to see the original PageRank equation as the Jacobi iteration of a system of linear equations. Seen this way, PageRank

- is not defined recursively

- can be calculated analytically, i.e. no iterations and no initial guess are needed. Of course, for larger matrices this isn't a practical way, but for smaller systems the equations can be solved easily.

- can be computed numerically not only by the Jacobi iteration, which leads to the well-known PageRank formula, but also with other iteration schemes such as minimal residual, Gauss-Seidel, over-relaxation methods, conjugate gradient, preconditioning methods, multigrid and blocking techniques.

Also there are no problems for pages with no outgoing links. In any case, the Jacobi iteration should be mentioned in the article. This should be the origin for this kind of iteration.

Two other points (minor changes) I would correct/add:

- I would use d instead of q as damping parameter (as done in the original papers). Anyway I would mention the parameter also in the text, e.g. 'The damping factor q is subtracted'

- I would add a short section about modifications, i.e. Personalised PageRank, Topic-Sensitive PageRank ...

I won't correct the article because I'm not a native speaker. However, someone could take this as input and change it.

I will add a link to a page where the mathematics is described in the way I mentioned above, i.e. as a system of linear equations. From my point of view this is much straighter, easier to understand and mathematically clearer. Also mathematical details can be taken from this page. So far I haven't seen any other page explaining PageRank in this way.

--Doc z 12:55, 25 March 2006 (UTC)
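
The linear-system view described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the 1/N normalisation, assumes every page has at least one outbound link, and compares just two of the schemes mentioned (Jacobi and Gauss-Seidel). The function `solve_pagerank` and the example graph are hypothetical names introduced here.

```python
def solve_pagerank(links, d=0.85, tol=1e-12, gauss_seidel=False, max_sweeps=10000):
    """Solve x_i = (1-d)/N + d * sum_j x_j / C_j over pages j linking to i.

    gauss_seidel=False: Jacobi iteration (each sweep reads only the previous sweep).
    gauss_seidel=True:  Gauss-Seidel (updated values are used as soon as available).
    Assumption: every page has at least one outbound link.
    Returns the solution and the number of sweeps used.
    """
    pages = sorted(links)
    n = len(pages)
    inbound = {p: [q for q in pages if p in links[q]] for p in pages}
    x = {p: 1.0 / n for p in pages}
    for sweep in range(1, max_sweeps + 1):
        # Jacobi reads from a frozen copy; Gauss-Seidel reads the live values.
        source = x if gauss_seidel else dict(x)
        change = 0.0
        for p in pages:
            value = (1.0 - d) / n + d * sum(source[q] / len(links[q]) for q in inbound[p])
            change = max(change, abs(value - x[p]))
            x[p] = value
        if change < tol:
            return x, sweep
    raise RuntimeError("did not converge")


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
jacobi, jacobi_sweeps = solve_pagerank(web)
seidel, seidel_sweeps = solve_pagerank(web, gauss_seidel=True)
print("Jacobi:      ", jacobi, jacobi_sweeps)
print("Gauss-Seidel:", seidel, seidel_sweeps)
```

Both schemes solve the same equations, so they should agree up to the tolerance; the sweep counts are printed so the two can be compared. For a system this small the three equations can also be solved by hand, which is the "analytical" route mentioned above.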

More mathematical points to clarify or add...

This probability is allowed for through a damping factor

Is that the probability, at any step, that the person will continue, or the probability that he will stop, or something else?

Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85.
PR(A)=\left( \frac{PR(B)}{L(B)}+ \frac{PR(C)}{L(C)}+ \frac{PR(D)}{L(D)}+\,\cdots \right) q + 1 - q

Is "q" supposed to be the damping factor? If so, it should say so. Above, one could rephrase so that instead of saying "the damping factor will be set around 0.85", it would say "the damping factor q will be set around 0.85".

As Google increases the number of documents in its collection, the initial approximation of PageRank decreases for all documents.

If a million new documents are added and ALL of them link to page "A", does the initial approximation of A's PageRank decrease? Or should it say the AVERAGE PageRank of all old documents decreases, rather than that all of them, unanimously, decrease?

Michael Hardy 23:08, 27 March 2006 (UTC)

q is the probability that the random surfer is picking up a randomly chosen link from the current page; (1-q) is the probability that the random surfer re-starts with a randomly chosen page.
Indeed q is the dampening factor. However, normally it is denoted by d.
Originally, d was set to 0.85. The current value is unknown, but it is expected that Google is using a value around 0.85.
One should be careful when saying something like "As Google increases the number of documents in its collection, the initial approximation of PageRank decreases for all documents" because it depends on the normalisation. The statement is referring to the case of the "1/N" normalisation. Indeed, even for this normalisation PageRank of a single page might increase when adding new pages as explained above.
--Doc z 09:46, 28 March 2006 (UTC)
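
The question of whether every old page must lose PageRank when documents are added can be checked on a toy example. The sketch below assumes the 1/N normalisation and a made-up four-line web; the function `pagerank` and the page names are hypothetical and serve only to illustrate the point.

```python
def pagerank(links, d=0.85, iterations=200):
    """Iterate PR_i = (1-d)/N + d * sum_j PR_j / C_j (1/N normalisation).

    Assumption: every page has at least one outbound link.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            for target in links[p]:
                new_rank[target] += d * rank[p] / len(links[p])
        rank = new_rank
    return rank


# Old web: A has no inbound links at all.
old_web = {"A": ["B"], "B": ["C"], "C": ["B"]}

# New web: three new documents are added, and all of them link to A.
new_web = dict(old_web)
for i in range(3):
    new_web["new%d" % i] = ["A"]

before = pagerank(old_web)
after = pagerank(new_web)
for page in sorted(old_web):
    print(page, before[page], after[page])
```

In this example A rises from about 0.05 to about 0.089, while B and C both fall, so the statement holds for the average of the old pages but not for each page individually.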
Forgive my ignorance, but I don't see how a dampening factor works at all. If every page has a dampening factor of .85 or 8.5 or 85.0, then every other measure is scaled according to the magnitude defined by d and no ordinal changes occur with any page relative to any other page. This probability that a random surfer will cease surfing or accounting for pages without links seems to be a distraction. How is it a page with 3 million hits a day like New York Times but with few inbound links to its home page nytimes.com will be fairly measured? Google rightly protects its algorithms just as it kept its earnings secret in early years to ward off competitors. So it is not unreasonable to question the math and logic of search.
Pages don't have a damping factor - just links are damped. Changing the damping factor not only leads to a re-scaling of PageRank (due to the PageRank source of 1-d) but also to a different distribution. The PageRank algorithm is published and well-known. Of course, Google changed the algorithm they used for ranking pages, but this article simply describes the original PageRank algorithm. You can question the logic of Google's search but you can't question the math of this algorithm. --Doc z 09:28, 29 August 2006 (UTC)
Okay, links have dampening factors, that part I understand. However if the universal dampening factor is consistently .85, then everything is reduced by 15%, yes? How does re-scaling everything downward by 15% cause a redistribution? The distribution would still have the same shape relative to the mean and the first standard deviation, yes? I guess I do question the logic not the math, because why a dampening factor is included at all doesn't make sense to me. The probability that a surfer will continue on to another random page after reaching a random page should not have a role in determining the authority of any random, or non-random page, should it? The probability that a surfer will continue to another page after landing on google.com must be about 1000%, since all that's there is a search box, but logic would say an authoritative page would end the surfer's search, a dampening effect of 1.00. Thanks for your time in answering this.192.246.0.76 13:35, 29 August 2006 (UTC)
'However if the universal dampening factor is consistently .85, then everything is reduced by 15%, yes?' No - there is a self contribution of every page of 1-d! Changing the self contribution from (1-d) leads to a re-scale of PageRank but not changing the damping factor. For example, in the limit d=0 PageRank is distributed uniformly, i.e. every page has the same PageRank. You can also see that the distribution changes when d changes if you consider the random surfer model. A random surfer randomly chooses a link on the current page with probability d, or a new page (that is, any page, regardless of whether it has an incoming link from the current page) with probability 1-d. In the case d=1 the surfer always follows a link on the current page, while for d=0 he always randomly chooses a new page (from the whole internet). (Of course, this is just a simple model of the user behaviour.) --Doc z 17:55, 29 August 2006 (UTC)
You said, "Changing the self contribution from (1-d) leads to a re-scale of PageRank but not changing the damping factor." which sort of agrees with what I said, I think. If (1-d) is always (1-.85), then why can it not just always be .085 or 8.5 or 85.0? If d is one number for all instances, then it isn't a variable, by definition, yes? If it isn't a variable, then why is it included at all? If it is a variable, how does Google decide which site gets a .85 and which gets a .01? If it is the probability that a user will select a link on the current page seen vs. not, then are the passages in the main page that talk about the damping factor being a measure of randomness inaccurate? Thanks. 192.246.0.76 19:31, 29 August 2006 (UTC)
This is the (recursive) PageRank formula
P\!R_i = 1-d + d \, \sum_{\forall j \in \{(j,i)\}} {\frac {P\!R_j} {C_j}}
with
0 \le d \le 1
Using a different normalization as for example
P\!R_i = \frac {1-d} {N} + d \, \sum_{\forall j \in \{(j,i)\}} {\frac {P\!R_j} {C_j}}
leads to a re-scaling of PR. However, if you change d from 0.9 to 0.8 the distribution changes. d is a parameter which can be changed, but it doesn't depend on the page. This means that Google might choose d=0.9 for the current calculation of PageRank, but they used d=0.85 some years ago (as an example). In any case, the same damping factor is used for all pages. To get the best value, one calculates PageRank for different d and compares the results. By the way, someone might change the phrase "Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85[citation needed]." in the article. Sergey Brin and Lawrence Page said (The Anatomy of a Large-Scale Hypertextual Web Search Engine. In: Computer Networks and ISDN Systems 1998) that "We usually set d to 0.85." This means that they used d=0.85 in that work. Most people believe that d is still close to that value. (Of course, there is no source for the latter statement.) --Doc z 20:39, 29 August 2006 (UTC)
Thanks for your analysis. I am a bit surprised at the simplicity of even the basic PageRank algorithm. It must be a challenge to implement though, given that N is the total collection of all pages crawled and included by Google. (1-d)/N is a really, really small number. When d=.85, then (1-d)/N is like .15 divided by billions of pages, and that tiny number is computed for all links leading into a page.
My interest in this problem began when I would key three or four unlikely strings into a search box to pull up a file I KNEW was on the web, and neither Google nor Yahoo! would place the paper at the top of the list. So if you key in a list of researchers who authored a paper, a popular book that cites the paper would land in position 1 and the paper itself wouldn't even make it onto the page. Good luck to us all. 192.246.0.76 15:55, 30 August 2006 (UTC)

[edit] 7/10 for articles, 3/10 for main page, 5/10 for talk?

Why would it do that for this site? Random the Scrambled 17:55, 29 March 2006 (UTC)

Google PR is extremely hard to figure out and sometimes doesn't act in a particular manner. Thizz 03:58, 13 April 2006 (UTC)

[edit] Strange link

Link on Russian version of this article is VERY strange. :)

[edit] Proposal for links

I wonder if articles on how to improve PageRank are valid links here? Booles 21:19, 31 May 2006 (UTC)

[edit] PageRank and page rank

These are two different things. Page rank is universal; PageRank is a measure at Google, not an algorithm. The start of the article must be rewritten, but I need some advice before changing it. Booles 06:26, 7 June 2006 (UTC)

Of course, page rank and PageRank are two different things. But I don't see the need for rewriting the start of the article. Page rank and the relation to PageRank is given here: "According to Google, PageRank is now just one factor among many other ones, to calculate the rank of a page in results of searches." Indeed PageRank is just a measure, however the numerical iterative calculation is an algorithm. --Doc z 19:01, 7 June 2006 (UTC)


[edit] Weblinks

I'll remove the weblink Internet: Search Engine: Google: Algorithm: Search: Link: Pagerank: Can you show a simple example? because some of its examples are incorrect and there are numerous sites explaining the PR calculation better (clearly and precisely), e.g. pr.efactory.de --Doc z 12:49, 17 July 2006 (UTC)

[edit] Dead link

During several automated bot runs the following external link was found to be unavailable. Please check if the link is in fact down and fix or remove it in that case!


maru (talk) contribs 04:43, 27 July 2006 (UTC)

This link does work (it is a re-direct, which might have affected the bot). --MichaelZimmer (talk) 23:03, 8 August 2006 (UTC)

[edit] Dead link

During several automated bot runs the following external link was found to be unavailable. Please check if the link is in fact down and fix or remove it in that case!


maru (talk) contribs 04:43, 27 July 2006 (UTC)

This link does work (it is a re-direct, which might have affected the bot). --MichaelZimmer (talk) 23:04, 8 August 2006 (UTC)

[edit] Dead link

During several automated bot runs the following external link was found to be unavailable. Please check if the link is in fact down and fix or remove it in that case!


maru (talk) contribs 04:43, 27 July 2006 (UTC)

In the article, this link actually is [1] without the "accessdate" parameter. As such, it does work correctly. --MichaelZimmer (talk) 23:07, 8 August 2006 (UTC)

[edit] Relevant edit at Search engine optimization

There is a content dispute at Search engine optimization that is related to PageRank. See edit here [2], and concern raised in two sections of the talk page: [3], and [4]. Any feedback is welcome. --MichaelZimmer (talk) 23:00, 8 August 2006 (UTC)

[edit] Pagerank log base

My addition about the relationship between toolbar pagerank and true PR was reverted with the comment the information given (log. base and change in 2003) are incorrect/speculative. I agree it is speculative. But the paragraph in the article states (in the reverted version): Many people assume that the Toolbar PageRank is a proxy value determined through a logarithmic scale. If you keep that phrase in, it makes sense to mention what people are assuming/speculating about this logarithmic scale. I remember having seen better references than the ones I added, but I couldn't find them. They were probably on webmasterworld.com that is nowadays no longer accessible without subscription. Han-Kwang 13:37, 18 August 2006 (UTC)

I have seen guesses for the value of the log. base (e.g. on webmasterworld) in the range from 2 up to more than 40. Mentioning a guess of 4.5 in the article doesn't make sense, and mentioning the whole range isn't very helpful in my opinion. (By the way, one can prove the log. relationship between toolbar PageRank and real PageRank. It is also possible to measure the log. base - doing this you'll find that the value is not even close to 4.5. Moreover, there was no significant change in the relationship within the last four years.) --Doc z 15:34, 18 August 2006 (UTC)
It would be very helpful if you incorporate the above into the article, preferably with facts rather than with "it is provable/possible". I do think that estimates, however large the error margin, should be included. Note that Webmasterworld, being subscription-only, is not suitable as a reference. Han-Kwang 15:53, 18 August 2006 (UTC)
As already said, I would change several points in this article (especially some mathematical ones), but I'm not a native speaker and my English isn't very good. Therefore, I don't change them - I just corrected the German article. Of course, I wouldn't mention webmasterworld in the article - it was just an example of the wide range of assumptions for the log base. You can find numerous other guesses on the internet. However, I don't think it's very helpful to mention this because none of these estimates are based on facts and the values vary over a wide range. Finally, even in the German article I didn't mention details about measuring the logarithmic behaviour because (as far as I know) there are no public references for this. --Doc z 20:21, 18 August 2006 (UTC)

[edit] PageRank algorithm including damping factor


\mathbf{R} =
\begin{bmatrix}
{(1-d) / N} \\
{(1-d) / N} \\
\vdots \\
{(1-d) / N}
\end{bmatrix}
+ d
\begin{bmatrix}
\ell(p_1,p_1) & \ell(p_1,p_2) & \cdots & \ell(p_1,p_N) \\
\ell(p_2,p_1) & \ddots & & \\
\vdots & & \ell(p_i,p_j) & \\
\ell(p_N,p_1) & & & \ell(p_N,p_N)
\end{bmatrix}
\mathbf{R}
where the adjacency function [...]

could be restated in this form:


\mathbf{R} =
\left (
I_N - d
\begin{bmatrix}
\ell(p_1,p_1) & \ell(p_1,p_2) & \cdots & \ell(p_1,p_N) \\
\ell(p_2,p_1) & \ddots & & \\
\vdots & & \ell(p_i,p_j) & \\
\ell(p_N,p_1) & & & \ell(p_N,p_N)
\end{bmatrix}
\right )^{-1}
\begin{bmatrix}
{(1-d) / N} \\
{(1-d) / N} \\
\vdots \\
{(1-d) / N}
\end{bmatrix}
where I_N is the identity matrix, the adjacency function [...]

Remi Arntzen 04:28, 30 August 2006 (UTC)

Of course, you can rewrite the equation
P\!R_i = \frac {1-d} {N} + d \, \sum_{\forall j \in \{(j,i)\}} {\frac {P\!R_j} {C_j}}
in the form
P\!R_i = \frac {1-d} {N} \sum_j {M^{-1}}_{ij}
Here M = I_N - d \ell is the matrix that is inverted in the restated form above. The reason why the first equation is used most of the time is that the original work used it as a (recursive) definition of PageRank. But you can also see the second equation as the definition and the first equation as the Jacobi method for solving the matrix inversion numerically. --Doc z 09:23, 30 August 2006 (UTC)
Well yes of course I understand that one formula may be the original definition of a process, however the concept I was trying to convey is that I would rather have the direct answer {\color{Green}y=}\frac{1\pm \sqrt{1+4x^2}}{2} instead of {\color{Red}y=y^2}-x^2. The two equations are equivalent, where the first equation may be helpful in understanding the way the equation was derived, while the second equation is helpful for those who wish to implement the equation in as short a time as possible for demonstration/learning purposes. Remi Arntzen 20:32, 31 August 2006 (UTC)
I also prefer the second equation (see Mathematical points to clarify/add) and I changed the German article some time ago. As already said, in my opinion the first equation is just a numerical way of performing the matrix inversion (Jacobi method) for the second equation. (There are even better and faster algorithms to do this.) --Doc z 04:40, 1 September 2006 (UTC)
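The equivalence discussed in this section can be illustrated numerically. The sketch below is written under stated assumptions: the four-page link structure is hypothetical, M is taken to be I_N - d L with L[i][j] = 1/C_j when page j links to page i, and the helper names (`link_matrix`, `solve`, `jacobi`) are made up for this example. It solves the linear system directly by Gaussian elimination and compares the result with repeated application of the recursive formula.

```python
def link_matrix(links, pages):
    """Return L with L[i][j] = 1/C_j if page j links to page i, else 0."""
    index = {p: k for k, p in enumerate(pages)}
    n = len(pages)
    L = [[0.0] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            L[index[i]][index[j]] = 1.0 / len(targets)
    return L


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def jacobi(L, d, iterations=200):
    """Repeat R <- (1 - d)/N + d L R, i.e. the recursive PageRank formula."""
    n = len(L)
    R = [1.0 / n] * n
    for _ in range(iterations):
        R = [(1 - d) / n + d * sum(L[i][j] * R[j] for j in range(n))
             for i in range(n)]
    return R


# Hypothetical four-page web without self-links.
pages = ["A", "B", "C", "D"]
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
d = 0.85
n = len(pages)

L = link_matrix(links, pages)
M = [[(1.0 if i == j else 0.0) - d * L[i][j] for j in range(n)] for i in range(n)]
direct = solve(M, [(1 - d) / n] * n)  # R = M^{-1} (1 - d)/N
iterative = jacobi(L, d)              # recursive definition, iterated
```

Because no page links to itself here, the diagonal of M is 1 and the recursive update coincides with a Jacobi step for M R = (1-d)/N, so `direct` and `iterative` agree up to rounding.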


[edit] Reference / Matt Cutts

The reference for "Google representatives, such as engineer Matt Cutts, have publicly indicated that the Toolbar PageRank is republished about once every three months, indicating that the Toolbar PageRank values are generally unreliable measurements of actual PageRank value for most periods of the year." is

--Doc z 07:39, 2 September 2006 (UTC)

In the meantime Hankwang was so kind as to add the reference. However, the reference is not shown in the article because there is no references tag in the references section, and adding one would show the new reference in a different style from the other ones. Perhaps someone knows how to add the reference in an appropriate way. --Doc z 17:51, 2 September 2006 (UTC)

[edit] Does span title alter pageranking as it now has two text links?

Here's span title...

[[Wikipedia|<span title="the free encyclopedia">Wikipedia</span>]]
makes...
Wikipedia

Put your mouse over the link and you get an alt text. Does span title alter pageranking as it now has two text links? Anomo 12:59, 10 October 2006 (UTC)

PageRank is just based on the link structure, not on the link text. As long as the link structure is unchanged, PageRank is unchanged. (However, page rank might change even if PageRank is unchanged.) --Doc z 14:31, 10 October 2006 (UTC)
What does "However, page rank might change even if PageRank is unchanged" mean? Anomo 21:53, 11 October 2006 (UTC)
PageRank and page rank are two different things. PageRank is the numerical value calculated by an algorithm from the link structure while page rank is the rank of the page. ("According to Google, PageRank is now just one factor among many other ones, to calculate the rank of a page in results of searches."). Obviously, in the example above the link structure isn't changed by the type of the link. Therefore PageRank is unchanged. However, the rank of the page might change after adding a span title. --Doc z 06:28, 12 October 2006 (UTC)

[edit] PageRank Value used by Google. Integer or Floating Point?

Doc z removed from the article the statement that the Google PageRank used by Google Internally is a floating point number with several digits after the decimal point and the toolbar value (0-10) just a rounded Integer. Please explain. --roy<sac> Talk! .oOo. 09:03, 29 October 2006 (UTC)

Indeed I removed the following section:
"The actual calculated number using the original algorithm is a floating point number with a lot of digits after the decimal point. A site with PR 1 could have an actual PR of 1.298245, while another site with PR1 might only have an actual PR of 1.00123."
"The fact that the toolbar value is rounded to the next whole number is important, because this invisible difference is making a bigger difference the higher the Pagerank™ of a site becomes (a difference of hundreds of thousands of inbound links between two pages with the same displayed PR)." with the reference http://www.cumbrowski.com/CarstenC/articles/20060620_Page_Rank_and_PageRank.asp Page Rank and Pagerank(tm) - Toolbar PR and Actual PR] by Carsten Cumbrowski, June 21st and August 20th, 2006
There are several reasons why I removed this statement:
- PageRank is mixed up with the PageRank shown in the toolbar - the latter is not only the rounded value of PageRank, there is also a logarithmic relation between these two values!
- "a difference of hundreds of thousands of inbound links between two pages with the same displayed PR": The number of inbound links says nothing about the PageRank
- "The fact that the toolbar value is rounded to the next whole number is important, because this invisible difference is making a bigger difference the higher the Pagerank™" This is simply caused by the logarithmic relation between PageRank and the PageRank shown in the toolbar
- The reference is not a valid source to prove your statements! (Moreover, the fact that the value shown in the toolbar is an integer is trivial and needs no reference.)
(- The first statement might suggest that the internal (floating) PageRank is still calculated according to the original algorithm. This claim cannot be proven.)
Please correct the article and remove the 'reference'. --Doc z 11:22, 29 October 2006 (UTC)
Hi Doc, I removed it, because I don't want to start a useless back and forth here. We have better things to do with our spare time, like extending the content of wikipedia :). Anyhow, I am not a mathematician, but I was pretty good at math in school. Most readers of the entry probably have even less math skill than I have. I would keep that in mind. That the toolbar value is a rounded integer value may sound trivial to you, but it may be the only thing in the paragraph that the average person understood. The rounding (logarithmic scale or not) has major ramifications in the real world. It explains why sites that appear to have the same pagerank in the toolbar have a huge difference in number and/or PR of inbound links, and why sites with a difference of 1 in PR might be almost identical when it comes to inbound links (for sites with low PR). I can't find the matrix anymore that illustrated that. That is also implied by the mathematical equation, but not everybody has a calculator handy all the time ;). Just my 2 cents. --roy<sac> Talk! .oOo. 19:42, 29 October 2006 (UTC)
I have no problem with mentioning the fact that the PageRank shown in the toolbar is a rounded integer. However, there is no need to put a reference for this statement in the article. (References should be the original source of facts.) Of course, the rounding has the effect that pages with a higher and a lower PageRank are displayed with the same toolbar value. However, the real PageRank is not known and one cannot draw a conclusion just from the number of inbound links. Also, the important point is the logarithmic scale: it explains the increasing difference for higher PageRank. Just rounding PageRank wouldn't cause this effect. (However, the logarithmic relation was not mentioned in your statement.) --Doc z 20:23, 29 October 2006 (UTC)
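The interplay of rounding and a logarithmic scale argued over in this section can be made concrete with a purely illustrative sketch. Nothing here is Google's actual mapping: the function names, the base 6, the floor value 1.0 and the use of truncation rather than rounding are all assumptions, and the thread itself notes that the real base is not publicly known.

```python
import math


def toolbar_value(real_pr, base, pr_floor):
    """Map a real PageRank to an integer 0-10 on a log scale (illustrative only).

    base and pr_floor are hypothetical parameters; truncation is an assumption.
    """
    if real_pr <= pr_floor:
        return 0
    return min(10, int(math.log(real_pr / pr_floor, base)))


def bucket_bounds(value, base, pr_floor):
    """Range of real PageRank values that this sketch maps to one toolbar value."""
    return pr_floor * base ** value, pr_floor * base ** (value + 1)


BASE = 6.0    # arbitrary placeholder, not a measured value
FLOOR = 1.0   # arbitrary placeholder

same_a = toolbar_value(40.0, BASE, FLOOR)    # 40 and 200 land in the same bucket
same_b = toolbar_value(200.0, BASE, FLOOR)
low_bucket = bucket_bounds(1, BASE, FLOOR)   # (6.0, 36.0)
high_bucket = bucket_bounds(4, BASE, FLOOR)  # (1296.0, 7776.0)
```

With these placeholder parameters, real values of 40 and 200 are displayed identically, and the width of each bucket grows by a factor of `BASE` per toolbar step. That growth comes from the logarithm, not from the rounding alone.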

[edit] Spoofing through HTML-level redirection

Is this really effective? I mean, if a search for a site actually shows results for the site it redirected to, isn't its inflated PageRank useless then? The only purpose of PR is to appear in search engine results. CGameProgrammer 19:42, 4 December 2006 (UTC)

You can remove the redirect after a while - the spoofed PageRank is still shown. This isn't helpful for ranking but for selling/exchanging links because the page looks more 'worthwhile' than it is. --Doc z 07:50, 5 December 2006 (UTC)
OK, so it's only useful for defrauding link exchangers; it doesn't increase its placement in the actual search results. Right? CGameProgrammer 02:04, 8 December 2006 (UTC)
Exactly --Doc z 07:47, 8 December 2006 (UTC)

[edit] Removed Link to IBM Clever

There is no article display when it is clicked on. —The preceding unsigned comment was added by 67.164.65.12 (talk) 19:14, 3 January 2007 (UTC).

Corrected link. --Macrakis 19:42, 3 January 2007 (UTC)

[edit] Definition semantics in "Simplified PageRank algorithm"

PageRank#Simplified_PageRank_algorithm What is the sense behind boldfacing (strong) the capital letters in this part?! It does not make any semantic sense, or is there something I do not understand? The capital letters are visible enough because they are capital letters. Ento 16:36, 6 January 2007 (UTC)

[edit] SVG image now available

{{Editprotected}} Can someone please replace Image:Linkstruct2.GIF with Image:Linkstruct2.svg? (I needed Inkscape and I thought I'd learn to use it by doing something useful.) The article is currently protected or I'd do it myself. --Was Once 16:00, 4 May 2007 (UTC)

Done. Adambro

[edit] External Links

I am moving this here from my talk page so all can participate. Jehochman Talk 13:02, 7 June 2007 (UTC)

Can you explain to me why you removed the link from the PageRank article? It doesn't violate WP:EL and it was there for more than a year. It explains the mathematics of the algorithm (e.g. Jacobi iteration). You said "Please discuss on the talk page" - I discussed the insertion of this link more than a year ago. However, the link was removed without any discussion. --Doc z 11:32, 7 June 2007 (UTC)

This external link might be better used as a reference at an appropriate place in the article. Additionally, I took a look at your edit history, and you seem to focus almost exclusively on edits to PageRank, Hilltop and TrustRank. I see that you delete a lot of linkspam, which is great, but you also seem to defend a few preferred links against removal. Keep in mind that if you are adding links to your own site or works, this could be a conflict of interest. I think your link might be appropriate in some instances, so if you address this issue directly, I will do my best to see that your link gets fair consideration. Jehochman Talk 13:02, 7 June 2007 (UTC)
Indeed, most of my edits in the English part of Wikipedia are done for the PageRank, Hilltop and TrustRank article. However, most of my edits are done in the German part of Wikipedia (e.g. started Hexatische Phase or completely rewritten the PageRank article).
In my opinion there are three good articles which explain PageRank in detail:
While Franz Embacher's page is only in German, the other two websites are also translated into English. As explained above ("Mathematical points to clarify/add"), there are a lot of points to clarify in this article (e.g. Jacobi iteration, linear set of equations) which are explained almost correctly on these websites. I changed the German Wikipedia article - unfortunately, I'm not a native speaker, thus I don't want to correct this article. Therefore, I added the link (a long time ago). In my opinion one should add one of these websites to the Link section because it's useful for the reader. --Doc z 13:33, 7 June 2007 (UTC)
No problem, can you suggest how this article should be changed, and we will attempt to make those changes and strongly consider citing your recommended source as a reference. (Big Smile!) Jehochman Talk 13:52, 7 June 2007 (UTC)
OK, I'll make some suggestions here in the next few days. --Doc z 14:11, 7 June 2007 (UTC)

[edit] Reorganization

I did some simple reorganizing of headings & sections of the article to try to create a little more semantic and logical flow.... --ZimZalaBim (talk) 21:05, 13 July 2007 (UTC)

When reorganising the article one might change some mathematical points as mentioned before.
Moreover, the statement "Google assigns a numeric weighting from 0-10 for each webpage on the Internet; this PageRank denotes your site’s importance in the eyes of Google. The scale for PageRank is logarithmic like the Richter Scale and roughly based upon quantity of inbound links as well as importance of the page providing the link." is incorrect.
PageRank is a floating-point number that is not limited to a fixed range; only the PageRank displayed in the toolbar is an integer between 0 and 10. However, the real PageRank is used in the ranking algorithm, not the toolbar value. Also, the PageRank shown in the toolbar is older than the real one. The toolbar PageRank is indeed based on a logarithmic scale like the Richter Scale. However, I wouldn't compare them because there are several differences: the Richter Scale has no limits and can be either positive or negative, while the toolbar value is between 0 and 10. The measurements on the Richter Scale are floating-point numbers, while the values shown in the toolbar are integers. --Doc z 10:32, 2 August 2007 (UTC)
Are you suggesting that the "actual" PageRank value is not between 0 and 10? I think the problem is that we have little data about what "actual" PageRank values look like - unless there are citations out there with more accurate descriptions? --ZimZalaBim talk 13:53, 2 August 2007 (UTC)
I'm saying that one should carefully distinguish between PageRank (which is referring to the real value) and the value shown in the toolbar.
There is a real PageRank PR(p_i) \ge (1-d)/N , i.e. a floating-point number which is greater than or equal to (1-d)/N (the value of a page with no incoming links) and which is not limited to a 0-10 scale. All formulas in the article refer to the real PageRank. These values are calculated in the iteration process. Google uses these values for its ranking algorithm.
There is also a value shown in the toolbar. This value is an integer between 0 and 10. It is related to the real PageRank through a logarithmic scale. However, the toolbar value is - in contrast to the real one - updated only every few months. The toolbar value (and the directory value) are the only pieces of information which are shown in public. However, internally they don't have any meaning. --Doc z 16:15, 2 August 2007 (UTC)
Doesn't the Toolbar section under "PageRank variations" make this differentiation apparent? There is the mathematical pagerank, and then the Toolbar version, which, by way of its placement in the article, is clearly a variation of that. --ZimZalaBim talk 18:41, 2 August 2007 (UTC)
Yes, the toolbar section explains the difference. However, the paragraph "Google assigns a numeric weighting from 0-10 for each webpage on the Internet; this PageRank denotes your site’s importance in the eyes of Google. The scale for PageRank is logarithmic like the Richter Scale and roughly based upon quantity of inbound links as well as importance of the page providing the link." is incorrect. Google assigns a numeric weighting - the real PageRank - to every page. This PageRank denotes the site’s importance in the eyes of Google. Information about the PageRank can be taken from the value shown in the toolbar which is an integer between 0 and 10 and which displays the real PageRank on a logarithmic Scale. --Doc z 20:25, 2 August 2007 (UTC)

[edit] Removing smiley image

I think the smiley image Image:PageRank-hi-res.png is silly and unprofessional, and adds little to the article given the more informative and accurate Image:Linkstruct2.svg already included. I suggest we delete the smiley faces. Thoughts? --ZimZalaBim talk 20:20, 19 September 2007 (UTC)

We should keep them for the sake of kawaii. There's nothing wrong with a little fun. - Jehochman Talk 20:24, 19 September 2007 (UTC)
Should we place a Hello Kitty logo on each page too? :) Cuteness is for kids bedrooms, not encyclopedias. --ZimZalaBim talk 00:08, 20 September 2007 (UTC)
The colorful diagram is easier to read and understand than the boring mathematical one. Remember, we are trying to be accessible to the masses. - Jehochman Talk 00:23, 20 September 2007 (UTC)
Good point. --ZimZalaBim talk 00:46, 20 September 2007 (UTC)
Correct me if I'm wrong: but doesn't a page without any links to it have PageRank 0? So why are those little green smileys smiling? --345Kai (talk) 20:18, 24 November 2007 (UTC)
Aha, maybe it's due to the "dampening", that pages without a link to them can have non-zero PageRank? Maybe it should be clearly written in the image captions whether or not dampening is used?--345Kai (talk) 20:27, 24 November 2007 (UTC)

[edit] PageRank and Wikipedia

Is there a way to get the PageRank of a specific Wikipedia page, to compare their popularity? -- Piotr Konieczny aka Prokonsul Piotrus | talk  21:35, 19 October 2007 (UTC)

Self-reply: yes.-- Piotr Konieczny aka Prokonsul Piotrus | talk  21:46, 19 October 2007 (UTC)

[edit] To do: Explain N/A

What does N/A mean in the Page Rank? This article should explain it. —Preceding unsigned comment added by 85.53.31.31 (talk) 18:23, 4 November 2007 (UTC)

[edit] A PageRank of 0.5 does not mean what the wiki page says it means

This statement: "Hence, a PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PageRank" is nonsensical, and should be deleted. A PageRank of 0.5 has nothing to do with the probability of selecting a certain link. No page has a PageRank of 0.5 anyway, since the sum of all the PageRanks of the billions of web pages is 1.0. 128.227.35.31 19:31, 6 November 2007 (UTC) Tim Davis, Univ. of Florida http://www.cise.ufl.edu/~davis

PageRank depends on the normalization. There are two common normalizations: one is chosen in such a way that the total PageRank is one, the other in such a way that the total PageRank equals the number of pages. If the total PageRank is normalized to 1, then a PageRank of 0.5 means that there is a 50% chance that a random surfer visits this page. Therefore, the statement "clicking on a random link" might be corrected in this way. Of course, in systems with billions of pages a page won't have a PageRank of 0.5, but for systems with four pages (as in the first example) it's a realistic value.
By the way, the PageRank for the whole system using the normalization P\!R_i = \frac {1-d} {N} + d \, \sum_{\forall j \in \{(j,i)\}} {\frac {P\!R_j} {C_j}} is only one if there are no dead ends, i.e. no pages without outgoing links. For real systems with billions of web pages these dead ends lead to a reduction of the total PageRank. For example, a system of two pages which link to each other leads to a PageRank of 0.5 for each of the pages (and a total PageRank of 1), while a system of two pages with no links leads to (1-d)/2 for each of the pages (and a total PageRank of 1-d). The PageRank in systems with dead ends is proportional to the probability that a random surfer visits this page (the factor is the total PageRank), but it isn't the probability itself.
--Doc z 15:28, 7 November 2007 (UTC)
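The two-page examples in this comment are small enough to check directly. Below is a minimal sketch of the recursive formula with the (1-d)/N normalization; the function name `pagerank` and the page labels P and Q are illustrative choices, not taken from the article.

```python
def pagerank(links, d=0.85, iterations=200):
    """Iterate PR_i = (1 - d)/N + d * sum_j PR_j / C_j; dead ends pass on nothing."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}
        for j, targets in links.items():
            for i in targets:
                new[i] += d * pr[j] / len(targets)
        pr = new
    return pr


d = 0.85
mutual = pagerank({"P": ["Q"], "Q": ["P"]}, d)  # two pages linking to each other
isolated = pagerank({"P": [], "Q": []}, d)      # two pages with no links at all

total_mutual = sum(mutual.values())      # total PageRank 1
total_isolated = sum(isolated.values())  # total PageRank 1 - d
```

In the first system each page ends up with 0.5; in the second each page keeps only (1-d)/2, so the total drops to 1-d, as described above.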
Well, Mr. Davis's comment was NOT about normalization. He was using the "probability normalization" where all pages of the internet together have page rank of 1. (Click on a link and the probability is 1 that you end up on some web page.) And it is, indeed, true that if a page has PageRank 0.5, then the likelihood that you end up there after clicking a random link is 0.5. It is just a little bit counterintuitive, which has to do with the fact (that Mr. Davis himself mentions), that no page on the internet gets anywhere near a page rank of 0.5. So the statement IS correct, albeit confusing, because unrealistic.--345Kai (talk) 20:16, 24 November 2007 (UTC)
I know that Mr. Davis's comment wasn't about normalization. However, you cannot say anything about the meaning of a PageRank value without saying something about the normalization used. Also, the statement "a PageRank of 0.5 means that there is a 50% chance that a random surfer visits this page" is better than "means there is a 50% chance that a person clicking on a random link will be directed to the document" or "the likelihood that you end up there after clicking a random link is 0.5". The first process is well-defined (the limit of an infinite number of steps / equilibrium) while "clicking a random link" might be misinterpreted (e.g. "starting on a random link and then clicking on a random link" - this is not what the PageRank means). --Doc z (talk) 15:32, 25 November 2007 (UTC)
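The "random surfer in equilibrium" reading can be illustrated with a small simulation. This is a sketch under stated assumptions: the four-page link structure is hypothetical and has no dead ends, the surfer follows a random link with probability d and otherwise jumps to a uniformly random page, and the names `simulate_surfer` and `pagerank` are made up for this example.

```python
import random


def pagerank(links, d=0.85, iterations=200):
    """Iterate PR_i = (1 - d)/N + d * sum_j PR_j / C_j."""
    pages = sorted(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}
        for j, targets in links.items():
            for i in targets:
                new[i] += d * pr[j] / len(targets)
        pr = new
    return pr


def simulate_surfer(links, d=0.85, steps=200_000, seed=0):
    """Fraction of time a random surfer spends on each page (no dead ends allowed)."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        if rng.random() < d:
            current = rng.choice(links[current])  # follow a random outgoing link
        else:
            current = rng.choice(pages)           # jump to a random page
    return {p: v / steps for p, v in visits.items()}


# Hypothetical four-page web in which every page has an outgoing link.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
exact = pagerank(links)
freq = simulate_surfer(links)
```

Over a long walk the visit frequencies in `freq` approach the values in `exact`, which is the equilibrium interpretation rather than the outcome of a single click on a single link.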

[edit] Removed the pic with the smileys

A graphical representation of a web of links between sites used for PageRank calculations.

This is the image I removed. I removed it NOT because I don't like the cute smileys (I do), but rather because it is mathematically wrong. I have actually done the calculation of the PageRanks for this network, and the result is the following (out of 100): the green smiley in the upper left has PageRank approx 3.3, the big yellow smiley has 38.4, the red smiley in the upper right has 34.3, the red smiley on the left has 3.9, the blue smiley 8.1, the green smiley on the right 3.9 (the same as the red smiley) and the 5 small green ones at the bottom have 1.6. (This is with the "standard" dampening factor of 85%). The biggest problem is that the red smiley on the right is much too small, it should be much bigger than the blue one (which is too big). Without dampening, all PageRank accumulates at the two sinks: one sink consisting of the green smiley on the left, and the two-smiley sink containing yellow and red. So this pic doesn't work at all well without dampening. Does someone know how to edit the pic to adjust the sizes of the smileys? It would be great to put it back with the correct relative sizes. It would illustrate nicely how a page such as the upper right red one beats the blue one in the middle because the one link coming into it is so much higher valued than the six links coming into blue. --345Kai (talk) 22:35, 24 November 2007 (UTC)


I modified the picture slightly in such a way that the sizes are almost correct now. Hopefully, someone might add it to the article (and mention that this is an example for PageRank including a damping factor).
Of course, damping is needed for the given linking structure to get 'reasonable' results. This is the same as for the internet, where 'dead ends' and 'maple leaves' lead to problems (e.g. degenerate eigenvalues) if no damping factor is used. To get 'reasonable' results without a damping factor one has to modify the linking structure in such a way that every page can be reached from every other page. BTW, a page with no incoming links has a PageRank of (1 − d) / N, as can be seen from the formula given above. --Doc z (talk) 13:01, 25 November 2007 (UTC)
Mathematical PageRanks (out of 100) for a simple network. Dampening of 85% is assumed.
Here's my version of the original example. I like this network better, because it shows that a web page can have a higher PageRank, even if it has few links to it. I'm considering putting it up on the main page. Maybe even as the main picture?--345Kai (talk) 23:52, 25 November 2007 (UTC)

[edit] New Graphic

I've replaced the main graphic. Sorry, it's only a jpg, I just can't find a program that will do vector graphics for free on the mac. The advantages over the old pic are: it's colorful, it includes damping, it shows several features: a high ranked page with few links to it, and a sink, which is assumed to link everywhere. I hope the caption isn't too long, but I thought it's good to get the main ideas across without people having to read all the formulas coming later on in the page. Hope you like it. I moved the other graphic lower on the page.--345Kai (talk) 00:08, 26 November 2007 (UTC)

I would replace the new graphic with the last smiley image for the following reasons:
  • There is already an image showing how PageRank works in a mathematical way. Therefore, I would prefer a funny and easy to understand smiley image.
  • The normalisation used in the new image is none of those explained in the text, normally used, or used in the other image.
  • The explanation is too long. The relation between PageRank and values given in the Google's toolbar isn't necessary for understanding and is already explained in the main text. The statement "Page A is assumed to link to all pages in the web, because it has no outgoing links" is wrong.
--Doc z (talk) 09:02, 26 November 2007 (UTC)
Why do you think the statement "Page A is assumed to link to all pages in the web, because it has no outgoing links" is wrong? The same piece of information was added into the main article on 2 February 2006 by User:Michael Martinez and has been in there since. Btw, 345Kai, I like your new image and I'm working on converting it into the SVG format. —ZeroOne (talk / @) 10:47, 26 November 2007 (UTC)
I finished the SVG-image, it's shown on the right.
The image in the SVG-format, unfortunately showing some librsvg bug.
Sadly, it shows some librsvg bug and someone with more experience has to fix it. —ZeroOne (talk / @) 14:42, 26 November 2007 (UTC)
The statement "When calculating PageRank, pages with no outbound links are assumed to link out to all other pages in the collection." is incorrect too. Dead ends only cause problems if no damping factor is introduced. (In that case the PageRank calculation corresponds to an eigenvalue problem.) However, overcoming those problems (as well as degenerate eigenvalues and so on) was one reason for introducing a damping factor. (In this case the PageRank calculation corresponds to solving a linear system of equations. You can show that, independently of the linking structure, there is always a unique solution.) Thus dead ends cause no problems in this case (and both statements refer to the calculation including the damping factor). Adding outbound links in the way described changes not only the PageRank values but also the relative weights. (Adding these links just guarantees that the PageRank of the whole system is one when using the normalization P\!R_i = \frac {1-d} {N} + d \, \sum_{\forall j \in \{(j,i)\}} {\frac {P\!R_j} {C_j}}. By the way, introducing these additional outbound links doesn't solve all problems for the case without a damping factor. Also, the original way to deal with dead ends was to leave them out of the calculation and add them back after the calculation for the rest of the system was done.) --Doc z (talk) 13:07, 26 November 2007 (UTC)
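To make the point about dead ends concrete, here is a minimal Python sketch of the formula quoted above. The four-page graph, the helper name pagerank_linear, and the damping factor d = 0.85 are assumptions chosen only for illustration; they are not taken from the article or from either image. The sketch iterates the formula to its fixed point twice: once leaving the dead end alone, and once treating it as linking to all other pages.

```python
def pagerank_linear(links, d=0.85, fix_dead_ends=False, iters=200):
    """Iterate PR_i = (1-d)/N + d * sum_j PR_j / C_j to its fixed point.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(links)
    n = len(pages)
    out = {p: list(links[p]) for p in pages}
    if fix_dead_ends:
        # Treat a page with no outbound links as linking to all other pages.
        for p in pages:
            if not out[p]:
                out[p] = [q for q in pages if q != p]
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - d) / n for p in pages}
        for j in pages:
            for i in out[j]:
                new[i] += d * pr[j] / len(out[j])
        pr = new
    return pr


# Hypothetical graph: page "A" is a dead end (no outbound links).
graph = {"A": [], "B": ["A", "C"], "C": ["B"], "D": ["B", "C"]}

plain = pagerank_linear(graph)
fixed = pagerank_linear(graph, fix_dead_ends=True)
print("total without extra links:", sum(plain.values()))
print("total with extra links:   ", sum(fixed.values()))
print("A/B ratio:", plain["A"] / plain["B"], "vs", fixed["A"] / fixed["B"])
```

Under these assumptions the iteration converges in both cases because d < 1. The total stays below one when the dead end is left alone, equals one when the extra links are added, and the ratio between pages A and B differs between the two runs.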
So, what would be the correct values, then? It is easy to change them in the svg-image... if the image would just render correctly. —ZeroOne (talk / @) 18:12, 28 November 2007 (UTC)
The correct values (taking the formula given above, which is the same as the one given in the article) are: A \approx 0.0276, \ B \approx 0.3242, \ C \approx 0.2892, \ D = F \approx 0.0330, \ E \approx 0.0682, \ \mathrm{Other} \approx 0.0136  . However, my opinion is that one mathematically correct example is enough and it's better to take the smiley image. --Doc z (talk) 20:18, 28 November 2007 (UTC)
In my opinion the current mathematical image fails because of its lack of visual clarity. What I mean by this is that you really have to read and remember all the numbers, because the items are all the same size. I find the new images with items of different sizes much more illustrative and more readable than the smiley image, which doesn't have any numbers at all but does have other unnecessary details, namely faces and hands. Human brains are hardwired to notice faces first, when in fact in this image it's the relations that are important. —ZeroOne (talk / @) 22:20, 28 November 2007 (UTC)
Indeed, the new image is better than the old mathematical one. However, if you just want to get an impression of how PageRank works, the smiley image is the better one in my opinion, because you get a faster impression of the relations and the linking structure. --Doc z (talk) 10:27, 29 November 2007 (UTC)

I disagree, I think the new image is more clear and thus gives a faster impression. I'm not saying the new image should replace the old mathematical one, just the smiley-version. The mathematical one could perhaps be improved, though. —ZeroOne (talk / @) 22:27, 29 November 2007 (UTC)

Someone has to fix svg

I finished the SVG-image, it's shown on the right.

The image in the SVG-format, unfortunately showing some librsvg bug.

Sadly, it shows some librsvg bug and someone with more experience has to fix it. —ZeroOne (talk / @) 14:42, 26 November 2007 (UTC)

What tool did you use? Daniel.Cardenas (talk) 19:43, 27 November 2007 (UTC)
Dia first and then Inkscape for post-processing. I have even validated the file with the W3C Markup Validation Service and it passes the test with no warnings. Somewhat interesting is that Inkscape and Firefox render the file correctly while Opera and Wikimedia both fail in different ways. —ZeroOne (talk / @) 00:11, 28 November 2007 (UTC)
Works fine for me in I.E. and firefox, that is about 99% of the users. Perhaps this is a bug in wikimedia? Daniel.Cardenas (talk) 03:27, 28 November 2007 (UTC)
Yes, there are some bugs in the librsvg-module that is used to render the svg-files here. A list of affected images can be found at commons:Category:Pictures showing a librsvg bug. —ZeroOne (talk / @) 09:36, 28 November 2007 (UTC)

Some clarification needed on why the pagerank is an eigenvector

The article describes the page rank via the equation:

The PageRank values are the entries of the dominant eigenvector of the modified adjacency matrix. This makes PageRank a particularly elegant metric: the eigenvector is

\mathbf{R} =
\begin{bmatrix}
PR(p_1) \\
PR(p_2) \\
\vdots \\
PR(p_N)
\end{bmatrix}

where R is the solution of the equation


\mathbf{R} =

\begin{bmatrix}
{(1-d)/ N} \\
{(1-d) / N} \\
\vdots \\
{(1-d) / N}
\end{bmatrix}

+ d

\begin{bmatrix}
\ell(p_1,p_1) & \ell(p_1,p_2) & \cdots & \ell(p_1,p_N) \\
\ell(p_2,p_1) & \ddots &  & \vdots \\
\vdots & & \ell(p_i,p_j) & \\
\ell(p_N,p_1) & \cdots & & \ell(p_N,p_N)
\end{bmatrix}
\mathbf{R}

but most students reading this will not understand why this is a Markov eigenproblem, because of the constant vector added on the right-hand side—it doesn't look like an eigenproblem (but it is).

It is fairly easy to explain this, however, and is important for understanding what is going on and why it works (in particular, why there is a unique solution for R). Here is a simple explanation; feel free to adapt it into the article.

Write the equation above as:

\mathbf{R} = \frac{1-d}{N} \mathbf{u} + d A \mathbf{R}

where u is the N-component vector of 1's, and A is the normalized adjacency matrix. The key to realizing that this is an eigenproblem is the fact that R is interpreted as a probability vector, and is therefore normalized: the sum of the components of R equals one, or equivalently

\mathbf{u}^T \mathbf{R} = 1

This means that we can write the pagerank equation as an obvious eigenproblem:

\mathbf{R} = \frac{1-d}{N} \mathbf{u} \mathbf{u}^T \mathbf{R} + d A \mathbf{R} = M\mathbf{R}

where

M = \frac{1-d}{N} \mathbf{u} \mathbf{u}^T + d A

is an N×N matrix and is easily verified to be Markov (its entries are non-negative and the columns sum to 1).
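For anyone who wants the column-sum check spelled out, here is a short sketch. It assumes that every column of A sums to 1 (that is, u^T A = u^T, which requires that no page is left without outbound links) and uses u^T u = N:

```latex
\mathbf{u}^T M
  = \frac{1-d}{N} (\mathbf{u}^T \mathbf{u}) \mathbf{u}^T + d \, \mathbf{u}^T A
  = (1-d) \, \mathbf{u}^T + d \, \mathbf{u}^T
  = \mathbf{u}^T
```

So each column of M sums to 1, and for 0 ≤ d ≤ 1 every entry of M is non-negative because both terms are.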

Moreover, because of the u u^T term (a rank-1 matrix of all 1's), all of the entries of M are strictly positive when d < 1. In this case, there is a well-known theorem (Perron-Frobenius; see any linear algebra text, e.g. Intro to Linear Algebra by Strang) that M has only one λ=1 eigenvector, up to scaling, and that λ=1 is the largest-magnitude eigenvalue. Thus, the process of repeatedly multiplying by M converges to a unique steady-state page rank vector for any random starting vector with components that sum to 1.

I hope you include some discussion along these lines; it is really confusing to call it an eigenvector of a Markov process if you don't write down the eigenproblem or the Markov matrix.

(Of course, in practice you would not compute M explicitly, because it is a huge dense matrix. But you can still see the iteration process as essentially the power method for computing the eigenvector of M that belongs to its largest-magnitude eigenvalue.)

—Steven G. Johnson
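The power-method remark above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not how any real search engine does it: the function name power_iteration, the dict-of-columns storage, the three-page graph, and d = 0.85 are all hypothetical. The point is that the rank-1 term of M is applied through the sum u^T R, so the dense matrix M is never built.

```python
def power_iteration(a_cols, d=0.85, start=None, tol=1e-12, max_iter=1000):
    """Power method for M = (1-d)/N * u u^T + d * A without building M.

    a_cols[j] is a dict {i: A[i][j]} holding the nonzero entries of column j
    of the normalized adjacency matrix A (each column is assumed to sum to 1).
    """
    n = len(a_cols)
    # Any start vector whose components sum to 1 will do.
    r = list(start) if start is not None else [1.0 / n] * n
    for _ in range(max_iter):
        total = sum(r)  # u^T r, which stays at 1 up to rounding
        new = [(1.0 - d) / n * total] * n  # rank-1 term, applied in O(N)
        for j, col in enumerate(a_cols):
            for i, a_ij in col.items():
                new[i] += d * a_ij * r[j]  # sparse term d * A * r
        if sum(abs(x - y) for x, y in zip(new, r)) < tol:
            return new
        r = new
    return r


# Hypothetical graph: page 0 links to 1 and 2, page 1 links to 2,
# page 2 links to 0. Column j lists where page j sends its weight.
cols = [{1: 0.5, 2: 0.5}, {2: 1.0}, {0: 1.0}]

r_uniform = power_iteration(cols)
r_skewed = power_iteration(cols, start=[1.0, 0.0, 0.0])
print(r_uniform)
print(r_skewed)
```

Both starting vectors should end up at the same steady-state vector, which is what the uniqueness argument above predicts for d < 1.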

Check Pagerank Explanation (External)

I added an external link explaining how to check a web page's PageRank. The link keeps a neutral point of view and doesn't contain any promotional script or service. It describes the method using the Google Toolbar.

SDSandecki (talk) 01:14, 27 February 2008 (UTC)

test

test —Preceding unsigned comment added by 72.54.50.130 (talk) 17:06, 11 April 2008 (UTC)