Talk:PageRank
Opening comments
talk:Wikipedia Announcements says that wikipedia has a PageRank of 0.7. How does one discover this PageRank?
- If you have a Google toolbar add-on to your browser the PageRanks are indicated within it.
- Since the Google toolbar is Windows/IE only, another trick is to search the Google Directory.
- But there is a Google toolbar for Firefox, although I don't know if it's just for Win...
- You can also go to http://www.mygooglepagerank.com and put in the url for which you want the rank.
More
A common theory for why this is so is that Wikipedia is very interconnected, with each article having many internal links from other articles, which in turn have links from many other sites on the Web pointing to them. Compared to Wikipedia and similar high-quality, content-rich sites, the rest of the World Wide Web is relatively loosely connected.
This cannot be correct. Simply being more tightly woven with links cannot increase the total flux of links, which determines the PageRank of pages within a site. More plausible explanations would be:
- Wikipedia has a high ratio of internal site links to external links
- Wikipedia as a whole has a very high PageRank
- Some of Wikipedia's highly ranked competitors (e.g. xrefer) have broken Google indexing of their entries (actually, xrefer is no longer free as in beer at all)
- One of Google's other algorithms (a lot more goes into their rankings than just PageRank) is favouring Wikipedia -- maybe they simply decided "let's boost Wikipedia" :)
--Pde 18:51 17 Jun 2003 (UTC)
I think it probably is correct - depending on how Google seeds its PageRank. In the PageRank paper, they mention two possible seeds: a uniform allocation to all top-level webpages, and a uniform allocation to all pages. The former makes sense, because it bootstraps off the domain name system, making attacks on PageRank costly. (To increase your PageRank, you'd have to buy lots of domain names.) The latter means link farms (which Wikipedia closely resembles ;) get a high PageRank.
In any case, most links are internal, so not much "energy" leaks out. So in either seeding system, wikipedia would do quite well. For a more detailed discussion, see the work by Monica Bianchini and friends.
Clausen 01:33 18 Jun 2003 (UTC)
The link to "PowerPoint HTML presentation by Larry Page" is dead
epsalon 15:22, May 5, 2004 (UTC)
I've deleted the list of sites that have at some point had a high PageRank. It's subjective, highly non-comprehensive, will never be up to date, and we've got external links to sites that do the same, only better. -- ALargeElk | Talk 10:15, 28 Jul 2004 (UTC)
Add a link to my paper: The Cost of Attack of PageRank?
Hi all,
Does anyone want to add a link to my work on PageRank:
members.optusnet.com.au/clausen/reputation
I think adding it myself would be too much self-promotion. Contributions: survey of all the different PageRank formulas around (lots of mistakes!), proof of convergence, analysis of cost of attack. So, if you think the interested reader should know about it, add it yourself...
--Clausen 21:50, 20 Sep 2004 (UTC)
How many?
If the "PageRank" as shown in the Google toolbar is 5/10, is there a way to figure out how many of the currently 8,058,044,651 pages link to that page?--Jerryseinfeld 22:32, 3 Jan 2005 (UTC)
No. It could have one link from a page with PageRank of ~6, or it could have 100000 links from pages of PageRank 1. It also depends on how many outgoing links it has. (Outgoing links reduce PageRank.) Read my thesis! http://members.optusnet.com.au/clausen/reputation --Clausen 12:25, 29 May 2005 (UTC)
Contrary to what Clausen posted, it is possible to see how many links a page has with the Google query link:http://www.example.org/page.html -- on the right of the blue bar at the top of the results page it gives an approximate number of links the page has. Scott 18:51, 25 October 2006 (UTC)
expected number of clicks
Removed "This happens to equal t − 1 where t is the expectation of the number of clicks (or random jumps) required to get from the page back to itself." This would be true only if the algorithm were seeded with the unit vector of the page. --Tgr 10:51, 29 May 2005 (UTC)
Actually, you can interpret non-uniform seed vectors as a probability distribution over the web pages the random surfer "jumps" to when it "gets bored". In this interpretation, the "random surfer" interpretation of t still holds. --Clausen 12:24, 29 May 2005 (UTC)
Yeah, I misunderstood the point the article made. Reverted to original. --Tgr 22:51, 29 May 2005 (UTC)
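To make Clausen's reading of the seed vector concrete, here is a minimal sketch in which the seed is treated as the distribution the random surfer jumps to when bored. The toy graph, the function name `pagerank` and the parameter `jump` are assumptions made for illustration; they are not taken from the article or from Google.

```python
def pagerank(links, jump, d=0.85, iters=200):
    """Iterate PR = (1-d)*jump + d*(rank flowing in over links)."""
    pr = dict(jump)  # start from the jump distribution itself
    for _ in range(iters):
        new = {page: (1.0 - d) * jump[page] for page in links}
        for src, outs in links.items():
            for dst in outs:
                new[dst] += d * pr[src] / len(outs)
        pr = new
    return pr

# Toy graph (an assumption): every page has at least one outgoing link.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

# Uniform seed: a bored surfer restarts on any page with equal probability.
uniform = {page: 1.0 / len(links) for page in links}
# Non-uniform seed: a bored surfer always restarts on page D.
bored_goes_to_d = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 1.0}

pr_uniform = pagerank(links, uniform)
pr_seeded = pagerank(links, bored_goes_to_d)
print(pr_uniform)
print(pr_seeded)
```

In both runs the values still sum to 1, so each result can be read as a probability distribution over pages; only the place the surfer restarts from differs.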
Google PageRank Checkers
I note that someone recently added a link to a web-based Google PageRank Checker, and... I, personally, think such links ought to be banned from the parent article. For one, since the source code is easily available and more than a few checkers already exist, linking to them constitutes little more than an attempt at vanity, imho. For two, if anything about web-based Google PageRank checkers is to be mentioned at all, I think the related link ought to be to the source code - not to some script that someone wrote just to promote themselves. TerraFrost 05:04, 13 October 2005 (UTC)
- I agree. Also wouldn't mind seeing PageRank Explained For Idiots and A good list of pagerank 10 sites go. Aapo Laitinen 16:10, 13 October 2005 (UTC)
- I removed one page rank checker today (along with a cull of other external links a few days ago). We still have a lot of external links here that I think could go.
Warning
I've noticed this page gets link spam a lot. Would an HTML comment help things at all?--Eugman
What is patented? What is public knowledge?
The article says, near the beginning, that "The exact details of this scale are not public knowledge", and "The PageRank process has been patented (U.S. Patent 6,285,999). The patent is not assigned to Google but to Stanford University."
However, later on the article explains in detail how the pagerank is calculated. As a joe reader I ask myself: what is patented? How can this not be public knowledge if it's described below?
Is it possible to state clearly if the "PageRank process", which is patented, is something different than the "PageRank algorithm", which seems public knowledge, and where the differences are?
From what I've read, we know the basic idea behind it (this has been patented by Stanford), but Google has undoubtedly changed some stuff around and we don't know the details of that. --Eugman 17:03, 14 January 2006 (UTC)
It should be noted that there are numerous derivations of the PageRank algorithm which have been published in various academic papers (a search on scholar.google.com for expressions including or related to pagerank will reveal many of these papers, or their abstracts). It is my understanding (but I am no patent attorney) that the patent applies to the ranking mechanism itself, and not to specific algorithms derived from or based upon it (but that those algorithms may be governed by the patent to some extent).
Yahoo! has invested a great deal of resources into researching and refining PageRank in conjunction with Stanford University. It should be understood that PageRank, while closely associated with Google in the public view (or at least the general SEO community view), is being or has been adapted or studied by more than one search service. Michael Martinez 19:36, 14 February 2006 (UTC)
bobmutch edits
Hi all. I am an SEO consultant and write on PageRank. I have added a "Real PageRank" and a "Google Directory PageRank" section to the PageRank entry. I have also added a number of quality links to the "External Links" section. I am open to feedback on any of the edits I have made. I plan on making some more edits to this section and adding some more links in the future.--bobmutch 14:55, 11 February 2006 (UTC)
Please don't do that again. Your changes should be discussed here first, as you introduced a lot of confusing information that doesn't bring any value to the article. It looks like you threw in a lot of undesirable links, too (such as the one to Phil Craven's paper, which does a very poor job of explaining PageRank). Michael Martinez 04:19, 13 February 2006 (UTC)
I notice today that someone posting from an Australian IP address added two commercial links to the External Links section and also removed the name of Bob Mutch's company (seocompany.ca) from his comment above. Michael Martinez 19:32, 14 February 2006 (UTC)
I think Markus's, Ian's, Phil's, Chris's and Wakfer's articles on PageRank are good articles. I would like to see the links added to the links section.
- A Survey of Google's PageRank by Markus Sobek
- The Google Pagerank Algorithm and How It Works by Ian Rogers
- Google's PageRank Explained by Phil Craven
- PageRank Uncovered by Chris Ridings and Mike Shishigin (PDF)
- Google PageRank, & How to Get It by Bob Wakfer
I also would like to add the following paragraph on Google Directory Pagerank.
Google Directory PageRank
Google Directory PageRank is an 8-unit measurement. These values can be viewed in the Google Directory. Unlike the Google Toolbar, which shows the PageRank value in a popup, the Google Directory doesn't show the PageRank values as numbers. You can see the PageRank scale values by looking at the source and wading through the HTML code.
These 8 positions are displayed next to each website in the Google Directory. cleardot.gif is used for a zero value, and a combination of two graphics, pos.gif and neg.gif, is used for the other 7 values. The pixel widths of the 7 values are 5/35, 11/29, 16/24, 22/18, 27/13, 32/8 and 38/2 (pos.gif/neg.gif).
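The width pairs listed above could be turned into a small lookup, sketched here under some assumptions: the positions are numbered 1 to 7 in the order given, a pos.gif width of 0 stands in for cleardot.gif, and the function name `directory_value` is made up for this example.

```python
# (pos.gif width, neg.gif width) pairs exactly as listed in the proposal.
WIDTHS = [(5, 35), (11, 29), (16, 24), (22, 18), (27, 13), (32, 8), (38, 2)]

def directory_value(pos_width):
    """Map a pos.gif pixel width to a 0-7 Directory position.

    Passing 0 represents the cleardot.gif case (an assumption of this sketch).
    """
    if pos_width == 0:
        return 0
    for value, (pos, _neg) in enumerate(WIDTHS, start=1):
        if pos == pos_width:
            return value
    raise ValueError("unknown pos.gif width: %r" % pos_width)

# Every listed pair adds up to the same total bar width.
total_widths = {pos + neg for pos, neg in WIDTHS}
print(total_widths)
print(directory_value(22))
```

Each pair sums to 40 pixels, which suggests the two graphics together draw one fixed-width bar.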
I am posting these 5 links and 1 section up for discussion for 2 weeks before I go and add them to the PageRank article. If anyone else making additions to the PageRank article has any objections or can see how the Google Directory PageRank section can be made better, please post and let's discuss. --bobmutch 13:43, 01 March 2006 (UTC)
OK, I put this section up for discussion for 2 weeks and then some. Are there any objections to me adding this to the PageRank article? Speak now or forever hold your peace. --bobmutch 18:11, 18 March 2006 (UTC)
I would like to add a page that shows the history of PageRank updates for the last 6 years. Does anyone object to me doing this? Pagerank Update History
--bobmutch 19:00, 23 October 2006 (UTC)
- In my opinion there are two arguments against it. First, seocompany.ca is not the original source of Google's update history. The original (as far as I know) was posted on WebmasterWorld. Unfortunately, access to WebmasterWorld isn't free anymore. Second, there might have been some interest in the PageRank update history years ago, when PageRank, the PageRank shown in the toolbar, backlinks and SERPs were updated at the same time. However, today PageRank is calculated continuously. The benefit of an update history of the PageRank shown in the toolbar (in fact it isn't a history of PageRank updates any more) is negligible. --Doc z 10:11, 24 October 2006 (UTC)
Pagerank 10 sites list link
Someone keeps adding this link back in. The site purports to track Web sites with Toolbar PageRank values of 10, but it is a commercial site offering Web promotion services. The link should not be allowed as it is not significant and clearly violates the "no commercial links" standard which has been enforced for this article. Michael Martinez 19:32, 14 February 2006 (UTC)
Hey Michael,
you should know this link has been here for more than a year; some people occasionally remove it and it automatically gets replaced. Search Engine Genie is an 80% non-commercial site; read the blog section here httpblog. I respect Wikipedia and have worked a lot for its growth. Please don't mistake me: searchenginegenie's link is nowhere else in Wikipedia. We respect this place and never ever thought of inserting our links.
The problem is that the list encourages PR chasing and the presence of the link encourages SEOs to link drop. Neither of which has anything to do with explaining what PageRank actually is or does. That the link has been dropped repeatedly over the past year doesn't make it more vital. Michael Martinez 15:52, 15 February 2006 (UTC)
OK Michael, I appreciate your post and your willingness to keep Wikipedia clean, but I bet you are a long way away from it. There are 1000s of pages spammed across Wikipedia every day; even when I make some major edits, those links are replaced immediately. I just saw your post at seomoz. I am not new to the internet, nor new to blogging or SEO; I have been in this business for more than 4 years now. i feel you are not happy with someone in a forum probably in highrankings forum?? Rand has given a nice place to rant, go on.
In fact this PageRank 10 list was one of the first two lists available on the internet. Looking at our logs, people searching for PageRank 10 are topping the list. We get about 30 uniques just for PR 10 related keyword combinations from Google alone. We also get a lot of traffic from Wikipedia. So people are actually interested in looking at this article, so why do you feel it's irrelevant?
RESPONSE: You feel incorrectly, then, when you say "i feel you are not happy with someone in a forum probably in highrankings forum??" I'm not editing articles here because of anything anyone says or does in an SEO forum.
People who pay attention to the Google Toolbar PR are not only wasting their time, they are chasing what Mike Grehan has aptly called "green fairy dust". Including a link to a PR 10 chase site isn't helping anyone understand the PageRank algorithm. Yes, people are interested in all sorts of SEO myths and nonsense. That doesn't mean every one of those kooky ideas should be advocated by an open source site like Wikipedia.
I'm very critical of Wikipedia precisely because of these kinds of low standards. I do my best to help educate people, but I cannot prevent people from hanging on to bad ideas. PR chasing, fortunately, has taken on a bad reputation over the past couple of years. That's a positive change in the SEO community. Michael Martinez 22:39, 15 February 2006 (UTC)
Response:
People interested in the list of PageRank 10 sites cannot be equated with PR chasers. So you mean everyone looking at Alexa ratings is wrong?? People are always on the lookout for various means to gauge the quality of a page or a site. Google's PageRank is one of them; at least PR8 and above cannot be abused easily. Only very good sites can reach PR9 or PR10, and there are only about 30 unique PR 10 sites out of millions of sites around the web.
Response: First, please sign your posts with your name if you're not going to log in.
Secondly, there is no SEO benefit to looking at a list of pages with PR 10 ratings and the raw numbers of backlinks.
Third, Alexa rankings are as bogus as PageRank. Both can be manipulated and only SEOs are interested in either, and only SEOs who have spent too much time following the wrong forum discussions, reading the wrong FAQs, tutorials, etc.
Neither toolbar PR nor an Alexa ranking is any indication of the quality or value of a page. There are a growing number of academic/technical papers which address the problem of falsely inflated high PR. Look at TrustRank and its follow-on concepts such as spam mass estimation. High PR is just not taken seriously. So there is no value to the PR 10 page. Michael Martinez 04:46, 16 February 2006 (UTC)
I suggest that the list of PR10 pages be added to the list of links. It fosters an understanding of the Toolbar PageRank 0-10 scale by showing how few website home pages or website pages reach PR10. It also shows how the relationship between Real PageRank and Toolbar PageRank is changing.
I also think it would be helpful to write a short paragraph on the relationship between Real Pagerank and Toolbar Pagerank and how this relationship is changing.--bobmutch 13:53, 01 March 2006 (UTC)
- I'm not sure exactly what the point of the above discussion about external links is. Are you saying the article should not link to any site listing PR10 sites? Or that the particular site(s) being added are overly commercial? There is a page here which lists PR10/9 sites and is relatively clean (only a few ads off to the side). Actually it is based on a page that used to be on Wikipedia... would it be okay/useful to add this page? Scott 01:01, 14 March 2007 (UTC)
Mathematical points to clarify/add
There are some points in the mathematical parts which should be changed/added. In the current article PageRank calculation is described as an eigenvalue problem ("The PageRank values are the entries of the dominant eigenvector of the modified adjacency matrix.") - not only in the case of a damping factor of one (q=1) but also for lower values. This point of view is used very often in the context of PageRank calculation. However, the simpler point of view is to see the original PageRank equation as the Jacobi iteration of a system of linear equations. Seen this way, PageRank
- is not defined recursively
- can be calculated analytically, i.e. no iterations and no initial guess are needed. Of course, for larger matrices this isn't practical, but for smaller systems the equations can be solved easily.
- can be computed numerically not only by the Jacobi iteration, which leads to the well-known PageRank formula, but also with other iteration schemes such as minimal residual, Gauss-Seidel, over-relaxation methods, conjugate gradient, preconditioning methods, multigrid and blocking techniques.
Also, there are no problems for pages with no outgoing links. In any case, the Jacobi iteration should be mentioned in the article, as it is the origin of this kind of iteration.
Two other points (minor changes) I would correct/add:
- I would use d instead of q as the damping parameter (as done in the original papers). In any case I would also mention the parameter in the text, e.g. 'The damping factor q is subtracted'
- I would add a short section about modifications, i.e. Personalised PageRank, Topic-Sensitive PageRank ...
I won't correct the article because I'm not a native speaker. However, someone could take this as input and change it.
I will add a link to a page where the mathematics is described in the way I mentioned above, i.e. as a system of linear equations. From my point of view this is much more straightforward, easier to understand and mathematically clearer. Mathematical details can also be taken from this page. So far I haven't seen any other page explaining PageRank in this way.
--Doc z 12:55, 25 March 2006 (UTC)
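As a rough illustration of the linear-system view, the sketch below solves PR_i = (1-d)/N + d * (sum over pages j linking to i of PR_j / L_j) in two ways: with Jacobi sweeps (every update uses the previous sweep's values) and with Gauss-Seidel sweeps (every update uses the newest values available). The toy graph, the tolerance and the function name `solve` are assumptions chosen for this example.

```python
def solve(links, d=0.85, tol=1e-10, in_place=False):
    """Return (values, sweeps). in_place=False is Jacobi, True is Gauss-Seidel."""
    pages = sorted(links)
    n = len(pages)
    inlinks = {p: [q for q in pages if p in links[q]] for p in pages}
    pr = {p: 1.0 / n for p in pages}
    sweeps = 0
    while True:
        sweeps += 1
        # Jacobi reads a frozen copy; Gauss-Seidel reads the values being updated.
        source = pr if in_place else dict(pr)
        change = 0.0
        for p in pages:
            value = (1.0 - d) / n + d * sum(source[q] / len(links[q]) for q in inlinks[p])
            change = max(change, abs(value - pr[p]))
            pr[p] = value
        if change < tol:
            return pr, sweeps

# Toy graph (an assumption): every page has at least one outgoing link.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

jacobi, jacobi_sweeps = solve(links, in_place=False)
gauss_seidel, gs_sweeps = solve(links, in_place=True)
print(jacobi_sweeps, jacobi)
print(gs_sweeps, gauss_seidel)
```

Both schemes approach the same solution of the same equations; they differ only in how the unknowns are updated during a sweep.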
More mathematical points to clarify or add...
- This probability is allowed for through a damping factor
Is that the probability, at any step, that the person will continue, or the probability that he will stop, or something else?
- Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85.
Is "q" supposed to be the damping factor? If so, it should say so. Above, one could rephrase so that instead of saying "the damping factor will be set around 0.85", it would say "the damping factor q will be set around 0.85".
- As Google increases the number of documents in its collection, the initial approximation of PageRank decreases for all documents.
If a million new documents are added and ALL of them link to page "A", does the initial approximation of A's PageRank decrease? Or should it say the AVERAGE PageRank of all old documents decreases, rather than that all of them, unanimously, decrease?
Michael Hardy 23:08, 27 March 2006 (UTC)
- q is the probability that the random surfer follows a randomly chosen link from the current page; (1-q) is the probability that the random surfer re-starts with a randomly chosen page.
- Indeed q is the damping factor. However, normally it is denoted by d.
- Originally, d was set to 0.85. The current value is unknown, but it is expected that Google is using a value around 0.85.
- One should be careful when saying something like "As Google increases the number of documents in its collection, the initial approximation of PageRank decreases for all documents" because it depends on the normalisation. The statement refers to the case of the "1/N" normalisation. Indeed, even for this normalisation the PageRank of a single page might increase when adding new pages, as explained above.
- --Doc z 09:46, 28 March 2006 (UTC)
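The last point can be checked on a toy example: a minimal sketch, assuming the "1/N" normalisation, a three-page web and three newly added pages that all link to A. The graph and the function name `pagerank` are made up for illustration.

```python
def pagerank(links, d=0.85, iters=200):
    """PageRank with the 1/N normalisation (values sum to 1)."""
    n = len(links)
    pr = {page: 1.0 / n for page in links}
    for _ in range(iters):
        new = {page: (1.0 - d) / n for page in links}
        for src, outs in links.items():
            for dst in outs:
                new[dst] += d * pr[src] / len(outs)
        pr = new
    return pr

# Old web (an assumption): nobody links to A yet.
old_web = {"A": ["B"], "B": ["C"], "C": ["B"]}

# New web: three added pages, all of which link to A.
new_web = dict(old_web)
for k in range(3):
    new_web["X%d" % k] = ["A"]

before = pagerank(old_web)
after = pagerank(new_web)
for page in old_web:
    print(page, round(before[page], 5), "->", round(after[page], 5))
```

In this small case A's value rises while those of B and C fall, so "decreases for all documents" is not literally true even with the 1/N normalisation; whether a page gains or loses depends on the graph.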
- Forgive my ignorance, but I don't see how a damping factor works at all. If every page has a damping factor of .85 or 8.5 or 85.0, then every other measure is scaled according to the magnitude defined by d and no ordinal changes occur with any page relative to any other page. This probability that a random surfer will cease surfing, or accounting for pages without links, seems to be a distraction. How will a page with 3 million hits a day like the New York Times, but with few inbound links to its home page nytimes.com, be fairly measured? Google rightly protects its algorithms just as it kept its earnings secret in early years to ward off competitors. So it is not unreasonable to question the math and logic of search.
- Pages don't have a damping factor - just links are damped. Changing the damping factor leads not only to a re-scaling of PageRank (due to the PageRank source of 1-d) but to a different distribution. The PageRank algorithm is published and well-known. Of course, Google changed the algorithm they used for ranking pages, but this article simply describes the original PageRank algorithm. You can question the logic of Google's search but you can't question the math of this algorithm. --Doc z 09:28, 29 August 2006 (UTC)
- Okay, links have damping factors, that part I understand. However if the universal damping factor is consistently .85, then everything is reduced by 15%, yes? How does re-scaling everything downward by 15% cause a redistribution? The distribution would still have the same shape relative to the mean and the first standard deviation, yes? I guess I do question the logic not the math, because why a damping factor is included at all doesn't make sense to me. The probability that a surfer will continue on to another random page after reaching a random page should not have a role in determining the authority of any random, or non-random page, should it? The probability that a surfer will continue to another page after landing on google.com must be about 1000%, since all that's there is a search box, but logic would say an authoritative page would end the surfer's search, a damping effect of 1.00. Thanks for your time in answering this. 192.246.0.76 13:35, 29 August 2006 (UTC)
- 'However if the universal dampening factor is consistently .85, then everything is reduced by 15%, yes?' No - there is a self contribution of every page of 1-d! Changing the self contribution from (1-d) leads to a re-scale of PageRank but not changing the damping factor. For example, in the limit d=0 PageRank is distributed uniformly, i.e. every page has the same PageRank. You can also see that the distribution changes when d changes if you consider the random surfer model. A random surfer randomly chooses a link on the current page with probability d, or a new page (any page, regardless of whether the current page links to it) with probability 1-d. In the case d=1 the surfer always follows a link on the current page, while for d=0 he always randomly chooses a new page (from the whole internet). (Of course, this is just a simple model of user behaviour.) --Doc z 17:55, 29 August 2006 (UTC)
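A small numerical check of this claim, as a sketch on an assumed four-page graph (the graph and the function name `pagerank` are illustrative only): with d=0 every page gets the same value, and changing d changes the ratio between two pages rather than scaling everything by a common factor.

```python
def pagerank(links, d, iters=200):
    """PageRank with the 1/N normalisation for a given damping factor d."""
    n = len(links)
    pr = {page: 1.0 / n for page in links}
    for _ in range(iters):
        new = {page: (1.0 - d) / n for page in links}
        for src, outs in links.items():
            for dst in outs:
                new[dst] += d * pr[src] / len(outs)
        pr = new
    return pr

# Toy graph (an assumption): every page has at least one outgoing link.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

results = {d: pagerank(links, d) for d in (0.0, 0.5, 0.85)}
for d, pr in results.items():
    ratio = pr["C"] / pr["A"]
    print(d, {p: round(v, 4) for p, v in pr.items()}, "C/A =", round(ratio, 4))
```

If changing d were only a re-scaling, the ratio C/A would be the same for every d; here it is not.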
- You said, "Changing the self contribution from (1-d) leads to a re-scale of PageRank but not changing the damping factor." which sort of agrees with what I said, I think. If (1-d) is always (1-.85), then why can it not just always be .085 or 8.5 or 85.0? If d is one number for all instances, then it isn't a variable, by definition, yes? If it isn't a variable, then why is it included at all? If it is a variable, how does Google decide which site gets a .85 and which gets a .01? If it is the probability that a user will select a link on the current page seen vs. not, then are the passages in the main page that talk about the damping factor being a measure of randomness inaccurate? Thanks. 192.246.0.76 19:31, 29 August 2006 (UTC)
- This is the (recursive) PageRank formula
$$PR(p_i) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$
- with
$$\sum_{i=1}^{N} PR(p_i) = 1.$$
- Using a different normalization, for example
$$PR(p_i) = (1-d) + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)},$$
- leads to a re-scaling of PR. However, if you change d from 0.9 to 0.8 the distribution changes. d is a parameter which can be changed, but it doesn't depend on the page. This means that Google might choose d=0.9 for the current calculation of PageRank, while they used d=0.85 some years ago (as an example). In any case, the same damping factor is used for all pages. To get the best value, one calculates PageRank for different d and compares the results. By the way, someone might change the phrase "Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85[citation needed]." in the article. Sergey Brin and Lawrence Page said (The Anatomy of a Large-Scale Hypertextual Web Search Engine. In: Computer Networks and ISDN Systems 1998) that "We usually set d to 0.85". This means that they used d=0.85 in that work. Most people believe that d is still close to that value. (Of course, there is no source for the latter statement.) --Doc z 20:39, 29 August 2006 (UTC)
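The re-scaling point can be sketched numerically. The code below assumes the two common normalisations, a source term of (1-d)/N (values sum to 1) and a source term of (1-d) (values sum to N), on a made-up four-page graph; the function name `pagerank` and its `source` parameter are illustrative.

```python
def pagerank(links, d, source, iters=200):
    """Iterate PR = source + d * (rank flowing in over links)."""
    # Start every page at source/(1-d): 1/N in the first case, 1 in the second.
    pr = {page: source / (1.0 - d) for page in links}
    for _ in range(iters):
        new = {page: source for page in links}
        for src, outs in links.items():
            for dst in outs:
                new[dst] += d * pr[src] / len(outs)
        pr = new
    return pr

# Toy graph (an assumption): every page has at least one outgoing link.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
n = len(links)
d = 0.85

pr_sum_one = pagerank(links, d, source=(1.0 - d) / n)
pr_sum_n = pagerank(links, d, source=(1.0 - d))
for page in links:
    print(page, round(pr_sum_one[page], 5), round(pr_sum_n[page], 5))
```

Every value in the second column is N times the value in the first, so the choice of normalisation changes the scale but not the relative standing of the pages.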
- Thanks for your analysis. I am a bit surprised at the simplicity of even the basic PageRank algorithm. It must be a challenge to implement though, given that N is the total collection of all pages crawled and included by Google. (1-d)/N is a really, really small number. When d=.85, then (1-d)/N is like .15 divided by billions of pages, and that tiny number is computed for all links leading into a page.
- My interest in this problem began when I would key three or four unlikely strings into a search box to pull up a file I KNEW was on the web, and neither Google nor Yahoo! would place the paper at the top of the list. So if you key in a list of researchers who authored a paper, a popular book that cites the paper would land in position 1 and the paper itself wouldn't even make it onto the page. Good luck to us all. 192.246.0.76 15:55, 30 August 2006 (UTC)
7/10 for articles, 3/10 for main page, 5/10 for talk?
Why would it do that for this site? Random the Scrambled 17:55, 29 March 2006 (UTC)
Google PR is extremely hard to figure out and sometimes doesn't act in a particular manner. Thizz 03:58, 13 April 2006 (UTC)
Strange link
The link to the Russian version of this article is VERY strange. :)
Proposal for links
I wonder if articles on how to improve PageRank are valid links here? Booles 21:19, 31 May 2006 (UTC)
PageRank and page rank
These are two different things. Page rank is universal; PageRank is a measure at Google, not an algorithm. The start of the article must be rewritten, but I need some advice before changing it. Booles 06:26, 7 June 2006 (UTC)
- Of course, page rank and PageRank are two different things. But I don't see the need for rewriting the start of the article. Page rank and the relation to PageRank is given here: "According to Google, PageRank is now just one factor among many other ones, to calculate the rank of a page in results of searches." Indeed PageRank is just a measure, however the numerical iterative calculation is an algorithm. --Doc z 19:01, 7 June 2006 (UTC)
Weblinks
I'll remove the weblink Internet: Search Engine: Google: Algorithm: Search: Link: Pagerank: Can you show a simple example? because some examples are incorrect and there are numerous sites explaining PR calculation better (clearly and precisely), e.g. pr.efactory.de --Doc z 12:49, 17 July 2006 (UTC)
Dead link
During several automated bot runs the following external link was found to be unavailable. Please check if the link is in fact down and fix or remove it in that case!
maru (talk) contribs 04:43, 27 July 2006 (UTC)
- This link does work (it is a re-direct, which might have affected the bot). --MichaelZimmer (talk) 23:03, 8 August 2006 (UTC)
Dead link
During several automated bot runs the following external link was found to be unavailable. Please check if the link is in fact down and fix or remove it in that case!
maru (talk) contribs 04:43, 27 July 2006 (UTC)
- This link does work (it is a re-direct, which might have affected the bot). --MichaelZimmer (talk) 23:04, 8 August 2006 (UTC)
Dead link
During several automated bot runs the following external link was found to be unavailable. Please check if the link is in fact down and fix or remove it in that case!
maru (talk) contribs 04:43, 27 July 2006 (UTC)
- In the article, this link actually is [1] without the "accessdate" parameter. As such, it does work correctly. --MichaelZimmer (talk) 23:07, 8 August 2006 (UTC)
Relevant edit at Search engine optimization
There is a content dispute at Search engine optimization that is related to PageRank. See edit here [2], and concern raised in two sections of the talk page: [3], and [4]. Any feedback is welcome. --MichaelZimmer (talk) 23:00, 8 August 2006 (UTC)
Pagerank log base
My addition about the relationship between toolbar PageRank and true PR was reverted with the comment "the information given (log. base and change in 2003) are incorrect/speculative". I agree it is speculative. But the paragraph in the article states (in the reverted version): "Many people assume that the Toolbar PageRank is a proxy value determined through a logarithmic scale." If you keep that phrase in, it makes sense to mention what people are assuming/speculating about this logarithmic scale. I remember having seen better references than the ones I added, but I couldn't find them. They were probably on webmasterworld.com, which is nowadays no longer accessible without subscription. Han-Kwang 13:37, 18 August 2006 (UTC)
- I have seen guesses for the value of the log. base (e.g. on webmasterworld) in the range from 2 up to more than 40. Mentioning a guess of 4.5 in the article doesn't make sense; mentioning the whole range isn't very helpful in my opinion. (By the way, one can prove the log. relationship between toolbar PageRank and real PageRank. It is also possible to measure the log. base - doing this you'll find that the value is not even close to 4.5. Moreover, there was no significant change in the relationship within the last four years.) --Doc z 15:34, 18 August 2006 (UTC)
- It would be very helpful if you incorporated the above into the article, preferably with facts rather than with "it is provable/possible". I do think that estimates, however large the error margin, should be included. Note that Webmasterworld, being subscription-only, is not suitable as a reference. Han-Kwang 15:53, 18 August 2006 (UTC)
- As already said, I would change several points in this article (especially some mathematical ones) but I'm not a native speaker and my English isn't very good. Therefore, I haven't changed them - I just corrected the German article. Of course, I wouldn't mention webmasterworld in the article - it was just an example of the wide range of assumptions for the log base. You can find numerous other guesses on the internet. However, I don't think it's very helpful to mention this because none of these estimates is based on facts and the values vary over a wide range. Finally, even in the German article I didn't mention details about measuring the logarithmic behaviour because (as far as I know) there are no public references for this. --Doc z 20:21, 18 August 2006 (UTC)
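Purely to illustrate what a logarithmic proxy scale would mean, here is a sketch in which each toolbar step corresponds to a further factor of `base` in the real value. The base of 8, the cut-offs, the cap handling and the function name `toolbar_value` are all assumptions of this example; nothing in this thread or from Google confirms any of them.

```python
def toolbar_value(real_pr, smallest_pr, base):
    """Count how many factors of `base` fit between smallest_pr and real_pr, capped at 10."""
    value = 0
    threshold = smallest_pr * base
    while value < 10 and real_pr >= threshold:
        value += 1
        threshold *= base
    return value

HYPOTHETICAL_BASE = 8.0  # illustration only; the real base is not public

# Real values are expressed as multiples of the smallest value on the scale.
for multiple in (1.0, 8.0, 500.0, 8.0 ** 12):
    print(multiple, toolbar_value(multiple, 1.0, HYPOTHETICAL_BASE))
```

Under such a scale, pages whose real values differ by many orders of magnitude are squeezed into eleven steps, which is why the guessed base matters so much and why an unverified guess says little.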
-
[edit] PageRank algorithm including damping factor
- where the adjacency function [...]
could be restated in this form:
- where IN is the Identity_matrix, the adjacency function [...]
Remi Arntzen 04:28, 30 August 2006 (UTC)
-
- Of course, you can rewrite the equation
-
- in the form
-
- The reason why the first equation is used most of the time is that the original work used it as a (recursive) definition of PageRank. But you can also see the second equation as the definition and the first equation as the Jacobi method for solving the matrix inversion numerically. --Doc z 09:23, 30 August 2006 (UTC)
-
-
- Well yes of course I understand that one formula may be the original definition of a process, however the concept I was trying to convey is that I would rather have the direct answer instead of . The two equations are equivalent, where the first equation may be helpful in understanding the way the equation was derived, while the second equation is helpful for those who wish to implement the equation in as short a time as possible for demonstration/learning purposes. Remi Arntzen 20:32, 31 August 2006 (UTC)
-
-
-
-
- I also prefer the second equation (see Mathematical points to clarify/add) and I changed the German article some time ago. As already said, in my opinion the first equation is just a numerical way of doing the matrix inversion (Jacobi method) for the second equation. (There are even better and faster algorithms to do this.) --Doc z 04:40, 1 September 2006 (UTC)
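- The equations in this thread did not survive, so as a rough illustration only, here is a sketch of the equivalence being discussed. It assumes the usual formulation R = (1 - d)/N + d·M·R with a column-stochastic link matrix M, and the direct form (I - d·M)·R = (1 - d)/N · 1. The three-page graph, d = 0.85, and all function names are made up for the example; the iteration coincides with the Jacobi method when no page links to itself.

```python
# Illustrative sketch: the graph, d = 0.85 and every name here are assumptions,
# not taken from the article or from Google's implementation.

def link_matrix(links, n):
    """Column-stochastic M: M[i][j] = 1/outdegree(j) if page j links to page i."""
    m = [[0.0] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            m[i][j] = 1.0 / len(targets)
    return m

def pagerank_iterative(m, d=0.85, sweeps=200):
    """Recursive form: repeatedly apply R <- (1 - d)/N + d * M R."""
    n = len(m)
    r = [1.0 / n] * n
    for _ in range(sweeps):
        r = [(1.0 - d) / n + d * sum(m[i][j] * r[j] for j in range(n))
             for i in range(n)]
    return r

def pagerank_direct(m, d=0.85):
    """Direct form: solve (I - d M) R = (1 - d)/N * 1 by Gaussian elimination."""
    n = len(m)
    # Augmented matrix [I - dM | b]
    a = [[(1.0 if i == j else 0.0) - d * m[i][j] for j in range(n)]
         + [(1.0 - d) / n] for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n + 1):
                a[row][k] -= factor * a[col][k]
    # Back substitution
    r = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * r[k] for k in range(i + 1, n))
        r[i] = (a[i][n] - tail) / a[i][i]
    return r

# Hypothetical graph: page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0
links = {0: [1, 2], 1: [2], 2: [0]}
m = link_matrix(links, 3)
r_iter = pagerank_iterative(m)
r_direct = pagerank_direct(m)
```

- On this small graph both routes should agree to within floating-point error, which is the point made above: the recursive form is one numerical way of solving the linear system, and any other linear solver could be swapped in.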
-
-
[edit] Reference / Matt Cutts
The reference for "Google representatives, such as engineer Matt Cutts, have publicly indicated that the Toolbar PageRank is republished about once every three months, indicating that the Toolbar PageRank values are generally unreliable measurements of actual PageRank value for most periods of the year." is
- Cutts, Matt. What’s an update? Blog post (September 8, 2005).
--Doc z 07:39, 2 September 2006 (UTC)
- In the meantime Hankwang was kind enough to add the reference. However, the reference is not shown in the article because there is no references tag in the references section. Adding the tag would show the new reference in a different style from the other ones. Perhaps someone knows how to add the reference in an appropriate way. --Doc z 17:51, 2 September 2006 (UTC)
[edit] Does span title alter pageranking as it now has two text links?
Here's span title...
- [[Wikipedia|<span title="the free encyclopedia">Wikipedia</span>]]
- makes...
- Wikipedia
Put your mouse over the link and you get an alt text. Does span title alter pageranking as it now has two text links? Anomo 12:59, 10 October 2006 (UTC)
- PageRank is based only on the link structure, not on the link text. As long as the link structure is unchanged, PageRank is unchanged. (However, page rank might change even if PageRank is unchanged.) --Doc z 14:31, 10 October 2006 (UTC)
-
- What does "However, page rank might change even if PageRank is unchanged" mean? Anomo 21:53, 11 October 2006 (UTC)
-
-
- PageRank and page rank are two different things. PageRank is the numerical value calculated by an algorithm from the link structure while page rank is the rank of the page. ("According to Google, PageRank is now just one factor among many other ones, to calculate the rank of a page in results of searches."). Obviously, in the example above the link structure isn't changed by the type of the link. Therefore PageRank is unchanged. However, the rank of the page might change after adding a span title. --Doc z 06:28, 12 October 2006 (UTC)
-
[edit] PageRank Value used by Google. Integer or Floating Point?
Doc z removed from the article the statement that the Google PageRank used by Google Internally is a floating point number with several digits after the decimal point and the toolbar value (0-10) just a rounded Integer. Please explain. --roy<sac> Talk! .oOo. 09:03, 29 October 2006 (UTC)
- Indeed I removed the following section:
- "The actual calculated number using the original algorithm is a floating point number with a lot of digits after the decimal point. A site with PR 1 could have an actual PR of 1.298245, while another site with PR1 might only have an actual PR of 1.00123."
- "The fact that the toolbar value is rounded to the next whole number is important, because this invisible difference is making a bigger difference the higher the Pagerank™ of a site becomes (a difference of hundreds of thousands of inbound links between two pages with the same displayed PR)." with the reference "Page Rank and Pagerank(tm) - Toolbar PR and Actual PR" (http://www.cumbrowski.com/CarstenC/articles/20060620_Page_Rank_and_PageRank.asp) by Carsten Cumbrowski, June 21st and August 20th, 2006
- There are several reasons why I removed this statement:
- - PageRank is mixed up with the PageRank shown in the toolbar. The latter is not just the rounded value of PageRank; there is also a logarithmic relation between these two values!
- - "a difference of hundreds of thousands of inbound links between two pages with the same displayed PR": The number of inbound links says nothing about the PageRank
- - "The fact that the toolbar value is rounded to the next whole number is important, because this invisible difference is making a bigger difference the higher the Pagerank™" This is simply caused by the logarithmic relation between PageRank and the PageRank shown in the toolbar
- - The reference is not a valid source to prove your statements! (Moreover, the fact that the value shown in the toolbar is an integer is trivial and needs no reference.)
- (- The first statement might suggest that the internal (floating) PageRank is still calculated according to the original algorithm. This claim cannot be proven.)
- Please correct the article and remove the 'reference'. --Doc z 11:22, 29 October 2006 (UTC)
-
- Hi Doc, I removed it, because I don't want to start a useless back and forth here. We have better things to do with our spare time, like extending the content of Wikipedia :). Anyhow, I am not a mathematician, but I was pretty good at math in school. Most readers of the entry probably have even less math skill than I have. I would keep that in mind. That the toolbar value is a rounded integer value may sound trivial to you, but it may be the only thing in the paragraph that the average person understood. The rounding (logarithmic scale or not) has major ramifications in the real world. It explains why sites that appear to have the same PageRank in the toolbar can have a huge difference in number and/or PR of inbound links, and why sites with a difference of 1 in PR might be almost identical when it comes to inbound links (for sites with low PR). I can't find the matrix that illustrated this anymore. It is also implied by the mathematical equation, but not everybody has a calculator handy all the time ;). Just my 2 cents. --roy<sac> Talk! .oOo. 19:42, 29 October 2006 (UTC)
-
-
- I have no problem with mentioning the fact that the PageRank shown in the toolbar is a rounded integer. However, there is no need to put a reference for this statement in the article. (References should be the original source of facts.) Of course, the rounding means that pages with a higher or lower PageRank are displayed with the same toolbar value. However, the real PageRank is not known, and one cannot draw a conclusion just from the number of inbound links. The important point is the logarithmic scale: it explains the increasing difference for higher PageRank. Just rounding PageRank wouldn't cause this effect. (However, the logarithmic relation was not mentioned in your statement.) --Doc z 20:23, 29 October 2006 (UTC)
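- To make the point about the logarithmic scale concrete, here is a purely hypothetical sketch. Neither the base, nor the smallest displayed PageRank, nor whether the toolbar floors or rounds is publicly known; the values `base=6.0` and `pr_min=1e-8` and both function names are invented for illustration only.

```python
import math

def toolbar_value(real_pr, base=6.0, pr_min=1e-8):
    """Hypothetical mapping: toolbar value = floor(log_base(real_pr / pr_min)),
    clamped to 0..10. Base, pr_min and the use of floor are all assumptions."""
    if real_pr <= pr_min:
        return 0
    return max(0, min(10, math.floor(math.log(real_pr / pr_min, base))))

def bucket_range(toolbar, base=6.0, pr_min=1e-8):
    """Range of real PageRank values that would share one toolbar value
    under the assumed mapping above."""
    return pr_min * base ** toolbar, pr_min * base ** (toolbar + 1)

# Width of the real-PageRank interval hidden behind each displayed value
widths = [hi - lo for lo, hi in (bucket_range(k) for k in range(10))]
```

- Under these assumptions each bucket is `base` times wider than the one below it, so two pages showing the same high toolbar value can differ far more in real PageRank than two pages showing the same low value. Plain rounding without the logarithm would give buckets of equal width.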
-
[edit] Spoofing through HTML-level redirection
Is this really effective? I mean, if a search for a site actually shows results for the site it redirected to, isn't its inflated PageRank useless then? The only purpose of PR is to appear in search engine results. CGameProgrammer 19:42, 4 December 2006 (UTC)
- You can remove the redirect after a while - the spoofed PageRank is still shown. This isn't helpful for ranking but for selling/exchanging links because the page looks more 'worthwhile' than it is. --Doc z 07:50, 5 December 2006 (UTC)
- OK, so it's only useful for defrauding link exchangers; it doesn't increase its placement in the actual search results. Right? CGameProgrammer 02:04, 8 December 2006 (UTC)
- Exactly --Doc z 07:47, 8 December 2006 (UTC)
[edit] Removed Link to IBM Clever
No article is displayed when the link is clicked. —The preceding unsigned comment was added by 67.164.65.12 (talk) 19:14, 3 January 2007 (UTC).
- Corrected link. --Macrakis 19:42, 3 January 2007 (UTC)
[edit] Definition semantics in "Simplified PageRank algorithm"
PageRank#Simplified_PageRank_algorithm What is the point of boldfacing (strong) the capital letters in this part?! It does not make any semantic sense, or is there something I do not understand? The capital letters are visible enough already because they are capitals. Ento 16:36, 6 January 2007 (UTC)