Wikipedia talk:Researching Wikipedia

Moved from Wikipedia:Village pump

Coverage

So, this is kind of a weird question, but I've been thinking about ways to model Wikipedia coverage. By this I mean, if you consider the range of possible subjects we consider worth including in Wikipedia, how many of those subjects already have articles? I know that that's a really abstract idea, but I think it's probably a better measure of how "good" Wikipedia is than just raw number of articles.

One way to think of it is to imagine some kind of endpoint in the future where we've got all the information we think Wikipedia-worthy in the system. Then, we all get to sit back, and just add new articles as new people, events, countries, awards ceremonies, species, albums, books, and planets come into being. How close are we to getting there? How many of the articles in that imagined encyclopedia do we already have?

Some ways I've been thinking of measuring this:

  • Of the entries in the 1911 Encyclopedia Britannica, how many have corresponding Wikipedia articles? (Totally crude, but if we were back in 1911, wouldn't we want to have at least as much knowledge as the EB? Close to it?)
  • What percentage of Wikipedia searches come up empty? (This would measure what percentage of things Wikipedia readers think should be in the system are already there.)
  • Of the internal links inside Wikipedia, what percentage point nowhere? How many have non-stub articles at the endpoint? (This would measure what percentage of things Wikipedia authors think should be in the system are already there.)

Yes, it's probably kind of silly to think of the range of Wikipedia-worthy subjects as a finite set, and even if we had an article for every one of those subjects, the individual facts, figures, interpretations, explanations and opinions worth putting in each article are also close to infinite. But I'd like to think we're closing in on the goal of being a good, reliable encyclopedia, and I think coverage is one way to measure that. I'm just wondering if it can be modeled in any reasonable way. -- ESP 05:03 20 Jul 2003 (UTC)

First some data: The number of links to non-existent pages is currently 527137. The number of links to existent pages is 2583574. That only about 17% of all links point to non-existent pages may be surprising, but note that this includes the signatures on talk pages and similar stuff (not the navigational links, though). It's possible to exclude these from the count, but that would tax the database server substantially.
Maybe I'll use the SQL dumps to build an off-line server for doing this. That'd lower the effect it would have on the DB server. -- ESP 15:46 20 Jul 2003 (UTC)
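For reference, the 17% figure follows directly from the two link counts quoted above. The short sketch below only redoes that arithmetic; the function and variable names are illustrative, and it does not attempt to exclude talk-page signatures or similar links.

```python
# Link counts as quoted in the comment above (July 2003).
missing_links = 527_137      # links to non-existent pages
existing_links = 2_583_574   # links to existent pages


def missing_fraction(missing: int, existing: int) -> float:
    """Return the share of all counted links that point to non-existent pages."""
    total = missing + existing
    if total == 0:
        raise ValueError("no links counted")
    return missing / total


fraction = missing_fraction(missing_links, existing_links)
print(f"{fraction:.1%} of links point to non-existent pages")
```

The same function could be rerun on counts taken from an off-line copy of the database once signature links are filtered out.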
As for comparing Wikipedia, I'm not sure the 1911 encyclopedia is such a good idea. A lot has changed in the last 92 years, making many articles obsolete, and many others would now be considered irrelevant by all but the most specialized readers. Wikipedia by now probably reflects better what most people care about than other encyclopedias -- we have very detailed articles about music festivals and groups, about actors and actresses, about fictional characters, about sex positions; stuff that you will, for the most part, not find in traditional encyclopedias, but that is nevertheless searched for a lot.
I agree that Wikipedia is not the Encyclopedia Britannica. But I find it hard to imagine subjects that were considered worthy of encyclopediizing at the turn of the century and are no longer worth encyclopediizing, at least in a historical context. Yes, the EB is racist, sexist, imperialist, and paternalistic, but that's more about the content of the entries, not the subject.
Anyways, this would be a crude measurement, agreed. If I were going to think of it as a Venn diagram, I'd say that there's a set of articles worthy of Wikipedia W and a set of articles in Encyclopedia Britannica E. E may not be a proper subset of W, but I'd posit that the difference between E and the intersection of E and W would be negligibly small. Counting articles we share with EB would show us what our coverage is in that intersection area, and might give us a rough estimate of our coverage in the part of W that's not in the intersection of W and E. -- ESP 15:38 20 Jul 2003 (UTC)
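The counting step in the Venn-diagram argument can be sketched as a set intersection over article titles. This is a minimal illustration, assuming both title lists are already available as plain strings; the `normalize` rule (case-folding and whitespace collapsing) and the four sample titles are made up for the example, and real title matching would need redirects and spelling variants handled as well.

```python
def normalize(title: str) -> str:
    """Assumed matching rule: ignore case and extra whitespace."""
    return " ".join(title.casefold().split())


def coverage(reference_titles, wikipedia_titles) -> float:
    """Fraction of reference titles (set E) that also exist in Wikipedia."""
    reference = {normalize(t) for t in reference_titles}
    wikipedia = {normalize(t) for t in wikipedia_titles}
    if not reference:
        raise ValueError("reference title list is empty")
    return len(reference & wikipedia) / len(reference)


# Hypothetical toy data, not real article lists.
eb_titles = ["Aardvark", "Abacus", "Abbey", "Aberdeen"]
wp_titles = ["aardvark", "Abacus", "Aberdeen", "Zebra"]

print(coverage(eb_titles, wp_titles))  # 0.75
```

The result only describes coverage inside E; treating it as an estimate for the rest of W is the extrapolation proposed above, not something the code checks.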
We do lack articles about specialized knowledge areas that are nevertheless of cultural importance. A good indication to see where we do not have enough motivated Wikipedians is to check out Special:Ancientpages. Mostly it's stuff about "obscure" countries that get neglected -- but if you lived in such a country, you would be very disappointed not to find that information.
A systematic comparison would take a reasonably large random sample (say, 500 articles) from modern encyclopedias and compare their articles with the respective Wikipedia articles, if any, and then do the same in the other direction. That way you could develop a quality rating: Wikipedia's coverage of Encarta's sample is xyz better/worse than Encarta's coverage; Encarta's coverage of Wikipedia's sample is xyz better/worse than Wikipedia's coverage (where xyz would be some points/rating system).
I like this method. I think it would be reasonable to take a selection by alphabetic sorting -- like, say, a single volume of a multi-volume encyclopedia. It wouldn't be strictly random, of course, and some work on Wikipedia has been done in alphabetical order -- starting with A, etc. -- but it would be pretty random across topics. With five people on it doing searches, and counting hits, I think it would take less than a week of people's spare time.
The nice thing about the 1911 EB is that it'd be kind of automatable -- although the electronic versions available are kind of hard to work with -- whereas this would not. -- ESP 19:55 21 Jul 2003 (UTC)
This is a big project, but it can be done collaboratively. I would definitely be willing to review a few Britannica/Encarta (German) articles. To assure non-bias, these should be assigned by the person running the project. Feel free to start Wikipedia:State of Wikipedia. --Eloquence 06:28 20 Jul 2003 (UTC)
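The sampling proposal in this thread (draw about 500 titles, have the coordinator hand them out, count hits) could be organized roughly as follows. This is only a sketch under stated assumptions: the title list, the reviewer names, and the hit count of 400 are hypothetical placeholders, and the margin of error uses the standard normal approximation for a proportion, which nobody in the discussion specified.

```python
import math
import random


def draw_sample(titles, size, seed=0):
    """Draw a reproducible simple random sample of titles without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(titles), size)


def assign_reviewers(sample, reviewers):
    """Split the sample round-robin so reviewers do not pick their own articles."""
    return {name: sample[i::len(reviewers)] for i, name in enumerate(reviewers)}


def coverage_estimate(hits, sample_size):
    """Return (proportion, 95% margin of error) using the normal approximation."""
    p = hits / sample_size
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, margin


# Hypothetical stand-in for an encyclopedia's list of article titles.
titles = [f"Article {i}" for i in range(5000)]
sample = draw_sample(titles, 500)
assignments = assign_reviewers(
    sample, ["reviewer_1", "reviewer_2", "reviewer_3", "reviewer_4", "reviewer_5"]
)

# Hypothetical result: reviewers found a Wikipedia article for 400 of the 500 titles.
estimate, margin = coverage_estimate(hits=400, sample_size=500)
print(f"coverage {estimate:.0%}, margin of error {margin:.1%}")
```

Running the same procedure with a sample drawn from Wikipedia and checked against the other encyclopedia would give the comparison in the opposite direction.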

"Edit Zero" is a terrible name. How about Alephpedia: the for all practical purposes infinite encyclopedia in Jorge Luis Borges' library?

I like the idea of "Edit Zero" -- like t0 -- the edit someone makes in a mythical future that "completes" Wikipedia up to that point in time. I don't understand the idea of Alephpedia, so I wouldn't use it. Can you explain it? -- ESP 15:42 20 Jul 2003 (UTC)

EB1911

Of the entries in the 1911 Encyclopedia Britannica, how many have corresponding Wikipedia articles?

Wikipedia:Wikipedia Signpost/2006-02-27/News and notes: Integration of 1911 Britannica finished

--Piotr Konieczny aka Prokonsul Piotrus Talk 06:55, 5 March 2006 (UTC)

List of resources

Note: this list of resources was generated in early February 2007. It may be possible to semi-automatically update the pages here with this tool, but it is not working as expected. Please manually update this page with new research or tools.-- Piotr Konieczny aka Prokonsul Piotrus | talk  18:42, 9 February 2007 (UTC)

Survey to add