Wikipedia:Wikipedia CD Selection
This is the project page for a series of Wikipedia CDs/DVDs being produced by Wikipedians and SOS Children. The 2007 DVD was a huge success, with distributions to schools in four countries, use by the Hole in the Wall (http://www.hole-in-the-wall.com/) project, thousands of downloads and disks and around 6000 unique IPs a day visiting the online version.
We are currently asking for volunteers to help with the 2008 version. Exactly what to do is detailed on the How to Help page.
Over 4000 articles were proposed for the 2007 CD, both by Wikipedians adding the {{WPCD}} tag to articles and by submission of all Wikipedia:Featured Articles and Wikipedia:Good Articles. The proposed articles were checked by SOS Children volunteers, working from home or in the charity's Cambridge offices, and were re-sorted into school curriculum subjects and recreation topics (the 0.5 version categories were uploaded and used for this). SOS volunteers then found some additional articles to plug syllabus gaps, making 4625 in total.
All submitted articles then went through a clean-up script that removed Fair Use images, sentences whose only purpose was to link to unincluded articles (e.g. "see also"), stubs, editorial content, sections containing material unsuitable for children, and external links. Where articles had been vandalised or contained questionable material, the most recent good version was used. The resulting 2007 Selection is browsable at http://schools-wikipedia.org; its article selection was fixed on 17 May 2007. It now runs on a flexible database so that articles can easily be added or updated. It has been released in three forms:
1) As a download with thumbnail images only (792 MB).
2) On a free DVD from the charity's offices with the full images.
3) As a BitTorrent file with the full images (about 2.6 GiB zipped).
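One step of the clean-up described above, dropping whole sections such as "See also" or "External links", can be sketched roughly as follows. This is only an illustration and not the project's actual script: it assumes articles are available as wikitext with `== Heading ==` style headings, and the names `EXCLUDED_SECTIONS` and `strip_sections` are hypothetical.

```python
import re

# Hypothetical list of section titles to drop; the project's real list
# (see the section_excludes page) is not reproduced here.
EXCLUDED_SECTIONS = {"see also", "external links"}

# Matches wikitext headings such as "== History ==" or "=== Early life ===".
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def strip_sections(wikitext, excluded=EXCLUDED_SECTIONS):
    """Return wikitext without the excluded sections and their subsections."""
    kept = []
    skip_level = None  # heading depth of the section currently being dropped
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2).lower()
            # A heading at the same or a higher level ends the dropped section.
            if skip_level is not None and level <= skip_level:
                skip_level = None
            if skip_level is None and title in excluded:
                skip_level = level
        if skip_level is None:
            kept.append(line)
    return "\n".join(kept)
```

Calling `strip_sections(article_text)` returns the article with those sections removed; deeper headings inside a dropped section are dropped with it.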
Press release
A joint press release between SOS Children and the Wikimedia Foundation announced the launch of the new Wikipedia Selection: see: wikimedia:Press_releases/SOSChildrenUK2007
Unlike last year, the project is now set up to allow continuous improvement, so comments and error reports are welcome on the talk page. We are also still open to Wikipedia:Wikipedia_CD_Selection/additions_and_updates, which will be incorporated from time to time. Other relevant project pages are Wikipedia:Wikipedia_CD_Selection/section_excludes, which lists article sections to be excluded from the DVD, and Wikipedia:Wikipedia_CD_Selection/image_copyrights, which explains the way images are now treated after the 2006 CD attracted some (minor) complaints. Articles to be removed should be listed directly at Wikipedia:Wikipedia_CD_Selection/deletes. Moved articles should be listed as updates for the new location and as deletions for the old location.
Differences with previous versions
Compared to the 2006 version and the 0.5 Release version, the notable differences are:
a) 4655 articles versus about 2000 for the others. These comprise all featured articles (except inappropriate ones), all good articles (except very recent additions and unsuitable ones), plus about 1200 additional articles to give breadth and include the remaining parts of "series", all countries, all capitals, all chemical elements etc.
b) The 2007 version has image pages for all of the images used, like this: http://schools-wikipedia.org/images/5/531.jpg.htm. It therefore gives copyright information for every image. This is an image requirement and something which Anthere (of the WMF) asked us to ensure for future versions after last year.
c) The content has been re-sorted by subject to match the UK national curriculum. This was particularly useful in itself because it showed up gaps.
d) Redirects were sorted out everywhere from the redirect database, so all links which only work on WP via redirects are included on the CD (a surprising proportion). Redirects are also included in the title word index, so that looking up Fool's gold takes you to the article on Pyrite etc.
e) The Wikimedia Foundation has now agreed to use of the Wikipedia logo, provided this stays a non-commercial project.
SOS Children now have a neat piece of software which can take any list of URLs of archived article versions and produce a consistent WP selection with no loose ends, red links etc., and which runs the category/subject index off an easy Ruby database. Continuous update now becomes a possibility.
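The redirect handling mentioned in the list above (Fool's gold leading to Pyrite) might look something like the sketch below. It is an assumption-laden illustration, not the SOS Children software: the data structures (a `redirects` dictionary mapping alias to target, and an `included` set of selected titles) and the function names are hypothetical.

```python
def resolve(title, redirects, included):
    """Follow redirects from title; return the included article reached, or None."""
    seen = set()  # guards against redirect loops
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title if title in included else None


def build_title_index(redirects, included):
    """Map every usable title (article or redirect) to an included article."""
    index = {title: title for title in included}
    for alias in redirects:
        target = resolve(alias, redirects, included)
        if target is not None:
            # Real article titles take priority over redirects of the same name.
            index.setdefault(alias, target)
    return index


redirects = {"Fool's gold": "Pyrite", "Iron pyrite": "Fool's gold"}
included = {"Pyrite", "Gold"}
index = build_title_index(redirects, included)
print(index["Fool's gold"])  # prints "Pyrite"
```

A link whose title is missing from the index points at an unincluded article, which is the case where the link would be removed rather than left as a red link.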
Current issues
This year the vandalism rate was much higher than last year. More than 50 of the 4500 articles were found to be vandalised, versus 5 of 2000 last year. This corresponds to what seems to be a recognised falling standard on Wikipedia, which is why the WP community threw in the towel and implemented nofollow etc. Relying on historical versions of articles is inadequate to protect against graffiti, since images get copied over and templates get vandalised. Some sort of manual check is needed too.
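For scale, the figures quoted above can be turned into rates with a few lines of arithmetic. The counts are the approximate ones given in the text ("more than 50" is treated as exactly 50, so the 2007 rate is a lower bound); the function name is just illustrative.

```python
def vandalism_rate(vandalised, total):
    """Fraction of checked articles found to be vandalised."""
    return vandalised / total


rate_2007 = vandalism_rate(50, 4500)  # "more than 50", so a lower bound
rate_2006 = vandalism_rate(5, 2000)
ratio = rate_2007 / rate_2006

print(f"2007: {rate_2007:.2%}, 2006: {rate_2006:.2%}, ratio: {ratio:.1f}x")
# prints "2007: 1.11%, 2006: 0.25%, ratio: 4.4x"
```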
Other comments (since this is Wikipedia space): although Wikipedia has 1.7 million articles, you quickly run into quality problems not far from the mainstream. We asked a few people who were central in a field to name the key articles, but often the key themes had poor-quality articles. Equally, a lot of the "part of a series" info boxes included stubs and half-written articles. This is a shame. Also a shame is that, despite the huge number of topics, obvious ones are not covered: for example, we estimated that only about 10% of all the books on the UK core Eng Lit curriculum have articles at all.