Million Book Project

From Wikipedia, the free encyclopedia

The Million Book Project (or the Universal Library), led by the Carnegie Mellon University School of Computer Science and University Libraries, aims to digitize a million books by 2007. Working with government and research partners in India and China, the project is scanning books in many languages, using OCR to enable full-text searching, and providing free-to-read access to the books on the web. The project has since completed the scanning of one million books and made the entire database accessible at http://www.ulib.org.

Twenty-two scanning centers are operating in India, including four mega-centers. Eighteen centers are running in China, including a mega-center in a free-trade zone to avoid customs delays with shipments of books from the United States. Materials are also being scanned in Egypt, in Hawaii, and at Carnegie Mellon.

By December 2007, more than 1.5 million books had been scanned in 20 languages, including 970,000 in Chinese, 360,000 in English, 50,000 in Telugu, and 40,000 in Arabic [1]. Most of the books are in the public domain, but permission has been acquired to include over 60,000 copyrighted books (roughly 53,000 in English and 7,000 in Indian languages). The books will be mirrored at sites in India, China, Carnegie Mellon, the Internet Archive, and possibly other locations. Not all of the books scanned to date are available online yet, and no single site holds copies of all the books that are.

The Million Book Project will provide a wide array of content, but one of its collection strengths will be agriculture. In partnership with the United Nations Food and Agriculture Organization, the United States National Agricultural Library, and university libraries with quality agriculture collections, the project is digitizing materials and developing plans for a knowledge network to improve rural community access to critical agricultural information.

Significant research is underway in the project, including OCR for Indian and Arabic languages and scripts. The research also includes developments in machine translation, automatic summarization, image processing, large-scale database management, user interface design, and strategies for acquiring copyright permission at an affordable cost. Indian partners have developed a translating and transliterating user interface. Partners in Egypt are developing an interface that supports annotation and highlighting. Partners in China have made remarkable progress on content-based image retrieval and machine analysis of calligraphic scripts. Carnegie Mellon has made strides in machine translation and automatic summarization.

The National Science Foundation (NSF) awarded Carnegie Mellon $3.63M over four years for equipment and administrative travel for the Million Book Project. India is providing $25M annually to support language translation research projects. The Ministry of Education in China is providing $8.46M over three years. The Internet Archive has provided equipment, staff and money. The University of California Libraries at Merced funded the work to acquire copyright permission from U.S. publishers.

India, China and the U.S. agreed in November 2005 to join the Open Content Alliance (OCA), initiated by Brewster Kahle and the Internet Archive, because the goals of the OCA are consistent with those of the Million Book Project and the Universal Digital Library.

1 Key U.S. participants
2 Chinese partner institutions
3 India partner institutions
4 US partner institutions
5 References
6 External links

Key U.S. participants

Key participants in this project include: [1]

Dr. Mark Kamlet, Provost, Carnegie Mellon (led the delegation to China in 2002 and was slated to lead the delegation to India in 2003)
Dr. Raj Reddy, Simon University Professor, Institute for Software Research International (ISRI), Carnegie Mellon
Dr. Gloriana St. Clair, Dean of University Libraries, Carnegie Mellon
Dr. Ching-chih Chen, Professor, Simmons College, Boston
Dr. Michael Shamos, Distinguished Career Professor and Principal System Scientist, School of Computer Science, Carnegie Mellon
Dr. Jaime Carbonell, Director of the Language Technologies Institute and Professor, School of Computer Science, Carnegie Mellon
Dr. Peter P. Chen, Distinguished Chair Professor of Computer Science, Louisiana State University
Ms. Gabrielle Michalek, Head, Archives and Digital Library Initiatives, Carnegie Mellon University Libraries
Ms. Denise Troll Covey, Principal Librarian for Special Projects, Carnegie Mellon
Ms. Erika Linke, Senior Librarian and Associate Dean of University Libraries (Collection and User Services), Carnegie Mellon
Mr. Brewster Kahle, Internet Archive
Phyllis Spies, Andrew Wang, and Lorraine Normore, OCLC
Bruce Miller, University Librarian, University of California at Merced

Chinese partner institutions

The institutions in China that participate in this project include: [1]

India partner institutions

The institutions in India that participate in this project include: [1]

Indian Institute of Science, Bangalore
International Institute of Information Technology
Indian Institute of Information Technology
Anna University, Chennai
Mysore University, Mysore
University of Pune, Pune
Goa University, Goa
Tirumala Tirupati Devasthanams, Tirupathi
Shanmugha Arts, Science, Technology & Research Academy, Tanjore
Arulmigu Kalasalingam College of Engineering, Srivilliputhur
Maharashtra Industrial Development Corporation, Mumbai

US partner institutions

The institutions in the U.S. that participate in this project include: [1]

References

[1] "Frequently Asked Questions About the Million Book Project," Carnegie Mellon University website.

External links

Categories: Digital libraries | Carnegie Mellon University
