Doug Cutting
Doug Cutting is an advocate and creator of open-source search technology. He originated Lucene and, with Mike Cafarella, Nutch, two open-source search technology projects that are now managed through the Apache Software Foundation. Before developing Lucene, Cutting held search technology positions at Excite, Apple Inc. and Xerox PARC. Lucene, a search indexer, and Nutch, a spider or crawler, are the two key components of an open-source general search platform that first crawls the Web for content and then structures it into a searchable index. Cutting's leadership of these two projects extended the concepts and capabilities of general open-source software projects such as Linux and MySQL into the important vertical domain of search. Although the total number of installations of these platforms is difficult to track, public announcements of the use of Lucene and its direct descendant Solr by various venture-backed startups indicate a significant level of adoption. Perhaps the most significant deployment of Lucene is Wikipedia, where it powers search for the entire site.[1]
In December 2004, Google published a paper on MapReduce, a programming model that allows very large-scale computations to be parallelized across large clusters of servers. Recognizing the paper's importance for extending Lucene to extremely large (web-scale) search problems, Cutting created the open-source Hadoop framework, which allows applications based on the MapReduce paradigm to run on large clusters of commodity hardware. He is currently an employee of Yahoo!, where he leads the Hadoop project full-time.
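The MapReduce paradigm mentioned above can be illustrated with a minimal single-process sketch in Python. This is not Hadoop's API or Google's implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for illustration. In a real cluster, the map and reduce calls would be distributed across many machines, which is possible because each call depends only on its own input.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle, and reduce in sequence over a dict of documents.

    Hypothetical driver: a framework such as Hadoop would schedule the
    independent map and reduce calls on different machines instead.
    """
    mapped = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items()
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = {"a": "open source search", "b": "Open source crawler"}
    print(run_job(docs))
```

Because no map call reads another document and no reduce call reads another key, the work can be split across as many machines as there are documents or keys, which is the property that makes the model suitable for web-scale indexing.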
References
- [1] Wikipedia: Powered by Lucene. Lucene. Retrieved on September 5, 2007.
External links
- Doug Cutting's blog
- An interview with Doug Cutting
- Video interview of Doug Cutting
- Doug Cutting's publications and patents
- Doug Cutting joins Yahoo!
- Blog post by Tom White about Doug Cutting creating Hadoop. Note that this post was written while Hadoop was still an unnamed spinoff of Nutch; White added the Hadoop name in a later post.