MeCab

MeCab
Developer(s): Taku Kudou, Google Japanese Input project
Stable release: 0.996 (18 February 2013)
Development status: Active
Written in: C++, with modules for C, C#, Java, Perl, Python, and Ruby
Platform: Cross-platform
License: Tri-licensed under GPL, LGPL and BSD licenses
Website: http://taku910.github.io/mecab

MeCab is an open-source text segmentation library for text written in Japanese. It was originally developed at the Nara Institute of Science and Technology and is currently maintained by Taku Kudou (工藤拓) as part of his work on the Google Japanese Input project.[1][2] The name derives from the developer's favorite food, mekabu (和布蕪), a Japanese dish made from wakame leaves.

The software was originally based on ChaSen and was developed under the name ChaSenTNG. It has since been rewritten from scratch and is now developed independently of ChaSen. MeCab's analysis accuracy is comparable to ChaSen's, and its analysis is on average 3 to 4 times faster.

MeCab segments a sentence into its component words (morphemes) and labels each one with its part of speech. Several dictionaries are available for MeCab; as with ChaSen, IPADIC is the most commonly used.
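
As an illustration of this segmentation, the following Python sketch parses text in MeCab's default output format, in which each line holds a surface form, a tab, and a comma-separated feature list, and each sentence ends with an EOS line. The sample output is hard-coded rather than produced by calling MeCab, and it assumes the IPADIC feature layout (part of speech in the first field, base form in the seventh). The names Token and parse_mecab_output are hypothetical helpers introduced for this example and are not part of MeCab.

```python
from typing import List, NamedTuple

# Illustrative analysis of "すもももももももものうち" in MeCab's default
# output format. The feature layout assumes the IPADIC dictionary.
SAMPLE_OUTPUT = """\
すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も\t助詞,係助詞,*,*,*,*,も,モ,モ
もも\t名詞,一般,*,*,*,*,もも,モモ,モモ
も\t助詞,係助詞,*,*,*,*,も,モ,モ
もも\t名詞,一般,*,*,*,*,もも,モモ,モモ
の\t助詞,連体化,*,*,*,*,の,ノ,ノ
うち\t名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
"""


class Token(NamedTuple):
    surface: str    # the word as it appears in the sentence
    pos: str        # part of speech (first feature field)
    base_form: str  # dictionary form (seventh feature field in IPADIC)


def parse_mecab_output(text: str) -> List[Token]:
    """Turn MeCab-style output lines into a list of tokens."""
    tokens = []
    for line in text.splitlines():
        # Skip blank lines and the end-of-sentence marker.
        if not line or line == "EOS":
            continue
        surface, _, feature = line.partition("\t")
        fields = feature.split(",")
        # Fall back to the surface form if the base-form field is absent.
        base_form = fields[6] if len(fields) > 6 else surface
        tokens.append(Token(surface, fields[0], base_form))
    return tokens


if __name__ == "__main__":
    for token in parse_mecab_output(SAMPLE_OUTPUT):
        print(token.surface, token.pos, token.base_form)
```

In this sample the sentence is split into seven tokens, alternating between nouns (名詞) and particles (助詞). Other dictionaries may use a different feature layout, so the field positions would need to be adjusted accordingly.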

In 2007, Google used MeCab to generate n-gram data for a large corpus of Japanese text and announced the release of the data on its Google Japan blog.[3]

MeCab is also used for Japanese input on Mac OS X 10.5 and 10.6, and in iOS since version 2.1.[4][5]

References

  1. "「ググる」の精度を高めるために必要なもの - @IT自分戦略研究所" [What Google needs to improve its accuracy]. ITmedia (in Japanese). 2006-03-15. Retrieved 2009-04-09.
  2. "思いどおりの日本語入力 - Google 日本語入力" [Towards more accurate Japanese input]. Google (in Japanese). 2009-12-03. Retrieved 2009-12-03.
  3. "Google Japan Blog: 大規模日本語 n-gram データの公開" [Publication of n-gram data across large Japanese text corpus]. Google (in Japanese). 2007-11-01. Retrieved 2009-04-09.
  4. "大規模テキスト処理を支える形態素解析技術(工藤拓氏・Google)" [(Lecture) Morphological analysis supports large scale text processing (By Mr. Taku Kudou, employee at Google)] (in Japanese). 2009-12-03. Retrieved 2009-12-03.
  5. "iPhoneの仮名漢字変換はMeCabを利用" [iPhone uses MeCab for kana-kanji conversion] (in Japanese). 2009-12-03. Archived from the original on 2008-09-18. Retrieved 2009-12-03.