EDICT


The JMdict/EDICT project was started by Jim Breen in 1991 with the aim of providing a machine-readable Japanese-English dictionary. Since then it has been updated and expanded by many contributors. The dictionaries produced by the project are plain text files; other programs are needed to search and display them. Jim Breen's own online dictionary, WWWJDIC, is one convenient way of searching EDICT.

The original structure of the entries in the EDICT file was quite simple, and it soon became apparent that a richer structure was required to adequately represent the complexities of the Japanese lexicon. In 1999 an XML version (JMdict) was introduced, which allows for multiple surface forms of lexemes and multiple readings, as well as cross-references and annotations. It also caters for glosses in other languages, and is released with French, German, Russian, and other translations for many entries. The JMdict file, which is encoded in UTF-8, is the primary output of the project; the original EDICT format is still produced for systems that rely on it. An expanded version (EDICT2), which reflects the structure of the XML entries, is also produced and is used by several systems, including the WWWJDIC server. Versions are also produced in the XML format used by Apple's "Dict" application and in the EPWING/JIS X 4081 format used by many Japanese electronic dictionary systems.
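
The richer entry structure described above can be illustrated with a short Python sketch. The sample entry is hand-written and simplified rather than copied from the distributed JMdict file, and the function name summarize_entry is hypothetical. The element names (k_ele/keb for surface forms, r_ele/reb for readings, sense/gloss for translations) follow the JMdict DTD, and the sketch assumes that a gloss without an xml:lang attribute is English.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the style of a JMdict entry.
# Real entries carry more elements (part-of-speech entities, cross-references, etc.).
SAMPLE = """
<entry>
  <ent_seq>0000001</ent_seq>
  <k_ele><keb>日本語</keb></k_ele>
  <r_ele><reb>にほんご</reb></r_ele>
  <r_ele><reb>にっぽんご</reb></r_ele>
  <sense>
    <gloss>Japanese (language)</gloss>
    <gloss xml:lang="ger">Japanisch</gloss>
  </sense>
</entry>
"""

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def summarize_entry(xml_text):
    """Collect surface forms, readings, and glosses grouped by language."""
    entry = ET.fromstring(xml_text)
    forms = [e.text for e in entry.findall("k_ele/keb")]
    readings = [e.text for e in entry.findall("r_ele/reb")]
    glosses = {}
    for gloss in entry.findall("sense/gloss"):
        # Assumption: a gloss with no xml:lang attribute is English ("eng").
        lang = gloss.get(XML_LANG, "eng")
        glosses.setdefault(lang, []).append(gloss.text)
    return {"forms": forms, "readings": readings, "glosses": glosses}


if __name__ == "__main__":
    print(summarize_entry(SAMPLE))
```

Parsing the full JMdict file would also require handling the entity declarations in its DTD, which this sketch deliberately leaves out.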

This project is considered a standard Japanese-English reference on the Internet, and is used by the Unihan Database and several other Japanese-English projects. Since 2000, the EDICT project has been managed by the Electronic Dictionary Research and Development Group (EDRDG).[1] In 2010 maintenance of the dictionary was moved to an online database system.

EDICT also inspired other projects, including the CEDICT Chinese dictionary project started by Paul Denisowski in 1997.

As of August 2013, the JMdict/EDICT file had about 170,000 entries.[2]

Word class abbreviations

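EDICT has a set of word class abbreviations[3] that are used to distinguish word classes whose members have similar endings. For example, v1 marks ichidan verbs and v5r marks godan verbs ending in る, two classes that cannot always be told apart from the written form alone.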
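
As an illustration of how a program might read these abbreviations, the following Python sketch splits a line in the original EDICT layout (headword, optional reading in square brackets, then slash-separated glosses whose first element begins with a parenthesised, comma-separated tag list). The function name parse_edict_line and the sample lines are hypothetical, and the regular expressions are a simplification: a first gloss that begins with a sense number such as "(1)" would be misread as a tag list.

```python
import re

# Headword, optional [reading], then /gloss/gloss/.../
LINE_RE = re.compile(
    r"^(?P<head>\S+)(?:\s+\[(?P<reading>[^\]]+)\])?\s+/(?P<body>.*)/\s*$"
)
# Leading "(tag,tag,...)" group on the first gloss.
TAGS_RE = re.compile(r"^\((?P<tags>[A-Za-z0-9,\-]+)\)\s*")


def parse_edict_line(line):
    """Split one EDICT-style line into headword, reading, tags, and glosses."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("not an EDICT-style line: %r" % line)
    glosses = match.group("body").split("/")
    tags = []
    tag_match = TAGS_RE.match(glosses[0])
    if tag_match:
        tags = tag_match.group("tags").split(",")
        # Strip the tag group from the first gloss.
        glosses[0] = glosses[0][tag_match.end():]
    return {
        "head": match.group("head"),
        "reading": match.group("reading"),  # None for kana-only headwords
        "tags": tags,
        "glosses": [g for g in glosses if g],
    }


if __name__ == "__main__":
    # Hypothetical sample lines written for this sketch.
    for sample in ("食べる [たべる] /(v1,vt) to eat/",
                   "走る [はしる] /(v5r,vi) to run/"):
        print(parse_edict_line(sample))
```

Both sample headwords end in る, and only the v1 and v5r tags tell a program which conjugation pattern applies.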


References

  1. "Electronic Dictionary Research and Development Group File". Retrieved 20 June 2011.
  2. Breen, Jim. "The EDICT Dictionary File". Monash University. Retrieved 20 June 2011.
  3. "EDICT abbreviation list". http://www.csse.monash.edu.au/~jwb/jmdict_dtd_h.html

This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.