Corpus of Contemporary American English

The Corpus of Contemporary American English (COCA) is a 450-million-word corpus of American English. It is one of the largest currently available corpora, and is the only publicly available corpus of American English to contain a wide array of texts from a number of genres.

It was created by Mark Davies, Professor of Corpus Linguistics at Brigham Young University.[1]

Content

The corpus is composed of more than 450 million words from more than 160,000 texts, including 20 million words for each year from 1990 to 2015. The most recent update was made in December 2015. The corpus is used by tens of thousands of people each month, which may make it the most widely used "structured" corpus currently available.
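As a rough consistency check on these figures, the nominal size implied by 20 million words per year can be worked out directly. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes that the range 1990 to 2015 is inclusive and that every year contributes exactly 20 million words, which the text states only as a round figure.

```python
# Back-of-the-envelope estimate, not an official count.
# Assumption: every year from 1990 through 2015 (inclusive)
# contributes exactly 20 million words.
WORDS_PER_YEAR = 20_000_000
FIRST_YEAR, LAST_YEAR = 1990, 2015


def nominal_corpus_size(first_year: int, last_year: int, words_per_year: int) -> int:
    """Return the word count implied by a fixed yearly quota over an inclusive year range."""
    n_years = last_year - first_year + 1  # inclusive range
    return n_years * words_per_year


total = nominal_corpus_size(FIRST_YEAR, LAST_YEAR, WORDS_PER_YEAR)
print(f"{total:,} words")  # 520,000,000 words
```

Under these assumptions the nominal total is 520 million words, which is in line with the "more than 450 million" stated for the corpus.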

For each year, the corpus is evenly divided among five genres: spoken, fiction, popular magazines, newspapers, and academic journals. The texts come from a variety of sources.

Availability

The corpus is free to search through its web interface[2], with a limit on the number of queries per day; less-restricted access is available for a fee[3]. The full corpus texts are available for a further fee[4].

References

  1. Kauhanen, Henri (21 March 2011). "The Corpus of Contemporary American English: Background and history". VARIENG. Retrieved 13 October 2011.
  2. "Corpus of Contemporary American English". Corpus of Contemporary American English. Retrieved 20 July 2017.
  3. "BYU corpora: Premium". BYU corpora. Retrieved 20 July 2017.
  4. "Corpus data: Purchase". Retrieved 20 July 2017.

Bibliography

  • Davies, Mark (2010). "The Corpus of Contemporary American English as the First Reliable Monitor Corpus of English". Literary and Linguistic Computing. 25 (4): 447–65. doi:10.1093/llc/fqq018. 
  • Bennett, Gena R. (2010). Using Corpora in the Language Learning Classroom: Corpus Linguistics for Teachers. Ann Arbor, Michigan: University of Michigan. p. 144. ISBN 978-0-472-03385-0. 
  • Davies, Mark (2010). "More than a peephole: Using large and diverse online corpora". International Journal of Corpus Linguistics. 15 (3): 405–11. doi:10.1075/ijcl.15.3.13dav. 
  • Anderson, Wendy; Corbett, John (2009). Exploring English with Online Corpora. Palgrave Macmillan. p. 205. ISBN 978-0-230-55140-4.
  • Davies, Mark (2009). "The 385+ Million Word Corpus of Contemporary American English (1990–present)". International Journal of Corpus Linguistics. John Benjamins Publishing Company. 14 (2): 159–190. doi:10.1075/ijcl.14.2.02dav.
  • Lindquist, Hans (2009). Corpus Linguistics and the Description of English. Edinburgh University Press. ISBN 978-0-7486-2615-1. 
  • Davies, Mark (2005). "The advantage of using relational databases for large corpora: Speed, advanced queries, and unlimited annotation". International Journal of Corpus Linguistics. John Benjamins Publishing Company. 10 (3): 307–334. doi:10.1075/ijcl.10.3.02dav.

