Oxford English Corpus

From Wikipedia, the free encyclopedia

The Oxford English Corpus is a collection of English-language texts used by the makers of the Oxford English Dictionary and by Oxford University Press's language research programme. It is the largest corpus of its kind, containing over one billion words. The sources for these words are writings of all sorts, from "literary novels and specialist journals to everyday newspapers and magazines and from Hansard to the language of chatrooms, emails, and weblogs".^[1] This breadth contrasts with similar databases that sample only a specific kind of writing.

The digital version of the Oxford English Corpus is formatted in XML and is usually analysed with the Sketch Engine software.^[2]

Each document in the Oxford English Corpus is accompanied by metadata naming: