Free English corpora at Brigham Young University
[A “corpus,” as the term is used by contemporary linguists, is a large collection of language samples housed on a computer database. The plural of “corpus” is “corpora.”]
• Ross M recently brought to my attention an important website, corpus.byu.edu, created by Professor Mark Davies of Brigham Young University. The site provides free access to four English corpora (or “corpuses”): three based on American English and one on British English. The most important of these are the British National Corpus, which contains 100 million words, and the Corpus of Contemporary American English, which contains 410 million words. Both include samples of spoken as well as written English.
• The site is free. It can be tried out without registration, but after ten to fifteen searches (made during one or several visits), registration is required.
• Learning to use the corpora takes some time and effort, but given how valuable the information they contain can be for teachers and students of English, this should be time and effort well spent. To speed up the process, it is a good idea to work carefully through the tutorial “The Five-minute tour”, which can be reached from the main page of each of the two corpora. (Use the drop-down menu “Help / information / context” on the right-hand side of the page, just below the main text box.)