The data files in this directory are derived from the [Google Web Trillion Word Corpus][corpus], as described by Thorsten Brants and Alex Franz, and [distributed][distributed] by the Linguistic Data Consortium. Note that this data **"may only be used for linguistic education and research"**, so for any other usage you should acquire a different data set.

[corpus]:
[distributed]: