The data files in this directory are derived from the [Google Web Trillion Word
Corpus][corpus], as described by Thorsten Brants and Alex Franz, and [distributed][distributed] by the
Linguistic Data Consortium. Note that this data **"may only be used for linguistic
education and research"**, so for any other usage you should acquire a different data set.

[corpus]: http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
[distributed]: https://catalog.ldc.upenn.edu/LDC2006T13