instant-segment/data
Nick Rempel 9bbb633f1d
Flesh out README (#14)
2021-04-29 11:12:42 +02:00
README.md Flesh out README (#14) 2021-04-29 11:12:42 +02:00
bigrams.txt Initial version 2020-05-26 20:07:00 +02:00
unigrams.txt Initial version 2020-05-26 20:07:00 +02:00
words.txt Initial version 2020-05-26 20:07:00 +02:00

README.md

The data files in this directory are derived from the Google Web Trillion Word Corpus, as described by Thorsten Brants and Alex Franz and distributed by the Linguistic Data Consortium. Note that this data "may only be used for linguistic education and research"; for any other use, you should obtain a different data set.