instant-segment/data
Dirkjan Ochtman fee2adb995 Add new data files 2021-06-03 16:09:24 +02:00
README.md Flesh out README (#14) 2021-04-29 11:12:42 +02:00
bigrams.txt Initial version 2020-05-26 20:07:00 +02:00
en-bigrams.txt Add new data files 2021-06-03 16:09:24 +02:00
en-unigrams.txt Add new data files 2021-06-03 16:09:24 +02:00
grab.py Add script to download word list input data 2021-06-03 16:09:24 +02:00
unigrams.txt Initial version 2020-05-26 20:07:00 +02:00
words.txt Initial version 2020-05-26 20:07:00 +02:00

README.md

The data files in this directory are derived from the Google Web Trillion Word Corpus, as described by Thorsten Brants and Alex Franz, and distributed by the Linguistic Data Consortium. Note that this data "may only be used for linguistic education and research", so for any other use you should obtain a different data set.