Update README.md
parent fdd743478e
commit 8230ac6ed5
@@ -14,14 +14,8 @@ which is in turn based on code from Peter Norvig's chapter [Natural Language
 Corpus Data][chapter] from the book [Beautiful Data][book] (Segaran and
 Hammerbacher, 2009).
 
 The data files in this repository are derived from the [Google Web Trillion Word
 Corpus][corpus], as described by Thorsten Brants and Alex Franz, and
 [distributed][distributed] by the Linguistic Data Consortium. Note that this
 data **"may only be used for linguistic education and research"**, so for any
 other usage you should acquire a different data set.
 
 For the microbenchmark included in this repository, Instant Segment is ~100x
-faster than the Python implementation. The API has been carefully constructed
+faster than the Python implementation. The API was carefully constructed
 so that multiple segmentations can share the underlying state to allow parallel
 usage.
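The sharing the hunk above describes — multiple segmentations reusing one set of underlying state in parallel — can be sketched as the usual Rust pattern: an immutable model shared read-only via `Arc`, with each thread holding its own mutable search scratch state. This is an illustrative sketch only, not the crate's actual API; the names `Model`, `Search`, and `segment`, and the toy greedy longest-prefix matcher, are hypothetical stand-ins for the real probabilistic search.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

// Immutable language-model state: built once, then shared read-only.
struct Model {
    unigrams: HashMap<String, f64>,
}

// Per-thread mutable scratch state, so threads never contend on the model.
struct Search {
    best: Vec<String>,
}

// Toy segmentation: greedily take the longest prefix found in the model.
// (Hypothetical stand-in for a real scoring search.)
fn segment(model: &Model, search: &mut Search, input: &str) {
    search.best.clear();
    let mut rest = input;
    while !rest.is_empty() {
        let mut len = rest.len();
        while len > 1 && !model.unigrams.contains_key(&rest[..len]) {
            len -= 1;
        }
        search.best.push(rest[..len].to_string());
        rest = &rest[len..];
    }
}

fn main() {
    let mut unigrams = HashMap::new();
    for w in ["instant", "segment", "choose", "spain"] {
        unigrams.insert(w.to_string(), 1.0);
    }
    // One shared model; each thread gets its own Search.
    let model = Arc::new(Model { unigrams });

    let handles: Vec<_> = ["instantsegment", "choosespain"]
        .into_iter()
        .map(|input| {
            let model = Arc::clone(&model);
            thread::spawn(move || {
                let mut search = Search { best: Vec::new() };
                segment(&model, &mut search, input);
                search.best
            })
        })
        .collect();

    for h in handles {
        println!("{:?}", h.join().unwrap()); // e.g. ["instant", "segment"]
    }
}
```

The point of the split is that the expensive part (the model) is built once and never mutated, while the cheap per-query state lives in `Search`, so parallel callers need no locking at all.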
@@ -107,7 +101,5 @@ make test-python
 [python]: https://github.com/grantjenks/python-wordsegment
 [chapter]: http://norvig.com/ngrams/
 [book]: http://oreilly.com/catalog/9780596157111/
 [corpus]:
   http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
 [distributed]: https://catalog.ldc.upenn.edu/LDC2006T13
 [issues]: https://github.com/InstantDomainSearch/instant-segment/issues