Update README.md
parent fdd743478e
commit 8230ac6ed5
README.md (10 lines changed)
@@ -14,14 +14,8 @@ which is in turn based on code from Peter Norvig's chapter [Natural Language
 Corpus Data][chapter] from the book [Beautiful Data][book] (Segaran and
 Hammerbacher, 2009).
 
-The data files in this repository are derived from the [Google Web Trillion Word
-Corpus][corpus], as described by Thorsten Brants and Alex Franz, and
-[distributed][distributed] by the Linguistic Data Consortium. Note that this
-data **"may only be used for linguistic education and research"**, so for any
-other usage you should acquire a different data set.
-
 For the microbenchmark included in this repository, Instant Segment is ~100x
-faster than the Python implementation. The API has been carefully constructed
+faster than the Python implementation. The API was carefully constructed
 so that multiple segmentations can share the underlying state to allow parallel
 usage.
 
@@ -107,7 +101,5 @@ make test-python
 [python]: https://github.com/grantjenks/python-wordsegment
 [chapter]: http://norvig.com/ngrams/
 [book]: http://oreilly.com/catalog/9780596157111/
-[corpus]:
-http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
 [distributed]: https://catalog.ldc.upenn.edu/LDC2006T13
 [issues]: https://github.com/InstantDomainSearch/instant-segment/issues