Update README with new name

This commit is contained in:
Dirkjan Ochtman 2020-12-15 21:02:22 +01:00 committed by GitHub
parent cb3c9707ef
commit 3a37893e74
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 5 additions and 5 deletions


@@ -1,9 +1,9 @@
-# word-segmenters: fast English word segmentation in Rust
+# instant-segment: fast English word segmentation in Rust
-[![Build status](https://github.com/InstantDomainSearch/word-segmenters/workflows/CI/badge.svg)](https://github.com/InstantDomainSearch/word-segmenters/actions?query=workflow%3ACI)
+[![Build status](https://github.com/InstantDomainSearch/instant-segment/workflows/CI/badge.svg)](https://github.com/InstantDomainSearch/instant-segment/actions?query=workflow%3ACI)
 [![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE-APACHE)
-word-segmenters is a fast Apache-2.0 library for English word segmentation.
+instant-segment is a fast Apache-2.0 library for English word segmentation.
 It is based on the Python [wordsegment][python] project written by Grant Jenkins,
 which is in turn based on code from Peter Norvig's chapter [Natural Language
 Corpus Data][chapter] from the book [Beautiful Data][book] (Segaran and Hammerbacher, 2009).
@@ -13,7 +13,7 @@ Corpus][corpus], as described by Thorsten Brants and Alex Franz, and [distribute
 Linguistic Data Consortium. Note that this data **"may only be used for linguistic
 education and research"**, so for any other usage you should acquire a different data set.
-For the microbenchmark included in this repository, word-segmenters is ~17x faster than
+For the microbenchmark included in this repository, instant-segment is ~17x faster than
 the Python implementation. Further optimizations are planned -- see the [issues][issues].
 The API has been carefully constructed so that multiple segmentations can share
 the underlying state (mainly the unigram and bigram maps) to allow parallel usage.
@@ -23,4 +23,4 @@ the underlying state (mainly the unigram and bigram maps) to allow parallel usag
 [book]: http://oreilly.com/catalog/9780596157111/
 [corpus]: http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
 [distributed]: https://catalog.ldc.upenn.edu/LDC2006T13
-[issues]: https://github.com/InstantDomainSearch/word-segmenters/issues
+[issues]: https://github.com/InstantDomainSearch/instant-segment/issues
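The README changed above describes a Norvig-style segmenter: pick the split of the input whose words are jointly most probable under corpus counts. As a rough illustration of that idea (not the library's actual API, data, or scoring), here is a minimal Python sketch; the toy unigram counts and the unseen-word penalty are assumptions for the example.

```python
from functools import lru_cache

# Toy unigram counts, assumed for illustration only. The real library loads
# unigram and bigram counts derived from the Google Web Trillion Word Corpus.
UNIGRAMS = {"choose": 50, "spain": 30, "this": 100, "is": 200, "a": 300, "test": 40}
TOTAL = sum(UNIGRAMS.values())

def score(word):
    # Probability of a single word; unseen words get a penalty that
    # shrinks with length, following the heuristic in Norvig's chapter.
    if word in UNIGRAMS:
        return UNIGRAMS[word] / TOTAL
    return 10.0 / (TOTAL * 10 ** len(word))

@lru_cache(maxsize=None)
def segment(text):
    # Try every split point and keep the highest-probability segmentation.
    if not text:
        return ()
    candidates = []
    for i in range(1, len(text) + 1):
        head, rest = text[:i], segment(text[i:])
        prob = score(head)
        for word in rest:
            prob *= score(word)
        candidates.append((prob, (head,) + rest))
    return max(candidates)[1]

print(segment("choosespain"))  # -> ('choose', 'spain')
```

The memoized recursion makes this O(n^2) in the input length; the Rust implementation works on the same principle but adds bigram scoring and shares the loaded maps across searches.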