Add testing
This commit is contained in:
parent
2b6862e54f
commit
5e2b1fd054
16
README.md
@@ -60,7 +60,7 @@ instant-segment = "*"
 
 ## Using
 
-Instant Segment works by segmenting a string into words by selecting the splits with the highest probability given a vocabulary of words and their occurrences.
+Instant Segment works by segmenting a string into words by selecting the splits with the highest probability given a corpus of words and their occurrences.
 
 For instance, provided that `choose` and `spain` occur more frequently than `chooses` and `pain`, Instant Segment can help you split the string `choosespain.com` into [`ChooseSpain.com`](https://instantdomainsearch.com/search/sale?q=choosespain), which more likely matches user intent.
 
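The probability-based splitting described in the changed paragraph can be sketched in plain Python. This is an illustrative toy, not the instant-segment API, and the unigram counts below are made up for the `choosespain` example:

```python
from functools import lru_cache

# Hypothetical unigram counts; a real run would load these from a corpus.
COUNTS = {"choose": 8000, "spain": 5000, "chooses": 900, "pain": 3000}
TOTAL = sum(COUNTS.values())

def probability(word):
    """Unigram probability; unseen words get a vanishingly small score."""
    count = COUNTS.get(word, 0)
    return count / TOTAL if count else 1e-9 / TOTAL

def segment(text):
    """Return (probability, words) for the highest-probability split of `text`."""
    @lru_cache(maxsize=None)
    def best(s):
        if not s:
            return (1.0, ())
        # Try every prefix as the next word and keep the most probable split.
        return max(
            (probability(s[:i]) * best(s[i:])[0], (s[:i],) + best(s[i:])[1])
            for i in range(1, len(s) + 1)
        )
    prob, words = best(text)
    return prob, list(words)

print(segment("choosespain")[1])  # ['choose', 'spain'] given these counts
```

Because `choose` and `spain` occur more often than `chooses` and `pain`, the product of their probabilities wins, which is exactly the intuition the README paragraph describes.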
@@ -125,6 +125,20 @@ Play with the examples above to see that different numbers of occurrences will in
 
+The example above is succinct but, in practice, you will want to load these words and occurrences from a corpus of data like the ones we provide [here](./data). Check out [the](./instant-segment/instant-segment-py/test/test.py) [tests](./instant-segment/instant-segment/src/test_data.rs) to see examples of how you might do that.
+
+## Testing
+
+To run the tests, run the following:
+
+```
+cargo t -p instant-segment --all-features
+```
+
+You can also test the Python bindings with:
+
+```
+make test-python
+```
+
 [python]: https://github.com/grantjenks/python-wordsegment
 [chapter]: http://norvig.com/ngrams/
 [book]: http://oreilly.com/catalog/9780596157111/
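The loading step mentioned in the added paragraph could be sketched like this. It assumes a tab-separated `word<TAB>count` line format, which is an assumption on my part; check the files under `./data` and the linked tests for the actual format:

```python
# Sketch of loading word/occurrence pairs from a count file.
# ASSUMPTION: one tab-separated "word<TAB>count" pair per line; verify
# against the real data files before relying on this.

def load_counts(path):
    """Read a tab-separated word-count file into a dict of occurrences."""
    counts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, sep, count = line.rstrip("\n").partition("\t")
            if sep:  # skip lines without a tab separator
                counts[word] = int(count)
    return counts
```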
@@ -0,0 +1,7 @@
+The data files in this directory are derived from the [Google Web Trillion Word
+Corpus][corpus], as described by Thorsten Brants and Alex Franz, and [distributed][distributed] by the
+Linguistic Data Consortium. Note that this data **"may only be used for linguistic
+education and research"**, so for any other usage you should acquire a different data set.
+
+[corpus]: http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
+[distributed]: https://catalog.ldc.upenn.edu/LDC2006T13