Add testing

Nicholas Rempel 2021-04-23 10:10:08 -07:00
parent 2b6862e54f
commit 5e2b1fd054
2 changed files with 22 additions and 1 deletions

@ -60,7 +60,7 @@ instant-segment = "*"
## Using
Instant Segment segments a string into words by selecting the splits with the highest probability, given a vocabulary of words and their occurrences.
Instant Segment segments a string into words by selecting the splits with the highest probability, given a corpus of words and their occurrences.
For instance, provided that `choose` and `spain` occur more frequently than `chooses` and `pain`, Instant Segment can help you split the string `choosespain.com` into [`ChooseSpain.com`](https://instantdomainsearch.com/search/sale?q=choosespain), which more likely matches user intent.
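To make the idea concrete, here is a minimal sketch of probability-based splitting in Python. It is not Instant Segment's implementation: it scores single words only, the occurrence counts are invented for illustration, and the names `COUNTS`, `word_score`, and `segment` are hypothetical.

```python
import math
from functools import lru_cache

# Hypothetical occurrence counts, made up for this illustration.
COUNTS = {"choose": 500, "spain": 400, "chooses": 50, "pain": 300}
TOTAL = sum(COUNTS.values())


def word_score(word):
    """Return the log10 probability of a single word.

    Unknown words get a penalty that grows with their length, so the
    search prefers splits made of known words.
    """
    if word in COUNTS:
        return math.log10(COUNTS[word] / TOTAL)
    return math.log10(1.0 / (TOTAL * 10 ** len(word)))


def segment(text, max_word_len=20):
    """Split `text` into the word sequence with the highest total score."""

    @lru_cache(maxsize=None)
    def best(start):
        # Best (score, words) for the suffix of `text` beginning at `start`.
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, min(len(text), start + max_word_len) + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((word_score(word) + rest_score, (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


print(segment("choosespain"))
```

With these counts, `choose` + `spain` scores higher than `chooses` + `pain`, so the first split wins. Lowering the counts for `choose` and `spain` far enough would flip the result.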
@ -125,6 +125,20 @@ Play with the examples above to see that different numbers of occurrences will in
The example above is succinct but, in practice, you will want to load these words and occurrences from a corpus of data like the ones we provide [here](./data). Check out [the](./instant-segment/instant-segment-py/test/test.py) [tests](./instant-segment/instant-segment/src/test_data.rs) to see examples of how you might do that.
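As a rough sketch of what loading might look like, the Python snippet below parses word counts from text. It assumes one `word<TAB>count` pair per line, which is an assumption about the file layout rather than something stated here, and `load_counts` is a hypothetical helper, not part of the Instant Segment API. The linked tests show the actual loading code.

```python
import io


def load_counts(lines):
    """Parse `word<TAB>count` lines into a dict of word -> count.

    The tab-separated layout is an assumption for this sketch.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, count = line.split("\t", 1)
        counts[word] = float(count)
    return counts


# In-memory stand-in for a data file, so the sketch runs on its own.
sample = io.StringIO("choose\t500\nspain\t400\n\npain\t300\n")
counts = load_counts(sample)
print(counts)
```

In practice you would pass an open file handle instead of the `io.StringIO` stand-in.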
## Testing
To run the tests, use the following command:
```
cargo t -p instant-segment --all-features
```
You can also test the Python bindings with:
```
make test-python
```
[python]: https://github.com/grantjenks/python-wordsegment
[chapter]: http://norvig.com/ngrams/
[book]: http://oreilly.com/catalog/9780596157111/

data/README.md Normal file
@ -0,0 +1,7 @@
The data files in this directory are derived from the [Google Web Trillion Word
Corpus][corpus], as described by Thorsten Brants and Alex Franz, and [distributed][distributed] by the
Linguistic Data Consortium. Note that this data **"may only be used for linguistic
education and research"**, so for any other usage you should acquire a different data set.
[corpus]: http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
[distributed]: https://catalog.ldc.upenn.edu/LDC2006T13