Michael Partheil
b3e566dfce
Update Readme to reflect recent perf improvements
2023-11-15 15:47:33 -08:00
jinglybits
3a61fd2c55
replace cover image (#49)
2023-11-15 11:37:13 -08:00
jinglybits
2b47ca2ad4
replace cover image (#48)
* replace cover image
* replaced cover image with optimized svg
2023-11-15 10:13:42 -08:00
Michael Partheil
3b3627422b
Use nested `HashMap` for storing both unigram and bigram scores
2023-10-17 14:42:50 +02:00
Dirkjan Ochtman
f32b42537a
Update links to point to new GitHub org
2021-08-31 14:10:13 +02:00
Beau Hartshorne
f16306499c
Update README.md
2021-08-18 13:23:38 -07:00
Beau Hartshorne
8230ac6ed5
Update README.md
2021-08-18 13:18:56 -07:00
Beau Hartshorne
e2f6f5c4a5
Update README.md
2021-06-05 14:30:22 -07:00
Dirkjan Ochtman
7214ffc126
Remove note about planned further optimizations
2021-05-28 14:44:44 +02:00
Dirkjan Ochtman
85f4f94b53
Use more efficient segmentation strategy
Based on the triangular matrix approach as explained here:
https://towardsdatascience.com/fast-word-segmentation-for-noisy-text-2c2c41f9e8da
Use iteration rather than recursion, segmenting the input forwards
rather than backwards, and use a `Vec`-based memoization strategy
instead of relying on a `HashMap` of words. This version is about
4.8x faster, 100 lines of code shorter, and should use much less memory.
2021-05-28 14:30:27 +02:00
Nick Rempel
9bbb633f1d
Flesh out README (#14)
2021-04-29 11:12:42 +02:00
Dirkjan Ochtman
41fb2075a6
Tighten the language a little bit
2020-12-16 10:48:31 +01:00
Dirkjan Ochtman
27d20f07e5
Add crate badges to README
2020-12-16 10:44:56 +01:00
Dirkjan Ochtman
a8d93efbb6
Add cover to README
2020-12-16 10:42:35 +01:00
Dirkjan Ochtman
3a37893e74
Update README with new name
2020-12-15 21:02:22 +01:00
Dirkjan Ochtman
c11da266aa
Update performance claim in README
2020-11-26 11:39:57 +01:00
Dirkjan Ochtman
93bbff91ca
Create initial README (fixes #1)
2020-06-19 13:14:27 +02:00