Commit Graph

16 Commits

Author SHA1 Message Date
Jonathan Cochran a1e64e5659 replace cover image 2023-11-15 11:31:09 -08:00
jinglybits 2b47ca2ad4 replace cover image (#48)
* replace cover image

* replaced cover image with optimized svg
2023-11-15 10:13:42 -08:00
Michael Partheil 3b3627422b Use nested `HashMap` for storing both unigram and bigram scores 2023-10-17 14:42:50 +02:00
Dirkjan Ochtman f32b42537a Update links to point to new GitHub org 2021-08-31 14:10:13 +02:00
Beau Hartshorne f16306499c Update README.md 2021-08-18 13:23:38 -07:00
Beau Hartshorne 8230ac6ed5 Update README.md 2021-08-18 13:18:56 -07:00
Beau Hartshorne e2f6f5c4a5 Update README.md 2021-06-05 14:30:22 -07:00
Dirkjan Ochtman 7214ffc126 Remove note about planned further optimizations 2021-05-28 14:44:44 +02:00
Dirkjan Ochtman 85f4f94b53 Use more efficient segmentation strategy
Based on the triangular matrix approach as explained here:

https://towardsdatascience.com/fast-word-segmentation-for-noisy-text-2c2c41f9e8da

Use iteration rather than recursion to segment the input forwards
rather than backwards and use a `Vec`-based memoization strategy
instead of relying on a `HashMap` of words. This version is about
4.8x faster, 100 lines of code less and should use much less memory.
2021-05-28 14:30:27 +02:00
Nick Rempel 9bbb633f1d Flesh out README (#14) 2021-04-29 11:12:42 +02:00
Dirkjan Ochtman 41fb2075a6 Tighten the language a little bit 2020-12-16 10:48:31 +01:00
Dirkjan Ochtman 27d20f07e5 Add crate badges to README 2020-12-16 10:44:56 +01:00
Dirkjan Ochtman a8d93efbb6 Add cover to README 2020-12-16 10:42:35 +01:00
Dirkjan Ochtman 3a37893e74 Update README with new name 2020-12-15 21:02:22 +01:00
Dirkjan Ochtman c11da266aa Update performance claim in README 2020-11-26 11:39:57 +01:00
Dirkjan Ochtman 93bbff91ca Create initial README (fixes #1) 2020-06-19 13:14:27 +02:00
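
Commit 85f4f94b53 above describes segmenting the input forwards by iteration, with a `Vec`-based memo instead of a `HashMap` of words. The following is a minimal sketch of that general idea, not the crate's actual code. The function name `segment`, the `max_len` parameter, the `UNKNOWN_PENALTY` constant, and the unigram-only scoring are all assumptions made for illustration. It also assumes ASCII input, so that byte offsets are valid slice boundaries.

```rust
use std::collections::HashMap;

// Hypothetical per-character penalty for substrings missing from the score table.
const UNKNOWN_PENALTY: f64 = -10.0;

/// Hypothetical helper: split `input` into the highest-scoring word sequence.
/// `scores` maps known words to log-probability-like scores (higher is better).
/// Assumes ASCII input so that byte offsets are valid `str` slice boundaries.
fn segment<'a>(input: &'a str, scores: &HashMap<&str, f64>, max_len: usize) -> Vec<&'a str> {
    let n = input.len();
    // best[i] = (best total score for input[..i], start offset of its last word).
    // A Vec indexed by position serves as the memo.
    let mut best: Vec<(f64, usize)> = vec![(f64::NEG_INFINITY, 0); n + 1];
    best[0] = (0.0, 0);

    // Walk forwards over every end position, iteratively rather than recursively.
    for end in 1..=n {
        // Only consider candidate words of at most `max_len` bytes.
        for start in end.saturating_sub(max_len)..end {
            let word = &input[start..end];
            let score = match scores.get(word) {
                Some(s) => *s,
                None => UNKNOWN_PENALTY * word.len() as f64,
            };
            let total = best[start].0 + score;
            if total > best[end].0 {
                best[end] = (total, start);
            }
        }
    }

    // Follow the stored start offsets back from the end to recover the words.
    let mut words = Vec::new();
    let mut end = n;
    while end > 0 {
        let start = best[end].1;
        words.push(&input[start..end]);
        end = start;
    }
    words.reverse();
    words
}
```

Calling `segment("thisisatest", &scores, 8)` with a table that scores "this", "is", "a" and "test" well above the unknown penalty yields those four words in order. Because the memo is a `Vec` indexed by position, each lookup is a plain index rather than a hash of a substring. This sketch does not reproduce the triangular matrix layout from the linked article or the bigram scores mentioned in commit 3b3627422b.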