Commit Graph

8 Commits

Author SHA1 Message Date
Dirkjan Ochtman d539f209eb Use more efficient segmentation strategy
Based on the triangular matrix approach as explained here:

https://towardsdatascience.com/fast-word-segmentation-for-noisy-text-2c2c41f9e8da

Use iteration rather than recursion to segment the input forwards
rather than backwards and use a `Vec`-based memoization strategy
instead of relying on a `HashMap` of words. This version is about
4.8x faster, 100 lines of code less and should use much less memory.
2021-05-28 14:21:33 +02:00
Nick Rempel 9bbb633f1d Flesh out README (#14) 2021-04-29 11:12:42 +02:00
Dirkjan Ochtman 41fb2075a6 Tighten the language a little bit 2020-12-16 10:48:31 +01:00
Dirkjan Ochtman 27d20f07e5 Add crate badges to README 2020-12-16 10:44:56 +01:00
Dirkjan Ochtman a8d93efbb6 Add cover to README 2020-12-16 10:42:35 +01:00
Dirkjan Ochtman 3a37893e74 Update README with new name 2020-12-15 21:02:22 +01:00
Dirkjan Ochtman c11da266aa Update performance claim in README 2020-11-26 11:39:57 +01:00
Dirkjan Ochtman 93bbff91ca Create initial README (fixes #1) 2020-06-19 13:14:27 +02:00
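
Note on commit d539f209eb: the message describes segmenting the input forwards with iteration and a `Vec`-based memo instead of recursion over a `HashMap` of words. The following is a minimal sketch of that general idea, not the crate's actual code. The names `score` and `segment`, the unigram-count scoring, the unknown-word penalty, and the ASCII-only input are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical scorer: log10 probability of `word` from unigram counts.
/// Unknown words get a penalty that grows with their length (an assumption).
fn score(word: &str, counts: &HashMap<&str, f64>, total: f64) -> f64 {
    match counts.get(word) {
        Some(c) => (*c / total).log10(),
        None => (10.0 / (total * 10f64.powi(word.len() as i32))).log10(),
    }
}

/// Segments `input` by filling a `Vec` memo from left to right.
/// Assumes ASCII input so that byte offsets are valid slice boundaries.
fn segment(input: &str, counts: &HashMap<&str, f64>, max_word_len: usize) -> Vec<String> {
    let n = input.len();
    let total: f64 = counts.values().sum();

    // best[end] = (score of the best split of input[..end], start of its last word)
    let mut best: Vec<(f64, usize)> = vec![(f64::NEG_INFINITY, 0); n + 1];
    best[0] = (0.0, 0);

    // Forward pass: every prefix is solved once, using only shorter prefixes.
    for end in 1..=n {
        let lowest_start = end.saturating_sub(max_word_len);
        for start in lowest_start..end {
            let candidate = best[start].0 + score(&input[start..end], counts, total);
            if candidate > best[end].0 {
                best[end] = (candidate, start);
            }
        }
    }

    // Follow the stored start offsets back from the end to recover the words.
    let mut words = Vec::new();
    let mut end = n;
    while end > 0 {
        let start = best[end].1;
        words.push(input[start..end].to_string());
        end = start;
    }
    words.reverse();
    words
}
```

With hypothetical counts such as `choose: 50`, `chooses: 5`, `spain: 30`, `pain: 20`, calling `segment("choosespain", &counts, 12)` prefers `choose` + `spain` over `chooses` + `pain`. The memo is indexed by prefix length, so no substring needs to be hashed or stored as a key, which is the kind of saving the commit message attributes to dropping the `HashMap`.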