Commit Graph

35 Commits

Author SHA1 Message Date
Dirkjan Ochtman 96187965b6 Extract public assert_segments() function 2021-02-04 10:40:04 +01:00
Dirkjan Ochtman 45e569379c Default to calculating total from unigram map 2021-02-04 10:36:30 +01:00
Dirkjan Ochtman 0d2930c408 Add API to create segmenter from hashmaps directly 2021-02-04 10:36:30 +01:00
Dirkjan Ochtman b85fc6adc2 Rename testcases to test_cases 2021-02-04 10:36:30 +01:00
Dirkjan Ochtman 55cc7c54a3 Use powi() instead of powf() for performance 2021-02-04 10:17:11 +01:00
Dirkjan Ochtman 970caeba44 Use std HashMap to simplify API 2021-02-04 10:16:38 +01:00
Dirkjan Ochtman 29d2d94a8d Reorganize tests and test data to expose test cases 2021-02-01 17:25:32 +01:00
Dirkjan Ochtman cb3c9707ef Add docstring for Segmenter type 2020-12-07 14:51:10 +01:00
Dirkjan Ochtman adf7995adb Remove now unused error type 2020-12-07 14:51:10 +01:00
Dirkjan Ochtman 2ab57ca0b1 Fix typo 2020-12-07 14:36:59 +01:00
Dirkjan Ochtman c571996925 Simplify bigram scoring algorithm 2020-12-07 14:24:33 +01:00
Dirkjan Ochtman 912e6477e3 Fix clippy problems in test data setup 2020-12-07 11:46:42 +01:00
Dirkjan Ochtman eeb9c77bc7 Simplify Segmenter setup API 2020-12-07 11:39:49 +01:00
Dirkjan Ochtman d554825594 Name complex type as suggested by clippy 2020-11-26 11:33:36 +01:00
Dirkjan Ochtman 691ecbc3c6 Simplify handling of empty tails 2020-11-26 11:20:06 +01:00
Dirkjan Ochtman ae3896b47b Use range for previous argument as well 2020-11-26 11:15:27 +01:00
Dirkjan Ochtman bc20e39c1e Make slicing cheaper by adding a little unsafe code 2020-11-26 11:14:53 +01:00
Dirkjan Ochtman bb1b1db9c5 Pass Range instead of str to search() 2020-11-26 11:13:35 +01:00
Dirkjan Ochtman 4be435e0fb Make split values absolute instead of relative 2020-11-26 11:12:52 +01:00
Dirkjan Ochtman b7daaff47a Simplify top-level loop 2020-11-26 10:46:27 +01:00
Dirkjan Ochtman 2f9cb95b5c Avoid allocations for split vectors 2020-11-26 10:46:23 +01:00
Dirkjan Ochtman a1f03e32fe Remove unused lifetime 2020-11-25 17:33:50 +01:00
Dirkjan Ochtman 47271ff81e Allocate a single Vec to back cached splits 2020-11-25 17:29:13 +01:00
Dirkjan Ochtman 947e003a48 Store splits instead of string slices 2020-11-25 17:29:13 +01:00
Dirkjan Ochtman 1df3c4397e Inline TextDivider iterator 2020-11-25 17:29:13 +01:00
Dirkjan Ochtman ead9a3064b Better typed handling of previous word 2020-11-25 17:29:13 +01:00
Dirkjan Ochtman ea4438f2e8 Make Segmenter::score() slightly more efficient 2020-11-25 17:29:13 +01:00
Dirkjan Ochtman 540348f703 Abstract over test data format code and API 2020-11-25 17:29:13 +01:00
Dirkjan Ochtman 0d7fbd53e7 Prevent allocations where possible 2020-11-25 17:29:11 +01:00
Dirkjan Ochtman 1b4377715f Move from err-derive to thiserror 2020-11-23 13:23:16 +01:00
Dirkjan Ochtman 76bdcf1ca5 Separate state from Segmenter 2020-05-28 19:56:13 +02:00
Dirkjan Ochtman 98a8368be6 Avoid string allocations for search 2020-05-28 19:56:13 +02:00
Dirkjan Ochtman b9c8402b0c Prevent allocations for memo keys 2020-05-28 19:56:13 +02:00
Dirkjan Ochtman 0f69f267d8 Use ahash for hashing 2020-05-28 19:56:13 +02:00
Dirkjan Ochtman 38f9747c92 Initial version 2020-05-26 20:07:00 +02:00