Dirkjan Ochtman
3ff31ba21b
Bump version number to 0.10.1
2022-09-08 11:39:32 +02:00
Antonio Schiavon
8c69bf577c
support numbers (#31)
2022-09-07 17:46:32 +02:00
Dirkjan Ochtman
0f776b8e6d
Upgrade to PyO3 0.17
2022-08-29 11:36:05 -07:00
Dirkjan Ochtman
1e7bfbc3ce
Ignore faulty clippy lint for now
2022-08-15 10:23:35 +02:00
Dirkjan Ochtman
b68655c17d
Remove authors from Cargo metadata (see RFC 3052)
2022-08-15 10:23:35 +02:00
Dirkjan Ochtman
22fff673e6
Bump version to 0.10
2022-08-15 10:23:35 +02:00
Dirkjan Ochtman
c4864fe724
Upgrade ahash to 0.8
2022-08-15 10:23:35 +02:00
Dirkjan Ochtman
747dd22098
Add some minimal documentation
2022-02-28 11:38:50 +01:00
Dirkjan Ochtman
2ef7eb82ba
Bump instant-segment version to 0.9
2022-02-28 11:38:34 +01:00
Dirkjan Ochtman
eac10fb553
Apply clippy suggestion
2022-02-28 11:35:38 +01:00
Dirkjan Ochtman
3709223fa9
Update pyo3 to 0.16
2022-02-28 11:34:57 +01:00
Dirkjan Ochtman
77203fc78c
Update smartstring to 1
2022-02-28 11:34:32 +01:00
Dirkjan Ochtman
c911457226
Leverage simplified Python protocols
2021-11-10 09:41:31 +01:00
dependabot[bot]
2d97878ed7
Update pyo3 requirement from 0.14.1 to 0.15.0
Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
- [Commits](https://github.com/pyo3/pyo3/compare/v0.14.1...v0.15.0)
---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
2021-11-10 09:40:14 +01:00
Dirkjan Ochtman
30a80c4a79
Add minimal pyproject.toml
2021-09-15 16:37:41 +02:00
Dirkjan Ochtman
ef38bedaa2
Improve selection of Python versions
2021-09-01 16:11:13 +02:00
Dirkjan Ochtman
950faa6854
Tweak publish workflow
2021-08-31 16:42:39 +02:00
Dirkjan Ochtman
4c317051ef
Publish platform wheels after tag push
2021-08-31 16:29:17 +02:00
Dirkjan Ochtman
fb2a46ca27
Add Python test job
2021-08-31 16:29:17 +02:00
Dirkjan Ochtman
bd12ef8b69
Fix up python test code
2021-08-31 16:29:17 +02:00
Dirkjan Ochtman
906f202611
Fix some more clippy suggestions
2021-08-31 14:19:36 +02:00
Dirkjan Ochtman
e12390ea7d
py: bump version number to 0.1.4
2021-08-31 14:16:33 +02:00
Dirkjan Ochtman
2babad4a58
Bump version to 0.8.3
2021-08-31 14:15:32 +02:00
Dirkjan Ochtman
f32b42537a
Update links to point to new GitHub org
2021-08-31 14:10:13 +02:00
Dirkjan Ochtman
edfad13ddc
Fix clippy lint
2021-08-31 14:09:23 +02:00
Beau Hartshorne
f16306499c
Update README.md
2021-08-18 13:23:38 -07:00
Beau Hartshorne
8230ac6ed5
Update README.md
2021-08-18 13:18:56 -07:00
dependabot[bot]
fdd743478e
Update pyo3 requirement from 0.13.2 to 0.14.1
Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
- [Commits](https://github.com/pyo3/pyo3/compare/v0.13.2...v0.14.1)
---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
2021-07-05 15:09:48 +02:00
Beau Hartshorne
e2f6f5c4a5
Update README.md
2021-06-05 14:30:22 -07:00
Dirkjan Ochtman
bc59c6cf6f
Refactor to make test segmenter more accessible
2021-06-03 16:09:24 +02:00
Dirkjan Ochtman
65b85d9806
Remove old data files
2021-06-03 16:09:24 +02:00
Dirkjan Ochtman
99ddbf7366
Update data README
2021-06-03 16:09:24 +02:00
Dirkjan Ochtman
3c52201fa0
Update test cases to deal with new data
2021-06-03 16:09:24 +02:00
Dirkjan Ochtman
fee2adb995
Add new data files
2021-06-03 16:09:24 +02:00
Dirkjan Ochtman
fcf24c7543
Add Rust code to process ngram data
2021-06-03 16:09:24 +02:00
Dirkjan Ochtman
cc95d39063
Add script to download word list input data
2021-06-03 16:09:24 +02:00
Dirkjan Ochtman
57221b1dd5
Improve test framework to show all failures
2021-06-03 16:09:24 +02:00
Dirkjan Ochtman
e4e773c896
py: bump version to 0.1.3
2021-05-28 15:21:30 +02:00
Dirkjan Ochtman
89c232e3af
py: update crate metadata
2021-05-28 15:20:28 +02:00
Dirkjan Ochtman
7214ffc126
Remove note about planned further optimizations
2021-05-28 14:44:44 +02:00
Dirkjan Ochtman
f081d4b171
py: bump version to 0.1.1
2021-05-28 14:34:04 +02:00
Dirkjan Ochtman
9edd1bc8b7
Bump version number to 0.8.2
2021-05-28 14:31:59 +02:00
Dirkjan Ochtman
85f4f94b53
Use more efficient segmentation strategy
Based on the triangular matrix approach as explained here:
https://towardsdatascience.com/fast-word-segmentation-for-noisy-text-2c2c41f9e8da
Use iteration rather than recursion to segment the input forwards
rather than backwards and use a `Vec`-based memoization strategy
instead of relying on a `HashMap` of words. This version is about
4.8x faster, 100 lines shorter, and should use much less memory.
2021-05-28 14:30:27 +02:00
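The commit above replaces a recursive, backwards segmenter memoized in a `HashMap` with a forward, iterative one memoized in a `Vec`. The following is a minimal Rust sketch of that idea. It is not the crate's actual code. It assumes a hypothetical `unigrams` map of log10 word probabilities, a hypothetical per-byte penalty for unknown substrings, and ASCII input. For simplicity it keeps one `Vec` entry per input position, whereas the linked article uses a fixed-size circular (triangular) buffer.

```rust
use std::collections::HashMap;

/// Hypothetical log10 penalty per byte for substrings missing from the dictionary.
/// Allowing unknown substrings guarantees every position is reachable.
const UNKNOWN_PENALTY_PER_BYTE: f64 = -10.0;

/// Segment `text` (assumed ASCII, so byte slicing is safe) into words using
/// forward dynamic programming. `unigrams` maps words to hypothetical log10
/// probabilities; `max_len` bounds the candidate word length in bytes.
fn segment<'a>(text: &'a str, unigrams: &HashMap<&str, f64>, max_len: usize) -> Vec<&'a str> {
    let n = text.len();
    // best[i] = (score of the best segmentation of text[..i], start of its last word).
    // Memoization is indexed by position, so no hashing is needed here.
    let mut best: Vec<Option<(f64, usize)>> = vec![None; n + 1];
    best[0] = Some((0.0, 0));

    // Walk the input forwards, extending each reachable prefix by every candidate word.
    for start in 0..n {
        let prefix_score = match best[start] {
            Some((score, _)) => score,
            None => continue,
        };
        for end in (start + 1)..=n.min(start + max_len) {
            let word = &text[start..end];
            let word_score = match unigrams.get(word) {
                Some(&score) => score,
                None => UNKNOWN_PENALTY_PER_BYTE * word.len() as f64,
            };
            let candidate = prefix_score + word_score;
            // Keep the higher-scoring way of reaching `end`.
            if best[end].map_or(true, |(score, _)| candidate > score) {
                best[end] = Some((candidate, start));
            }
        }
    }

    // Follow the back-pointers from the end of the input to recover the words.
    let mut words = Vec::new();
    let mut end = n;
    while end > 0 {
        let (_, start) = best[end].expect("reachable via single-byte candidates");
        words.push(&text[start..end]);
        end = start;
    }
    words.reverse();
    words
}

fn main() {
    // Hypothetical scores, chosen only to illustrate the search.
    let unigrams: HashMap<&str, f64> = HashMap::from([
        ("choose", -4.0),
        ("spain", -4.5),
        ("chooses", -5.0),
        ("pain", -4.2),
    ]);
    println!("{:?}", segment("choosespain", &unigrams, 12));
}
```

With these made-up scores the program prints `["choose", "spain"]`, because -4.0 + -4.5 beats the "chooses" + "pain" split at -9.2. Each position is filled once from the left, so the work is bounded by the input length times `max_len`, with no recursion depth to worry about.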
dependabot-preview[bot]
541644a329
Upgrade to GitHub-native Dependabot
2021-04-30 09:51:09 +02:00
Dirkjan Ochtman
0ebae2923c
Add license files (fixes #15)
2021-04-29 15:34:02 +02:00
Nick Rempel
9bbb633f1d
Flesh out README (#14)
2021-04-29 11:12:42 +02:00
Dirkjan Ochtman
eca12c572f
Bump version number to 0.8.1
2021-04-22 15:08:23 +02:00
Dirkjan Ochtman
bba1de7543
Simplify loop
2021-04-22 15:07:54 +02:00
Dirkjan Ochtman
c21b66ab83
Rename sentence_score() to score_sentence()
2021-04-22 15:04:48 +02:00
Dirkjan Ochtman
62f5b79d6d
py: add Segmenter::sentence_score() method
2021-04-22 15:04:06 +02:00