146 Commits

Author SHA1 Message Date
Dirkjan Ochtman cb4ded8f51 Use string for Python 3.10 2022-09-13 12:19:43 +02:00
Dirkjan Ochtman f1ee97de0f Add Python 3.10 to publish workflow 2022-09-13 11:59:15 +02:00
Dirkjan Ochtman a5deae92ca Bump version for Python bindings to 0.1.5 2022-09-13 11:59:15 +02:00
Dirkjan Ochtman 3ff31ba21b Bump version number to 0.10.1 2022-09-08 11:39:32 +02:00
Antonio Schiavon 8c69bf577c support numbers (#31) 2022-09-07 17:46:32 +02:00
Dirkjan Ochtman 0f776b8e6d Upgrade to PyO3 0.17 2022-08-29 11:36:05 -07:00
Dirkjan Ochtman 1e7bfbc3ce Ignore faulty clippy lint for now 2022-08-15 10:23:35 +02:00
Dirkjan Ochtman b68655c17d Remove authors from Cargo metadata (see RFC 3052) 2022-08-15 10:23:35 +02:00
Dirkjan Ochtman 22fff673e6 Bump version to 0.10 2022-08-15 10:23:35 +02:00
Dirkjan Ochtman c4864fe724 Upgrade ahash to 0.8 2022-08-15 10:23:35 +02:00
Dirkjan Ochtman 747dd22098 Add some minimal documentation 2022-02-28 11:38:50 +01:00
Dirkjan Ochtman 2ef7eb82ba Bump instant-segment version to 0.9 2022-02-28 11:38:34 +01:00
Dirkjan Ochtman eac10fb553 Apply clippy suggestion 2022-02-28 11:35:38 +01:00
Dirkjan Ochtman 3709223fa9 Update pyo3 to 0.16 2022-02-28 11:34:57 +01:00
Dirkjan Ochtman 77203fc78c Update smartstring to 1 2022-02-28 11:34:32 +01:00
Dirkjan Ochtman c911457226 Leverage simplified Python protocols 2021-11-10 09:41:31 +01:00
dependabot[bot] 2d97878ed7 Update pyo3 requirement from 0.14.1 to 0.15.0
Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
- [Commits](https://github.com/pyo3/pyo3/compare/v0.14.1...v0.15.0)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-11-10 09:40:14 +01:00
Dirkjan Ochtman 30a80c4a79 Add minimal pyproject.toml 2021-09-15 16:37:41 +02:00
Dirkjan Ochtman ef38bedaa2 Improve selection of Python versions 2021-09-01 16:11:13 +02:00
Dirkjan Ochtman 950faa6854 Tweak publish workflow 2021-08-31 16:42:39 +02:00
Dirkjan Ochtman 4c317051ef Publish platform wheels after tag push 2021-08-31 16:29:17 +02:00
Dirkjan Ochtman fb2a46ca27 Add Python test job 2021-08-31 16:29:17 +02:00
Dirkjan Ochtman bd12ef8b69 Fix up python test code 2021-08-31 16:29:17 +02:00
Dirkjan Ochtman 906f202611 Fix some more clippy suggestions 2021-08-31 14:19:36 +02:00
Dirkjan Ochtman e12390ea7d py: bump version number to 0.1.4 2021-08-31 14:16:33 +02:00
Dirkjan Ochtman 2babad4a58 Bump version to 0.8.3 2021-08-31 14:15:32 +02:00
Dirkjan Ochtman f32b42537a Update links to point to new GitHub org 2021-08-31 14:10:13 +02:00
Dirkjan Ochtman edfad13ddc Fix clippy lint 2021-08-31 14:09:23 +02:00
Beau Hartshorne f16306499c Update README.md 2021-08-18 13:23:38 -07:00
Beau Hartshorne 8230ac6ed5 Update README.md 2021-08-18 13:18:56 -07:00
dependabot[bot] fdd743478e Update pyo3 requirement from 0.13.2 to 0.14.1
Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version.
- [Release notes](https://github.com/pyo3/pyo3/releases)
- [Changelog](https://github.com/PyO3/pyo3/blob/main/CHANGELOG.md)
- [Commits](https://github.com/pyo3/pyo3/compare/v0.13.2...v0.14.1)

---
updated-dependencies:
- dependency-name: pyo3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-07-05 15:09:48 +02:00
Beau Hartshorne e2f6f5c4a5 Update README.md 2021-06-05 14:30:22 -07:00
Dirkjan Ochtman bc59c6cf6f Refactor to make test segmenter more accessible 2021-06-03 16:09:24 +02:00
Dirkjan Ochtman 65b85d9806 Remove old data files 2021-06-03 16:09:24 +02:00
Dirkjan Ochtman 99ddbf7366 Update data README 2021-06-03 16:09:24 +02:00
Dirkjan Ochtman 3c52201fa0 Update test cases to deal with new data 2021-06-03 16:09:24 +02:00
Dirkjan Ochtman fee2adb995 Add new data files 2021-06-03 16:09:24 +02:00
Dirkjan Ochtman fcf24c7543 Add Rust code to process ngram data 2021-06-03 16:09:24 +02:00
Dirkjan Ochtman cc95d39063 Add script to download word list input data 2021-06-03 16:09:24 +02:00
Dirkjan Ochtman 57221b1dd5 Improve test framework to show all failures 2021-06-03 16:09:24 +02:00
Dirkjan Ochtman e4e773c896 py: bump version to 0.1.3 2021-05-28 15:21:30 +02:00
Dirkjan Ochtman 89c232e3af py: update crate metadata 2021-05-28 15:20:28 +02:00
Dirkjan Ochtman 7214ffc126 Remove note about planned further optimizations 2021-05-28 14:44:44 +02:00
Dirkjan Ochtman f081d4b171 py: bump version to 0.1.1 2021-05-28 14:34:04 +02:00
Dirkjan Ochtman 9edd1bc8b7 Bump version number to 0.8.2 2021-05-28 14:31:59 +02:00
Dirkjan Ochtman 85f4f94b53 Use more efficient segmentation strategy
Based on the triangular matrix approach as explained here:

https://towardsdatascience.com/fast-word-segmentation-for-noisy-text-2c2c41f9e8da

Use iteration rather than recursion to segment the input forwards
rather than backwards and use a `Vec`-based memoization strategy
instead of relying on a `HashMap` of words. This version is about
4.8x faster, 100 lines of code less and should use much less memory.
2021-05-28 14:30:27 +02:00
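
The commit message above describes iterating forwards over the input with a `Vec`-based memo instead of recursing. As a rough illustration only, and not the code from this commit, a minimal sketch of that general idea might look like the following. The function name `segment`, the toy `scores` map of log-probabilities, the `max_word_len` parameter, and the per-character penalty for unknown words are all assumptions made for this example.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: split `input` into the highest-scoring word sequence.
/// `scores` is an assumed toy model mapping a word to a log-probability;
/// unknown words receive an assumed penalty of -10 per character.
fn segment(input: &str, scores: &HashMap<&str, f64>, max_word_len: usize) -> Vec<String> {
    let chars: Vec<char> = input.chars().collect();
    let n = chars.len();

    // Vec-based memo: best[i] holds the best score for the first i characters
    // and the start index of the last word in that best split.
    let mut best: Vec<(f64, usize)> = vec![(f64::NEG_INFINITY, 0); n + 1];
    best[0] = (0.0, 0);

    // Iterate forwards over every end position; no recursion is needed
    // because best[start] is always filled in before it is read.
    for end in 1..=n {
        let lo = end.saturating_sub(max_word_len);
        for start in lo..end {
            let word: String = chars[start..end].iter().collect();
            let word_score = match scores.get(word.as_str()) {
                Some(s) => *s,
                None => -10.0 * (end - start) as f64,
            };
            let candidate = best[start].0 + word_score;
            if candidate > best[end].0 {
                best[end] = (candidate, start);
            }
        }
    }

    // Follow the stored start indices backwards to recover the words.
    let mut words = Vec::new();
    let mut end = n;
    while end > 0 {
        let start = best[end].1;
        words.push(chars[start..end].iter().collect::<String>());
        end = start;
    }
    words.reverse();
    words
}
```

With a toy model in which "choose" and "spain" score higher together than "chooses" and "pain", calling `segment("choosespain", &scores, 10)` would pick the first split. The memo is indexed by position, so each prefix is scored once and no strings are used as memoization keys.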
dependabot-preview[bot] 541644a329 Upgrade to GitHub-native Dependabot 2021-04-30 09:51:09 +02:00
Dirkjan Ochtman 0ebae2923c Add license files (fixes #15) 2021-04-29 15:34:02 +02:00
Nick Rempel 9bbb633f1d Flesh out README (#14) 2021-04-29 11:12:42 +02:00
Dirkjan Ochtman eca12c572f Bump version number to 0.8.1 2021-04-22 15:08:23 +02:00