commit | 3978cd7954485c8c12ee9c1e9427511bf8461190 | |
---|---|---|
author | Danny Guo <dannyguo91@gmail.com> | Sun May 05 21:00:43 2019 |
committer | Danny Guo <dannyguo91@gmail.com> | Sun May 05 21:00:43 2019 |
tree | 23ec6a818968a415add5c844c59c4e22ddf7d0ce | |
parent | 5907665163699d97467ba889cc382768a2afe466 | |

Use a flat vector in Damerau-Levenshtein

Instead of representing a 2D grid with a vector of vectors, just use a single vector to improve performance. We can do this since the dimensions are fixed. This method was suggested by @lovasoa as an alternative to adding a dependency on the ndarray crate. In my benchmark testing, the new approach is about as fast as using ndarray. On my machine, the original approach takes about 22,000 ns/iter, whereas the new approach takes about 17,000 ns/iter.

See https://github.com/dguo/strsim-rs/issues/34 for more context.
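To illustrate the idea in the commit message, here is a minimal sketch of a fixed-size grid stored in one `Vec` and indexed in row-major order. The names `FlatGrid` and `levenshtein_flat` are hypothetical and do not come from the crate. For brevity the sketch fills the grid with the plain Levenshtein recurrence rather than Damerau-Levenshtein, so it shows only the storage technique, not the code changed by this commit.

```rust
/// A 2D grid stored in a single contiguous Vec (row-major order).
/// `FlatGrid` is a hypothetical name for this sketch, not a strsim type.
struct FlatGrid {
    width: usize,
    cells: Vec<usize>,
}

impl FlatGrid {
    /// Allocate `height * width` cells once; the dimensions never change.
    fn new(height: usize, width: usize) -> Self {
        FlatGrid { width, cells: vec![0; height * width] }
    }

    /// Cell (row, col) lives at offset `row * width + col`.
    fn get(&self, row: usize, col: usize) -> usize {
        self.cells[row * self.width + col]
    }

    fn set(&mut self, row: usize, col: usize, value: usize) {
        self.cells[row * self.width + col] = value;
    }
}

/// Plain Levenshtein distance computed over the flat grid.
fn levenshtein_flat(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut grid = FlatGrid::new(a.len() + 1, b.len() + 1);

    // First column and first row: distance from or to the empty prefix.
    for i in 0..=a.len() {
        grid.set(i, 0, i);
    }
    for j in 0..=b.len() {
        grid.set(0, j, j);
    }

    for i in 1..=a.len() {
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            let best = (grid.get(i - 1, j) + 1) // deletion
                .min(grid.get(i, j - 1) + 1) // insertion
                .min(grid.get(i - 1, j - 1) + cost); // substitution
            grid.set(i, j, best);
        }
    }
    grid.get(a.len(), b.len())
}
```

Compared with a `Vec<Vec<usize>>`, this layout needs a single allocation and keeps neighbouring cells close together in memory, which is one plausible reason for the speedup reported above.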

1 file changed

tree: 23ec6a818968a415add5c844c59c4e22ddf7d0ce

- .editorconfig
- .gitattributes
- .gitignore
- .travis.yml
- CHANGELOG.md
- Cargo.toml
- LICENSE
- README.md
- benches/
- dev
- src/
- tests/

README.md

Rust implementations of string similarity metrics:

- Hamming
- Levenshtein - distance & normalized
- Optimal string alignment
- Damerau-Levenshtein - distance & normalized
- Jaro and Jaro-Winkler - this implementation of Jaro-Winkler does not limit the common prefix length

The normalized versions return values between `0.0` and `1.0`, where `1.0` means an exact match.
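As a rough illustration of what a normalized score means, the sketch below assumes the score is one minus the edit distance divided by the length (in characters) of the longer input. That formula is an assumption for this example, not a statement of the crate's internals, although it agrees with the `0.571` value that the usage example expects for "kitten" and "sitting" (1 - 3/7). The function names are hypothetical.

```rust
/// Levenshtein distance using two rows (hypothetical helper for this sketch).
fn levenshtein_two_row(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // Distances from the empty prefix of `a` to every prefix of `b`.
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();

    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let cost = if ca == *cb { 0 } else { 1 };
            let best = (prev[j] + cost) // substitution
                .min(prev[j + 1] + 1) // deletion
                .min(cur[j] + 1); // insertion
            cur.push(best);
        }
        prev = cur;
    }
    prev[b_chars.len()]
}

/// Assumed normalization: 1.0 - distance / longer length, with two empty
/// inputs treated as an exact match.
fn normalized_sketch(a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0;
    }
    1.0 - levenshtein_two_row(a, b) as f64 / longest as f64
}
```

Under this assumption, identical strings score `1.0` and strings of equal length with no characters in common score `0.0`.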

There are also generic versions of the functions for non-string inputs.

`strsim` is available on crates.io. Add it to your `Cargo.toml`:

    [dependencies]
    strsim = "0.9.0"

Go to Docs.rs for the full documentation. You can also clone the repo and run `$ cargo doc --open`.

    extern crate strsim;

    use strsim::{hamming, levenshtein, normalized_levenshtein, osa_distance,
                 damerau_levenshtein, normalized_damerau_levenshtein, jaro,
                 jaro_winkler};

    fn main() {
        match hamming("hamming", "hammers") {
            Ok(distance) => assert_eq!(3, distance),
            Err(why) => panic!("{:?}", why)
        }

        assert_eq!(levenshtein("kitten", "sitting"), 3);

        assert!((normalized_levenshtein("kitten", "sitting") - 0.571).abs() < 0.001);

        assert_eq!(osa_distance("ac", "cba"), 3);

        assert_eq!(damerau_levenshtein("ac", "cba"), 2);

        assert!((normalized_damerau_levenshtein("levenshtein", "löwenbräu") - 0.272).abs() < 0.001);

        assert!((jaro("Friedrich Nietzsche", "Jean-Paul Sartre") - 0.392).abs() < 0.001);

        assert!((jaro_winkler("cheeseburger", "cheese fries") - 0.911).abs() < 0.001);
    }

Using the generic versions of the functions:

    extern crate strsim;

    use strsim::generic_levenshtein;

    fn main() {
        assert_eq!(2, generic_levenshtein(&[1, 2, 3], &[0, 2, 5]));
    }

If you don't want to install Rust itself, you can run `$ ./dev` for a development CLI if you have Docker installed.

Benchmarks require a Nightly toolchain. Run `$ cargo +nightly bench`.