# Iosifovich
Iosifovich is a blazingly fast implementation of the Levenshtein distance function.
The current version is considerably faster than the original, now-lost version, which could process 10k words from the words file on my computer in a bit more than 15 seconds. The current implementation does the same in about 11.3 seconds, a performance gain of about 24%.
To my knowledge this is the fastest implementation of the Levenshtein distance function you can find.
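For reference, the classic two-row dynamic-programming formulation that fast implementations start from can be sketched as follows. This is a plain baseline for comparison, not the optimized Iosifovich code:

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Baseline Levenshtein distance using a single rolling row plus a
// tracked diagonal value; O(min(m, n)) memory, O(m * n) time.
std::size_t levenshtein(std::string_view a, std::string_view b)
{
    if (a.size() > b.size())
        std::swap(a, b);            // size the buffer by the shorter string

    std::vector<std::size_t> row(a.size() + 1);
    for (std::size_t i = 0; i <= a.size(); ++i)
        row[i] = i;                 // distance from the empty prefix of b

    for (std::size_t j = 1; j <= b.size(); ++j) {
        std::size_t diag = row[0];  // value of row[j-1][0]
        row[0] = j;
        for (std::size_t i = 1; i <= a.size(); ++i) {
            std::size_t next_diag = row[i];
            std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            row[i] = std::min({ row[i] + 1,      // deletion
                                row[i - 1] + 1,  // insertion
                                diag + cost });  // substitution or match
            diag = next_diag;
        }
    }
    return row[a.size()];
}
```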
## Licensing
I've kept the code under GPLv3 but would be willing to discuss alternative licenses, if anybody is interested.
## News
The implementation has been moved to a template function. I didn't measure the performance meticulously, but casual benchmarking revealed no performance regression.
While templating the code, I also introduced a stack buffer that is used whenever the required buffer is small enough to fit in it. This optimization increased performance by about 11% in a benchmark using the first 5000 words from the words file. Together, templating and the stack-based buffer give a significant performance boost.
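The stack-buffer idea can be sketched roughly like this. The capacity of 64 entries and the `row_buffer` name are illustrative assumptions, not the actual values or names used in the source:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Small-buffer optimization: if the DP row fits in a fixed-size stack
// array, use it and skip the heap allocation entirely; otherwise fall
// back to a heap-allocated vector. Capacity of 64 is an assumption.
template <std::size_t StackCapacity = 64>
class row_buffer {
public:
    explicit row_buffer(std::size_t n)
    {
        if (n <= StackCapacity) {
            data_ = stack_.data();   // small input: no allocation
        } else {
            heap_.resize(n);         // large input: heap fallback
            data_ = heap_.data();
        }
    }
    std::size_t* data() { return data_; }
    std::size_t& operator[](std::size_t i) { return data_[i]; }

private:
    std::array<std::size_t, StackCapacity> stack_{};
    std::vector<std::size_t> heap_;
    std::size_t* data_ = nullptr;
};
```

The distance function would then index `row_buffer` exactly as it would a plain array, so the hot loop is unchanged and only the setup cost differs between the two paths.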
## Plans for further development
After reducing the size of the buffer, templating the code, and introducing the stack buffer, I don't think I can improve performance much further without introducing vectorization and parallelism. Those will require a lot of effort to implement without sacrificing performance for short strings.
- Make benchmarks.
- Introduce more tests:
  - tests with prefix, suffix and infix strings shared between the strings;
  - tests where the lengths of the strings are combinations of odd and even.
- Reduce the size of the buffer. When this was done with the old version, performance increased by 100%.
- Look into SIMD instructions.
- Look into parallelism.
- Templating the code.
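One optimization that shared-prefix/suffix tests typically exercise is trimming the common prefix and suffix before running the distance computation, since characters identical at both ends cannot affect the result. A minimal sketch of that idea; the helper name is hypothetical and not taken from the source:

```cpp
#include <cstddef>
#include <string_view>
#include <utility>

// Drop the shared prefix and shared suffix of a and b; the Levenshtein
// distance of the trimmed pair equals that of the original pair.
std::pair<std::string_view, std::string_view>
trim_common_affixes(std::string_view a, std::string_view b)
{
    // Shared prefix.
    std::size_t p = 0;
    while (p < a.size() && p < b.size() && a[p] == b[p])
        ++p;
    a.remove_prefix(p);
    b.remove_prefix(p);

    // Shared suffix (on what remains, so it cannot overlap the prefix).
    std::size_t s = 0;
    while (s < a.size() && s < b.size()
           && a[a.size() - 1 - s] == b[b.size() - 1 - s])
        ++s;
    a.remove_suffix(s);
    b.remove_suffix(s);
    return { a, b };
}
```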
## SIMD
I have some ideas for how SIMD instructions might be used to improve performance, but I doubt they will have much effect on small strings, and they might even be detrimental if the strings are too short.
The most straightforward approach would be to just do more than one calculation at a time, shifting the results down the SIMD registers.
SIMD instructions are very CPU-specific, so doing anything here would mean investigating whether a cross-CPU approach is available that would let me avoid maintaining a backend for each CPU alongside a generic version that works everywhere. I don't expect massive gains from this, as I think the current performance bottleneck is memory bandwidth.
## Parallelism
It should be possible to do the calculation recursively, by splitting the longer string in the middle and then computing the two parts separately.
If that can be done, it should be easy to spin up threads and run this on all cores.
It might be better not to parallelize inside the function at all, and instead let callers rely on being able to process a lot of word pairs quickly, running many calls concurrently themselves.
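The middle-split idea corresponds to the classic observation behind Hirschberg's algorithm: compute a forward DP row for the left half of the longer string and a backward DP row for the right half, then take the minimum over all split points of the shorter string. A hedged, unoptimized sketch using `std::async`; the function names are mine, not the project's:

```cpp
#include <algorithm>
#include <cstdint>
#include <future>
#include <string>
#include <string_view>
#include <vector>

// Returns row where row[k] == edit distance between all of s and t[0..k).
std::vector<std::size_t> dp_row(std::string_view s, std::string_view t)
{
    std::vector<std::size_t> row(t.size() + 1);
    for (std::size_t k = 0; k <= t.size(); ++k)
        row[k] = k;
    for (std::size_t i = 1; i <= s.size(); ++i) {
        std::size_t diag = row[0];
        row[0] = i;
        for (std::size_t k = 1; k <= t.size(); ++k) {
            std::size_t next_diag = row[k];
            std::size_t cost = (s[i - 1] == t[k - 1]) ? 0 : 1;
            row[k] = std::min({ row[k] + 1, row[k - 1] + 1, diag + cost });
            diag = next_diag;
        }
    }
    return row;
}

// Split the longer string b in the middle and compute the halves
// separately; the distance is the minimum over all split points k of
// dist(b_left, a[0..k)) + dist(b_right, a[k..)).
std::size_t levenshtein_split(std::string_view a, std::string_view b)
{
    if (b.size() < a.size())
        std::swap(a, b);                       // make b the longer string
    const std::size_t mid = b.size() / 2;

    // Left half on a worker thread, right half (reversed) on this one.
    auto fwd = std::async(std::launch::async, dp_row, b.substr(0, mid), a);

    std::string ra(a.rbegin(), a.rend());      // reversed a
    std::string rb(b.substr(mid));             // right half of b...
    std::reverse(rb.begin(), rb.end());        // ...reversed
    std::vector<std::size_t> bwd = dp_row(rb, ra);

    std::vector<std::size_t> f = fwd.get();
    std::size_t best = SIZE_MAX;
    for (std::size_t k = 0; k <= a.size(); ++k)
        best = std::min(best, f[k] + bwd[a.size() - k]);
    return best;
}
```

One level of splitting already gives two independent DP passes; applying it recursively would yield more parallelism, at the cost of the combine step and thread overhead that could hurt short strings.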