# iosifovich

Iosifovich is a blazingly fast implementation of the Levenshtein distance
function.

The current version is considerably faster than the original, now lost,
version, which processed 10k words from the words file on my computer in a bit
more than 15 seconds. The current implementation does the same in about 11.3
seconds, roughly 24% less time.

To my knowledge this is the fastest implementation of the Levenshtein distance
function you can find.
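For readers who want to see what is being computed, here is a minimal textbook
dynamic-programming formulation of the distance. It only illustrates the
recurrence; iosifovich's actual code is a heavily optimized variant and does
not look like this.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Textbook Levenshtein distance, shown only to illustrate the recurrence the
// library computes; this is NOT iosifovich's implementation.
std::size_t levenshtein_reference(std::string_view a, std::string_view b)
{
    const std::size_t m = a.size(), n = b.size();
    // d[i][j] = edit distance between the first i chars of a and first j of b.
    std::vector<std::vector<std::size_t>> d(m + 1, std::vector<std::size_t>(n + 1));
    for (std::size_t i = 0; i <= m; ++i) d[i][0] = i;
    for (std::size_t j = 0; j <= n; ++j) d[0][j] = j;
    for (std::size_t i = 1; i <= m; ++i)
        for (std::size_t j = 1; j <= n; ++j)
            d[i][j] = std::min({ d[i - 1][j] + 1,                           // delete a[i-1]
                                 d[i][j - 1] + 1,                           // insert b[j-1]
                                 d[i - 1][j - 1] + (a[i - 1] != b[j - 1]) });// substitute
    return d[m][n];
}
```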
## Licensing

I've kept the code under GPLv3 but would be willing to discuss alternative
licenses, if anybody is interested.
## News

The implementation has been moved to a template function. I didn't measure the
performance meticulously, but casual benchmarking revealed no performance
regressions.

When templating the code, I also introduced a stack buffer that is used
whenever the required buffer is small enough to fit in it. This optimization
increased performance by about 11% in a benchmark using the first 5000 words
from the words file. Together, templating and the stack-based buffer give a
significant performance boost.
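The description above leaves the details open, so the following is only a
sketch of the kind of stack-or-heap buffer selection being described, under my
own assumptions: the function name, the templating over the character type and
the 256-element threshold are all illustrative, not taken from iosifovich.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string_view>
#include <vector>

// Sketch of the two optimizations described above, under assumptions of my
// own: the function is templated (here over the character type) and the
// single DP row lives in a fixed-size stack array unless the shorter input is
// too long, in which case it falls back to a heap allocation.
template <typename Char, std::size_t StackSize = 256>
std::size_t levenshtein_small_buffer(std::basic_string_view<Char> a,
                                     std::basic_string_view<Char> b)
{
    if (a.size() > b.size())
        std::swap(a, b);                         // keep the DP row as short as possible

    std::array<std::size_t, StackSize> stack_buffer;
    std::vector<std::size_t> heap_buffer;
    std::size_t* row = stack_buffer.data();
    if (a.size() + 1 > StackSize)                // row does not fit on the stack
    {
        heap_buffer.resize(a.size() + 1);
        row = heap_buffer.data();
    }

    for (std::size_t i = 0; i <= a.size(); ++i)
        row[i] = i;                              // distance from the empty prefix of b

    for (std::size_t j = 1; j <= b.size(); ++j)
    {
        std::size_t diag = row[0];               // D[i-1][j-1] before it is overwritten
        row[0] = j;
        for (std::size_t i = 1; i <= a.size(); ++i)
        {
            const std::size_t next_diag = row[i];
            row[i] = std::min({ row[i] + 1,                       // skip b[j-1]
                                row[i - 1] + 1,                   // skip a[i-1]
                                diag + (a[i - 1] != b[j - 1]) }); // substitute
            diag = next_diag;
        }
    }
    return row[a.size()];
}
```

For typical dictionary words the row fits in the stack array, so the heap
allocation is skipped entirely; for example
`levenshtein_small_buffer(std::string_view("kitten"), std::string_view("sitting"))`
returns 3 without touching the heap.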
## Plans for further development

After reducing the size of the buffer, templating the code and introducing the
stack buffer, I don't think I can improve performance much further without
introducing things like vectorization and parallelism. Those will require a
lot of effort to get right without sacrificing performance for short strings.
1. ~~Make benchmarks.~~
2. Introduce more tests.
   1. Tests with prefix, postfix and infix strings shared between the strings.
   2. Tests where the lengths of the strings are combinations of odd and even.
3. ~~Reduce the size of the buffer. When this was done with the old version, performance was increased 100%.~~
4. Look into SIMD instructions.
5. Look into parallelism.
6. ~~Templating the code.~~
## SIMD

I have some ideas for how SIMD instructions might be used to improve
performance, but I doubt it will have much effect on small strings, and it
might even be detrimental if the strings are too short.

The most straightforward approach would be to do more than one calculation at
a time, shifting the results down the SIMD registers.

SIMD instructions are very CPU specific, so doing anything here would mean
either finding a cross-CPU approach, if one exists, or maintaining a backend
for each CPU plus a generic version that works everywhere. I don't expect
massive gains from this, as I think the performance bottleneck is currently
memory bandwidth.
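To make the "more than one calculation at a time" idea concrete, here is a
scalar sketch of the anti-diagonal formulation that SIMD versions of this DP
are commonly built on. It is only meant to show where the independent work is;
it is not a SIMD implementation, and it is not how iosifovich is written.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Scalar illustration of the anti-diagonal formulation: every cell with
// i + j == d depends only on the two previous anti-diagonals, so the cells of
// one diagonal (the inner loop) are independent of each other and could be
// updated together in a vector register. Not iosifovich's code.
std::size_t levenshtein_antidiagonal(std::string_view a, std::string_view b)
{
    const std::size_t m = a.size();
    const std::size_t n = b.size();
    // Each buffer holds one anti-diagonal, indexed by i (so the column is d - i).
    std::vector<std::size_t> prev2(m + 1), prev(m + 1), cur(m + 1);

    for (std::size_t d = 0; d <= m + n; ++d)
    {
        const std::size_t lo = d > n ? d - n : 0;   // smallest valid i on this diagonal
        const std::size_t hi = std::min(d, m);      // largest valid i on this diagonal
        for (std::size_t i = lo; i <= hi; ++i)      // no dependencies between iterations
        {
            const std::size_t j = d - i;
            if (i == 0)      cur[i] = j;            // first row:    D[0][j] = j
            else if (j == 0) cur[i] = i;            // first column: D[i][0] = i
            else
                cur[i] = std::min({ prev[i - 1] + 1,                          // from D[i-1][j]
                                    prev[i] + 1,                              // from D[i][j-1]
                                    prev2[i - 1] + (a[i - 1] != b[j - 1]) }); // from D[i-1][j-1]
        }
        prev2.swap(prev);
        prev.swap(cur);
    }
    return prev[m];   // after the final swap, prev holds the diagonal d = m + n
}
```

Every iteration of the inner loop is independent, so a wide enough register
could update several cells of a diagonal at once; for short strings the
diagonals are so short that the setup cost would likely dominate, which
matches the caveat above.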
## Parallelism

It should be possible to do the calculation recursively, by splitting the
longer string in the middle and then computing the two parts separately.

If that can be done, it should be easy to spin up threads and run this on all
the cores.

It might be better not to parallelise inside the function at all, and instead
let the caller rely on being able to process a lot of word pairs quickly.