Age | Commit message | Author |
|
When I first wrote the aesenc/aes4/aes10 functions, I didn't know yet
how they were going to be used, so I stuck to the spec as much as
possible. As it turns out, they are always used with the same keys, so
it's enough to "initialize" the AES once and then re-use it for multiple E
computations.
It's also beginning to look a lot like all of those functions should
actually be methods, which is something we can fix in the future (and
unite decipher/encipher).
Anyway, the speedup here is around 38% for the 1KiB benchmark, and 4%
for the 16KiB benchmark.
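
A minimal sketch of the "initialize once, re-use many times" idea, not the
actual code from this commit. The names Aes4, new and apply are hypothetical,
the key order is made up, and toy_round is a stand-in (rotate and XOR) so that
the example runs without any crate; it is NOT a real AES round.

```rust
/// Stand-in for one AES round. NOT real AES; it only keeps the sketch runnable.
fn toy_round(state: u128, round_key: u128) -> u128 {
    state.rotate_left(8) ^ round_key
}

/// Hypothetical holder for round keys that are arranged exactly once.
struct Aes4 {
    keys: [u128; 4],
}

impl Aes4 {
    /// Done once per key: put the round keys in the order the rounds need.
    /// (The order used here is an assumption for illustration only.)
    fn new(i: u128, j: u128, l: u128) -> Self {
        Aes4 { keys: [j, i, l, 0] }
    }

    /// Done once per block: four rounds using the stored keys.
    fn apply(&self, block: u128) -> u128 {
        self.keys.iter().fold(block, |state, &k| toy_round(state, k))
    }
}

/// Many computations share one initialized instance.
fn apply_all(cipher: &Aes4, blocks: &[u128]) -> Vec<u128> {
    blocks.iter().map(|&b| cipher.apply(b)).collect()
}
```

The point is only the shape: the per-key work lives in new, and the per-block
path touches nothing but the stored keys.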
|
|
This can give a speedup from 17% to 66%, depending on the input size
(larger speedup for larger inputs). It seems like even the "optimized"
multiply is slow enough to hurt noticeably, especially for large
inputs where it is called a lot.
|
|
Even though aes::hazmat::cipher_round uses aes-ni instructions under the
hood, simply loading the data (and the keys!) takes a significant amount
of time. Sadly, the aes crate exposes no way to re-use the "loaded"
keys.
By implementing aes4/aes10 directly with _mm_aesenc, we can keep the
keys properly aligned.
We still keep the software backend as a fallback, using the software
implementation of the aes crate.
This gives a ~70% speedup.
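
A rough sketch of what keeping the keys "loaded" can look like with the
core::arch intrinsics, assuming an x86_64 target. The names hw, load_keys,
rounds4 and rounds4_checked are hypothetical and the round structure is
simplified; returning None stands in for "use the software backend".

```rust
#[cfg(target_arch = "x86_64")]
mod hw {
    use core::arch::x86_64::*;

    /// Load the round keys into SSE registers once, up front.
    pub fn load_keys(raw: [[u8; 16]; 4]) -> [__m128i; 4] {
        // SAFETY: SSE2 is part of the x86_64 baseline and every source
        // array holds 16 readable bytes; the load is unaligned-safe.
        raw.map(|k| unsafe { _mm_loadu_si128(k.as_ptr() as *const __m128i) })
    }

    /// Four AES rounds with keys that are already in registers.
    /// The caller must have checked that the CPU supports AES-NI.
    #[target_feature(enable = "aes")]
    pub unsafe fn rounds4(block: [u8; 16], keys: &[__m128i; 4]) -> [u8; 16] {
        let mut state = _mm_loadu_si128(block.as_ptr() as *const __m128i);
        for &k in keys {
            state = _mm_aesenc_si128(state, k);
        }
        let mut out = [0u8; 16];
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, state);
        out
    }
}

/// Safe entry point: Some(result) on AES-NI hardware, None otherwise
/// (where real code would call into a software implementation).
pub fn rounds4_checked(block: [u8; 16], raw_keys: [[u8; 16]; 4]) -> Option<[u8; 16]> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("aes") {
            let keys = hw::load_keys(raw_keys);
            // SAFETY: the runtime check above confirmed AES-NI support.
            return Some(unsafe { hw::rounds4(block, &keys) });
        }
    }
    let _ = (block, raw_keys); // unused on targets without the hardware path
    None
}
```

In real use load_keys would be called once and its result stored, so that only
rounds4 runs per block; the wrapper reloads the keys on every call purely to
keep the sketch short.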
|
|
adds more performance benefit
|
|
speeds up encryption by a bit
|
|
This vastly speeds up the encipher/decipher functions, as we no longer
keep computing key_i * (1 << exponent) over and over again.
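
A small sketch of caching the products key * (1 << exponent) instead of
recomputing them. It assumes the multiplication is carried out in GF(2^128)
with the reduction polynomial x^128 + x^7 + x^2 + x + 1 and that blocks are
held as u128 values; both are assumptions, and the names double and Doublings
are hypothetical.

```rust
/// Multiply by 2 (that is, by x) in GF(2^128).
/// Assumed reduction polynomial: x^128 + x^7 + x^2 + x + 1 (constant 0x87).
fn double(v: u128) -> u128 {
    let carry = v >> 127; // 1 if the top bit falls off, else 0
    (v << 1) ^ (carry * 0x87)
}

/// Hypothetical cache holding key * 2^e for e = 0..=max_exponent.
struct Doublings {
    table: Vec<u128>,
}

impl Doublings {
    /// Built once per key: each entry is the previous one doubled.
    fn new(key: u128, max_exponent: usize) -> Self {
        let mut table = Vec::with_capacity(max_exponent + 1);
        let mut current = key;
        for _ in 0..=max_exponent {
            table.push(current);
            current = double(current);
        }
        Doublings { table }
    }

    /// Per-block lookup: no field arithmetic left on the hot path.
    fn get(&self, exponent: usize) -> u128 {
        self.table[exponent]
    }
}
```

The trade-off is a little memory per key in exchange for turning a chain of
doublings into a single table lookup.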
|