Age | Commit message | Author |
|
I just want to ensure that we get the same encrypted values as the
reference (which seems fine), but for some reason, I get a lot of
crashes in aez:
AddressSanitizer:DEADLYSIGNAL
=================================================================
==15467==ERROR: AddressSanitizer: SEGV on unknown address 0x7b34b0420000 (pc 0x6371fcd8f682 bp 0x7ffceb91abf0 sp 0x7ffceb91a950 T0)
==15467==The signal is caused by a READ memory access.
#0 0x6371fcd8f682 in _mm_loadu_si128 /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/include/emmintrin.h:706:10
#1 0x6371fcd8f682 in loadu /home/daniel/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/aez-0.0.7/aez5-impls/aesni/encrypt.c:107:46
#2 0x6371fcd8f682 in cipher_aez_core /home/daniel/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/aez-0.0.7/aez5-impls/aesni/encrypt.c:572:32
#3 0x6371fcd8d581 in aez::Aez::encrypt::h56048920113a17d9 /home/daniel/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/aez-0.0.7/src/lib.rs:118:13
The crash
|
It doesn't matter much, since we hardly ever expect tau > 16; but if
somebody decides to use aez as a way to generate a lot of pseudorandom
bytes, then oh well.
With this change, we make better use of SIMD block XORing where available.
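(Not from the commit itself: a minimal sketch of what such a SIMD block XOR can look like on x86_64, with a scalar fallback. The xor_block name and the fixed 16-byte block type are illustrative only.)

    #[cfg(target_arch = "x86_64")]
    fn xor_block(dst: &mut [u8; 16], src: &[u8; 16]) {
        use core::arch::x86_64::{_mm_loadu_si128, _mm_storeu_si128, _mm_xor_si128};
        // SAFETY: unaligned SSE2 loads/stores on valid 16-byte buffers;
        // SSE2 is part of the x86_64 baseline.
        unsafe {
            let a = _mm_loadu_si128(dst.as_ptr().cast());
            let b = _mm_loadu_si128(src.as_ptr().cast());
            _mm_storeu_si128(dst.as_mut_ptr().cast(), _mm_xor_si128(a, b));
        }
    }

    #[cfg(not(target_arch = "x86_64"))]
    fn xor_block(dst: &mut [u8; 16], src: &[u8; 16]) {
        // scalar fallback: byte-wise xor
        for (d, s) in dst.iter_mut().zip(src) {
            *d ^= *s;
        }
    }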
|
(requires nightly compiler)
|
Most of the time, especially in the hot loop, we fall into the lower
branch with j != -1. Doing this check up front gives around a 10%
speedup.
The code for j == -1 now lives directly in e(), as we never use
E::new(-1, ...) anyway.
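(Only the shape of the refactor, not the crate's real code: Key, the offset derivation, and the round functions below are placeholders.)

    struct Key([u8; 16]);

    // The hot path: E is now only ever constructed with j >= 0, so its
    // constructor no longer has to branch on the j == -1 special case.
    struct E {
        offset: u128,
    }

    impl E {
        fn new(j: u32, i: u32, _key: &Key) -> Self {
            // placeholder derivation of the per-(j, i) offset
            E { offset: ((j as u128) << 32) | i as u128 }
        }

        fn eval(&self, block: u128) -> u128 {
            block ^ self.offset // placeholder round function
        }
    }

    // e() checks for the rare j == -1 case once, up front; everything
    // else falls straight through to the hot path.
    fn e(j: i32, i: u32, key: &Key, block: u128) -> u128 {
        if j == -1 {
            // the special case formerly hidden inside E
            return block.rotate_left(7) ^ u128::from_le_bytes(key.0);
        }
        E::new(j as u32, i, key).eval(block)
    }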
|
When I first wrote the aesenc/aes4/aes10 functions, I didn't yet know
how they were going to be used, so I stuck to the spec as much as
possible. As it turns out, they are always used with the same keys, so
it's enough to "initialize" the AES once and then re-use it for multiple
E computations.
It's also beginning to look a lot like all of those functions should
actually be methods, which is something we can fix in the future (and
unify decipher/encipher).
Anyway, the speedup here is around 38% for the 1KiB benchmark and 4%
for the 16KiB benchmark.
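(Not the crate's actual code: a rough sketch of loading the round keys into SIMD registers once and re-using them through an aes4 method. The Aes4State name and the five-key layout, one whitening key plus four round keys, are assumptions.)

    #[cfg(target_arch = "x86_64")]
    mod aes4_state {
        use core::arch::x86_64::{
            __m128i, _mm_aesenc_si128, _mm_loadu_si128, _mm_storeu_si128, _mm_xor_si128,
        };

        /// Round keys loaded into SIMD registers once, then re-used for
        /// every E computation instead of being re-loaded on each call.
        pub struct Aes4State {
            keys: [__m128i; 5], // assumed layout: one whitening key + four round keys
        }

        impl Aes4State {
            pub fn new(raw: &[[u8; 16]; 5]) -> Self {
                // SAFETY: unaligned loads from valid 16-byte arrays.
                let keys = unsafe {
                    [
                        _mm_loadu_si128(raw[0].as_ptr().cast()),
                        _mm_loadu_si128(raw[1].as_ptr().cast()),
                        _mm_loadu_si128(raw[2].as_ptr().cast()),
                        _mm_loadu_si128(raw[3].as_ptr().cast()),
                        _mm_loadu_si128(raw[4].as_ptr().cast()),
                    ]
                };
                Self { keys }
            }

            /// SAFETY: the caller must ensure the CPU supports AES-NI.
            #[target_feature(enable = "aes")]
            pub unsafe fn aes4(&self, block: &mut [u8; 16]) {
                let mut state = _mm_loadu_si128(block.as_ptr().cast());
                state = _mm_xor_si128(state, self.keys[0]);
                state = _mm_aesenc_si128(state, self.keys[1]);
                state = _mm_aesenc_si128(state, self.keys[2]);
                state = _mm_aesenc_si128(state, self.keys[3]);
                state = _mm_aesenc_si128(state, self.keys[4]);
                _mm_storeu_si128(block.as_mut_ptr().cast(), state);
            }
        }
    }

With the keys held as __m128i, the per-call work is reduced to loading the data block itself.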
|
This can give a speedup of 17% to 66%, depending on the input size
(larger speedup for larger inputs). It seems that even the "optimized"
multiply is slow enough to cause a real slowdown, especially for large
inputs, where it is called a lot.
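(Not from the commit: the multiplications in question are by powers of two, key_i * (1 << exponent), as the precomputation commit further down this log shows, and multiplying by two in GF(2^128) is just a shift plus a conditional xor. A minimal sketch of that doubling, assuming the big-endian block convention used by OCB and AEZ.)

    /// Doubling in GF(2^128) with the x^128 + x^7 + x^2 + x + 1 reduction
    /// polynomial, treating the block as a big-endian 128-bit value.
    /// Multiplying by 2^n is then just n of these doublings, so no
    /// general multiply is needed.
    fn double_block(block: [u8; 16]) -> [u8; 16] {
        let mut out = [0u8; 16];
        let carry = block[0] >> 7; // the bit shifted off the top
        for i in 0..15 {
            out[i] = (block[i] << 1) | (block[i + 1] >> 7);
        }
        out[15] = (block[15] << 1) ^ (carry * 0x87);
        out
    }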
|
Even though aes::hazmat::cipher_round uses AES-NI instructions under the
hood, simply loading the data (and the keys!) takes a significant amount
of time. Sadly, the aes crate exposes nothing that lets you re-use the
"loaded" keys.
By implementing aes4/aes10 directly with _mm_aesenc, we can keep the
keys properly aligned.
We still keep the software backend as a fallback, using the software
implementation of the aes crate.
This gives a ~70% speedup.
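(The _mm_aesenc side is sketched further up, next to the key-reuse commit; for the software fallback, here is what an aes4 built on aes::hazmat::cipher_round can look like, assuming the aes crate with the hazmat feature enabled. The aes4_soft name and the five-key layout are assumptions, not the crate's actual code.)

    use aes::hazmat::cipher_round;
    use aes::Block;

    /// Software fallback: one AES round per key via the aes crate, after
    /// an initial whitening xor (the key layout is an assumption).
    fn aes4_soft(keys: &[Block; 5], block: &mut Block) {
        for (b, k) in block.iter_mut().zip(keys[0].iter()) {
            *b ^= *k;
        }
        for k in &keys[1..] {
            // SubBytes + ShiftRows + MixColumns + AddRoundKey
            cipher_round(block, k);
        }
    }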
|
adds more performance benefit
|
speeds up encryption by a bit
|
This vastly speeds up the encipher/decipher functions, as we no longer
recompute key_i * (1 << exponent) over and over again.
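(Illustrative sketch of the caching idea, not the actual change: precompute the doublings of a key block once, so key * 2^exponent becomes a table lookup. MAX_EXPONENT and the names are assumptions; the doubling is the same as in the sketch further up.)

    const MAX_EXPONENT: usize = 8; // assumed bound on the exponents we need

    struct DoublingTable {
        powers: [[u8; 16]; MAX_EXPONENT + 1], // powers[e] = key * 2^e in GF(2^128)
    }

    impl DoublingTable {
        fn new(key: [u8; 16]) -> Self {
            let mut powers = [[0u8; 16]; MAX_EXPONENT + 1];
            powers[0] = key;
            for e in 1..=MAX_EXPONENT {
                powers[e] = double(powers[e - 1]);
            }
            Self { powers }
        }

        fn get(&self, exponent: usize) -> &[u8; 16] {
            &self.powers[exponent]
        }
    }

    // GF(2^128) doubling, big-endian block convention.
    fn double(b: [u8; 16]) -> [u8; 16] {
        let mut out = [0u8; 16];
        for i in 0..15 {
            out[i] = (b[i] << 1) | (b[i + 1] >> 7);
        }
        out[15] = (b[15] << 1) ^ ((b[0] >> 7) * 0x87);
        out
    }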