I just want to ensure that we get the same encrypted values as the
reference (which seems fine), but for some reason, I get a lot of
crashes in aez:
AddressSanitizer:DEADLYSIGNAL
=================================================================
==15467==ERROR: AddressSanitizer: SEGV on unknown address 0x7b34b0420000 (pc 0x6371fcd8f682 bp 0x7ffceb91abf0 sp 0x7ffceb91a950 T0)
==15467==The signal is caused by a READ memory access.
#0 0x6371fcd8f682 in _mm_loadu_si128 /usr/lib/gcc/x86_64-pc-linux-gnu/14.2.1/include/emmintrin.h:706:10
#1 0x6371fcd8f682 in loadu /home/daniel/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/aez-0.0.7/aez5-impls/aesni/encrypt.c:107:46
#2 0x6371fcd8f682 in cipher_aez_core /home/daniel/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/aez-0.0.7/aez5-impls/aesni/encrypt.c:572:32
#3 0x6371fcd8d581 in aez::Aez::encrypt::h56048920113a17d9 /home/daniel/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/aez-0.0.7/src/lib.rs:118:13
The crash
|
It doesn't matter much because we barely expect tau > 16, but if
somebody decides to use aez as a way to generate a lot of pseudorandom
bytes, then oh well.
With this change, we make better use of SIMD block xor'ing if available.
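As a rough illustration of the block-xor idea (not the crate's actual code), two 16-byte blocks can be xor'd in one operation by reinterpreting them as u128; with SIMD available, the same pattern maps directly onto vector registers:

```rust
// Illustrative sketch only: xor two 16-byte blocks via u128. The names are
// hypothetical; the crate reportedly uses SIMD where available, and this
// portable version just demonstrates the same whole-block xor idea.
fn xor_block(a: &[u8; 16], b: &[u8; 16]) -> [u8; 16] {
    let x = u128::from_ne_bytes(*a) ^ u128::from_ne_bytes(*b);
    x.to_ne_bytes()
}

fn main() {
    let a = [0xffu8; 16];
    let b = [0x0fu8; 16];
    assert_eq!(xor_block(&a, &b), [0xf0u8; 16]);
}
```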
|
(requires nightly compiler)
|
I unrolled this earlier to speed up the computation for the commonly used factors, but now that we're precomputing the values anyway, there's no reason to keep the code ugly.
|
This gives around a 30% speedup, presumably because casting to an int is more expensive than I thought. This operation is used so frequently in the hot loop that even a tiny speedup adds up quickly.
|
Most of the time, especially in the hot loop, we're falling into the
lower branch with j != -1. Doing this check in advance gives around 10%
speedup.
Now, the code for j == -1 is directly in e(), as we never use E::new(-1,
...) anyway.
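The restructuring can be sketched roughly like this; all names and the stand-in arithmetic are hypothetical, and only the control flow mirrors what the commit describes (the rare j == -1 case handled once up front, so the hot path carries no branch):

```rust
// Purely illustrative sketch: the j == -1 special case is handled once in e(),
// so the inner function can assume j >= 0. The arithmetic is a stand-in, not
// the crate's actual E computation.
fn e(j: i32, i: u32, key: u128) -> u128 {
    if j == -1 {
        // rare case, inlined directly; E::new(-1, ..) is never constructed
        key ^ u128::from(i)
    } else {
        e_nonneg(j as u32, i, key)
    }
}

// Hot path: j >= 0 was checked once up front, so no branch is needed here.
fn e_nonneg(j: u32, i: u32, key: u128) -> u128 {
    key ^ (u128::from(j) << 32) ^ u128::from(i)
}

fn main() {
    assert_eq!(e(-1, 3, 8), 11);
    assert_eq!(e(2, 1, 0), (2u128 << 32) | 1);
}
```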
|
When I first wrote the aesenc/aes4/aes10 functions, I didn't know yet how they were going to be used, so I stuck to the spec as much as possible. As it turns out, they are always used with the same keys, so it's enough to "initialize" the AES once and then re-use it for multiple E computations.
It's also beginning to look a lot like all of those functions should actually be methods, which is something we can fix in the future (and unify decipher/encipher).
Anyway, the speedup here is around 38% for the 1KiB benchmark, and 4%
for the 16KiB benchmark.
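Structurally, the change amounts to something like the following sketch: the round keys are derived once at construction time and then reused for every call. The key schedule and round function here are stand-ins (xor/rotate), not real AES, and all names are hypothetical.

```rust
// Hedged structural sketch of "initialize the AES once, reuse many times".
struct Aes4 {
    round_keys: [u128; 4], // derived once, reused for every encrypt() call
}

impl Aes4 {
    fn new(key: u128) -> Self {
        let mut round_keys = [0u128; 4];
        for (r, rk) in round_keys.iter_mut().enumerate() {
            // stand-in key schedule, NOT the real AES key expansion
            *rk = key.rotate_left(r as u32 * 8) ^ r as u128;
        }
        Aes4 { round_keys }
    }

    // Four stand-in rounds, each using a precomputed round key.
    fn encrypt(&self, mut block: u128) -> u128 {
        for rk in self.round_keys {
            block = (block ^ rk).rotate_left(7);
        }
        block
    }
}

fn main() {
    let a = Aes4::new(42);
    // deterministic, and injective (every step is a bijection)
    assert_eq!(a.encrypt(1), a.encrypt(1));
    assert_ne!(a.encrypt(1), a.encrypt(2));
}
```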
|
This can give a speedup from 17% to 66%, depending on the input size
(larger speedup for larger inputs). It seems like even the "optimized"
multiply is slow enough to really cause a slowdown, especially for large
inputs where it is called a lot.
|
Even though aes::hazmat::cipher_round uses AES-NI instructions under the hood, simply loading the data (and the keys!) takes a significant amount of time. Sadly, aes exposes no way to re-use the "loaded" keys.
By implementing aes4/aes10 directly with _mm_aesenc, we can keep the
keys properly aligned.
We still keep the software backend as fallback, using the software
implementation of the aes crate.
This gives a ~70% speedup.
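The technique the commit refers to can be demonstrated with the intrinsic directly; this is an illustration, not the crate's actual aes4/aes10 code, and the function name is hypothetical:

```rust
// Hedged sketch: call the hardware AES round instruction via core::arch, so
// keys can live in __m128i values instead of being re-loaded on every call.
#[cfg(target_arch = "x86_64")]
fn aesenc_once(state: [u8; 16], round_key: [u8; 16]) -> Option<[u8; 16]> {
    if !is_x86_feature_detected!("aes") {
        return None; // a software fallback would handle this case
    }
    use core::arch::x86_64::*;
    // SAFETY: we just checked that the `aes` CPU feature is available.
    unsafe {
        let s = _mm_loadu_si128(state.as_ptr() as *const __m128i);
        let k = _mm_loadu_si128(round_key.as_ptr() as *const __m128i);
        // one AES round: ShiftRows, SubBytes, MixColumns, xor with round key
        let r = _mm_aesenc_si128(s, k);
        let mut out = [0u8; 16];
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
        Some(out)
    }
}

#[cfg(target_arch = "x86_64")]
fn main() {
    if let Some(out) = aesenc_once([0u8; 16], [0u8; 16]) {
        // One round over the all-zero state with an all-zero key gives 0x63 in
        // every byte: SubBytes(0) = 0x63, and ShiftRows/MixColumns leave a
        // uniform state unchanged.
        assert_eq!(out, [0x63u8; 16]);
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```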
|
We only ever use this function for small factors, either 2 (in
Block::exp), or 0-7 (in e, after the modulo 8). Therefore, for those
small values, we hard-code how they are computed by manually unrolling
the loop/recursion.
This gives around 30% more throughput.
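The unrolling idea can be sketched as follows. Multiply-by-2 here is generic GF(2^128) doubling with the common 0x87 reduction polynomial (as used in OCB/XTS); the crate's exact byte order and field representation may differ, and the names are illustrative.

```rust
// Doubling in GF(2^128): shift left, fold the carried-out bit back in via the
// reduction polynomial (x^128 + x^7 + x^2 + x + 1, i.e. 0x87).
fn double(x: u128) -> u128 {
    let carry = x >> 127;
    (x << 1) ^ (carry * 0x87)
}

// Manually unrolled small multiples, mirroring the "hard-code factors 0..=7"
// idea: each case is a fixed chain of doublings and xors, no loop or recursion.
fn mul_small(factor: u8, x: u128) -> u128 {
    match factor {
        0 => 0,
        1 => x,
        2 => double(x),
        3 => double(x) ^ x,
        4 => double(double(x)),
        5 => double(double(x)) ^ x,
        6 => double(double(x) ^ x),
        7 => double(double(x) ^ x) ^ x,
        _ => unreachable!("only factors 0..=7 are ever needed"),
    }
}

fn main() {
    assert_eq!(double(1), 2);
    assert_eq!(double(1u128 << 127), 0x87); // reduction kicks in on overflow
    assert_eq!(mul_small(7, 1), 7);
}
```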
|
adds more performance benefit
|
doesn't change performance, but is nicer to read
|
speeds up encryption by a bit
|
This vastly speeds up the encipher/decipher functions, as we no longer
keep computing key_i * (1 << exponent) over and over again.
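The precomputation can be sketched like this: build a table of repeated GF(2^128) doublings of the key once, then look powers up in O(1). The `double` stand-in uses the common 0x87 reduction polynomial, and the struct/method names are hypothetical, not the crate's API.

```rust
// GF(2^128) doubling with the common 0x87 reduction polynomial (stand-in).
fn double(x: u128) -> u128 {
    let carry = x >> 127;
    (x << 1) ^ (carry * 0x87)
}

// Precomputed table: powers[e] == key * 2^e in GF(2^128), built once.
struct KeyPowers {
    powers: Vec<u128>,
}

impl KeyPowers {
    fn new(key: u128, max_exponent: usize) -> Self {
        let mut powers = Vec::with_capacity(max_exponent + 1);
        let mut cur = key;
        for _ in 0..=max_exponent {
            powers.push(cur);
            cur = double(cur);
        }
        KeyPowers { powers }
    }

    // O(1) lookup instead of `exponent` doublings per block.
    fn get(&self, exponent: usize) -> u128 {
        self.powers[exponent]
    }
}

fn main() {
    let p = KeyPowers::new(1, 8);
    assert_eq!(p.get(0), 1);
    assert_eq!(p.get(3), 8); // 1 * 2^3, no reduction needed for small values
}
```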