Age | Commit message | Author | |
---|---|---|---|
2025-04-10 | pre-multiply keys | Daniel Schadt | |
This can give a speedup of 17% to 66%, depending on the input size (larger inputs see a larger speedup). It seems that even the "optimized" multiply is slow enough to cause a noticeable slowdown, especially for large inputs, where it is called a lot. | |||
2025-04-10 | implement aes4 and aes10 with native instructions | Daniel Schadt | |
Even though aes::hazmat::cipher_round uses AES-NI instructions under the hood, simply loading the data (and the keys!) takes a significant amount of time. Sadly, aes exposes no way to re-use the "loaded" keys. By implementing aes4/aes10 directly with _mm_aesenc, we can keep the keys properly aligned. We still keep the software backend as a fallback, using the software implementation of the aes crate. This gives a ~70% speedup. | |||
2025-04-10 | unroll Block::mul | Daniel Schadt | |
We only ever use this function for small factors, either 2 (in Block::exp), or 0-7 (in e, after the modulo 8). Therefore, for those small values, we hard-code how they are computed by manually unrolling the loop/recursion. This gives around 30% more throughput. | |||
2025-04-10 | don't pass arrays of keys to aes4 and aes10 | Daniel Schadt | |
2025-04-10 | precompute e(0, 0, key) | Daniel Schadt | |
adds more performance benefit | |||
2025-04-10 | rewrite Block::clip | Daniel Schadt | |
doesn't change performance, but is nicer to read | |||
2025-04-10 | add first benchmark | Daniel Schadt | |
2025-04-10 | rewrite aesenc to work in-place | Daniel Schadt | |
speeds up encryption by a bit | |||
2025-04-09 | add repository link (v0.1.0) | Daniel Schadt | |
2025-04-09 | change aez_prf to write into a buffer | Daniel Schadt | |
2025-04-09 | add first fuzz binary | Daniel Schadt | |
2025-04-09 | expose non-vec API | Daniel Schadt | |
2025-04-09 | rewrite algorithm to work in-place | Daniel Schadt | |
2025-04-09 | speed up computation of successive e values | Daniel Schadt | |
This vastly speeds up the encipher/decipher functions, as we no longer keep computing key_i * (1 << exponent) over and over again. | |||
2025-04-09 | speed up multiplication | Daniel Schadt | |
2025-04-09 | speed up zero appendage | Daniel Schadt | |
2025-04-09 | fix overflow for long messages | Daniel Schadt | |
2025-04-08 | add test case for empty message | Daniel Schadt | |
2025-04-08 | use constant_time_eq in decryption function | Daniel Schadt | |
2025-04-08 | revert test case reporting | Daniel Schadt | |
2025-04-08 | add documentation | Daniel Schadt | |
2025-04-05 | use proper Block struct and operator overloading | Daniel Schadt | |
2025-04-04 | first working version! | Daniel Schadt | |
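
The "unroll Block::mul" entry above describes hard-coding the multiplication for the small factors 0-7 instead of running a loop or recursion. A minimal sketch of that idea follows. It assumes a block is modelled as a big-endian `u128` and that doubling follows the GF(2^128) rule from the AEZ specification (shift left, then XOR with 135 if the top bit was set). The names `double`, `mul_generic` and `mul_small` are hypothetical and do not reflect the crate's actual `Block` API.

```rust
/// Doubling in GF(2^128): shift left by one bit and, if the top bit was
/// set, reduce by XOR-ing with 135 (0x87).
/// Assumption: the block is interpreted as a big-endian 128-bit integer.
fn double(x: u128) -> u128 {
    let carry = x >> 127; // 1 if the most significant bit is set, else 0
    (x << 1) ^ (carry * 135)
}

/// Generic double-and-add multiplication by a small integer, written
/// recursively: 2j * X = 2 * (j * X) and (2j + 1) * X = (2j * X) ^ X.
fn mul_generic(x: u128, factor: u32) -> u128 {
    match factor {
        0 => 0,
        1 => x,
        f if f % 2 == 0 => double(mul_generic(x, f / 2)),
        f => mul_generic(x, f - 1) ^ x,
    }
}

/// Hand-unrolled variant for factors 0..=7. Doubling is linear over XOR,
/// so every case is a combination of x, 2x and 4x.
fn mul_small(x: u128, factor: u8) -> u128 {
    let x2 = double(x);
    let x4 = double(x2);
    match factor {
        0 => 0,
        1 => x,
        2 => x2,
        3 => x2 ^ x,
        4 => x4,
        5 => x4 ^ x,
        6 => x4 ^ x2,
        7 => x4 ^ x2 ^ x,
        _ => panic!("mul_small only supports factors 0..=7"),
    }
}
```

The unrolled version needs at most two doublings and two XORs and has no recursion, which is plausibly where a gain like the reported one comes from. The sketch itself makes no performance claim.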