I think that to better optimize the ChaCha20 implementation, we might consider switching to something like the BlockCipher trait (or potentially implementing it and optionally exposing the raw block function as a BlockCipher). I think that would make it easier to write a parallel implementation, and it would also provide a common interface that all ChaCha20 block function implementations conform to, e.g. via the encrypt_blocks method:
https://docs.rs/block-cipher-trait/0.6.2/block_cipher_trait/trait.BlockCipher.html#method.encrypt_blocks
(We could also use a ChaCha20-specific internal trait to avoid the extra dependency if we don't actually want to expose the block-based API publicly.)
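A minimal sketch of what such an internal trait could look like follows. Neither the trait nor the type is an existing RustCrypto API: the names ChaChaBlock, blocks, and SoftChaCha20 are hypothetical. Each backend supplies the raw block function. The multi-block method plays the role of encrypt_blocks, and a parallel (e.g. SIMD) backend would override it. The portable block function follows RFC 8439 (96-bit nonce, 32-bit block counter).

```rust
/// Hypothetical internal trait: a backend provides the raw block function and
/// gets a multi-block method (analogous to `encrypt_blocks`) for free.
pub trait ChaChaBlock {
    /// Compute the 64-byte keystream block for the given block counter.
    fn block(&self, counter: u32) -> [u8; 64];

    /// Fill `out` with consecutive keystream blocks starting at `counter`.
    /// A parallel backend would override this to compute several blocks at once.
    fn blocks(&self, counter: u32, out: &mut [[u8; 64]]) {
        for (i, b) in out.iter_mut().enumerate() {
            *b = self.block(counter.wrapping_add(i as u32));
        }
    }
}

/// Hypothetical portable backend implementing the RFC 8439 ChaCha20 block function.
pub struct SoftChaCha20 {
    key: [u32; 8],
    nonce: [u32; 3],
}

/// Load little-endian 32-bit words from `bytes` into `out`.
fn load_le(bytes: &[u8], out: &mut [u32]) {
    for (w, c) in out.iter_mut().zip(bytes.chunks_exact(4)) {
        *w = u32::from_le_bytes([c[0], c[1], c[2], c[3]]);
    }
}

/// The ChaCha quarter round on four words of the state.
fn quarter_round(s: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize) {
    s[a] = s[a].wrapping_add(s[b]);
    s[d] = (s[d] ^ s[a]).rotate_left(16);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_left(12);
    s[a] = s[a].wrapping_add(s[b]);
    s[d] = (s[d] ^ s[a]).rotate_left(8);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_left(7);
}

impl SoftChaCha20 {
    pub fn new(key: &[u8; 32], nonce: &[u8; 12]) -> Self {
        let mut k = [0u32; 8];
        load_le(key, &mut k);
        let mut n = [0u32; 3];
        load_le(nonce, &mut n);
        SoftChaCha20 { key: k, nonce: n }
    }
}

impl ChaChaBlock for SoftChaCha20 {
    fn block(&self, counter: u32) -> [u8; 64] {
        // State layout: constants "expand 32-byte k", key, counter, nonce.
        let mut init = [0u32; 16];
        init[..4].copy_from_slice(&[0x6170_7865, 0x3320_646e, 0x7962_2d32, 0x6b20_6574]);
        init[4..12].copy_from_slice(&self.key);
        init[12] = counter;
        init[13..16].copy_from_slice(&self.nonce);

        let mut s = init;
        for _ in 0..10 {
            // Column rounds.
            quarter_round(&mut s, 0, 4, 8, 12);
            quarter_round(&mut s, 1, 5, 9, 13);
            quarter_round(&mut s, 2, 6, 10, 14);
            quarter_round(&mut s, 3, 7, 11, 15);
            // Diagonal rounds.
            quarter_round(&mut s, 0, 5, 10, 15);
            quarter_round(&mut s, 1, 6, 11, 12);
            quarter_round(&mut s, 2, 7, 8, 13);
            quarter_round(&mut s, 3, 4, 9, 14);
        }

        // Add the initial state back in and serialize little-endian.
        let mut out = [0u8; 64];
        for i in 0..16 {
            out[4 * i..4 * i + 4].copy_from_slice(&s[i].wrapping_add(init[i]).to_le_bytes());
        }
        out
    }
}
```

With this split, the stream cipher, an AEAD, or an RNG can all sit on top of ChaChaBlock without caring which backend computed the blocks.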
I'd also like to make salsa20-core an optional dependency, so the crate could be used solely as a rand-core RNG or, if we expose the raw block function, solely through that block API.
The chacha20poly1305 crate has much simpler needs than what the full stream-cipher API exposes, since it's doing a fairly straightforward CTR-mode-on-a-buffer operation, which makes it much easier to produce an optimized parallel implementation (with fewer dependencies).
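To illustrate how narrow that need is, here is a sketch of CTR mode applied in place to a buffer, assuming a backend that can produce several keystream blocks per call. The function name apply_keystream, the PAR_BLOCKS constant, and the closure-based interface are all hypothetical. Full batches take the parallel path, and the tail (including a final partial block) falls back to one block at a time.

```rust
/// Hypothetical batch size a SIMD backend might compute per call.
pub const PAR_BLOCKS: usize = 4;

/// XOR the keystream into `buf` in place, starting at block `counter`.
/// `par` returns PAR_BLOCKS consecutive keystream blocks; `single` returns one.
/// Note: a real implementation must reject messages that would wrap the 32-bit counter.
pub fn apply_keystream<P, S>(mut counter: u32, buf: &mut [u8], par: P, single: S)
where
    P: Fn(u32) -> [[u8; 64]; PAR_BLOCKS],
    S: Fn(u32) -> [u8; 64],
{
    // Parallel path: whole batches of PAR_BLOCKS * 64 bytes.
    let mut chunks = buf.chunks_exact_mut(64 * PAR_BLOCKS);
    for chunk in &mut chunks {
        let ks = par(counter);
        for (block, k) in chunk.chunks_exact_mut(64).zip(ks.iter()) {
            xor_in_place(block, k);
        }
        counter = counter.wrapping_add(PAR_BLOCKS as u32);
    }
    // Tail: remaining blocks one at a time; the last one may be partial.
    for block in chunks.into_remainder().chunks_mut(64) {
        let k = single(counter);
        xor_in_place(block, &k);
        counter = counter.wrapping_add(1);
    }
}

/// XOR `keystream` into `data`, truncating to the shorter of the two.
fn xor_in_place(data: &mut [u8], keystream: &[u8]) {
    for (d, k) in data.iter_mut().zip(keystream) {
        *d ^= k;
    }
}
```

Because encryption and decryption are the same XOR, calling this twice with the same counter restores the original buffer.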
Here's an example of the same idea in the aes-gcm crate:
https://github.com/RustCrypto/AEADs/blob/master/aes-gcm/src/ctr32.rs#L62
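For comparison, GCM's counter mode increments only the rightmost 32 bits of the 128-bit counter block (inc32 in NIST SP 800-38D), as the file name ctr32.rs suggests. The sketch below shows only that counter-block generation step for a batch that a parallel AES backend could encrypt at once. It does not reproduce the linked file's actual code, and counter_blocks is a hypothetical helper name.

```rust
/// inc32: increment the rightmost 32 bits of a counter block (big-endian)
/// modulo 2^32, leaving the leftmost 96 bits unchanged.
fn inc32(block: &mut [u8; 16]) {
    let ctr = u32::from_be_bytes([block[12], block[13], block[14], block[15]]);
    block[12..].copy_from_slice(&ctr.wrapping_add(1).to_be_bytes());
}

/// Hypothetical helper: produce N consecutive counter blocks starting at `start`,
/// advancing `start` past them so the next batch continues where this one ended.
fn counter_blocks<const N: usize>(start: &mut [u8; 16]) -> [[u8; 16]; N] {
    let mut out = [[0u8; 16]; N];
    for b in out.iter_mut() {
        *b = *start;
        inc32(start);
    }
    out
}
```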