| Message ID | 20190515102450.30557-1-dja@axtens.net (mailing list archive) |
|---|---|
| State | Accepted |
| Delegated to | Herbert Xu |
| Series | crypto: vmx - CTR: always increment IV as quadword |
On 05/15/2019 06:24 AM, Daniel Axtens wrote:
> The kernel self-tests picked up an issue with CTR mode:
>   alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
>
> Test vector 3 has an IV of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD, so
> after 3 increments it should wrap around to 0.
>
> In the aesp8-ppc code from OpenSSL, there are two paths that
> increment IVs: the bulk (8 at a time) path, and the individual
> path which is used when there are fewer than 8 AES blocks to
> process.
>
> In the bulk path, the IV is incremented with vadduqm: "Vector
> Add Unsigned Quadword Modulo", which does 128-bit addition.
>
> In the individual path, however, the IV is incremented with
> vadduwm: "Vector Add Unsigned Word Modulo", which instead
> does 4 32-bit additions. Thus the IV would instead become
> FFFFFFFFFFFFFFFFFFFFFFFF00000000, throwing off the result.
>
> Use vadduqm.
>
> This was probably a typo originally, what with q and w being
> adjacent. It is a pretty narrow edge case: I am really
> impressed by the quality of the kernel self-tests!
>
> Fixes: 5c380d623ed3 ("crypto: vmx - Add support for VMS instructions by ASM")
> Cc: stable@vger.kernel.org
> Signed-off-by: Daniel Axtens <dja@axtens.net>
>
> ---
>
> I'll pass this along internally to get it into OpenSSL as well.
> ---
>  drivers/crypto/vmx/aesp8-ppc.pl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/crypto/vmx/aesp8-ppc.pl b/drivers/crypto/vmx/aesp8-ppc.pl
> index de78282b8f44..9c6b5c1d6a1a 100644
> --- a/drivers/crypto/vmx/aesp8-ppc.pl
> +++ b/drivers/crypto/vmx/aesp8-ppc.pl
> @@ -1357,7 +1357,7 @@ Loop_ctr32_enc:
>  	addi	$idx,$idx,16
>  	bdnz	Loop_ctr32_enc
>
> -	vadduwm	$ivec,$ivec,$one
> +	vadduqm	$ivec,$ivec,$one
>  	vmr	$dat,$inptail
>  	lvx	$inptail,0,$inp
>  	addi	$inp,$inp,16

Acked-by: Nayna Jain <nayna@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
On Wed, May 15, 2019 at 08:24:50PM +1000, Daniel Axtens wrote:
> The kernel self-tests picked up an issue with CTR mode:
>   alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
[...]
>  drivers/crypto/vmx/aesp8-ppc.pl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Patch applied.  Thanks.
Daniel Axtens <dja@axtens.net> writes:
> The kernel self-tests picked up an issue with CTR mode:
>   alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
[...]
> I'll pass this along internally to get it into OpenSSL as well.

I passed this along to OpenSSL and got pretty comprehensively schooled:
https://github.com/openssl/openssl/pull/8942

It seems we tweak the openssl code to use a 128-bit counter, whereas
the original code was in fact designed for a 32-bit counter. We must
have changed the vaddu instruction in the bulk path but not in the
individual path, as they're both vadduwm (4x32-bit) upstream.

I think this change is still correct with regards to the kernel, but I
guess it's probably something where I should have done a more thorough
read of the documentation before diving in to the code, and perhaps we
should note it in the code somewhere too. Ah well.

Regards,
Daniel

> ---
>  drivers/crypto/vmx/aesp8-ppc.pl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/crypto/vmx/aesp8-ppc.pl b/drivers/crypto/vmx/aesp8-ppc.pl
> index de78282b8f44..9c6b5c1d6a1a 100644
> --- a/drivers/crypto/vmx/aesp8-ppc.pl
> +++ b/drivers/crypto/vmx/aesp8-ppc.pl
> @@ -1357,7 +1357,7 @@ Loop_ctr32_enc:
>  	addi	$idx,$idx,16
>  	bdnz	Loop_ctr32_enc
>
> -	vadduwm	$ivec,$ivec,$one
> +	vadduqm	$ivec,$ivec,$one
>  	vmr	$dat,$inptail
>  	lvx	$inptail,0,$inp
>  	addi	$inp,$inp,16
> --
> 2.19.1
On Mon, May 20, 2019 at 11:59:05AM +1000, Daniel Axtens wrote:
> Daniel Axtens <dja@axtens.net> writes:
>
> > The kernel self-tests picked up an issue with CTR mode:
> >   alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
[...]
> > I'll pass this along internally to get it into OpenSSL as well.
>
> I passed this along to OpenSSL and got pretty comprehensively schooled:
> https://github.com/openssl/openssl/pull/8942
>
> It seems we tweak the openssl code to use a 128-bit counter, whereas
> the original code was in fact designed for a 32-bit counter. We must
> have changed the vaddu instruction in the bulk path but not in the
> individual path, as they're both vadduwm (4x32-bit) upstream.
>
> I think this change is still correct with regards to the kernel,
> but I guess it's probably something where I should have done a more
> thorough read of the documentation before diving in to the code, and
> perhaps we should note it in the code somewhere too. Ah well.
>
> Regards,
> Daniel

Ah, I didn't realize there are multiple conventions for CTR.  Yes, all
CTR implementations in the kernel follow the 128-bit convention, and
evidently the test vectors test for that.

Apparently the VMX OpenSSL code got incompletely changed from the 32-bit
convention by this commit, so that's what you're fixing:

    commit 1d4aa0b4c1816e8ca92a6aadb0d8f6b43c56c0d0
    Author: Leonidas Da Silva Barbosa <leosilva@linux.vnet.ibm.com>
    Date:   Fri Aug 14 10:12:22 2015 -0300

        crypto: vmx - Fixing AES-CTR counter bug

        AES-CTR is using a counter 8bytes-8bytes what miss match with
        kernel specs.

        In the previous code a vadduwm was done to increment counter.
        Replacing this for a vadduqm now considering both cases counter
        8-8 bytes and full 16bytes.

A comment in the code would indeed be helpful.

Note that the kernel doesn't currently need a 32-bit CTR implementation
for GCM like OpenSSL does, because the kernel currently only supports
12-byte IVs with GCM.  So the low 32 bits of the counter start at 1 and
don't overflow, regardless of whether the counter is incremented mod
2^32 or mod 2^128.

- Eric
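As a sketch of Eric's point (this is illustrative Python, not kernel code; the IV value and helper names are hypothetical): with a 12-byte GCM IV the counter block is the IV followed by a 32-bit counter that starts at 1, so the mod-2^32 and mod-2^128 increment conventions produce identical counter blocks as long as the low word never overflows.

```python
# Model the two CTR increment conventions on a GCM-style counter block.
# Assumed layout (per the discussion): 96-bit IV || 32-bit counter = 1.

def inc32(block):
    # "32-bit convention": increment only the low 32 bits, mod 2^32
    hi, lo = block >> 32, block & 0xFFFFFFFF
    return (hi << 32) | ((lo + 1) % (1 << 32))

def inc128(block):
    # "128-bit convention" (what the kernel uses): whole block, mod 2^128
    return (block + 1) % (1 << 128)

iv12 = 0xA1B2C3D4E5F60718293A4B5C      # hypothetical 12-byte (96-bit) IV
block = (iv12 << 32) | 1               # low 32-bit counter starts at 1

b32 = b128 = block
for _ in range(100_000):
    b32, b128 = inc32(b32), inc128(b128)
    # The two never diverge: starting from 1, the low word cannot
    # overflow within any legal GCM message length.
    assert b32 == b128
```

The conventions only disagree once the low word would carry out, which a counter starting at 1 never reaches for permitted message lengths.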
diff --git a/drivers/crypto/vmx/aesp8-ppc.pl b/drivers/crypto/vmx/aesp8-ppc.pl
index de78282b8f44..9c6b5c1d6a1a 100644
--- a/drivers/crypto/vmx/aesp8-ppc.pl
+++ b/drivers/crypto/vmx/aesp8-ppc.pl
@@ -1357,7 +1357,7 @@ Loop_ctr32_enc:
 	addi	$idx,$idx,16
 	bdnz	Loop_ctr32_enc

-	vadduwm	$ivec,$ivec,$one
+	vadduqm	$ivec,$ivec,$one
 	vmr	$dat,$inptail
 	lvx	$inptail,0,$inp
 	addi	$inp,$inp,16
The kernel self-tests picked up an issue with CTR mode:

  alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"

Test vector 3 has an IV of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD, so
after 3 increments it should wrap around to 0.

In the aesp8-ppc code from OpenSSL, there are two paths that
increment IVs: the bulk (8 at a time) path, and the individual
path which is used when there are fewer than 8 AES blocks to
process.

In the bulk path, the IV is incremented with vadduqm: "Vector
Add Unsigned Quadword Modulo", which does 128-bit addition.

In the individual path, however, the IV is incremented with
vadduwm: "Vector Add Unsigned Word Modulo", which instead
does 4 32-bit additions. Thus the IV would instead become
FFFFFFFFFFFFFFFFFFFFFFFF00000000, throwing off the result.

Use vadduqm.

This was probably a typo originally, what with q and w being
adjacent. It is a pretty narrow edge case: I am really
impressed by the quality of the kernel self-tests!

Fixes: 5c380d623ed3 ("crypto: vmx - Add support for VMS instructions by ASM")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Axtens <dja@axtens.net>

---

I'll pass this along internally to get it into OpenSSL as well.
---
 drivers/crypto/vmx/aesp8-ppc.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
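The divergence described above can be sketched in a few lines of Python (illustrative only, not the kernel's asm; it assumes, as in the asm, that the `$one` increment vector carries a 1 only in its low 32-bit word):

```python
# Model the two PowerPC vector adds on the failing test vector's IV.
IV = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD   # test vector 3's IV

def vadduqm(iv):
    """vadduqm: one modulo-2^128 addition across the whole quadword."""
    return (iv + 1) % (1 << 128)

def vadduwm(iv):
    """vadduwm: four independent modulo-2^32 word additions; the carry
    out of the low word is discarded rather than propagated."""
    hi, lo = iv >> 32, iv & 0xFFFFFFFF
    return (hi << 32) | ((lo + 1) % (1 << 32))

iv_q = iv_w = IV
for _ in range(3):            # three increments, as in the failing test
    iv_q = vadduqm(iv_q)
    iv_w = vadduwm(iv_w)

print(f"{iv_q:032X}")  # 00000000000000000000000000000000 (wraps correctly)
print(f"{iv_w:032X}")  # FFFFFFFFFFFFFFFFFFFFFFFF00000000 (low word wraps alone)
```

Only the third increment differs: the first two stay within the low word, which is why the bug survives every test vector whose counter never carries out of the low 32 bits.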