| Message ID | 20190515102450.30557-1-dja@axtens.net (mailing list archive) |
|---|---|
| State | Accepted |
| Delegated to | Herbert Xu |
| Series | crypto: vmx - CTR: always increment IV as quadword |
On 05/15/2019 06:24 AM, Daniel Axtens wrote:
> The kernel self-tests picked up an issue with CTR mode:
>   alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
>
> Test vector 3 has an IV of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD, so
> after 3 increments it should wrap around to 0.
>
> In the aesp8-ppc code from OpenSSL, there are two paths that
> increment IVs: the bulk (8 at a time) path, and the individual
> path which is used when there are fewer than 8 AES blocks to
> process.
>
> In the bulk path, the IV is incremented with vadduqm: "Vector
> Add Unsigned Quadword Modulo", which does 128-bit addition.
>
> In the individual path, however, the IV is incremented with
> vadduwm: "Vector Add Unsigned Word Modulo", which instead
> does 4 32-bit additions. Thus the IV would instead become
> FFFFFFFFFFFFFFFFFFFFFFFF00000000, throwing off the result.
>
> Use vadduqm.
>
> This was probably a typo originally, what with q and w being
> adjacent. It is a pretty narrow edge case: I am really
> impressed by the quality of the kernel self-tests!
>
> Fixes: 5c380d623ed3 ("crypto: vmx - Add support for VMS instructions by ASM")
> Cc: stable@vger.kernel.org
> Signed-off-by: Daniel Axtens <dja@axtens.net>
>
> ---
>
> I'll pass this along internally to get it into OpenSSL as well.
> ---
>  drivers/crypto/vmx/aesp8-ppc.pl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/crypto/vmx/aesp8-ppc.pl b/drivers/crypto/vmx/aesp8-ppc.pl
> index de78282b8f44..9c6b5c1d6a1a 100644
> --- a/drivers/crypto/vmx/aesp8-ppc.pl
> +++ b/drivers/crypto/vmx/aesp8-ppc.pl
> @@ -1357,7 +1357,7 @@ Loop_ctr32_enc:
>  	addi	$idx,$idx,16
>  	bdnz	Loop_ctr32_enc
>
> -	vadduwm	$ivec,$ivec,$one
> +	vadduqm	$ivec,$ivec,$one
>  	vmr	$dat,$inptail
>  	lvx	$inptail,0,$inp
>  	addi	$inp,$inp,16

Acked-by: Nayna Jain <nayna@linux.ibm.com>
Tested-by: Nayna Jain <nayna@linux.ibm.com>
On Wed, May 15, 2019 at 08:24:50PM +1000, Daniel Axtens wrote:
> The kernel self-tests picked up an issue with CTR mode:
>   alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
[...]
>  drivers/crypto/vmx/aesp8-ppc.pl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Patch applied.  Thanks.
Daniel Axtens <dja@axtens.net> writes:
> The kernel self-tests picked up an issue with CTR mode:
>   alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
[...]
> I'll pass this along internally to get it into OpenSSL as well.

I passed this along to OpenSSL and got pretty comprehensively schooled:
https://github.com/openssl/openssl/pull/8942

It seems we tweak the openssl code to use a 128-bit counter, whereas
the original code was in fact designed for a 32-bit counter. We must
have changed the vaddu instruction in the bulk path but not in the
individual path, as they're both vadduwm (4x32-bit) upstream.

I think this change is still correct with regards to the kernel, but I
guess it's probably something where I should have done a more thorough
read of the documentation before diving in to the code, and perhaps we
should note it in the code somewhere too. Ah well.

Regards,
Daniel

> ---
>  drivers/crypto/vmx/aesp8-ppc.pl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/crypto/vmx/aesp8-ppc.pl b/drivers/crypto/vmx/aesp8-ppc.pl
> index de78282b8f44..9c6b5c1d6a1a 100644
> --- a/drivers/crypto/vmx/aesp8-ppc.pl
> +++ b/drivers/crypto/vmx/aesp8-ppc.pl
> @@ -1357,7 +1357,7 @@ Loop_ctr32_enc:
>  	addi	$idx,$idx,16
>  	bdnz	Loop_ctr32_enc
>
> -	vadduwm	$ivec,$ivec,$one
> +	vadduqm	$ivec,$ivec,$one
>  	vmr	$dat,$inptail
>  	lvx	$inptail,0,$inp
>  	addi	$inp,$inp,16
> --
> 2.19.1
On Mon, May 20, 2019 at 11:59:05AM +1000, Daniel Axtens wrote:
> Daniel Axtens <dja@axtens.net> writes:
>
> > The kernel self-tests picked up an issue with CTR mode:
> >   alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
[...]
> > I'll pass this along internally to get it into OpenSSL as well.
>
> I passed this along to OpenSSL and got pretty comprehensively schooled:
> https://github.com/openssl/openssl/pull/8942
>
> It seems we tweak the openssl code to use a 128-bit counter, whereas
> the original code was in fact designed for a 32-bit counter. We must
> have changed the vaddu instruction in the bulk path but not in the
> individual path, as they're both vadduwm (4x32-bit) upstream.
>
> I think this change is still correct with regards to the kernel,
> but I guess it's probably something where I should have done a more
> thorough read of the documentation before diving in to the code, and
> perhaps we should note it in the code somewhere too. Ah well.
>
> Regards,
> Daniel

Ah, I didn't realize there are multiple conventions for CTR.  Yes, all
CTR implementations in the kernel follow the 128-bit convention, and
evidently the test vectors test for that.

Apparently the VMX OpenSSL code got incompletely changed from the 32-bit
convention by this commit, so that's what you're fixing:

    commit 1d4aa0b4c1816e8ca92a6aadb0d8f6b43c56c0d0
    Author: Leonidas Da Silva Barbosa <leosilva@linux.vnet.ibm.com>
    Date:   Fri Aug 14 10:12:22 2015 -0300

        crypto: vmx - Fixing AES-CTR counter bug

        AES-CTR is using a counter 8bytes-8bytes what miss match with
        kernel specs.

        In the previous code a vadduwm was done to increment counter.
        Replacing this for a vadduqm now considering both cases counter
        8-8 bytes and full 16bytes.

A comment in the code would indeed be helpful.

Note that the kernel doesn't currently need a 32-bit CTR implementation
for GCM like OpenSSL does, because the kernel currently only supports
12-byte IVs with GCM.  So the low 32 bits of the counter start at 1 and
don't overflow, regardless of whether the counter is incremented mod
2^32 or mod 2^128.

- Eric
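As a sketch of Eric's point (this is illustrative Python, not kernel code; the IV value and helper names are hypothetical): with a 12-byte GCM IV the counter block is the IV followed by a 32-bit counter that starts at 1, so the mod-2^32 and mod-2^128 increment conventions produce identical counter blocks as long as the low word never overflows.

```python
# Model the two CTR increment conventions on a GCM-style counter block.
# Assumed layout (per the discussion): 96-bit IV || 32-bit counter = 1.

def inc32(block):
    # "32-bit convention": increment only the low 32 bits, mod 2^32
    hi, lo = block >> 32, block & 0xFFFFFFFF
    return (hi << 32) | ((lo + 1) % (1 << 32))

def inc128(block):
    # "128-bit convention" (what the kernel uses): whole block, mod 2^128
    return (block + 1) % (1 << 128)

iv12 = 0xA1B2C3D4E5F60718293A4B5C      # hypothetical 12-byte (96-bit) IV
block = (iv12 << 32) | 1               # low 32-bit counter starts at 1

b32 = b128 = block
for _ in range(100_000):
    b32, b128 = inc32(b32), inc128(b128)
    # The two never diverge: starting from 1, the low word cannot
    # overflow within any legal GCM message length.
    assert b32 == b128
```

The conventions only disagree once the low word would carry out, which a counter starting at 1 never reaches for permitted message lengths.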
diff --git a/drivers/crypto/vmx/aesp8-ppc.pl b/drivers/crypto/vmx/aesp8-ppc.pl
index de78282b8f44..9c6b5c1d6a1a 100644
--- a/drivers/crypto/vmx/aesp8-ppc.pl
+++ b/drivers/crypto/vmx/aesp8-ppc.pl
@@ -1357,7 +1357,7 @@ Loop_ctr32_enc:
 	addi	$idx,$idx,16
 	bdnz	Loop_ctr32_enc

-	vadduwm	$ivec,$ivec,$one
+	vadduqm	$ivec,$ivec,$one
 	vmr	$dat,$inptail
 	lvx	$inptail,0,$inp
 	addi	$inp,$inp,16
The kernel self-tests picked up an issue with CTR mode:

  alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"

Test vector 3 has an IV of FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD, so
after 3 increments it should wrap around to 0.

In the aesp8-ppc code from OpenSSL, there are two paths that
increment IVs: the bulk (8 at a time) path, and the individual
path which is used when there are fewer than 8 AES blocks to
process.

In the bulk path, the IV is incremented with vadduqm: "Vector
Add Unsigned Quadword Modulo", which does 128-bit addition.

In the individual path, however, the IV is incremented with
vadduwm: "Vector Add Unsigned Word Modulo", which instead
does 4 32-bit additions. Thus the IV would instead become
FFFFFFFFFFFFFFFFFFFFFFFF00000000, throwing off the result.

Use vadduqm.

This was probably a typo originally, what with q and w being
adjacent. It is a pretty narrow edge case: I am really
impressed by the quality of the kernel self-tests!

Fixes: 5c380d623ed3 ("crypto: vmx - Add support for VMS instructions by ASM")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Axtens <dja@axtens.net>

---

I'll pass this along internally to get it into OpenSSL as well.
---
 drivers/crypto/vmx/aesp8-ppc.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
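The divergence described above can be sketched in a few lines of Python (illustrative only, not the kernel's asm; it assumes, as in the asm, that the `$one` increment vector carries a 1 only in its low 32-bit word):

```python
# Model the two PowerPC vector adds on the failing test vector's IV.
IV = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFD   # test vector 3's IV

def vadduqm(iv):
    """vadduqm: one modulo-2^128 addition across the whole quadword."""
    return (iv + 1) % (1 << 128)

def vadduwm(iv):
    """vadduwm: four independent modulo-2^32 word additions; the carry
    out of the low word is discarded rather than propagated."""
    hi, lo = iv >> 32, iv & 0xFFFFFFFF
    return (hi << 32) | ((lo + 1) % (1 << 32))

iv_q = iv_w = IV
for _ in range(3):            # three increments, as in the failing test
    iv_q = vadduqm(iv_q)
    iv_w = vadduwm(iv_w)

print(f"{iv_q:032X}")  # 00000000000000000000000000000000 (wraps correctly)
print(f"{iv_w:032X}")  # FFFFFFFFFFFFFFFFFFFFFFFF00000000 (low word wraps alone)
```

Only the third increment differs: the first two stay within the low word, which is why the bug survives every test vector whose counter never carries out of the low 32 bits.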