diff mbox

[v2,16/18] arm64: crypto: disable LTO for aes-ce-cipher.c

Message ID 20171115213428.22559-17-samitolvanen@google.com (mailing list archive)
State New, archived
Headers show

Commit Message

Sami Tolvanen Nov. 15, 2017, 9:34 p.m. UTC
CONFIG_LTO_CLANG requires the use of clang's integrated assembler, which
doesn't understand the inline assembly in aes-ce-cipher.c. Disable LTO for
the file to work around the issue.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
 arch/arm64/crypto/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Mark Rutland Nov. 20, 2017, 3:20 p.m. UTC | #1
On Wed, Nov 15, 2017 at 01:34:26PM -0800, Sami Tolvanen wrote:
> CONFIG_LTO_CLANG requires the use of clang's integrated assembler, which
> doesn't understand the inline assembly in aes-ce-cipher.c. Disable LTO for
> the file to work around the issue.

Could you elaborate on what the integrated asembler doesn't like?

It's not entirely clear at a glance, as the asm in that file doesn't
seem to do anything that obscure.

Is it a bug?

Thanks,
Mark.

> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
> ---
>  arch/arm64/crypto/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> index b5edc5918c28..af08508521a3 100644
> --- a/arch/arm64/crypto/Makefile
> +++ b/arch/arm64/crypto/Makefile
> @@ -24,7 +24,7 @@ obj-$(CONFIG_CRYPTO_CRC32_ARM64_CE) += crc32-ce.o
>  crc32-ce-y:= crc32-ce-core.o crc32-ce-glue.o
>  
>  obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
> -CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
> +CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto $(DISABLE_LTO_CLANG)
>  
>  obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o
>  aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o
> -- 
> 2.15.0.448.gf294e3d99a-goog
>
Ard Biesheuvel Nov. 20, 2017, 3:25 p.m. UTC | #2
On 20 November 2017 at 15:20, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Nov 15, 2017 at 01:34:26PM -0800, Sami Tolvanen wrote:
>> CONFIG_LTO_CLANG requires the use of clang's integrated assembler, which
>> doesn't understand the inline assembly in aes-ce-cipher.c. Disable LTO for
>> the file to work around the issue.
>
> Could you elaborate on what the integrated asembler doesn't like?
>
> It's not entirely clear at a glance, as the asm in that file doesn't
> seem to do anything that obscure.
>

Actually, it just occurred to me that this code does not adhere 100%
to the kernel mode neon rules, by putting kernel_neon_begin/end and
the code itself into the same source file. At the time, it seemed
harmless, given that these functions are only exposed via function
pointers and never called locally, and the compiler cannot really
reorder the function calls with the asm block. However, under LTO this
all changes, and it is no longer guaranteed that the NEON registers
are only touched between the kernel mode neon begin/end calls.

So the correct way to fix this would be to move the asm into its own
.S file, and call it from between the kernel_mode_neon_begin/end
calls. That should also fix your compat issue.
Sami Tolvanen Nov. 20, 2017, 8:51 p.m. UTC | #3
On Mon, Nov 20, 2017 at 03:20:14PM +0000, Mark Rutland wrote:
> Could you elaborate on what the integrated asembler doesn't like?

Here's the error, looks like in aes_sub:

<inline asm>:1:69: error: invalid operand for instruction
        dup     v1.4s, w12              ;movi   v0.16b, #0              ;aese   v0.16b, v1.16b          ;umov   w12, v0.4s[0]   ;
                                                                                                                     ^
LLVM ERROR: Error parsing inline asm

Sami
Sami Tolvanen Nov. 20, 2017, 9:01 p.m. UTC | #4
On Mon, Nov 20, 2017 at 03:25:31PM +0000, Ard Biesheuvel wrote:
> However, under LTO this all changes, and it is no longer guaranteed
> that the NEON registers are only touched between the kernel mode
> neon begin/end calls.

LTO operates on LLVM IR, so disabling LTO for this file should make
sure there won't be any unsafe optimizations. Are there other places
in the kernel that might have this issue?

> So the correct way to fix this would be to move the asm into its own
> .S file, and call it from between the kernel_mode_neon_begin/end
> calls. That should also fix your compat issue.

Sure, that would also work.
 
Sami
Alex Matveev Nov. 20, 2017, 9:29 p.m. UTC | #5
On Mon, 20 Nov 2017 15:20:14 +0000
Mark Rutland <mark.rutland@arm.com> wrote:

> On Wed, Nov 15, 2017 at 01:34:26PM -0800, Sami Tolvanen wrote:
> > CONFIG_LTO_CLANG requires the use of clang's integrated assembler,
> > which doesn't understand the inline assembly in aes-ce-cipher.c.
> > Disable LTO for the file to work around the issue.  
> 
> Could you elaborate on what the integrated asembler doesn't like?
> 
> It's not entirely clear at a glance, as the asm in that file doesn't
> seem to do anything that obscure.
> 
> Is it a bug?

Turns out, integrated assembler doesn't like this instruction in
aes_sub():
	umov	%w[out], v0.4s[0]

Specifically, it barks at "v0.4s[0]" part. And the way I read the spec,
it's quite correct in not accepting this argument. From UMOV
description:

UMOV <Wd>, <Vn>.<Ts>[<index>]

...

<Ts> For the 32-bit variant: is an element size specifier, encoded in
the "imm5" field. It can have the following values:
	B	when imm5 = xxxx1
	H	when imm5 = xxx10
	S	when imm5 = xx100


With "v0.s[0]" it builds fine.

Ard, since this is your code, can you comment? Feels like a typo.

Regards,
Alex
Ard Biesheuvel Nov. 20, 2017, 9:31 p.m. UTC | #6
On 20 November 2017 at 21:29, Alex Matveev <alxmtvv@gmail.com> wrote:
> On Mon, 20 Nov 2017 15:20:14 +0000
> Mark Rutland <mark.rutland@arm.com> wrote:
>
>> On Wed, Nov 15, 2017 at 01:34:26PM -0800, Sami Tolvanen wrote:
>> > CONFIG_LTO_CLANG requires the use of clang's integrated assembler,
>> > which doesn't understand the inline assembly in aes-ce-cipher.c.
>> > Disable LTO for the file to work around the issue.
>>
>> Could you elaborate on what the integrated asembler doesn't like?
>>
>> It's not entirely clear at a glance, as the asm in that file doesn't
>> seem to do anything that obscure.
>>
>> Is it a bug?
>
> Turns out, integrated assembler doesn't like this instruction in
> aes_sub():
>         umov    %w[out], v0.4s[0]
>
> Specifically, it barks at "v0.4s[0]" part. And the way I read the spec,
> it's quite correct in not accepting this argument. From UMOV
> description:
>
> UMOV <Wd>, <Vn>.<Ts>[<index>]
>
> ...
>
> <Ts> For the 32-bit variant: is an element size specifier, encoded in
> the "imm5" field. It can have the following values:
>         B       when imm5 = xxxx1
>         H       when imm5 = xxx10
>         S       when imm5 = xx100
>
>
> With "v0.s[0]" it builds fine.
>
> Ard, since this is your code, can you comment? Feels like a typo.
>

Yep.
Alex Matveev Nov. 20, 2017, 10:03 p.m. UTC | #7
Sami, this seems to be better solution for aes-ce-cipher.c problem.

Regards,
Alex


From 6bcdd763b56ce10a77a79373a46fc0e8d9026178 Mon Sep 17 00:00:00 2001
From: Alex Matveev <alxmtvv@gmail.com>
Date: Mon, 20 Nov 2017 21:30:38 +0000
Subject: [PATCH] arm64: crypto: fix typo in aes_sub()

Clang's integrated assembler can't parse "v0.4s[0]" argument of the UMOV
instruction. And, as per ARM ARM, this is incorrect usage:

UMOV <Wd>, <Vn>.<Ts>[<index>]

...

<Ts> For the 32-bit variant: is an element size specifier, encoded in
the "imm5" field. It can have the following values:
	B	when imm5 = xxxx1
	H	when imm5 = xxx10
	S	when imm5 = xx100

Signed-off-by: Alex Matveev <alxmtvv@gmail.com>
---
 arch/arm64/crypto/aes-ce-cipher.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/crypto/aes-ce-cipher.c
b/arch/arm64/crypto/aes-ce-cipher.c index 6a75cd75ed11..e26bedee2c45
100644 --- a/arch/arm64/crypto/aes-ce-cipher.c
+++ b/arch/arm64/crypto/aes-ce-cipher.c
@@ -152,7 +152,7 @@ static u32 aes_sub(u32 input)
 	__asm__("dup	v1.4s, %w[in]		;"
 		"movi	v0.16b, #0		;"
 		"aese	v0.16b, v1.16b		;"
-		"umov	%w[out], v0.4s[0]	;"
+		"umov	%w[out], v0.s[0]	;"
 
 	:	[out]	"=r"(ret)
 	:	[in]	"r"(input)
Mark Rutland Nov. 21, 2017, 11:47 a.m. UTC | #8
On Mon, Nov 20, 2017 at 01:01:43PM -0800, Sami Tolvanen wrote:
> On Mon, Nov 20, 2017 at 03:25:31PM +0000, Ard Biesheuvel wrote:
> > However, under LTO this all changes, and it is no longer guaranteed
> > that the NEON registers are only touched between the kernel mode
> > neon begin/end calls.

Just to check, I take it that the feat is that LTO can merge the
begin/asm/end, reordering bits to the begin/end relative to the asm?

AFAICT, assuming that LTO respects our compiler barriers:

* the preempt_disable() in kernel_neon_begin() should prevent the asm
  block from being moved earlier, but it looks like it could be moved
  somewhere in the middle of local_bh_enable().

* the __this_cpu_xchg() in kernel_neon_end() *isn't* ordered w.r.t the
  asm, as it doesn't have a full memory clobber, and could be
  re-ordered before the asm block.

We *could* solve this case with a barrier() at the end of
kernel_neon_begin() and the start of kernel_neon_end(), but it is a
whack-a-mole solution. :/

... this also raises the question as to how the {__,}this_cpu*() ops are
expected to be ordered w.r.t. other local operations, as that's not
clear to me even in the absence of LTO.

> LTO operates on LLVM IR, so disabling LTO for this file should make
> sure there won't be any unsafe optimizations. Are there other places
> in the kernel that might have this issue?

I suspect that as above, there are a number of places that implicitly
rely on compilation-unit boundaries enforcing (local) ordering w.r.t.
asynchronous events, as the compiler won't otherwise be able to reorder
code such as cpu-local flag manipulation.

I think we have a much bigger problem here.

Thanks,
Mark.
diff mbox

Patch

diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index b5edc5918c28..af08508521a3 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -24,7 +24,7 @@  obj-$(CONFIG_CRYPTO_CRC32_ARM64_CE) += crc32-ce.o
 crc32-ce-y:= crc32-ce-core.o crc32-ce-glue.o
 
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE) += aes-ce-cipher.o
-CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto
+CFLAGS_aes-ce-cipher.o += -march=armv8-a+crypto $(DISABLE_LTO_CLANG)
 
 obj-$(CONFIG_CRYPTO_AES_ARM64_CE_CCM) += aes-ce-ccm.o
 aes-ce-ccm-y := aes-ce-ccm-glue.o aes-ce-ccm-core.o