diff mbox series

[RFC,1/2] arm64: jump_label: use more precise asm constraints

Message ID 20220427171241.2426592-2-ardb@kernel.org (mailing list archive)
State New, archived
Headers show
Series arm64: use PIE code generation for KASLR kernel | expand

Commit Message

Ard Biesheuvel April 27, 2022, 5:12 p.m. UTC
In order to set bit #0 of the struct static_key pointer in the the jump
label struct, we currently cast the pointer to char[], and take the
address of either the 0th or 1st array member, depending on the value of
'branch'. This works but creates problems with -fpie code generation:
GCC complains about the constraint being unsatisfiable, and Clang
miscompiles the code in a way that causes stability issues (immediate
panic on 'attempt to kill init')

So instead, pass the struct static_key reference and the 'branch'
immediate individually, in a way that satisfies both GCC and Clang (GCC
wants the 'S' constraint, whereas Clang wants the 'i' constraint for
argument %0)

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/jump_label.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Nick Desaulniers April 27, 2022, 6:58 p.m. UTC | #1
On Wed, Apr 27, 2022 at 10:13 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> In order to set bit #0 of the struct static_key pointer in the the jump
> label struct, we currently cast the pointer to char[], and take the
> address of either the 0th or 1st array member, depending on the value of
> 'branch'. This works but creates problems with -fpie code generation:
> GCC complains about the constraint being unsatisfiable, and Clang
> miscompiles the code in a way that causes stability issues (immediate
> panic on 'attempt to kill init')

Any more info on the "miscompile?" Perhaps worth an upstream bug report?

>
> So instead, pass the struct static_key reference and the 'branch'
> immediate individually, in a way that satisfies both GCC and Clang (GCC
> wants the 'S' constraint, whereas Clang wants the 'i' constraint for
> argument %0)

Anything we can do to improve Clang's handling of S constraints?
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
>> An absolute symbolic address or a label reference

>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/include/asm/jump_label.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
> index cea441b6aa5d..f741bbacf175 100644
> --- a/arch/arm64/include/asm/jump_label.h
> +++ b/arch/arm64/include/asm/jump_label.h
> @@ -23,9 +23,9 @@ static __always_inline bool arch_static_branch(struct static_key *key,
>                  "      .pushsection    __jump_table, \"aw\"    \n\t"
>                  "      .align          3                       \n\t"
>                  "      .long           1b - ., %l[l_yes] - .   \n\t"
> -                "      .quad           %c0 - .                 \n\t"
> +                "      .quad           %c0 - . + %1            \n\t"

If %1 is "i" constrained, then I think we can use the %c output
template for it as well?
https://gcc.gnu.org/onlinedocs/gccint/Output-Template.html

Is the expression clearer if we keep the `- .` at the end of each expression?

>                  "      .popsection                             \n\t"
> -                :  :  "i"(&((char *)key)[branch]) :  : l_yes);
> +                :  :  "Si"(key), "i"(branch) :  : l_yes);
>
>         return false;
>  l_yes:
> @@ -40,9 +40,9 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
>                  "      .pushsection    __jump_table, \"aw\"    \n\t"
>                  "      .align          3                       \n\t"
>                  "      .long           1b - ., %l[l_yes] - .   \n\t"
> -                "      .quad           %c0 - .                 \n\t"
> +                "      .quad           %c0 - . + %1            \n\t"
>                  "      .popsection                             \n\t"
> -                :  :  "i"(&((char *)key)[branch]) :  : l_yes);
> +                :  :  "Si"(key), "i"(branch) :  : l_yes);
>
>         return false;
>  l_yes:
> --
> 2.30.2
>
Ard Biesheuvel April 27, 2022, 9:50 p.m. UTC | #2
Hi Nick,

On Wed, 27 Apr 2022 at 20:59, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Wed, Apr 27, 2022 at 10:13 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > In order to set bit #0 of the struct static_key pointer in the the jump
> > label struct, we currently cast the pointer to char[], and take the
> > address of either the 0th or 1st array member, depending on the value of
> > 'branch'. This works but creates problems with -fpie code generation:
> > GCC complains about the constraint being unsatisfiable, and Clang
> > miscompiles the code in a way that causes stability issues (immediate
> > panic on 'attempt to kill init')
>
> Any more info on the "miscompile?" Perhaps worth an upstream bug report?
>

I made very little progress trying to narrow it down: a segfault in
user space caused by who knows what. I'd file a bug if I knew how to
reproduce it.

> >
> > So instead, pass the struct static_key reference and the 'branch'
> > immediate individually, in a way that satisfies both GCC and Clang (GCC
> > wants the 'S' constraint, whereas Clang wants the 'i' constraint for
> > argument %0)
>
> Anything we can do to improve Clang's handling of S constraints?
> https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
> >> An absolute symbolic address or a label reference
>

"S" seems more appropriate than "i" for a symbol reference, and GCC
rejects "i" when using -fpie. But the actual literal being emitted is
a relative reference, not an absolute one. This likely needs more
discussion, but using "i" in the way we do now is definitely dodgy.

> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/arm64/include/asm/jump_label.h | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
> > index cea441b6aa5d..f741bbacf175 100644
> > --- a/arch/arm64/include/asm/jump_label.h
> > +++ b/arch/arm64/include/asm/jump_label.h
> > @@ -23,9 +23,9 @@ static __always_inline bool arch_static_branch(struct static_key *key,
> >                  "      .pushsection    __jump_table, \"aw\"    \n\t"
> >                  "      .align          3                       \n\t"
> >                  "      .long           1b - ., %l[l_yes] - .   \n\t"
> > -                "      .quad           %c0 - .                 \n\t"
> > +                "      .quad           %c0 - . + %1            \n\t"
>
> If %1 is "i" constrained, then I think we can use the %c output
> template for it as well?
> https://gcc.gnu.org/onlinedocs/gccint/Output-Template.html
>

Is %c what prevents the leading # from being emitted? I'm not sure
whether that matters here, as AArch64 asm does not require it (and the
code builds fine with either compiler). But if it reflects the use
more precisely, I agree we should add it.

> Is the expression clearer if we keep the `- .` at the end of each expression?
>

I don't think so. The add sets the enabled bit, so I'd even be
inclined to write this as (%c0 - .) + %c1 to emphasize that this is a
relative reference with bit #0 set separately.

> >                  "      .popsection                             \n\t"
> > -                :  :  "i"(&((char *)key)[branch]) :  : l_yes);
> > +                :  :  "Si"(key), "i"(branch) :  : l_yes);
> >
> >         return false;
> >  l_yes:
> > @@ -40,9 +40,9 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
> >                  "      .pushsection    __jump_table, \"aw\"    \n\t"
> >                  "      .align          3                       \n\t"
> >                  "      .long           1b - ., %l[l_yes] - .   \n\t"
> > -                "      .quad           %c0 - .                 \n\t"
> > +                "      .quad           %c0 - . + %1            \n\t"
> >                  "      .popsection                             \n\t"
> > -                :  :  "i"(&((char *)key)[branch]) :  : l_yes);
> > +                :  :  "Si"(key), "i"(branch) :  : l_yes);
> >
> >         return false;
> >  l_yes:
> > --
> > 2.30.2
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers
Ard Biesheuvel April 28, 2022, 9:35 a.m. UTC | #3
On Wed, 27 Apr 2022 at 23:50, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Hi Nick,
>
> On Wed, 27 Apr 2022 at 20:59, Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > On Wed, Apr 27, 2022 at 10:13 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > In order to set bit #0 of the struct static_key pointer in the the jump
> > > label struct, we currently cast the pointer to char[], and take the
> > > address of either the 0th or 1st array member, depending on the value of
> > > 'branch'. This works but creates problems with -fpie code generation:
> > > GCC complains about the constraint being unsatisfiable, and Clang
> > > miscompiles the code in a way that causes stability issues (immediate
> > > panic on 'attempt to kill init')
> >
> > Any more info on the "miscompile?" Perhaps worth an upstream bug report?
> >
>
> I made very little progress trying to narrow it down: a segfault in
> user space caused by who knows what. I'd file a bug if I knew how to
> reproduce it.
>

... and now, I cannot even reproduce it anymore, so I'll drop this
mention altogether.

> > >
> > > So instead, pass the struct static_key reference and the 'branch'
> > > immediate individually, in a way that satisfies both GCC and Clang (GCC
> > > wants the 'S' constraint, whereas Clang wants the 'i' constraint for
> > > argument %0)
> >
> > Anything we can do to improve Clang's handling of S constraints?
> > https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
> > >> An absolute symbolic address or a label reference
> >
>
> "S" seems more appropriate than "i" for a symbol reference, and GCC
> rejects "i" when using -fpie. But the actual literal being emitted is
> a relative reference, not an absolute one. This likely needs more
> discussion, but using "i" in the way we do now is definitely dodgy.
>

I have tried to create a reproducer for this issue, but the code
below, which does essentially the same thing as jump_label.h, works
fine with Clang and GCC with or without -fpie.

extern struct s {
  struct { } m;
} ss;

static inline __attribute__((always_inline)) int inner(struct s *s) {
  asm goto("adrp xzr, %c0" : : "S"(&s->m) : : l);

  return 0;
l:return 1;
}

int outer(void) {
  return inner(&ss);
}

So what's tricky here is that arch_static_branch() [like inner()
above] does not refer to the global directly, which is either struct
static_key_true or struct static_key_false, but to its 'key' member,
which is the first member of either former type. However, Clang does
not seem to care in case of this example, but when building the
kernel, it errors out with

/home/ard/linux/arch/arm64/include/asm/jump_label.h:22:3: error:
invalid operand for inline asm constraint 'S'

Another thing I hate about this code is that it needs optimizations
enabled, or it won't compile.

> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  arch/arm64/include/asm/jump_label.h | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
> > > index cea441b6aa5d..f741bbacf175 100644
> > > --- a/arch/arm64/include/asm/jump_label.h
> > > +++ b/arch/arm64/include/asm/jump_label.h
> > > @@ -23,9 +23,9 @@ static __always_inline bool arch_static_branch(struct static_key *key,
> > >                  "      .pushsection    __jump_table, \"aw\"    \n\t"
> > >                  "      .align          3                       \n\t"
> > >                  "      .long           1b - ., %l[l_yes] - .   \n\t"
> > > -                "      .quad           %c0 - .                 \n\t"
> > > +                "      .quad           %c0 - . + %1            \n\t"
> >
> > If %1 is "i" constrained, then I think we can use the %c output
> > template for it as well?
> > https://gcc.gnu.org/onlinedocs/gccint/Output-Template.html
> >
>
> Is %c what prevents the leading # from being emitted? I'm not sure
> whether that matters here, as AArch64 asm does not require it (and the
> code builds fine with either compiler). But if it reflects the use
> more precisely, I agree we should add it.
>
> > Is the expression clearer if we keep the `- .` at the end of each expression?
> >
>
> I don't think so. The add sets the enabled bit, so I'd even be
> inclined to write this as (%c0 - .) + %c1 to emphasize that this is a
> relative reference with bit #0 set separately.
>
> > >                  "      .popsection                             \n\t"
> > > -                :  :  "i"(&((char *)key)[branch]) :  : l_yes);
> > > +                :  :  "Si"(key), "i"(branch) :  : l_yes);
> > >
> > >         return false;
> > >  l_yes:
> > > @@ -40,9 +40,9 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
> > >                  "      .pushsection    __jump_table, \"aw\"    \n\t"
> > >                  "      .align          3                       \n\t"
> > >                  "      .long           1b - ., %l[l_yes] - .   \n\t"
> > > -                "      .quad           %c0 - .                 \n\t"
> > > +                "      .quad           %c0 - . + %1            \n\t"
> > >                  "      .popsection                             \n\t"
> > > -                :  :  "i"(&((char *)key)[branch]) :  : l_yes);
> > > +                :  :  "Si"(key), "i"(branch) :  : l_yes);
> > >
> > >         return false;
> > >  l_yes:
> > > --
> > > 2.30.2
> > >
> >
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
Mark Rutland April 28, 2022, 9:51 a.m. UTC | #4
Hi Ard,

On Wed, Apr 27, 2022 at 07:12:40PM +0200, Ard Biesheuvel wrote:
> In order to set bit #0 of the struct static_key pointer in the the jump
> label struct

I think you mean jump_entry::key here?

> , we currently cast the pointer to char[], and take the
> address of either the 0th or 1st array member, depending on the value of
> 'branch'. This works but creates problems with -fpie code generation:
> GCC complains about the constraint being unsatisfiable, and Clang
> miscompiles the code in a way that causes stability issues (immediate
> panic on 'attempt to kill init')

I couldn't reproduce that stability issue locally playing with Clang 12.0.0 and
14.0.0 (and just applying patch 2 of this series atop v5.18-rc1). I built
defconfig and booted that under a QEMU HVF VM on an M1 Mac.

FWIW, I used the binaries from llvm.org and built with:

  // magic script to add the toolchains to my PATH
  usellvm 12.0.0 make LLVM=1 ARCH=arm64 defconfig 
  usellvm 12.0.0 make LLVM=1 ARCH=arm64 -j50 Image

... and QEMU isn't providing entropy to the guest, so it's possible that:

* This only goes wrong when randomizing VAs (maybe we get a redundant
  relocation, and corrupt the key offset?).

* This is specific to the LLVM binaries you're using.

* This is down to a combination of LLVM + binutils, if you're not building with
  LLVM=1?

  I had a go with Clang 12.0.0 and the kernel.org crosstool GCC 11.1.0
  release's binutils. I made the constraint "Si" but left the indexing logic,
  and that still worked fine.

> So instead, pass the struct static_key reference and the 'branch'
> immediate individually, in a way that satisfies both GCC and Clang (GCC
> wants the 'S' constraint, whereas Clang wants the 'i' constraint for
> argument %0)
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/include/asm/jump_label.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
> index cea441b6aa5d..f741bbacf175 100644
> --- a/arch/arm64/include/asm/jump_label.h
> +++ b/arch/arm64/include/asm/jump_label.h
> @@ -23,9 +23,9 @@ static __always_inline bool arch_static_branch(struct static_key *key,
>  		 "	.pushsection	__jump_table, \"aw\"	\n\t"
>  		 "	.align		3			\n\t"
>  		 "	.long		1b - ., %l[l_yes] - .	\n\t"
> -		 "	.quad		%c0 - .			\n\t"
> +		 "	.quad		%c0 - . + %1		\n\t"
>  		 "	.popsection				\n\t"
> -		 :  :  "i"(&((char *)key)[branch]) :  : l_yes);
> +		 :  :  "Si"(key), "i"(branch) :  : l_yes);

Nice! I like that this clearly separate the "set bit 0" portion out, and IMO
that's much clearer than the array indexing.

Thanks,
Mark.

>  
>  	return false;
>  l_yes:
> @@ -40,9 +40,9 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
>  		 "	.pushsection	__jump_table, \"aw\"	\n\t"
>  		 "	.align		3			\n\t"
>  		 "	.long		1b - ., %l[l_yes] - .	\n\t"
> -		 "	.quad		%c0 - .			\n\t"
> +		 "	.quad		%c0 - . + %1		\n\t"
>  		 "	.popsection				\n\t"
> -		 :  :  "i"(&((char *)key)[branch]) :  : l_yes);
> +		 :  :  "Si"(key), "i"(branch) :  : l_yes);
>  
>  	return false;
>  l_yes:
> -- 
> 2.30.2
>
Ard Biesheuvel April 28, 2022, 4:05 p.m. UTC | #5
On Thu, 28 Apr 2022 at 11:51, Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi Ard,
>
> On Wed, Apr 27, 2022 at 07:12:40PM +0200, Ard Biesheuvel wrote:
> > In order to set bit #0 of the struct static_key pointer in the the jump
> > label struct
>
> I think you mean jump_entry::key here?
>

Yes, which points to a struct static_key - I'll clarify this in v2.

> > , we currently cast the pointer to char[], and take the
> > address of either the 0th or 1st array member, depending on the value of
> > 'branch'. This works but creates problems with -fpie code generation:
> > GCC complains about the constraint being unsatisfiable, and Clang
> > miscompiles the code in a way that causes stability issues (immediate
> > panic on 'attempt to kill init')
>
> I couldn't reproduce that stability issue locally playing with Clang 12.0.0 and
> 14.0.0 (and just applying patch 2 of this series atop v5.18-rc1). I built
> defconfig and booted that under a QEMU HVF VM on an M1 Mac.
>
> FWIW, I used the binaries from llvm.org and built with:
>
>   // magic script to add the toolchains to my PATH
>   usellvm 12.0.0 make LLVM=1 ARCH=arm64 defconfig
>   usellvm 12.0.0 make LLVM=1 ARCH=arm64 -j50 Image
>
> ... and QEMU isn't providing entropy to the guest, so it's possible that:
>
> * This only goes wrong when randomizing VAs (maybe we get a redundant
>   relocation, and corrupt the key offset?).
>
> * This is specific to the LLVM binaries you're using.
>
> * This is down to a combination of LLVM + binutils, if you're not building with
>   LLVM=1?
>
>   I had a go with Clang 12.0.0 and the kernel.org crosstool GCC 11.1.0
>   release's binutils. I made the constraint "Si" but left the indexing logic,
>   and that still worked fine.
>

Yeah, as I reported in another email, I failed to reproduce this, and
I experienced some other issues yesterday due to the fact that llvm-nm
and clang/lld on my system were out of sync. So I think this was a
false positive.

> > So instead, pass the struct static_key reference and the 'branch'
> > immediate individually, in a way that satisfies both GCC and Clang (GCC
> > wants the 'S' constraint, whereas Clang wants the 'i' constraint for
> > argument %0)
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/arm64/include/asm/jump_label.h | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
> > index cea441b6aa5d..f741bbacf175 100644
> > --- a/arch/arm64/include/asm/jump_label.h
> > +++ b/arch/arm64/include/asm/jump_label.h
> > @@ -23,9 +23,9 @@ static __always_inline bool arch_static_branch(struct static_key *key,
> >                "      .pushsection    __jump_table, \"aw\"    \n\t"
> >                "      .align          3                       \n\t"
> >                "      .long           1b - ., %l[l_yes] - .   \n\t"
> > -              "      .quad           %c0 - .                 \n\t"
> > +              "      .quad           %c0 - . + %1            \n\t"
> >                "      .popsection                             \n\t"
> > -              :  :  "i"(&((char *)key)[branch]) :  : l_yes);
> > +              :  :  "Si"(key), "i"(branch) :  : l_yes);
>
> Nice! I like that this clearly separate the "set bit 0" portion out, and IMO
> that's much clearer than the array indexing.
>
> Thanks,
> Mark.
>
> >
> >       return false;
> >  l_yes:
> > @@ -40,9 +40,9 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
> >                "      .pushsection    __jump_table, \"aw\"    \n\t"
> >                "      .align          3                       \n\t"
> >                "      .long           1b - ., %l[l_yes] - .   \n\t"
> > -              "      .quad           %c0 - .                 \n\t"
> > +              "      .quad           %c0 - . + %1            \n\t"
> >                "      .popsection                             \n\t"
> > -              :  :  "i"(&((char *)key)[branch]) :  : l_yes);
> > +              :  :  "Si"(key), "i"(branch) :  : l_yes);
> >
> >       return false;
> >  l_yes:
> > --
> > 2.30.2
> >
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
index cea441b6aa5d..f741bbacf175 100644
--- a/arch/arm64/include/asm/jump_label.h
+++ b/arch/arm64/include/asm/jump_label.h
@@ -23,9 +23,9 @@  static __always_inline bool arch_static_branch(struct static_key *key,
 		 "	.pushsection	__jump_table, \"aw\"	\n\t"
 		 "	.align		3			\n\t"
 		 "	.long		1b - ., %l[l_yes] - .	\n\t"
-		 "	.quad		%c0 - .			\n\t"
+		 "	.quad		%c0 - . + %1		\n\t"
 		 "	.popsection				\n\t"
-		 :  :  "i"(&((char *)key)[branch]) :  : l_yes);
+		 :  :  "Si"(key), "i"(branch) :  : l_yes);
 
 	return false;
 l_yes:
@@ -40,9 +40,9 @@  static __always_inline bool arch_static_branch_jump(struct static_key *key,
 		 "	.pushsection	__jump_table, \"aw\"	\n\t"
 		 "	.align		3			\n\t"
 		 "	.long		1b - ., %l[l_yes] - .	\n\t"
-		 "	.quad		%c0 - .			\n\t"
+		 "	.quad		%c0 - . + %1		\n\t"
 		 "	.popsection				\n\t"
-		 :  :  "i"(&((char *)key)[branch]) :  : l_yes);
+		 :  :  "Si"(key), "i"(branch) :  : l_yes);
 
 	return false;
 l_yes: