diff mbox series

tcg/arm: Increase stack alignment for function generation

Message ID 20210901164446.2722007-2-rjones@redhat.com (mailing list archive)
State New, archived
Headers show
Series tcg/arm: Increase stack alignment for function generation | expand

Commit Message

Richard W.M. Jones Sept. 1, 2021, 4:44 p.m. UTC
This avoids the following assertion when the kernel initializes X.509
certificates:

[    7.315373] Loading compiled-in X.509 certificates
qemu-system-arm: ../tcg/tcg.c:3063: temp_allocate_frame: Assertion `align <= TCG_TARGET_STACK_ALIGN' failed.

Fixes: commit c1c091948ae
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1999878
Cc: qemu-stable@nongnu.org
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
---
 tcg/arm/tcg-target.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Peter Maydell Sept. 1, 2021, 6:18 p.m. UTC | #1
On Wed, 1 Sept 2021 at 18:01, Richard W.M. Jones <rjones@redhat.com> wrote:
>
> This avoids the following assertion when the kernel initializes X.509
> certificates:
>
> [    7.315373] Loading compiled-in X.509 certificates
> qemu-system-arm: ../tcg/tcg.c:3063: temp_allocate_frame: Assertion `align <= TCG_TARGET_STACK_ALIGN' failed.
>
> Fixes: commit c1c091948ae
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1999878
> Cc: qemu-stable@nongnu.org
> Tested-by: Richard W.M. Jones <rjones@redhat.com>
> Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
> ---
>  tcg/arm/tcg-target.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index d113b7f8db..09df3b39a1 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -115,7 +115,7 @@ extern bool use_neon_instructions;
>  #endif
>
>  /* used for function call generation */
> -#define TCG_TARGET_STACK_ALIGN         8
> +#define TCG_TARGET_STACK_ALIGN          16
>  #define TCG_TARGET_CALL_ALIGN_ARGS     1
>  #define TCG_TARGET_CALL_STACK_OFFSET   0

The 32-bit Arm procedure call standard only guarantees 8-alignment
of SP, not 16-alignment, so I suspect this is not the correct fix.

-- PMM
Richard W.M. Jones Sept. 1, 2021, 6:30 p.m. UTC | #2
On Wed, Sep 01, 2021 at 07:18:03PM +0100, Peter Maydell wrote:
> On Wed, 1 Sept 2021 at 18:01, Richard W.M. Jones <rjones@redhat.com> wrote:
> >
> > This avoids the following assertion when the kernel initializes X.509
> > certificates:
> >
> > [    7.315373] Loading compiled-in X.509 certificates
> > qemu-system-arm: ../tcg/tcg.c:3063: temp_allocate_frame: Assertion `align <= TCG_TARGET_STACK_ALIGN' failed.
> >
> > Fixes: commit c1c091948ae
> > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1999878
> > Cc: qemu-stable@nongnu.org
> > Tested-by: Richard W.M. Jones <rjones@redhat.com>
> > Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
> > ---
> >  tcg/arm/tcg-target.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> > index d113b7f8db..09df3b39a1 100644
> > --- a/tcg/arm/tcg-target.h
> > +++ b/tcg/arm/tcg-target.h
> > @@ -115,7 +115,7 @@ extern bool use_neon_instructions;
> >  #endif
> >
> >  /* used for function call generation */
> > -#define TCG_TARGET_STACK_ALIGN         8
> > +#define TCG_TARGET_STACK_ALIGN          16
> >  #define TCG_TARGET_CALL_ALIGN_ARGS     1
> >  #define TCG_TARGET_CALL_STACK_OFFSET   0
> 
> The 32-bit Arm procedure call standard only guarantees 8-alignment
> of SP, not 16-alignment, so I suspect this is not the correct fix.

Wouldn't it be a good idea if asserts in TCG dumped out something
useful about the guest code?  Because I can only reproduce this bug in
a very awkward batch environment I need to collect as much information
from log messages as possible.

Rich.
Peter Maydell Sept. 1, 2021, 6:41 p.m. UTC | #3
On Wed, 1 Sept 2021 at 19:30, Richard W.M. Jones <rjones@redhat.com> wrote:
>
> On Wed, Sep 01, 2021 at 07:18:03PM +0100, Peter Maydell wrote:
> > On Wed, 1 Sept 2021 at 18:01, Richard W.M. Jones <rjones@redhat.com> wrote:
> > >
> > > This avoids the following assertion when the kernel initializes X.509
> > > certificates:
> > >
> > > [    7.315373] Loading compiled-in X.509 certificates
> > > qemu-system-arm: ../tcg/tcg.c:3063: temp_allocate_frame: Assertion `align <= TCG_TARGET_STACK_ALIGN' failed.
> > >
> > > Fixes: commit c1c091948ae
> > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1999878
> > > Cc: qemu-stable@nongnu.org
> > > Tested-by: Richard W.M. Jones <rjones@redhat.com>
> > > Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
> > > ---
> > >  tcg/arm/tcg-target.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> > > index d113b7f8db..09df3b39a1 100644
> > > --- a/tcg/arm/tcg-target.h
> > > +++ b/tcg/arm/tcg-target.h
> > > @@ -115,7 +115,7 @@ extern bool use_neon_instructions;
> > >  #endif
> > >
> > >  /* used for function call generation */
> > > -#define TCG_TARGET_STACK_ALIGN         8
> > > +#define TCG_TARGET_STACK_ALIGN          16
> > >  #define TCG_TARGET_CALL_ALIGN_ARGS     1
> > >  #define TCG_TARGET_CALL_STACK_OFFSET   0
> >
> > The 32-bit Arm procedure call standard only guarantees 8-alignment
> > of SP, not 16-alignment, so I suspect this is not the correct fix.
>
> Wouldn't it be a good idea if asserts in TCG dumped out something
> useful about the guest code?  Because I can only reproduce this bug in
> a very awkward batch environment I need to collect as much information
> from log messages as possible.

Is the failure case short enough to allow -d ... logging to
be taken? That's usually the most useful info, but it's so huge
it's often not feasible.

Anyway, the problem here is that the arm backend now supports
Neon, and it will try to do some operations on 128 bit types,
where at least the loads and stores require 16-alignment. But
nothing is enforcing that we have 16 alignment. (Without the assert
we'd probably blunder on and fault on an unaligned access.)

Rereading the TCG code, maybe your patch here is right. It's
not clear to me whether TCG_TARGET_STACK_ALIGN is intended
to specify "alignment the calling convention requires" (though
that's what the comment above it suggests) or "alignment that
TCG requires". The prologue does seem to actively align to the
specified value, not merely assume-and-preserve that alignment.

-- PMM
Richard W.M. Jones Sept. 1, 2021, 6:51 p.m. UTC | #4
On Wed, Sep 01, 2021 at 07:41:21PM +0100, Peter Maydell wrote:
> Is the failure case short enough to allow -d ... logging to
> be taken? That's usually the most useful info, but it's so huge
> it's often not feasible.

I can try -- what exact -d option would be useful?

Rich.
Peter Maydell Sept. 1, 2021, 8:17 p.m. UTC | #5
On Wed, 1 Sept 2021 at 19:51, Richard W.M. Jones <rjones@redhat.com> wrote:
>
> On Wed, Sep 01, 2021 at 07:41:21PM +0100, Peter Maydell wrote:
> > Is the failure case short enough to allow -d ... logging to
> > be taken? That's usually the most useful info, but it's so huge
> > it's often not feasible.
>
> I can try -- what exact -d option would be useful?

Depends what you're after. Personally I'm fairly sure I know
what's going on, I'm just not sure what the right fix is.

-- PMM
Richard W.M. Jones Sept. 1, 2021, 8:24 p.m. UTC | #6
On Wed, Sep 01, 2021 at 09:17:07PM +0100, Peter Maydell wrote:
> On Wed, 1 Sept 2021 at 19:51, Richard W.M. Jones <rjones@redhat.com> wrote:
> >
> > On Wed, Sep 01, 2021 at 07:41:21PM +0100, Peter Maydell wrote:
> > > Is the failure case short enough to allow -d ... logging to
> > > be taken? That's usually the most useful info, but it's so huge
> > > it's often not feasible.
> >
> > I can try -- what exact -d option would be useful?
> 
> Depends what you're after. Personally I'm fairly sure I know
> what's going on, I'm just not sure what the right fix is.

Another question: We couldn't reproduce this even with the identical
ARM guest kernel + initrd + command line using qemu-system-arm
compiled for x86-64 host.  This was a bit surprising!  Was that bad
luck or is there some reason why this bug might not be reproducible
except on armv7 host?  (Both cases use -machine accel=tcg).

Rich.
Peter Maydell Sept. 2, 2021, 7:36 a.m. UTC | #7
On Wed, 1 Sept 2021 at 21:24, Richard W.M. Jones <rjones@redhat.com> wrote:
>
> On Wed, Sep 01, 2021 at 09:17:07PM +0100, Peter Maydell wrote:
> > On Wed, 1 Sept 2021 at 19:51, Richard W.M. Jones <rjones@redhat.com> wrote:
> > >
> > > On Wed, Sep 01, 2021 at 07:41:21PM +0100, Peter Maydell wrote:
> > > > Is the failure case short enough to allow -d ... logging to
> > > > be taken? That's usually the most useful info, but it's so huge
> > > > it's often not feasible.
> > >
> > > I can try -- what exact -d option would be useful?
> >
> > Depends what you're after. Personally I'm fairly sure I know
> > what's going on, I'm just not sure what the right fix is.
>
> Another question: We couldn't reproduce this even with the identical
> ARM guest kernel + initrd + command line using qemu-system-arm
> compiled for x86-64 host.  This was a bit surprising!  Was that bad
> luck or is there some reason why this bug might not be reproducible
> except on armv7 host?  (Both cases use -machine accel=tcg).

That's expected -- this is a bug in the codegen for arm hosts
(specifically 32-bit arm where Neon is available). tcg/i386/
sets TCG_TARGET_STACK_ALIGN to 16, so it won't hit the assert.

Yesterday I wrote:
> The prologue does seem to actively align to the
> specified value, not merely assume-and-preserve that alignment.

but I was misreading the code -- it does just assume-and-preserve.

Do you need an urgent fix/workaround for this? The simplest thing
is to wait for RTH to look at this, which is not likely to be before
the 13th.

Otherwise I think you can work around it with:

--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -152,7 +152,7 @@ extern bool use_neon_instructions;
 #define TCG_TARGET_HAS_qemu_st8_i32     0

 #define TCG_TARGET_HAS_v64              use_neon_instructions
-#define TCG_TARGET_HAS_v128             use_neon_instructions
+#define TCG_TARGET_HAS_v128             0
 #define TCG_TARGET_HAS_v256             0

 #define TCG_TARGET_HAS_andc_vec         1

though this is just a bodge that (hopefully) turns the use of v128
off entirely.

-- PMM
Richard W.M. Jones Sept. 2, 2021, 8:06 a.m. UTC | #8
On Thu, Sep 02, 2021 at 08:36:56AM +0100, Peter Maydell wrote:
> On Wed, 1 Sept 2021 at 21:24, Richard W.M. Jones <rjones@redhat.com> wrote:
> >
> > On Wed, Sep 01, 2021 at 09:17:07PM +0100, Peter Maydell wrote:
> > > On Wed, 1 Sept 2021 at 19:51, Richard W.M. Jones <rjones@redhat.com> wrote:
> > > >
> > > > On Wed, Sep 01, 2021 at 07:41:21PM +0100, Peter Maydell wrote:
> > > > > Is the failure case short enough to allow -d ... logging to
> > > > > be taken? That's usually the most useful info, but it's so huge
> > > > > it's often not feasible.
> > > >
> > > > I can try -- what exact -d option would be useful?
> > >
> > > Depends what you're after. Personally I'm fairly sure I know
> > > what's going on, I'm just not sure what the right fix is.
> >
> > Another question: We couldn't reproduce this even with the identical
> > ARM guest kernel + initrd + command line using qemu-system-arm
> > compiled for x86-64 host.  This was a bit surprising!  Was that bad
> > luck or is there some reason why this bug might not be reproducible
> > except on armv7 host?  (Both cases use -machine accel=tcg).
> 
> That's expected -- this is a bug in the codegen for arm hosts
> (specifically 32-bit arm where Neon is available). tcg/i386/
> sets TCG_TARGET_STACK_ALIGN to 16, so it won't hit the assert.
> 
> Yesterday I wrote:
> > The prologue does seem to actively align to the
> > specified value, not merely assume-and-preserve that alignment.
> 
> but I was misreading the code -- it does just assume-and-preserve.
> 
> Do you need an urgent fix/workaround for this? The simplest thing
> is to wait for RTH to look at this, which is not likely to be before
> the 13th.

We can wait for Richard to take a look.

Thanks,

Rich.

> Otherwise I think you can work around it with:
> 
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -152,7 +152,7 @@ extern bool use_neon_instructions;
>  #define TCG_TARGET_HAS_qemu_st8_i32     0
> 
>  #define TCG_TARGET_HAS_v64              use_neon_instructions
> -#define TCG_TARGET_HAS_v128             use_neon_instructions
> +#define TCG_TARGET_HAS_v128             0
>  #define TCG_TARGET_HAS_v256             0
> 
>  #define TCG_TARGET_HAS_andc_vec         1
> 
> though this is just a bodge that (hopefully) turns the use of v128
> off entirely.
> 
> -- PMM
Richard Henderson Sept. 3, 2021, 1:33 p.m. UTC | #9
On 9/1/21 8:41 PM, Peter Maydell wrote:
> On Wed, 1 Sept 2021 at 19:30, Richard W.M. Jones <rjones@redhat.com> wrote:
>>
>> On Wed, Sep 01, 2021 at 07:18:03PM +0100, Peter Maydell wrote:
>>> On Wed, 1 Sept 2021 at 18:01, Richard W.M. Jones <rjones@redhat.com> wrote:
>>>>
>>>> This avoids the following assertion when the kernel initializes X.509
>>>> certificates:
>>>>
>>>> [    7.315373] Loading compiled-in X.509 certificates
>>>> qemu-system-arm: ../tcg/tcg.c:3063: temp_allocate_frame: Assertion `align <= TCG_TARGET_STACK_ALIGN' failed.
>>>>
>>>> Fixes: commit c1c091948ae
>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1999878
>>>> Cc: qemu-stable@nongnu.org
>>>> Tested-by: Richard W.M. Jones <rjones@redhat.com>
>>>> Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
>>>> ---
>>>>   tcg/arm/tcg-target.h | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
>>>> index d113b7f8db..09df3b39a1 100644
>>>> --- a/tcg/arm/tcg-target.h
>>>> +++ b/tcg/arm/tcg-target.h
>>>> @@ -115,7 +115,7 @@ extern bool use_neon_instructions;
>>>>   #endif
>>>>
>>>>   /* used for function call generation */
>>>> -#define TCG_TARGET_STACK_ALIGN         8
>>>> +#define TCG_TARGET_STACK_ALIGN          16
>>>>   #define TCG_TARGET_CALL_ALIGN_ARGS     1
>>>>   #define TCG_TARGET_CALL_STACK_OFFSET   0
>>>
>>> The 32-bit Arm procedure call standard only guarantees 8-alignment
>>> of SP, not 16-alignment, so I suspect this is not the correct fix.
>>
>> Wouldn't it be a good idea if asserts in TCG dumped out something
>> useful about the guest code?  Because I can only reproduce this bug in
>> a very awkward batch environment I need to collect as much information
>> from log messages as possible.
> 
> Is the failure case short enough to allow -d ... logging to
> be taken? That's usually the most useful info, but it's so huge
> it's often not feasible.
> 
> Anyway, the problem here is that the arm backend now supports
> Neon, and it will try to do some operations on 128 bit types,
> where at least the loads and stores require 16-alignment. But
> nothing is enforcing that we have 16 alignment. (Without the assert
> we'd probably blunder on and fault on an unaligned access.)

Correct.

And while we could actively align the stack, that makes the prologue and epilogue overly 
complicated, and we can't just use pop {reg-list}.

My preferred solution is to add another per-backend define that says what the required 
alignment for vector stack spills, and then *don't* add the :128 bit alignment specifier 
to the vector load/store insns we emit.

I'll wait til I get home in 10 days or so before I tackle this.


r~
diff mbox series

Patch

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index d113b7f8db..09df3b39a1 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -115,7 +115,7 @@  extern bool use_neon_instructions;
 #endif
 
 /* used for function call generation */
-#define TCG_TARGET_STACK_ALIGN		8
+#define TCG_TARGET_STACK_ALIGN          16
 #define TCG_TARGET_CALL_ALIGN_ARGS	1
 #define TCG_TARGET_CALL_STACK_OFFSET	0