diff mbox series

[for-4.0,2/3] target/ppc: Enable "decrement and test CTR" version of bcctr

Message ID 155327782604.1283071.10640596307206921951.stgit@bahia.lan (mailing list archive)
State New, archived
Headers show
Series target/ppc: Fix pseries.cap-ibs=workaround with TCG | expand

Commit Message

Greg Kurz March 22, 2019, 6:03 p.m. UTC
Even if all ISAs up to v3 indeed mention:

    If the "decrement and test CTR" option is specified (BO2=0), the
    instruction form is invalid.

The UMs of all existing 64-bit server class processors say:

    If BO[2] = 0, the contents of CTR (before any update) are used as the
    target address and for the test of the contents of CTR to resolve the
    branch. The contents of the CTR are then decremented and written back
    to the CTR.

The linux kernel has spectre v2 mitigation code that relies on a
BO[2] = 0 variant of bcctr, which is now activated by default on
spapr, even with TCG. This causes linux guests to panic with
the default machine type under TCG.

Since any CPU model can provide its own behaviour for invalid forms,
we could possibly introduce a new instruction flag to handle this.
In practice, since the behaviour is shared by all 64-bit server
processors starting with 970 up to POWER9, let's reuse the
PPC_SEGMENT_64B flag. Caveat: this may have to be fixed later if
POWER10 introduces a different behaviour.

The existing behaviour of throwing a program interrupt is kept for
all other CPU models.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 target/ppc/translate.c |   52 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 37 insertions(+), 15 deletions(-)

Comments

Suraj Jitindar Singh March 25, 2019, 12:38 a.m. UTC | #1
On Fri, 2019-03-22 at 19:03 +0100, Greg Kurz wrote:
> Even if all ISAs up to v3 indeed mention:
> 
>     If the "decrement and test CTR" option is specified (BO2=0), the
>     instruction form is invalid.
> 
> The UMs of all existing 64-bit server class processors say:
> 
>     If BO[2] = 0, the contents of CTR (before any update) are used as
> the
>     target address and for the test of the contents of CTR to resolve
> the
>     branch. The contents of the CTR are then decremented and written
> back
>     to the CTR.
> 
> The linux kernel has spectre v2 mitigation code that relies on a
> BO[2] = 0 variant of bcctr, which is now activated by default on
> spapr, even with TCG. This causes linux guests to panic with
> the default machine type under TCG.
> 
> Since any CPU model can provide its own behaviour for invalid forms,
> we could possibly introduce a new instruction flag to handle this.
> In practice, since the behaviour is shared by all 64-bit server
> processors starting with 970 up to POWER9, let's reuse the
> PPC_SEGMENT_64B flag. Caveat: this may have to be fixed later if
> POWER10 introduces a different behaviour.
> 
> The existing behaviour of throwing a program interrupt is kept for
> all other CPU models.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>

Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>

> ---
>  target/ppc/translate.c |   52 ++++++++++++++++++++++++++++++++++--
> ------------
>  1 file changed, 37 insertions(+), 15 deletions(-)
> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index aaafa3a715d8..d3aaa6482c6a 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -3747,22 +3747,44 @@ static void gen_bcond(DisasContext *ctx, int
> type)
>      if ((bo & 0x4) == 0) {
>          /* Decrement and test CTR */
>          TCGv temp = tcg_temp_new();
> -        if (unlikely(type == BCOND_CTR)) {
> -            gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
> -            tcg_temp_free(temp);
> -            tcg_temp_free(target);
> -            return;
> -        }
> -        tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1);
> -        if (NARROW_MODE(ctx)) {
> -            tcg_gen_ext32u_tl(temp, cpu_ctr);
> -        } else {
> -            tcg_gen_mov_tl(temp, cpu_ctr);
> -        }
> -        if (bo & 0x2) {
> -            tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1);
> +
> +        if (type == BCOND_CTR) {
> +            /*
> +             * All ISAs up to v3 describe this form of bcctr as
> invalid but
> +             * some processors, ie. 64-bit server processors
> compliant with
> +             * arch 2.x, do implement a "test and decrement" logic
> instead,
> +             * as described in their respective UMs.
> +             */
> +            if (unlikely(!(ctx->insns_flags & PPC_SEGMENT_64B))) {
> +                gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
> +                tcg_temp_free(temp);
> +                tcg_temp_free(target);
> +                return;
> +            }
> +
> +            if (NARROW_MODE(ctx)) {
> +                tcg_gen_ext32u_tl(temp, cpu_ctr);
> +            } else {
> +                tcg_gen_mov_tl(temp, cpu_ctr);
> +            }
> +            if (bo & 0x2) {
> +                tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1);
> +            } else {
> +                tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1);
> +            }
> +            tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1);
>          } else {
> -            tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1);
> +            tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1);
> +            if (NARROW_MODE(ctx)) {
> +                tcg_gen_ext32u_tl(temp, cpu_ctr);
> +            } else {
> +                tcg_gen_mov_tl(temp, cpu_ctr);
> +            }
> +            if (bo & 0x2) {
> +                tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1);
> +            } else {
> +                tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1);
> +            }
>          }
>          tcg_temp_free(temp);
>      }
> 
>
David Gibson March 25, 2019, 6:38 a.m. UTC | #2
On Fri, Mar 22, 2019 at 07:03:46PM +0100, Greg Kurz wrote:
> Even if all ISAs up to v3 indeed mention:
> 
>     If the "decrement and test CTR" option is specified (BO2=0), the
>     instruction form is invalid.
> 
> The UMs of all existing 64-bit server class processors say:

I've applied this series because it fixes an immediate problem, but I
have some significant reservations about it, read on..

>     If BO[2] = 0, the contents of CTR (before any update) are used as the
>     target address and for the test of the contents of CTR to resolve the
>     branch. The contents of the CTR are then decremented and written back
>     to the CTR.

So, if that's what the hardware does, I guess that's what we need to
do.  That behaviour seems totally bizarre though - how can it make
sense for the same register value to act as both the branch target and
a flag/counter?  Or am I misreading something?

> The linux kernel has spectre v2 mitigation code that relies on a
> BO[2] = 0 variant of bcctr, which is now activated by default on
> spapr, even with TCG. This causes linux guests to panic with
> the default machine type under TCG.
> 
> Since any CPU model can provide its own behaviour for invalid forms,
> we could possibly introduce a new instruction flag to handle this.
> In practice, since the behaviour is shared by all 64-bit server
> processors starting with 970 up to POWER9, let's reuse the
> PPC_SEGMENT_64B flag. Caveat: this may have to be fixed later if
> POWER10 introduces a different behaviour.

Yeah.. this makes me nervous.  It's going to be very non-obvious that
a flag about MMU behaviour is linked to an obscure conditional branch
behaviour, so I suspect the chances of forgetting to fix that later if
necessary are close to 100%.

> The existing behaviour of throwing a program interrupt is kept for
> all other CPU models.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  target/ppc/translate.c |   52 ++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 37 insertions(+), 15 deletions(-)
> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index aaafa3a715d8..d3aaa6482c6a 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -3747,22 +3747,44 @@ static void gen_bcond(DisasContext *ctx, int type)
>      if ((bo & 0x4) == 0) {
>          /* Decrement and test CTR */
>          TCGv temp = tcg_temp_new();
> -        if (unlikely(type == BCOND_CTR)) {
> -            gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
> -            tcg_temp_free(temp);
> -            tcg_temp_free(target);
> -            return;
> -        }
> -        tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1);
> -        if (NARROW_MODE(ctx)) {
> -            tcg_gen_ext32u_tl(temp, cpu_ctr);
> -        } else {
> -            tcg_gen_mov_tl(temp, cpu_ctr);
> -        }
> -        if (bo & 0x2) {
> -            tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1);
> +
> +        if (type == BCOND_CTR) {
> +            /*
> +             * All ISAs up to v3 describe this form of bcctr as invalid but
> +             * some processors, ie. 64-bit server processors compliant with
> +             * arch 2.x, do implement a "test and decrement" logic instead,
> +             * as described in their respective UMs.
> +             */
> +            if (unlikely(!(ctx->insns_flags & PPC_SEGMENT_64B))) {
> +                gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
> +                tcg_temp_free(temp);
> +                tcg_temp_free(target);
> +                return;
> +            }
> +
> +            if (NARROW_MODE(ctx)) {
> +                tcg_gen_ext32u_tl(temp, cpu_ctr);
> +            } else {
> +                tcg_gen_mov_tl(temp, cpu_ctr);
> +            }
> +            if (bo & 0x2) {
> +                tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1);
> +            } else {
> +                tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1);
> +            }
> +            tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1);
>          } else {
> -            tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1);
> +            tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1);
> +            if (NARROW_MODE(ctx)) {
> +                tcg_gen_ext32u_tl(temp, cpu_ctr);
> +            } else {
> +                tcg_gen_mov_tl(temp, cpu_ctr);
> +            }
> +            if (bo & 0x2) {
> +                tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1);
> +            } else {
> +                tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1);
> +            }
>          }
>          tcg_temp_free(temp);
>      }
>
Greg Kurz March 25, 2019, 11:25 a.m. UTC | #3
On Mon, 25 Mar 2019 17:38:49 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Mar 22, 2019 at 07:03:46PM +0100, Greg Kurz wrote:
> > Even if all ISAs up to v3 indeed mention:
> > 
> >     If the "decrement and test CTR" option is specified (BO2=0), the
> >     instruction form is invalid.
> > 
> > The UMs of all existing 64-bit server class processors say:  
> 
> I've applied this series because it fixes an immediate problem, but I
> have some significant reservations about it, read on..
> 
> >     If BO[2] = 0, the contents of CTR (before any update) are used as the
> >     target address and for the test of the contents of CTR to resolve the
> >     branch. The contents of the CTR are then decremented and written back
> >     to the CTR.  
> 
> So, if that's what the hardware does, I guess that's what we need to
> do.  That behaviour seems totally bizarre though - how can it make
> sense for the same register value to act as both the branch target and
> a flag/counter?  Or am I misreading something?
> 

I must confess this puzzles me as well :-\ ... and the linux commit that
introduces the use of this opcode, ie. ee13cb249fab ("powerpc/64s: Add
support for software count cache flush") doesn't provide any details,
at least for someone ignorant like me :)

Maybe Michael can enlight us ?

> > The linux kernel has spectre v2 mitigation code that relies on a
> > BO[2] = 0 variant of bcctr, which is now activated by default on
> > spapr, even with TCG. This causes linux guests to panic with
> > the default machine type under TCG.
> > 
> > Since any CPU model can provide its own behaviour for invalid forms,
> > we could possibly introduce a new instruction flag to handle this.
> > In practice, since the behaviour is shared by all 64-bit server
> > processors starting with 970 up to POWER9, let's reuse the
> > PPC_SEGMENT_64B flag. Caveat: this may have to be fixed later if
> > POWER10 introduces a different behaviour.  
> 
> Yeah.. this makes me nervous.  It's going to be very non-obvious that
> a flag about MMU behaviour is linked to an obscure conditional branch
> behaviour, so I suspect the chances of forgetting to fix that later if
> necessary are close to 100%.
> 
> > The existing behaviour of throwing a program interrupt is kept for
> > all other CPU models.
> > 
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >  target/ppc/translate.c |   52 ++++++++++++++++++++++++++++++++++--------------
> >  1 file changed, 37 insertions(+), 15 deletions(-)
> > 
> > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > index aaafa3a715d8..d3aaa6482c6a 100644
> > --- a/target/ppc/translate.c
> > +++ b/target/ppc/translate.c
> > @@ -3747,22 +3747,44 @@ static void gen_bcond(DisasContext *ctx, int type)
> >      if ((bo & 0x4) == 0) {
> >          /* Decrement and test CTR */
> >          TCGv temp = tcg_temp_new();
> > -        if (unlikely(type == BCOND_CTR)) {
> > -            gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
> > -            tcg_temp_free(temp);
> > -            tcg_temp_free(target);
> > -            return;
> > -        }
> > -        tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1);
> > -        if (NARROW_MODE(ctx)) {
> > -            tcg_gen_ext32u_tl(temp, cpu_ctr);
> > -        } else {
> > -            tcg_gen_mov_tl(temp, cpu_ctr);
> > -        }
> > -        if (bo & 0x2) {
> > -            tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1);
> > +
> > +        if (type == BCOND_CTR) {
> > +            /*
> > +             * All ISAs up to v3 describe this form of bcctr as invalid but
> > +             * some processors, ie. 64-bit server processors compliant with
> > +             * arch 2.x, do implement a "test and decrement" logic instead,
> > +             * as described in their respective UMs.
> > +             */
> > +            if (unlikely(!(ctx->insns_flags & PPC_SEGMENT_64B))) {
> > +                gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
> > +                tcg_temp_free(temp);
> > +                tcg_temp_free(target);
> > +                return;
> > +            }
> > +
> > +            if (NARROW_MODE(ctx)) {
> > +                tcg_gen_ext32u_tl(temp, cpu_ctr);
> > +            } else {
> > +                tcg_gen_mov_tl(temp, cpu_ctr);
> > +            }
> > +            if (bo & 0x2) {
> > +                tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1);
> > +            } else {
> > +                tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1);
> > +            }
> > +            tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1);
> >          } else {
> > -            tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1);
> > +            tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1);
> > +            if (NARROW_MODE(ctx)) {
> > +                tcg_gen_ext32u_tl(temp, cpu_ctr);
> > +            } else {
> > +                tcg_gen_mov_tl(temp, cpu_ctr);
> > +            }
> > +            if (bo & 0x2) {
> > +                tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1);
> > +            } else {
> > +                tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1);
> > +            }
> >          }
> >          tcg_temp_free(temp);
> >      }
> >   
>
David Gibson March 26, 2019, 3:57 a.m. UTC | #4
On Mon, Mar 25, 2019 at 12:25:15PM +0100, Greg Kurz wrote:
> On Mon, 25 Mar 2019 17:38:49 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Fri, Mar 22, 2019 at 07:03:46PM +0100, Greg Kurz wrote:
> > > Even if all ISAs up to v3 indeed mention:
> > > 
> > >     If the "decrement and test CTR" option is specified (BO2=0), the
> > >     instruction form is invalid.
> > > 
> > > The UMs of all existing 64-bit server class processors say:  
> > 
> > I've applied this series because it fixes an immediate problem, but I
> > have some significant reservations about it, read on..
> > 
> > >     If BO[2] = 0, the contents of CTR (before any update) are used as the
> > >     target address and for the test of the contents of CTR to resolve the
> > >     branch. The contents of the CTR are then decremented and written back
> > >     to the CTR.  
> > 
> > So, if that's what the hardware does, I guess that's what we need to
> > do.  That behaviour seems totally bizarre though - how can it make
> > sense for the same register value to act as both the branch target and
> > a flag/counter?  Or am I misreading something?
> > 
> 
> I must confess this puzzles me as well :-\ ... and the linux commit that
> introduces the use of this opcode, ie. ee13cb249fab ("powerpc/64s: Add
> support for software count cache flush") doesn't provide any details,
> at least for someone ignorant like me :)

Oh... I suspect what's going on here is not actually that the branch
has that effect, but that they've re-used this nonsensical option
combination to mark the special cache count flush they need.  In which
case we'd need to not trap, but do something else in the TCG code.

> Maybe Michael can enlight us ?

Michael or Suraj..?
David Gibson March 26, 2019, 4:17 a.m. UTC | #5
On Mon, Mar 25, 2019 at 12:25:15PM +0100, Greg Kurz wrote:
> On Mon, 25 Mar 2019 17:38:49 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Fri, Mar 22, 2019 at 07:03:46PM +0100, Greg Kurz wrote:
> > > Even if all ISAs up to v3 indeed mention:
> > > 
> > >     If the "decrement and test CTR" option is specified (BO2=0), the
> > >     instruction form is invalid.
> > > 
> > > The UMs of all existing 64-bit server class processors say:  
> > 
> > I've applied this series because it fixes an immediate problem, but I
> > have some significant reservations about it, read on..
> > 
> > >     If BO[2] = 0, the contents of CTR (before any update) are used as the
> > >     target address and for the test of the contents of CTR to resolve the
> > >     branch. The contents of the CTR are then decremented and written back
> > >     to the CTR.  
> > 
> > So, if that's what the hardware does, I guess that's what we need to
> > do.  That behaviour seems totally bizarre though - how can it make
> > sense for the same register value to act as both the branch target and
> > a flag/counter?  Or am I misreading something?
> > 
> 
> I must confess this puzzles me as well :-\ ... and the linux commit that
> introduces the use of this opcode, ie. ee13cb249fab ("powerpc/64s: Add
> support for software count cache flush") doesn't provide any details,
> at least for someone ignorant like me :)
> 
> Maybe Michael can enlight us ?

Ok, I talked to Michael on Slack.  Here's my understanding:

- The behaviour as described in the UM is correct, although bizarre
  and basically useless

- I'm guessing the original reason for not making it an explicit
  invalid form is that the strange behaviour arises naturally out of
  orthogonality, and this avoided having an explicit check to trap on
  this form

- Because this instruction form is otherwise useless and never used,
  it was chosen to trigger the extra micro-architectural side-effect
  needed for the Spectre workaround

- The workaround code in the kernel isn't actually *using* the
  bizarre architected behaviour - the arguments and context are chosen
  to make it basically a no-op (w.r.t. architected state).  The kernel
  only cares about the micro-architectural side effect

So, I think the patches as applied are correct as they are.  However,
adding some comments saying how this situation arose would be a very
good idea.
diff mbox series

Patch

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index aaafa3a715d8..d3aaa6482c6a 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3747,22 +3747,44 @@  static void gen_bcond(DisasContext *ctx, int type)
     if ((bo & 0x4) == 0) {
         /* Decrement and test CTR */
         TCGv temp = tcg_temp_new();
-        if (unlikely(type == BCOND_CTR)) {
-            gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
-            tcg_temp_free(temp);
-            tcg_temp_free(target);
-            return;
-        }
-        tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1);
-        if (NARROW_MODE(ctx)) {
-            tcg_gen_ext32u_tl(temp, cpu_ctr);
-        } else {
-            tcg_gen_mov_tl(temp, cpu_ctr);
-        }
-        if (bo & 0x2) {
-            tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1);
+
+        if (type == BCOND_CTR) {
+            /*
+             * All ISAs up to v3 describe this form of bcctr as invalid but
+             * some processors, ie. 64-bit server processors compliant with
+             * arch 2.x, do implement a "test and decrement" logic instead,
+             * as described in their respective UMs.
+             */
+            if (unlikely(!(ctx->insns_flags & PPC_SEGMENT_64B))) {
+                gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
+                tcg_temp_free(temp);
+                tcg_temp_free(target);
+                return;
+            }
+
+            if (NARROW_MODE(ctx)) {
+                tcg_gen_ext32u_tl(temp, cpu_ctr);
+            } else {
+                tcg_gen_mov_tl(temp, cpu_ctr);
+            }
+            if (bo & 0x2) {
+                tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1);
+            } else {
+                tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1);
+            }
+            tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1);
         } else {
-            tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1);
+            tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1);
+            if (NARROW_MODE(ctx)) {
+                tcg_gen_ext32u_tl(temp, cpu_ctr);
+            } else {
+                tcg_gen_mov_tl(temp, cpu_ctr);
+            }
+            if (bo & 0x2) {
+                tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1);
+            } else {
+                tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1);
+            }
         }
         tcg_temp_free(temp);
     }