Message ID | 155327782604.1283071.10640596307206921951.stgit@bahia.lan (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | target/ppc: Fix pseries.cap-ibs=workaround with TCG | expand |
On Fri, 2019-03-22 at 19:03 +0100, Greg Kurz wrote: > Even if all ISAs up to v3 indeed mention: > > If the "decrement and test CTR" option is specified (BO2=0), the > instruction form is invalid. > > The UMs of all existing 64-bit server class processors say: > > If BO[2] = 0, the contents of CTR (before any update) are used as > the > target address and for the test of the contents of CTR to resolve > the > branch. The contents of the CTR are then decremented and written > back > to the CTR. > > The linux kernel has spectre v2 mitigation code that relies on a > BO[2] = 0 variant of bcctr, which is now activated by default on > spapr, even with TCG. This causes linux guests to panic with > the default machine type under TCG. > > Since any CPU model can provide its own behaviour for invalid forms, > we could possibly introduce a new instruction flag to handle this. > In practice, since the behaviour is shared by all 64-bit server > processors starting with 970 up to POWER9, let's reuse the > PPC_SEGMENT_64B flag. Caveat: this may have to be fixed later if > POWER10 introduces a different behaviour. > > The existing behaviour of throwing a program interrupt is kept for > all other CPU models. > > Signed-off-by: Greg Kurz <groug@kaod.org> Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> > --- > target/ppc/translate.c | 52 ++++++++++++++++++++++++++++++++++-- > ------------ > 1 file changed, 37 insertions(+), 15 deletions(-) > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c > index aaafa3a715d8..d3aaa6482c6a 100644 > --- a/target/ppc/translate.c > +++ b/target/ppc/translate.c > @@ -3747,22 +3747,44 @@ static void gen_bcond(DisasContext *ctx, int > type) > if ((bo & 0x4) == 0) { > /* Decrement and test CTR */ > TCGv temp = tcg_temp_new(); > - if (unlikely(type == BCOND_CTR)) { > - gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL); > - tcg_temp_free(temp); > - tcg_temp_free(target); > - return; > - } > - tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1); > - if (NARROW_MODE(ctx)) { > - tcg_gen_ext32u_tl(temp, cpu_ctr); > - } else { > - tcg_gen_mov_tl(temp, cpu_ctr); > - } > - if (bo & 0x2) { > - tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1); > + > + if (type == BCOND_CTR) { > + /* > + * All ISAs up to v3 describe this form of bcctr as > invalid but > + * some processors, ie. 64-bit server processors > compliant with > + * arch 2.x, do implement a "test and decrement" logic > instead, > + * as described in their respective UMs. > + */ > + if (unlikely(!(ctx->insns_flags & PPC_SEGMENT_64B))) { > + gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL); > + tcg_temp_free(temp); > + tcg_temp_free(target); > + return; > + } > + > + if (NARROW_MODE(ctx)) { > + tcg_gen_ext32u_tl(temp, cpu_ctr); > + } else { > + tcg_gen_mov_tl(temp, cpu_ctr); > + } > + if (bo & 0x2) { > + tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1); > + } else { > + tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1); > + } > + tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1); > } else { > - tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1); > + tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1); > + if (NARROW_MODE(ctx)) { > + tcg_gen_ext32u_tl(temp, cpu_ctr); > + } else { > + tcg_gen_mov_tl(temp, cpu_ctr); > + } > + if (bo & 0x2) { > + tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1); > + } else { > + tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1); > + } > } > tcg_temp_free(temp); > } > >
On Fri, Mar 22, 2019 at 07:03:46PM +0100, Greg Kurz wrote: > Even if all ISAs up to v3 indeed mention: > > If the "decrement and test CTR" option is specified (BO2=0), the > instruction form is invalid. > > The UMs of all existing 64-bit server class processors say: I've applied this series because it fixes an immediate problem, but I have some significant reservations about it, read on.. > If BO[2] = 0, the contents of CTR (before any update) are used as the > target address and for the test of the contents of CTR to resolve the > branch. The contents of the CTR are then decremented and written back > to the CTR. So, if that's what the hardware does, I guess that's what we need to do. That behaviour seems totally bizarre though - how can it make sense for the same register value to act as both the branch target and a flag/counter? Or am I misreading something? > The linux kernel has spectre v2 mitigation code that relies on a > BO[2] = 0 variant of bcctr, which is now activated by default on > spapr, even with TCG. This causes linux guests to panic with > the default machine type under TCG. > > Since any CPU model can provide its own behaviour for invalid forms, > we could possibly introduce a new instruction flag to handle this. > In practice, since the behaviour is shared by all 64-bit server > processors starting with 970 up to POWER9, let's reuse the > PPC_SEGMENT_64B flag. Caveat: this may have to be fixed later if > POWER10 introduces a different behaviour. Yeah.. this makes me nervous. It's going to be very non-obvious that a flag about MMU behaviour is linked to an obscure conditional branch behaviour, so I suspect the chances of forgetting to fix that later if necessary are close to 100%. > The existing behaviour of throwing a program interrupt is kept for > all other CPU models. > > Signed-off-by: Greg Kurz <groug@kaod.org> > --- > target/ppc/translate.c | 52 ++++++++++++++++++++++++++++++++++-------------- > 1 file changed, 37 insertions(+), 15 deletions(-) > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c > index aaafa3a715d8..d3aaa6482c6a 100644 > --- a/target/ppc/translate.c > +++ b/target/ppc/translate.c > @@ -3747,22 +3747,44 @@ static void gen_bcond(DisasContext *ctx, int type) > if ((bo & 0x4) == 0) { > /* Decrement and test CTR */ > TCGv temp = tcg_temp_new(); > - if (unlikely(type == BCOND_CTR)) { > - gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL); > - tcg_temp_free(temp); > - tcg_temp_free(target); > - return; > - } > - tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1); > - if (NARROW_MODE(ctx)) { > - tcg_gen_ext32u_tl(temp, cpu_ctr); > - } else { > - tcg_gen_mov_tl(temp, cpu_ctr); > - } > - if (bo & 0x2) { > - tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1); > + > + if (type == BCOND_CTR) { > + /* > + * All ISAs up to v3 describe this form of bcctr as invalid but > + * some processors, ie. 64-bit server processors compliant with > + * arch 2.x, do implement a "test and decrement" logic instead, > + * as described in their respective UMs. > + */ > + if (unlikely(!(ctx->insns_flags & PPC_SEGMENT_64B))) { > + gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL); > + tcg_temp_free(temp); > + tcg_temp_free(target); > + return; > + } > + > + if (NARROW_MODE(ctx)) { > + tcg_gen_ext32u_tl(temp, cpu_ctr); > + } else { > + tcg_gen_mov_tl(temp, cpu_ctr); > + } > + if (bo & 0x2) { > + tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1); > + } else { > + tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1); > + } > + tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1); > } else { > - tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1); > + tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1); > + if (NARROW_MODE(ctx)) { > + tcg_gen_ext32u_tl(temp, cpu_ctr); > + } else { > + tcg_gen_mov_tl(temp, cpu_ctr); > + } > + if (bo & 0x2) { > + tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1); > + } else { > + tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1); > + } > } > tcg_temp_free(temp); > } >
On Mon, 25 Mar 2019 17:38:49 +1100 David Gibson <david@gibson.dropbear.id.au> wrote: > On Fri, Mar 22, 2019 at 07:03:46PM +0100, Greg Kurz wrote: > > Even if all ISAs up to v3 indeed mention: > > > > If the "decrement and test CTR" option is specified (BO2=0), the > > instruction form is invalid. > > > > The UMs of all existing 64-bit server class processors say: > > I've applied this series because it fixes an immediate problem, but I > have some significant reservations about it, read on.. > > > If BO[2] = 0, the contents of CTR (before any update) are used as the > > target address and for the test of the contents of CTR to resolve the > > branch. The contents of the CTR are then decremented and written back > > to the CTR. > > So, if that's what the hardware does, I guess that's what we need to > do. That behaviour seems totally bizarre though - how can it make > sense for the same register value to act as both the branch target and > a flag/counter? Or am I misreading something? > I must confess this puzzles me as well :-\ ... and the linux commit that introduces the use of this opcode, ie. ee13cb249fab ("powerpc/64s: Add support for software count cache flush") doesn't provide any details, at least for someone ignorant like me :) Maybe Michael can enlight us ? > > The linux kernel has spectre v2 mitigation code that relies on a > > BO[2] = 0 variant of bcctr, which is now activated by default on > > spapr, even with TCG. This causes linux guests to panic with > > the default machine type under TCG. > > > > Since any CPU model can provide its own behaviour for invalid forms, > > we could possibly introduce a new instruction flag to handle this. > > In practice, since the behaviour is shared by all 64-bit server > > processors starting with 970 up to POWER9, let's reuse the > > PPC_SEGMENT_64B flag. Caveat: this may have to be fixed later if > > POWER10 introduces a different behaviour. > > Yeah.. this makes me nervous. It's going to be very non-obvious that > a flag about MMU behaviour is linked to an obscure conditional branch > behaviour, so I suspect the chances of forgetting to fix that later if > necessary are close to 100%. > > > The existing behaviour of throwing a program interrupt is kept for > > all other CPU models. > > > > Signed-off-by: Greg Kurz <groug@kaod.org> > > --- > > target/ppc/translate.c | 52 ++++++++++++++++++++++++++++++++++-------------- > > 1 file changed, 37 insertions(+), 15 deletions(-) > > > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c > > index aaafa3a715d8..d3aaa6482c6a 100644 > > --- a/target/ppc/translate.c > > +++ b/target/ppc/translate.c > > @@ -3747,22 +3747,44 @@ static void gen_bcond(DisasContext *ctx, int type) > > if ((bo & 0x4) == 0) { > > /* Decrement and test CTR */ > > TCGv temp = tcg_temp_new(); > > - if (unlikely(type == BCOND_CTR)) { > > - gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL); > > - tcg_temp_free(temp); > > - tcg_temp_free(target); > > - return; > > - } > > - tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1); > > - if (NARROW_MODE(ctx)) { > > - tcg_gen_ext32u_tl(temp, cpu_ctr); > > - } else { > > - tcg_gen_mov_tl(temp, cpu_ctr); > > - } > > - if (bo & 0x2) { > > - tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1); > > + > > + if (type == BCOND_CTR) { > > + /* > > + * All ISAs up to v3 describe this form of bcctr as invalid but > > + * some processors, ie. 64-bit server processors compliant with > > + * arch 2.x, do implement a "test and decrement" logic instead, > > + * as described in their respective UMs. > > + */ > > + if (unlikely(!(ctx->insns_flags & PPC_SEGMENT_64B))) { > > + gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL); > > + tcg_temp_free(temp); > > + tcg_temp_free(target); > > + return; > > + } > > + > > + if (NARROW_MODE(ctx)) { > > + tcg_gen_ext32u_tl(temp, cpu_ctr); > > + } else { > > + tcg_gen_mov_tl(temp, cpu_ctr); > > + } > > + if (bo & 0x2) { > > + tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1); > > + } else { > > + tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1); > > + } > > + tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1); > > } else { > > - tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1); > > + tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1); > > + if (NARROW_MODE(ctx)) { > > + tcg_gen_ext32u_tl(temp, cpu_ctr); > > + } else { > > + tcg_gen_mov_tl(temp, cpu_ctr); > > + } > > + if (bo & 0x2) { > > + tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1); > > + } else { > > + tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1); > > + } > > } > > tcg_temp_free(temp); > > } > > >
On Mon, Mar 25, 2019 at 12:25:15PM +0100, Greg Kurz wrote: > On Mon, 25 Mar 2019 17:38:49 +1100 > David Gibson <david@gibson.dropbear.id.au> wrote: > > > On Fri, Mar 22, 2019 at 07:03:46PM +0100, Greg Kurz wrote: > > > Even if all ISAs up to v3 indeed mention: > > > > > > If the "decrement and test CTR" option is specified (BO2=0), the > > > instruction form is invalid. > > > > > > The UMs of all existing 64-bit server class processors say: > > > > I've applied this series because it fixes an immediate problem, but I > > have some significant reservations about it, read on.. > > > > > If BO[2] = 0, the contents of CTR (before any update) are used as the > > > target address and for the test of the contents of CTR to resolve the > > > branch. The contents of the CTR are then decremented and written back > > > to the CTR. > > > > So, if that's what the hardware does, I guess that's what we need to > > do. That behaviour seems totally bizarre though - how can it make > > sense for the same register value to act as both the branch target and > > a flag/counter? Or am I misreading something? > > > > I must confess this puzzles me as well :-\ ... and the linux commit that > introduces the use of this opcode, ie. ee13cb249fab ("powerpc/64s: Add > support for software count cache flush") doesn't provide any details, > at least for someone ignorant like me :) Oh... I suspect what's going on here is not actually that the branch has that effect, but that they've re-used this nonsensical option combination to mark the special cache count flush they need. In which case we'd need to not trap, but do something else in the TCG code. > Maybe Michael can enlight us ? Michael or Suraj..?
On Mon, Mar 25, 2019 at 12:25:15PM +0100, Greg Kurz wrote: > On Mon, 25 Mar 2019 17:38:49 +1100 > David Gibson <david@gibson.dropbear.id.au> wrote: > > > On Fri, Mar 22, 2019 at 07:03:46PM +0100, Greg Kurz wrote: > > > Even if all ISAs up to v3 indeed mention: > > > > > > If the "decrement and test CTR" option is specified (BO2=0), the > > > instruction form is invalid. > > > > > > The UMs of all existing 64-bit server class processors say: > > > > I've applied this series because it fixes an immediate problem, but I > > have some significant reservations about it, read on.. > > > > > If BO[2] = 0, the contents of CTR (before any update) are used as the > > > target address and for the test of the contents of CTR to resolve the > > > branch. The contents of the CTR are then decremented and written back > > > to the CTR. > > > > So, if that's what the hardware does, I guess that's what we need to > > do. That behaviour seems totally bizarre though - how can it make > > sense for the same register value to act as both the branch target and > > a flag/counter? Or am I misreading something? > > > > I must confess this puzzles me as well :-\ ... and the linux commit that > introduces the use of this opcode, ie. ee13cb249fab ("powerpc/64s: Add > support for software count cache flush") doesn't provide any details, > at least for someone ignorant like me :) > > Maybe Michael can enlight us ? Ok, I talked to Michael on Slack. Here's my understanding: - The behaviour as described in the UM is correct, although bizarre and basically useless - I'm guessing the original reason for not making it an explicit invalid form is that the strange behaviour arises naturally out of orthogonality, and this avoided having an explicit check to trap on this form - Because this instruction form is otherwise useless and never used, it was chosen to trigger the extra micro-architectural side-effect needed for the Spectre workaround - The workaround code in the kernel isn't actually *using* the bizarre architected behaviour - the arguments and context are chosen to make it basically a no-op (w.r.t. architected state). The kernel only cares about the micro-architectural side effect So, I think the patches as applied are correct as they are. However, adding some comments saying how this situation arose would be a very good idea.
diff --git a/target/ppc/translate.c b/target/ppc/translate.c index aaafa3a715d8..d3aaa6482c6a 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -3747,22 +3747,44 @@ static void gen_bcond(DisasContext *ctx, int type) if ((bo & 0x4) == 0) { /* Decrement and test CTR */ TCGv temp = tcg_temp_new(); - if (unlikely(type == BCOND_CTR)) { - gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL); - tcg_temp_free(temp); - tcg_temp_free(target); - return; - } - tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1); - if (NARROW_MODE(ctx)) { - tcg_gen_ext32u_tl(temp, cpu_ctr); - } else { - tcg_gen_mov_tl(temp, cpu_ctr); - } - if (bo & 0x2) { - tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1); + + if (type == BCOND_CTR) { + /* + * All ISAs up to v3 describe this form of bcctr as invalid but + * some processors, ie. 64-bit server processors compliant with + * arch 2.x, do implement a "test and decrement" logic instead, + * as described in their respective UMs. + */ + if (unlikely(!(ctx->insns_flags & PPC_SEGMENT_64B))) { + gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL); + tcg_temp_free(temp); + tcg_temp_free(target); + return; + } + + if (NARROW_MODE(ctx)) { + tcg_gen_ext32u_tl(temp, cpu_ctr); + } else { + tcg_gen_mov_tl(temp, cpu_ctr); + } + if (bo & 0x2) { + tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1); + } else { + tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1); + } + tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1); } else { - tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1); + tcg_gen_subi_tl(cpu_ctr, cpu_ctr, 1); + if (NARROW_MODE(ctx)) { + tcg_gen_ext32u_tl(temp, cpu_ctr); + } else { + tcg_gen_mov_tl(temp, cpu_ctr); + } + if (bo & 0x2) { + tcg_gen_brcondi_tl(TCG_COND_NE, temp, 0, l1); + } else { + tcg_gen_brcondi_tl(TCG_COND_EQ, temp, 0, l1); + } } tcg_temp_free(temp); }
Even if all ISAs up to v3 indeed mention: If the "decrement and test CTR" option is specified (BO2=0), the instruction form is invalid. The UMs of all existing 64-bit server class processors say: If BO[2] = 0, the contents of CTR (before any update) are used as the target address and for the test of the contents of CTR to resolve the branch. The contents of the CTR are then decremented and written back to the CTR. The linux kernel has spectre v2 mitigation code that relies on a BO[2] = 0 variant of bcctr, which is now activated by default on spapr, even with TCG. This causes linux guests to panic with the default machine type under TCG. Since any CPU model can provide its own behaviour for invalid forms, we could possibly introduce a new instruction flag to handle this. In practice, since the behaviour is shared by all 64-bit server processors starting with 970 up to POWER9, let's reuse the PPC_SEGMENT_64B flag. Caveat: this may have to be fixed later if POWER10 introduces a different behaviour. The existing behaviour of throwing a program interrupt is kept for all other CPU models. Signed-off-by: Greg Kurz <groug@kaod.org> --- target/ppc/translate.c | 52 ++++++++++++++++++++++++++++++++++-------------- 1 file changed, 37 insertions(+), 15 deletions(-)