Message ID | 20190607222222.15300-1-palmer@sifive.com (mailing list archive)
---|---
State      | New, archived
Series     | [v2] RISC-V: Break load reservations during switch_to
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
On 19-06-07 15:22:22, Palmer Dabbelt wrote:
> The comment describes why in detail. This was found because QEMU never
> gives up load reservations, the issue is unlikely to manifest on real
> hardware.

Makes sense, however it obviously will not help until qemu actually
clears load reservations on SC (or otherwise handles the interleaved
SC case). See comment inline.

> Thanks to Carlos Eduardo for finding the bug!
>
> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
> ---
> Changes since v1 <20190605231735.26581-1-palmer@sifive.com>:
>
> * REG_SC is now defined as a helper macro, for any code that wants to SC
>   a register-sized value.
> * The explicit #ifdef to check that TASK_THREAD_RA_RA is 0 has been
>   removed. Instead we rely on the assembler to catch non-zero SC
>   offsets. I've tested this does actually work.
>
>  arch/riscv/include/asm/asm.h |  1 +
>  arch/riscv/kernel/entry.S    | 11 +++++++++++
>  2 files changed, 12 insertions(+)
>
> diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
> index 5ad4cb622bed..946b671f996c 100644
> --- a/arch/riscv/include/asm/asm.h
> +++ b/arch/riscv/include/asm/asm.h
> @@ -30,6 +30,7 @@
>
>  #define REG_L	__REG_SEL(ld, lw)
>  #define REG_S	__REG_SEL(sd, sw)
> +#define REG_SC	__REG_SEL(sc.w, sc.d)

The instructions appear to be inverted here (i.e. "sc.d, sc.w").

>  #define SZREG	__REG_SEL(8, 4)
>  #define LGREG	__REG_SEL(3, 2)
>
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 1c1ecc238cfa..af685782af17 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -330,6 +330,17 @@ ENTRY(__switch_to)
>  	add   a3, a0, a4
>  	add   a4, a1, a4
>  	REG_S ra, TASK_THREAD_RA_RA(a3)
> +	/*
> +	 * The Linux ABI allows programs to depend on load reservations being
> +	 * broken on context switches, but the ISA doesn't require that the
> +	 * hardware ever breaks a load reservation. The only way to break a
> +	 * load reservation is with a store conditional, so we emit one here.
> +	 * Since nothing ever takes a load reservation on TASK_THREAD_RA_RA we
> +	 * know this will always fail, but just to be on the safe side this
> +	 * writes the same value that was unconditionally written by the
> +	 * previous instruction.
> +	 */
> +	REG_SC x0, ra, TASK_THREAD_RA_RA(a3)
>  	REG_S sp, TASK_THREAD_SP_RA(a3)
>  	REG_S s0, TASK_THREAD_S0_RA(a3)
>  	REG_S s1, TASK_THREAD_S1_RA(a3)
> --
> 2.21.0
>
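Why the argument order matters: `__REG_SEL` selects its first argument on 64-bit builds and its second on 32-bit ones, so the store-conditional macro needs `sc.d` first. Here is a host-runnable sketch of that selection logic; it stringizes the chosen mnemonic so it can be checked anywhere, whereas the real kernel macro emits the instruction token directly, and `CONFIG_64BIT` is simply assumed defined for this illustration.

```c
#include <assert.h>
#include <string.h>

#define CONFIG_64BIT 1                 /* assume an RV64 build for this sketch */

#ifdef CONFIG_64BIT
#define __REG_SEL(a, b) #a             /* 64-bit: select the first argument */
#else
#define __REG_SEL(a, b) #b             /* 32-bit: select the second argument */
#endif

#define REG_S  __REG_SEL(sd, sw)       /* store: sd on RV64, sw on RV32 */
#define REG_SC __REG_SEL(sc.d, sc.w)   /* store-conditional, sc.d first (v3 order) */
```

With the arguments in the order posted in v2, an RV64 kernel would have assembled `sc.w` (a 32-bit store-conditional) in `__switch_to`.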
On Sun, 16 Jun 2019 10:54:06 PDT (-0700), joel@sing.id.au wrote:
> On 19-06-07 15:22:22, Palmer Dabbelt wrote:
>> The comment describes why in detail. This was found because QEMU never
>> gives up load reservations, the issue is unlikely to manifest on real
>> hardware.
>
> Makes sense, however it obviously will not help until qemu actually
> clears load reservations on SC (or otherwise handles the interleaved
> SC case).

Ya, I wasn't paying close enough attention. I think this should do it:

diff --git a/target/riscv/insn_trans/trans_rva.inc.c b/target/riscv/insn_trans/trans_rva.inc.c
index f6dbbc065e15..001a68ced005 100644
--- a/target/riscv/insn_trans/trans_rva.inc.c
+++ b/target/riscv/insn_trans/trans_rva.inc.c
@@ -69,6 +69,7 @@ static inline bool gen_sc(DisasContext *ctx, arg_atomic *a, TCGMemOp mop)
     gen_set_gpr(a->rd, dat);
     gen_set_label(l2);
+    tcg_gen_movi_tl(load_res, -1);
     tcg_temp_free(dat);
     tcg_temp_free(src1);
     tcg_temp_free(src2);

I'll send the patch out to the QEMU mailing list.

> See comment inline.
>
>> Thanks to Carlos Eduardo for finding the bug!
>>
>> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
>> ---
>> Changes since v1 <20190605231735.26581-1-palmer@sifive.com>:
>>
>> * REG_SC is now defined as a helper macro, for any code that wants to SC
>>   a register-sized value.
>> * The explicit #ifdef to check that TASK_THREAD_RA_RA is 0 has been
>>   removed. Instead we rely on the assembler to catch non-zero SC
>>   offsets. I've tested this does actually work.
>>
>>  arch/riscv/include/asm/asm.h |  1 +
>>  arch/riscv/kernel/entry.S    | 11 +++++++++++
>>  2 files changed, 12 insertions(+)
>>
>> diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
>> index 5ad4cb622bed..946b671f996c 100644
>> --- a/arch/riscv/include/asm/asm.h
>> +++ b/arch/riscv/include/asm/asm.h
>> @@ -30,6 +30,7 @@
>>
>>  #define REG_L	__REG_SEL(ld, lw)
>>  #define REG_S	__REG_SEL(sd, sw)
>> +#define REG_SC	__REG_SEL(sc.w, sc.d)
>
> The instructions appear to be inverted here (i.e. "sc.d, sc.w").

Thanks, I'll send a v3.

>>  #define SZREG	__REG_SEL(8, 4)
>>  #define LGREG	__REG_SEL(3, 2)
>>
>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>> index 1c1ecc238cfa..af685782af17 100644
>> --- a/arch/riscv/kernel/entry.S
>> +++ b/arch/riscv/kernel/entry.S
>> @@ -330,6 +330,17 @@ ENTRY(__switch_to)
>>  	add   a3, a0, a4
>>  	add   a4, a1, a4
>>  	REG_S ra, TASK_THREAD_RA_RA(a3)
>> +	/*
>> +	 * The Linux ABI allows programs to depend on load reservations being
>> +	 * broken on context switches, but the ISA doesn't require that the
>> +	 * hardware ever breaks a load reservation. The only way to break a
>> +	 * load reservation is with a store conditional, so we emit one here.
>> +	 * Since nothing ever takes a load reservation on TASK_THREAD_RA_RA we
>> +	 * know this will always fail, but just to be on the safe side this
>> +	 * writes the same value that was unconditionally written by the
>> +	 * previous instruction.
>> +	 */
>> +	REG_SC x0, ra, TASK_THREAD_RA_RA(a3)
>>  	REG_S sp, TASK_THREAD_SP_RA(a3)
>>  	REG_S s0, TASK_THREAD_S0_RA(a3)
>>  	REG_S s1, TASK_THREAD_S1_RA(a3)
>> --
>> 2.21.0
>>
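The effect of the one-line QEMU change can be illustrated with a toy reservation state machine. This is a sketch, not QEMU's actual TCG code: `load_res` holds the reserved address (-1 when none is held), and every SC, whether it succeeds or fails, gives the reservation up, so a second SC can never pair with a stale LR.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the emulator's reservation state: the reserved address,
 * or -1 when no reservation is outstanding.  Names are illustrative. */
static int64_t load_res = -1;

static int64_t lr(int64_t *mem, int64_t addr)
{
    load_res = addr;                 /* take the reservation */
    return mem[addr];
}

/* Mirrors RISC-V SC's result convention: 0 on success, nonzero on failure. */
static int64_t sc(int64_t *mem, int64_t addr, int64_t val)
{
    int64_t fail = (load_res != addr);
    if (!fail)
        mem[addr] = val;
    load_res = -1;                   /* the fix: every SC clears the reservation */
    return fail;
}
```

Without the clearing line, a reservation taken by one guest context would survive an interleaved SC and could wrongly pair with an SC executed by a different context, which is exactly why the kernel's dummy `REG_SC` had no effect under the unpatched QEMU.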
On Fri, Jun 07, 2019 at 03:22:22PM -0700, Palmer Dabbelt wrote:
> The comment describes why in detail. This was found because QEMU never
> gives up load reservations, the issue is unlikely to manifest on real
> hardware.
>
> Thanks to Carlos Eduardo for finding the bug!

> @@ -330,6 +330,17 @@ ENTRY(__switch_to)
>  	add   a3, a0, a4
>  	add   a4, a1, a4
>  	REG_S ra, TASK_THREAD_RA_RA(a3)
> +	/*
> +	 * The Linux ABI allows programs to depend on load reservations being
> +	 * broken on context switches, but the ISA doesn't require that the
> +	 * hardware ever breaks a load reservation. The only way to break a
> +	 * load reservation is with a store conditional, so we emit one here.
> +	 * Since nothing ever takes a load reservation on TASK_THREAD_RA_RA we
> +	 * know this will always fail, but just to be on the safe side this
> +	 * writes the same value that was unconditionally written by the
> +	 * previous instruction.
> +	 */

I suspect that you need to do the same as 32-bit ARM, and clear this in
your exception return path, rather than in __switch_to, since handlers
for interrupts and other exceptions could leave a dangling reservation.

For ARM, the architecture permits a store-exclusive to succeed even if
the address differed from the load-exclusive. I don't know if the same
applies here, but regardless I believe the case above applies if an IRQ
is taken from kernel context, since the handler can manipulate the same
variable as the interrupted code.

Thanks,
Mark.
On Wed, 19 Jun 2019 00:36:01 PDT (-0700), mark.rutland@arm.com wrote:
> On Fri, Jun 07, 2019 at 03:22:22PM -0700, Palmer Dabbelt wrote:
>> The comment describes why in detail. This was found because QEMU never
>> gives up load reservations, the issue is unlikely to manifest on real
>> hardware.
>>
>> Thanks to Carlos Eduardo for finding the bug!
>
>> @@ -330,6 +330,17 @@ ENTRY(__switch_to)
>>  	add   a3, a0, a4
>>  	add   a4, a1, a4
>>  	REG_S ra, TASK_THREAD_RA_RA(a3)
>> +	/*
>> +	 * The Linux ABI allows programs to depend on load reservations being
>> +	 * broken on context switches, but the ISA doesn't require that the
>> +	 * hardware ever breaks a load reservation. The only way to break a
>> +	 * load reservation is with a store conditional, so we emit one here.
>> +	 * Since nothing ever takes a load reservation on TASK_THREAD_RA_RA we
>> +	 * know this will always fail, but just to be on the safe side this
>> +	 * writes the same value that was unconditionally written by the
>> +	 * previous instruction.
>> +	 */
>
> I suspect that you need to do the same as 32-bit ARM, and clear this in
> your exception return path, rather than in __switch_to, since handlers
> for interrupts and other exceptions could leave a dangling reservation.
>
> For ARM, the architecture permits a store-exclusive to succeed even if
> the address differed from the load-exclusive. I don't know if the same
> applies here, but regardless I believe the case above applies if an IRQ
> is taken from kernel context, since the handler can manipulate the same
> variable as the interrupted code.

RISC-V has the same constraint: an LR can cause the subsequent SC on any
address to succeed. When writing the patch I thought they had to have
matching addresses; v4 should have a correct comment (assuming I've
managed to send it -- I'm on my third continent this week, so I'm a bit
out of it).

I'd considered breaking reservations on trap entry, but decided it
wasn't necessary.
I hadn't considered doing this on trap exit, but I don't see a
difference, so I might be missing something. The case that I see as an
issue is when a trap comes in the middle of an LR/SC sequence, which
boils down to three cases:

* The trap handler doesn't contend with the LR/SC sequence in any way,
  in which case it's fine for the sequence to succeed.

* The trap handler contends by doing its own LR/SC sequence. Since the
  trap handler must execute completely before returning control back to
  the parent, we know the SC in the trap handler will execute. Thus
  there is no way the SC in the parent can pair with the LR in the trap
  handler. This applies even when traps are nested.

* The trap handler contends by performing a regular store to the same
  address as the LR that was interrupted. In this case the SC must
  fail, and while I assumed that the store would cause that failure,
  the ISA manual doesn't appear to require that behavior -- it allows
  the SC to always fail in that case, but it doesn't mandate that it
  always fails (which is how I got confused).

Assuming the ISA manual is correct in not specifying that stores within
an LR/SC sequence break load reservations, I think we do need to break
load reservations on all traps. I'll go check with the ISA people, but
now that I've noticed it this does seem somewhat reasonable --
essentially it lets LR just take a line exclusively, lets SC succeed
only on already exclusively held lines, and doesn't impose any extra
constraints on regular memory operations.

I don't see any reason that breaking reservations on entry, as opposed
to exit, would be incorrect. I feel like doing this on entry is a
better bet, as we're already doing a bunch of stores there, so I don't
need to bring in an additional cache line for writing. These lines
would be on the kernel stack, so it's unlikely anyone else has them for
writing anyway; maybe it just doesn't matter.
The only issue I can think of here is that there might be something
tricky for implementations to handle WRT regular store -> store
conditional forwarding, as that's a pattern that is unlikely to show up
in regular code but would now be in a high-performance path. The
regular load -> store conditional may be easier to handle, and since
this is done on the kernel stack it's unlikely that upgrading a line
for writing would be an expensive operation anyway.

LMK if I'm missing something, otherwise I'll spin another version of
the patch to break reservations on trap entry.
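The dangerous interleaving in the third case above can be sketched with a deliberately simplified model: one global flag for "a reservation is outstanding", a plain store that does NOT break it (which the ISA permits), and an SC to any address that pairs with the earlier LR. The names (`lr`, `sc`, `plain_store`, `break_reservation`) are illustrative, not kernel APIs.

```c
#include <assert.h>

/* Is any reservation outstanding?  Addresses are ignored on purpose,
 * since the ISA allows an SC to any address to pair with an earlier LR. */
static int reserved;

static int lr(int *p)
{
    reserved = 1;                    /* take a reservation */
    return *p;
}

static void plain_store(int *p, int v)
{
    *p = v;                          /* a regular store: does NOT break the reservation */
}

/* Mirrors SC's convention: 0 on success.  Any SC, to any address,
 * gives up the reservation whether it succeeds or fails. */
static int sc(int *p, int v)
{
    int fail = !reserved;
    if (!fail)
        *p = v;
    reserved = 0;
    return fail;
}

/* What breaking reservations on trap entry amounts to: an SC to a
 * scratch slot, executed only for its side effect on the reservation. */
static void break_reservation(void)
{
    int scratch;
    (void)sc(&scratch, 0);
}
```

Under this model, an interrupted LR/SC sequence succeeds even though the handler stored to the same address in between, silently discarding the handler's update; inserting `break_reservation()` at trap entry closes the window.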
diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
index 5ad4cb622bed..946b671f996c 100644
--- a/arch/riscv/include/asm/asm.h
+++ b/arch/riscv/include/asm/asm.h
@@ -30,6 +30,7 @@

 #define REG_L	__REG_SEL(ld, lw)
 #define REG_S	__REG_SEL(sd, sw)
+#define REG_SC	__REG_SEL(sc.w, sc.d)
 #define SZREG	__REG_SEL(8, 4)
 #define LGREG	__REG_SEL(3, 2)

diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 1c1ecc238cfa..af685782af17 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -330,6 +330,17 @@ ENTRY(__switch_to)
 	add   a3, a0, a4
 	add   a4, a1, a4
 	REG_S ra, TASK_THREAD_RA_RA(a3)
+	/*
+	 * The Linux ABI allows programs to depend on load reservations being
+	 * broken on context switches, but the ISA doesn't require that the
+	 * hardware ever breaks a load reservation. The only way to break a
+	 * load reservation is with a store conditional, so we emit one here.
+	 * Since nothing ever takes a load reservation on TASK_THREAD_RA_RA we
+	 * know this will always fail, but just to be on the safe side this
+	 * writes the same value that was unconditionally written by the
+	 * previous instruction.
+	 */
+	REG_SC x0, ra, TASK_THREAD_RA_RA(a3)
 	REG_S sp, TASK_THREAD_SP_RA(a3)
 	REG_S s0, TASK_THREAD_S0_RA(a3)
 	REG_S s1, TASK_THREAD_S1_RA(a3)
The comment describes why in detail. This was found because QEMU never
gives up load reservations; the issue is unlikely to manifest on real
hardware.

Thanks to Carlos Eduardo for finding the bug!

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
---
Changes since v1 <20190605231735.26581-1-palmer@sifive.com>:

* REG_SC is now defined as a helper macro, for any code that wants to SC
  a register-sized value.
* The explicit #ifdef to check that TASK_THREAD_RA_RA is 0 has been
  removed. Instead we rely on the assembler to catch non-zero SC
  offsets. I've tested this does actually work.

 arch/riscv/include/asm/asm.h |  1 +
 arch/riscv/kernel/entry.S    | 11 +++++++++++
 2 files changed, 12 insertions(+)