Message ID | 20230228155121.3416-1-ubizjak@gmail.com (mailing list archive)
State      | New, archived
Series     | rcu: use try_cmpxchg in check_cpu_stall
On Tue, Feb 28, 2023 at 04:51:21PM +0100, Uros Bizjak wrote:
> Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
> check_cpu_stall. x86 CMPXCHG instruction returns success in ZF flag, so
> this change saves a compare after cmpxchg (and related move instruction in
> front of cmpxchg).

In my codegen, I am not seeing the mov instruction before the cmp removed;
how can that be? The rax register has to be populated with a mov before
cmpxchg, right?

So try_cmpxchg gives:  mov, cmpxchg, cmp, jne
whereas cmpxchg gives: mov, cmpxchg, mov, jne

So yes, you got rid of the compare, but I am not seeing a reduction in
moves. Either way, I think it is an improvement due to dropping the cmp, so:

Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel

>
> No functional change intended.
>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> ---
>  kernel/rcu/tree_stall.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index b10b8349bb2a..d81c88e66b42 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -760,7 +760,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>  	jn = jiffies + ULONG_MAX / 2;
>  	if (rcu_gp_in_progress() &&
>  	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
> -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +	    try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
>
>  		/*
>  		 * If a virtual machine is stopped by the host it can look to
> @@ -778,7 +778,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>
>  	} else if (rcu_gp_in_progress() &&
>  		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
> -		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +		   try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
>
>  		/*
>  		 * If a virtual machine is stopped by the host it can look to
> --
> 2.39.2
>
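The difference between the two call shapes being debated can be sketched in userspace with the GCC/Clang `__atomic` builtins. This is a model of the semantics only, not the kernel implementation; the `model_*` names are ours:

```c
#include <stdbool.h>

/*
 * Userspace model of the two kernel idioms, built on the GCC/Clang
 * __atomic builtins. The model_* names are illustrative, not kernel API.
 */

/* cmpxchg(ptr, old, new): returns the value that was found in *ptr;
 * the caller must compare it against "old" to detect success. */
static unsigned long model_cmpxchg(unsigned long *ptr,
				   unsigned long old, unsigned long new)
{
	unsigned long expected = old;

	/* On failure, "expected" is overwritten with the current value. */
	__atomic_compare_exchange_n(ptr, &expected, new, false,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return expected;
}

/* try_cmpxchg(ptr, &old, new): returns success directly, and on failure
 * refreshes *old with the value that was found in *ptr. */
static bool model_try_cmpxchg(unsigned long *ptr,
			      unsigned long *old, unsigned long new)
{
	return __atomic_compare_exchange_n(ptr, old, new, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

The point of the thread: with `try_cmpxchg` the success test is the boolean return value (on x86, the ZF flag left by CMPXCHG), so no separate `cmp` is emitted; the cost is that the "old" variable is written back on failure.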
> On Feb 28, 2023, at 3:39 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Tue, Feb 28, 2023 at 04:51:21PM +0100, Uros Bizjak wrote:
>> Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
>> check_cpu_stall. x86 CMPXCHG instruction returns success in ZF flag, so
>> this change saves a compare after cmpxchg (and related move instruction in
>> front of cmpxchg).
>
> In my codegen, I am not seeing mov instruction before the cmp removed, how
> can that be? The rax has to be populated with a mov before cmpxchg right?
>
> So try_cmpxchg gives: mov, cmpxchg, cmp, jne
> Where as cmpxchg gives: mov, cmpxchg, mov, jne
>
> So yeah you got rid of compare, but I am not seeing reduction in moves.
> Either way, I think it is an improvement due to dropping cmp so:
>
> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Ah, there probably is one less mov to hold the value to compare against,
but maybe not always? Would be nice to clarify that in the changelog anyway.

Thanks,

 - Joel

[...]
On Tue, 28 Feb 2023 20:39:30 +0000
Joel Fernandes <joel@joelfernandes.org> wrote:

> On Tue, Feb 28, 2023 at 04:51:21PM +0100, Uros Bizjak wrote:
> > Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
> > check_cpu_stall. x86 CMPXCHG instruction returns success in ZF flag, so
> > this change saves a compare after cmpxchg (and related move instruction in
> > front of cmpxchg).
>
> In my codegen, I am not seeing mov instruction before the cmp removed, how
> can that be? The rax has to be populated with a mov before cmpxchg right?
>
> So try_cmpxchg gives: mov, cmpxchg, cmp, jne
> Where as cmpxchg gives: mov, cmpxchg, mov, jne
>
> So yeah you got rid of compare, but I am not seeing reduction in moves.
> Either way, I think it is an improvement due to dropping cmp so:

Did you get the above backwards?

Anyway, when looking at the conversion of cmpxchg() to try_cmpxchg() that
Uros sent to me for the ring buffer, the code went from:

0000000000000070 <ring_buffer_record_off>:
  70:	48 8d 4f 08          	lea    0x8(%rdi),%rcx
  74:	8b 57 08             	mov    0x8(%rdi),%edx
  77:	89 d6                	mov    %edx,%esi
  79:	89 d0                	mov    %edx,%eax
  7b:	81 ce 00 00 10 00    	or     $0x100000,%esi
  81:	f0 0f b1 31          	lock cmpxchg %esi,(%rcx)
  85:	39 d0                	cmp    %edx,%eax
  87:	75 eb                	jne    74 <ring_buffer_record_off+0x4>
  89:	e9 00 00 00 00       	jmp    8e <ring_buffer_record_off+0x1e>
			8a: R_X86_64_PLT32	__x86_return_thunk-0x4
  8e:	66 90                	xchg   %ax,%ax

to:

00000000000001a0 <ring_buffer_record_off>:
 1a0:	8b 47 08             	mov    0x8(%rdi),%eax
 1a3:	48 8d 4f 08          	lea    0x8(%rdi),%rcx
 1a7:	89 c2                	mov    %eax,%edx
 1a9:	81 ca 00 00 10 00    	or     $0x100000,%edx
 1af:	f0 0f b1 57 08       	lock cmpxchg %edx,0x8(%rdi)
 1b4:	75 05                	jne    1bb <ring_buffer_record_off+0x1b>
 1b6:	e9 00 00 00 00       	jmp    1bb <ring_buffer_record_off+0x1b>
			1b7: R_X86_64_PLT32	__x86_return_thunk-0x4
 1bb:	89 c2                	mov    %eax,%edx
 1bd:	81 ca 00 00 10 00    	or     $0x100000,%edx
 1c3:	f0 0f b1 11          	lock cmpxchg %edx,(%rcx)
 1c7:	75 f2                	jne    1bb <ring_buffer_record_off+0x1b>
 1c9:	e9 00 00 00 00       	jmp    1ce <ring_buffer_record_off+0x2e>
			1ca: R_X86_64_PLT32	__x86_return_thunk-0x4
 1ce:	66 90                	xchg   %ax,%ax

It does add a bit more code, but the fast path (where the cmpxchg succeeds)
seems better. That would be:

00000000000001a0 <ring_buffer_record_off>:
 1a0:	8b 47 08             	mov    0x8(%rdi),%eax
 1a3:	48 8d 4f 08          	lea    0x8(%rdi),%rcx
 1a7:	89 c2                	mov    %eax,%edx
 1a9:	81 ca 00 00 10 00    	or     $0x100000,%edx
 1af:	f0 0f b1 57 08       	lock cmpxchg %edx,0x8(%rdi)
 1b4:	75 05                	jne    1bb <ring_buffer_record_off+0x1b>
 1b6:	e9 00 00 00 00       	jmp    1bb <ring_buffer_record_off+0x1b>
			1b7: R_X86_64_PLT32	__x86_return_thunk-0x4

where there are only two moves and no cmp, while the former has three moves
and a cmp in the fast path.

-- Steve
On Tue, Feb 28, 2023 at 04:03:24PM -0500, Steven Rostedt wrote:
> On Tue, 28 Feb 2023 20:39:30 +0000
> Joel Fernandes <joel@joelfernandes.org> wrote:
>
[...]
> Did you get the above backwards?
>
> Anyway, when looking at the conversion of cmpxchg() to try_cmpxchg() that
> Uros sent to me for the ring buffer, the code went from:
>
[...]
> It does add a bit more code, but the fast path seems better (where the
> cmpxchg succeeds).
>
[...]
> Where there's only two moves and no cmp, where the former has 3 moves and a
> cmp in the fast path.

All well and good, but the stall-warning code is nowhere near a fastpath.

Is try_cmpxchg() considered more readable in this context?

							Thanx, Paul
On Tue, 28 Feb 2023 13:29:11 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> All well and good, but the stall-warning code is nowhere near a fastpath.
>
> Is try_cmpxchg() considered more readable in this context?

-	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
+	    try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {

It's basically the same :-/

But looking at this use case, I'd actually NAK it, as it is misleading.

As try_cmpxchg() is used to get rid of the updating of the old value. As in
the ring buffer code we had:

void ring_buffer_record_off(struct trace_buffer *buffer)
{
	unsigned int rd;
	unsigned int new_rd;

	do {
		rd = atomic_read(&buffer->record_disabled);
		new_rd = rd | RB_BUFFER_OFF;
	} while (!atomic_cmpxchg(&buffer->record_disabled, &rd, new_rd) != rd);
}

and the try_cmpxchg() converted it to:

void ring_buffer_record_off(struct trace_buffer *buffer)
{
	unsigned int rd;
	unsigned int new_rd;

	rd = atomic_read(&buffer->record_disabled);
	do {
		new_rd = rd | RB_BUFFER_OFF;
	} while (!atomic_try_cmpxchg(&buffer->record_disabled, &rd, new_rd));
}

Which got rid of the need to constantly update the rd variable (cmpxchg
will load rax with the value read, so it removes the need for an extra
move).

But in your case, we don't need to update js, in which case the
try_cmpxchg() does.

The patch that Uros sent me for the ring buffer code also does some of
that, which I feel is wrong.

So with that, I would nack the patch.

-- Steve
On Tue, Feb 28, 2023 at 04:41:24PM -0500, Steven Rostedt wrote:
> On Tue, 28 Feb 2023 13:29:11 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > All well and good, but the stall-warning code is nowhere near a fastpath.
> >
> > Is try_cmpxchg() considered more readable in this context?
>
> -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +	    try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
>
> It's basically the same :-/

That was my assessment.  ;-)

> But looking at this use case, I'd actually NAK it, as it is misleading.
>
> As try_cmpxchg() is used to get rid of the updating of the old value. As in
> the ring buffer code we had:
>
[...]
> Which got rid of the need to constantly update the rd variable (cmpxchg
> will load rax with the value read, so it removes the need for an extra
> move).
>
> But in your case, we don't need to update js, in which case the
> try_cmpxchg() does.
>
> The patch that Uros sent me for the ring buffer code also does some of
> that, which I feel is wrong.
>
> So with that, I would nack the patch.

OK, I will leave this one out.

							Thanx, Paul
Hey Steve,

On Tue, Feb 28, 2023 at 4:41 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 28 Feb 2023 13:29:11 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > All well and good, but the stall-warning code is nowhere near a fastpath.
> >
> > Is try_cmpxchg() considered more readable in this context?
>
> -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +	    try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
>
> It's basically the same :-/
>
> But looking at this use case, I'd actually NAK it, as it is misleading.

I'm trying to parse this. You are saying it is misleading because it
updates js when it doesn't need to?

> As try_cmpxchg() is used to get rid of the updating of the old value. As in
> the ring buffer code we had:
>
> void ring_buffer_record_off(struct trace_buffer *buffer)
> {
>	unsigned int rd;
>	unsigned int new_rd;
>
>	do {
>		rd = atomic_read(&buffer->record_disabled);
>		new_rd = rd | RB_BUFFER_OFF;
>	} while (!atomic_cmpxchg(&buffer->record_disabled, &rd, new_rd) != rd);

Here you actually meant "rd" as the second parameter, without the & ?

> }
>
> and the try_cmpxchg() converted it to:
>
[...]
> Which got rid of the need to constantly update the rd variable (cmpxchg
> will load rax with the value read, so it removes the need for an extra
> move).

So that's a good thing?

> But in your case, we don't need to update js, in which case the
> try_cmpxchg() does.

Right, it has lesser value here, but I'm curious why you feel it also
doesn't belong in that ring buffer loop you shared (or did you mean it
does belong there, but not in other ftrace code modified by Uros?).

> The patch that Uros sent me for the ring buffer code also does some of
> that, which I feel is wrong.

I am guessing that is code *other than* the ring buffer loop above.

> So with that, I would nack the patch.

Dropping it for RCU is fine with me as well. (And yes, for the other
thread, I had it backwards: try_cmpxchg is what drops the need for the
cmp instruction -- sorry.)

thanks,

 - Joel
On Tue, 28 Feb 2023 18:30:14 -0500
Joel Fernandes <joel@joelfernandes.org> wrote:

> > But looking at this use case, I'd actually NAK it, as it is misleading.
>
> I'm trying to parse this. You are saying it is misleading, because it
> updates js when it doesn't need to?

Correct.

> > As try_cmpxchg() is used to get rid of the updating of the old value. As in
> > the ring buffer code we had:
> >
[...]
> >	} while (!atomic_cmpxchg(&buffer->record_disabled, &rd, new_rd) != rd);
>
> Here you actually meant "rd" as the second parameter, without the & ?

Yes, I cut and pasted the updated code and incorrectly tried to revert it
in this example :-p

> > and the try_cmpxchg() converted it to:
> >
> > void ring_buffer_record_off(struct trace_buffer *buffer)
> > {
> >	unsigned int rd;
> >	unsigned int new_rd;
> >
> >	rd = atomic_read(&buffer->record_disabled);
> >	do {
> >		new_rd = rd | RB_BUFFER_OFF;
> >	} while (!atomic_try_cmpxchg(&buffer->record_disabled, &rd, new_rd));
> > }
> >
> > Which got rid of the need to constantly update the rd variable (cmpxchg
> > will load rax with the value read, so it removes the need for an extra
> > move).
>
> So that's a good thing?

Yes. For looping, try_cmpxchg() is the proper function to use. But in the
RCU case (and other cases in the ring-buffer patch) there is no loop, and
no need to modify the "old" variable.

> > But in your case, we don't need to update js, in which case the
> > try_cmpxchg() does.
>
> Right, it has lesser value here but I'm curious why you feel it also
> doesn't belong in that ring buffer loop you shared (or did you mean,
> it does belong there but not in other ftrace code modified by Uros?).

The ring buffer patch had more than one change, where half the updates were
fine, and half were not.

-- Steve
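The looping idiom being endorsed above can be sketched in userspace, with the `__atomic` builtin standing in for `atomic_try_cmpxchg()`. The function name and structure follow the thread's example, not the actual kernel source:

```c
/*
 * Userspace sketch of the retry-loop idiom from the thread. On a failed
 * compare-exchange the builtin writes the current value back into rd,
 * so the loop body never has to re-read the variable.
 */
#include <stdbool.h>

#define RB_BUFFER_OFF (1U << 20)	/* value taken from the thread's asm */

static void record_off_model(unsigned int *record_disabled)
{
	unsigned int rd = __atomic_load_n(record_disabled, __ATOMIC_RELAXED);
	unsigned int new_rd;

	do {
		new_rd = rd | RB_BUFFER_OFF;
		/* On failure, rd has already been refreshed; just retry. */
	} while (!__atomic_compare_exchange_n(record_disabled, &rd, new_rd,
					      false, __ATOMIC_SEQ_CST,
					      __ATOMIC_SEQ_CST));
}
```

This is exactly the case where the write-back to the "old" variable is a feature rather than a side effect, which is Steve's distinction between loops and linear code.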
Hey Steve,

On Tue, Feb 28, 2023 at 7:08 PM Steven Rostedt <rostedt@goodmis.org> wrote:
[...]
> > > But in your case, we don't need to update js, in which case the
> > > try_cmpxchg() does.
> >
> > Right, it has lesser value here but I'm curious why you feel it also
> > doesn't belong in that ring buffer loop you shared (or did you mean,
> > it does belong there but not in other ftrace code modified by Uros?).
>
> The ring buffer patch had more than one change, where half the updates were
> fine, and half were not.

Thanks for all the clarifications. Good to know such loop design patterns
can benefit from it!

 - Joel
On Tue, Feb 28, 2023 at 9:39 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Tue, Feb 28, 2023 at 04:51:21PM +0100, Uros Bizjak wrote:
> > Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
> > check_cpu_stall. x86 CMPXCHG instruction returns success in ZF flag, so
> > this change saves a compare after cmpxchg (and related move instruction in
> > front of cmpxchg).
>
> In my codegen, I am not seeing mov instruction before the cmp removed, how
> can that be? The rax has to be populated with a mov before cmpxchg right?
>
> So try_cmpxchg gives: mov, cmpxchg, cmp, jne
> Where as cmpxchg gives: mov, cmpxchg, mov, jne

(I assume the above is reversed.)

You have quite a new compiler (e.g. gcc-12+) which is able to reorder basic
blocks, reschedule instructions and create fast paths through the code,
based on likely/unlikely annotations in the definition of try_cmpxchg.

[On a related note, some code growth is allowed to the compiler in the hot
path, since the kernel is compiled with -O2, not -Os.]

gcc-10.3.1 improves the code from tree.c from:

    a1c5:	0f 84 53 03 00 00    	je     a51e <rcu_sched_clock_irq+0x70e>
    a1cb:	48 89 c8             	mov    %rcx,%rax
    a1ce:	f0 48 0f b1 35 00 00 	lock cmpxchg %rsi,0x0(%rip)   # a1d7 <rcu_sched_clock_irq+0x3c7>
    a1d5:	00 00
			a1d3: R_X86_64_PC32	.data+0xf9c
    a1d7:	48 39 c1             	cmp    %rax,%rcx
    a1da:	0f 85 3e 03 00 00    	jne    a51e <rcu_sched_clock_irq+0x70e>

to:

    a1d0:	0f 84 49 03 00 00    	je     a51f <rcu_sched_clock_irq+0x70f>
    a1d6:	f0 48 0f b1 35 00 00 	lock cmpxchg %rsi,0x0(%rip)   # a1df <rcu_sched_clock_irq+0x3cf>
    a1dd:	00 00
			a1db: R_X86_64_PC32	.data+0xf9c
    a1df:	0f 85 3a 03 00 00    	jne    a51f <rcu_sched_clock_irq+0x70f>

The change brings the following code size improvement:

   text	   data	    bss	    dec	    hex	filename
  57712	   9945	     86	  67743	  1089f	tree-new.o
  57760	   9945	     86	  67791	  108cf	tree-old.o

A small change, but effectively the result of an almost mechanical one-line
substitution.

Uros.

[...]
On Wed, Mar 1, 2023 at 1:08 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 28 Feb 2023 18:30:14 -0500
> Joel Fernandes <joel@joelfernandes.org> wrote:
>
> > > But looking at this use case, I'd actually NAK it, as it is misleading.
> >
> > I'm trying to parse this. You are saying it is misleading, because it
> > updates js when it doesn't need to?
>
> Correct.

I'm a bit late to the discussion (well, I have to sleep from time to time,
too), but in the hope that everybody interested in this issue will find the
reply, I'll try to clarify the "updates" claim.

try_cmpxchg is written in such a way that it benefits loops as well as
linear code; in the latter case it depends on the compiler to eliminate the
dead assignment. When changing linear code from cmpxchg to try_cmpxchg, one
has to take care that the variable passed by reference is unused after
cmpxchg, so that it can be considered a temporary variable (as said
elsewhere, the alternative is to copy the value to a local temporary
variable and pass a pointer to this variable to try_cmpxchg; the compiler
will eliminate the assignment if the original variable is unused).

Even in linear code, the conversion from cmpxchg to try_cmpxchg is able to
eliminate an assignment and a compare, as can be seen when the code is
compiled with gcc-10.3.1:

    a1c5:	0f 84 53 03 00 00    	je     a51e <rcu_sched_clock_irq+0x70e>
    a1cb:	48 89 c8             	mov    %rcx,%rax
    a1ce:	f0 48 0f b1 35 00 00 	lock cmpxchg %rsi,0x0(%rip)   # a1d7 <rcu_sched_clock_irq+0x3c7>
    a1d5:	00 00
			a1d3: R_X86_64_PC32	.data+0xf9c
    a1d7:	48 39 c1             	cmp    %rax,%rcx
    a1da:	0f 85 3e 03 00 00    	jne    a51e <rcu_sched_clock_irq+0x70e>

to:

    a1d0:	0f 84 49 03 00 00    	je     a51f <rcu_sched_clock_irq+0x70f>
    a1d6:	f0 48 0f b1 35 00 00 	lock cmpxchg %rsi,0x0(%rip)   # a1df <rcu_sched_clock_irq+0x3cf>
    a1dd:	00 00
			a1db: R_X86_64_PC32	.data+0xf9c
    a1df:	0f 85 3a 03 00 00    	jne    a51f <rcu_sched_clock_irq+0x70f>

Newer compilers (e.g. gcc-12+) are able to use likely/unlikely annotations
to reorder the code, so the change is less visible. But due to the
reordering, even targets that don't define try_cmpxchg natively benefit
from the change; please see the thread at [1].

These benefits are the reason the change to try_cmpxchg was accepted also
in linear code elsewhere in the Linux kernel, e.g. [2,3] to name a few
commits, with a thumbs-up and a claim that the new code is actually
*clearer* at the merge commit [4].

I really think that the above demonstrates various improvements, and it
would be unfortunate not to consider them.

[1] https://lore.kernel.org/lkml/871qwgmqws.fsf@mpe.ellerman.id.au/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4e1da8fe031303599e78f88e0dad9f44272e4f99
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8baceabca656d5ef4494cdeb3b9b9fbb844ac613
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=91bc559d8d3aed488b4b50e9eba1d7ebb1da7bbf

Uros.

[...]
On Wed, 1 Mar 2023 11:28:47 +0100
Uros Bizjak <ubizjak@gmail.com> wrote:

> These benefits are the reason the change to try_cmpxchg was accepted
> also in the linear code elsewhere in the linux kernel, e.g. [2,3] to
> name a few commits, with a thumbs-up and a claim that the new code is
> actually *clearer* at the merge commit [4].

I'll say it here too. I really like Joel's suggestion of having a
cmpxchg_success() that does not have the added side effect of modifying the
old variable.

I think that would allow for the arch optimizations that you are trying to
achieve, as well as remove the side effect that might cause issues down the
road.

-- Steve
On Wed, Mar 1, 2023 at 5:38 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 1 Mar 2023 11:28:47 +0100
> Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > These benefits are the reason the change to try_cmpxchg was accepted
> > also in the linear code elsewhere in the linux kernel, e.g. [2,3] to
> > name a few commits, with a thumbs-up and a claim that the new code is
> > actually *clearer* at the merge commit [4].
>
> I'll say it here too. I really like Joel's suggestion of having a
> cmpxchg_success() that does not have the added side effect of modifying the
> old variable.
>
> I think that would allow for the arch optimizations that you are trying to
> achieve, as well as remove the side effect that might cause issues down the
> road.

Attached patch implements this suggestion.

Uros.

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b10b8349bb2a..229263ebba3b 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -709,6 +709,12 @@ static void print_cpu_stall(unsigned long gps)
 	set_preempt_need_resched();
 }
 
+#define cmpxchg_success(ptr, old, new)			\
+({							\
+	__typeof__(*(ptr)) __tmp = (old);		\
+	try_cmpxchg((ptr), &__tmp, (new));		\
+})
+
 static void check_cpu_stall(struct rcu_data *rdp)
 {
 	bool didstall = false;
@@ -760,7 +766,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	jn = jiffies + ULONG_MAX / 2;
 	if (rcu_gp_in_progress() &&
 	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
-	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
+	    cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {
 
 		/*
 		 * If a virtual machine is stopped by the host it can look to
@@ -778,7 +784,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
 
 	} else if (rcu_gp_in_progress() &&
 		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
-		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
+		   cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {
 
 		/*
 		 * If a virtual machine is stopped by the host it can look to
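The behavior of the proposed macro can be modeled in userspace. Here `try_cmpxchg_model()` is our stand-in for the kernel's `try_cmpxchg()` (built on a GCC/Clang `__atomic` builtin); the macro body mirrors the patch: the caller's "old" value is copied into a hidden temporary, so the caller's variable is never written back:

```c
#include <stdbool.h>

/* Stand-in for the kernel's try_cmpxchg(): returns success, and on
 * failure writes the current value back through "old". */
static bool try_cmpxchg_model(unsigned long *ptr, unsigned long *old,
			      unsigned long new)
{
	return __atomic_compare_exchange_n(ptr, old, new, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* cmpxchg_success(): success flag only; the write-back lands in __tmp,
 * which is discarded, so the caller's "old" is untouched. Uses GCC
 * statement-expression and __typeof__ extensions, as the patch does. */
#define cmpxchg_success(ptr, old, new)			\
({							\
	__typeof__(*(ptr)) __tmp = (old);		\
	try_cmpxchg_model((ptr), &__tmp, (new));	\
})
```

On architectures with a native `try_cmpxchg`, the compiler can still fold the success test into the flags while the dead store to `__tmp` is eliminated, which is the point of the suggestion.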
On Wed, 1 Mar 2023 19:43:34 +0100
Uros Bizjak <ubizjak@gmail.com> wrote:

[...]
> Attached patch implements this suggestion.

I like it! Any way to make this more generic?

-- Steve
On Wed, Mar 1, 2023 at 7:52 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
[...]
> > Attached patch implements this suggestion.
>
> I like it!

Thanks!

> Anyway to make this more generic?

If we want to put the definition in generic headers, then we also need to
define acquire/release/relaxed and 64-bit variants. At the moment we have
two sites that require this definition, and I think that for now we could
live with two instances of the same definition in two separate subsystems.
But this would definitely be a good future addition. There is some code of
the form

	if (cmpxchg(ptr, 0, 1) == 0)

that cannot be converted to try_cmpxchg, but can use cmpxchg_success.

Uros.
On Wed, 1 Mar 2023 20:18:11 +0100
Uros Bizjak <ubizjak@gmail.com> wrote:

> If we want to put the definition in generic headers, then we also need
> to define acquire/release/relaxed and 64bit variants. ATM, we have two
> sites that require this definition and I think that for now we could
> live with two instances of the same definition in two separate
> subsystems. But this would definitely be a good future addition. There
> is some code in the form of
>
> if (cmpxchg (ptr, 0, 1) == 0)
>
> that can not be converted to try_cmpxchg, but can use cmpxchg_success.

And even modify code that uses temp variables. For example, where you
modified the ring buffer code to use try_cmpxchg(), I could convert your:

static int rb_head_page_replace(struct buffer_page *old,
				struct buffer_page *new)
{
	unsigned long *ptr = (unsigned long *)&old->list.prev->next;
	unsigned long val;

	val = *ptr & ~RB_FLAG_MASK;
	val |= RB_PAGE_HEAD;

	return try_cmpxchg(ptr, &val, (unsigned long)&new->list);
}

Into just:

static int rb_head_page_replace(struct buffer_page *old,
				struct buffer_page *new)
{
	unsigned long *ptr = (unsigned long *)&old->list.prev->next;
	unsigned long val;

	val = *ptr & ~RB_FLAG_MASK;

	return cmpxchg_success(ptr, val | RB_PAGE_HEAD, (unsigned long)&new->list);
}

-- Steve
> On Mar 1, 2023, at 2:18 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Mar 1, 2023 at 7:52 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>>
>> On Wed, 1 Mar 2023 19:43:34 +0100
>> Uros Bizjak <ubizjak@gmail.com> wrote:
>>
>>> On Wed, Mar 1, 2023 at 5:38 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>>>>
>>>> On Wed, 1 Mar 2023 11:28:47 +0100
>>>> Uros Bizjak <ubizjak@gmail.com> wrote:
>>>>
>>>>> These benefits are the reason the change to try_cmpxchg was accepted
>>>>> also in the linear code elsewhere in the linux kernel, e.g. [2,3] to
>>>>> name a few commits, with a thumbs-up and a claim that the new code is
>>>>> actually *clearer* at the merge commit [4].
>>>>
>>>> I'll say it here too. I really like Joel's suggestion of having a
>>>> cmpxchg_success() that does not have the added side effect of modifying the
>>>> old variable.
>>>>
>>>> I think that would allow for the arch optimizations that you are trying to
>>>> achieve, as well as remove the side effect that might cause issues down the
>>>> road.
>>>
>>> Attached patch implements this suggestion.
>>
>> I like it!

Me too :)

> Thanks!
>
>> Anyway to make this more generic?
>
> If we want to put the definition in generic headers, then we also need
> to define acquire/release/relaxed and 64bit variants. ATM, we have two
> sites that require this definition and I think that for now we could
> live with two instances of the same definition in two separate
> subsystems. But this would definitely be a good future addition. There
> is some code in the form of
>
> if (cmpxchg (ptr, 0, 1) == 0)
>
> that can not be converted to try_cmpxchg, but can use cmpxchg_success.

I would prefer if we can put it in generic headers instead of
duplicating across ftrace and RCU.

thanks,

- Joel

>
> Uros.
> On Mar 1, 2023, at 2:43 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 1 Mar 2023 20:18:11 +0100
> Uros Bizjak <ubizjak@gmail.com> wrote:
>
>> If we want to put the definition in generic headers, then we also need
>> to define acquire/release/relaxed and 64bit variants. ATM, we have two
>> sites that require this definition and I think that for now we could
>> live with two instances of the same definition in two separate
>> subsystems. But this would definitely be a good future addition. There
>> is some code in the form of
>>
>> if (cmpxchg (ptr, 0, 1) == 0)
>>
>> that can not be converted to try_cmpxchg, but can use cmpxchg_success.
>
> And even modify code that uses temp variables. For example, where you
> modified the ring buffer code to use try_cmpxchg(), I could convert your:
>
> static int rb_head_page_replace(struct buffer_page *old,
>				struct buffer_page *new)
> {
>	unsigned long *ptr = (unsigned long *)&old->list.prev->next;
>	unsigned long val;
>
>	val = *ptr & ~RB_FLAG_MASK;
>	val |= RB_PAGE_HEAD;
>
>	return try_cmpxchg(ptr, &val, (unsigned long)&new->list);
> }
>
> Into just:
>
> static int rb_head_page_replace(struct buffer_page *old,
>				struct buffer_page *new)
> {
>	unsigned long *ptr = (unsigned long *)&old->list.prev->next;
>	unsigned long val;
>
>	val = *ptr & ~RB_FLAG_MASK;
>
>	return cmpxchg_success(ptr, val | RB_PAGE_HEAD, (unsigned long)&new->list);
> }

Nice! Looks much better.

Thanks,

- Joel

>
> -- Steve
On Wed, Mar 01, 2023 at 07:43:34PM +0100, Uros Bizjak wrote:
> On Wed, Mar 1, 2023 at 5:38 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 1 Mar 2023 11:28:47 +0100
> > Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > > These benefits are the reason the change to try_cmpxchg was accepted
> > > also in the linear code elsewhere in the linux kernel, e.g. [2,3] to
> > > name a few commits, with a thumbs-up and a claim that the new code is
> > > actually *clearer* at the merge commit [4].
> >
> > I'll say it here too. I really like Joel's suggestion of having a
> > cmpxchg_success() that does not have the added side effect of modifying the
> > old variable.
> >
> > I think that would allow for the arch optimizations that you are trying to
> > achieve, as well as remove the side effect that might cause issues down the
> > road.
>
> Attached patch implements this suggestion.

Please help me out here.

Why on earth are we even discussing making this change to code that
normally never executes? Performance is not a consideration here.

What am I missing here? Is there some sort of forward-progress
issue that this change addresses?

							Thanx, Paul

> Uros.
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index b10b8349bb2a..229263ebba3b 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -709,6 +709,12 @@ static void print_cpu_stall(unsigned long gps)
> 	set_preempt_need_resched();
> }
>
> +#define cmpxchg_success(ptr, old, new)		\
> +({						\
> +	__typeof__(*(ptr)) __tmp = (old);	\
> +	try_cmpxchg((ptr), &__tmp, (new));	\
> +})
> +
> static void check_cpu_stall(struct rcu_data *rdp)
> {
> 	bool didstall = false;
> @@ -760,7 +766,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
> 	jn = jiffies + ULONG_MAX / 2;
> 	if (rcu_gp_in_progress() &&
> 	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
> -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +	    cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {
>
> 		/*
> 		 * If a virtual machine is stopped by the host it can look to
> @@ -778,7 +784,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>
> 	} else if (rcu_gp_in_progress() &&
> 		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
> -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +	    cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {
>
> 		/*
> 		 * If a virtual machine is stopped by the host it can look to
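[Editor's note: the behavioral difference the macro above papers over can be demonstrated in user space. This sketch models `try_cmpxchg` with C11 `atomic_compare_exchange_strong` (which has the same write-back-on-failure semantics); `try_cmpxchg_model` and `cmpxchg_success_model` are invented names, and the monomorphic `unsigned long` temporary stands in for the kernel's `__typeof__(*(ptr))`.]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Model of try_cmpxchg(): on failure, the observed value is written
 * back through 'old' -- the side effect Steven and Joel want to hide. */
static bool try_cmpxchg_model(_Atomic unsigned long *ptr,
			      unsigned long *old, unsigned long new)
{
	return atomic_compare_exchange_strong(ptr, old, new);
}

/* Model of the cmpxchg_success() macro from the patch: the hidden
 * __tmp absorbs the write-back, so the caller's expected value is
 * never modified (statement-expression is a GCC/Clang extension). */
#define cmpxchg_success_model(ptr, old, new)		\
({							\
	unsigned long __tmp = (old);			\
	try_cmpxchg_model((ptr), &__tmp, (new));	\
})
```

A failed `try_cmpxchg_model` overwrites the caller's `old`, while a failed `cmpxchg_success_model` leaves it untouched, matching `cmpxchg(ptr, old, new) == old` exactly.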
> On Mar 1, 2023, at 3:08 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, Mar 01, 2023 at 07:43:34PM +0100, Uros Bizjak wrote:
>> On Wed, Mar 1, 2023 at 5:38 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>>>
>>> On Wed, 1 Mar 2023 11:28:47 +0100
>>> Uros Bizjak <ubizjak@gmail.com> wrote:
>>>
>>>> These benefits are the reason the change to try_cmpxchg was accepted
>>>> also in the linear code elsewhere in the linux kernel, e.g. [2,3] to
>>>> name a few commits, with a thumbs-up and a claim that the new code is
>>>> actually *clearer* at the merge commit [4].
>>>
>>> I'll say it here too. I really like Joel's suggestion of having a
>>> cmpxchg_success() that does not have the added side effect of modifying the
>>> old variable.
>>>
>>> I think that would allow for the arch optimizations that you are trying to
>>> achieve, as well as remove the side effect that might cause issues down the
>>> road.
>>
>> Attached patch implements this suggestion.
>
> Please help me out here.
>
> Why on earth are we even discussing making this change to code that
> normally never executes? Performance is not a consideration here.
>
> What am I missing here? Is there some sort of forward-progress
> issue that this change addresses?

I do not think it is anything with performance. The suggestion just
makes the code easier to read. In the case of ftrace (not RCU), it
results in further deleted lines of code. Maybe it got confusing because
we are discussing the change as it applies to both ftrace and RCU.

You could argue that it has to do with performance in the fast path, but
it is probably down in the noise.

- Joel

>
> Thanx, Paul
>
>> Uros.
>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
>> index b10b8349bb2a..229263ebba3b 100644
>> --- a/kernel/rcu/tree_stall.h
>> +++ b/kernel/rcu/tree_stall.h
>> @@ -709,6 +709,12 @@ static void print_cpu_stall(unsigned long gps)
>> 	set_preempt_need_resched();
>> }
>>
>> +#define cmpxchg_success(ptr, old, new)		\
>> +({						\
>> +	__typeof__(*(ptr)) __tmp = (old);	\
>> +	try_cmpxchg((ptr), &__tmp, (new));	\
>> +})
>> +
>> static void check_cpu_stall(struct rcu_data *rdp)
>> {
>> 	bool didstall = false;
>> @@ -760,7 +766,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>> 	jn = jiffies + ULONG_MAX / 2;
>> 	if (rcu_gp_in_progress() &&
>> 	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
>> -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>> +	    cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {
>>
>> 		/*
>> 		 * If a virtual machine is stopped by the host it can look to
>> @@ -778,7 +784,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
>>
>> 	} else if (rcu_gp_in_progress() &&
>> 		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
>> -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>> +	    cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {
>>
>> 		/*
>> 		 * If a virtual machine is stopped by the host it can look to
On Wed, 1 Mar 2023 12:08:20 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > Attached patch implements this suggestion.
>
> Please help me out here.
>
> Why on earth are we even discussing making this change to code that
> normally never executes? Performance is not a consideration here.
>
> What am I missing here? Is there some sort of forward-progress
> issue that this change addresses?

Well, we sorta hijacked this thread. It turned into a more general
discussion, as there is code that this change will be useful for
(ring_buffer.c), but we just happen to be having the discussion here.

Where it will at most remove some text and give you back a few extra
bytes of memory ;-)

But if we do use cmpxchg_success() IMHO, it does improve readability.

> -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> +	    cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {

-- Steve
On Wed, Mar 01, 2023 at 03:18:26PM -0500, Steven Rostedt wrote:
> On Wed, 1 Mar 2023 12:08:20 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > > Attached patch implements this suggestion.
> >
> > Please help me out here.
> >
> > Why on earth are we even discussing making this change to code that
> > normally never executes? Performance is not a consideration here.
> >
> > What am I missing here? Is there some sort of forward-progress
> > issue that this change addresses?
>
> Well, we sorta hijacked this thread. It turned into a more general
> discussion, as there is code that this change will be useful for
> (ring_buffer.c), but we just happen to be having the discussion here.
>
> Where it will at most remove some text and give you back a few extra bytes
> of memory ;-)
>
> But if we do use cmpxchg_success() IMHO, it does improve readability.
>
> > -	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> > +	    cmpxchg_success(&rcu_state.jiffies_stall, js, jn)) {

Some years down the road, should cmpxchg_success() be on the tip of
the tongue of every kernel hacker, perhaps. Or perhaps not.

In the meantime, we have yet another abysmally documented atomic
operation that is not well known throughout the community. And then the
people coming across this curse everyone who had anything to do with it,
as they search the source code, dig through assembly output, and so on
trying to work out exactly what this thing does.

Sorry, but no way. Again, unless there is some sort of forward-progress
argument or similar convincing argument.

							Thanx, Paul
On Wed, 1 Mar 2023 12:36:45 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> Some years down the road, should cmpxchg_success() be on the tip of
> the tongue of every kernel hacker, perhaps. Or perhaps not.

A bit of a catch-22 I would say. It will only become something everyone
knows if it exists.

> In the meantime, we have yet another abysmally documented atomic

Is it?

> operation that is not well known throughout the community. And then the
> people coming across this curse everyone who had anything to do with it,
> as they search the source code, dig through assembly output, and so on
> trying to work out exactly what this thing does.
>
> Sorry, but no way.
>
> Again, unless there is some sort of forward-progress argument or
> similar convincing argument.

Speaking of forward progress...

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/atomic_t.txt#n316

Anyway, I'm guessing this will not become part of rcu any time soon. But
for the ring buffer, I would happily take it.

-- Steve
On Wed, Mar 01, 2023 at 03:46:41PM -0500, Steven Rostedt wrote:
> On Wed, 1 Mar 2023 12:36:45 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > Some years down the road, should cmpxchg_success() be on the tip of
> > the tongue of every kernel hacker, perhaps. Or perhaps not.
>
> A bit of a catch-22 I would say. It will only become something everyone
> knows if it exists.
>
> > In the meantime, we have yet another abysmally documented atomic
>
> Is it?

It sure is.

> > operation that is not well known throughout the community. And then the
> > people coming across this curse everyone who had anything to do with it,
> > as they search the source code, dig through assembly output, and so on
> > trying to work out exactly what this thing does.
> >
> > Sorry, but no way.
> >
> > Again, unless there is some sort of forward-progress argument or
> > similar convincing argument.
>
> Speaking of forward progress...
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/atomic_t.txt#n316

Yes, that is probably the best starting point. And if you have been
around long enough, you know that this is in fact the best starting
point. Plus if you can correctly interpret the words in that document.
And if you are familiar with the entire document.

But otherwise, good luck learning the semantics for something like
atomic_fetch_add_release().

> Anyway, I'm guessing this will not become part of rcu any time soon. But
> for the ring buffer, I would happily take it.

Certainly not unless someone comes up with a good reason for it.

							Thanx, Paul
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b10b8349bb2a..d81c88e66b42 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -760,7 +760,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	jn = jiffies + ULONG_MAX / 2;
 	if (rcu_gp_in_progress() &&
 	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
-	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
+	    try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
 
 		/*
 		 * If a virtual machine is stopped by the host it can look to
@@ -778,7 +778,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
 
 	} else if (rcu_gp_in_progress() &&
 		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
-	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
+	    try_cmpxchg(&rcu_state.jiffies_stall, &js, jn)) {
 
 		/*
 		 * If a virtual machine is stopped by the host it can look to
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
check_cpu_stall. x86 CMPXCHG instruction returns success in ZF flag, so
this change saves a compare after cmpxchg (and related move instruction
in front of cmpxchg).

No functional change intended.

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 kernel/rcu/tree_stall.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
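[Editor's note: the before/after idioms from the commit message, modeled in user space with C11 atomics. `jiffies_stall` here is a plain global standing in for `rcu_state.jiffies_stall`, and the two function names are invented for illustration; the kernel's real primitives are arch-specific.]

```c
#include <stdatomic.h>
#include <stdbool.h>

static _Atomic unsigned long jiffies_stall;

/* Before: cmpxchg(ptr, js, jn) == js -- compare the returned old value
 * against the expected one, an extra compare the patch eliminates. */
static bool update_old_style(unsigned long js, unsigned long jn)
{
	unsigned long old = js;

	atomic_compare_exchange_strong(&jiffies_stall, &old, jn);
	return old == js;	/* success iff the observed value was js */
}

/* After: try_cmpxchg(ptr, &js, jn) -- success is reported directly
 * (on x86 the kernel version reads it straight from ZF). */
static bool update_new_style(unsigned long js, unsigned long jn)
{
	return atomic_compare_exchange_strong(&jiffies_stall, &js, jn);
}
```

Both functions succeed and fail under exactly the same conditions; the second simply avoids re-deriving success from the returned old value, which is the codegen win the commit message describes.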