Message ID | 20230228175929.7534-1-ubizjak@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Improve trace/ring_buffer.c | expand |
On Tue, 28 Feb 2023 18:59:26 +0100 Uros Bizjak <ubizjak@gmail.com> wrote: > This series improves ring_buffer.c by changing the type of some > static functions to void or bool and Well, it's not really an improvement unless it has a functional change. But I'm OK with taking these. > uses try_cmpxchg instead of > cmpxchg (*ptr, old, new) == old where appropriate. Now, the try_cmpxchg() could be an improvement on x86. -- Steve > > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Masami Hiramatsu <mhiramat@kernel.org> > > Uros Bizjak (3): > ring_buffer: Change some static functions to void > ring_buffer: Change some static functions to bool > ring_buffer: Use try_cmpxchg instead of cmpxchg > > kernel/trace/ring_buffer.c | 89 ++++++++++++++++---------------------- > 1 file changed, 37 insertions(+), 52 deletions(-) >
On Tue, Feb 28, 2023 at 8:35 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Tue, 28 Feb 2023 18:59:26 +0100 > Uros Bizjak <ubizjak@gmail.com> wrote: > > > This series improves ring_buffer.c by changing the type of some > > static functions to void or bool and > > Well, it's not really an improvement unless it has a functional change. But > I'm OK with taking these. "No functional changes intended" claim should be taken in the sense that the same functionality is achieved in a more optimal way. As a trivial example, changing some functions to bool eliminates many unnecessary zero-extensions and results in smaller code: size ring_buffer-*.o text data bss dec hex filename 25599 1762 4 27365 6ae5 ring_buffer-vanilla.o 25551 1762 4 27317 6ab5 ring_buffer-bool.o Please also note that setting the output value with "SETcc r8" results in a partial register stall on x86 when returning r32. The compiler knows how to handle this issue (using register dependency breaking XOR in front of SETcc), but it is better to avoid the issue altogether. So, there are also secondary benefits of the above "No functional changes intended" change, besides lower code size. > > uses try_cmpxchg instead of > > cmpxchg (*ptr, old, new) == old where appropriate. > > Now, the try_cmpxchg() could be an improvement on x86. True, the concrete example is e.g. ring_buffer_record_off, where the cmpxchg loop improves from: 78: 8b 57 08 mov 0x8(%rdi),%edx 7b: 89 d6 mov %edx,%esi 7d: 89 d0 mov %edx,%eax 7f: 81 ce 00 00 10 00 or $0x100000,%esi 85: f0 0f b1 31 lock cmpxchg %esi,(%rcx) 89: 39 d0 cmp %edx,%eax 8b: 75 eb jne 78 <ring_buffer_record_off+0x8> to: 1bb: 89 c2 mov %eax,%edx 1bd: 81 ca 00 00 10 00 or $0x100000,%edx 1c3: f0 0f b1 17 lock cmpxchg %edx,(%rdi) 1c7: 75 f2 jne 1bb <ring_buffer_record_off+0xb> As can be demonstrated above, the move to a temporary reg and the comparison with said temporary are both eliminated. The above code is generated with gcc-10.3.1 to show the direct benefits of the change. gcc-12+ is able to use likely/unlikely annotations inside try_cmpxchg definition and generates fast paths through the code (as mentioned by you in the RCU patch-set thread). Please also note that try_cmpxchg generates better code also for targets that do not define try_cmpxchg and use fallback code, as reported in [1] and follow-up messages. The API of try_cmpxhg is written in such a way that benefits loops and also linear code. I will discuss this in the reply of PATCH 3/3. [1] https://lore.kernel.org/lkml/871qwgmqws.fsf@mpe.ellerman.id.au/ Uros.