mbox series

[0/3] Improve trace/ring_buffer.c

Message ID 20230228175929.7534-1-ubizjak@gmail.com (mailing list archive)
Headers show
Series Improve trace/ring_buffer.c | expand

Message

Uros Bizjak Feb. 28, 2023, 5:59 p.m. UTC
This series improves ring_buffer.c by changing the type of some
static functions to void or bool and uses try_cmpxchg instead of
cmpxchg (*ptr, old, new) == old where appropriate.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>

Uros Bizjak (3):
  ring_buffer: Change some static functions to void
  ring_buffer: Change some static functions to bool
  ring_buffer: Use try_cmpxchg instead of cmpxchg

 kernel/trace/ring_buffer.c | 89 ++++++++++++++++----------------------
 1 file changed, 37 insertions(+), 52 deletions(-)

Comments

Steven Rostedt Feb. 28, 2023, 7:35 p.m. UTC | #1
On Tue, 28 Feb 2023 18:59:26 +0100
Uros Bizjak <ubizjak@gmail.com> wrote:

> This series improves ring_buffer.c by changing the type of some
> static functions to void or bool and

Well, it's not really an improvement unless it has a functional change. But
I'm OK with taking these.

> uses try_cmpxchg instead of
> cmpxchg (*ptr, old, new) == old where appropriate.

Now, the try_cmpxchg() could be an improvement on x86.

-- Steve

> 
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> 
> Uros Bizjak (3):
>   ring_buffer: Change some static functions to void
>   ring_buffer: Change some static functions to bool
>   ring_buffer: Use try_cmpxchg instead of cmpxchg
> 
>  kernel/trace/ring_buffer.c | 89 ++++++++++++++++----------------------
>  1 file changed, 37 insertions(+), 52 deletions(-)
>
Uros Bizjak March 1, 2023, 8:35 a.m. UTC | #2
On Tue, Feb 28, 2023 at 8:35 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 28 Feb 2023 18:59:26 +0100
> Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > This series improves ring_buffer.c by changing the type of some
> > static functions to void or bool and
>
> Well, it's not really an improvement unless it has a functional change. But
> I'm OK with taking these.

"No functional changes intended" claim should be taken in the sense
that the same functionality is achieved in a more optimal way. As a
trivial example, changing some functions to bool eliminates many
unnecessary zero-extensions and results in smaller code:

size ring_buffer-*.o
  text    data     bss     dec     hex filename
  25599    1762       4   27365    6ae5 ring_buffer-vanilla.o
  25551    1762       4   27317    6ab5 ring_buffer-bool.o

Please also note that setting the output value with "SETcc r8" results
in a partial register stall on x86 when returning r32. The compiler
knows how to handle this issue (using register dependency breaking XOR
in front of SETcc), but it is better to avoid the issue altogether.
So, there are also secondary benefits of the above "No functional
changes intended" change, besides lower code size.

> > uses try_cmpxchg instead of
> > cmpxchg (*ptr, old, new) == old where appropriate.
>
> Now, the try_cmpxchg() could be an improvement on x86.

True, the concrete example is e.g. ring_buffer_record_off, where the
cmpxchg loop improves from:

      78:    8b 57 08                 mov    0x8(%rdi),%edx
      7b:    89 d6                    mov    %edx,%esi
      7d:    89 d0                    mov    %edx,%eax
      7f:    81 ce 00 00 10 00        or     $0x100000,%esi
      85:    f0 0f b1 31              lock cmpxchg %esi,(%rcx)
      89:    39 d0                    cmp    %edx,%eax
      8b:    75 eb                    jne    78 <ring_buffer_record_off+0x8>

to:

     1bb:    89 c2                    mov    %eax,%edx
     1bd:    81 ca 00 00 10 00        or     $0x100000,%edx
     1c3:    f0 0f b1 17              lock cmpxchg %edx,(%rdi)
     1c7:    75 f2                    jne    1bb <ring_buffer_record_off+0xb>

As can be demonstrated above, the move to a temporary reg and the
comparison with said temporary are both eliminated.

The above code is generated with gcc-10.3.1 to show the direct
benefits of the change. gcc-12+ is able to use likely/unlikely
annotations inside try_cmpxchg definition and generates fast paths
through the code (as mentioned by you in the RCU patch-set thread).

Please also note that try_cmpxchg generates better code also for
targets that do not define try_cmpxchg and use fallback code, as
reported in [1] and follow-up messages.

The API of try_cmpxhg is written in such a way that benefits loops and
also linear code. I will discuss this in the reply of PATCH 3/3.

[1] https://lore.kernel.org/lkml/871qwgmqws.fsf@mpe.ellerman.id.au/

Uros.