Message ID | 20241120150725.3378-1-ubizjak@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Series | xfs: Use xchg() in xlog_cil_insert_pcp_aggregate() |
On 11/20/24 9:06 AM, Uros Bizjak wrote:
> try_cmpxchg() loop with constant "new" value can be substituted
> with just xchg() to atomically get and clear the location.

You're right.  With a constant new value (0), there is no need
to loop to ensure we get a "stable" update.

Is the READ_ONCE() still needed?

					-Alex

> The code on x86_64 improves from:
>
>     1e7f:	48 89 4c 24 10       	mov    %rcx,0x10(%rsp)
>     1e84:	48 03 14 c5 00 00 00 	add    0x0(,%rax,8),%rdx
>     1e8b:	00
> 			1e88: R_X86_64_32S	__per_cpu_offset
>     1e8c:	8b 02                	mov    (%rdx),%eax
>     1e8e:	41 89 c5             	mov    %eax,%r13d
>     1e91:	31 c9                	xor    %ecx,%ecx
>     1e93:	f0 0f b1 0a          	lock cmpxchg %ecx,(%rdx)
>     1e97:	75 f5                	jne    1e8e <xlog_cil_commit+0x84e>
>     1e99:	48 8b 4c 24 10       	mov    0x10(%rsp),%rcx
>     1e9e:	45 01 e9             	add    %r13d,%r9d
>
> to just:
>
>     1e7f:	48 03 14 cd 00 00 00 	add    0x0(,%rcx,8),%rdx
>     1e86:	00
> 			1e83: R_X86_64_32S	__per_cpu_offset
>     1e87:	31 c9                	xor    %ecx,%ecx
>     1e89:	87 0a                	xchg   %ecx,(%rdx)
>     1e8b:	41 01 cb             	add    %ecx,%r11d
>
> No functional change intended.
>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Cc: Chandan Babu R <chandan.babu@oracle.com>
> Cc: "Darrick J. Wong" <djwong@kernel.org>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log_cil.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 80da0cf87d7a..9d667be1d909 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -171,11 +171,8 @@ xlog_cil_insert_pcp_aggregate(
>  	 */
>  	for_each_cpu(cpu, &ctx->cil_pcpmask) {
>  		struct xlog_cil_pcp *cilpcp = per_cpu_ptr(cil->xc_pcp, cpu);
> -		int old = READ_ONCE(cilpcp->space_used);
>
> -		while (!try_cmpxchg(&cilpcp->space_used, &old, 0))
> -			;
> -		count += old;
> +		count += xchg(&cilpcp->space_used, 0);
>  	}
>  	atomic_add(count, &ctx->space_used);
>  }
On Wed, Nov 20, 2024 at 4:34 PM Alex Elder <elder@riscstar.com> wrote:
>
> On 11/20/24 9:06 AM, Uros Bizjak wrote:
> > try_cmpxchg() loop with constant "new" value can be substituted
> > with just xchg() to atomically get and clear the location.
>
> You're right.  With a constant new value (0), there is no need
> to loop to ensure we get a "stable" update.
>
> Is the READ_ONCE() still needed?

No, xchg() guarantees atomic access on its own.

Uros.
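To make the equivalence concrete, here is a minimal userspace sketch, an illustration only: C11 atomics stand in for the kernel's try_cmpxchg()/xchg(), and the helper names are made up. The compare-exchange loop can only retry when another thread modifies the counter between the load and the swap, but since the value being stored is the constant 0 regardless of what was read, a retry never changes the outcome and a single exchange suffices:

#include <stdatomic.h>

/* What the old code did: load, then loop swapping in 0 until it sticks. */
static int drain_with_cmpxchg(atomic_int *space_used)
{
	int old = atomic_load_explicit(space_used, memory_order_relaxed);

	/* On failure, 'old' is updated to the current value and we retry. */
	while (!atomic_compare_exchange_weak(space_used, &old, 0))
		;
	return old;
}

/* What the new code does: one atomic read-and-clear, no separate load. */
static int drain_with_xchg(atomic_int *space_used)
{
	return atomic_exchange(space_used, 0);
}

Both helpers return the value the counter held just before it was cleared, which is exactly what the aggregation loop needs.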
On 11/20/24 9:36 AM, Uros Bizjak wrote:
> On Wed, Nov 20, 2024 at 4:34 PM Alex Elder <elder@riscstar.com> wrote:
>>
>> On 11/20/24 9:06 AM, Uros Bizjak wrote:
>>> try_cmpxchg() loop with constant "new" value can be substituted
>>> with just xchg() to atomically get and clear the location.
>>
>> You're right.  With a constant new value (0), there is no need
>> to loop to ensure we get a "stable" update.
>>
>> Is the READ_ONCE() still needed?
>
> No, xchg() guarantees atomic access on its own.
>
> Uros.

Based on that:

Reviewed-by: Alex Elder <elder@riscstar.com>
On Wed, Nov 20, 2024 at 04:06:22PM +0100, Uros Bizjak wrote:
> try_cmpxchg() loop with constant "new" value can be substituted
> with just xchg() to atomically get and clear the location.
>
> The code on x86_64 improves from:
>
>     1e7f:	48 89 4c 24 10       	mov    %rcx,0x10(%rsp)
>     1e84:	48 03 14 c5 00 00 00 	add    0x0(,%rax,8),%rdx
>     1e8b:	00
> 			1e88: R_X86_64_32S	__per_cpu_offset
>     1e8c:	8b 02                	mov    (%rdx),%eax
>     1e8e:	41 89 c5             	mov    %eax,%r13d
>     1e91:	31 c9                	xor    %ecx,%ecx
>     1e93:	f0 0f b1 0a          	lock cmpxchg %ecx,(%rdx)
>     1e97:	75 f5                	jne    1e8e <xlog_cil_commit+0x84e>
>     1e99:	48 8b 4c 24 10       	mov    0x10(%rsp),%rcx
>     1e9e:	45 01 e9             	add    %r13d,%r9d
>
> to just:
>
>     1e7f:	48 03 14 cd 00 00 00 	add    0x0(,%rcx,8),%rdx
>     1e86:	00
> 			1e83: R_X86_64_32S	__per_cpu_offset
>     1e87:	31 c9                	xor    %ecx,%ecx
>     1e89:	87 0a                	xchg   %ecx,(%rdx)
>     1e8b:	41 01 cb             	add    %ecx,%r11d
>
> No functional change intended.
>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Cc: Chandan Babu R <chandan.babu@oracle.com>
> Cc: "Darrick J. Wong" <djwong@kernel.org>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log_cil.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 80da0cf87d7a..9d667be1d909 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -171,11 +171,8 @@ xlog_cil_insert_pcp_aggregate(
>  	 */
>  	for_each_cpu(cpu, &ctx->cil_pcpmask) {
>  		struct xlog_cil_pcp *cilpcp = per_cpu_ptr(cil->xc_pcp, cpu);
> -		int old = READ_ONCE(cilpcp->space_used);
>
> -		while (!try_cmpxchg(&cilpcp->space_used, &old, 0))
> -			;
> -		count += old;
> +		count += xchg(&cilpcp->space_used, 0);
>  	}
>  	atomic_add(count, &ctx->space_used);
>  }

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
On Wed, 20 Nov 2024 16:06:22 +0100, Uros Bizjak wrote:
> try_cmpxchg() loop with constant "new" value can be substituted
> with just xchg() to atomically get and clear the location.
>
> The code on x86_64 improves from:
>
>     1e7f:	48 89 4c 24 10       	mov    %rcx,0x10(%rsp)
>     1e84:	48 03 14 c5 00 00 00 	add    0x0(,%rax,8),%rdx
>     1e8b:	00
> 			1e88: R_X86_64_32S	__per_cpu_offset
>     1e8c:	8b 02                	mov    (%rdx),%eax
>     1e8e:	41 89 c5             	mov    %eax,%r13d
>     1e91:	31 c9                	xor    %ecx,%ecx
>     1e93:	f0 0f b1 0a          	lock cmpxchg %ecx,(%rdx)
>     1e97:	75 f5                	jne    1e8e <xlog_cil_commit+0x84e>
>     1e99:	48 8b 4c 24 10       	mov    0x10(%rsp),%rcx
>     1e9e:	45 01 e9             	add    %r13d,%r9d
>
> [...]

Applied to for-next, thanks!

[1/1] xfs: Use xchg() in xlog_cil_insert_pcp_aggregate()
      commit: 214093534f3c046bf5acc9affbf4e6bd9af4538b

Best regards,
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 80da0cf87d7a..9d667be1d909 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -171,11 +171,8 @@ xlog_cil_insert_pcp_aggregate(
 	 */
 	for_each_cpu(cpu, &ctx->cil_pcpmask) {
 		struct xlog_cil_pcp *cilpcp = per_cpu_ptr(cil->xc_pcp, cpu);
-		int old = READ_ONCE(cilpcp->space_used);
 
-		while (!try_cmpxchg(&cilpcp->space_used, &old, 0))
-			;
-		count += old;
+		count += xchg(&cilpcp->space_used, 0);
 	}
 	atomic_add(count, &ctx->space_used);
 }
try_cmpxchg() loop with constant "new" value can be substituted
with just xchg() to atomically get and clear the location.

The code on x86_64 improves from:

    1e7f:	48 89 4c 24 10       	mov    %rcx,0x10(%rsp)
    1e84:	48 03 14 c5 00 00 00 	add    0x0(,%rax,8),%rdx
    1e8b:	00
			1e88: R_X86_64_32S	__per_cpu_offset
    1e8c:	8b 02                	mov    (%rdx),%eax
    1e8e:	41 89 c5             	mov    %eax,%r13d
    1e91:	31 c9                	xor    %ecx,%ecx
    1e93:	f0 0f b1 0a          	lock cmpxchg %ecx,(%rdx)
    1e97:	75 f5                	jne    1e8e <xlog_cil_commit+0x84e>
    1e99:	48 8b 4c 24 10       	mov    0x10(%rsp),%rcx
    1e9e:	45 01 e9             	add    %r13d,%r9d

to just:

    1e7f:	48 03 14 cd 00 00 00 	add    0x0(,%rcx,8),%rdx
    1e86:	00
			1e83: R_X86_64_32S	__per_cpu_offset
    1e87:	31 c9                	xor    %ecx,%ecx
    1e89:	87 0a                	xchg   %ecx,(%rdx)
    1e8b:	41 01 cb             	add    %ecx,%r11d

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Chandan Babu R <chandan.babu@oracle.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_cil.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
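For context, the loop the patch touches implements a drain-and-aggregate pattern: each per-CPU counter is atomically read and reset to zero, and the sum is published to a shared total. Below is a rough userspace sketch of that pattern only, with C11 atomics standing in for the kernel's per-CPU, xchg() and atomic_add() APIs; the array size and all names here are illustrative, not the XFS code:

#include <stdatomic.h>

#define NR_SLOTS 8				/* stand-in for the possible CPUs */

static atomic_int space_used[NR_SLOTS];		/* per-CPU style counters */
static atomic_int total_space_used;		/* shared aggregate */

static void aggregate_space_used(void)
{
	int count = 0;
	int cpu;

	/* Atomically grab-and-clear each slot, then publish the sum once. */
	for (cpu = 0; cpu < NR_SLOTS; cpu++)
		count += atomic_exchange(&space_used[cpu], 0);

	atomic_fetch_add(&total_space_used, count);
}

Summing into a local variable and adding to the shared total once at the end keeps contention on the shared counter to a single atomic operation per aggregation pass, which is the same design the kernel function uses.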