[08/17] alpha: Implement xor_unlock_is_negative_byte

Message ID	20230915183707.2707298-9-willy@infradead.org (mailing list archive)
State	New, archived
Headers	Return-Path: <linux-fsdevel-owner@vger.kernel.org> From: "Matthew Wilcox (Oracle)" <willy@infradead.org> To: linux-kernel@vger.kernel.org Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, torvalds@linux-foundation.org, Nicholas Piggin <npiggin@gmail.com> Subject: [PATCH 08/17] alpha: Implement xor_unlock_is_negative_byte Date: Fri, 15 Sep 2023 19:36:58 +0100 Message-Id: <20230915183707.2707298-9-willy@infradead.org> In-Reply-To: <20230915183707.2707298-1-willy@infradead.org> References: <20230915183707.2707298-1-willy@infradead.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Add folio_end_read [00/17] Add folio_end_read [01/17] iomap: Hold state_lock over call to ifs_set_range_uptodate() [02/17] iomap: Protect read_bytes_pending with the state_lock [03/17] mm: Add folio_end_read() [04/17] ext4: Use folio_end_read() [05/17] buffer: Use folio_end_read() [06/17] iomap: Use folio_end_read() [07/17] bitops: Add xor_unlock_is_negative_byte() [08/17] alpha: Implement xor_unlock_is_negative_byte [09/17] m68k: Implement xor_unlock_is_negative_byte [10/17] mips: Implement xor_unlock_is_negative_byte [11/17] powerpc: Implement arch_xor_unlock_is_negative_byte on 32-bit [12/17] riscv: Implement xor_unlock_is_negative_byte [13/17] s390: Implement arch_xor_unlock_is_negative_byte [14/17] mm: Delete checks for xor_unlock_is_negative_byte() [15/17] mm: Add folio_xor_flags_has_waiters() [16/17] mm: Make __end_folio_writeback() return void [17/17] mm: Use folio_xor_flags_has_waiters() in folio_end_writeback()

Matthew Wilcox Sept. 15, 2023, 6:36 p.m. UTC

Inspired by the alpha clear_bit() and arch_atomic_add_return(), this
will surely be more efficient than the generic one defined in filemap.c.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/alpha/include/asm/bitops.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
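
For readers without the patch body at hand, the operation being implemented can be approximated in portable C11. This is only a userspace sketch, not the kernel code: the function names, the `_Atomic unsigned long` type, and the assumption that the tested bit is bit 7 of the low byte are all illustrative. The second function is a rough analogue of the ldl_l/stl_c retry loop used in the alpha version.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: xor the mask into *p with release ordering and
 * report whether bit 7 (the sign bit of the low byte) was set. */
static bool xor_unlock_is_negative_byte_model(unsigned long mask,
                                              _Atomic unsigned long *p)
{
    /* atomic_fetch_xor returns the value held *before* the xor. */
    unsigned long old = atomic_fetch_xor_explicit(p, mask,
                                                  memory_order_release);
    return (old & 0x80UL) != 0;
}

/* The same operation written as a retry loop, loosely mirroring a
 * load-locked / store-conditional sequence such as ldl_l / stl_c. */
static bool xor_unlock_is_negative_byte_loop(unsigned long mask,
                                             _Atomic unsigned long *p)
{
    unsigned long old = atomic_load_explicit(p, memory_order_relaxed);

    /* On failure, 'old' is refreshed with the current value and the
     * xor is recomputed, just as the asm branches back to label 1. */
    while (!atomic_compare_exchange_weak_explicit(p, &old, old ^ mask,
                                                  memory_order_release,
                                                  memory_order_relaxed))
        ;
    return (old & 0x80UL) != 0;
}
```

Both variants test the pre-xor value; as long as the mask leaves bit 7 alone, testing the post-xor value would give the same answer.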

Linus Torvalds Sept. 16, 2023, 12:27 a.m. UTC | #1

On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> +       "1:     ldl_l %0,%4\n"
> +       "       xor %0,%3,%0\n"
> +       "       xor %0,%3,%2\n"
> +       "       stl_c %0,%1\n"

What an odd thing to do.

Why don't you just save the old value? That double xor looks all kinds
of strange, and is a data dependency for no good reason that I can
see.

Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead?

Not that I think alpha matters, but since I was looking through the
series, this just made me go "Whaa?"

                Linus

Matthew Wilcox Sept. 16, 2023, 12:38 a.m. UTC | #2

On Fri, Sep 15, 2023 at 05:27:17PM -0700, Linus Torvalds wrote:
> On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle)
> <willy@infradead.org> wrote:
> >
> > +       "1:     ldl_l %0,%4\n"
> > +       "       xor %0,%3,%0\n"
> > +       "       xor %0,%3,%2\n"
> > +       "       stl_c %0,%1\n"
> 
> What an odd thing to do.
> 
> Why don't you just save the old value? That double xor looks all kinds
> of strange, and is a data dependency for no good reason that I can
> see.
> 
> Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead?
> 
> Not that I think alpha matters, but since I was looking through the
> series, this just made me go "Whaa?"

Well, this is my first time writing Alpha assembler ;-)  I stole this
from ATOMIC_OP_RETURN:

        "1:     ldl_l %0,%1\n"                                          \
        "       " #asm_op " %0,%3,%2\n"                                 \
        "       " #asm_op " %0,%3,%0\n"                                 \
        "       stl_c %0,%1\n"                                          \
        "       beq %0,2f\n"                                            \
        ".subsection 2\n"                                               \
        "2:     br 1b\n"                                                \
        ".previous"                                                     \

but yes, mov would do the trick here.  Is it really faster than xor?

Linus Torvalds Sept. 16, 2023, 2:01 a.m. UTC | #3

On Fri, 15 Sept 2023 at 17:38, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Sep 15, 2023 at 05:27:17PM -0700, Linus Torvalds wrote:
> > On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle)
> > <willy@infradead.org> wrote:
> > >
> > > +       "1:     ldl_l %0,%4\n"
> > > +       "       xor %0,%3,%0\n"
> > > +       "       xor %0,%3,%2\n"
> > > +       "       stl_c %0,%1\n"
> >
> > What an odd thing to do.
> >
> > Why don't you just save the old value? That double xor looks all kinds
> > of strange, and is a data dependency for no good reason that I can
> > see.
> >
> > Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead?
> >
> > Not that I think alpha matters, but since I was looking through the
> > series, this just made me go "Whaa?"
>
> Well, this is my first time writing Alpha assembler ;-)  I stole this
> from ATOMIC_OP_RETURN:
>
>         "1:     ldl_l %0,%1\n"                                          \
>         "       " #asm_op " %0,%3,%2\n"                                 \
>         "       " #asm_op " %0,%3,%0\n"                                 \

Note how that does "orig" assignment first (ie the '%2' destination is
the first instruction), unlike your version.

So in that ATOMIC_OP_RETURN, it does indeed do the same ALU op twice,
but there's no data dependency between the two, so they can execute in
parallel.

> but yes, mov would do the trick here.  Is it really faster than xor?

No, I think "mov src,dst" is just a pseudo-op for "or src,src,dst",
there's no actual "mov" instruction, iirc.

So it's an ALU op too.

What makes your version expensive is the data dependency, not the ALU op.

So the *odd* thing is not that you have two xor's per se, but how you
create the original value by xor'ing the value once, and then xoring
the new value with the same mask, giving you the original value back -
but with that odd data dependency so that it won't schedule in the
same cycle.

Does any of this matter? Nope. It's alpha. There's probably a handful
of machines, and it's maybe one extra cycle. It's really the oddity
that threw me.

In ATOMIC_OP_RETURN, the reason it does that op twice is simply that
it wants to return the new value. But you literally made it return the
*old* value by doing an xor twice in succession, which reverses the
bits twice.

Was that really what you intended?

               Linus
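
The difference between the two instruction orders can be modelled in plain C. This is an illustrative sketch only: the struct and function names are made up here, `stored` stands for operand %0 (the value written back by stl_c) and `returned` stands for operand %2.

```c
#include <assert.h>

struct xor_result {
    unsigned long stored;    /* models %0, written back by stl_c */
    unsigned long returned;  /* models %2, the function's result */
};

/* Order as posted in the patch: the second xor reads the result of
 * the first, so the two cannot issue in the same cycle. */
static struct xor_result xor_as_posted(unsigned long loaded,
                                       unsigned long mask)
{
    struct xor_result r;
    r.stored = loaded ^ mask;       /* xor %0,%3,%0 */
    r.returned = r.stored ^ mask;   /* xor %0,%3,%2: flips the bits back */
    return r;
}

/* Order used by ATOMIC_OP_RETURN: both xors read the loaded value,
 * so neither depends on the other. */
static struct xor_result xor_as_in_atomic_op_return(unsigned long loaded,
                                                    unsigned long mask)
{
    struct xor_result r;
    r.returned = loaded ^ mask;     /* xor %0,%3,%2 */
    r.stored = loaded ^ mask;       /* xor %0,%3,%0 */
    return r;
}
```

With a loaded value of 0x81 and a mask of 0x01, both orders store 0x80, but the posted order returns 0x81 (the old value) while the ATOMIC_OP_RETURN order returns 0x80 (the new value).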

Linus Torvalds Sept. 16, 2023, 2:14 a.m. UTC | #4

On Fri, 15 Sept 2023 at 19:01, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> No, I think "mov src,dst" is just a pseudo-op for "or src,src,dst",
> there's no actual "mov" instruction, iirc.

Bah. I looked it up. It's actually supposed to be "BIS r31,src,dst".

Where "BIS" is indeed what most sane people call just "or". I think
it's "BIt Set", but the assembler will accept the normal "or" mnemonic
too.

There's BIC ("BIt Clear") too. Also known as "and with complement".

I assume it comes from some VAX background. Or maybe it's just a NIH
thing and alpha wanted to be "special".

              Linus
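
As a small illustration of the mnemonics mentioned above, the following C helpers model BIS and BIC as plain bitwise operations, and "mov" as BIS with the always-zero register r31 as one source. The function names are invented for this sketch and only the arithmetic is modelled, not instruction encoding or scheduling.

```c
#include <assert.h>
#include <stdint.h>

/* BIS ("bit set"): ordinary bitwise or. */
static uint64_t bis(uint64_t a, uint64_t b)
{
    return a | b;
}

/* BIC ("bit clear"): and with the complement of the second operand. */
static uint64_t bic(uint64_t a, uint64_t b)
{
    return a & ~b;
}

/* "mov src,dst" modelled as "bis r31,src,dst"; r31 always reads as 0. */
static uint64_t mov_via_bis(uint64_t src)
{
    const uint64_t r31 = 0;
    return bis(r31, src);
}
```

Or-ing with zero leaves the source unchanged, which is why the copy still occupies an ALU slot like any other logical operation.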

Matthew Wilcox Sept. 16, 2023, 3:59 p.m. UTC | #5

On Fri, Sep 15, 2023 at 07:01:14PM -0700, Linus Torvalds wrote:
> On Fri, 15 Sept 2023 at 17:38, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Sep 15, 2023 at 05:27:17PM -0700, Linus Torvalds wrote:
> > > On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle)
> > > <willy@infradead.org> wrote:
> > > >
> > > > +       "1:     ldl_l %0,%4\n"
> > > > +       "       xor %0,%3,%0\n"
> > > > +       "       xor %0,%3,%2\n"
> > > > +       "       stl_c %0,%1\n"
> > >
> > > What an odd thing to do.
> > >
> > > Why don't you just save the old value? That double xor looks all kinds
> > > of strange, and is a data dependency for no good reason that I can
> > > see.
> > >
> > > Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead?
> > >
> > > Not that I think alpha matters, but since I was looking through the
> > > series, this just made me go "Whaa?"
> >
> > Well, this is my first time writing Alpha assembler ;-)  I stole this
> > from ATOMIC_OP_RETURN:
> >
> >         "1:     ldl_l %0,%1\n"                                          \
> >         "       " #asm_op " %0,%3,%2\n"                                 \
> >         "       " #asm_op " %0,%3,%0\n"                                 \
> 
> Note how that does "orig" assignment first (ie the '%2' destination is
> the first instruction), unlike your version.

Wow.  I totally missed that I'd transposed those two lines.  I read
it back with the lines in the order that they should have been in.
Every time I read it.  I was wondering why you were talking about a data
dependency, and I just couldn't see it.  With the lines in the order that
they're actually in, it's quite obvious and totally not what I meant.
Of course, it doesn't matter which order they're in from the point of
view of testing the waiters bit since we don't change the waiters bit.

> Does any of this matter? Nope. It's alpha. There's probably a handful
> of machines, and it's maybe one extra cycle. It's really the oddity
> that threw me.

I'll admit to spending far more time on the m68k version of this than
the alpha version ;-)
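
The point about the waiters bit can be checked exhaustively for a single byte. The sketch below assumes, for illustration, that the tested bit is bit 7 and that the unlock mask does not include it; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if xor-ing 'mask' into any byte value leaves bit 7
 * unchanged, i.e. testing the old or the new value gives the same
 * answer for that bit. */
static bool sign_bit_unaffected(unsigned mask)
{
    for (unsigned v = 0; v < 256; v++) {
        bool before = (v & 0x80u) != 0;
        bool after = ((v ^ mask) & 0x80u) != 0;
        if (before != after)
            return false;
    }
    return true;
}
```

Any mask with bit 7 clear passes this check, so for the purpose of the bit test the order of the two xor instructions affects only scheduling, not the result.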
