diff mbox series

[08/17] alpha: Implement xor_unlock_is_negative_byte

Message ID 20230915183707.2707298-9-willy@infradead.org (mailing list archive)
State New, archived
Headers show
Series Add folio_end_read | expand

Commit Message

Matthew Wilcox Sept. 15, 2023, 6:36 p.m. UTC
Inspired by the alpha clear_bit() and arch_atomic_add_return(), this
will surely be more efficient than the generic one defined in filemap.c.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/alpha/include/asm/bitops.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Linus Torvalds Sept. 16, 2023, 12:27 a.m. UTC | #1
On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> +       "1:     ldl_l %0,%4\n"
> +       "       xor %0,%3,%0\n"
> +       "       xor %0,%3,%2\n"
> +       "       stl_c %0,%1\n"

What an odd thing to do.

Why don't you just save the old value? That double xor looks all kinds
of strange, and is a data dependency for no good reason that I can
see.

Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead?

Not that I think alpha matters, but since I was looking through the
series, this just made me go "Whaa?"

                Linus
Matthew Wilcox Sept. 16, 2023, 12:38 a.m. UTC | #2
On Fri, Sep 15, 2023 at 05:27:17PM -0700, Linus Torvalds wrote:
> On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle)
> <willy@infradead.org> wrote:
> >
> > +       "1:     ldl_l %0,%4\n"
> > +       "       xor %0,%3,%0\n"
> > +       "       xor %0,%3,%2\n"
> > +       "       stl_c %0,%1\n"
> 
> What an odd thing to do.
> 
> Why don't you just save the old value? That double xor looks all kinds
> of strange, and is a data dependency for no good reason that I can
> see.
> 
> Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead?
> 
> Not that I think alpha matters, but since I was looking through the
> series, this just made me go "Whaa?"

Well, this is my first time writing Alpha assembler ;-)  I stole this
from ATOMIC_OP_RETURN:

        "1:     ldl_l %0,%1\n"                                          \
        "       " #asm_op " %0,%3,%2\n"                                 \
        "       " #asm_op " %0,%3,%0\n"                                 \
        "       stl_c %0,%1\n"                                          \
        "       beq %0,2f\n"                                            \
        ".subsection 2\n"                                               \
        "2:     br 1b\n"                                                \
        ".previous"                                                     \

but yes, mov would do the trick here.  Is it really faster than xor?
Linus Torvalds Sept. 16, 2023, 2:01 a.m. UTC | #3
On Fri, 15 Sept 2023 at 17:38, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Sep 15, 2023 at 05:27:17PM -0700, Linus Torvalds wrote:
> > On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle)
> > <willy@infradead.org> wrote:
> > >
> > > +       "1:     ldl_l %0,%4\n"
> > > +       "       xor %0,%3,%0\n"
> > > +       "       xor %0,%3,%2\n"
> > > +       "       stl_c %0,%1\n"
> >
> > What an odd thing to do.
> >
> > Why don't you just save the old value? That double xor looks all kinds
> > of strange, and is a data dependency for no good reason that I can
> > see.
> >
> > Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead?
> >
> > Not that I think alpha matters, but since I was looking through the
> > series, this just made me go "Whaa?"
>
> Well, this is my first time writing Alpha assembler ;-)  I stole this
> from ATOMIC_OP_RETURN:
>
>         "1:     ldl_l %0,%1\n"                                          \
>         "       " #asm_op " %0,%3,%2\n"                                 \
>         "       " #asm_op " %0,%3,%0\n"                                 \

Note how that does "orig" assignment first (ie the '%2" destination is
the first instruction), unlike your version.

So in that ATOMIC_OP_RETURN, it does indeed do the same ALU op twice,
but there's no data dependency between the two, so they can execute in
parallel.

> but yes, mov would do the trick here.  Is it really faster than xor?

No, I think "mov src,dst" is just a pseudo-op for "or src,src,dst",
there's no actual "mov" instruction, iirc.

So it's an ALU op too.

What makes your version expensive is the data dependency, not the ALU op.

So the *odd* thing is not that you have two xor's per se, but how you
create the original value by xor'ing the value once, and then xoring
the new value with the same mask, giving you the original value back -
but with that odd data dependency so that it won't schedule in the
same cycle.

Does any of this matter? Nope. It's alpha. There's probably a handful
of machines, and it's maybe one extra cycle. It's really the oddity
that threw me.

In ATOMIC_OP_RETURN, the reason it does that op twice is simply that
it wants to return the new value. But you literally made it return the
*old* value by doing an xor twice in succession, which reverses the
bits twice.

Was that really what you intended?

               Linus
Linus Torvalds Sept. 16, 2023, 2:14 a.m. UTC | #4
On Fri, 15 Sept 2023 at 19:01, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> No, I think "mov src,dst" is just a pseudo-op for "or src,src,dst",
> there's no actual "mov" instruction, iirc.

Bah. I looked it up. It's actually supposed to be "BIS r31,src,dst".

Where "BIS" is indeed what most sane people call just "or". I think
it's "BIt Set", but the assembler will accept the normal "or" mnemonic
too.

There's BIC ("BIt Clear") too. Also known as "and with complement".

I assume it comes from some VAX background. Or maybe it's just a NIH
thing and alpha wanted to be "special".

              Linus
Matthew Wilcox Sept. 16, 2023, 3:59 p.m. UTC | #5
On Fri, Sep 15, 2023 at 07:01:14PM -0700, Linus Torvalds wrote:
> On Fri, 15 Sept 2023 at 17:38, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Sep 15, 2023 at 05:27:17PM -0700, Linus Torvalds wrote:
> > > On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle)
> > > <willy@infradead.org> wrote:
> > > >
> > > > +       "1:     ldl_l %0,%4\n"
> > > > +       "       xor %0,%3,%0\n"
> > > > +       "       xor %0,%3,%2\n"
> > > > +       "       stl_c %0,%1\n"
> > >
> > > What an odd thing to do.
> > >
> > > Why don't you just save the old value? That double xor looks all kinds
> > > of strange, and is a data dependency for no good reason that I can
> > > see.
> > >
> > > Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead?
> > >
> > > Not that I think alpha matters, but since I was looking through the
> > > series, this just made me go "Whaa?"
> >
> > Well, this is my first time writing Alpha assembler ;-)  I stole this
> > from ATOMIC_OP_RETURN:
> >
> >         "1:     ldl_l %0,%1\n"                                          \
> >         "       " #asm_op " %0,%3,%2\n"                                 \
> >         "       " #asm_op " %0,%3,%0\n"                                 \
> 
> Note how that does "orig" assignment first (ie the '%2" destination is
> the first instruction), unlike your version.

Wow.  I totally missed that I'd transposed those two lines.  I read
it back with the lines in the order that they should have been in.
Every time I read it.  I was wondering why you were talking about a data
dependency, and I just couldn't see it.  With the lines in the order that
they're actually in, it's quite obvious and totally not what I meant.
Of course, it doesn't matter which order they're in from the point of
view of testing the waiters bit since we don't change the waiters bit.

> Does any of this matter? Nope. It's alpha. There's probably a handful
> of machines, and it's maybe one extra cycle. It's really the oddity
> that threw me.

I'll admit to spending far more time on the m68k version of this than
the alpha version ;-)
diff mbox series

Patch

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index bafb1c1f0fdc..138930e8a504 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -286,6 +286,27 @@  arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 #define arch_test_bit generic_test_bit
 #define arch_test_bit_acquire generic_test_bit_acquire
 
+static inline bool xor_unlock_is_negative_byte(unsigned long mask,
+		volatile unsigned long *p)
+{
+	unsigned long temp, old;
+
+	__asm__ __volatile__(
+	"1:	ldl_l %0,%4\n"
+	"	xor %0,%3,%0\n"
+	"	xor %0,%3,%2\n"
+	"	stl_c %0,%1\n"
+	"	beq %0,2f\n"
+	".subsection 2\n"
+	"2:	br 1b\n"
+	".previous"
+	:"=&r" (temp), "=m" (*p), "=&r" (old)
+	:"Ir" (mask), "m" (*p));
+
+	return (old & BIT(7)) != 0;
+}
+#define xor_unlock_is_negative_byte xor_unlock_is_negative_byte
+
 /*
  * ffz = Find First Zero in word. Undefined if no zero exists,
  * so code should check against ~0UL first..