Message ID | 20230915183707.2707298-9-willy@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add folio_end_read | expand |
On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle) <willy@infradead.org> wrote: > > + "1: ldl_l %0,%4\n" > + " xor %0,%3,%0\n" > + " xor %0,%3,%2\n" > + " stl_c %0,%1\n" What an odd thing to do. Why don't you just save the old value? That double xor looks all kinds of strange, and is a data dependency for no good reason that I can see. Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead? Not that I think alpha matters, but since I was looking through the series, this just made me go "Whaa?" Linus
On Fri, Sep 15, 2023 at 05:27:17PM -0700, Linus Torvalds wrote: > On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle) > <willy@infradead.org> wrote: > > > > + "1: ldl_l %0,%4\n" > > + " xor %0,%3,%0\n" > > + " xor %0,%3,%2\n" > > + " stl_c %0,%1\n" > > What an odd thing to do. > > Why don't you just save the old value? That double xor looks all kinds > of strange, and is a data dependency for no good reason that I can > see. > > Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead? > > Not that I think alpha matters, but since I was looking through the > series, this just made me go "Whaa?" Well, this is my first time writing Alpha assembler ;-) I stole this from ATOMIC_OP_RETURN: "1: ldl_l %0,%1\n" \ " " #asm_op " %0,%3,%2\n" \ " " #asm_op " %0,%3,%0\n" \ " stl_c %0,%1\n" \ " beq %0,2f\n" \ ".subsection 2\n" \ "2: br 1b\n" \ ".previous" \ but yes, mov would do the trick here. Is it really faster than xor?
On Fri, 15 Sept 2023 at 17:38, Matthew Wilcox <willy@infradead.org> wrote: > > On Fri, Sep 15, 2023 at 05:27:17PM -0700, Linus Torvalds wrote: > > On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle) > > <willy@infradead.org> wrote: > > > > > > + "1: ldl_l %0,%4\n" > > > + " xor %0,%3,%0\n" > > > + " xor %0,%3,%2\n" > > > + " stl_c %0,%1\n" > > > > What an odd thing to do. > > > > Why don't you just save the old value? That double xor looks all kinds > > of strange, and is a data dependency for no good reason that I can > > see. > > > > Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead? > > > > Not that I think alpha matters, but since I was looking through the > > series, this just made me go "Whaa?" > > Well, this is my first time writing Alpha assembler ;-) I stole this > from ATOMIC_OP_RETURN: > > "1: ldl_l %0,%1\n" \ > " " #asm_op " %0,%3,%2\n" \ > " " #asm_op " %0,%3,%0\n" \ Note how that does "orig" assignment first (ie the '%2" destination is the first instruction), unlike your version. So in that ATOMIC_OP_RETURN, it does indeed do the same ALU op twice, but there's no data dependency between the two, so they can execute in parallel. > but yes, mov would do the trick here. Is it really faster than xor? No, I think "mov src,dst" is just a pseudo-op for "or src,src,dst", there's no actual "mov" instruction, iirc. So it's an ALU op too. What makes your version expensive is the data dependency, not the ALU op. So the *odd* thing is not that you have two xor's per se, but how you create the original value by xor'ing the value once, and then xoring the new value with the same mask, giving you the original value back - but with that odd data dependency so that it won't schedule in the same cycle. Does any of this matter? Nope. It's alpha. There's probably a handful of machines, and it's maybe one extra cycle. It's really the oddity that threw me. In ATOMIC_OP_RETURN, the reason it does that op twice is simply that it wants to return the new value. But you literally made it return the *old* value by doing an xor twice in succession, which reverses the bits twice. Was that really what you intended? Linus
On Fri, 15 Sept 2023 at 19:01, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > No, I think "mov src,dst" is just a pseudo-op for "or src,src,dst", > there's no actual "mov" instruction, iirc. Bah. I looked it up. It's actually supposed to be "BIS r31,src,dst". Where "BIS" is indeed what most sane people call just "or". I think it's "BIt Set", but the assembler will accept the normal "or" mnemonic too. There's BIC ("BIt Clear") too. Also known as "and with complement". I assume it comes from some VAX background. Or maybe it's just a NIH thing and alpha wanted to be "special". Linus
On Fri, Sep 15, 2023 at 07:01:14PM -0700, Linus Torvalds wrote: > On Fri, 15 Sept 2023 at 17:38, Matthew Wilcox <willy@infradead.org> wrote: > > > > On Fri, Sep 15, 2023 at 05:27:17PM -0700, Linus Torvalds wrote: > > > On Fri, 15 Sept 2023 at 11:37, Matthew Wilcox (Oracle) > > > <willy@infradead.org> wrote: > > > > > > > > + "1: ldl_l %0,%4\n" > > > > + " xor %0,%3,%0\n" > > > > + " xor %0,%3,%2\n" > > > > + " stl_c %0,%1\n" > > > > > > What an odd thing to do. > > > > > > Why don't you just save the old value? That double xor looks all kinds > > > of strange, and is a data dependency for no good reason that I can > > > see. > > > > > > Why isn't this "ldl_l + mov %0,%2 + xor + stl_c" instead? > > > > > > Not that I think alpha matters, but since I was looking through the > > > series, this just made me go "Whaa?" > > > > Well, this is my first time writing Alpha assembler ;-) I stole this > > from ATOMIC_OP_RETURN: > > > > "1: ldl_l %0,%1\n" \ > > " " #asm_op " %0,%3,%2\n" \ > > " " #asm_op " %0,%3,%0\n" \ > > Note how that does "orig" assignment first (ie the '%2" destination is > the first instruction), unlike your version. Wow. I totally missed that I'd transposed those two lines. I read it back with the lines in the order that they should have been in. Every time I read it. I was wondering why you were talking about a data dependency, and I just couldn't see it. With the lines in the order that they're actually in, it's quite obvious and totally not what I meant. Of course, it doesn't matter which order they're in from the point of view of testing the waiters bit since we don't change the waiters bit. > Does any of this matter? Nope. It's alpha. There's probably a handful > of machines, and it's maybe one extra cycle. It's really the oddity > that threw me. I'll admit to spending far more time on the m68k version of this than the alpha version ;-)
diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h index bafb1c1f0fdc..138930e8a504 100644 --- a/arch/alpha/include/asm/bitops.h +++ b/arch/alpha/include/asm/bitops.h @@ -286,6 +286,27 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr) #define arch_test_bit generic_test_bit #define arch_test_bit_acquire generic_test_bit_acquire +static inline bool xor_unlock_is_negative_byte(unsigned long mask, + volatile unsigned long *p) +{ + unsigned long temp, old; + + __asm__ __volatile__( + "1: ldl_l %0,%4\n" + " xor %0,%3,%0\n" + " xor %0,%3,%2\n" + " stl_c %0,%1\n" + " beq %0,2f\n" + ".subsection 2\n" + "2: br 1b\n" + ".previous" + :"=&r" (temp), "=m" (*p), "=&r" (old) + :"Ir" (mask), "m" (*p)); + + return (old & BIT(7)) != 0; +} +#define xor_unlock_is_negative_byte xor_unlock_is_negative_byte + /* * ffz = Find First Zero in word. Undefined if no zero exists, * so code should check against ~0UL first..
Inspired by the alpha clear_bit() and arch_atomic_add_return(), this will surely be more efficient than the generic one defined in filemap.c. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> --- arch/alpha/include/asm/bitops.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)