Message ID: 20230915183707.2707298-10-willy@infradead.org (mailing list archive)
State: New, archived
Series: Add folio_end_read
On 16/9/23 04:36, Matthew Wilcox (Oracle) wrote:
> Using EOR to clear the guaranteed-to-be-set lock bit will test the
> negative flag just like the x86 implementation.  This should be
> more efficient than the generic implementation in filemap.c.  It
> would be better if m68k had __GCC_ASM_FLAG_OUTPUTS__.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  arch/m68k/include/asm/bitops.h | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index e984af71df6b..909ebe7cab5d 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -319,6 +319,20 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>  	return test_and_change_bit(nr, addr);
>  }
>
> +static inline bool xor_unlock_is_negative_byte(unsigned long mask,
> +		volatile unsigned long *p)
> +{
> +	char result;
> +	char *cp = (char *)p + 3;	/* m68k is big-endian */
> +
> +	__asm__ __volatile__ ("eor.b %1, %2; smi %0"

The ColdFire members of the 68k family do not support byte size eor:

  CC      mm/filemap.o
{standard input}: Assembler messages:
{standard input}:824: Error: invalid instruction for this architecture; needs 68000 or higher (68000 [68ec000, 68hc000, 68hc001, 68008, 68302, 68306, 68307, 68322, 68356], 68010, 68020 [68k, 68ec020], 68030 [68ec030], 68040 [68ec040], 68060 [68ec060], cpu32 [68330, 68331, 68332, 68333, 68334, 68336, 68340, 68341, 68349, 68360], fidoa [fido]) -- statement `eor.b #1,3(%a0)' ignored

Regards
Greg

> +		: "=d" (result)
> +		: "di" (mask), "o" (*cp)
> +		: "memory");
> +	return result;
> +}
> +#define xor_unlock_is_negative_byte xor_unlock_is_negative_byte
> +
>  /*
>   * The true 68020 and more advanced processors support the "bfffo"
>   * instruction for finding bits.  ColdFire and simple 68000 parts
On Sat, Sep 16, 2023 at 11:11:32PM +1000, Greg Ungerer wrote:
> On 16/9/23 04:36, Matthew Wilcox (Oracle) wrote:
> > Using EOR to clear the guaranteed-to-be-set lock bit will test the
> > negative flag just like the x86 implementation.  This should be
> > more efficient than the generic implementation in filemap.c.  It
> > would be better if m68k had __GCC_ASM_FLAG_OUTPUTS__.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  arch/m68k/include/asm/bitops.h | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> > index e984af71df6b..909ebe7cab5d 100644
> > --- a/arch/m68k/include/asm/bitops.h
> > +++ b/arch/m68k/include/asm/bitops.h
> > @@ -319,6 +319,20 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> >  	return test_and_change_bit(nr, addr);
> >  }
> >
> > +static inline bool xor_unlock_is_negative_byte(unsigned long mask,
> > +		volatile unsigned long *p)
> > +{
> > +	char result;
> > +	char *cp = (char *)p + 3;	/* m68k is big-endian */
> > +
> > +	__asm__ __volatile__ ("eor.b %1, %2; smi %0"
>
> The ColdFire members of the 68k family do not support byte size eor:
>
>   CC      mm/filemap.o
> {standard input}: Assembler messages:
> {standard input}:824: Error: invalid instruction for this architecture; needs 68000 or higher (68000 [68ec000, 68hc000, 68hc001, 68008, 68302, 68306, 68307, 68322, 68356], 68010, 68020 [68k, 68ec020], 68030 [68ec030], 68040 [68ec040], 68060 [68ec060], cpu32 [68330, 68331, 68332, 68333, 68334, 68336, 68340, 68341, 68349, 68360], fidoa [fido]) -- statement `eor.b #1,3(%a0)' ignored

Well, that sucks.  What do you suggest for Coldfire?

(Shame you didn't join in on the original discussion:
https://lore.kernel.org/linux-m68k/ZLmKq2VLjYGBVhMI@casper.infradead.org/ )
On 17/9/23 00:34, Matthew Wilcox wrote:
> On Sat, Sep 16, 2023 at 11:11:32PM +1000, Greg Ungerer wrote:
>> On 16/9/23 04:36, Matthew Wilcox (Oracle) wrote:
>>> Using EOR to clear the guaranteed-to-be-set lock bit will test the
>>> negative flag just like the x86 implementation.  This should be
>>> more efficient than the generic implementation in filemap.c.  It
>>> would be better if m68k had __GCC_ASM_FLAG_OUTPUTS__.
>>>
>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>> ---
>>>  arch/m68k/include/asm/bitops.h | 14 ++++++++++++++
>>>  1 file changed, 14 insertions(+)
>>>
>>> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
>>> index e984af71df6b..909ebe7cab5d 100644
>>> --- a/arch/m68k/include/asm/bitops.h
>>> +++ b/arch/m68k/include/asm/bitops.h
>>> @@ -319,6 +319,20 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>>>  	return test_and_change_bit(nr, addr);
>>>  }
>>>
>>> +static inline bool xor_unlock_is_negative_byte(unsigned long mask,
>>> +		volatile unsigned long *p)
>>> +{
>>> +	char result;
>>> +	char *cp = (char *)p + 3;	/* m68k is big-endian */
>>> +
>>> +	__asm__ __volatile__ ("eor.b %1, %2; smi %0"
>>
>> The ColdFire members of the 68k family do not support byte size eor:
>>
>>   CC      mm/filemap.o
>> {standard input}: Assembler messages:
>> {standard input}:824: Error: invalid instruction for this architecture; needs 68000 or higher (68000 [68ec000, 68hc000, 68hc001, 68008, 68302, 68306, 68307, 68322, 68356], 68010, 68020 [68k, 68ec020], 68030 [68ec030], 68040 [68ec040], 68060 [68ec060], cpu32 [68330, 68331, 68332, 68333, 68334, 68336, 68340, 68341, 68349, 68360], fidoa [fido]) -- statement `eor.b #1,3(%a0)' ignored
>
> Well, that sucks.  What do you suggest for Coldfire?

I am not seeing an easy way to not fall back to something like the MIPS
implementation for ColdFire.  Could obviously assemblerize this to do better
than gcc, but if it has to be atomic I think we are stuck with the irq locking.

static inline bool cf_xor_is_negative_byte(unsigned long mask,
		volatile unsigned long *addr)
{
	unsigned long flags;
	unsigned long data;

	local_irq_save(flags);
	data = *addr;
	*addr = data ^ mask;
	local_irq_restore(flags);

	return (data & BIT(7)) != 0;
}

Regards
Greg

> (Shame you didn't join in on the original discussion:
> https://lore.kernel.org/linux-m68k/ZLmKq2VLjYGBVhMI@casper.infradead.org/ )
> Well, that sucks. What do you suggest for Coldfire?
Can you just do a 32bit xor ?
Unless you've got smp m68k I'd presume it is ok?
(And assuming you aren't falling off a page.)
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
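In C terms, David's "just do a 32bit xor" suggestion amounts to something like the sketch below. This is illustrative code written for this note, not code from the thread; the function name is hypothetical, and as David notes it is only safe in the non-SMP case where nothing else can touch the word concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of the 32-bit xor idea: eor the whole aligned
 * long instead of a single byte, then test bit 7 of the result (the
 * "negative byte" bit the callers care about).  Not atomic - assumes
 * the non-SMP case.
 */
static bool xor_unlock_long_is_negative_byte(unsigned long mask,
					     volatile unsigned long *p)
{
	*p ^= mask;			/* long-sized eor clears the lock bit */
	return (*p & (1UL << 7)) != 0;	/* was bit 7 (the waiters bit) set? */
}
```

Note that testing bit 7 of the long's *value* is endian-independent; the endianness games only matter when the byte is addressed directly, as in the `eor.b` version.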
On Tue, Sep 19, 2023 at 01:23:08PM +0000, David Laight wrote:
> > Well, that sucks.  What do you suggest for Coldfire?
>
> Can you just do a 32bit xor ?
> Unless you've got smp m68k I'd presume it is ok?
> (And assuming you aren't falling off a page.)

Patch welcome.
From: Matthew Wilcox <willy@infradead.org>
> Sent: 19 September 2023 15:26
>
> On Tue, Sep 19, 2023 at 01:23:08PM +0000, David Laight wrote:
> > > Well, that sucks.  What do you suggest for Coldfire?
> >
> > Can you just do a 32bit xor ?
> > Unless you've got smp m68k I'd presume it is ok?
> > (And assuming you aren't falling off a page.)
>
> Patch welcome.

My 68020 book seems to be at work and I'm at home.
(The 286, 386 and cy7c600 (sparc 32) books don't help).

But if the code is trying to do *ptr ^= 0x80 and check the
sign flag then you just need to use eor.l with 0x80000000
on the same address.

All the 68k I used would do misaligned accesses.
Although they can fault mid-instruction on the microcode stack.
Any smp 68020 had to be certain to resume on the same cpu.

	David
On Tue, Sep 19, 2023 at 02:35:25PM +0000, David Laight wrote:
> From: Matthew Wilcox <willy@infradead.org>
> > Sent: 19 September 2023 15:26
> >
> > On Tue, Sep 19, 2023 at 01:23:08PM +0000, David Laight wrote:
> > > > Well, that sucks.  What do you suggest for Coldfire?
> > >
> > > Can you just do a 32bit xor ?
> > > Unless you've got smp m68k I'd presume it is ok?
> > > (And assuming you aren't falling off a page.)
> >
> > Patch welcome.
>
> My 68020 book seems to be at work and I'm at home.
> (The 286, 386 and cy7c600 (sparc 32) books don't help).
>
> But if the code is trying to do *ptr ^= 0x80 and check the
> sign flag then you just need to use eor.l with 0x80000000
> on the same address.

I have a 68020 book; what I don't have is a Coldfire manual.

Anyway, that's not the brief.  We're looking to (eg) clear bit 0
and test whether bit 7 was set.  So it's the sign bit of the byte,
not the sign bit of the int.
From: Matthew Wilcox <willy@infradead.org>
> Sent: 19 September 2023 16:15
>
> On Tue, Sep 19, 2023 at 02:35:25PM +0000, David Laight wrote:
> > From: Matthew Wilcox <willy@infradead.org>
> > > Sent: 19 September 2023 15:26
> > >
> > > On Tue, Sep 19, 2023 at 01:23:08PM +0000, David Laight wrote:
> > > > > Well, that sucks.  What do you suggest for Coldfire?
> > > >
> > > > Can you just do a 32bit xor ?
> > > > Unless you've got smp m68k I'd presume it is ok?
> > > > (And assuming you aren't falling off a page.)
> > >
> > > Patch welcome.
...
> I have a 68020 book; what I don't have is a Coldfire manual.
>
> Anyway, that's not the brief.  We're looking to (eg) clear bit 0
> and test whether bit 7 was set.  So it's the sign bit of the byte,
> not the sign bit of the int.

Use the address of the byte as an int and xor with 1u<<24.
The xor will do a rmw on the three bytes following, but I
doubt that matters.

	David
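David's trick relies on m68k being big-endian: bit 24 of the 32-bit word that starts at the byte's address is bit 0 of that byte. The sketch below (written for this note, not from the thread; the helper names are made up) models a big-endian word with explicit shifts so the arithmetic can be checked on any host.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of David's 1u<<24 trick.  On big-endian m68k the
 * first byte of a 32-bit word holds bits 31..24, so xoring the word
 * with 1u<<24 flips bit 0 of that first byte and leaves the following
 * three bytes untouched (value-wise; the bus still does a long rmw).
 */
static uint32_t be32_load(const uint8_t b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

static void be32_store(uint8_t b[4], uint32_t w)
{
	b[0] = (uint8_t)(w >> 24);
	b[1] = (uint8_t)(w >> 16);
	b[2] = (uint8_t)(w >> 8);
	b[3] = (uint8_t)w;
}
```

So `word ^= 1u << 24` clears (or sets) bit 0 of the target byte; Willy's objection below is not about the arithmetic but about the unaligned long access this implies.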
On Tue, Sep 19, 2023 at 03:22:25PM +0000, David Laight wrote:
> > Anyway, that's not the brief.  We're looking to (eg) clear bit 0
> > and test whether bit 7 was set.  So it's the sign bit of the byte,
> > not the sign bit of the int.
>
> Use the address of the byte as an int and xor with 1u<<24.
> The xor will do a rmw on the three bytes following, but I
> doubt that matters.

Bet you a shiny penny that Coldfire takes an unaligned access trap ...
and besides, this is done on _every_ call to unlock_page().  That might
cross not only a cacheline boundary but also a page boundary.  I cannot
believe that would be a high-performing solution.  It might be just fine
on m68000 but I bet even by the 030 it's lower performing.
From: Matthew Wilcox
> Sent: 19 September 2023 16:47
>
> On Tue, Sep 19, 2023 at 03:22:25PM +0000, David Laight wrote:
> > > Anyway, that's not the brief.  We're looking to (eg) clear bit 0
> > > and test whether bit 7 was set.  So it's the sign bit of the byte,
> > > not the sign bit of the int.
> >
> > Use the address of the byte as an int and xor with 1u<<24.
> > The xor will do a rmw on the three bytes following, but I
> > doubt that matters.
>
> Bet you a shiny penny that Coldfire takes an unaligned access trap ...

and then the 'firmware' silently fixed it up for you a few 1000
clocks later...

> and besides, this is done on _every_ call to unlock_page().  That might
> cross not only a cacheline boundary but also a page boundary.  I cannot
> believe that would be a high-performing solution.  It might be just fine
> on m68000 but I bet even by the 030 it's lower performing.

I do remember managing to use 'cas2' to add an item to a linked list.
But it is so painful to set up it was better just to disable interrupts.
For non-smp that is almost certainly ok.
(Unless the instructions are slow because of synchronisation.)

Otherwise you need to use 'cas' on the aligned word.
Assuming coldfire even has cas.

	David
On 20/9/23 01:57, David Laight wrote:
> From: Matthew Wilcox
>> Sent: 19 September 2023 16:47
>>
>> On Tue, Sep 19, 2023 at 03:22:25PM +0000, David Laight wrote:
>>>> Anyway, that's not the brief.  We're looking to (eg) clear bit 0
>>>> and test whether bit 7 was set.  So it's the sign bit of the byte,
>>>> not the sign bit of the int.
>>>
>>> Use the address of the byte as an int and xor with 1u<<24.
>>> The xor will do a rmw on the three bytes following, but I
>>> doubt that matters.
>>
>> Bet you a shiny penny that Coldfire takes an unaligned access trap ...
>
> and then the 'firmware' silently fixed it up for you a few 1000
> clocks later...
>
>> and besides, this is done on _every_ call to unlock_page().  That might
>> cross not only a cacheline boundary but also a page boundary.  I cannot
>> believe that would be a high-performing solution.  It might be just fine
>> on m68000 but I bet even by the 030 it's lower performing.
>
> I do remember managing to use 'cas2' to add an item to a linked list.
> But it is so painful to set up it was better just to disable interrupts.
> For non-smp that is almost certainly ok.
> (Unless the instructions are slow because of synchronisation.)
> Otherwise you need to use 'cas' on the aligned word.
> Assuming coldfire even has cas.

It doesn't. See CONFIG_CPU_HAS_NO_CAS in arch/m68k/Kconfig.cpu for how
m68k deals with ColdFire and early 68000 parts not having it.

Regards
Greg
On 20/9/23 01:14, Matthew Wilcox wrote:
> On Tue, Sep 19, 2023 at 02:35:25PM +0000, David Laight wrote:
>> From: Matthew Wilcox <willy@infradead.org>
>>> Sent: 19 September 2023 15:26
>>>
>>> On Tue, Sep 19, 2023 at 01:23:08PM +0000, David Laight wrote:
>>>>> Well, that sucks.  What do you suggest for Coldfire?
>>>>
>>>> Can you just do a 32bit xor ?
>>>> Unless you've got smp m68k I'd presume it is ok?
>>>> (And assuming you aren't falling off a page.)
>>>
>>> Patch welcome.
>>
>> My 68020 book seems to be at work and I'm at home.
>> (The 286, 386 and cy7c600 (sparc 32) books don't help).
>>
>> But if the code is trying to do *ptr ^= 0x80 and check the
>> sign flag then you just need to use eor.l with 0x80000000
>> on the same address.
>
> I have a 68020 book; what I don't have is a Coldfire manual.

You can find it here: https://www.nxp.com/docs/en/reference-manual/CFPRM.pdf

Regards
Greg

> Anyway, that's not the brief.  We're looking to (eg) clear bit 0
> and test whether bit 7 was set.  So it's the sign bit of the byte,
> not the sign bit of the int.
On 19/9/23 00:37, Greg Ungerer wrote:
> On 17/9/23 00:34, Matthew Wilcox wrote:
>> On Sat, Sep 16, 2023 at 11:11:32PM +1000, Greg Ungerer wrote:
>>> On 16/9/23 04:36, Matthew Wilcox (Oracle) wrote:
>>>> Using EOR to clear the guaranteed-to-be-set lock bit will test the
>>>> negative flag just like the x86 implementation.  This should be
>>>> more efficient than the generic implementation in filemap.c.  It
>>>> would be better if m68k had __GCC_ASM_FLAG_OUTPUTS__.
>>>>
>>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>>> ---
>>>>  arch/m68k/include/asm/bitops.h | 14 ++++++++++++++
>>>>  1 file changed, 14 insertions(+)
>>>>
>>>> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
>>>> index e984af71df6b..909ebe7cab5d 100644
>>>> --- a/arch/m68k/include/asm/bitops.h
>>>> +++ b/arch/m68k/include/asm/bitops.h
>>>> @@ -319,6 +319,20 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
>>>>  	return test_and_change_bit(nr, addr);
>>>>  }
>>>>
>>>> +static inline bool xor_unlock_is_negative_byte(unsigned long mask,
>>>> +		volatile unsigned long *p)
>>>> +{
>>>> +	char result;
>>>> +	char *cp = (char *)p + 3;	/* m68k is big-endian */
>>>> +
>>>> +	__asm__ __volatile__ ("eor.b %1, %2; smi %0"
>>>
>>> The ColdFire members of the 68k family do not support byte size eor:
>>>
>>>   CC      mm/filemap.o
>>> {standard input}: Assembler messages:
>>> {standard input}:824: Error: invalid instruction for this architecture; needs 68000 or higher (68000 [68ec000, 68hc000, 68hc001, 68008, 68302, 68306, 68307, 68322, 68356], 68010, 68020 [68k, 68ec020], 68030 [68ec030], 68040 [68ec040], 68060 [68ec060], cpu32 [68330, 68331, 68332, 68333, 68334, 68336, 68340, 68341, 68349, 68360], fidoa [fido]) -- statement `eor.b #1,3(%a0)' ignored
>>
>> Well, that sucks.  What do you suggest for Coldfire?
>
> I am not seeing an easy way to not fall back to something like the MIPS
> implementation for ColdFire.  Could obviously assemblerize this to do better
> than gcc, but if it has to be atomic I think we are stuck with the irq locking.
>
> static inline bool cf_xor_is_negative_byte(unsigned long mask,
> 		volatile unsigned long *addr)
> {
> 	unsigned long flags;
> 	unsigned long data;
>
> 	local_irq_save(flags);
> 	data = *addr;
> 	*addr = data ^ mask;
> 	local_irq_restore(flags);
>
> 	return (data & BIT(7)) != 0;
> }

The problem with this C implementation is that it needs to use local_irq_save(),
which results in some ugly header dependencies trying to include irqflags.h.

This version at least compiles and runs, though we can probably do better still.

diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index e984af71df6b..99392c26e784 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -319,6 +319,48 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 	return test_and_change_bit(nr, addr);
 }
 
+static inline bool cf_xor_unlock_is_negative_byte(unsigned long mask,
+		volatile unsigned long *addr)
+{
+	unsigned long data;
+
+	asm volatile (
+		"move.w	%%sr,%%d1	\n\t"
+		"move.w	%%d1,%%d0	\n\t"
+		"ori.l	#0x0700,%%d0	\n\t"
+		"move.w	%%d0,%%sr	\n\t"
+
+		"move.l	%2@,%0		\n\t"
+		"eor.l	%1,%0		\n\t"
+		"move.l	%0,%2@		\n\t"
+
+		"movew	%%d1,%%sr	\n"
+		: "=d" (data)
+		: "di" (mask), "a" (addr)
+		: "cc", "%d0", "%d1", "memory");
+
+	return (data & BIT(7)) != 0;
+}
+
+static inline bool m68k_xor_unlock_is_negative_byte(unsigned long mask,
+		volatile unsigned long *p)
+{
+	char result;
+	char *cp = (char *)p + 3;	/* m68k is big-endian */
+
+	__asm__ __volatile__ ("eor.b %1, %2; smi %0"
+		: "=d" (result)
+		: "di" (mask), "o" (*cp)
+		: "memory");
+	return result;
+}
+
+#if defined(CONFIG_COLDFIRE)
+#define xor_unlock_is_negative_byte(mask, p) cf_xor_unlock_is_negative_byte(mask, p)
+#else
+#define xor_unlock_is_negative_byte(mask, p) m68k_xor_unlock_is_negative_byte(mask, p)
+#endif
+
 /*
  * The true 68020 and more advanced processors support the "bfffo"
  * instruction for finding bits.  ColdFire and simple 68000 parts

Regards
Greg
On Wed, 20 Sept 2023 at 00:45, Greg Ungerer <gregungerer@westnet.com.au> wrote:
>
> The problem with this C implementation is that it needs to use local_irq_save(),
> which results in some ugly header dependencies trying to include irqflags.h.
>
> This version at least compiles and runs, though we can probably do better still.

I was going to say "can't you use CAS?" but apparently coldfire doesn't
have that either. What a horrible thing.

I do wonder if we should just say "let's use the top bit instead"?

The reason we used bit #7 is that

 - only x86 used to do the clear_bit_unlock_is_negative_byte
 - it was easy with a simple "andb".
 - it was just a trivial "move two bits around".

but honestly, on x86, doing it with "andl/andq" shouldn't be any worse,
and we can still use a (sign-extended) 8-bit immediate value to xor the
low seven bits and test the high bit at the same time - so it should be
basically exactly the same code sequence.

There's a question about mixing access widths - which can be deadly to
performance on x86 - but generally this op should be the only op on the
page flags in that sequence, and doing a byte access isn't necessarily
better.

Of course, using the top bit then screws with all the
zone/node/section/lru_gen bits that we currently put in the high bits.
So it's absolutely *not* just a trivial "move this bit" operation. It
would change all the <linux/page-flags-layout.h> games we do.

That might be enough for any sane person to go "this is not worth it",
but *if* Willy goes "I like the bit twiddling games", maybe he'd be
willing to look at that. I mean, he wrote alpha assembler for this, that
certainly says *something* about Willy ;)

Willy?

               Linus
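For concreteness, the "use the top bit instead" variant Linus floats would look roughly like the sketch below. This is illustrative code written for this note, not anything from the thread or the kernel: the function name is made up, and as Linus says the real page-flags layout keeps zone/node/section/lru_gen bits in the high bits, so this is exactly the part that would have to move.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: if the waiters flag lived in the sign bit of the
 * whole word rather than in bit 7, a single long-sized xor could clear
 * the guaranteed-set lock bit and the processor's sign flag would
 * report the waiters bit for free - no byte-sized op needed.
 */
static bool xor_unlock_is_negative_long(unsigned long mask,
					unsigned long *p)
{
	unsigned long old = *p ^ mask;	/* clear the lock bit */

	*p = old;
	return (long)old < 0;		/* sign bit of the word, not a byte */
}
```

On x86 this would map onto a plain `xor` with a sign-extended 8-bit immediate followed by a sign-flag branch, which is the "same code sequence" point above.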
On Wed, Sep 20, 2023 at 05:22:33PM +1000, Greg Ungerer wrote:
> On 20/9/23 01:14, Matthew Wilcox wrote:
> > I have a 68020 book; what I don't have is a Coldfire manual.
>
> You can find it here: https://www.nxp.com/docs/en/reference-manual/CFPRM.pdf

Thanks, Greg.  This is almost good:

static inline bool xor_unlock_is_negative_byte(unsigned long mask,
		volatile unsigned long *p)
{
#ifdef CONFIG_COLDFIRE
	__asm__ __volatile__ ("eorl %1, %0"
		: "+m" (*p)
		: "d" (mask)
		: "memory");
	return *p & (1 << 7);
#else
	char result;
	char *cp = (char *)p + 3;	/* m68k is big-endian */

	__asm__ __volatile__ ("eor.b %1, %2; smi %0"
		: "=d" (result)
		: "di" (mask), "o" (*cp)
		: "memory");
	return result;
#endif
}

folio_end_read() does about as well as can be expected:

00000708 <folio_end_read>:
     708:	206f 0004      	moveal %sp@(4),%a0
     70c:	7009           	moveq #9,%d0
     70e:	4a2f 000b      	tstb %sp@(11)
     712:	6602           	bnes 716 <folio_end_read+0xe>
     714:	7001           	moveq #1,%d0
     716:	b190           	eorl %d0,%a0@
     718:	2010           	movel %a0@,%d0
     71a:	4a00           	tstb %d0
     71c:	6a0c           	bpls 72a <folio_end_read+0x22>
     71e:	42af 0008      	clrl %sp@(8)
     722:	2f48 0004      	movel %a0,%sp@(4)
     726:	6000 fcfe      	braw 426 <folio_wake_bit>
     72a:	4e75           	rts

However, it seems that folio_unlock() could shave off an instruction:

00000918 <folio_unlock>:
     918:	206f 0004      	moveal %sp@(4),%a0
     91c:	7001           	moveq #1,%d0
     91e:	b190           	eorl %d0,%a0@
     920:	2010           	movel %a0@,%d0
     922:	4a00           	tstb %d0
     924:	6a0a           	bpls 930 <folio_unlock+0x18>
     926:	42a7           	clrl %sp@-
     928:	2f08           	movel %a0,%sp@-
     92a:	4eba fafa      	jsr %pc@(426 <folio_wake_bit>)
     92e:	508f           	addql #8,%sp
     930:	4e75           	rts

We could use eori instead of eorl, at least according to table 3-9 on
page 3-8:

EOR Dy,<ea>x	L	Source ^ Destination → Destination	ISA_A
EORI #<data>,Dx	L	Immediate Data ^ Destination → Destination	ISA_A

but gas is unhappy with everything I've tried to use eori.  I'm building
with stmark2_defconfig, which I assume should work.
On 3/10/23 06:07, Matthew Wilcox wrote:
> On Wed, Sep 20, 2023 at 05:22:33PM +1000, Greg Ungerer wrote:
>> On 20/9/23 01:14, Matthew Wilcox wrote:
>>> I have a 68020 book; what I don't have is a Coldfire manual.
>>
>> You can find it here: https://www.nxp.com/docs/en/reference-manual/CFPRM.pdf
>
> Thanks, Greg.  This is almost good:
>
> static inline bool xor_unlock_is_negative_byte(unsigned long mask,
> 		volatile unsigned long *p)
> {
> #ifdef CONFIG_COLDFIRE
> 	__asm__ __volatile__ ("eorl %1, %0"
> 		: "+m" (*p)
> 		: "d" (mask)
> 		: "memory");
> 	return *p & (1 << 7);
> #else
> 	char result;
> 	char *cp = (char *)p + 3;	/* m68k is big-endian */
>
> 	__asm__ __volatile__ ("eor.b %1, %2; smi %0"
> 		: "=d" (result)
> 		: "di" (mask), "o" (*cp)
> 		: "memory");
> 	return result;
> #endif
> }
>
> folio_end_read() does about as well as can be expected:
>
> 00000708 <folio_end_read>:
>      708:	206f 0004      	moveal %sp@(4),%a0
>      70c:	7009           	moveq #9,%d0
>      70e:	4a2f 000b      	tstb %sp@(11)
>      712:	6602           	bnes 716 <folio_end_read+0xe>
>      714:	7001           	moveq #1,%d0
>      716:	b190           	eorl %d0,%a0@
>      718:	2010           	movel %a0@,%d0
>      71a:	4a00           	tstb %d0
>      71c:	6a0c           	bpls 72a <folio_end_read+0x22>
>      71e:	42af 0008      	clrl %sp@(8)
>      722:	2f48 0004      	movel %a0,%sp@(4)
>      726:	6000 fcfe      	braw 426 <folio_wake_bit>
>      72a:	4e75           	rts
>
> However, it seems that folio_unlock() could shave off an instruction:
>
> 00000918 <folio_unlock>:
>      918:	206f 0004      	moveal %sp@(4),%a0
>      91c:	7001           	moveq #1,%d0
>      91e:	b190           	eorl %d0,%a0@
>      920:	2010           	movel %a0@,%d0
>      922:	4a00           	tstb %d0
>      924:	6a0a           	bpls 930 <folio_unlock+0x18>
>      926:	42a7           	clrl %sp@-
>      928:	2f08           	movel %a0,%sp@-
>      92a:	4eba fafa      	jsr %pc@(426 <folio_wake_bit>)
>      92e:	508f           	addql #8,%sp
>      930:	4e75           	rts
>
> We could use eori instead of eorl, at least according to table 3-9 on
> page 3-8:
>
> EOR Dy,<ea>x	L	Source ^ Destination → Destination	ISA_A
> EORI #<data>,Dx	L	Immediate Data ^ Destination → Destination	ISA_A
>
> but gas is unhappy with everything I've tried to use eori.

I can't seem to get it to always use it either. This comes close:

	__asm__ __volatile__ ("eorl %1, %0"
		: "+d" (*p)
		: "di" (mask)
		: "memory");
	return *p & (1 << 7);

Using eoril for folio_unlock, but not for folio_end_read:

400413e6 <folio_unlock>:
400413e6:	206f 0004      	moveal %sp@(4),%a0
400413ea:	2010           	movel %a0@,%d0
400413ec:	0a80 0000 0001 	eoril #1,%d0
400413f2:	2080           	movel %d0,%a0@
400413f4:	2010           	movel %a0@,%d0
400413f6:	4a00           	tstb %d0
400413f8:	6c0a           	bges 40041404 <folio_unlock+0x1e>
400413fa:	42a7           	clrl %sp@-
400413fc:	2f08           	movel %a0,%sp@-
400413fe:	4eba ff30      	jsr %pc@(40041330 <folio_wake_bit>)
40041402:	508f           	addql #8,%sp
40041404:	4e75           	rts

But that is still worse anyway.

> I'm building with stmark2_defconfig, which I assume should work.

Yes, or any of amcore, m5208evb, m5249evb, m5272c3, m5275evb, m5307c3, m5407c3.

Regards
Greg
On Wed, Oct 04, 2023 at 12:14:10AM +1000, Greg Ungerer wrote:
> On 3/10/23 06:07, Matthew Wilcox wrote:
> > 00000918 <folio_unlock>:
> >      918:	206f 0004      	moveal %sp@(4),%a0
> >      91c:	7001           	moveq #1,%d0
> >      91e:	b190           	eorl %d0,%a0@
> >      920:	2010           	movel %a0@,%d0
> >      922:	4a00           	tstb %d0
> >      924:	6a0a           	bpls 930 <folio_unlock+0x18>
> >      926:	42a7           	clrl %sp@-
> >      928:	2f08           	movel %a0,%sp@-
> >      92a:	4eba fafa      	jsr %pc@(426 <folio_wake_bit>)
> >      92e:	508f           	addql #8,%sp
> >      930:	4e75           	rts

fwiw, here's what folio_unlock looks like today without any of my
patches:

00000746 <folio_unlock>:
     746:	206f 0004      	moveal %sp@(4),%a0
     74a:	43e8 0003      	lea %a0@(3),%a1
     74e:	0891 0000      	bclr #0,%a1@
     752:	2010           	movel %a0@,%d0
     754:	4a00           	tstb %d0
     756:	6a0a           	bpls 762 <folio_unlock+0x1c>
     758:	42a7           	clrl %sp@-
     75a:	2f08           	movel %a0,%sp@-
     75c:	4eba fcc8      	jsr %pc@(426 <folio_wake_bit>)
     760:	508f           	addql #8,%sp
     762:	4e75           	rts

Same number of instructions, but today's code has slightly longer insns,
so I'm tempted to take the win?

> > We could use eori instead of eorl, at least according to table 3-9 on
> > page 3-8:
> >
> > EOR Dy,<ea>x	L	Source ^ Destination → Destination	ISA_A
> > EORI #<data>,Dx	L	Immediate Data ^ Destination → Destination	ISA_A

Oh.  I misread.  It only does EORI to a data register; it can't do EORI
to an address.

> 400413e6 <folio_unlock>:
> 400413e6:	206f 0004      	moveal %sp@(4),%a0
> 400413ea:	2010           	movel %a0@,%d0
> 400413ec:	0a80 0000 0001 	eoril #1,%d0
> 400413f2:	2080           	movel %d0,%a0@
> 400413f4:	2010           	movel %a0@,%d0
> 400413f6:	4a00           	tstb %d0
> 400413f8:	6c0a           	bges 40041404 <folio_unlock+0x1e>
> 400413fa:	42a7           	clrl %sp@-
> 400413fc:	2f08           	movel %a0,%sp@-
> 400413fe:	4eba ff30      	jsr %pc@(40041330 <folio_wake_bit>)
> 40041402:	508f           	addql #8,%sp
> 40041404:	4e75           	rts
>
> But that is still worse anyway.

Yup.  Looks like the version I posted actually does the best!  I'll
munge that into the patch series and repost.  Thanks for your help!
On 4/10/23 06:07, Matthew Wilcox wrote:
> On Wed, Oct 04, 2023 at 12:14:10AM +1000, Greg Ungerer wrote:
>> On 3/10/23 06:07, Matthew Wilcox wrote:
>>> 00000918 <folio_unlock>:
>>>      918:	206f 0004      	moveal %sp@(4),%a0
>>>      91c:	7001           	moveq #1,%d0
>>>      91e:	b190           	eorl %d0,%a0@
>>>      920:	2010           	movel %a0@,%d0
>>>      922:	4a00           	tstb %d0
>>>      924:	6a0a           	bpls 930 <folio_unlock+0x18>
>>>      926:	42a7           	clrl %sp@-
>>>      928:	2f08           	movel %a0,%sp@-
>>>      92a:	4eba fafa      	jsr %pc@(426 <folio_wake_bit>)
>>>      92e:	508f           	addql #8,%sp
>>>      930:	4e75           	rts
>
> fwiw, here's what folio_unlock looks like today without any of my
> patches:
>
> 00000746 <folio_unlock>:
>      746:	206f 0004      	moveal %sp@(4),%a0
>      74a:	43e8 0003      	lea %a0@(3),%a1
>      74e:	0891 0000      	bclr #0,%a1@
>      752:	2010           	movel %a0@,%d0
>      754:	4a00           	tstb %d0
>      756:	6a0a           	bpls 762 <folio_unlock+0x1c>
>      758:	42a7           	clrl %sp@-
>      75a:	2f08           	movel %a0,%sp@-
>      75c:	4eba fcc8      	jsr %pc@(426 <folio_wake_bit>)
>      760:	508f           	addql #8,%sp
>      762:	4e75           	rts
>
> Same number of instructions, but today's code has slightly longer insns,
> so I'm tempted to take the win?

Yes, I reckon so.

>>> We could use eori instead of eorl, at least according to table 3-9 on
>>> page 3-8:
>>>
>>> EOR Dy,<ea>x	L	Source ^ Destination → Destination	ISA_A
>>> EORI #<data>,Dx	L	Immediate Data ^ Destination → Destination	ISA_A
>
> Oh.  I misread.  It only does EORI to a data register; it can't do EORI
> to an address.
>
>> 400413e6 <folio_unlock>:
>> 400413e6:	206f 0004      	moveal %sp@(4),%a0
>> 400413ea:	2010           	movel %a0@,%d0
>> 400413ec:	0a80 0000 0001 	eoril #1,%d0
>> 400413f2:	2080           	movel %d0,%a0@
>> 400413f4:	2010           	movel %a0@,%d0
>> 400413f6:	4a00           	tstb %d0
>> 400413f8:	6c0a           	bges 40041404 <folio_unlock+0x1e>
>> 400413fa:	42a7           	clrl %sp@-
>> 400413fc:	2f08           	movel %a0,%sp@-
>> 400413fe:	4eba ff30      	jsr %pc@(40041330 <folio_wake_bit>)
>> 40041402:	508f           	addql #8,%sp
>> 40041404:	4e75           	rts
>>
>> But that is still worse anyway.
>
> Yup.  Looks like the version I posted actually does the best!  I'll
> munge that into the patch series and repost.  Thanks for your help!

No worries. Sorry I didn't notice it earlier, but glad it is sorted now.

Regards
Greg
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index e984af71df6b..909ebe7cab5d 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -319,6 +319,20 @@ arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
 	return test_and_change_bit(nr, addr);
 }
 
+static inline bool xor_unlock_is_negative_byte(unsigned long mask,
+		volatile unsigned long *p)
+{
+	char result;
+	char *cp = (char *)p + 3;	/* m68k is big-endian */
+
+	__asm__ __volatile__ ("eor.b %1, %2; smi %0"
+		: "=d" (result)
+		: "di" (mask), "o" (*cp)
+		: "memory");
+	return result;
+}
+#define xor_unlock_is_negative_byte xor_unlock_is_negative_byte
+
 /*
  * The true 68020 and more advanced processors support the "bfffo"
  * instruction for finding bits.  ColdFire and simple 68000 parts
Using EOR to clear the guaranteed-to-be-set lock bit will test the
negative flag just like the x86 implementation.  This should be
more efficient than the generic implementation in filemap.c.  It
would be better if m68k had __GCC_ASM_FLAG_OUTPUTS__.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/m68k/include/asm/bitops.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)