Message ID | 1451572003-2440-32-git-send-email-mst@redhat.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Geert Uytterhoeven |
Headers | show |
On Thu, Dec 31, 2015 at 09:09:47PM +0200, Michael S. Tsirkin wrote: > At the moment, xchg on sh only supports 4 and 1 byte values, so using it > from smp_store_mb means attempts to store a 2 byte value using this > macro fail. > > And happens to be exactly what virtio drivers want to do. > > Check size and fall back to a slower, but safe, WRITE_ONCE+smp_mb. > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > --- > arch/sh/include/asm/barrier.h | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/arch/sh/include/asm/barrier.h b/arch/sh/include/asm/barrier.h > index f887c64..0cc5735 100644 > --- a/arch/sh/include/asm/barrier.h > +++ b/arch/sh/include/asm/barrier.h > @@ -32,7 +32,15 @@ > #define ctrl_barrier() __asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop") > #endif > > -#define __smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0) > +#define __smp_store_mb(var, value) do { \ > + if (sizeof(var) != 4 && sizeof(var) != 1) { \ > + WRITE_ONCE(var, value); \ > + __smp_mb(); \ > + } else { \ > + (void)xchg(&var, value); \ > + } \ > +} while (0) So SH is an orphaned arch, which is also why I did not comment on using xchg() for the UP smp_store_mb() thing. But I really think we should try fixing the xchg() implementation instead of this duct-tape. -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Dec 31, 2015 at 09:09:47PM +0200, Michael S. Tsirkin wrote: > At the moment, xchg on sh only supports 4 and 1 byte values, so using it > from smp_store_mb means attempts to store a 2 byte value using this > macro fail. > > And happens to be exactly what virtio drivers want to do. > > Check size and fall back to a slower, but safe, WRITE_ONCE+smp_mb. Can you please do this for size 1 as well (i.e. all sizes != 4)? If you check the source, the code for size-1 xchg in sh cmpxchg-llsc.h is completely wrong and operates on a 32-bit object at the address passed to it. This code is presently unused anyway and I plan to submit a patch to remove the size 1 case. Rich -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 05, 2016 at 06:27:35PM -0500, Rich Felker wrote: > On Thu, Dec 31, 2015 at 09:09:47PM +0200, Michael S. Tsirkin wrote: > > At the moment, xchg on sh only supports 4 and 1 byte values, so using it > > from smp_store_mb means attempts to store a 2 byte value using this > > macro fail. > > > > And happens to be exactly what virtio drivers want to do. > > > > Check size and fall back to a slower, but safe, WRITE_ONCE+smp_mb. > > Can you please do this for size 1 as well (i.e. all sizes != 4)? If > you check the source, the code for size-1 xchg in sh cmpxchg-llsc.h is > completely wrong and operates on a 32-bit object at the address passed > to it. This code is presently unused anyway and I plan to submit a > patch to remove the size 1 case. > > Rich Ouch. And PeterZ says I should write a 2-byte xchg in asm instead, and Fedora can't even build a full kernel for this arch at the moment :( Peter, what do you think? How about I leave this patch as is for now?
On Wed, Jan 06, 2016 at 01:19:44PM +0200, Michael S. Tsirkin wrote: > On Tue, Jan 05, 2016 at 06:27:35PM -0500, Rich Felker wrote: > > On Thu, Dec 31, 2015 at 09:09:47PM +0200, Michael S. Tsirkin wrote: > > > At the moment, xchg on sh only supports 4 and 1 byte values, so using it > > > from smp_store_mb means attempts to store a 2 byte value using this > > > macro fail. > > > > > > And happens to be exactly what virtio drivers want to do. > > > > > > Check size and fall back to a slower, but safe, WRITE_ONCE+smp_mb. > > > > Can you please do this for size 1 as well (i.e. all sizes != 4)? If > > you check the source, the code for size-1 xchg in sh cmpxchg-llsc.h is > > completely wrong and operates on a 32-bit object at the address passed > > to it. This code is presently unused anyway and I plan to submit a > > patch to remove the size 1 case. > > > > Rich > > Ouch. And PeterZ says I should write a 2-byte xchg in asm instead, > and Fedora can't even build a full kernel for this arch at the moment :( Does the kernel.org hosted cross compiler work? > Peter, what do you think? How about I leave this patch as is for now? No, and I object to removing the single byte implementation too. Either remove the full arch or fix xchg() to conform. xchg() should work on all native word sizes, for SH that would be 1,2 and 4 bytes. -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 06, 2016 at 12:40:23PM +0100, Peter Zijlstra wrote: > On Wed, Jan 06, 2016 at 01:19:44PM +0200, Michael S. Tsirkin wrote: > > On Tue, Jan 05, 2016 at 06:27:35PM -0500, Rich Felker wrote: > > > On Thu, Dec 31, 2015 at 09:09:47PM +0200, Michael S. Tsirkin wrote: > > > > At the moment, xchg on sh only supports 4 and 1 byte values, so using it > > > > from smp_store_mb means attempts to store a 2 byte value using this > > > > macro fail. > > > > > > > > And happens to be exactly what virtio drivers want to do. > > > > > > > > Check size and fall back to a slower, but safe, WRITE_ONCE+smp_mb. > > > > > > Can you please do this for size 1 as well (i.e. all sizes != 4)? If > > > you check the source, the code for size-1 xchg in sh cmpxchg-llsc.h is > > > completely wrong and operates on a 32-bit object at the address passed > > > to it. This code is presently unused anyway and I plan to submit a > > > patch to remove the size 1 case. > > > > > > Rich > > > > Ouch. And PeterZ says I should write a 2-byte xchg in asm instead, > > and Fedora can't even build a full kernel for this arch at the moment :( > > Does the kernel.org hosted cross compiler work? I'll test, thanks for the hint. > > Peter, what do you think? How about I leave this patch as is for now? > > No, and I object to removing the single byte implementation too. Either > remove the full arch or fix xchg() to conform. xchg() should work on all > native word sizes, for SH that would be 1,2 and 4 bytes. Rick, maybe you could explain how is current 1 byte xchg on llsc wrong? It does use 4 byte accesses but IIUC that is all that exists on this architecture.
On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote: > > > Peter, what do you think? How about I leave this patch as is for now? > > > > No, and I object to removing the single byte implementation too. Either > > remove the full arch or fix xchg() to conform. xchg() should work on all > > native word sizes, for SH that would be 1,2 and 4 bytes. > > Rick, maybe you could explain how is current 1 byte xchg on llsc wrong? It doesn't seem to preserve the 3 other bytes in the word. > It does use 4 byte accesses but IIUC that is all that exists on > this architecture. Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for example. A store to another portion of the word should make the store-conditional fail and we'll retry the loop. The short versions should however preserve the other bytes in the word. SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte versions. In any case, I'm all for rm -rf arch/sh/, one less arch to worry about is always good, but ISTR some people wanting to resurrect SH: http://old.lwn.net/Articles/647636/ Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and take up an active interest in SH lest someone 'accidentally' nukes it? -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/06/2016 08:32 AM, Peter Zijlstra wrote: > On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote: > SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte > versions. We added a new cmpxchg() in j-core (smp on sh2 was not previously a thing), but still need to support the old ones. > In any case, I'm all for rm -rf arch/sh/, one less arch to worry about > is always good, but ISTR some people wanting to resurrect SH: > > http://old.lwn.net/Articles/647636/ I note that old architectures in general become interesting again as their patents expire and they're available for reimplementation as open hardware. This tends to be about when the original manufacturer loses interest, and thus people try to delete existing (working) project support before the clones can get up to speed. (I would have thought the presence of working QEMU support would tide us over providing an easy basic regression testing environment, but people keep insisting that's not real and doesn't count. But if we can keep it 99% working until the sh4 patents expire later this year, we can add mmu and have full sh4 in hardware again with BSD VHDL.) > Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and > take up an active interest in SH lest someone 'accidentally' nukes it? We have an active interest, we just didn't think anybody would want a MAINTAINERS patch until we had skin in the game, and as you saw from our previous patch the kernel code wasn't remotely acceptable upstream yet. (Rich has been redoing it as device tree.) We've been talking about this offline. (The superh list is cc'd on a bunch of Renesas arm drivers because Renesas, so it has a high noise to signal ratio.) Sato-san agreed to co-maintain but needs somebody else to take point. (I similarly want to assist but don't have the expertise to be the main guy.) That leaves Rich, who is ok with doing it but was trying to finish the device tree port of the jcore board first. That said, if you'd ack a submission, Rich already has my Acked-by line on a maintainers patch (AND one to remove the extra cc's from the sh kernel list, and I acked Chen Gang's syscall addition patch back in https://lkml.org/lkml/2015/6/20/193 but nobody noticed...) Rich? Maintain please. Rob -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 06, 2016 at 09:42:35AM -0600, Rob Landley wrote: > (I would have thought the presence of working QEMU support would tide us > over providing an easy basic regression testing environment, but people > keep insisting that's not real and doesn't count. But if we can keep it > 99% working until the sh4 patents expire later this year, we can add mmu > and have full sh4 in hardware again with BSD VHDL.) I didn't know there was a 'working' qemu for SH. My personal 'complaint' with SH is its lack of maintainer feedback. I do full arch sweeps on semi-regular basis, and while I know in very board terms how a fair number of archs work its impossible to know everything about all 25+ we support. -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 06, 2016 at 03:32:18PM +0100, Peter Zijlstra wrote: > On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote: > > > > Peter, what do you think? How about I leave this patch as is for now? > > > > > > No, and I object to removing the single byte implementation too. Either > > > remove the full arch or fix xchg() to conform. xchg() should work on all > > > native word sizes, for SH that would be 1,2 and 4 bytes. > > > > Rick, maybe you could explain how is current 1 byte xchg on llsc wrong? > > It doesn't seem to preserve the 3 other bytes in the word. > > > It does use 4 byte accesses but IIUC that is all that exists on > > this architecture. > > Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for > example. A store to another portion of the word should make the > store-conditional fail and we'll retry the loop. > > The short versions should however preserve the other bytes in the word. Indeed. Also, accesses must be aligned, so the asm needs to round down to an aligned address and perform a correct read-modify-write on it, placing the new byte in the correct offset in the word. Alternatively (my preference) this logic can be impemented in C as a wrapper around the 32-bit cmpxchg. I think this is less error-prone and it can be shared between the multiple sh cmpxchg back-ends, including the new cas.l one we need for J2. > SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte > versions. > > In any case, I'm all for rm -rf arch/sh/, one less arch to worry about > is always good, but ISTR some people wanting to resurrect SH: > > http://old.lwn.net/Articles/647636/ > > Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and > take up an active interest in SH lest someone 'accidentally' nukes it? We're in the process of preparing such a proposal right now. That current intent is that Sato-san and I will co-maintain arch/sh. We'll include more details about motivation, proposed development direction, existing work to be merged, etc. in that proposal. Rich -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 6, 2016 at 4:42 PM, Rob Landley <rob@landley.net> wrote: > That said, if you'd ack a submission, Rich already has my Acked-by line > on a maintainers patch (AND one to remove the extra cc's from the sh > kernel list, and I acked Chen Gang's syscall addition patch back in > https://lkml.org/lkml/2015/6/20/193 but nobody noticed...) If you want patches for orphan architectures to be picked up, you should CC Andrew Morton... Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/06/2016 10:57 AM, Peter Zijlstra wrote: > On Wed, Jan 06, 2016 at 09:42:35AM -0600, Rob Landley wrote: >> (I would have thought the presence of working QEMU support would tide us >> over providing an easy basic regression testing environment, but people >> keep insisting that's not real and doesn't count. But if we can keep it >> 99% working until the sh4 patents expire later this year, we can add mmu >> and have full sh4 in hardware again with BSD VHDL.) > > I didn't know there was a 'working' qemu for SH. Yes, for several years now? https://lists.gnu.org/archive/html/qemu-devel/2010-03/msg00976.html I try to build bootable images with each new kernel, although I'm a few versions behind at the moment (this is 4.1 I think?): wget http://landley.net/aboriginal/bin/system-image-sh4.tar.gz tar xvzf system-image-sh4.tar.gz cd system-image-sh4 ./run-emulator.sh There's an sh2eb one in there too, but you need a $50 FPGA board to run it (Numato Mimas v2, setup walkthrough is at http://nommu.org/jcore). I keep meaning to poke at qemu and get their r2d board emulation to give me more than 64 megs of memory so I can do native compiles. (I have a native toolchain but building much more than "hello world" requires setting up a swap file because the board emulation only gives me one virtual disk. None of the other architectures need that...) I'd _also_ like to get proper sh2 support into qemu (sh2 code runs under sh4 but still), but the sh4 patents expire later this year and sometime after that we want to add an MMU to the VHDL, so... (We still want the nommu version because hard realtime is actually easier to verify without page faults, and the big product needs nanosecond accurate timestamps on stuff...) > My personal 'complaint' with SH is its lack of maintainer feedback. I do > full arch sweeps on semi-regular basis, and while I know in very board > terms how a fair number of archs work its impossible to know everything > about all 25+ we support. We (the j-core guys) have wanted to take over arch/sh maintainership for a while, we've just been trying to get the board we're working on in position to be upstreamed first. The feedback on my craptacular first effort to chip off a chunk that other people could at least reproduce against a then-current kernel was "ew" and "redo it all as device tree". So we went away again to work on that... Meanwhile all $DAYJOB's in-house resources (at se-instruments.com) have been tied up making SMP work for a product. (Yes, sh2 SMP. A NOMMU smp system. There were some teething troubles, but it's working now. Alas, not on the above $50 FPGA because that's only got an LX9 FPGA which one SOC instance uses like 2/3 of the gates in. We're doing the SMP stuff in LX45 boards, which are crazy expensive.) I note I just sent a Numato board to the buildroot maintainer so I can walk him through adding jcore support to buildroot. And I got toybox working nommu, and Rich added sh2 support to musl-libc... There has been activity on this arch, just not on this list due to the noise from a bunch of different Renesas arm systems and whatever "arm/shmobile" is. ("Not superh or jcore compatible", that's all I know...) Rob -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 06, 2016 at 01:23:50PM -0500, Rich Felker wrote: > On Wed, Jan 06, 2016 at 03:32:18PM +0100, Peter Zijlstra wrote: > > On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote: > > > > > Peter, what do you think? How about I leave this patch as is for now? > > > > > > > > No, and I object to removing the single byte implementation too. Either > > > > remove the full arch or fix xchg() to conform. xchg() should work on all > > > > native word sizes, for SH that would be 1,2 and 4 bytes. > > > > > > Rick, maybe you could explain how is current 1 byte xchg on llsc wrong? > > > > It doesn't seem to preserve the 3 other bytes in the word. > > > > > It does use 4 byte accesses but IIUC that is all that exists on > > > this architecture. > > > > Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for > > example. A store to another portion of the word should make the > > store-conditional fail and we'll retry the loop. > > > > The short versions should however preserve the other bytes in the word. > > Indeed. Also, accesses must be aligned, so the asm needs to round down > to an aligned address and perform a correct read-modify-write on it, > placing the new byte in the correct offset in the word. > > Alternatively (my preference) this logic can be impemented in C as a > wrapper around the 32-bit cmpxchg. I think this is less error-prone > and it can be shared between the multiple sh cmpxchg back-ends, > including the new cas.l one we need for J2. > > > SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte > > versions. > > > > In any case, I'm all for rm -rf arch/sh/, one less arch to worry about > > is always good, but ISTR some people wanting to resurrect SH: > > > > http://old.lwn.net/Articles/647636/ > > > > Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and > > take up an active interest in SH lest someone 'accidentally' nukes it? > > We're in the process of preparing such a proposal right now. That > current intent is that Sato-san and I will co-maintain arch/sh. We'll > include more details about motivation, proposed development direction, > existing work to be merged, etc. in that proposal. > > Rich Well I'd like to be able to make progress with generic arch cleanups meanwhile. Could you quickly write a version of 1 and 2 byte xchg that works so I can include it? Thanks!
On Wed, Jan 06, 2016 at 01:23:50PM -0500, Rich Felker wrote: > On Wed, Jan 06, 2016 at 03:32:18PM +0100, Peter Zijlstra wrote: > > On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote: > > > > > Peter, what do you think? How about I leave this patch as is for now? > > > > > > > > No, and I object to removing the single byte implementation too. Either > > > > remove the full arch or fix xchg() to conform. xchg() should work on all > > > > native word sizes, for SH that would be 1,2 and 4 bytes. > > > > > > Rick, maybe you could explain how is current 1 byte xchg on llsc wrong? > > > > It doesn't seem to preserve the 3 other bytes in the word. > > > > > It does use 4 byte accesses but IIUC that is all that exists on > > > this architecture. > > > > Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for > > example. A store to another portion of the word should make the > > store-conditional fail and we'll retry the loop. > > > > The short versions should however preserve the other bytes in the word. > > Indeed. Also, accesses must be aligned, so the asm needs to round down > to an aligned address and perform a correct read-modify-write on it, > placing the new byte in the correct offset in the word. > > Alternatively (my preference) this logic can be impemented in C as a > wrapper around the 32-bit cmpxchg. I think this is less error-prone > and it can be shared between the multiple sh cmpxchg back-ends, > including the new cas.l one we need for J2. Sounds much more reasonable. > > SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte > > versions. > > > > In any case, I'm all for rm -rf arch/sh/, one less arch to worry about > > is always good, but ISTR some people wanting to resurrect SH: > > > > http://old.lwn.net/Articles/647636/ > > > > Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and > > take up an active interest in SH lest someone 'accidentally' nukes it? > > We're in the process of preparing such a proposal right now. That > current intent is that Sato-san and I will co-maintain arch/sh. We'll > include more details about motivation, proposed development direction, > existing work to be merged, etc. in that proposal. > > Rich -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 06, 2016 at 10:23:12PM +0200, Michael S. Tsirkin wrote: > On Wed, Jan 06, 2016 at 01:23:50PM -0500, Rich Felker wrote: > > On Wed, Jan 06, 2016 at 03:32:18PM +0100, Peter Zijlstra wrote: > > > On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote: > > > > > > Peter, what do you think? How about I leave this patch as is for now? > > > > > > > > > > No, and I object to removing the single byte implementation too. Either > > > > > remove the full arch or fix xchg() to conform. xchg() should work on all > > > > > native word sizes, for SH that would be 1,2 and 4 bytes. > > > > > > > > Rick, maybe you could explain how is current 1 byte xchg on llsc wrong? > > > > > > It doesn't seem to preserve the 3 other bytes in the word. > > > > > > > It does use 4 byte accesses but IIUC that is all that exists on > > > > this architecture. > > > > > > Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for > > > example. A store to another portion of the word should make the > > > store-conditional fail and we'll retry the loop. > > > > > > The short versions should however preserve the other bytes in the word. > > > > Indeed. Also, accesses must be aligned, so the asm needs to round down > > to an aligned address and perform a correct read-modify-write on it, > > placing the new byte in the correct offset in the word. > > > > Alternatively (my preference) this logic can be impemented in C as a > > wrapper around the 32-bit cmpxchg. I think this is less error-prone > > and it can be shared between the multiple sh cmpxchg back-ends, > > including the new cas.l one we need for J2. > > > > > SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte > > > versions. > > > > > > In any case, I'm all for rm -rf arch/sh/, one less arch to worry about > > > is always good, but ISTR some people wanting to resurrect SH: > > > > > > http://old.lwn.net/Articles/647636/ > > > > > > Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and > > > take up an active interest in SH lest someone 'accidentally' nukes it? > > > > We're in the process of preparing such a proposal right now. That > > current intent is that Sato-san and I will co-maintain arch/sh. We'll > > include more details about motivation, proposed development direction, > > existing work to be merged, etc. in that proposal. > > Well I'd like to be able to make progress with generic > arch cleanups meanwhile. > > Could you quickly write a version of 1 and 2 byte xchg that > works so I can include it? Here are quick, untested generic ones: static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val) { u32 old; unsigned long offset = (unsigned long)m & 3; volatile u32 *w = (volatile u32 *)(m - offset); union { u32 w; u8 b[4]; } u; do { old = u.w = *w; result = w.b[offset]; w.b[offset] = val; } while (cmpxchg(w, old, u.w) != old); return result; } static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val) { u32 old; unsigned long result; unsigned long offset = ((unsigned long)m & 3) >> 1; volatile u32 *w = (volatile u32 *)(m - offset); union { u32 w; u16 h[2]; } u; do { old = u.w = *w; result = w.h[offset]; w.h[offset] = val; } while (cmpxchg(w, old, u.w) != old); return result; } It would be nice to have these in asm-generic for archs which don't define their own versions rather than having cruft like this repeated per-arch. Strictly speaking, the volatile u32 used to access the 32-bit word containing the u8 or u16 should be __attribute__((__may_alias__)) too. Is there an existing kernel type for a "may_alias u32" or should it perhaps be added? Rich -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 06, 2016 at 06:53:01PM -0500, Rich Felker wrote: > It would be nice to have these in asm-generic for archs which don't > define their own versions rather than having cruft like this repeated > per-arch. Maybe, but I'm not sure how many archs would indeed suffer this problem, so far I'm only aware of Alpha and SH that do not have short atomic ops. > Strictly speaking, the volatile u32 used to access the > 32-bit word containing the u8 or u16 should be > __attribute__((__may_alias__)) too. Is there an existing kernel type > for a "may_alias u32" or should it perhaps be added? The kernel does -fno-strict-aliasing because the C aliasing rules are crap (TM) :-), so I suspect we do not need the alias attribute here. -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 06, 2016 at 06:53:01PM -0500, Rich Felker wrote: > On Wed, Jan 06, 2016 at 10:23:12PM +0200, Michael S. Tsirkin wrote: > > On Wed, Jan 06, 2016 at 01:23:50PM -0500, Rich Felker wrote: > > > On Wed, Jan 06, 2016 at 03:32:18PM +0100, Peter Zijlstra wrote: > > > > On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote: > > > > > > > Peter, what do you think? How about I leave this patch as is for now? > > > > > > > > > > > > No, and I object to removing the single byte implementation too. Either > > > > > > remove the full arch or fix xchg() to conform. xchg() should work on all > > > > > > native word sizes, for SH that would be 1,2 and 4 bytes. > > > > > > > > > > Rick, maybe you could explain how is current 1 byte xchg on llsc wrong? > > > > > > > > It doesn't seem to preserve the 3 other bytes in the word. > > > > > > > > > It does use 4 byte accesses but IIUC that is all that exists on > > > > > this architecture. > > > > > > > > Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for > > > > example. A store to another portion of the word should make the > > > > store-conditional fail and we'll retry the loop. > > > > > > > > The short versions should however preserve the other bytes in the word. > > > > > > Indeed. Also, accesses must be aligned, so the asm needs to round down > > > to an aligned address and perform a correct read-modify-write on it, > > > placing the new byte in the correct offset in the word. > > > > > > Alternatively (my preference) this logic can be impemented in C as a > > > wrapper around the 32-bit cmpxchg. I think this is less error-prone > > > and it can be shared between the multiple sh cmpxchg back-ends, > > > including the new cas.l one we need for J2. > > > > > > > SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte > > > > versions. > > > > > > > > In any case, I'm all for rm -rf arch/sh/, one less arch to worry about > > > > is always good, but ISTR some people wanting to resurrect SH: > > > > > > > > http://old.lwn.net/Articles/647636/ > > > > > > > > Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and > > > > take up an active interest in SH lest someone 'accidentally' nukes it? > > > > > > We're in the process of preparing such a proposal right now. That > > > current intent is that Sato-san and I will co-maintain arch/sh. We'll > > > include more details about motivation, proposed development direction, > > > existing work to be merged, etc. in that proposal. > > > > Well I'd like to be able to make progress with generic > > arch cleanups meanwhile. > > > > Could you quickly write a version of 1 and 2 byte xchg that > > works so I can include it? > > Here are quick, untested generic ones: > > static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val) > { > u32 old; > unsigned long offset = (unsigned long)m & 3; > volatile u32 *w = (volatile u32 *)(m - offset); > union { u32 w; u8 b[4]; } u; > do { > old = u.w = *w; > result = w.b[offset]; > w.b[offset] = val; > } while (cmpxchg(w, old, u.w) != old); > return result; > } > > static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val) > { > u32 old; > unsigned long result; > unsigned long offset = ((unsigned long)m & 3) >> 1; > volatile u32 *w = (volatile u32 *)(m - offset); > union { u32 w; u16 h[2]; } u; > do { > old = u.w = *w; > result = w.h[offset]; > w.h[offset] = val; > } while (cmpxchg(w, old, u.w) != old); > return result; > } > > It would be nice to have these in asm-generic for archs which don't > define their own versions rather than having cruft like this repeated > per-arch. Strictly speaking, the volatile u32 used to access the > 32-bit word containing the u8 or u16 should be > __attribute__((__may_alias__)) too. Is there an existing kernel type > for a "may_alias u32" or should it perhaps be added? > > Rich BTW this seems suboptimal for grb and irq variants which apparently can do things correctly. -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 07, 2016 at 02:37:49PM +0100, Peter Zijlstra wrote: > On Wed, Jan 06, 2016 at 06:53:01PM -0500, Rich Felker wrote: > > It would be nice to have these in asm-generic for archs which don't > > define their own versions rather than having cruft like this repeated > > per-arch. > > Maybe, but I'm not sure how many archs would indeed suffer this problem, > so far I'm only aware of Alpha and SH that do not have short atomic ops. Apparently original armv6 (non-k) lacked u8 and u16 variants of ldrex/strex. I'm pretty sure or1k also lacks them, and mips, microblaze, and some powerpc versions might too. Not sure about risc-v. > > Strictly speaking, the volatile u32 used to access the > > 32-bit word containing the u8 or u16 should be > > __attribute__((__may_alias__)) too. Is there an existing kernel type > > for a "may_alias u32" or should it perhaps be added? > > The kernel does -fno-strict-aliasing because the C aliasing rules are > crap (TM) :-), so I suspect we do not need the alias attribute here. I suspect working on the kernel I'm going to have to get used to getting "corrected" for writing proper C... ;-) Rich -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 07, 2016 at 07:48:08PM +0200, Michael S. Tsirkin wrote: > On Wed, Jan 06, 2016 at 06:53:01PM -0500, Rich Felker wrote: > > On Wed, Jan 06, 2016 at 10:23:12PM +0200, Michael S. Tsirkin wrote: > > > On Wed, Jan 06, 2016 at 01:23:50PM -0500, Rich Felker wrote: > > > > On Wed, Jan 06, 2016 at 03:32:18PM +0100, Peter Zijlstra wrote: > > > > > On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote: > > > > > > > > Peter, what do you think? How about I leave this patch as is for now? > > > > > > > > > > > > > > No, and I object to removing the single byte implementation too. Either > > > > > > > remove the full arch or fix xchg() to conform. xchg() should work on all > > > > > > > native word sizes, for SH that would be 1,2 and 4 bytes. > > > > > > > > > > > > Rick, maybe you could explain how is current 1 byte xchg on llsc wrong? > > > > > > > > > > It doesn't seem to preserve the 3 other bytes in the word. > > > > > > > > > > > It does use 4 byte accesses but IIUC that is all that exists on > > > > > > this architecture. > > > > > > > > > > Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for > > > > > example. A store to another portion of the word should make the > > > > > store-conditional fail and we'll retry the loop. > > > > > > > > > > The short versions should however preserve the other bytes in the word. > > > > > > > > Indeed. Also, accesses must be aligned, so the asm needs to round down > > > > to an aligned address and perform a correct read-modify-write on it, > > > > placing the new byte in the correct offset in the word. > > > > > > > > Alternatively (my preference) this logic can be impemented in C as a > > > > wrapper around the 32-bit cmpxchg. I think this is less error-prone > > > > and it can be shared between the multiple sh cmpxchg back-ends, > > > > including the new cas.l one we need for J2. > > > > > > > > > SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte > > > > > versions. > > > > > > > > > > In any case, I'm all for rm -rf arch/sh/, one less arch to worry about > > > > > is always good, but ISTR some people wanting to resurrect SH: > > > > > > > > > > http://old.lwn.net/Articles/647636/ > > > > > > > > > > Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and > > > > > take up an active interest in SH lest someone 'accidentally' nukes it? > > > > > > > > We're in the process of preparing such a proposal right now. That > > > > current intent is that Sato-san and I will co-maintain arch/sh. We'll > > > > include more details about motivation, proposed development direction, > > > > existing work to be merged, etc. in that proposal. > > > > > > Well I'd like to be able to make progress with generic > > > arch cleanups meanwhile. > > > > > > Could you quickly write a version of 1 and 2 byte xchg that > > > works so I can include it? > > > > Here are quick, untested generic ones: > > > > static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val) > > { > > u32 old; > > unsigned long offset = (unsigned long)m & 3; > > volatile u32 *w = (volatile u32 *)(m - offset); > > union { u32 w; u8 b[4]; } u; > > do { > > old = u.w = *w; > > result = w.b[offset]; > > w.b[offset] = val; > > } while (cmpxchg(w, old, u.w) != old); > > return result; > > } > > > > static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val) > > { > > u32 old; > > unsigned long result; > > unsigned long offset = ((unsigned long)m & 3) >> 1; > > volatile u32 *w = (volatile u32 *)(m - offset); > > union { u32 w; u16 h[2]; } u; > > do { > > old = u.w = *w; > > result = w.h[offset]; > > w.h[offset] = val; > > } while (cmpxchg(w, old, u.w) != old); > > return result; > > } > > > > It would be nice to have these in asm-generic for archs which don't > > define their own versions rather than having cruft like this repeated > > per-arch. Strictly speaking, the volatile u32 used to access the > > 32-bit word containing the u8 or u16 should be > > __attribute__((__may_alias__)) too. Is there an existing kernel type > > for a "may_alias u32" or should it perhaps be added? > > > > Rich > > BTW this seems suboptimal for grb and irq variants which apparently > can do things correctly. In principle I agree, but u8/u16 xchg is mostly unused (completely unused in my builds) and unlikely to matter to performance. Also, the irq variant is only for the original sh2 which is not even produced anymore afaik. Our reimplementation of the sh2 ISA, the J2, has a cas.l instruction that will be used instead because it supports SMP where interrupt masking is insufficient to achieve atomicity. Rich -- To unsubscribe from this list: send the line "unsubscribe linux-sh" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/sh/include/asm/barrier.h b/arch/sh/include/asm/barrier.h index f887c64..0cc5735 100644 --- a/arch/sh/include/asm/barrier.h +++ b/arch/sh/include/asm/barrier.h @@ -32,7 +32,15 @@ #define ctrl_barrier() __asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop") #endif -#define __smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0) +#define __smp_store_mb(var, value) do { \ + if (sizeof(var) != 4 && sizeof(var) != 1) { \ + WRITE_ONCE(var, value); \ + __smp_mb(); \ + } else { \ + (void)xchg(&var, value); \ + } \ +} while (0) + #define smp_store_mb(var, value) __smp_store_mb(var, value) #include <asm-generic/barrier.h>
At the moment, xchg on sh only supports 4 and 1 byte values, so using it from smp_store_mb means attempts to store a 2 byte value using this macro fail. And happens to be exactly what virtio drivers want to do. Check size and fall back to a slower, but safe, WRITE_ONCE+smp_mb. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> --- arch/sh/include/asm/barrier.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)