diff mbox

[v2,31/32] sh: support a 2-byte smp_store_mb

Message ID 1451572003-2440-32-git-send-email-mst@redhat.com (mailing list archive)
State Awaiting Upstream
Delegated to: Geert Uytterhoeven
Headers show

Commit Message

Michael S. Tsirkin Dec. 31, 2015, 7:09 p.m. UTC
At the moment, xchg on sh only supports 4 and 1 byte values, so using it
from smp_store_mb means attempts to store a 2 byte value using this
macro fail.

And happens to be exactly what virtio drivers want to do.

Check size and fall back to a slower, but safe, WRITE_ONCE+smp_mb.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/sh/include/asm/barrier.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Peter Zijlstra Jan. 4, 2016, 2:05 p.m. UTC | #1
On Thu, Dec 31, 2015 at 09:09:47PM +0200, Michael S. Tsirkin wrote:
> At the moment, xchg on sh only supports 4 and 1 byte values, so using it
> from smp_store_mb means attempts to store a 2 byte value using this
> macro fail.
> 
> And happens to be exactly what virtio drivers want to do.
> 
> Check size and fall back to a slower, but safe, WRITE_ONCE+smp_mb.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  arch/sh/include/asm/barrier.h | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/sh/include/asm/barrier.h b/arch/sh/include/asm/barrier.h
> index f887c64..0cc5735 100644
> --- a/arch/sh/include/asm/barrier.h
> +++ b/arch/sh/include/asm/barrier.h
> @@ -32,7 +32,15 @@
>  #define ctrl_barrier()	__asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop")
>  #endif
>  
> -#define __smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
> +#define __smp_store_mb(var, value) do { \
> +	if (sizeof(var) != 4 && sizeof(var) != 1) { \
> +		 WRITE_ONCE(var, value); \
> +		__smp_mb(); \
> +	} else { \
> +		(void)xchg(&var, value);  \
> +	} \
> +} while (0)

So SH is an orphaned arch, which is also why I did not comment on using
xchg() for the UP smp_store_mb() thing.

But I really think we should try fixing the xchg() implementation
instead of this duct-tape.
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rich Felker Jan. 5, 2016, 11:27 p.m. UTC | #2
On Thu, Dec 31, 2015 at 09:09:47PM +0200, Michael S. Tsirkin wrote:
> At the moment, xchg on sh only supports 4 and 1 byte values, so using it
> from smp_store_mb means attempts to store a 2 byte value using this
> macro fail.
> 
> And happens to be exactly what virtio drivers want to do.
> 
> Check size and fall back to a slower, but safe, WRITE_ONCE+smp_mb.

Can you please do this for size 1 as well (i.e. all sizes != 4)? If
you check the source, the code for size-1 xchg in sh cmpxchg-llsc.h is
completely wrong and operates on a 32-bit object at the address passed
to it. This code is presently unused anyway and I plan to submit a
patch to remove the size 1 case.

Rich
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin Jan. 6, 2016, 11:19 a.m. UTC | #3
On Tue, Jan 05, 2016 at 06:27:35PM -0500, Rich Felker wrote:
> On Thu, Dec 31, 2015 at 09:09:47PM +0200, Michael S. Tsirkin wrote:
> > At the moment, xchg on sh only supports 4 and 1 byte values, so using it
> > from smp_store_mb means attempts to store a 2 byte value using this
> > macro fail.
> > 
> > And happens to be exactly what virtio drivers want to do.
> > 
> > Check size and fall back to a slower, but safe, WRITE_ONCE+smp_mb.
> 
> Can you please do this for size 1 as well (i.e. all sizes != 4)? If
> you check the source, the code for size-1 xchg in sh cmpxchg-llsc.h is
> completely wrong and operates on a 32-bit object at the address passed
> to it. This code is presently unused anyway and I plan to submit a
> patch to remove the size 1 case.
> 
> Rich

Ouch. And PeterZ says I should write a 2-byte xchg in asm instead,
and Fedora can't even build a full kernel for this arch at the moment :(

Peter, what do you think? How about I leave this patch as is for now?
Peter Zijlstra Jan. 6, 2016, 11:40 a.m. UTC | #4
On Wed, Jan 06, 2016 at 01:19:44PM +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 05, 2016 at 06:27:35PM -0500, Rich Felker wrote:
> > On Thu, Dec 31, 2015 at 09:09:47PM +0200, Michael S. Tsirkin wrote:
> > > At the moment, xchg on sh only supports 4 and 1 byte values, so using it
> > > from smp_store_mb means attempts to store a 2 byte value using this
> > > macro fail.
> > > 
> > > And happens to be exactly what virtio drivers want to do.
> > > 
> > > Check size and fall back to a slower, but safe, WRITE_ONCE+smp_mb.
> > 
> > Can you please do this for size 1 as well (i.e. all sizes != 4)? If
> > you check the source, the code for size-1 xchg in sh cmpxchg-llsc.h is
> > completely wrong and operates on a 32-bit object at the address passed
> > to it. This code is presently unused anyway and I plan to submit a
> > patch to remove the size 1 case.
> > 
> > Rich
> 
> Ouch. And PeterZ says I should write a 2-byte xchg in asm instead,
> and Fedora can't even build a full kernel for this arch at the moment :(

Does the kernel.org hosted cross compiler work?

> Peter, what do you think? How about I leave this patch as is for now?

No, and I object to removing the single byte implementation too. Either
remove the full arch or fix xchg() to conform. xchg() should work on all
native word sizes, for SH that would be 1,2 and 4 bytes.
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin Jan. 6, 2016, 11:52 a.m. UTC | #5
On Wed, Jan 06, 2016 at 12:40:23PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 06, 2016 at 01:19:44PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Jan 05, 2016 at 06:27:35PM -0500, Rich Felker wrote:
> > > On Thu, Dec 31, 2015 at 09:09:47PM +0200, Michael S. Tsirkin wrote:
> > > > At the moment, xchg on sh only supports 4 and 1 byte values, so using it
> > > > from smp_store_mb means attempts to store a 2 byte value using this
> > > > macro fail.
> > > > 
> > > > And happens to be exactly what virtio drivers want to do.
> > > > 
> > > > Check size and fall back to a slower, but safe, WRITE_ONCE+smp_mb.
> > > 
> > > Can you please do this for size 1 as well (i.e. all sizes != 4)? If
> > > you check the source, the code for size-1 xchg in sh cmpxchg-llsc.h is
> > > completely wrong and operates on a 32-bit object at the address passed
> > > to it. This code is presently unused anyway and I plan to submit a
> > > patch to remove the size 1 case.
> > > 
> > > Rich
> > 
> > Ouch. And PeterZ says I should write a 2-byte xchg in asm instead,
> > and Fedora can't even build a full kernel for this arch at the moment :(
> 
> Does the kernel.org hosted cross compiler work?

I'll test, thanks for the hint.

> > Peter, what do you think? How about I leave this patch as is for now?
> 
> No, and I object to removing the single byte implementation too. Either
> remove the full arch or fix xchg() to conform. xchg() should work on all
> native word sizes, for SH that would be 1,2 and 4 bytes.

Rick, maybe you could explain how is current 1 byte xchg on llsc wrong?

It does use 4 byte accesses but IIUC that is all that exists on
this architecture.
Peter Zijlstra Jan. 6, 2016, 2:32 p.m. UTC | #6
On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote:
> > > Peter, what do you think? How about I leave this patch as is for now?
> > 
> > No, and I object to removing the single byte implementation too. Either
> > remove the full arch or fix xchg() to conform. xchg() should work on all
> > native word sizes, for SH that would be 1,2 and 4 bytes.
> 
> Rick, maybe you could explain how is current 1 byte xchg on llsc wrong?

It doesn't seem to preserve the 3 other bytes in the word.

> It does use 4 byte accesses but IIUC that is all that exists on
> this architecture.

Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for
example. A store to another portion of the word should make the
store-conditional fail and we'll retry the loop.

The short versions should however preserve the other bytes in the word.

SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte
versions.

In any case, I'm all for rm -rf arch/sh/, one less arch to worry about
is always good, but ISTR some people wanting to resurrect SH:

  http://old.lwn.net/Articles/647636/

Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and
take up an active interest in SH lest someone 'accidentally' nukes it?
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Landley Jan. 6, 2016, 3:42 p.m. UTC | #7
On 01/06/2016 08:32 AM, Peter Zijlstra wrote:
> On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote:
> SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte
> versions.

We added a new cmpxchg() in j-core (smp on sh2 was not previously a
thing), but still need to support the old ones.

> In any case, I'm all for rm -rf arch/sh/, one less arch to worry about
> is always good, but ISTR some people wanting to resurrect SH:
> 
>   http://old.lwn.net/Articles/647636/

I note that old architectures in general become interesting again as
their patents expire and they're available for reimplementation as open
hardware. This tends to be about when the original manufacturer loses
interest, and thus people try to delete existing (working) project
support before the clones can get up to speed.

(I would have thought the presence of working QEMU support would tide us
over providing an easy basic regression testing environment, but people
keep insisting that's not real and doesn't count. But if we can keep it
99% working until the sh4 patents expire later this year, we can add mmu
and have full sh4 in hardware again with BSD VHDL.)

> Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and
> take up an active interest in SH lest someone 'accidentally' nukes it?

We have an active interest, we just didn't think anybody would want a
MAINTAINERS patch until we had skin in the game, and as you saw from our
previous patch the kernel code wasn't remotely acceptable upstream yet.
(Rich has been redoing it as device tree.)

We've been talking about this offline. (The superh list is cc'd on a
bunch of Renesas arm drivers because Renesas, so it has a high noise to
signal ratio.)

Sato-san agreed to co-maintain but needs somebody else to take point. (I
similarly want to assist but don't have the expertise to be the main
guy.) That leaves Rich, who is ok with doing it but was trying to finish
the device tree port of the jcore board first.

That said, if you'd ack a submission, Rich already has my Acked-by line
on a maintainers patch (AND one to remove the extra cc's from the sh
kernel list, and I acked Chen Gang's syscall addition patch back in
https://lkml.org/lkml/2015/6/20/193 but nobody noticed...)

Rich? Maintain please.

Rob
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra Jan. 6, 2016, 4:57 p.m. UTC | #8
On Wed, Jan 06, 2016 at 09:42:35AM -0600, Rob Landley wrote:
> (I would have thought the presence of working QEMU support would tide us
> over providing an easy basic regression testing environment, but people
> keep insisting that's not real and doesn't count. But if we can keep it
> 99% working until the sh4 patents expire later this year, we can add mmu
> and have full sh4 in hardware again with BSD VHDL.)

I didn't know there was a 'working' qemu for SH.

My personal 'complaint' with SH is its lack of maintainer feedback. I do
full arch sweeps on semi-regular basis, and while I know in very board
terms how a fair number of archs work its impossible to know everything
about all 25+ we support.


--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rich Felker Jan. 6, 2016, 6:23 p.m. UTC | #9
On Wed, Jan 06, 2016 at 03:32:18PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote:
> > > > Peter, what do you think? How about I leave this patch as is for now?
> > > 
> > > No, and I object to removing the single byte implementation too. Either
> > > remove the full arch or fix xchg() to conform. xchg() should work on all
> > > native word sizes, for SH that would be 1,2 and 4 bytes.
> > 
> > Rick, maybe you could explain how is current 1 byte xchg on llsc wrong?
> 
> It doesn't seem to preserve the 3 other bytes in the word.
> 
> > It does use 4 byte accesses but IIUC that is all that exists on
> > this architecture.
> 
> Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for
> example. A store to another portion of the word should make the
> store-conditional fail and we'll retry the loop.
> 
> The short versions should however preserve the other bytes in the word.

Indeed. Also, accesses must be aligned, so the asm needs to round down
to an aligned address and perform a correct read-modify-write on it,
placing the new byte in the correct offset in the word.

Alternatively (my preference) this logic can be impemented in C as a
wrapper around the 32-bit cmpxchg. I think this is less error-prone
and it can be shared between the multiple sh cmpxchg back-ends,
including the new cas.l one we need for J2.

> SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte
> versions.
> 
> In any case, I'm all for rm -rf arch/sh/, one less arch to worry about
> is always good, but ISTR some people wanting to resurrect SH:
> 
>   http://old.lwn.net/Articles/647636/
> 
> Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and
> take up an active interest in SH lest someone 'accidentally' nukes it?

We're in the process of preparing such a proposal right now. That
current intent is that Sato-san and I will co-maintain arch/sh. We'll
include more details about motivation, proposed development direction,
existing work to be merged, etc. in that proposal.

Rich
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Geert Uytterhoeven Jan. 6, 2016, 6:57 p.m. UTC | #10
On Wed, Jan 6, 2016 at 4:42 PM, Rob Landley <rob@landley.net> wrote:
> That said, if you'd ack a submission, Rich already has my Acked-by line
> on a maintainers patch (AND one to remove the extra cc's from the sh
> kernel list, and I acked Chen Gang's syscall addition patch back in
> https://lkml.org/lkml/2015/6/20/193 but nobody noticed...)

If you want patches for orphan architectures to be picked up, you should
CC Andrew Morton...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Landley Jan. 6, 2016, 8:21 p.m. UTC | #11
On 01/06/2016 10:57 AM, Peter Zijlstra wrote:
> On Wed, Jan 06, 2016 at 09:42:35AM -0600, Rob Landley wrote:
>> (I would have thought the presence of working QEMU support would tide us
>> over providing an easy basic regression testing environment, but people
>> keep insisting that's not real and doesn't count. But if we can keep it
>> 99% working until the sh4 patents expire later this year, we can add mmu
>> and have full sh4 in hardware again with BSD VHDL.)
> 
> I didn't know there was a 'working' qemu for SH.

Yes, for several years now?

  https://lists.gnu.org/archive/html/qemu-devel/2010-03/msg00976.html

I try to build bootable images with each new kernel, although I'm a few
versions behind at the moment (this is 4.1 I think?):

  wget http://landley.net/aboriginal/bin/system-image-sh4.tar.gz
  tar xvzf system-image-sh4.tar.gz
  cd system-image-sh4
  ./run-emulator.sh

There's an sh2eb one in there too, but you need a $50 FPGA board to run
it (Numato Mimas v2, setup walkthrough is at http://nommu.org/jcore).

I keep meaning to poke at qemu and get their r2d board emulation to give
me more than 64 megs of memory so I can do native compiles. (I have a
native toolchain but building much more than "hello world" requires
setting up a swap file because the board emulation only gives me one
virtual disk. None of the other architectures need that...)

I'd _also_ like to get proper sh2 support into qemu (sh2 code runs under
sh4 but still), but the sh4 patents expire later this year and sometime
after that we want to add an MMU to the VHDL, so...

(We still want the nommu version because hard realtime is actually
easier to verify without page faults, and the big product needs
nanosecond accurate timestamps on stuff...)

> My personal 'complaint' with SH is its lack of maintainer feedback. I do
> full arch sweeps on semi-regular basis, and while I know in very board
> terms how a fair number of archs work its impossible to know everything
> about all 25+ we support.

We (the j-core guys) have wanted to take over arch/sh maintainership for
a while, we've just been trying to get the board we're working on in
position to be upstreamed first. The feedback on my craptacular first
effort to chip off a chunk that other people could at least reproduce
against a then-current kernel was "ew" and "redo it all as device tree".
So we went away again to work on that...

Meanwhile all $DAYJOB's in-house resources (at se-instruments.com) have
been tied up making SMP work for a product. (Yes, sh2 SMP. A NOMMU smp
system. There were some teething troubles, but it's working now. Alas,
not on the above $50 FPGA because that's only got an LX9 FPGA which one
SOC instance uses like 2/3 of the gates in. We're doing the SMP stuff in
LX45 boards, which are crazy expensive.)

I note I just sent a Numato board to the buildroot maintainer so I can
walk him through adding jcore support to buildroot. And I got toybox
working nommu, and Rich added sh2 support to musl-libc... There has been
activity on this arch, just not on this list due to the noise from a
bunch of different Renesas arm systems and whatever "arm/shmobile" is.
("Not superh or jcore compatible", that's all I know...)

Rob
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin Jan. 6, 2016, 8:23 p.m. UTC | #12
On Wed, Jan 06, 2016 at 01:23:50PM -0500, Rich Felker wrote:
> On Wed, Jan 06, 2016 at 03:32:18PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote:
> > > > > Peter, what do you think? How about I leave this patch as is for now?
> > > > 
> > > > No, and I object to removing the single byte implementation too. Either
> > > > remove the full arch or fix xchg() to conform. xchg() should work on all
> > > > native word sizes, for SH that would be 1,2 and 4 bytes.
> > > 
> > > Rick, maybe you could explain how is current 1 byte xchg on llsc wrong?
> > 
> > It doesn't seem to preserve the 3 other bytes in the word.
> > 
> > > It does use 4 byte accesses but IIUC that is all that exists on
> > > this architecture.
> > 
> > Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for
> > example. A store to another portion of the word should make the
> > store-conditional fail and we'll retry the loop.
> > 
> > The short versions should however preserve the other bytes in the word.
> 
> Indeed. Also, accesses must be aligned, so the asm needs to round down
> to an aligned address and perform a correct read-modify-write on it,
> placing the new byte in the correct offset in the word.
> 
> Alternatively (my preference) this logic can be impemented in C as a
> wrapper around the 32-bit cmpxchg. I think this is less error-prone
> and it can be shared between the multiple sh cmpxchg back-ends,
> including the new cas.l one we need for J2.
> 
> > SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte
> > versions.
> > 
> > In any case, I'm all for rm -rf arch/sh/, one less arch to worry about
> > is always good, but ISTR some people wanting to resurrect SH:
> > 
> >   http://old.lwn.net/Articles/647636/
> > 
> > Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and
> > take up an active interest in SH lest someone 'accidentally' nukes it?
> 
> We're in the process of preparing such a proposal right now. That
> current intent is that Sato-san and I will co-maintain arch/sh. We'll
> include more details about motivation, proposed development direction,
> existing work to be merged, etc. in that proposal.
> 
> Rich

Well I'd like to be able to make progress with generic
arch cleanups meanwhile.

Could you quickly write a version of 1 and 2 byte xchg that
works so I can include it?

Thanks!
Michael S. Tsirkin Jan. 6, 2016, 10:14 p.m. UTC | #13
On Wed, Jan 06, 2016 at 01:23:50PM -0500, Rich Felker wrote:
> On Wed, Jan 06, 2016 at 03:32:18PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote:
> > > > > Peter, what do you think? How about I leave this patch as is for now?
> > > > 
> > > > No, and I object to removing the single byte implementation too. Either
> > > > remove the full arch or fix xchg() to conform. xchg() should work on all
> > > > native word sizes, for SH that would be 1,2 and 4 bytes.
> > > 
> > > Rick, maybe you could explain how is current 1 byte xchg on llsc wrong?
> > 
> > It doesn't seem to preserve the 3 other bytes in the word.
> > 
> > > It does use 4 byte accesses but IIUC that is all that exists on
> > > this architecture.
> > 
> > Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for
> > example. A store to another portion of the word should make the
> > store-conditional fail and we'll retry the loop.
> > 
> > The short versions should however preserve the other bytes in the word.
> 
> Indeed. Also, accesses must be aligned, so the asm needs to round down
> to an aligned address and perform a correct read-modify-write on it,
> placing the new byte in the correct offset in the word.
> 
> Alternatively (my preference) this logic can be impemented in C as a
> wrapper around the 32-bit cmpxchg. I think this is less error-prone
> and it can be shared between the multiple sh cmpxchg back-ends,
> including the new cas.l one we need for J2.

Sounds much more reasonable.

> > SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte
> > versions.
> > 
> > In any case, I'm all for rm -rf arch/sh/, one less arch to worry about
> > is always good, but ISTR some people wanting to resurrect SH:
> > 
> >   http://old.lwn.net/Articles/647636/
> > 
> > Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and
> > take up an active interest in SH lest someone 'accidentally' nukes it?
> 
> We're in the process of preparing such a proposal right now. That
> current intent is that Sato-san and I will co-maintain arch/sh. We'll
> include more details about motivation, proposed development direction,
> existing work to be merged, etc. in that proposal.
> 
> Rich
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rich Felker Jan. 6, 2016, 11:53 p.m. UTC | #14
On Wed, Jan 06, 2016 at 10:23:12PM +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 06, 2016 at 01:23:50PM -0500, Rich Felker wrote:
> > On Wed, Jan 06, 2016 at 03:32:18PM +0100, Peter Zijlstra wrote:
> > > On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote:
> > > > > > Peter, what do you think? How about I leave this patch as is for now?
> > > > > 
> > > > > No, and I object to removing the single byte implementation too. Either
> > > > > remove the full arch or fix xchg() to conform. xchg() should work on all
> > > > > native word sizes, for SH that would be 1,2 and 4 bytes.
> > > > 
> > > > Rick, maybe you could explain how is current 1 byte xchg on llsc wrong?
> > > 
> > > It doesn't seem to preserve the 3 other bytes in the word.
> > > 
> > > > It does use 4 byte accesses but IIUC that is all that exists on
> > > > this architecture.
> > > 
> > > Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for
> > > example. A store to another portion of the word should make the
> > > store-conditional fail and we'll retry the loop.
> > > 
> > > The short versions should however preserve the other bytes in the word.
> > 
> > Indeed. Also, accesses must be aligned, so the asm needs to round down
> > to an aligned address and perform a correct read-modify-write on it,
> > placing the new byte in the correct offset in the word.
> > 
> > Alternatively (my preference) this logic can be impemented in C as a
> > wrapper around the 32-bit cmpxchg. I think this is less error-prone
> > and it can be shared between the multiple sh cmpxchg back-ends,
> > including the new cas.l one we need for J2.
> > 
> > > SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte
> > > versions.
> > > 
> > > In any case, I'm all for rm -rf arch/sh/, one less arch to worry about
> > > is always good, but ISTR some people wanting to resurrect SH:
> > > 
> > >   http://old.lwn.net/Articles/647636/
> > > 
> > > Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and
> > > take up an active interest in SH lest someone 'accidentally' nukes it?
> > 
> > We're in the process of preparing such a proposal right now. That
> > current intent is that Sato-san and I will co-maintain arch/sh. We'll
> > include more details about motivation, proposed development direction,
> > existing work to be merged, etc. in that proposal.
> 
> Well I'd like to be able to make progress with generic
> arch cleanups meanwhile.
> 
> Could you quickly write a version of 1 and 2 byte xchg that
> works so I can include it?

Here are quick, untested generic ones:

static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
{
	u32 old;
	unsigned long offset = (unsigned long)m & 3;
	volatile u32 *w = (volatile u32 *)(m - offset);
	union { u32 w; u8 b[4]; } u;
	do {
		old = u.w = *w;
		result = w.b[offset];
		w.b[offset] = val;
	} while (cmpxchg(w, old, u.w) != old);
	return result;
}

static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val)
{
	u32 old;
	unsigned long result;
	unsigned long offset = ((unsigned long)m & 3) >> 1;
	volatile u32 *w = (volatile u32 *)(m - offset);
	union { u32 w; u16 h[2]; } u;
	do {
		old = u.w = *w;
		result = w.h[offset];
		w.h[offset] = val;
	} while (cmpxchg(w, old, u.w) != old);
	return result;
}

It would be nice to have these in asm-generic for archs which don't
define their own versions rather than having cruft like this repeated
per-arch. Strictly speaking, the volatile u32 used to access the
32-bit word containing the u8 or u16 should be
__attribute__((__may_alias__)) too. Is there an existing kernel type
for a "may_alias u32" or should it perhaps be added?

Rich
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra Jan. 7, 2016, 1:37 p.m. UTC | #15
On Wed, Jan 06, 2016 at 06:53:01PM -0500, Rich Felker wrote:
> It would be nice to have these in asm-generic for archs which don't
> define their own versions rather than having cruft like this repeated
> per-arch. 

Maybe, but I'm not sure how many archs would indeed suffer this problem,
so far I'm only aware of Alpha and SH that do not have short atomic ops.

> Strictly speaking, the volatile u32 used to access the
> 32-bit word containing the u8 or u16 should be
> __attribute__((__may_alias__)) too. Is there an existing kernel type
> for a "may_alias u32" or should it perhaps be added?

The kernel does -fno-strict-aliasing because the C aliasing rules are
crap (TM) :-), so I suspect we do not need the alias attribute here.
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin Jan. 7, 2016, 5:48 p.m. UTC | #16
On Wed, Jan 06, 2016 at 06:53:01PM -0500, Rich Felker wrote:
> On Wed, Jan 06, 2016 at 10:23:12PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Jan 06, 2016 at 01:23:50PM -0500, Rich Felker wrote:
> > > On Wed, Jan 06, 2016 at 03:32:18PM +0100, Peter Zijlstra wrote:
> > > > On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote:
> > > > > > > Peter, what do you think? How about I leave this patch as is for now?
> > > > > > 
> > > > > > No, and I object to removing the single byte implementation too. Either
> > > > > > remove the full arch or fix xchg() to conform. xchg() should work on all
> > > > > > native word sizes, for SH that would be 1,2 and 4 bytes.
> > > > > 
> > > > > Rick, maybe you could explain how is current 1 byte xchg on llsc wrong?
> > > > 
> > > > It doesn't seem to preserve the 3 other bytes in the word.
> > > > 
> > > > > It does use 4 byte accesses but IIUC that is all that exists on
> > > > > this architecture.
> > > > 
> > > > Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for
> > > > example. A store to another portion of the word should make the
> > > > store-conditional fail and we'll retry the loop.
> > > > 
> > > > The short versions should however preserve the other bytes in the word.
> > > 
> > > Indeed. Also, accesses must be aligned, so the asm needs to round down
> > > to an aligned address and perform a correct read-modify-write on it,
> > > placing the new byte in the correct offset in the word.
> > > 
> > > Alternatively (my preference) this logic can be impemented in C as a
> > > wrapper around the 32-bit cmpxchg. I think this is less error-prone
> > > and it can be shared between the multiple sh cmpxchg back-ends,
> > > including the new cas.l one we need for J2.
> > > 
> > > > SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte
> > > > versions.
> > > > 
> > > > In any case, I'm all for rm -rf arch/sh/, one less arch to worry about
> > > > is always good, but ISTR some people wanting to resurrect SH:
> > > > 
> > > >   http://old.lwn.net/Articles/647636/
> > > > 
> > > > Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and
> > > > take up an active interest in SH lest someone 'accidentally' nukes it?
> > > 
> > > We're in the process of preparing such a proposal right now. That
> > > current intent is that Sato-san and I will co-maintain arch/sh. We'll
> > > include more details about motivation, proposed development direction,
> > > existing work to be merged, etc. in that proposal.
> > 
> > Well I'd like to be able to make progress with generic
> > arch cleanups meanwhile.
> > 
> > Could you quickly write a version of 1 and 2 byte xchg that
> > works so I can include it?
> 
> Here are quick, untested generic ones:
> 
> static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
> {
> 	u32 old;
> 	unsigned long offset = (unsigned long)m & 3;
> 	volatile u32 *w = (volatile u32 *)(m - offset);
> 	union { u32 w; u8 b[4]; } u;
> 	do {
> 		old = u.w = *w;
> 		result = w.b[offset];
> 		w.b[offset] = val;
> 	} while (cmpxchg(w, old, u.w) != old);
> 	return result;
> }
> 
> static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val)
> {
> 	u32 old;
> 	unsigned long result;
> 	unsigned long offset = ((unsigned long)m & 3) >> 1;
> 	volatile u32 *w = (volatile u32 *)(m - offset);
> 	union { u32 w; u16 h[2]; } u;
> 	do {
> 		old = u.w = *w;
> 		result = w.h[offset];
> 		w.h[offset] = val;
> 	} while (cmpxchg(w, old, u.w) != old);
> 	return result;
> }
> 
> It would be nice to have these in asm-generic for archs which don't
> define their own versions rather than having cruft like this repeated
> per-arch. Strictly speaking, the volatile u32 used to access the
> 32-bit word containing the u8 or u16 should be
> __attribute__((__may_alias__)) too. Is there an existing kernel type
> for a "may_alias u32" or should it perhaps be added?
> 
> Rich

BTW this seems suboptimal for grb and irq variants which apparently
can do things correctly.

--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rich Felker Jan. 7, 2016, 7:05 p.m. UTC | #17
On Thu, Jan 07, 2016 at 02:37:49PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 06, 2016 at 06:53:01PM -0500, Rich Felker wrote:
> > It would be nice to have these in asm-generic for archs which don't
> > define their own versions rather than having cruft like this repeated
> > per-arch. 
> 
> Maybe, but I'm not sure how many archs would indeed suffer this problem,
> so far I'm only aware of Alpha and SH that do not have short atomic ops.

Apparently original armv6 (non-k) lacked u8 and u16 variants of
ldrex/strex. I'm pretty sure or1k also lacks them, and mips,
microblaze, and some powerpc versions might too. Not sure about
risc-v.

> > Strictly speaking, the volatile u32 used to access the
> > 32-bit word containing the u8 or u16 should be
> > __attribute__((__may_alias__)) too. Is there an existing kernel type
> > for a "may_alias u32" or should it perhaps be added?
> 
> The kernel does -fno-strict-aliasing because the C aliasing rules are
> crap (TM) :-), so I suspect we do not need the alias attribute here.

I suspect working on the kernel I'm going to have to get used to
getting "corrected" for writing proper C... ;-)

Rich
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rich Felker Jan. 7, 2016, 7:10 p.m. UTC | #18
On Thu, Jan 07, 2016 at 07:48:08PM +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 06, 2016 at 06:53:01PM -0500, Rich Felker wrote:
> > On Wed, Jan 06, 2016 at 10:23:12PM +0200, Michael S. Tsirkin wrote:
> > > On Wed, Jan 06, 2016 at 01:23:50PM -0500, Rich Felker wrote:
> > > > On Wed, Jan 06, 2016 at 03:32:18PM +0100, Peter Zijlstra wrote:
> > > > > On Wed, Jan 06, 2016 at 01:52:17PM +0200, Michael S. Tsirkin wrote:
> > > > > > > > Peter, what do you think? How about I leave this patch as is for now?
> > > > > > > 
> > > > > > > No, and I object to removing the single byte implementation too. Either
> > > > > > > remove the full arch or fix xchg() to conform. xchg() should work on all
> > > > > > > native word sizes, for SH that would be 1,2 and 4 bytes.
> > > > > > 
> > > > > > Rick, maybe you could explain how is current 1 byte xchg on llsc wrong?
> > > > > 
> > > > > It doesn't seem to preserve the 3 other bytes in the word.
> > > > > 
> > > > > > It does use 4 byte accesses but IIUC that is all that exists on
> > > > > > this architecture.
> > > > > 
> > > > > Right, that's not a problem, look at arch/alpha/include/asm/xchg.h for
> > > > > example. A store to another portion of the word should make the
> > > > > store-conditional fail and we'll retry the loop.
> > > > > 
> > > > > The short versions should however preserve the other bytes in the word.
> > > > 
> > > > Indeed. Also, accesses must be aligned, so the asm needs to round down
> > > > to an aligned address and perform a correct read-modify-write on it,
> > > > placing the new byte in the correct offset in the word.
> > > > 
> > > > Alternatively (my preference) this logic can be impemented in C as a
> > > > wrapper around the 32-bit cmpxchg. I think this is less error-prone
> > > > and it can be shared between the multiple sh cmpxchg back-ends,
> > > > including the new cas.l one we need for J2.
> > > > 
> > > > > SH's cmpxchg() is equally incomplete and does not provide 1 and 2 byte
> > > > > versions.
> > > > > 
> > > > > In any case, I'm all for rm -rf arch/sh/, one less arch to worry about
> > > > > is always good, but ISTR some people wanting to resurrect SH:
> > > > > 
> > > > >   http://old.lwn.net/Articles/647636/
> > > > > 
> > > > > Rob, Jeff, Sato-san, might I suggest you send a MAINTAINERS patch and
> > > > > take up an active interest in SH lest someone 'accidentally' nukes it?
> > > > 
> > > > We're in the process of preparing such a proposal right now. That
> > > > current intent is that Sato-san and I will co-maintain arch/sh. We'll
> > > > include more details about motivation, proposed development direction,
> > > > existing work to be merged, etc. in that proposal.
> > > 
> > > Well I'd like to be able to make progress with generic
> > > arch cleanups meanwhile.
> > > 
> > > Could you quickly write a version of 1 and 2 byte xchg that
> > > works so I can include it?
> > 
> > Here are quick, untested generic ones:
> > 
> > static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
> > {
> > 	u32 old;
> > 	unsigned long offset = (unsigned long)m & 3;
> > 	volatile u32 *w = (volatile u32 *)(m - offset);
> > 	union { u32 w; u8 b[4]; } u;
> > 	do {
> > 		old = u.w = *w;
> > 		result = w.b[offset];
> > 		w.b[offset] = val;
> > 	} while (cmpxchg(w, old, u.w) != old);
> > 	return result;
> > }
> > 
> > static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val)
> > {
> > 	u32 old;
> > 	unsigned long result;
> > 	unsigned long offset = ((unsigned long)m & 3) >> 1;
> > 	volatile u32 *w = (volatile u32 *)(m - offset);
> > 	union { u32 w; u16 h[2]; } u;
> > 	do {
> > 		old = u.w = *w;
> > 		result = w.h[offset];
> > 		w.h[offset] = val;
> > 	} while (cmpxchg(w, old, u.w) != old);
> > 	return result;
> > }
> > 
> > It would be nice to have these in asm-generic for archs which don't
> > define their own versions rather than having cruft like this repeated
> > per-arch. Strictly speaking, the volatile u32 used to access the
> > 32-bit word containing the u8 or u16 should be
> > __attribute__((__may_alias__)) too. Is there an existing kernel type
> > for a "may_alias u32" or should it perhaps be added?
> > 
> > Rich
> 
> BTW this seems suboptimal for grb and irq variants which apparently
> can do things correctly.

In principle I agree, but u8/u16 xchg is mostly unused (completely
unused in my builds) and unlikely to matter to performance. Also, the
irq variant is only for the original sh2 which is not even produced
anymore afaik. Our reimplementation of the sh2 ISA, the J2, has a
cas.l instruction that will be used instead because it supports SMP
where interrupt masking is insufficient to achieve atomicity.

Rich
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/sh/include/asm/barrier.h b/arch/sh/include/asm/barrier.h
index f887c64..0cc5735 100644
--- a/arch/sh/include/asm/barrier.h
+++ b/arch/sh/include/asm/barrier.h
@@ -32,7 +32,15 @@ 
 #define ctrl_barrier()	__asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop")
 #endif
 
-#define __smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
+#define __smp_store_mb(var, value) do { \
+	if (sizeof(var) != 4 && sizeof(var) != 1) { \
+		 WRITE_ONCE(var, value); \
+		__smp_mb(); \
+	} else { \
+		(void)xchg(&var, value);  \
+	} \
+} while (0)
+
 #define smp_store_mb(var, value) __smp_store_mb(var, value)
 
 #include <asm-generic/barrier.h>