diff mbox series

[3/3] locking/atomic: openrisc: use generic_cmpxchg[64]_local for arch_cmpxchg[64]_local

Message ID 20231026073932.702197-1-wuqiang.matt@bytedance.com (mailing list archive)
State Handled Elsewhere
Headers show
Series [1/3] locking/atomic: arc: use generic_cmpxchg[64]_local for arch_cmpxchg[64]_local | expand

Commit Message

wuqiang.matt Oct. 26, 2023, 7:39 a.m. UTC
arch_cmpxchg[64]_local() are not defined for openrisc. So implement
them with generci_cmpxchg[64]_local, advised by Masami Hiramatsu.

Closes: https://lore.kernel.org/linux-trace-kernel/169824660459.24340.14614817132696360531.stgit@devnote2
Closes: https://lore.kernel.org/oe-kbuild-all/202310241310.Ir5uukOG-lkp@intel.com

Signed-off-by: wuqiang.matt <wuqiang.matt@bytedance.com>
---
 arch/openrisc/include/asm/cmpxchg.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Arnd Bergmann Oct. 26, 2023, 8:46 a.m. UTC | #1
On Thu, Oct 26, 2023, at 09:39, wuqiang.matt wrote:
> arch_cmpxchg[64]_local() are not defined for openrisc. So implement
> them with generci_cmpxchg[64]_local, advised by Masami Hiramatsu.
>
> Closes: 
> https://lore.kernel.org/linux-trace-kernel/169824660459.24340.14614817132696360531.stgit@devnote2
> Closes: 
> https://lore.kernel.org/oe-kbuild-all/202310241310.Ir5uukOG-lkp@intel.com
>
> Signed-off-by: wuqiang.matt <wuqiang.matt@bytedance.com>

I think on architectures that have actual atomics, you
generally want to define this to be the same as arch_cmpxchg()
rather than the generic version.

It depends on the relative cost of doing one atomic compared
to an irq-disable/enable pair, but everyone else went with
the former if they could. The exceptions are armv4/armv5,
sparc32 and parisc, which don't have a generic cmpxchg()
or similar operation.

You could do the thing that sparc64 and xtensa do, which
use the native cmpxchg for supported word sizes but the
generic version for 1- and 2-byte swaps, but that has its
own set of problems if you end up doing operations on both
the entire word and a sub-unit of the same thing.

    Arnd
wuqiang.matt Oct. 26, 2023, 11:05 a.m. UTC | #2
On 2023/10/26 16:46, Arnd Bergmann wrote:
> On Thu, Oct 26, 2023, at 09:39, wuqiang.matt wrote:
>> arch_cmpxchg[64]_local() are not defined for openrisc. So implement
>> them with generci_cmpxchg[64]_local, advised by Masami Hiramatsu.
>>
>> Closes:
>> https://lore.kernel.org/linux-trace-kernel/169824660459.24340.14614817132696360531.stgit@devnote2
>> Closes:
>> https://lore.kernel.org/oe-kbuild-all/202310241310.Ir5uukOG-lkp@intel.com
>>
>> Signed-off-by: wuqiang.matt <wuqiang.matt@bytedance.com>
> 
> I think on architectures that have actual atomics, you
> generally want to define this to be the same as arch_cmpxchg()
> rather than the generic version.
> 
> It depends on the relative cost of doing one atomic compared
> to an irq-disable/enable pair, but everyone else went with
> the former if they could. The exceptions are armv4/armv5,
> sparc32 and parisc, which don't have a generic cmpxchg()
> or similar operation.

Sure, better native than the generic. I'll try to collect more
insights before next move.

> You could do the thing that sparc64 and xtensa do, which
> use the native cmpxchg for supported word sizes but the
> generic version for 1- and 2-byte swaps, but that has its
> own set of problems if you end up doing operations on both
> the entire word and a sub-unit of the same thing.

Thank you for pointing out this. I'll do some research on these
implementations.

>      Arnd

Regards,
wuqiang
Masami Hiramatsu (Google) Oct. 28, 2023, 12:49 p.m. UTC | #3
Hi Wuqiang,

On Thu, 26 Oct 2023 19:05:51 +0800
"wuqiang.matt" <wuqiang.matt@bytedance.com> wrote:

> On 2023/10/26 16:46, Arnd Bergmann wrote:
> > On Thu, Oct 26, 2023, at 09:39, wuqiang.matt wrote:
> >> arch_cmpxchg[64]_local() are not defined for openrisc. So implement
> >> them with generci_cmpxchg[64]_local, advised by Masami Hiramatsu.
> >>
> >> Closes:
> >> https://lore.kernel.org/linux-trace-kernel/169824660459.24340.14614817132696360531.stgit@devnote2
> >> Closes:
> >> https://lore.kernel.org/oe-kbuild-all/202310241310.Ir5uukOG-lkp@intel.com
> >>
> >> Signed-off-by: wuqiang.matt <wuqiang.matt@bytedance.com>
> > 
> > I think on architectures that have actual atomics, you
> > generally want to define this to be the same as arch_cmpxchg()
> > rather than the generic version.
> > 
> > It depends on the relative cost of doing one atomic compared
> > to an irq-disable/enable pair, but everyone else went with
> > the former if they could. The exceptions are armv4/armv5,
> > sparc32 and parisc, which don't have a generic cmpxchg()
> > or similar operation.
> 
> Sure, better native than the generic. I'll try to collect more
> insights before next move.

So I will temporally remove the last change (use arch_cmpxchg_local
in objpool) until these series are rewritten with arch native code,
so that the next release will not break the kernel build.

But this must be fixed because arch_cmpxchg_local() is required
for each arch anyway.

> 
> > You could do the thing that sparc64 and xtensa do, which
> > use the native cmpxchg for supported word sizes but the
> > generic version for 1- and 2-byte swaps, but that has its
> > own set of problems if you end up doing operations on both
> > the entire word and a sub-unit of the same thing.
> 
> Thank you for pointing out this. I'll do some research on these
> implementations.

arc also has the LL-SC instruction but depends on the core feature,
so I think we can use it.

Thank you,

> 
> >      Arnd
> 
> Regards,
> wuqiang
>
wuqiang.matt Oct. 28, 2023, 4:40 p.m. UTC | #4
On 2023/10/28 20:49, Masami Hiramatsu (Google) wrote:
> Hi Wuqiang,
> 
> On Thu, 26 Oct 2023 19:05:51 +0800
> "wuqiang.matt" <wuqiang.matt@bytedance.com> wrote:
> 
>> On 2023/10/26 16:46, Arnd Bergmann wrote:
>>> On Thu, Oct 26, 2023, at 09:39, wuqiang.matt wrote:
>>>> arch_cmpxchg[64]_local() are not defined for openrisc. So implement
>>>> them with generci_cmpxchg[64]_local, advised by Masami Hiramatsu.
>>>>
>>>> Closes:
>>>> https://lore.kernel.org/linux-trace-kernel/169824660459.24340.14614817132696360531.stgit@devnote2
>>>> Closes:
>>>> https://lore.kernel.org/oe-kbuild-all/202310241310.Ir5uukOG-lkp@intel.com
>>>>
>>>> Signed-off-by: wuqiang.matt <wuqiang.matt@bytedance.com>
>>>
>>> I think on architectures that have actual atomics, you
>>> generally want to define this to be the same as arch_cmpxchg()
>>> rather than the generic version.
>>>
>>> It depends on the relative cost of doing one atomic compared
>>> to an irq-disable/enable pair, but everyone else went with
>>> the former if they could. The exceptions are armv4/armv5,
>>> sparc32 and parisc, which don't have a generic cmpxchg()
>>> or similar operation.
>>
>> Sure, better native than the generic. I'll try to collect more
>> insights before next move.
> 
> So I will temporally remove the last change (use arch_cmpxchg_local
> in objpool) until these series are rewritten with arch native code,
> so that the next release will not break the kernel build.

Ok, it's fine to me. Thank you.


> But this must be fixed because arch_cmpxchg_local() is required
> for each arch anyway.

Yes. I'm working on the new update for arc/openrisc/hexagon. It would
be better resolve this issue first, then consider the objpool update
of using arch_cmpxchg_local.

>>
>>> You could do the thing that sparc64 and xtensa do, which
>>> use the native cmpxchg for supported word sizes but the
>>> generic version for 1- and 2-byte swaps, but that has its
>>> own set of problems if you end up doing operations on both
>>> the entire word and a sub-unit of the same thing.
>>
>> Thank you for pointing out this. I'll do some research on these
>> implementations.
> 
> arc also has the LL-SC instruction but depends on the core feature,
> so I think we can use it.

Right. The arc processor does have the CONFIG_ARC_HAS_LLSC option, but
I doubt the correctness of arch_cmpxchg_relaxed and arch_cmpxchg:

arch_cmpxchg_relaxed:
...
         switch(sizeof((_p_))) {
         case 4:
....

arch_cmpxchg:
...
	BUILD_BUG_ON(sizeof(_p_) != 4);		
...

_p is the address pointer, so I'm thinking it's a typo but I couldn't
yet confirm. There is not much about arc processors in the web :(


> Thank you,
> 
>>
>>>       Arnd
>>
>> Regards,
>> wuqiang
>>
> 
>
Masami Hiramatsu (Google) Oct. 29, 2023, 3:26 a.m. UTC | #5
On Sun, 29 Oct 2023 00:40:17 +0800
"wuqiang.matt" <wuqiang.matt@bytedance.com> wrote:

> On 2023/10/28 20:49, Masami Hiramatsu (Google) wrote:
> > Hi Wuqiang,
> > 
> > On Thu, 26 Oct 2023 19:05:51 +0800
> > "wuqiang.matt" <wuqiang.matt@bytedance.com> wrote:
> > 
> >> On 2023/10/26 16:46, Arnd Bergmann wrote:
> >>> On Thu, Oct 26, 2023, at 09:39, wuqiang.matt wrote:
> >>>> arch_cmpxchg[64]_local() are not defined for openrisc. So implement
> >>>> them with generci_cmpxchg[64]_local, advised by Masami Hiramatsu.
> >>>>
> >>>> Closes:
> >>>> https://lore.kernel.org/linux-trace-kernel/169824660459.24340.14614817132696360531.stgit@devnote2
> >>>> Closes:
> >>>> https://lore.kernel.org/oe-kbuild-all/202310241310.Ir5uukOG-lkp@intel.com
> >>>>
> >>>> Signed-off-by: wuqiang.matt <wuqiang.matt@bytedance.com>
> >>>
> >>> I think on architectures that have actual atomics, you
> >>> generally want to define this to be the same as arch_cmpxchg()
> >>> rather than the generic version.
> >>>
> >>> It depends on the relative cost of doing one atomic compared
> >>> to an irq-disable/enable pair, but everyone else went with
> >>> the former if they could. The exceptions are armv4/armv5,
> >>> sparc32 and parisc, which don't have a generic cmpxchg()
> >>> or similar operation.
> >>
> >> Sure, better native than the generic. I'll try to collect more
> >> insights before next move.
> > 
> > So I will temporally remove the last change (use arch_cmpxchg_local
> > in objpool) until these series are rewritten with arch native code,
> > so that the next release will not break the kernel build.
> 
> Ok, it's fine to me. Thank you.
> 
> 
> > But this must be fixed because arch_cmpxchg_local() is required
> > for each arch anyway.
> 
> Yes. I'm working on the new update for arc/openrisc/hexagon. It would
> be better resolve this issue first, then consider the objpool update
> of using arch_cmpxchg_local.
> 
> >>
> >>> You could do the thing that sparc64 and xtensa do, which
> >>> use the native cmpxchg for supported word sizes but the
> >>> generic version for 1- and 2-byte swaps, but that has its
> >>> own set of problems if you end up doing operations on both
> >>> the entire word and a sub-unit of the same thing.
> >>
> >> Thank you for pointing out this. I'll do some research on these
> >> implementations.
> > 
> > arc also has the LL-SC instruction but depends on the core feature,
> > so I think we can use it.
> 
> Right. The arc processor does have the CONFIG_ARC_HAS_LLSC option, but
> I doubt the correctness of arch_cmpxchg_relaxed and arch_cmpxchg:
> 
> arch_cmpxchg_relaxed:
> ...
>          switch(sizeof((_p_))) {
>          case 4:
> ....
> 
> arch_cmpxchg:
> ...
> 	BUILD_BUG_ON(sizeof(_p_) != 4);		
> ...
> 
> _p is the address pointer, so I'm thinking it's a typo but I couldn't
> yet confirm. There is not much about arc processors in the web :(

Hmm, indeed. This seems like a bug but it depends on the 'llock  %0, [%1]'
can take a 32bit address or 32bit data register. Usually it should
check the size of data, but need to check with ISA manual.

Vineet, can you check this suspicious bug?

Thank you,

> 
> 
> > Thank you,
> > 
> >>
> >>>       Arnd
> >>
> >> Regards,
> >> wuqiang
> >>
> > 
> > 
>
Vineet Gupta Oct. 30, 2023, 2:22 a.m. UTC | #6
On 10/28/23 20:26, Masami Hiramatsu (Google) wrote:
> On Sun, 29 Oct 2023 00:40:17 +0800
> "wuqiang.matt" <wuqiang.matt@bytedance.com> wrote:
>
>> On 2023/10/28 20:49, Masami Hiramatsu (Google) wrote:
>>> Hi Wuqiang,
>>>
>>> On Thu, 26 Oct 2023 19:05:51 +0800
>>> "wuqiang.matt" <wuqiang.matt@bytedance.com> wrote:
>>>
>>>> On 2023/10/26 16:46, Arnd Bergmann wrote:
>>>>> On Thu, Oct 26, 2023, at 09:39, wuqiang.matt wrote:
>>>>>> arch_cmpxchg[64]_local() are not defined for openrisc. So implement
>>>>>> them with generci_cmpxchg[64]_local, advised by Masami Hiramatsu.
>>>>>>
>>>>>> Closes:
>>>>>> https://lore.kernel.org/linux-trace-kernel/169824660459.24340.14614817132696360531.stgit@devnote2
>>>>>> Closes:
>>>>>> https://lore.kernel.org/oe-kbuild-all/202310241310.Ir5uukOG-lkp@intel.com
>>>>>>
>>>>>> Signed-off-by: wuqiang.matt <wuqiang.matt@bytedance.com>
>>>>> I think on architectures that have actual atomics, you
>>>>> generally want to define this to be the same as arch_cmpxchg()
>>>>> rather than the generic version.
>>>>>
>>>>> It depends on the relative cost of doing one atomic compared
>>>>> to an irq-disable/enable pair, but everyone else went with
>>>>> the former if they could. The exceptions are armv4/armv5,
>>>>> sparc32 and parisc, which don't have a generic cmpxchg()
>>>>> or similar operation.
>>>> Sure, better native than the generic. I'll try to collect more
>>>> insights before next move.
>>> So I will temporally remove the last change (use arch_cmpxchg_local
>>> in objpool) until these series are rewritten with arch native code,
>>> so that the next release will not break the kernel build.
>> Ok, it's fine to me. Thank you.
>>
>>
>>> But this must be fixed because arch_cmpxchg_local() is required
>>> for each arch anyway.
>> Yes. I'm working on the new update for arc/openrisc/hexagon. It would
>> be better resolve this issue first, then consider the objpool update
>> of using arch_cmpxchg_local.
>>
>>>>> You could do the thing that sparc64 and xtensa do, which
>>>>> use the native cmpxchg for supported word sizes but the
>>>>> generic version for 1- and 2-byte swaps, but that has its
>>>>> own set of problems if you end up doing operations on both
>>>>> the entire word and a sub-unit of the same thing.
>>>> Thank you for pointing out this. I'll do some research on these
>>>> implementations.
>>> arc also has the LL-SC instruction but depends on the core feature,
>>> so I think we can use it.
>> Right. The arc processor does have the CONFIG_ARC_HAS_LLSC option, but
>> I doubt the correctness of arch_cmpxchg_relaxed and arch_cmpxchg:
>>
>> arch_cmpxchg_relaxed:
>> ...
>>           switch(sizeof((_p_))) {
>>           case 4:
>> ....
>>
>> arch_cmpxchg:
>> ...
>> 	BUILD_BUG_ON(sizeof(_p_) != 4);		
>> ...
>>
>> _p is the address pointer, so I'm thinking it's a typo but I couldn't
>> yet confirm. There is not much about arc processors in the web :(
> Hmm, indeed. This seems like a bug but it depends on the 'llock  %0, [%1]'
> can take a 32bit address or 32bit data register. Usually it should
> check the size of data, but need to check with ISA manual.
>
> Vineet, can you check this suspicious bug?

ARCv2 is a 32-bit ISA and LLOCK/SCOND work on 32-bit data.
So the pointers will be 32-bit anyways. Is the issue that 
pointer/cmpxchg operation could be on a smaller data type ?

-Vineet
wuqiang.matt Oct. 30, 2023, 3:41 a.m. UTC | #7
On 2023/10/30 10:22, Vineet Gupta wrote:
> 
> 
> On 10/28/23 20:26, Masami Hiramatsu (Google) wrote:
>> On Sun, 29 Oct 2023 00:40:17 +0800
>> "wuqiang.matt" <wuqiang.matt@bytedance.com> wrote:
>>
>>> On 2023/10/28 20:49, Masami Hiramatsu (Google) wrote:
>>>> Hi Wuqiang,
>>>>
>>>> On Thu, 26 Oct 2023 19:05:51 +0800
>>>> "wuqiang.matt" <wuqiang.matt@bytedance.com> wrote:
>>>>
>>>>> On 2023/10/26 16:46, Arnd Bergmann wrote:
>>>>>> On Thu, Oct 26, 2023, at 09:39, wuqiang.matt wrote:
>>>>>>> arch_cmpxchg[64]_local() are not defined for openrisc. So implement
>>>>>>> them with generci_cmpxchg[64]_local, advised by Masami Hiramatsu.
>>>>>>>
>>>>>>> Closes:
>>>>>>> https://lore.kernel.org/linux-trace-kernel/169824660459.24340.14614817132696360531.stgit@devnote2
>>>>>>> Closes:
>>>>>>> https://lore.kernel.org/oe-kbuild-all/202310241310.Ir5uukOG-lkp@intel.com
>>>>>>>
>>>>>>> Signed-off-by: wuqiang.matt <wuqiang.matt@bytedance.com>
>>>>>> I think on architectures that have actual atomics, you
>>>>>> generally want to define this to be the same as arch_cmpxchg()
>>>>>> rather than the generic version.
>>>>>>
>>>>>> It depends on the relative cost of doing one atomic compared
>>>>>> to an irq-disable/enable pair, but everyone else went with
>>>>>> the former if they could. The exceptions are armv4/armv5,
>>>>>> sparc32 and parisc, which don't have a generic cmpxchg()
>>>>>> or similar operation.
>>>>> Sure, better native than the generic. I'll try to collect more
>>>>> insights before next move.
>>>> So I will temporally remove the last change (use arch_cmpxchg_local
>>>> in objpool) until these series are rewritten with arch native code,
>>>> so that the next release will not break the kernel build.
>>> Ok, it's fine to me. Thank you.
>>>
>>>
>>>> But this must be fixed because arch_cmpxchg_local() is required
>>>> for each arch anyway.
>>> Yes. I'm working on the new update for arc/openrisc/hexagon. It would
>>> be better resolve this issue first, then consider the objpool update
>>> of using arch_cmpxchg_local.
>>>
>>>>>> You could do the thing that sparc64 and xtensa do, which
>>>>>> use the native cmpxchg for supported word sizes but the
>>>>>> generic version for 1- and 2-byte swaps, but that has its
>>>>>> own set of problems if you end up doing operations on both
>>>>>> the entire word and a sub-unit of the same thing.
>>>>> Thank you for pointing out this. I'll do some research on these
>>>>> implementations.
>>>> arc also has the LL-SC instruction but depends on the core feature,
>>>> so I think we can use it.
>>> Right. The arc processor does have the CONFIG_ARC_HAS_LLSC option, but
>>> I doubt the correctness of arch_cmpxchg_relaxed and arch_cmpxchg:
>>>
>>> arch_cmpxchg_relaxed:
>>> ...
>>>           switch(sizeof((_p_))) {
>>>           case 4:
>>> ....
>>>
>>> arch_cmpxchg:
>>> ...
>>>     BUILD_BUG_ON(sizeof(_p_) != 4);
>>> ...
>>>
>>> _p is the address pointer, so I'm thinking it's a typo but I couldn't
>>> yet confirm. There is not much about arc processors in the web :(
>> Hmm, indeed. This seems like a bug but it depends on the 'llock  %0, [%1]'
>> can take a 32bit address or 32bit data register. Usually it should
>> check the size of data, but need to check with ISA manual.
>>
>> Vineet, can you check this suspicious bug?
> 
> ARCv2 is a 32-bit ISA and LLOCK/SCOND work on 32-bit data.
> So the pointers will be 32-bit anyways. Is the issue that pointer/cmpxchg 
> operation could be on a smaller data type ?

For ARCv2 with CONFIG_ARC_HAS_LLSC, better add the data size checking and
only permit 32bit data size. Even for 32-bit system, data should can be
64bit 'long long'.

And In the case that CONFIG_ARC_HAS_LLSC is undefined, in arch_cmpxchg: the
pointer size checking is unnecessary, since it's using spinlock internally:

https://elixir.bootlin.com/linux/v6.6-rc7/source/arch/arc/include/asm/cmpxchg.h#L60: 

	BUILD_BUG_ON(sizeof(_p_) != 4);					\
									\
	/*								\
	 * spin lock/unlock provide the needed smp_mb() before/after	\
	 */								\
	atomic_ops_lock(__flags);					\
	_prev_ = *_p_;							\
	if (_prev_ == _o_)						\
		*_p_ = _n_;						\
	atomic_ops_unlock(__flags);

Another question about the naming: arch_cmpxchg_relaxed() implemented if
CONFIG_ARC_HAS_LLSC is configured and arch_cmpxchg() defined for the rest.
Are there any reasons for difference names ?

As I checked, Synopsys has released 64bit ARC processors (HS66/HS68), but
I don't know the status of Linux kernel support.

> -Vineet

Regards,
wuqiang
Vineet Gupta Nov. 2, 2023, 4:53 a.m. UTC | #8
On 10/29/23 20:41, wuqiang.matt wrote:
>>>>
>>>> arch_cmpxchg_relaxed:
>>>> ...
>>>>           switch(sizeof((_p_))) {
>>>>           case 4:
>>>> ....
>>>>
>>>> arch_cmpxchg:
>>>> ...
>>>>     BUILD_BUG_ON(sizeof(_p_) != 4);
>>>> ...
>>>>
>>>> _p is the address pointer, so I'm thinking it's a typo but I couldn't
>>>> yet confirm. There is not much about arc processors in the web :(
>>> Hmm, indeed. This seems like a bug but it depends on the 'llock  %0, 
>>> [%1]'
>>> can take a 32bit address or 32bit data register. Usually it should
>>> check the size of data, but need to check with ISA manual.
>>>
>>> Vineet, can you check this suspicious bug?
>>
>> ARCv2 is a 32-bit ISA and LLOCK/SCOND work on 32-bit data.
>> So the pointers will be 32-bit anyways. Is the issue that 
>> pointer/cmpxchg operation could be on a smaller data type ?
>
> For ARCv2 with CONFIG_ARC_HAS_LLSC, better add the data size checking and
> only permit 32bit data size. Even for 32-bit system, data should can be
> 64bit 'long long'.

Right, makes sense. Can you send a patch ?

>
> And In the case that CONFIG_ARC_HAS_LLSC is undefined, in 
> arch_cmpxchg: the
> pointer size checking is unnecessary, since it's using spinlock 
> internally:

Correct, I had the same thought.

>
> https://elixir.bootlin.com/linux/v6.6-rc7/source/arch/arc/include/asm/cmpxchg.h#L60: 
>
>     BUILD_BUG_ON(sizeof(_p_) != 4);                    \
>                                     \
>     /*                                \
>      * spin lock/unlock provide the needed smp_mb() before/after    \
>      */                                \
>     atomic_ops_lock(__flags);                    \
>     _prev_ = *_p_;                            \
>     if (_prev_ == _o_)                        \
>         *_p_ = _n_;                        \
>     atomic_ops_unlock(__flags);

Can you do that fix in same patch as well ?

> Another question about the naming: arch_cmpxchg_relaxed() implemented if
> CONFIG_ARC_HAS_LLSC is configured and arch_cmpxchg() defined for the 
> rest.
> Are there any reasons for difference names ?

Yes the atomics API consists of _relaxed, _acquire, _release and unadorned.
_relaxed is the most efficient, with rest having some extra barriers.
If arch provides _relaxed, generic code can create the rest - assuming 
arch hardware can support the relaxed ones.
That is true for LLSC.

However for !LLSC, spinlock versions already have full barriers due to 
spinlock, so relaxed variant doesn't make sense.

>
> As I checked, Synopsys has released 64bit ARC processors (HS66/HS68), but
> I don't know the status of Linux kernel support.

Yes there's an internal ARC64 port running but it is not ready for 
upstreaming yet.

-Vineet
wuqiang.matt Nov. 2, 2023, 9:30 a.m. UTC | #9
On 2023/11/2 12:53, Vineet Gupta wrote:
> 
> 
> On 10/29/23 20:41, wuqiang.matt wrote:
>>>>>
>>>>> arch_cmpxchg_relaxed:
>>>>> ...
>>>>>           switch(sizeof((_p_))) {
>>>>>           case 4:
>>>>> ....
>>>>>
>>>>> arch_cmpxchg:
>>>>> ...
>>>>>     BUILD_BUG_ON(sizeof(_p_) != 4);
>>>>> ...
>>>>>
>>>>> _p is the address pointer, so I'm thinking it's a typo but I couldn't
>>>>> yet confirm. There is not much about arc processors in the web :(
>>>> Hmm, indeed. This seems like a bug but it depends on the 'llock  %0, [%1]'
>>>> can take a 32bit address or 32bit data register. Usually it should
>>>> check the size of data, but need to check with ISA manual.
>>>>
>>>> Vineet, can you check this suspicious bug?
>>>
>>> ARCv2 is a 32-bit ISA and LLOCK/SCOND work on 32-bit data.
>>> So the pointers will be 32-bit anyways. Is the issue that pointer/cmpxchg 
>>> operation could be on a smaller data type ?
>>
>> For ARCv2 with CONFIG_ARC_HAS_LLSC, better add the data size checking and
>> only permit 32bit data size. Even for 32-bit system, data should can be
>> 64bit 'long long'.
> 
> Right, makes sense. Can you send a patch ?

Okay, I will.

>>
>> And In the case that CONFIG_ARC_HAS_LLSC is undefined, in arch_cmpxchg: the
>> pointer size checking is unnecessary, since it's using spinlock internally:
> 
> Correct, I had the same thought.
> 
>>
>> https://elixir.bootlin.com/linux/v6.6-rc7/source/arch/arc/include/asm/cmpxchg.h#L60:
>>     BUILD_BUG_ON(sizeof(_p_) != 4);                    \
>>                                     \
>>     /*                                \
>>      * spin lock/unlock provide the needed smp_mb() before/after    \
>>      */                                \
>>     atomic_ops_lock(__flags);                    \
>>     _prev_ = *_p_;                            \
>>     if (_prev_ == _o_)                        \
>>         *_p_ = _n_;                        \
>>     atomic_ops_unlock(__flags);
> 
> Can you do that fix in same patch as well ?

Sure.

>> Another question about the naming: arch_cmpxchg_relaxed() implemented if
>> CONFIG_ARC_HAS_LLSC is configured and arch_cmpxchg() defined for the rest.
>> Are there any reasons for difference names ?
> 
> Yes the atomics API consists of _relaxed, _acquire, _release and unadorned.
> _relaxed is the most efficient, with rest having some extra barriers.
> If arch provides _relaxed, generic code can create the rest - assuming arch 
> hardware can support the relaxed ones.

Definitely Yes! The generic cmpxchg will prefer arch_cmpxchg_relaxed rather
than arch_cmpxchg.

> That is true for LLSC.
> 
> However for !LLSC, spinlock versions already have full barriers due to 
> spinlock, so relaxed variant doesn't make sense.
> 
>>
>> As I checked, Synopsys has released 64bit ARC processors (HS66/HS68), but
>> I don't know the status of Linux kernel support.
> 
> Yes there's an internal ARC64 port running but it is not ready for upstreaming 
> yet.

Thanks for the update.

> -Vineet

Regards,
wuqiang
wuqiang.matt Nov. 2, 2023, 10:05 a.m. UTC | #10
On 2023/11/2 17:30, wuqiang.matt wrote:
> On 2023/11/2 12:53, Vineet Gupta wrote:
>>
>>
>> On 10/29/23 20:41, wuqiang.matt wrote:
>>>>>>
>>>>>> arch_cmpxchg_relaxed:
>>>>>> ...
>>>>>>           switch(sizeof((_p_))) {
>>>>>>           case 4:
>>>>>> ....
>>>>>>
>>>>>> arch_cmpxchg:
>>>>>> ...
>>>>>>     BUILD_BUG_ON(sizeof(_p_) != 4);
>>>>>> ...
>>>>>>
>>>>>> _p is the address pointer, so I'm thinking it's a typo but I couldn't
>>>>>> yet confirm. There is not much about arc processors in the web :(
>>>>> Hmm, indeed. This seems like a bug but it depends on the 'llock  %0, [%1]'
>>>>> can take a 32bit address or 32bit data register. Usually it should
>>>>> check the size of data, but need to check with ISA manual.
>>>>>
>>>>> Vineet, can you check this suspicious bug?
>>>>
>>>> ARCv2 is a 32-bit ISA and LLOCK/SCOND work on 32-bit data.
>>>> So the pointers will be 32-bit anyways. Is the issue that pointer/cmpxchg 
>>>> operation could be on a smaller data type ?
>>>
>>> For ARCv2 with CONFIG_ARC_HAS_LLSC, better add the data size checking and
>>> only permit 32bit data size. Even for 32-bit system, data should can be
>>> 64bit 'long long'.
>>
>> Right, makes sense. Can you send a patch ?
> 
> Okay, I will.
> 
>>>
>>> And In the case that CONFIG_ARC_HAS_LLSC is undefined, in arch_cmpxchg: the
>>> pointer size checking is unnecessary, since it's using spinlock internally:
>>
>> Correct, I had the same thought.
>>
>>>
>>> https://elixir.bootlin.com/linux/v6.6-rc7/source/arch/arc/include/asm/cmpxchg.h#L60:
>>>     BUILD_BUG_ON(sizeof(_p_) != 4);                    \
>>>                                     \
>>>     /*                                \
>>>      * spin lock/unlock provide the needed smp_mb() before/after    \
>>>      */                                \
>>>     atomic_ops_lock(__flags);                    \
>>>     _prev_ = *_p_;                            \
>>>     if (_prev_ == _o_)                        \
>>>         *_p_ = _n_;                        \
>>>     atomic_ops_unlock(__flags);
>>
>> Can you do that fix in same patch as well ?
> 
> Sure.
> 
>>> Another question about the naming: arch_cmpxchg_relaxed() implemented if
>>> CONFIG_ARC_HAS_LLSC is configured and arch_cmpxchg() defined for the rest.
>>> Are there any reasons for difference names ?
>>
>> Yes the atomics API consists of _relaxed, _acquire, _release and unadorned.
>> _relaxed is the most efficient, with rest having some extra barriers.
>> If arch provides _relaxed, generic code can create the rest - assuming arch 
>> hardware can support the relaxed ones.
> 
> Definitely Yes! The generic cmpxchg will prefer arch_cmpxchg_relaxed rather
> than arch_cmpxchg.

It should be arch_cmpxchg (if defined) rather than arch_cmpxchg_relaxed. I
just double checked with v6.6:

	...
	#if defined(arch_cmpxchg)
	#define raw_cmpxchg arch_cmpxchg
	#elif defined(arch_cmpxchg_relaxed)
		#define raw_cmpxchg(...) \
		__atomic_op_fence(arch_cmpxchg, __VA_ARGS__)
	#else
	extern void raw_cmpxchg_not_implemented(void);
	#define raw_cmpxchg(...) raw_cmpxchg_not_implemented()
	#endif
	...
	#if defined(arch_cmpxchg_relaxed)
	#define raw_cmpxchg_relaxed arch_cmpxchg_relaxed
	#elif defined(arch_cmpxchg)
	#define raw_cmpxchg_relaxed arch_cmpxchg
	#else
	extern void raw_cmpxchg_relaxed_not_implemented(void);
	#define raw_cmpxchg_relaxed(...) raw_cmpxchg_relaxed_not_implemented()
	#endif
	...

I made the mistake since the definitions of v5.2 confused me. The followings
are for v5.2, and these conditions are removed from v5.14:

	...
	#if !defined(arch_cmpxchg_relaxed) || defined(arch_cmpxchg)
	#define cmpxchg(ptr, ...) \
	({ \
		typeof(ptr) __ai_ptr = (ptr); \
		instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
		arch_cmpxchg(__ai_ptr, __VA_ARGS__); \
	})
	#endif
	...
	#if defined(arch_cmpxchg_relaxed)
	#define cmpxchg_relaxed(ptr, ...) \
	({ \
		typeof(ptr) __ai_ptr = (ptr); \
		instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
		arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__); \
	})
	#endif
	...

Sorry for the noise.

>> That is true for LLSC.
>>
>> However for !LLSC, spinlock versions already have full barriers due to 
>> spinlock, so relaxed variant doesn't make sense.
>>
>>>
>>> As I checked, Synopsys has released 64bit ARC processors (HS66/HS68), but
>>> I don't know the status of Linux kernel support.
>>
>> Yes there's an internal ARC64 port running but it is not ready for 
>> upstreaming yet.
> 
> Thanks for the update.
> 
>> -Vineet
> 
> Regards,
> wuqiang
diff mbox series

Patch

diff --git a/arch/openrisc/include/asm/cmpxchg.h b/arch/openrisc/include/asm/cmpxchg.h
index 8ee151c072e4..7d0555257389 100644
--- a/arch/openrisc/include/asm/cmpxchg.h
+++ b/arch/openrisc/include/asm/cmpxchg.h
@@ -168,4 +168,18 @@  __arch_xchg(volatile void *ptr, unsigned long with, int size)
 						 sizeof(*(ptr)));	\
 	})
 
+/*
+ * always make arch_cmpxchg[64]_local available. __generic_cmpxchg[64]_local
+ * are atomic with respect to current cpu.
+ */
+#include <asm-generic/cmpxchg-local.h>
+
+#define arch_cmpxchg_local(ptr, o, n) ({				\
+	(__typeof__(*ptr))__generic_cmpxchg_local((ptr),		\
+						(unsigned long)(o),	\
+						(unsigned long)(n),	\
+						sizeof(*(ptr)));	\
+})
+#define arch_cmpxchg64_local(ptr, o, n) __generic_cmpxchg64_local((ptr), (o), (n))
+
 #endif /* __ASM_OPENRISC_CMPXCHG_H */