[v6,10/20] xen/riscv: introduce atomic.h

Message ID	22ee9c8cc62c76cfb799fed800636e7c8bf25a17.1710517542.git.oleksii.kurochko@gmail.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <xen-devel-bounces@lists.xenproject.org> Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org> From: Oleksii Kurochko <oleksii.kurochko@gmail.com> To: xen-devel@lists.xenproject.org Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>, Alistair Francis <alistair.francis@wdc.com>, Bob Eshleman <bobbyeshleman@gmail.com>, Connor Davis <connojdavis@gmail.com>, Andrew Cooper <andrew.cooper3@citrix.com>, George Dunlap <george.dunlap@citrix.com>, Jan Beulich <jbeulich@suse.com>, Julien Grall <julien@xen.org>, Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org> Subject: [PATCH v6 10/20] xen/riscv: introduce atomic.h Date: Fri, 15 Mar 2024 19:06:06 +0100 Message-ID: <22ee9c8cc62c76cfb799fed800636e7c8bf25a17.1710517542.git.oleksii.kurochko@gmail.com> In-Reply-To: <cover.1710517542.git.oleksii.kurochko@gmail.com> References: <cover.1710517542.git.oleksii.kurochko@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	Enable build of full Xen for RISC-V \| expand [v6,00/20] Enable build of full Xen for RISC-V [v6,01/20] automation: introduce fixed randconfig for RISC-V [v6,02/20] xen/riscv: disable unnecessary configs [v6,03/20] xen/riscv: introduce extenstion support check by compiler [v6,04/20] xen/asm-generic: introduce generic non-atomic test_bit() [v6,05/20] xen/bitops: implement fls{l}() in common logic [v6,06/20] xen/bitops: put __ffs() and ffz() into linux compatible header [v6,07/20] xen/riscv: introduce bitops.h [v6,08/20] xen/riscv: introduce cmpxchg.h [v6,09/20] xen/riscv: introduce io.h [v6,10/20] xen/riscv: introduce atomic.h [v6,11/20] xen/riscv: introduce monitor.h [v6,12/20] xen/riscv: add definition of __read_mostly [v6,13/20] xen/riscv: add required things to current.h [v6,14/20] xen/riscv: add minimal stuff to page.h to build full Xen [v6,15/20] xen/riscv: add minimal stuff to processor.h to build full Xen [v6,16/20] xen/riscv: add minimal stuff to mm.h to build full Xen [v6,17/20] xen/riscv: introduce vm_event_() functions [v6,18/20] xen/riscv: add minimal amount of stubs to build full Xen [v6,19/20] xen/riscv: enable full Xen build [v6,20/20] xen/README: add compiler and binutils versions for RISC-V64

Oleksii Kurochko March 15, 2024, 6:06 p.m. UTC

Initially the patch was introduced by Bobby, who takes the header from
Linux kernel.

The following changes were done on top of Linux kernel header:
 - atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated
     to use__*xchg_generic()
 - drop casts in write_atomic() as they are unnecessary
 - drop introduction of WRITE_ONCE() and READ_ONCE().
   Xen provides ACCESS_ONCE()
 - remove zero-length array access in read_atomic()
 - drop defines similar to pattern
 - #define atomic_add_return_relaxed   atomic_add_return_relaxed
 - move not RISC-V specific functions to asm-generic/atomics-ops.h
 - drop  atomic##prefix##_{cmp}xchg_{release, aquire, release}() as they
   are not used in Xen.
 - update the defintion of  atomic##prefix##_{cmp}xchg according to
   {cmp}xchg() implementation in Xen.

Signed-off-by: Bobby Eshleman <bobbyeshleman@gmail.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V6:
 - drop atomic##prefix##_{cmp}xchg_{release, aquire, relaxed} as they aren't used
   by Xen
 - code style fixes.
 - %s/__asm__ __volatile__/asm volatile
 - add explanational comments.
 - move inclusion of "#include <asm-generic/atomic-ops.h>" further down in atomic.h
   header.
---
Changes in V5:
 - fence.h changes were moved to separate patch as patches related to io.h and cmpxchg.h,
   which are dependecies for this patch, also needed changes in fence.h
 - remove accessing of zero-length array
 - drops cast in write_atomic()
 - drop introduction of WRITE_ONCE() and READ_ONCE().
 - drop defines similar to pattern #define atomic_add_return_relaxed   atomic_add_return_relaxed
 - Xen code style fixes
 - move not RISC-V specific functions to asm-generic/atomics-ops.h
---
Changes in V4:
 - do changes related to the updates of [PATCH v3 13/34] xen/riscv: introduce cmpxchg.h
 - drop casts in read_atomic_size(), write_atomic(), add_sized()
 - tabs -> spaces
 - drop #ifdef CONFIG_SMP ... #endif in fence.ha as it is simpler to handle NR_CPUS=1
   the same as NR_CPUS>1 with accepting less than ideal performance.
---
Changes in V3:
  - update the commit message
  - add SPDX for fence.h
  - code style fixes
  - Remove /* TODO: ... */ for add_sized macros. It looks correct to me.
  - re-order the patch
  - merge to this patch fence.h
---
Changes in V2:
 - Change an author of commit. I got this header from Bobby's old repo.
---
 xen/arch/riscv/include/asm/atomic.h  | 263 +++++++++++++++++++++++++++
 xen/include/asm-generic/atomic-ops.h |  97 ++++++++++
 2 files changed, 360 insertions(+)
 create mode 100644 xen/arch/riscv/include/asm/atomic.h
 create mode 100644 xen/include/asm-generic/atomic-ops.h

Jan Beulich March 21, 2024, 1:03 p.m. UTC | #1

On 15.03.2024 19:06, Oleksii Kurochko wrote:
> Initially the patch was introduced by Bobby, who takes the header from
> Linux kernel.
> 
> The following changes were done on top of Linux kernel header:
>  - atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated
>      to use__*xchg_generic()
>  - drop casts in write_atomic() as they are unnecessary
>  - drop introduction of WRITE_ONCE() and READ_ONCE().
>    Xen provides ACCESS_ONCE()

Here and in the code comment: While this may be describing what you did
on top of what Bobby had, here you're describing differences to the Linux
header.

>  - remove zero-length array access in read_atomic()
>  - drop defines similar to pattern

pattern? Which one? Oh, wait, ...

>  - #define atomic_add_return_relaxed   atomic_add_return_relaxed

... this line really isn't a separate bullet point.

> Changes in V6:
>  - drop atomic##prefix##_{cmp}xchg_{release, aquire, relaxed} as they aren't used
>    by Xen
>  - code style fixes.
>  - %s/__asm__ __volatile__/asm volatile

Btw, this is another activity that could do with being carried out
consistently through the series.

> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/atomic.h
> @@ -0,0 +1,263 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Taken and modified from Linux.
> + *
> + * The following changes were done:
> + * - * atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated
> + *     to use__*xchg_generic()
> + * - drop casts in write_atomic() as they are unnecessary
> + * - drop introduction of WRITE_ONCE() and READ_ONCE().
> + *   Xen provides ACCESS_ONCE()
> + * - remove zero-length array access in read_atomic()
> + * - drop defines similar to pattern
> + *   #define atomic_add_return_relaxed   atomic_add_return_relaxed
> + * - move not RISC-V specific functions to asm-generic/atomics-ops.h
> + * 
> + * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
> + * Copyright (C) 2012 Regents of the University of California
> + * Copyright (C) 2017 SiFive
> + * Copyright (C) 2024 Vates SAS
> + */
> +
> +#ifndef _ASM_RISCV_ATOMIC_H
> +#define _ASM_RISCV_ATOMIC_H
> +
> +#include <xen/atomic.h>
> +
> +#include <asm/cmpxchg.h>
> +#include <asm/fence.h>
> +#include <asm/io.h>
> +#include <asm/system.h>
> +
> +void __bad_atomic_size(void);
> +
> +/*
> + * Legacy from Linux kernel. For some reason they wanted to have ordered
> + * read/write access. Thereby read* is used instead of read<X>_cpu()

Either read<X> and read<X>_cpu() or read* and read*_cpu(), please.

> + */
> +static always_inline void read_atomic_size(const volatile void *p,
> +                                           void *res,
> +                                           unsigned int size)
> +{
> +    switch ( size )
> +    {
> +    case 1: *(uint8_t *)res = readb(p); break;
> +    case 2: *(uint16_t *)res = readw(p); break;
> +    case 4: *(uint32_t *)res = readl(p); break;
> +    case 8: *(uint32_t *)res  = readq(p); break;

Nit: Excess blank before =.

Also - no #ifdef here to be RV32-ready?

> +    default: __bad_atomic_size(); break;
> +    }
> +}
> +
> +#define read_atomic(p) ({                                   \
> +    union { typeof(*(p)) val; char c[sizeof(*(p))]; } x_;   \

One trailing underscore here, but ...

> +    read_atomic_size(p, x_.c, sizeof(*(p)));                \
> +    x_.val;                                                 \
> +})
> +
> +#define write_atomic(p, x)                              \
> +({                                                      \
> +    typeof(*(p)) x__ = (x);                             \

... two here and ...

> +    switch ( sizeof(*(p)) )                             \
> +    {                                                   \
> +    case 1: writeb(x__, p); break;                      \
> +    case 2: writew(x__, p); break;                      \
> +    case 4: writel(x__, p); break;                      \
> +    case 8: writeq(x__, p); break;                      \
> +    default: __bad_atomic_size(); break;                \
> +    }                                                   \
> +    x__;                                                \
> +})
> +
> +#define add_sized(p, x)                                 \
> +({                                                      \
> +    typeof(*(p)) x__ = (x);                             \

... here?

> +    switch ( sizeof(*(p)) )                             \
> +    {                                                   \
> +    case 1: writeb(read_atomic(p) + x__, p); break;     \
> +    case 2: writew(read_atomic(p) + x__, p); break;     \
> +    case 4: writel(read_atomic(p) + x__, p); break;     \
> +    case 8: writeq(read_atomic(p) + x__, p); break;     \
> +    default: __bad_atomic_size(); break;                \
> +    }                                                   \
> +})
> +
> +#define __atomic_acquire_fence() \
> +    asm volatile ( RISCV_ACQUIRE_BARRIER "" ::: "memory" )
> +
> +#define __atomic_release_fence() \
> +    asm volatile ( RISCV_RELEASE_BARRIER "" ::: "memory" )
> +
> +/*
> + * First, the atomic ops that have no ordering constraints and therefor don't
> + * have the AQ or RL bits set.  These don't return anything, so there's only
> + * one version to worry about.
> + */
> +#define ATOMIC_OP(op, asm_op, I, asm_type, c_type, prefix)  \
> +static inline                                               \
> +void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
> +{                                                           \
> +    asm volatile (                                          \
> +        "   amo" #asm_op "." #asm_type " zero, %1, %0"      \
> +        : "+A" (v->counter)                                 \
> +        : "r" (I)                                           \

Btw, I consider this pretty confusing. At the 1st and 2nd glance this
looks like a mistake, i.e. as if i was meant. Imo ...

> +        : "memory" );                                       \
> +}                                                           \
> +
> +/*
> + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is the reason why
> + * last argument for ATOMIC_OP isn't used.
> + */
> +#define ATOMIC_OPS(op, asm_op, I)                           \
> +        ATOMIC_OP (op, asm_op, I, w, int,   )
> +
> +ATOMIC_OPS(add, add,  i)
> +ATOMIC_OPS(sub, add, -i)
> +ATOMIC_OPS(and, and,  i)
> +ATOMIC_OPS( or,  or,  i)
> +ATOMIC_OPS(xor, xor,  i)

... here you want to only pass the (unary) operator (and leaving that blank
is as fine as using +).

> +#undef ATOMIC_OP
> +#undef ATOMIC_OPS
> +
> +#include <asm-generic/atomic-ops.h>
> +
> +/*
> + * Atomic ops that have ordered, relaxed, acquire, and release variants.

Only the first is implemented afaict; imo the comment would better reflect
that one way or another.

> + * There's two flavors of these: the arithmatic ops have both fetch and return
> + * versions, while the logical ops only have fetch versions.
> + */
> +#define ATOMIC_FETCH_OP(op, asm_op, I, asm_type, c_type, prefix)    \
> +static inline                                                       \
> +c_type atomic##prefix##_fetch_##op##_relaxed(c_type i,              \
> +                         atomic##prefix##_t *v)                     \
> +{                                                                   \
> +    register c_type ret;                                            \
> +    asm volatile (                                                  \
> +        "   amo" #asm_op "." #asm_type " %1, %2, %0"                \
> +        : "+A" (v->counter), "=r" (ret)                             \
> +        : "r" (I)                                                   \
> +        : "memory" );                                               \
> +    return ret;                                                     \
> +}                                                                   \

Actually a relaxed form is provided here, but does that have any user?

> +static inline                                                       \
> +c_type atomic##prefix##_fetch_##op(c_type i, atomic##prefix##_t *v) \
> +{                                                                   \
> +    register c_type ret;                                            \
> +    asm volatile (                                                  \
> +        "   amo" #asm_op "." #asm_type ".aqrl  %1, %2, %0"          \
> +        : "+A" (v->counter), "=r" (ret)                             \
> +        : "r" (I)                                                   \
> +        : "memory" );                                               \
> +    return ret;                                                     \
> +}
> +
> +#define ATOMIC_OP_RETURN(op, asm_op, c_op, I, asm_type, c_type, prefix) \
> +static inline                                                           \
> +c_type atomic##prefix##_##op##_return_relaxed(c_type i,                 \
> +                          atomic##prefix##_t *v)                        \
> +{                                                                       \
> +        return atomic##prefix##_fetch_##op##_relaxed(i, v) c_op I;      \
> +}                                                                       \
> +static inline                                                           \
> +c_type atomic##prefix##_##op##_return(c_type i, atomic##prefix##_t *v)  \
> +{                                                                       \
> +        return atomic##prefix##_fetch_##op(i, v) c_op I;                \

I (or whatever the replacement expression is going to be following the
earlier comment) wants parenthesizing here.

> +}
> +
> +/*
> + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is the reason why
> + * last argument of ATOMIC_FETCH_OP, ATOMIC_OP_RETURN isn't used.
> + */
> +#define ATOMIC_OPS(op, asm_op, c_op, I)                                 \
> +        ATOMIC_FETCH_OP( op, asm_op,       I, w, int,   )               \
> +        ATOMIC_OP_RETURN(op, asm_op, c_op, I, w, int,   )
> +
> +ATOMIC_OPS(add, add, +,  i)
> +ATOMIC_OPS(sub, add, +, -i)
> +
> +#undef ATOMIC_OPS
> +
> +#define ATOMIC_OPS(op, asm_op, I) \
> +        ATOMIC_FETCH_OP(op, asm_op, I, w, int,   )
> +
> +ATOMIC_OPS(and, and, i)
> +ATOMIC_OPS( or,  or, i)
> +ATOMIC_OPS(xor, xor, i)
> +
> +#undef ATOMIC_OPS
> +
> +#undef ATOMIC_FETCH_OP
> +#undef ATOMIC_OP_RETURN
> +
> +/* This is required to provide a full barrier on success. */
> +static inline int atomic_add_unless(atomic_t *v, int a, int u)
> +{
> +       int prev, rc;
> +
> +    asm volatile (
> +        "0: lr.w     %[p],  %[c]\n"
> +        "   beq      %[p],  %[u], 1f\n"
> +        "   add      %[rc], %[p], %[a]\n"
> +        "   sc.w.rl  %[rc], %[rc], %[c]\n"
> +        "   bnez     %[rc], 0b\n"
> +        RISCV_FULL_BARRIER

With this and no .aq on the load, why the .rl on the store?

> +        "1:\n"
> +        : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (v->counter)
> +        : [a] "r" (a), [u] "r" (u)
> +        : "memory");
> +    return prev;
> +}
> +
> +/*
> + * atomic_{cmp,}xchg is required to have exactly the same ordering semantics as
> + * {cmp,}xchg and the operations that return.
> + */
> +#define ATOMIC_OP(c_t, prefix, size)                            \
> +static inline                                                   \
> +c_t atomic##prefix##_xchg(atomic##prefix##_t *v, c_t n)         \
> +{                                                               \
> +    return __xchg(&(v->counter), n, size);                      \

No need for the inner parentheses, just like ...

> +}                                                               \
> +static inline                                                   \
> +c_t atomic##prefix##_cmpxchg(atomic##prefix##_t *v, c_t o, c_t n) \
> +{                                                               \
> +    return __cmpxchg(&v->counter, o, n, size);                  \

... you have it here.

> +}
> +
> +#define ATOMIC_OPS() \
> +    ATOMIC_OP(int,   , 4)
> +
> +ATOMIC_OPS()
> +
> +#undef ATOMIC_OPS
> +#undef ATOMIC_OP
> +
> +static inline int atomic_sub_if_positive(atomic_t *v, int offset)
> +{
> +       int prev, rc;
> +
> +    asm volatile (
> +        "0: lr.w     %[p],  %[c]\n"
> +        "   sub      %[rc], %[p], %[o]\n"
> +        "   bltz     %[rc], 1f\n"
> +        "   sc.w.rl  %[rc], %[rc], %[c]\n"
> +        "   bnez     %[rc], 0b\n"
> +        "   fence    rw, rw\n"
> +        "1:\n"
> +        : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (v->counter)
> +        : [o] "r" (offset)
> +        : "memory" );
> +    return prev - offset;
> +}

This probably would be nicer if sitting next to atomic_add_unless().

> --- /dev/null
> +++ b/xen/include/asm-generic/atomic-ops.h
> @@ -0,0 +1,97 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * The header provides default implementations for every xen/atomic.h-provided
> + * forward inline declaration that can be synthesized from other atomic
> + * functions.

Or from scratch, as e.g. ...

> + */
> +#ifndef _ASM_GENERIC_ATOMIC_OPS_H_
> +#define _ASM_GENERIC_ATOMIC_OPS_H_
> +
> +#include <xen/atomic.h>
> +#include <xen/lib.h>
> +
> +#ifndef ATOMIC_READ
> +static inline int atomic_read(const atomic_t *v)
> +{
> +    return ACCESS_ONCE(v->counter);
> +}
> +#endif
> +
> +#ifndef _ATOMIC_READ
> +static inline int _atomic_read(atomic_t v)
> +{
> +    return v.counter;
> +}
> +#endif
> +
> +#ifndef ATOMIC_SET
> +static inline void atomic_set(atomic_t *v, int i)
> +{
> +    ACCESS_ONCE(v->counter) = i;
> +}
> +#endif
> +
> +#ifndef _ATOMIC_SET
> +static inline void _atomic_set(atomic_t *v, int i)
> +{
> +    v->counter = i;
> +}
> +#endif

... all of these.

> +#ifndef ATOMIC_SUB_AND_TEST
> +static inline int atomic_sub_and_test(int i, atomic_t *v)
> +{
> +    return atomic_sub_return(i, v) == 0;
> +}
> +#endif
> +
> +#ifndef ATOMIC_INC
> +static inline void atomic_inc(atomic_t *v)
> +{
> +    atomic_add(1, v);
> +}
> +#endif
> +
> +#ifndef ATOMIC_INC_RETURN
> +static inline int atomic_inc_return(atomic_t *v)
> +{
> +    return atomic_add_return(1, v);
> +}
> +#endif
> +
> +#ifndef ATOMIC_DEC
> +static inline void atomic_dec(atomic_t *v)
> +{
> +    atomic_sub(1, v);
> +}
> +#endif
> +
> +#ifndef ATOMIC_DEC_RETURN
> +static inline int atomic_dec_return(atomic_t *v)
> +{
> +    return atomic_sub_return(1, v);
> +}
> +#endif
> +
> +#ifndef ATOMIC_DEC_AND_TEST
> +static inline int atomic_dec_and_test(atomic_t *v)
> +{
> +    return atomic_sub_return(1, v) == 0;
> +}
> +#endif
> +
> +#ifndef ATOMIC_ADD_NEGATIVE
> +static inline int atomic_add_negative(int i, atomic_t *v)
> +{
> +    return atomic_add_return(i, v) < 0;
> +}
> +#endif
> +
> +#ifndef ATOMIC_INC_AND_TEST
> +static inline int atomic_inc_and_test(atomic_t *v)
> +{
> +    return atomic_add_return(1, v) == 0;
> +}
> +#endif

Can this be moved up a little, perhaps next to the other inc-s (or else
next to dec_and_test), please?

Jan

Oleksii Kurochko March 22, 2024, 12:25 p.m. UTC | #2

On Thu, 2024-03-21 at 14:03 +0100, Jan Beulich wrote:
> On 15.03.2024 19:06, Oleksii Kurochko wrote:
> > Initially the patch was introduced by Bobby, who takes the header
> > from
> > Linux kernel.
> > 
> > The following changes were done on top of Linux kernel header:
> >  - atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were
> > updated
> >      to use__*xchg_generic()
> >  - drop casts in write_atomic() as they are unnecessary
> >  - drop introduction of WRITE_ONCE() and READ_ONCE().
> >    Xen provides ACCESS_ONCE()
> 
> Here and in the code comment: While this may be describing what you
> did
> on top of what Bobby had, here you're describing differences to the
> Linux
> header.
> 
> >  - remove zero-length array access in read_atomic()
> >  - drop defines similar to pattern
> 
> pattern? Which one? Oh, wait, ...
> 
> >  - #define atomic_add_return_relaxed   atomic_add_return_relaxed
> 
> ... this line really isn't a separate bullet point.
Yes, '-' is not needed in this text.

> 
> > + */
> > +static always_inline void read_atomic_size(const volatile void *p,
> > +                                           void *res,
> > +                                           unsigned int size)
> > +{
> > +    switch ( size )
> > +    {
> > +    case 1: *(uint8_t *)res = readb(p); break;
> > +    case 2: *(uint16_t *)res = readw(p); break;
> > +    case 4: *(uint32_t *)res = readl(p); break;
> > +    case 8: *(uint32_t *)res  = readq(p); break;
> 
> Nit: Excess blank before =.
> 
> Also - no #ifdef here to be RV32-ready?
Because there is #ifdef RV32 in io.h for readq().

> 
> > +    default: __bad_atomic_size(); break;
> > +    }
> > +}
> > +
> > +#define read_atomic(p) ({                                   \
> > +    union { typeof(*(p)) val; char c[sizeof(*(p))]; } x_;   \
> 
> One trailing underscore here, but ...
> 
> > +    read_atomic_size(p, x_.c, sizeof(*(p)));                \
> > +    x_.val;                                                 \
> > +})
> > +
> > +#define write_atomic(p, x)                              \
> > +({                                                      \
> > +    typeof(*(p)) x__ = (x);                             \
> 
> ... two here and ...
> 
> > +    switch ( sizeof(*(p)) )                             \
> > +    {                                                   \
> > +    case 1: writeb(x__, p); break;                      \
> > +    case 2: writew(x__, p); break;                      \
> > +    case 4: writel(x__, p); break;                      \
> > +    case 8: writeq(x__, p); break;                      \
> > +    default: __bad_atomic_size(); break;                \
> > +    }                                                   \
> > +    x__;                                                \
> > +})
> > +
> > +#define add_sized(p, x)                                 \
> > +({                                                      \
> > +    typeof(*(p)) x__ = (x);                             \
> 
> ... here?
I'll update in the same way.

> 
> > +    switch ( sizeof(*(p)) )                             \
> > +    {                                                   \
> > +    case 1: writeb(read_atomic(p) + x__, p); break;     \
> > +    case 2: writew(read_atomic(p) + x__, p); break;     \
> > +    case 4: writel(read_atomic(p) + x__, p); break;     \
> > +    case 8: writeq(read_atomic(p) + x__, p); break;     \
> > +    default: __bad_atomic_size(); break;                \
> > +    }                                                   \
> > +})
> > +
> > +#define __atomic_acquire_fence() \
> > +    asm volatile ( RISCV_ACQUIRE_BARRIER "" ::: "memory" )
> > +
> > +#define __atomic_release_fence() \
> > +    asm volatile ( RISCV_RELEASE_BARRIER "" ::: "memory" )
> > +
> > +/*
> > + * First, the atomic ops that have no ordering constraints and
> > therefor don't
> > + * have the AQ or RL bits set.  These don't return anything, so
> > there's only
> > + * one version to worry about.
> > + */
> > +#define ATOMIC_OP(op, asm_op, I, asm_type, c_type, prefix)  \
> > +static inline                                               \
> > +void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
> > +{                                                           \
> > +    asm volatile (                                          \
> > +        "   amo" #asm_op "." #asm_type " zero, %1, %0"      \
> > +        : "+A" (v->counter)                                 \
> > +        : "r" (I)                                           \
> 
> Btw, I consider this pretty confusing. At the 1st and 2nd glance this
> looks like a mistake, i.e. as if i was meant. Imo ...
> 
> > +        : "memory" );                                       \
> > +}                                                           \
> > +
> > +/*
> > + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is the
> > reason why
> > + * last argument for ATOMIC_OP isn't used.
> > + */
> > +#define ATOMIC_OPS(op, asm_op, I)                           \
> > +        ATOMIC_OP (op, asm_op, I, w, int,   )
> > +
> > +ATOMIC_OPS(add, add,  i)
> > +ATOMIC_OPS(sub, add, -i)
> > +ATOMIC_OPS(and, and,  i)
> > +ATOMIC_OPS( or,  or,  i)
> > +ATOMIC_OPS(xor, xor,  i)
> 
> ... here you want to only pass the (unary) operator (and leaving that
> blank
> is as fine as using +).
I agree that a game with 'i' and 'I' looks confusing, but I am not
really understand what is wrong with using ' i' here. It seems that
preprocessed macros looks fine:
   static inline void atomic_add(int i, atomic_t *v) { asm volatile ( "  
   amo" "add" "." "w" " zero, %1, %0" : "+A" (v->counter) : "r" (i) :
   "memory" ); }
   
   static inline void atomic_sub(int i, atomic_t *v) { asm volatile ( "  
   amo" "add" "." "w" " zero, %1, %0" : "+A" (v->counter) : "r" (-i) :
   "memory" ); }

> 
> > +#undef ATOMIC_OP
> > +#undef ATOMIC_OPS
> > +
> > +#include <asm-generic/atomic-ops.h>
> > +
> > +/*
> > + * Atomic ops that have ordered, relaxed, acquire, and release
> > variants.
> 
> Only the first is implemented afaict; imo the comment would better
> reflect
> that one way or another.
> 
> > + * There's two flavors of these: the arithmatic ops have both
> > fetch and return
> > + * versions, while the logical ops only have fetch versions.
> > + */
> > +#define ATOMIC_FETCH_OP(op, asm_op, I, asm_type, c_type,
> > prefix)    \
> > +static
> > inline                                                       \
> > +c_type atomic##prefix##_fetch_##op##_relaxed(c_type
> > i,              \
> > +                         atomic##prefix##_t
> > *v)                     \
> > +{                                                                 
> >   \
> > +    register c_type
> > ret;                                            \
> > +    asm volatile
> > (                                                  \
> > +        "   amo" #asm_op "." #asm_type " %1, %2,
> > %0"                \
> > +        : "+A" (v->counter), "=r"
> > (ret)                             \
> > +        : "r"
> > (I)                                                   \
> > +        : "memory"
> > );                                               \
> > +    return
> > ret;                                                     \
> > +}                                                                 
> >   \
> 
> Actually a relaxed form is provided here, but does that have any
> user?
There is no user for a relaxed form, just overlooked that.

> 
> > +static
> > inline                                                       \
> > +c_type atomic##prefix##_fetch_##op(c_type i, atomic##prefix##_t
> > *v) \
> > +{                                                                 
> >   \
> > +    register c_type
> > ret;                                            \
> > +    asm volatile
> > (                                                  \
> > +        "   amo" #asm_op "." #asm_type ".aqrl  %1, %2,
> > %0"          \
> > +        : "+A" (v->counter), "=r"
> > (ret)                             \
> > +        : "r"
> > (I)                                                   \
> > +        : "memory"
> > );                                               \
> > +    return
> > ret;                                                     \
> > +}
> > +
> > +#define ATOMIC_OP_RETURN(op, asm_op, c_op, I, asm_type, c_type,
> > prefix) \
> > +static
> > inline                                                           \
> > +c_type atomic##prefix##_##op##_return_relaxed(c_type
> > i,                 \
> > +                          atomic##prefix##_t
> > *v)                        \
> > +{                                                                 
> >       \
> > +        return atomic##prefix##_fetch_##op##_relaxed(i, v) c_op
> > I;      \
> > +}                                                                 
> >       \
> > +static
> > inline                                                           \
> > +c_type atomic##prefix##_##op##_return(c_type i, atomic##prefix##_t
> > *v)  \
> > +{                                                                 
> >       \
> > +        return atomic##prefix##_fetch_##op(i, v) c_op
> > I;                \
> 
> I (or whatever the replacement expression is going to be following
> the
> earlier comment) wants parenthesizing here.
> 
> > +}
> > +
> > +/*
> > + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is the
> > reason why
> > + * last argument of ATOMIC_FETCH_OP, ATOMIC_OP_RETURN isn't used.
> > + */
> > +#define ATOMIC_OPS(op, asm_op, c_op,
> > I)                                 \
> > +        ATOMIC_FETCH_OP( op, asm_op,       I, w, int,  
> > )               \
> > +        ATOMIC_OP_RETURN(op, asm_op, c_op, I, w, int,   )
> > +
> > +ATOMIC_OPS(add, add, +,  i)
> > +ATOMIC_OPS(sub, add, +, -i)
> > +
> > +#undef ATOMIC_OPS
> > +
> > +#define ATOMIC_OPS(op, asm_op, I) \
> > +        ATOMIC_FETCH_OP(op, asm_op, I, w, int,   )
> > +
> > +ATOMIC_OPS(and, and, i)
> > +ATOMIC_OPS( or,  or, i)
> > +ATOMIC_OPS(xor, xor, i)
> > +
> > +#undef ATOMIC_OPS
> > +
> > +#undef ATOMIC_FETCH_OP
> > +#undef ATOMIC_OP_RETURN
> > +
> > +/* This is required to provide a full barrier on success. */
> > +static inline int atomic_add_unless(atomic_t *v, int a, int u)
> > +{
> > +       int prev, rc;
> > +
> > +    asm volatile (
> > +        "0: lr.w     %[p],  %[c]\n"
> > +        "   beq      %[p],  %[u], 1f\n"
> > +        "   add      %[rc], %[p], %[a]\n"
> > +        "   sc.w.rl  %[rc], %[rc], %[c]\n"
> > +        "   bnez     %[rc], 0b\n"
> > +        RISCV_FULL_BARRIER
> 
> With this and no .aq on the load, why the .rl on the store?
It is something that LKMM requires [1].

This is not fully clear to me what is so specific in LKMM, but accoring
to the spec:
   Ordering Annotation Fence-based Equivalent
   l{b|h|w|d|r}.aq     l{b|h|w|d|r}; fence r,rw
   l{b|h|w|d|r}.aqrl   fence rw,rw; l{b|h|w|d|r}; fence r,rw
   s{b|h|w|d|c}.rl     fence rw,w; s{b|h|w|d|c}
   s{b|h|w|d|c}.aqrl   fence rw,w; s{b|h|w|d|c}
   amo<op>.aq          amo<op>; fence r,rw
   amo<op>.rl          fence rw,w; amo<op>
   amo<op>.aqrl        fence rw,rw; amo<op>; fence rw,rw
   Table 2.2: Mappings from .aq and/or .rl to fence-based equivalents.
   An alternative mapping places a fence rw,rw after the existing 
   s{b|h|w|d|c} mapping rather than at the front of the
   l{b|h|w|d|r} mapping.
   
   It is also safe to translate any .aq, .rl, or .aqrl annotation into
   the fence-based snippets of
   Table 2.2. These can also be used as a legal implementation of
   l{b|h|w|d} or s{b|h|w|d} pseu-
   doinstructions for as long as those instructions are not added to
   the ISA.

So according to the spec, it should be:
 sc.w ...
 RISCV_FULL_BARRIER.

Considering [1] and how this code looks before, it seems to me that it
is safe to use lr.w.aq/sc.w.rl here or an fence equivalent.

But in general it ( a combination of fence, .aq, .rl ) can be
considered as the same things in this context, so it is possible to
leave this function as is to be synced here with Linux kernel.

[1]https://lore.kernel.org/lkml/1520274276-21871-1-git-send-email-parri.andrea@gmail.com/

~ Oleksii

> 
> > +        "1:\n"
> > +        : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (v->counter)
> > +        : [a] "r" (a), [u] "r" (u)
> > +        : "memory");
> > +    return prev;
> > +}
> > +
> > +/*
> > + * atomic_{cmp,}xchg is required to have exactly the same ordering
> > semantics as
> > + * {cmp,}xchg and the operations that return.
> > + */
> > +#define ATOMIC_OP(c_t, prefix, size)                            \
> > +static inline                                                   \
> > +c_t atomic##prefix##_xchg(atomic##prefix##_t *v, c_t n)         \
> > +{                                                               \
> > +    return __xchg(&(v->counter), n, size);                      \
> 
> No need for the inner parentheses, just like ...
> 
> > +}                                                               \
> > +static inline                                                   \
> > +c_t atomic##prefix##_cmpxchg(atomic##prefix##_t *v, c_t o, c_t n)
> > \
> > +{                                                               \
> > +    return __cmpxchg(&v->counter, o, n, size);                  \
> 
> ... you have it here.
> 
> > +}
> > +
> > +#define ATOMIC_OPS() \
> > +    ATOMIC_OP(int,   , 4)
> > +
> > +ATOMIC_OPS()
> > +
> > +#undef ATOMIC_OPS
> > +#undef ATOMIC_OP
> > +
> > +static inline int atomic_sub_if_positive(atomic_t *v, int offset)
> > +{
> > +       int prev, rc;
> > +
> > +    asm volatile (
> > +        "0: lr.w     %[p],  %[c]\n"
> > +        "   sub      %[rc], %[p], %[o]\n"
> > +        "   bltz     %[rc], 1f\n"
> > +        "   sc.w.rl  %[rc], %[rc], %[c]\n"
> > +        "   bnez     %[rc], 0b\n"
> > +        "   fence    rw, rw\n"
> > +        "1:\n"
> > +        : [p] "=&r" (prev), [rc] "=&r" (rc), [c] "+A" (v->counter)
> > +        : [o] "r" (offset)
> > +        : "memory" );
> > +    return prev - offset;
> > +}
> 
> This probably would be nicer if sitting next to atomic_add_unless().
> 
> > --- /dev/null
> > +++ b/xen/include/asm-generic/atomic-ops.h
> > @@ -0,0 +1,97 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * The header provides default implementations for every
> > xen/atomic.h-provided
> > + * forward inline declaration that can be synthesized from other
> > atomic
> > + * functions.
> 
> Or from scratch, as e.g. ...
> 
> > + */
> > +#ifndef _ASM_GENERIC_ATOMIC_OPS_H_
> > +#define _ASM_GENERIC_ATOMIC_OPS_H_
> > +
> > +#include <xen/atomic.h>
> > +#include <xen/lib.h>
> > +
> > +#ifndef ATOMIC_READ
> > +static inline int atomic_read(const atomic_t *v)
> > +{
> > +    return ACCESS_ONCE(v->counter);
> > +}
> > +#endif
> > +
> > +#ifndef _ATOMIC_READ
> > +static inline int _atomic_read(atomic_t v)
> > +{
> > +    return v.counter;
> > +}
> > +#endif
> > +
> > +#ifndef ATOMIC_SET
> > +static inline void atomic_set(atomic_t *v, int i)
> > +{
> > +    ACCESS_ONCE(v->counter) = i;
> > +}
> > +#endif
> > +
> > +#ifndef _ATOMIC_SET
> > +static inline void _atomic_set(atomic_t *v, int i)
> > +{
> > +    v->counter = i;
> > +}
> > +#endif
> 
> ... all of these.
> 
> > +#ifndef ATOMIC_SUB_AND_TEST
> > +static inline int atomic_sub_and_test(int i, atomic_t *v)
> > +{
> > +    return atomic_sub_return(i, v) == 0;
> > +}
> > +#endif
> > +
> > +#ifndef ATOMIC_INC
> > +static inline void atomic_inc(atomic_t *v)
> > +{
> > +    atomic_add(1, v);
> > +}
> > +#endif
> > +
> > +#ifndef ATOMIC_INC_RETURN
> > +static inline int atomic_inc_return(atomic_t *v)
> > +{
> > +    return atomic_add_return(1, v);
> > +}
> > +#endif
> > +
> > +#ifndef ATOMIC_DEC
> > +static inline void atomic_dec(atomic_t *v)
> > +{
> > +    atomic_sub(1, v);
> > +}
> > +#endif
> > +
> > +#ifndef ATOMIC_DEC_RETURN
> > +static inline int atomic_dec_return(atomic_t *v)
> > +{
> > +    return atomic_sub_return(1, v);
> > +}
> > +#endif
> > +
> > +#ifndef ATOMIC_DEC_AND_TEST
> > +static inline int atomic_dec_and_test(atomic_t *v)
> > +{
> > +    return atomic_sub_return(1, v) == 0;
> > +}
> > +#endif
> > +
> > +#ifndef ATOMIC_ADD_NEGATIVE
> > +static inline int atomic_add_negative(int i, atomic_t *v)
> > +{
> > +    return atomic_add_return(i, v) < 0;
> > +}
> > +#endif
> > +
> > +#ifndef ATOMIC_INC_AND_TEST
> > +static inline int atomic_inc_and_test(atomic_t *v)
> > +{
> > +    return atomic_add_return(1, v) == 0;
> > +}
> > +#endif
> 
> Can this be moved up a little, perhaps next to the other inc-s (or
> else
> next to dec_and_test), please?
> 
> Jan

Jan Beulich March 25, 2024, 8:18 a.m. UTC | #3

On 22.03.2024 13:25, Oleksii wrote:
> On Thu, 2024-03-21 at 14:03 +0100, Jan Beulich wrote:
>> On 15.03.2024 19:06, Oleksii Kurochko wrote:
>>> + */
>>> +static always_inline void read_atomic_size(const volatile void *p,
>>> +                                           void *res,
>>> +                                           unsigned int size)
>>> +{
>>> +    switch ( size )
>>> +    {
>>> +    case 1: *(uint8_t *)res = readb(p); break;
>>> +    case 2: *(uint16_t *)res = readw(p); break;
>>> +    case 4: *(uint32_t *)res = readl(p); break;
>>> +    case 8: *(uint32_t *)res  = readq(p); break;
>>
>> Nit: Excess blank before =.
>>
>> Also - no #ifdef here to be RV32-ready?
> Because there is #ifdef RV32 in io.h for readq().

There you'd run into __raw_readq()'s BUILD_BUG_ON(), afaict even for
1-, 2-, or 4-byte accesses. That's not quite what we want here.

>>> +/*
>>> + * First, the atomic ops that have no ordering constraints and
>>> therefor don't
>>> + * have the AQ or RL bits set.  These don't return anything, so
>>> there's only
>>> + * one version to worry about.
>>> + */
>>> +#define ATOMIC_OP(op, asm_op, I, asm_type, c_type, prefix)  \
>>> +static inline                                               \
>>> +void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
>>> +{                                                           \
>>> +    asm volatile (                                          \
>>> +        "   amo" #asm_op "." #asm_type " zero, %1, %0"      \
>>> +        : "+A" (v->counter)                                 \
>>> +        : "r" (I)                                           \
>>
>> Btw, I consider this pretty confusing. At the 1st and 2nd glance this
>> looks like a mistake, i.e. as if i was meant. Imo ...
>>
>>> +        : "memory" );                                       \
>>> +}                                                           \
>>> +
>>> +/*
>>> + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is the
>>> reason why
>>> + * last argument for ATOMIC_OP isn't used.
>>> + */
>>> +#define ATOMIC_OPS(op, asm_op, I)                           \
>>> +        ATOMIC_OP (op, asm_op, I, w, int,   )
>>> +
>>> +ATOMIC_OPS(add, add,  i)
>>> +ATOMIC_OPS(sub, add, -i)
>>> +ATOMIC_OPS(and, and,  i)
>>> +ATOMIC_OPS( or,  or,  i)
>>> +ATOMIC_OPS(xor, xor,  i)
>>
>> ... here you want to only pass the (unary) operator (and leaving that
>> blank
>> is as fine as using +).
> I agree that a game with 'i' and 'I' looks confusing, but I am not
> really understand what is wrong with using ' i' here. It seems that
> preprocessed macros looks fine:
>    static inline void atomic_add(int i, atomic_t *v) { asm volatile ( "  
>    amo" "add" "." "w" " zero, %1, %0" : "+A" (v->counter) : "r" (i) :
>    "memory" ); }
>    
>    static inline void atomic_sub(int i, atomic_t *v) { asm volatile ( "  
>    amo" "add" "." "w" " zero, %1, %0" : "+A" (v->counter) : "r" (-i) :
>    "memory" ); }

I didn't question the pre-processed result being correct. Instead I said
that I consider the construct confusing to the reader, for looking as if
there was a mistake (in the case of the letter i used). Note also in
particular how the macro invocations need to be in sync with the macro
implementation, for lower case i being used both in the macro and in its
invocations. Anything parameterized would better be fully so, at the
very least to avoid, as said, confusion. (Having macros depend on
context at their use sites _may_ be okay for local helper macros, but
here we're talking about a not even private header file.)

>>> +/* This is required to provide a full barrier on success. */
>>> +static inline int atomic_add_unless(atomic_t *v, int a, int u)
>>> +{
>>> +       int prev, rc;
>>> +
>>> +    asm volatile (
>>> +        "0: lr.w     %[p],  %[c]\n"
>>> +        "   beq      %[p],  %[u], 1f\n"
>>> +        "   add      %[rc], %[p], %[a]\n"
>>> +        "   sc.w.rl  %[rc], %[rc], %[c]\n"
>>> +        "   bnez     %[rc], 0b\n"
>>> +        RISCV_FULL_BARRIER
>>
>> With this and no .aq on the load, why the .rl on the store?
> It is something that LKMM requires [1].
> 
> This is not fully clear to me what is so specific in LKMM, but accoring
> to the spec:
>    Ordering Annotation Fence-based Equivalent
>    l{b|h|w|d|r}.aq     l{b|h|w|d|r}; fence r,rw
>    l{b|h|w|d|r}.aqrl   fence rw,rw; l{b|h|w|d|r}; fence r,rw
>    s{b|h|w|d|c}.rl     fence rw,w; s{b|h|w|d|c}
>    s{b|h|w|d|c}.aqrl   fence rw,w; s{b|h|w|d|c}
>    amo<op>.aq          amo<op>; fence r,rw
>    amo<op>.rl          fence rw,w; amo<op>
>    amo<op>.aqrl        fence rw,rw; amo<op>; fence rw,rw
>    Table 2.2: Mappings from .aq and/or .rl to fence-based equivalents.
>    An alternative mapping places a fence rw,rw after the existing 
>    s{b|h|w|d|c} mapping rather than at the front of the
>    l{b|h|w|d|r} mapping.

I'm afraid I can't spot the specific case in this table. None of the
stores in the right column have a .rl suffix.

>    It is also safe to translate any .aq, .rl, or .aqrl annotation into
>    the fence-based snippets of
>    Table 2.2. These can also be used as a legal implementation of
>    l{b|h|w|d} or s{b|h|w|d} pseu-
>    doinstructions for as long as those instructions are not added to
>    the ISA.
> 
> So according to the spec, it should be:
>  sc.w ...
>  RISCV_FULL_BARRIER.
> 
> Considering [1] and how this code looks before, it seems to me that it
> is safe to use lr.w.aq/sc.w.rl here or an fence equivalent.

Here you say "or". Then why dos the code use sc.?.rl _and_ a fence?

> But in general it ( a combination of fence, .aq, .rl ) can be
> considered as the same things in this context, so it is possible to
> leave this function as is to be synced here with Linux kernel.

In turn I also don't understand this. Yes, the excess .rl certainly
doesn't render things unsafe. But what's the purpose of the .rl? That's
what my original question boiled down to.

Jan

> [1]https://lore.kernel.org/lkml/1520274276-21871-1-git-send-email-parri.andrea@gmail.com/
> 
> ~ Oleksii

Oleksii Kurochko March 26, 2024, 7:02 p.m. UTC | #4

On Mon, 2024-03-25 at 09:18 +0100, Jan Beulich wrote:
> On 22.03.2024 13:25, Oleksii wrote:
> > On Thu, 2024-03-21 at 14:03 +0100, Jan Beulich wrote:
> > > On 15.03.2024 19:06, Oleksii Kurochko wrote:
> > > > + */
> > > > +static always_inline void read_atomic_size(const volatile void
> > > > *p,
> > > > +                                           void *res,
> > > > +                                           unsigned int size)
> > > > +{
> > > > +    switch ( size )
> > > > +    {
> > > > +    case 1: *(uint8_t *)res = readb(p); break;
> > > > +    case 2: *(uint16_t *)res = readw(p); break;
> > > > +    case 4: *(uint32_t *)res = readl(p); break;
> > > > +    case 8: *(uint32_t *)res  = readq(p); break;
> > > 
> > > Nit: Excess blank before =.
> > > 
> > > Also - no #ifdef here to be RV32-ready?
> > Because there is #ifdef RV32 in io.h for readq().
> 
> There you'd run into __raw_readq()'s BUILD_BUG_ON(), afaict even for
> 1-, 2-, or 4-byte accesses. That's not quite what we want here.
Do you mean that if someone will redefine readq() in another way and
not wrap it by #ifdef RV32? Except this I am not sure that there is an
issue as it will be still a compilation error, so anyway it will be
needed to provide an implementation for __raw_readq().

One of the reason why I decided to wrap with #ifdef RV32 in io.h to not
go over the source code and add wrapping. Also for some code it will be
needed to rewrite it. For example, I am not sure that I can add #ifdef
inside macros, f.e.:
   #define write_atomic(p, x)                              \
   ({                                                      \
       typeof(*(p)) x__ = (x);                             \
       switch ( sizeof(*(p)) )                             \
       {                                                   \
       case 1: writeb(x__, p); break;                      \
       case 2: writew(x__, p); break;                      \
       case 4: writel(x__, p); break;                      \
       case 8: writeq(x__, p); break;                      \
       default: __bad_atomic_size(); break;                \
       }                                                   \
       x__;                                                \
   })

> 
> > > > +/*
> > > > + * First, the atomic ops that have no ordering constraints and
> > > > therefor don't
> > > > + * have the AQ or RL bits set.  These don't return anything,
> > > > so
> > > > there's only
> > > > + * one version to worry about.
> > > > + */
> > > > +#define ATOMIC_OP(op, asm_op, I, asm_type, c_type, prefix)  \
> > > > +static inline                                               \
> > > > +void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
> > > > +{                                                           \
> > > > +    asm volatile (                                          \
> > > > +        "   amo" #asm_op "." #asm_type " zero, %1, %0"      \
> > > > +        : "+A" (v->counter)                                 \
> > > > +        : "r" (I)                                           \
> > > 
> > > Btw, I consider this pretty confusing. At the 1st and 2nd glance
> > > this
> > > looks like a mistake, i.e. as if i was meant. Imo ...
> > > 
> > > > +        : "memory" );                                       \
> > > > +}                                                           \
> > > > +
> > > > +/*
> > > > + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is
> > > > the
> > > > reason why
> > > > + * last argument for ATOMIC_OP isn't used.
> > > > + */
> > > > +#define ATOMIC_OPS(op, asm_op, I)                           \
> > > > +        ATOMIC_OP (op, asm_op, I, w, int,   )
> > > > +
> > > > +ATOMIC_OPS(add, add,  i)
> > > > +ATOMIC_OPS(sub, add, -i)
> > > > +ATOMIC_OPS(and, and,  i)
> > > > +ATOMIC_OPS( or,  or,  i)
> > > > +ATOMIC_OPS(xor, xor,  i)
> > > 
> > > ... here you want to only pass the (unary) operator (and leaving
> > > that
> > > blank
> > > is as fine as using +).
> > I agree that a game with 'i' and 'I' looks confusing, but I am not
> > really understand what is wrong with using ' i' here. It seems that
> > preprocessed macros looks fine:
> >    static inline void atomic_add(int i, atomic_t *v) { asm volatile
> > ( "  
> >    amo" "add" "." "w" " zero, %1, %0" : "+A" (v->counter) : "r" (i)
> > :
> >    "memory" ); }
> >    
> >    static inline void atomic_sub(int i, atomic_t *v) { asm volatile
> > ( "  
> >    amo" "add" "." "w" " zero, %1, %0" : "+A" (v->counter) : "r" (-
> > i) :
> >    "memory" ); }
> 
> I didn't question the pre-processed result being correct. Instead I
> said
> that I consider the construct confusing to the reader, for looking as
> if
> there was a mistake (in the case of the letter i used). Note also in
> particular how the macro invocations need to be in sync with the
> macro
> implementation, for lower case i being used both in the macro and in
> its
> invocations. Anything parameterized would better be fully so, at the
> very least to avoid, as said, confusion. (Having macros depend on
> context at their use sites _may_ be okay for local helper macros, but
> here we're talking about a not even private header file.)
I am not sure then I understand how mentioning '+i' will help
significantly remove confusion.

> 
> > > > +/* This is required to provide a full barrier on success. */
> > > > +static inline int atomic_add_unless(atomic_t *v, int a, int u)
> > > > +{
> > > > +       int prev, rc;
> > > > +
> > > > +    asm volatile (
> > > > +        "0: lr.w     %[p],  %[c]\n"
> > > > +        "   beq      %[p],  %[u], 1f\n"
> > > > +        "   add      %[rc], %[p], %[a]\n"
> > > > +        "   sc.w.rl  %[rc], %[rc], %[c]\n"
> > > > +        "   bnez     %[rc], 0b\n"
> > > > +        RISCV_FULL_BARRIER
> > > 
> > > With this and no .aq on the load, why the .rl on the store?
> > It is something that LKMM requires [1].
> > 
> > This is not fully clear to me what is so specific in LKMM, but
> > accoring
> > to the spec:
> >    Ordering Annotation Fence-based Equivalent
> >    l{b|h|w|d|r}.aq     l{b|h|w|d|r}; fence r,rw
> >    l{b|h|w|d|r}.aqrl   fence rw,rw; l{b|h|w|d|r}; fence r,rw
> >    s{b|h|w|d|c}.rl     fence rw,w; s{b|h|w|d|c}
> >    s{b|h|w|d|c}.aqrl   fence rw,w; s{b|h|w|d|c}
> >    amo<op>.aq          amo<op>; fence r,rw
> >    amo<op>.rl          fence rw,w; amo<op>
> >    amo<op>.aqrl        fence rw,rw; amo<op>; fence rw,rw
> >    Table 2.2: Mappings from .aq and/or .rl to fence-based
> > equivalents.
> >    An alternative mapping places a fence rw,rw after the existing 
> >    s{b|h|w|d|c} mapping rather than at the front of the
> >    l{b|h|w|d|r} mapping.
> 
> I'm afraid I can't spot the specific case in this table. None of the
> stores in the right column have a .rl suffix.
Yes, it is expected.

I am reading this table as (f.e.) amo<op>.rl is an equivalent of fence
rw,w; amo<op>. (without .rl) 

> 
> >    It is also safe to translate any .aq, .rl, or .aqrl annotation
> > into
> >    the fence-based snippets of
> >    Table 2.2. These can also be used as a legal implementation of
> >    l{b|h|w|d} or s{b|h|w|d} pseu-
> >    doinstructions for as long as those instructions are not added
> > to
> >    the ISA.
> > 
> > So according to the spec, it should be:
> >  sc.w ...
> >  RISCV_FULL_BARRIER.
> > 
> > Considering [1] and how this code looks before, it seems to me that
> > it
> > is safe to use lr.w.aq/sc.w.rl here or an fence equivalent.
> 
> Here you say "or". Then why dos the code use sc.?.rl _and_ a fence?
I confused this line with amo<op>.aqrl, so based on the table 2.2 above
s{b|h|w|d|c}.aqrl is transformed to "fence rw,w; s{b|h|w|d|c}", but
Linux kernel decided to strengthen it with full barrier:
   -              "0:\n\t"
   -              "lr.w.aqrl  %[p],  %[c]\n\t"
   -              "beq        %[p],  %[u], 1f\n\t"
   -              "add       %[rc],  %[p], %[a]\n\t"
   -              "sc.w.aqrl %[rc], %[rc], %[c]\n\t"
   -              "bnez      %[rc], 0b\n\t"
   -              "1:"
   +               "0:     lr.w     %[p],  %[c]\n"
   +               "       beq      %[p],  %[u], 1f\n"
   +               "       add      %[rc], %[p], %[a]\n"
   +               "       sc.w.rl  %[rc], %[rc], %[c]\n"
   +               "       bnez     %[rc], 0b\n"
   +               "       fence    rw, rw\n"
   +               "1:\n"
As they have the following issue:
   implementations of atomics such as atomic_cmpxchg() and
   atomic_add_unless() rely on LR/SC pairs,
   which do not give full-ordering with .aqrl; for example, current
   implementations allow the "lr-sc-aqrl-pair-vs-full-barrier" test
   below to end up with the state indicated in the "exists" clause.

> 
> > But in general it ( a combination of fence, .aq, .rl ) can be
> > considered as the same things in this context, so it is possible to
> > leave this function as is to be synced here with Linux kernel.
> 
> In turn I also don't understand this. Yes, the excess .rl certainly
> doesn't render things unsafe. But what's the purpose of the .rl?
> That's
> what my original question boiled down to.
I don't know, either. As I mentioned before, it is enough ( in my
opinion ) to have a FULL barrier or .aq,.rl or .aqrl/.aqrl ( if it
needed to be strengthened) like it was done before in Linux.
It seems to me it is LKMM specific that they need more to be more
strengthened as it RISC-V Memory model requires because:
"sc.w ; fence rw, rw" does not guarantee that all previous reads and
writes finish before the sc itself is globally visible, which might
matter if the sc is unlocking a lock or something.

Despite of the fact, for compare-and-swap loops, RISC-V international
recommends lr.w.aq/lr.d.aq followed by sc.w.rl/sc.d.rl ( as it was
implemeted before in Linux kernel ) I am okay just for safety reasons
and for the reason I mentioned at the last sentence of previous
paragraph to strengthen implementations with fences.

~ Oleksii
> 
> Jan
> 
> > [1]
> > https://lore.kernel.org/lkml/1520274276-21871-1-git-send-email-parri.andrea@gmail.com
> > /
> > 
> > ~ Oleksii
>

Oleksii Kurochko March 26, 2024, 9:24 p.m. UTC | #5

On Tue, 2024-03-26 at 20:02 +0100, Oleksii wrote:
> On Mon, 2024-03-25 at 09:18 +0100, Jan Beulich wrote:
> > On 22.03.2024 13:25, Oleksii wrote:
> > > On Thu, 2024-03-21 at 14:03 +0100, Jan Beulich wrote:
> > > > On 15.03.2024 19:06, Oleksii Kurochko wrote:
> > > > > + */
> > > > > +static always_inline void read_atomic_size(const volatile
> > > > > void
> > > > > *p,
> > > > > +                                           void *res,
> > > > > +                                           unsigned int
> > > > > size)
> > > > > +{
> > > > > +    switch ( size )
> > > > > +    {
> > > > > +    case 1: *(uint8_t *)res = readb(p); break;
> > > > > +    case 2: *(uint16_t *)res = readw(p); break;
> > > > > +    case 4: *(uint32_t *)res = readl(p); break;
> > > > > +    case 8: *(uint32_t *)res  = readq(p); break;
> > > > 
> > > > Nit: Excess blank before =.
> > > > 
> > > > Also - no #ifdef here to be RV32-ready?
> > > Because there is #ifdef RV32 in io.h for readq().
> > 
> > There you'd run into __raw_readq()'s BUILD_BUG_ON(), afaict even
> > for
> > 1-, 2-, or 4-byte accesses. That's not quite what we want here.
> Do you mean that if someone will redefine readq() in another way and
> not wrap it by #ifdef RV32? Except this I am not sure that there is
> an
> issue as it will be still a compilation error, so anyway it will be
> needed to provide an implementation for __raw_readq().
> 
> One of the reason why I decided to wrap with #ifdef RV32 in io.h to
> not
> go over the source code and add wrapping. Also for some code it will
> be
> needed to rewrite it. For example, I am not sure that I can add
> #ifdef
> inside macros, f.e.:
>    #define write_atomic(p, x)                              \
>    ({                                                      \
>        typeof(*(p)) x__ = (x);                             \
>        switch ( sizeof(*(p)) )                             \
>        {                                                   \
>        case 1: writeb(x__, p); break;                      \
>        case 2: writew(x__, p); break;                      \
>        case 4: writel(x__, p); break;                      \
>        case 8: writeq(x__, p); break;                      \
>        default: __bad_atomic_size(); break;                \
>        }                                                   \
>        x__;                                                \
>    })
> 
> > 
> > > > > +/*
> > > > > + * First, the atomic ops that have no ordering constraints
> > > > > and
> > > > > therefor don't
> > > > > + * have the AQ or RL bits set.  These don't return anything,
> > > > > so
> > > > > there's only
> > > > > + * one version to worry about.
> > > > > + */
> > > > > +#define ATOMIC_OP(op, asm_op, I, asm_type, c_type, prefix) 
> > > > > \
> > > > > +static inline                                              
> > > > > \
> > > > > +void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v)
> > > > > \
> > > > > +{                                                          
> > > > > \
> > > > > +    asm volatile (                                         
> > > > > \
> > > > > +        "   amo" #asm_op "." #asm_type " zero, %1, %0"     
> > > > > \
> > > > > +        : "+A" (v->counter)                                
> > > > > \
> > > > > +        : "r" (I)                                          
> > > > > \
> > > > 
> > > > Btw, I consider this pretty confusing. At the 1st and 2nd
> > > > glance
> > > > this
> > > > looks like a mistake, i.e. as if i was meant. Imo ...
> > > > 
> > > > > +        : "memory" );                                      
> > > > > \
> > > > > +}                                                          
> > > > > \
> > > > > +
> > > > > +/*
> > > > > + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is
> > > > > the
> > > > > reason why
> > > > > + * last argument for ATOMIC_OP isn't used.
> > > > > + */
> > > > > +#define ATOMIC_OPS(op, asm_op, I)                          
> > > > > \
> > > > > +        ATOMIC_OP (op, asm_op, I, w, int,   )
> > > > > +
> > > > > +ATOMIC_OPS(add, add,  i)
> > > > > +ATOMIC_OPS(sub, add, -i)
> > > > > +ATOMIC_OPS(and, and,  i)
> > > > > +ATOMIC_OPS( or,  or,  i)
> > > > > +ATOMIC_OPS(xor, xor,  i)
> > > > 
> > > > ... here you want to only pass the (unary) operator (and
> > > > leaving
> > > > that
> > > > blank
> > > > is as fine as using +).
> > > I agree that a game with 'i' and 'I' looks confusing, but I am
> > > not
> > > really understand what is wrong with using ' i' here. It seems
> > > that
> > > preprocessed macros looks fine:
> > >    static inline void atomic_add(int i, atomic_t *v) { asm
> > > volatile
> > > ( "  
> > >    amo" "add" "." "w" " zero, %1, %0" : "+A" (v->counter) : "r"
> > > (i)
> > > :
> > >    "memory" ); }
> > >    
> > >    static inline void atomic_sub(int i, atomic_t *v) { asm
> > > volatile
> > > ( "  
> > >    amo" "add" "." "w" " zero, %1, %0" : "+A" (v->counter) : "r"
> > > (-
> > > i) :
> > >    "memory" ); }
> > 
> > I didn't question the pre-processed result being correct. Instead I
> > said
> > that I consider the construct confusing to the reader, for looking
> > as
> > if
> > there was a mistake (in the case of the letter i used). Note also
> > in
> > particular how the macro invocations need to be in sync with the
> > macro
> > implementation, for lower case i being used both in the macro and
> > in
> > its
> > invocations. Anything parameterized would better be fully so, at
> > the
> > very least to avoid, as said, confusion. (Having macros depend on
> > context at their use sites _may_ be okay for local helper macros,
> > but
> > here we're talking about a not even private header file.)
> I am not sure then I understand how mentioning '+i' will help
> significantly remove confusion.
> 
> > 
> > > > > +/* This is required to provide a full barrier on success. */
> > > > > +static inline int atomic_add_unless(atomic_t *v, int a, int
> > > > > u)
> > > > > +{
> > > > > +       int prev, rc;
> > > > > +
> > > > > +    asm volatile (
> > > > > +        "0: lr.w     %[p],  %[c]\n"
> > > > > +        "   beq      %[p],  %[u], 1f\n"
> > > > > +        "   add      %[rc], %[p], %[a]\n"
> > > > > +        "   sc.w.rl  %[rc], %[rc], %[c]\n"
> > > > > +        "   bnez     %[rc], 0b\n"
> > > > > +        RISCV_FULL_BARRIER
> > > > 
> > > > With this and no .aq on the load, why the .rl on the store?
> > > It is something that LKMM requires [1].
> > > 
> > > This is not fully clear to me what is so specific in LKMM, but
> > > accoring
> > > to the spec:
> > >    Ordering Annotation Fence-based Equivalent
> > >    l{b|h|w|d|r}.aq     l{b|h|w|d|r}; fence r,rw
> > >    l{b|h|w|d|r}.aqrl   fence rw,rw; l{b|h|w|d|r}; fence r,rw
> > >    s{b|h|w|d|c}.rl     fence rw,w; s{b|h|w|d|c}
> > >    s{b|h|w|d|c}.aqrl   fence rw,w; s{b|h|w|d|c}
> > >    amo<op>.aq          amo<op>; fence r,rw
> > >    amo<op>.rl          fence rw,w; amo<op>
> > >    amo<op>.aqrl        fence rw,rw; amo<op>; fence rw,rw
> > >    Table 2.2: Mappings from .aq and/or .rl to fence-based
> > > equivalents.
> > >    An alternative mapping places a fence rw,rw after the existing
> > >    s{b|h|w|d|c} mapping rather than at the front of the
> > >    l{b|h|w|d|r} mapping.
> > 
> > I'm afraid I can't spot the specific case in this table. None of
> > the
> > stores in the right column have a .rl suffix.
> Yes, it is expected.
> 
> I am reading this table as (f.e.) amo<op>.rl is an equivalent of
> fence
> rw,w; amo<op>. (without .rl) 
> 
> > 
> > >    It is also safe to translate any .aq, .rl, or .aqrl annotation
> > > into
> > >    the fence-based snippets of
> > >    Table 2.2. These can also be used as a legal implementation of
> > >    l{b|h|w|d} or s{b|h|w|d} pseu-
> > >    doinstructions for as long as those instructions are not added
> > > to
> > >    the ISA.
> > > 
> > > So according to the spec, it should be:
> > >  sc.w ...
> > >  RISCV_FULL_BARRIER.
> > > 
> > > Considering [1] and how this code looks before, it seems to me
> > > that
> > > it
> > > is safe to use lr.w.aq/sc.w.rl here or an fence equivalent.
> > 
> > Here you say "or". Then why dos the code use sc.?.rl _and_ a fence?
> I confused this line with amo<op>.aqrl, so based on the table 2.2
> above
> s{b|h|w|d|c}.aqrl is transformed to "fence rw,w; s{b|h|w|d|c}", but
> Linux kernel decided to strengthen it with full barrier:
>    -              "0:\n\t"
>    -              "lr.w.aqrl  %[p],  %[c]\n\t"
>    -              "beq        %[p],  %[u], 1f\n\t"
>    -              "add       %[rc],  %[p], %[a]\n\t"
>    -              "sc.w.aqrl %[rc], %[rc], %[c]\n\t"
>    -              "bnez      %[rc], 0b\n\t"
>    -              "1:"
>    +               "0:     lr.w     %[p],  %[c]\n"
>    +               "       beq      %[p],  %[u], 1f\n"
>    +               "       add      %[rc], %[p], %[a]\n"
>    +               "       sc.w.rl  %[rc], %[rc], %[c]\n"
>    +               "       bnez     %[rc], 0b\n"
>    +               "       fence    rw, rw\n"
>    +               "1:\n"
> As they have the following issue:
>    implementations of atomics such as atomic_cmpxchg() and
>    atomic_add_unless() rely on LR/SC pairs,
>    which do not give full-ordering with .aqrl; for example, current
>    implementations allow the "lr-sc-aqrl-pair-vs-full-barrier" test
>    below to end up with the state indicated in the "exists" clause.
> 
> > 
> > > But in general it ( a combination of fence, .aq, .rl ) can be
> > > considered as the same things in this context, so it is possible
> > > to
> > > leave this function as is to be synced here with Linux kernel.
> > 
> > In turn I also don't understand this. Yes, the excess .rl certainly
> > doesn't render things unsafe. But what's the purpose of the .rl?
> > That's
> > what my original question boiled down to.
> I don't know, either. As I mentioned before, it is enough ( in my
> opinion ) to have a FULL barrier or .aq,.rl or .aqrl/.aqrl ( if it
> needed to be strengthened) like it was done before in Linux.
> It seems to me it is LKMM specific that they need more to be more
> strengthened as it RISC-V Memory model requires because:
> "sc.w ; fence rw, rw" does not guarantee that all previous reads and
> writes finish before the sc itself is globally visible, which might
> matter if the sc is unlocking a lock or something.
> 
> Despite of the fact, for compare-and-swap loops, RISC-V international
> recommends lr.w.aq/lr.d.aq followed by sc.w.rl/sc.d.rl ( as it was
> implemeted before in Linux kernel ) I am okay just for safety reasons
> and for the reason I mentioned at the last sentence of previous
> paragraph to strengthen implementations with fences.
Regarding the necessity of fence rw,rw, there is a commit that is proposingto remove fences: [2].
Additionally, it seems there is another reason why the fences were
added.
At the time when the patch introducing the usage of fences was
implemented,
there were no rules in RVWMO which allowed ld.aq+sc.aqrl to be
considered
as a full barrier [3].
   > > note that Model 2018 explicitly says that "ld.aq+sc.aqrl" is
   ordered
   > > against "earlier or later memory operations from the same hart",
   and
   > > this statement was not in Model 2017.
   > > 
   > > So my understanding of the story is that at some point between
   March and
   > > May 2018, RISV memory model folks decided to add this rule,
which
   does
   > > look more consistent with other parts of the model and is
useful.
   > > 
   > > And this is why (and when) "ld.aq+sc.aqrl" can be used as a
fully-
   ordered
   > > barrier ;-)
   > > 
   > > Now if my understanding is correct, to move forward, it's better
   that 1)
   > > this patch gets resend with the above information (better
rewording
   a
   > > bit), and 2) gets an Acked-by from Dan to confirm this is a
correct
   > > history ;-)

Based on patch [2], it may be possible to remove the full barrier and
switch from sc..rl to sc..aqrl. However, I need to finish reading the
Linux kernel mailing thread to understand why a similar change wasn't
made for the lr instruction ( lr -> lr.aq or lr.aqrl as it was before
).

Does that make sense to you?

[2]
https://lore.kernel.org/linux-riscv/20220505035526.2974382-6-guoren@kernel.org/
[3]https://lore.kernel.org/linux-riscv/YrSo%2F3iUuO0AL76T@boqun-archlinux/

~ Oleksii
> 
> ~ Oleksii
> > 
> > Jan
> > 
> > > [1]
> > >
https://lore.kernel.org/lkml/1520274276-21871-1-git-send-email-parri.andrea@gmail.com
> > > /
> > > 
> > > ~ Oleksii
> > 
>

Jan Beulich March 27, 2024, 7:40 a.m. UTC | #6

On 26.03.2024 20:02, Oleksii wrote:
> On Mon, 2024-03-25 at 09:18 +0100, Jan Beulich wrote:
>> On 22.03.2024 13:25, Oleksii wrote:
>>> On Thu, 2024-03-21 at 14:03 +0100, Jan Beulich wrote:
>>>> On 15.03.2024 19:06, Oleksii Kurochko wrote:
>>>>> + */
>>>>> +static always_inline void read_atomic_size(const volatile void
>>>>> *p,
>>>>> +                                           void *res,
>>>>> +                                           unsigned int size)
>>>>> +{
>>>>> +    switch ( size )
>>>>> +    {
>>>>> +    case 1: *(uint8_t *)res = readb(p); break;
>>>>> +    case 2: *(uint16_t *)res = readw(p); break;
>>>>> +    case 4: *(uint32_t *)res = readl(p); break;
>>>>> +    case 8: *(uint32_t *)res  = readq(p); break;
>>>>
>>>> Nit: Excess blank before =.
>>>>
>>>> Also - no #ifdef here to be RV32-ready?
>>> Because there is #ifdef RV32 in io.h for readq().
>>
>> There you'd run into __raw_readq()'s BUILD_BUG_ON(), afaict even for
>> 1-, 2-, or 4-byte accesses. That's not quite what we want here.
> Do you mean that if someone will redefine readq() in another way and
> not wrap it by #ifdef RV32? Except this I am not sure that there is an
> issue as it will be still a compilation error, so anyway it will be
> needed to provide an implementation for __raw_readq().

No. BUILD_BUG_ON() is a compile-time thing. The compiler will encounter
this construct. And hence it will necessarily fail. Which is why the
other approach (causing a linker error) is used elsewhere. And we're
still only in the course of considering to utilize DCE for something
like STATIC_ASSERT_UNREACHABLE(); iirc there was something getting in
the way there.

> One of the reason why I decided to wrap with #ifdef RV32 in io.h to not
> go over the source code and add wrapping. Also for some code it will be
> needed to rewrite it. For example, I am not sure that I can add #ifdef
> inside macros, f.e.:
>    #define write_atomic(p, x)                              \
>    ({                                                      \
>        typeof(*(p)) x__ = (x);                             \
>        switch ( sizeof(*(p)) )                             \
>        {                                                   \
>        case 1: writeb(x__, p); break;                      \
>        case 2: writew(x__, p); break;                      \
>        case 4: writel(x__, p); break;                      \
>        case 8: writeq(x__, p); break;                      \
>        default: __bad_atomic_size(); break;                \
>        }                                                   \
>        x__;                                                \
>    })

You can't add #ifdef there. Such needs abstracting differently.

But of course there's the option of simply not making any of these
constructs RV32-ready. Yet if so, that then will want doing
consistently.

>>>>> +/*
>>>>> + * First, the atomic ops that have no ordering constraints and
>>>>> therefor don't
>>>>> + * have the AQ or RL bits set.  These don't return anything,
>>>>> so
>>>>> there's only
>>>>> + * one version to worry about.
>>>>> + */
>>>>> +#define ATOMIC_OP(op, asm_op, I, asm_type, c_type, prefix)  \
>>>>> +static inline                                               \
>>>>> +void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
>>>>> +{                                                           \
>>>>> +    asm volatile (                                          \
>>>>> +        "   amo" #asm_op "." #asm_type " zero, %1, %0"      \
>>>>> +        : "+A" (v->counter)                                 \
>>>>> +        : "r" (I)                                           \
>>>>
>>>> Btw, I consider this pretty confusing. At the 1st and 2nd glance
>>>> this
>>>> looks like a mistake, i.e. as if i was meant. Imo ...
>>>>
>>>>> +        : "memory" );                                       \
>>>>> +}                                                           \
>>>>> +
>>>>> +/*
>>>>> + * Only CONFIG_GENERIC_ATOMIC64=y was ported to Xen that is
>>>>> the
>>>>> reason why
>>>>> + * last argument for ATOMIC_OP isn't used.
>>>>> + */
>>>>> +#define ATOMIC_OPS(op, asm_op, I)                           \
>>>>> +        ATOMIC_OP (op, asm_op, I, w, int,   )
>>>>> +
>>>>> +ATOMIC_OPS(add, add,  i)
>>>>> +ATOMIC_OPS(sub, add, -i)
>>>>> +ATOMIC_OPS(and, and,  i)
>>>>> +ATOMIC_OPS( or,  or,  i)
>>>>> +ATOMIC_OPS(xor, xor,  i)
>>>>
>>>> ... here you want to only pass the (unary) operator (and leaving
>>>> that
>>>> blank
>>>> is as fine as using +).
>>> I agree that a game with 'i' and 'I' looks confusing, but I am not
>>> really understand what is wrong with using ' i' here. It seems that
>>> preprocessed macros looks fine:
>>>    static inline void atomic_add(int i, atomic_t *v) { asm volatile
>>> ( "  
>>>    amo" "add" "." "w" " zero, %1, %0" : "+A" (v->counter) : "r" (i)
>>> :
>>>    "memory" ); }
>>>    
>>>    static inline void atomic_sub(int i, atomic_t *v) { asm volatile
>>> ( "  
>>>    amo" "add" "." "w" " zero, %1, %0" : "+A" (v->counter) : "r" (-
>>> i) :
>>>    "memory" ); }
>>
>> I didn't question the pre-processed result being correct. Instead I
>> said
>> that I consider the construct confusing to the reader, for looking as
>> if
>> there was a mistake (in the case of the letter i used). Note also in
>> particular how the macro invocations need to be in sync with the
>> macro
>> implementation, for lower case i being used both in the macro and in
>> its
>> invocations. Anything parameterized would better be fully so, at the
>> very least to avoid, as said, confusion. (Having macros depend on
>> context at their use sites _may_ be okay for local helper macros, but
>> here we're talking about a not even private header file.)
> I am not sure then I understand how mentioning '+i' will help
> significantly remove confusion.

I'm afraid I don't understand: What "mentioning '+i'" are you referring
to? I'm pretty sure I didn't suggest that. What I suggested was to pass
a bare operator (- or +) as macro argument.

>>>>> +/* This is required to provide a full barrier on success. */
>>>>> +static inline int atomic_add_unless(atomic_t *v, int a, int u)
>>>>> +{
>>>>> +       int prev, rc;
>>>>> +
>>>>> +    asm volatile (
>>>>> +        "0: lr.w     %[p],  %[c]\n"
>>>>> +        "   beq      %[p],  %[u], 1f\n"
>>>>> +        "   add      %[rc], %[p], %[a]\n"
>>>>> +        "   sc.w.rl  %[rc], %[rc], %[c]\n"
>>>>> +        "   bnez     %[rc], 0b\n"
>>>>> +        RISCV_FULL_BARRIER
>>>>
>>>> With this and no .aq on the load, why the .rl on the store?
>>> It is something that LKMM requires [1].
>>>
>>> This is not fully clear to me what is so specific in LKMM, but
>>> accoring
>>> to the spec:
>>>    Ordering Annotation Fence-based Equivalent
>>>    l{b|h|w|d|r}.aq     l{b|h|w|d|r}; fence r,rw
>>>    l{b|h|w|d|r}.aqrl   fence rw,rw; l{b|h|w|d|r}; fence r,rw
>>>    s{b|h|w|d|c}.rl     fence rw,w; s{b|h|w|d|c}
>>>    s{b|h|w|d|c}.aqrl   fence rw,w; s{b|h|w|d|c}
>>>    amo<op>.aq          amo<op>; fence r,rw
>>>    amo<op>.rl          fence rw,w; amo<op>
>>>    amo<op>.aqrl        fence rw,rw; amo<op>; fence rw,rw
>>>    Table 2.2: Mappings from .aq and/or .rl to fence-based
>>> equivalents.
>>>    An alternative mapping places a fence rw,rw after the existing 
>>>    s{b|h|w|d|c} mapping rather than at the front of the
>>>    l{b|h|w|d|r} mapping.
>>
>> I'm afraid I can't spot the specific case in this table. None of the
>> stores in the right column have a .rl suffix.
> Yes, it is expected.
> 
> I am reading this table as (f.e.) amo<op>.rl is an equivalent of fence
> rw,w; amo<op>. (without .rl) 

In which case: How does quoting the table answer my original question?

>>>    It is also safe to translate any .aq, .rl, or .aqrl annotation
>>> into
>>>    the fence-based snippets of
>>>    Table 2.2. These can also be used as a legal implementation of
>>>    l{b|h|w|d} or s{b|h|w|d} pseu-
>>>    doinstructions for as long as those instructions are not added
>>> to
>>>    the ISA.
>>>
>>> So according to the spec, it should be:
>>>  sc.w ...
>>>  RISCV_FULL_BARRIER.
>>>
>>> Considering [1] and how this code looks before, it seems to me that
>>> it
>>> is safe to use lr.w.aq/sc.w.rl here or an fence equivalent.
>>
>> Here you say "or". Then why dos the code use sc.?.rl _and_ a fence?
> I confused this line with amo<op>.aqrl, so based on the table 2.2 above
> s{b|h|w|d|c}.aqrl is transformed to "fence rw,w; s{b|h|w|d|c}", but
> Linux kernel decided to strengthen it with full barrier:
>    -              "0:\n\t"
>    -              "lr.w.aqrl  %[p],  %[c]\n\t"
>    -              "beq        %[p],  %[u], 1f\n\t"
>    -              "add       %[rc],  %[p], %[a]\n\t"
>    -              "sc.w.aqrl %[rc], %[rc], %[c]\n\t"
>    -              "bnez      %[rc], 0b\n\t"
>    -              "1:"
>    +               "0:     lr.w     %[p],  %[c]\n"
>    +               "       beq      %[p],  %[u], 1f\n"
>    +               "       add      %[rc], %[p], %[a]\n"
>    +               "       sc.w.rl  %[rc], %[rc], %[c]\n"
>    +               "       bnez     %[rc], 0b\n"
>    +               "       fence    rw, rw\n"
>    +               "1:\n"
> As they have the following issue:
>    implementations of atomics such as atomic_cmpxchg() and
>    atomic_add_unless() rely on LR/SC pairs,
>    which do not give full-ordering with .aqrl; for example, current
>    implementations allow the "lr-sc-aqrl-pair-vs-full-barrier" test
>    below to end up with the state indicated in the "exists" clause.

Is that really "current implementations", not "the abstract model"?
If so, the use of an explicit fence would be more like a workaround
(and would hence want commenting to that effect).

In neither case can I see my original question answered: Why the .rl
on the store when there is a full fence later?

Jan

Oleksii Kurochko March 27, 2024, 10:28 a.m. UTC | #7

On Wed, 2024-03-27 at 08:40 +0100, Jan Beulich wrote:
...

> > > > > > +/* This is required to provide a full barrier on success.
> > > > > > */
> > > > > > +static inline int atomic_add_unless(atomic_t *v, int a,
> > > > > > int u)
> > > > > > +{
> > > > > > +       int prev, rc;
> > > > > > +
> > > > > > +    asm volatile (
> > > > > > +        "0: lr.w     %[p],  %[c]\n"
> > > > > > +        "   beq      %[p],  %[u], 1f\n"
> > > > > > +        "   add      %[rc], %[p], %[a]\n"
> > > > > > +        "   sc.w.rl  %[rc], %[rc], %[c]\n"
> > > > > > +        "   bnez     %[rc], 0b\n"
> > > > > > +        RISCV_FULL_BARRIER
> > > > > 
> > > > > With this and no .aq on the load, why the .rl on the store?
> > > > It is something that LKMM requires [1].
> > > > 
> > > > This is not fully clear to me what is so specific in LKMM, but
> > > > accoring
> > > > to the spec:
> > > >    Ordering Annotation Fence-based Equivalent
> > > >    l{b|h|w|d|r}.aq     l{b|h|w|d|r}; fence r,rw
> > > >    l{b|h|w|d|r}.aqrl   fence rw,rw; l{b|h|w|d|r}; fence r,rw
> > > >    s{b|h|w|d|c}.rl     fence rw,w; s{b|h|w|d|c}
> > > >    s{b|h|w|d|c}.aqrl   fence rw,w; s{b|h|w|d|c}
> > > >    amo<op>.aq          amo<op>; fence r,rw
> > > >    amo<op>.rl          fence rw,w; amo<op>
> > > >    amo<op>.aqrl        fence rw,rw; amo<op>; fence rw,rw
> > > >    Table 2.2: Mappings from .aq and/or .rl to fence-based
> > > > equivalents.
> > > >    An alternative mapping places a fence rw,rw after the
> > > > existing 
> > > >    s{b|h|w|d|c} mapping rather than at the front of the
> > > >    l{b|h|w|d|r} mapping.
> > > 
> > > I'm afraid I can't spot the specific case in this table. None of
> > > the
> > > stores in the right column have a .rl suffix.
> > Yes, it is expected.
> > 
> > I am reading this table as (f.e.) amo<op>.rl is an equivalent of
> > fence
> > rw,w; amo<op>. (without .rl) 
> 
> In which case: How does quoting the table answer my original
> question?
Agree, it is starting to be confusing, so let me give an answer to your
question below.

> 
> > > >    It is also safe to translate any .aq, .rl, or .aqrl
> > > > annotation
> > > > into
> > > >    the fence-based snippets of
> > > >    Table 2.2. These can also be used as a legal implementation
> > > > of
> > > >    l{b|h|w|d} or s{b|h|w|d} pseu-
> > > >    doinstructions for as long as those instructions are not
> > > > added
> > > > to
> > > >    the ISA.
> > > > 
> > > > So according to the spec, it should be:
> > > >  sc.w ...
> > > >  RISCV_FULL_BARRIER.
> > > > 
> > > > Considering [1] and how this code looks before, it seems to me
> > > > that
> > > > it
> > > > is safe to use lr.w.aq/sc.w.rl here or an fence equivalent.
> > > 
> > > Here you say "or". Then why dos the code use sc.?.rl _and_ a
> > > fence?
> > I confused this line with amo<op>.aqrl, so based on the table 2.2
> > above
> > s{b|h|w|d|c}.aqrl is transformed to "fence rw,w; s{b|h|w|d|c}", but
> > Linux kernel decided to strengthen it with full barrier:
> >    -              "0:\n\t"
> >    -              "lr.w.aqrl  %[p],  %[c]\n\t"
> >    -              "beq        %[p],  %[u], 1f\n\t"
> >    -              "add       %[rc],  %[p], %[a]\n\t"
> >    -              "sc.w.aqrl %[rc], %[rc], %[c]\n\t"
> >    -              "bnez      %[rc], 0b\n\t"
> >    -              "1:"
> >    +               "0:     lr.w     %[p],  %[c]\n"
> >    +               "       beq      %[p],  %[u], 1f\n"
> >    +               "       add      %[rc], %[p], %[a]\n"
> >    +               "       sc.w.rl  %[rc], %[rc], %[c]\n"
> >    +               "       bnez     %[rc], 0b\n"
> >    +               "       fence    rw, rw\n"
> >    +               "1:\n"
> > As they have the following issue:
> >    implementations of atomics such as atomic_cmpxchg() and
> >    atomic_add_unless() rely on LR/SC pairs,
> >    which do not give full-ordering with .aqrl; for example, current
> >    implementations allow the "lr-sc-aqrl-pair-vs-full-barrier" test
> >    below to end up with the state indicated in the "exists" clause.
> 
> Is that really "current implementations", not "the abstract model"?
> If so, the use of an explicit fence would be more like a workaround
> (and would hence want commenting to that effect).
> 
> In neither case can I see my original question answered: Why the .rl
> on the store when there is a full fence later?
The good explanation for that was provided in the commit addressing a
similar issue for ARM64 [
https://patchwork.kernel.org/project/linux-arm-kernel/patch/1391516953-14541-1-git-send-email-will.deacon@arm.com/
].
The same holds true for RISC-V since ARM also employs WMO.

I would also like to mention another point, as I indicated in another
email thread
[ https://lists.xen.org/archives/html/xen-devel/2024-03/msg01582.html ]
, that now this fence can be omitted and .aqrl can be used instead.

This was confirmed by Dan (the author of the RVWMO spec)
[https://lore.kernel.org/linux-riscv/41e01514-74ca-84f2-f5cc-2645c444fd8e@nvidia.com/
]

I hope this addresses your original question. Does it?

~ Oleksii

Jan Beulich March 27, 2024, 11:07 a.m. UTC | #8

On 27.03.2024 11:28, Oleksii wrote:
> On Wed, 2024-03-27 at 08:40 +0100, Jan Beulich wrote:
> ...
> 
>>>>>>> +/* This is required to provide a full barrier on success.
>>>>>>> */
>>>>>>> +static inline int atomic_add_unless(atomic_t *v, int a,
>>>>>>> int u)
>>>>>>> +{
>>>>>>> +       int prev, rc;
>>>>>>> +
>>>>>>> +    asm volatile (
>>>>>>> +        "0: lr.w     %[p],  %[c]\n"
>>>>>>> +        "   beq      %[p],  %[u], 1f\n"
>>>>>>> +        "   add      %[rc], %[p], %[a]\n"
>>>>>>> +        "   sc.w.rl  %[rc], %[rc], %[c]\n"
>>>>>>> +        "   bnez     %[rc], 0b\n"
>>>>>>> +        RISCV_FULL_BARRIER
>>>>>>
>>>>>> With this and no .aq on the load, why the .rl on the store?
>>>>> It is something that LKMM requires [1].
>>>>>
>>>>> This is not fully clear to me what is so specific in LKMM, but
>>>>> accoring
>>>>> to the spec:
>>>>>    Ordering Annotation Fence-based Equivalent
>>>>>    l{b|h|w|d|r}.aq     l{b|h|w|d|r}; fence r,rw
>>>>>    l{b|h|w|d|r}.aqrl   fence rw,rw; l{b|h|w|d|r}; fence r,rw
>>>>>    s{b|h|w|d|c}.rl     fence rw,w; s{b|h|w|d|c}
>>>>>    s{b|h|w|d|c}.aqrl   fence rw,w; s{b|h|w|d|c}
>>>>>    amo<op>.aq          amo<op>; fence r,rw
>>>>>    amo<op>.rl          fence rw,w; amo<op>
>>>>>    amo<op>.aqrl        fence rw,rw; amo<op>; fence rw,rw
>>>>>    Table 2.2: Mappings from .aq and/or .rl to fence-based
>>>>> equivalents.
>>>>>    An alternative mapping places a fence rw,rw after the
>>>>> existing 
>>>>>    s{b|h|w|d|c} mapping rather than at the front of the
>>>>>    l{b|h|w|d|r} mapping.
>>>>
>>>> I'm afraid I can't spot the specific case in this table. None of
>>>> the
>>>> stores in the right column have a .rl suffix.
>>> Yes, it is expected.
>>>
>>> I am reading this table as (f.e.) amo<op>.rl is an equivalent of
>>> fence
>>> rw,w; amo<op>. (without .rl) 
>>
>> In which case: How does quoting the table answer my original
>> question?
> Agree, it is starting to be confusing, so let me give an answer to your
> question below.
> 
>>
>>>>>    It is also safe to translate any .aq, .rl, or .aqrl
>>>>> annotation
>>>>> into
>>>>>    the fence-based snippets of
>>>>>    Table 2.2. These can also be used as a legal implementation
>>>>> of
>>>>>    l{b|h|w|d} or s{b|h|w|d} pseu-
>>>>>    doinstructions for as long as those instructions are not
>>>>> added
>>>>> to
>>>>>    the ISA.
>>>>>
>>>>> So according to the spec, it should be:
>>>>>  sc.w ...
>>>>>  RISCV_FULL_BARRIER.
>>>>>
>>>>> Considering [1] and how this code looks before, it seems to me
>>>>> that
>>>>> it
>>>>> is safe to use lr.w.aq/sc.w.rl here or an fence equivalent.
>>>>
>>>> Here you say "or". Then why dos the code use sc.?.rl _and_ a
>>>> fence?
>>> I confused this line with amo<op>.aqrl, so based on the table 2.2
>>> above
>>> s{b|h|w|d|c}.aqrl is transformed to "fence rw,w; s{b|h|w|d|c}", but
>>> Linux kernel decided to strengthen it with full barrier:
>>>    -              "0:\n\t"
>>>    -              "lr.w.aqrl  %[p],  %[c]\n\t"
>>>    -              "beq        %[p],  %[u], 1f\n\t"
>>>    -              "add       %[rc],  %[p], %[a]\n\t"
>>>    -              "sc.w.aqrl %[rc], %[rc], %[c]\n\t"
>>>    -              "bnez      %[rc], 0b\n\t"
>>>    -              "1:"
>>>    +               "0:     lr.w     %[p],  %[c]\n"
>>>    +               "       beq      %[p],  %[u], 1f\n"
>>>    +               "       add      %[rc], %[p], %[a]\n"
>>>    +               "       sc.w.rl  %[rc], %[rc], %[c]\n"
>>>    +               "       bnez     %[rc], 0b\n"
>>>    +               "       fence    rw, rw\n"
>>>    +               "1:\n"
>>> As they have the following issue:
>>>    implementations of atomics such as atomic_cmpxchg() and
>>>    atomic_add_unless() rely on LR/SC pairs,
>>>    which do not give full-ordering with .aqrl; for example, current
>>>    implementations allow the "lr-sc-aqrl-pair-vs-full-barrier" test
>>>    below to end up with the state indicated in the "exists" clause.
>>
>> Is that really "current implementations", not "the abstract model"?
>> If so, the use of an explicit fence would be more like a workaround
>> (and would hence want commenting to that effect).
>>
>> In neither case can I see my original question answered: Why the .rl
>> on the store when there is a full fence later?
> The good explanation for that was provided in the commit addressing a
> similar issue for ARM64 [
> https://patchwork.kernel.org/project/linux-arm-kernel/patch/1391516953-14541-1-git-send-email-will.deacon@arm.com/
> ].
> The same holds true for RISC-V since ARM also employs WMO.
> 
> I would also like to mention another point, as I indicated in another
> email thread
> [ https://lists.xen.org/archives/html/xen-devel/2024-03/msg01582.html ]
> , that now this fence can be omitted and .aqrl can be used instead.
> 
> This was confirmed by Dan (the author of the RVWMO spec)
> [https://lore.kernel.org/linux-riscv/41e01514-74ca-84f2-f5cc-2645c444fd8e@nvidia.com/
> ]
> 
> I hope this addresses your original question. Does it?

I think it does, thanks. Some of this will need putting in at least the
patch description, if not a code comment.

Jan

[v6,10/20] xen/riscv: introduce atomic.h

Commit Message

Comments

Patch