
[bpf-next,2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire

Message ID 20210301104318.263262-3-bjorn.topel@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series load-acquire/store-release semantics for AF_XDP rings

Checks

Context Check Description
netdev/cover_letter success
netdev/fixes_present success
netdev/patch_count success
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success
netdev/cc_maintainers warning 10 maintainers not CCed: yhs@fb.com paul.walmsley@sifive.com kafai@fb.com john.fastabend@gmail.com aou@eecs.berkeley.edu linux-riscv@lists.infradead.org bjorn@kernel.org kpsingh@kernel.org songliubraving@fb.com palmer@dabbelt.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success
netdev/checkpatch fail CHECK: Macro argument 'p' may be better as '(p)' to avoid precedence issues; ERROR: Macros with complex values should be enclosed in parentheses
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success
netdev/stable success Stable not CCed

Commit Message

Björn Töpel March 1, 2021, 10:43 a.m. UTC
From: Björn Töpel <bjorn.topel@intel.com>

Now that the AF_XDP rings have load-acquire/store-release semantics,
move libbpf to that as well.

The library-internal libbpf_smp_{load_acquire,store_release} are only
valid for 32-bit words on ARM64.

Also, remove the barriers that are no longer in use.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 tools/lib/bpf/libbpf_util.h | 72 +++++++++++++++++++++++++------------
 tools/lib/bpf/xsk.h         | 17 +++------
 2 files changed, 55 insertions(+), 34 deletions(-)
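
In application-facing terms, the conversion boils down to the pattern
below. This is a standalone user-space sketch using compiler builtins
rather than the libbpf macros; the struct and function names are
illustrative only, not taken from the patch.

#include <stdint.h>

/* Simplified single-producer/single-consumer ring, for illustration. */
struct demo_ring {
	uint32_t producer;	/* advanced by the kernel with store-release */
	uint32_t consumer;	/* advanced by user space with store-release */
	uint64_t desc[64];
};

static inline int demo_cons_peek(struct demo_ring *r, uint64_t *out)
{
	/* Pairs with the kernel's store-release on the producer index:
	 * the descriptor read below cannot be hoisted above this load.
	 */
	uint32_t prod = __atomic_load_n(&r->producer, __ATOMIC_ACQUIRE);

	if (prod == r->consumer)
		return 0;			/* ring empty */
	*out = r->desc[r->consumer & 63];

	/* Pairs with the kernel's load-acquire of the consumer index:
	 * the descriptor read above completes before the slot is reused.
	 */
	__atomic_store_n(&r->consumer, r->consumer + 1, __ATOMIC_RELEASE);
	return 1;
}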

Comments

Toke Høiland-Jørgensen March 1, 2021, 4:10 p.m. UTC | #1
Björn Töpel <bjorn.topel@gmail.com> writes:

> From: Björn Töpel <bjorn.topel@intel.com>
>
> Now that the AF_XDP rings have load-acquire/store-release semantics,
> move libbpf to that as well.
>
> The library-internal libbpf_smp_{load_acquire,store_release} are only
> valid for 32-bit words on ARM64.
>
> Also, remove the barriers that are no longer in use.

So what happens if an updated libbpf is paired with an older kernel (or
vice versa)?

-Toke
Björn Töpel March 2, 2021, 8:05 a.m. UTC | #2
On 2021-03-01 17:10, Toke Høiland-Jørgensen wrote:
> Björn Töpel <bjorn.topel@gmail.com> writes:
> 
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> Now that the AF_XDP rings have load-acquire/store-release semantics,
>> move libbpf to that as well.
>>
>> The library-internal libbpf_smp_{load_acquire,store_release} are only
>> valid for 32-bit words on ARM64.
>>
>> Also, remove the barriers that are no longer in use.
> 
> So what happens if an updated libbpf is paired with an older kernel (or
> vice versa)?
>

"This is fine." ;-) This was briefly discussed in [1], outlined by the
previous commit!

...even on POWER.


Björn

[1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/


> -Toke
>
Daniel Borkmann March 2, 2021, 9:13 a.m. UTC | #3
On 3/2/21 9:05 AM, Björn Töpel wrote:
> On 2021-03-01 17:10, Toke Høiland-Jørgensen wrote:
>> Björn Töpel <bjorn.topel@gmail.com> writes:
>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>
>>> Now that the AF_XDP rings have load-acquire/store-release semantics,
>>> move libbpf to that as well.
>>>
>>> The library-internal libbpf_smp_{load_acquire,store_release} are only
>>> valid for 32-bit words on ARM64.
>>>
>>> Also, remove the barriers that are no longer in use.
>>
>> So what happens if an updated libbpf is paired with an older kernel (or
>> vice versa)?
> 
> "This is fine." ;-) This was briefly discussed in [1], outlined by the
> previous commit!
> 
> ...even on POWER.

Could you put a summary or quote of that discussion on 'why it is okay and does not
cause /forward or backward/ compat issues with user space' directly into patch 1's
commit message?

I feel just referring to a link is probably less suitable in this case as it should
rather be part of the commit message that contains the justification on why it is
waterproof - at least it feels that specific area may be a bit under-documented, so
having it as direct part certainly doesn't hurt.

Would also be great to get Will's ACK on that when you have a v2. :)

Thanks,
Daniel

> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
Björn Töpel March 2, 2021, 9:16 a.m. UTC | #4
On 2021-03-02 10:13, Daniel Borkmann wrote:
> On 3/2/21 9:05 AM, Björn Töpel wrote:
>> On 2021-03-01 17:10, Toke Høiland-Jørgensen wrote:
>>> Björn Töpel <bjorn.topel@gmail.com> writes:
>>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>>
>>>> Now that the AF_XDP rings have load-acquire/store-release semantics,
>>>> move libbpf to that as well.
>>>>
>>>> The library-internal libbpf_smp_{load_acquire,store_release} are only
>>>> valid for 32-bit words on ARM64.
>>>>
>>>> Also, remove the barriers that are no longer in use.
>>>
>>> So what happens if an updated libbpf is paired with an older kernel (or
>>> vice versa)?
>>
>> "This is fine." ;-) This was briefly discussed in [1], outlined by the
>> previous commit!
>>
>> ...even on POWER.
> 
> Could you put a summary or quote of that discussion on 'why it is okay 
> and does not
> cause /forward or backward/ compat issues with user space' directly into 
> patch 1's
> commit message?
> 
> I feel just referring to a link is probably less suitable in this case 
> as it should
> rather be part of the commit message that contains the justification on 
> why it is
> waterproof - at least it feels that specific area may be a bit 
> under-documented, so
> having it as direct part certainly doesn't hurt.
>

I agree; it's enough in the weeds as it is already.

I wonder if it's possible to cook a LKMM litmus test for this...?


> Would also be great to get Will's ACK on that when you have a v2. :)
>

Yup! :-)


Björn


> Thanks,
> Daniel
> 
>> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
Daniel Borkmann March 2, 2021, 9:25 a.m. UTC | #5
On 3/2/21 10:16 AM, Björn Töpel wrote:
> On 2021-03-02 10:13, Daniel Borkmann wrote:
>> On 3/2/21 9:05 AM, Björn Töpel wrote:
>>> On 2021-03-01 17:10, Toke Høiland-Jørgensen wrote:
>>>> Björn Töpel <bjorn.topel@gmail.com> writes:
>>>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>>>
>>>>> Now that the AF_XDP rings have load-acquire/store-release semantics,
>>>>> move libbpf to that as well.
>>>>>
>>>>> The library-internal libbpf_smp_{load_acquire,store_release} are only
>>>>> valid for 32-bit words on ARM64.
>>>>>
>>>>> Also, remove the barriers that are no longer in use.
>>>>
>>>> So what happens if an updated libbpf is paired with an older kernel (or
>>>> vice versa)?
>>>
>>> "This is fine." ;-) This was briefly discussed in [1], outlined by the
>>> previous commit!
>>>
>>> ...even on POWER.
>>
>> Could you put a summary or quote of that discussion on 'why it is okay and does not
>> cause /forward or backward/ compat issues with user space' directly into patch 1's
>> commit message?
>>
>> I feel just referring to a link is probably less suitable in this case as it should
>> rather be part of the commit message that contains the justification on why it is
>> waterproof - at least it feels that specific area may be a bit under-documented, so
>> having it as direct part certainly doesn't hurt.
> 
> I agree; it's enough in the weeds as it is already.
> 
> I wonder if it's possible to cook a LKMM litmus test for this...?

That would be amazing! :-)

(Another option, which can be done independently, could be to update [0] with an outline
  of a pairing scenario like the one we have here, describing the forward/backward
  compatibility of the barriers used; I think that would be quite useful as well.)

   [0] Documentation/memory-barriers.txt

>> Would also be great to get Will's ACK on that when you have a v2. :)
> 
> Yup! :-)
> 
> 
> Björn
> 
> 
>> Thanks,
>> Daniel
>>
>>> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
Andrii Nakryiko March 3, 2021, 4:38 a.m. UTC | #6
On Mon, Mar 1, 2021 at 2:43 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Now that the AF_XDP rings have load-acquire/store-release semantics,
> move libbpf to that as well.
>
> The library-internal libbpf_smp_{load_acquire,store_release} are only
> valid for 32-bit words on ARM64.
>
> Also, remove the barriers that are no longer in use.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  tools/lib/bpf/libbpf_util.h | 72 +++++++++++++++++++++++++------------
>  tools/lib/bpf/xsk.h         | 17 +++------
>  2 files changed, 55 insertions(+), 34 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf_util.h b/tools/lib/bpf/libbpf_util.h
> index 59c779c5790c..94a0d7bb6f3c 100644
> --- a/tools/lib/bpf/libbpf_util.h
> +++ b/tools/lib/bpf/libbpf_util.h
> @@ -5,6 +5,7 @@
>  #define __LIBBPF_LIBBPF_UTIL_H
>
>  #include <stdbool.h>
> +#include <linux/compiler.h>
>
>  #ifdef __cplusplus
>  extern "C" {
> @@ -15,29 +16,56 @@ extern "C" {
>   * application that uses libbpf.
>   */
>  #if defined(__i386__) || defined(__x86_64__)
> -# define libbpf_smp_rmb() asm volatile("" : : : "memory")
> -# define libbpf_smp_wmb() asm volatile("" : : : "memory")
> -# define libbpf_smp_mb() \
> -       asm volatile("lock; addl $0,-4(%%rsp)" : : : "memory", "cc")
> -/* Hinders stores to be observed before older loads. */
> -# define libbpf_smp_rwmb() asm volatile("" : : : "memory")

So, technically, these four are part of libbpf's API, as libbpf_util.h
is actually installed on target hosts. Seems like xsk.h is the only
one that is using them, though.

So the question is whether it's ok to remove them now?

And also, why wasn't this part of xsk.h in the first place?

> +# define libbpf_smp_store_release(p, v)                                        \
> +       do {                                                            \
> +               asm volatile("" : : : "memory");                        \
> +               WRITE_ONCE(*p, v);                                      \
> +       } while (0)
> +# define libbpf_smp_load_acquire(p)                                    \
> +       ({                                                              \
> +               typeof(*p) ___p1 = READ_ONCE(*p);                       \
> +               asm volatile("" : : : "memory");                        \
> +               ___p1;                                                  \
> +       })

[...]
Björn Töpel March 3, 2021, 7:14 a.m. UTC | #7
On 2021-03-03 05:38, Andrii Nakryiko wrote:
> On Mon, Mar 1, 2021 at 2:43 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>>
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> Now that the AF_XDP rings have load-acquire/store-release semantics,
>> move libbpf to that as well.
>>
>> The library-internal libbpf_smp_{load_acquire,store_release} are only
>> valid for 32-bit words on ARM64.
>>
>> Also, remove the barriers that are no longer in use.
>>
>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>> ---
>>   tools/lib/bpf/libbpf_util.h | 72 +++++++++++++++++++++++++------------
>>   tools/lib/bpf/xsk.h         | 17 +++------
>>   2 files changed, 55 insertions(+), 34 deletions(-)
>>
>> diff --git a/tools/lib/bpf/libbpf_util.h b/tools/lib/bpf/libbpf_util.h
>> index 59c779c5790c..94a0d7bb6f3c 100644
>> --- a/tools/lib/bpf/libbpf_util.h
>> +++ b/tools/lib/bpf/libbpf_util.h
>> @@ -5,6 +5,7 @@
>>   #define __LIBBPF_LIBBPF_UTIL_H
>>
>>   #include <stdbool.h>
>> +#include <linux/compiler.h>
>>
>>   #ifdef __cplusplus
>>   extern "C" {
>> @@ -15,29 +16,56 @@ extern "C" {
>>    * application that uses libbpf.
>>    */
>>   #if defined(__i386__) || defined(__x86_64__)
>> -# define libbpf_smp_rmb() asm volatile("" : : : "memory")
>> -# define libbpf_smp_wmb() asm volatile("" : : : "memory")
>> -# define libbpf_smp_mb() \
>> -       asm volatile("lock; addl $0,-4(%%rsp)" : : : "memory", "cc")
>> -/* Hinders stores to be observed before older loads. */
>> -# define libbpf_smp_rwmb() asm volatile("" : : : "memory")
> 
> So, technically, these four are part of libbpf's API, as libbpf_util.h
> is actually installed on target hosts. Seems like xsk.h is the only
> one that is using them, though.
> 
> So the question is whether it's ok to remove them now?
>

I would say so. Ideally, the barriers shouldn't be visible at all,
since they're only used as an implementation detail for the static
inline functions.


> And also, why wasn't this part of xsk.h in the first place?
>

I guess there was a "maybe it can be useful for more than the XDP socket 
parts of libbpf"-idea. I'll move them to xsk.h for the v2, which will 
make the migration easier.


Björn


>> +# define libbpf_smp_store_release(p, v)                                        \
>> +       do {                                                            \
>> +               asm volatile("" : : : "memory");                        \
>> +               WRITE_ONCE(*p, v);                                      \
>> +       } while (0)
>> +# define libbpf_smp_load_acquire(p)                                    \
>> +       ({                                                              \
>> +               typeof(*p) ___p1 = READ_ONCE(*p);                       \
>> +               asm volatile("" : : : "memory");                        \
>> +               ___p1;                                                  \
>> +       })
> 
> [...]
>
Björn Töpel March 3, 2021, 8:08 a.m. UTC | #8
On Tue, 2 Mar 2021 at 10:25, Daniel Borkmann <daniel@iogearbox.net> wrote:
>

[...]

> > I wonder if it's possible to cook a LKMM litmus test for this...?
>
> That would be amazing! :-)
>

With the help of Paul and Alan [1] (Thanks!) I've cooked 8 litmus
tests for this [2].

The litmus tests are based on a one-entry ring buffer, and there are
two scenarios. The ring is full, i.e. the producer has written an
entry, so the consumer has to go first. The ring is empty, i.e. the
producer has to go first. There is one test for each permutation:
barrier only, acqrel only, acqrel+barrier, barrier+acqrel.
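
For illustration, the "acqrel only" flavor is essentially the classic
message-passing pattern. This is just a minimal sketch of that shape,
not one of the actual tests from [2]:

C xsk-acqrel-sketch

{}

P0(int *desc, int *prod)
{
	WRITE_ONCE(*desc, 1);
	smp_store_release(prod, 1);
}

P1(int *desc, int *prod)
{
	int r0;
	int r1;

	r0 = smp_load_acquire(prod);
	r1 = READ_ONCE(*desc);
}

exists (1:r0=1 /\ 1:r1=0) (* forbidden: stale desc after seeing prod *)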

According to these tests the code in this series is correct. Now, for
the v2 some more wording/explanations are needed. Do you think I
should include the litmus tests in the patch, or just refer to them?
Paste parts of them into the cover?

> (Another option, which can be done independently, could be to update [0] with an outline
>   of a pairing scenario like the one we have here, describing the forward/backward
>   compatibility of the barriers used; I think that would be quite useful as well.)
>
>    [0] Documentation/memory-barriers.txt
>

Yeah, I agree. There is some information on it though in the "SMP
BARRIER PAIRING" section:
--8<--
General barriers pair with each other, though they also pair with most
other types of barriers, albeit without multicopy atomicity.  An acquire
barrier pairs with a release barrier, but both may also pair with other
barriers, including of course general barriers.  A write barrier pairs
with a data dependency barrier, a control dependency, an acquire barrier,
a release barrier, a read barrier, or a general barrier.  Similarly a
read barrier, control dependency, or a data dependency barrier pairs
with a write barrier, an acquire barrier, a release barrier, or a
general barrier:
-->8--
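
Applied to Toke's compat question: the "barrier+acqrel" permutation,
i.e. an old libbpf write barrier paired with a new kernel
load-acquire, is the same message-passing shape, and per the pairing
rules quoted above the stale read is still forbidden. Again, just a
sketch:

C xsk-wmb-acquire-sketch

{}

P0(int *desc, int *prod)
{
	WRITE_ONCE(*desc, 1);
	smp_wmb();
	WRITE_ONCE(*prod, 1);
}

P1(int *desc, int *prod)
{
	int r0;
	int r1;

	r0 = smp_load_acquire(prod);
	r1 = READ_ONCE(*desc);
}

exists (1:r0=1 /\ 1:r1=0)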

And there's the tools/memory-model/Documentation/cheatsheet.txt.

That being said, in this case more is more. :-D


Björn

[1] https://lore.kernel.org/lkml/CAJ+HfNhxWFeKnn1aZw-YJmzpBuCaoeGkXXKn058GhY-6ZBDtZA@mail.gmail.com/
[2] https://github.com/bjoto/litmus-xsk/commit/0db0dc426a7e1248f83e21f10f9e840f970f4cb7

> >> Would also be great to get Will's ACK on that when you have a v2. :)
> >
> > Yup! :-)
> >
> >
> > Björn
> >
> >
> >> Thanks,
> >> Daniel
> >>
> >>> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
>
Björn Töpel March 3, 2021, 8:19 a.m. UTC | #9
On Wed, 3 Mar 2021 at 08:14, Björn Töpel <bjorn.topel@intel.com> wrote:
>
> On 2021-03-03 05:38, Andrii Nakryiko wrote:
> > On Mon, Mar 1, 2021 at 2:43 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
> >>
> >> From: Björn Töpel <bjorn.topel@intel.com>
> >>
> >> Now that the AF_XDP rings have load-acquire/store-release semantics,
> >> move libbpf to that as well.
> >>
> >> The library-internal libbpf_smp_{load_acquire,store_release} are only
> >> valid for 32-bit words on ARM64.
> >>
> >> Also, remove the barriers that are no longer in use.
> >>
> >> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> >> ---
> >>   tools/lib/bpf/libbpf_util.h | 72 +++++++++++++++++++++++++------------
> >>   tools/lib/bpf/xsk.h         | 17 +++------
> >>   2 files changed, 55 insertions(+), 34 deletions(-)
> >>
> >> diff --git a/tools/lib/bpf/libbpf_util.h b/tools/lib/bpf/libbpf_util.h
> >> index 59c779c5790c..94a0d7bb6f3c 100644
> >> --- a/tools/lib/bpf/libbpf_util.h
> >> +++ b/tools/lib/bpf/libbpf_util.h
> >> @@ -5,6 +5,7 @@
> >>   #define __LIBBPF_LIBBPF_UTIL_H
> >>
> >>   #include <stdbool.h>
> >> +#include <linux/compiler.h>
> >>
> >>   #ifdef __cplusplus
> >>   extern "C" {
> >> @@ -15,29 +16,56 @@ extern "C" {
> >>    * application that uses libbpf.
> >>    */
> >>   #if defined(__i386__) || defined(__x86_64__)
> >> -# define libbpf_smp_rmb() asm volatile("" : : : "memory")
> >> -# define libbpf_smp_wmb() asm volatile("" : : : "memory")
> >> -# define libbpf_smp_mb() \
> >> -       asm volatile("lock; addl $0,-4(%%rsp)" : : : "memory", "cc")
> >> -/* Hinders stores to be observed before older loads. */
> >> -# define libbpf_smp_rwmb() asm volatile("" : : : "memory")
> >
> > So, technically, these four are part of libbpf's API, as libbpf_util.h
> > is actually installed on target hosts. Seems like xsk.h is the only
> > one that is using them, though.
> >
> > So the question is whether it's ok to remove them now?
> >
>
> I would say so. Ideally, the barriers shouldn't be visible at all,
> since they're only used as an implementation detail for the static
> inline functions.
>
>
> > And also, why wasn't this part of xsk.h in the first place?
> >
>
> I guess there was a "maybe it can be useful for more than the XDP socket
> parts of libbpf"-idea. I'll move them to xsk.h for the v2, which will
> make the migration easier.
>

Clarification! The reason for not having them in xsk.h was that the
idea was that only the APIs exposed to the application should reside
there. IOW, libbpf_util.h is only "implementation details". Again,
the static-inline functions mess things up. Maybe moving to an
LTO-only world would be better, so we can get rid of the inlining
altogether.

>
> Björn
>
>
> >> +# define libbpf_smp_store_release(p, v)                                        \
> >> +       do {                                                            \
> >> +               asm volatile("" : : : "memory");                        \
> >> +               WRITE_ONCE(*p, v);                                      \
> >> +       } while (0)
> >> +# define libbpf_smp_load_acquire(p)                                    \
> >> +       ({                                                              \
> >> +               typeof(*p) ___p1 = READ_ONCE(*p);                       \
> >> +               asm volatile("" : : : "memory");                        \
> >> +               ___p1;                                                  \
> >> +       })
> >
> > [...]
> >
Will Deacon March 3, 2021, 3:39 p.m. UTC | #10
On Tue, Mar 02, 2021 at 10:13:21AM +0100, Daniel Borkmann wrote:
> On 3/2/21 9:05 AM, Björn Töpel wrote:
> > On 2021-03-01 17:10, Toke Høiland-Jørgensen wrote:
> > > Björn Töpel <bjorn.topel@gmail.com> writes:
> > > > From: Björn Töpel <bjorn.topel@intel.com>
> > > > 
> > > > Now that the AF_XDP rings have load-acquire/store-release semantics,
> > > > move libbpf to that as well.
> > > > 
> > > > The library-internal libbpf_smp_{load_acquire,store_release} are only
> > > > valid for 32-bit words on ARM64.
> > > > 
> > > > Also, remove the barriers that are no longer in use.
> > > 
> > > So what happens if an updated libbpf is paired with an older kernel (or
> > > vice versa)?
> > 
> > "This is fine." ;-) This was briefly discussed in [1], outlined by the
> > previous commit!
> > 
> > ...even on POWER.
> 
> Could you put a summary or quote of that discussion on 'why it is okay and does not
> cause /forward or backward/ compat issues with user space' directly into patch 1's
> commit message?
> 
> I feel just referring to a link is probably less suitable in this case as it should
> rather be part of the commit message that contains the justification on why it is
> waterproof - at least it feels that specific area may be a bit under-documented, so
> having it as direct part certainly doesn't hurt.
> 
> Would also be great to get Will's ACK on that when you have a v2. :)

Please stick me on CC for that and I'll take a look as I've forgotten pretty
much everything about this since last time :)

Will
Björn Töpel March 3, 2021, 4:34 p.m. UTC | #11
On 2021-03-03 16:39, Will Deacon wrote:
> On Tue, Mar 02, 2021 at 10:13:21AM +0100, Daniel Borkmann wrote:

[...]

>>
>> Would also be great to get Will's ACK on that when you have a v2. :)
> 
> Please stick me on CC for that and I'll take a look as I've forgotten pretty
> much everything about this since last time :)
>

:-) I'll do that!

Thanks!


> Will
>

Patch

diff --git a/tools/lib/bpf/libbpf_util.h b/tools/lib/bpf/libbpf_util.h
index 59c779c5790c..94a0d7bb6f3c 100644
--- a/tools/lib/bpf/libbpf_util.h
+++ b/tools/lib/bpf/libbpf_util.h
@@ -5,6 +5,7 @@ 
 #define __LIBBPF_LIBBPF_UTIL_H
 
 #include <stdbool.h>
+#include <linux/compiler.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -15,29 +16,56 @@  extern "C" {
  * application that uses libbpf.
  */
 #if defined(__i386__) || defined(__x86_64__)
-# define libbpf_smp_rmb() asm volatile("" : : : "memory")
-# define libbpf_smp_wmb() asm volatile("" : : : "memory")
-# define libbpf_smp_mb() \
-	asm volatile("lock; addl $0,-4(%%rsp)" : : : "memory", "cc")
-/* Hinders stores to be observed before older loads. */
-# define libbpf_smp_rwmb() asm volatile("" : : : "memory")
+# define libbpf_smp_store_release(p, v)					\
+	do {								\
+		asm volatile("" : : : "memory");			\
+		WRITE_ONCE(*p, v);					\
+	} while (0)
+# define libbpf_smp_load_acquire(p)					\
+	({								\
+		typeof(*p) ___p1 = READ_ONCE(*p);			\
+		asm volatile("" : : : "memory");			\
+		___p1;							\
+	})
 #elif defined(__aarch64__)
-# define libbpf_smp_rmb() asm volatile("dmb ishld" : : : "memory")
-# define libbpf_smp_wmb() asm volatile("dmb ishst" : : : "memory")
-# define libbpf_smp_mb() asm volatile("dmb ish" : : : "memory")
-# define libbpf_smp_rwmb() libbpf_smp_mb()
-#elif defined(__arm__)
-/* These are only valid for armv7 and above */
-# define libbpf_smp_rmb() asm volatile("dmb ish" : : : "memory")
-# define libbpf_smp_wmb() asm volatile("dmb ishst" : : : "memory")
-# define libbpf_smp_mb() asm volatile("dmb ish" : : : "memory")
-# define libbpf_smp_rwmb() libbpf_smp_mb()
-#else
-/* Architecture missing native barrier functions. */
-# define libbpf_smp_rmb() __sync_synchronize()
-# define libbpf_smp_wmb() __sync_synchronize()
-# define libbpf_smp_mb() __sync_synchronize()
-# define libbpf_smp_rwmb() __sync_synchronize()
+# define libbpf_smp_store_release(p, v)					\
+		asm volatile ("stlr %w1, %0" : "=Q" (*p) : "r" (v) : "memory")
+# define libbpf_smp_load_acquire(p)					\
+	({								\
+		typeof(*p) ___p1;					\
+		asm volatile ("ldar %w0, %1"				\
+			      : "=r" (___p1) : "Q" (*p) : "memory");	\
+		___p1;							\
+	})
+#elif defined(__riscv)
+# define libbpf_smp_store_release(p, v)					\
+	do {								\
+		asm volatile ("fence rw,w" : : : "memory");		\
+		WRITE_ONCE(*p, v);					\
+	} while (0)
+# define libbpf_smp_load_acquire(p)					\
+	({								\
+		typeof(*p) ___p1 = READ_ONCE(*p);			\
+		asm volatile ("fence r,rw" : : : "memory");		\
+		___p1;							\
+	})
+#endif
+
+#ifndef libbpf_smp_store_release
+#define libbpf_smp_store_release(p, v)					\
+	do {								\
+		__sync_synchronize();					\
+		WRITE_ONCE(*p, v);					\
+	} while (0)
+#endif
+
+#ifndef libbpf_smp_load_acquire
+#define libbpf_smp_load_acquire(p)					\
+	({								\
+		typeof(*p) ___p1 = READ_ONCE(*p);			\
+		__sync_synchronize();					\
+		___p1;							\
+	})
 #endif
 
 #ifdef __cplusplus
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index e9f121f5d129..a9fdea87b5cd 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -96,7 +96,8 @@  static inline __u32 xsk_prod_nb_free(struct xsk_ring_prod *r, __u32 nb)
 	 * this function. Without this optimization it whould have been
 	 * free_entries = r->cached_prod - r->cached_cons + r->size.
 	 */
-	r->cached_cons = *r->consumer + r->size;
+	r->cached_cons = libbpf_smp_load_acquire(r->consumer);
+	r->cached_cons += r->size;
 
 	return r->cached_cons - r->cached_prod;
 }
@@ -106,7 +107,7 @@  static inline __u32 xsk_cons_nb_avail(struct xsk_ring_cons *r, __u32 nb)
 	__u32 entries = r->cached_prod - r->cached_cons;
 
 	if (entries == 0) {
-		r->cached_prod = *r->producer;
+		r->cached_prod = libbpf_smp_load_acquire(r->producer);
 		entries = r->cached_prod - r->cached_cons;
 	}
 
@@ -129,9 +130,7 @@  static inline void xsk_ring_prod__submit(struct xsk_ring_prod *prod, __u32 nb)
 	/* Make sure everything has been written to the ring before indicating
 	 * this to the kernel by writing the producer pointer.
 	 */
-	libbpf_smp_wmb();
-
-	*prod->producer += nb;
+	libbpf_smp_store_release(prod->producer, *prod->producer + nb);
 }
 
 static inline __u32 xsk_ring_cons__peek(struct xsk_ring_cons *cons, __u32 nb, __u32 *idx)
@@ -139,11 +138,6 @@  static inline __u32 xsk_ring_cons__peek(struct xsk_ring_cons *cons, __u32 nb, __
 	__u32 entries = xsk_cons_nb_avail(cons, nb);
 
 	if (entries > 0) {
-		/* Make sure we do not speculatively read the data before
-		 * we have received the packet buffers from the ring.
-		 */
-		libbpf_smp_rmb();
-
 		*idx = cons->cached_cons;
 		cons->cached_cons += entries;
 	}
@@ -161,9 +155,8 @@  static inline void xsk_ring_cons__release(struct xsk_ring_cons *cons, __u32 nb)
 	/* Make sure data has been read before indicating we are done
 	 * with the entries by updating the consumer pointer.
 	 */
-	libbpf_smp_rwmb();
+	libbpf_smp_store_release(cons->consumer, *cons->consumer + nb);
 
-	*cons->consumer += nb;
 }
 
 static inline void *xsk_umem__get_data(void *umem_area, __u64 addr)
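
For context, a hypothetical RX drain loop over the helpers patched
above; the ring setup and descriptor handling are placeholders, only
the peek/release pairing mirrors the patch:

#include <bpf/xsk.h>

static void demo_drain(struct xsk_ring_cons *ring)
{
	__u32 idx, i, n;

	/* Does the load-acquire of *producer via xsk_cons_nb_avail(). */
	n = xsk_ring_cons__peek(ring, 64, &idx);
	for (i = 0; i < n; i++) {
		/* ... process xsk_ring_cons__rx_desc(ring, idx + i) ... */
	}
	if (n)
		/* Store-release of *consumer. */
		xsk_ring_cons__release(ring, n);
}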