
[bpf-next,v4,10/11] bpf: Add tests for new BPF atomic operations

Message ID 20201207160734.2345502-11-jackmanb@google.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series Atomics for eBPF

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 8 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch fail
    ERROR: Unrecognized email address: ''
    ERROR: do not initialise globals to 0
    WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
    WARNING: Unknown commit id '286daafd6512', maybe rebased or not pulled?
    WARNING: Use a single space after To:
    WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
    WARNING: line length of 83 exceeds 80 columns
    WARNING: line length of 84 exceeds 80 columns
    WARNING: line length of 86 exceeds 80 columns
    WARNING: line length of 87 exceeds 80 columns
    WARNING: line length of 88 exceeds 80 columns
    WARNING: line length of 89 exceeds 80 columns
    WARNING: line length of 91 exceeds 80 columns
    WARNING: line length of 93 exceeds 80 columns
    WARNING: line length of 98 exceeds 80 columns
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Brendan Jackman Dec. 7, 2020, 4:07 p.m. UTC
The prog_test that's added depends on Clang/LLVM features added by
Yonghong in commit 286daafd6512 (was https://reviews.llvm.org/D72184).

Note the use of a define called ENABLE_ATOMICS_TESTS: this is used
to:

 - Avoid breaking the build for people on old versions of Clang
 - Avoid needing separate lists of test objects for no_alu32, where
   atomics are not supported even if Clang has the feature.

The atomics_test.o BPF object is built unconditionally both for
test_progs and test_progs-no_alu32. For test_progs, if Clang supports
atomics, ENABLE_ATOMICS_TESTS is defined, so it includes the proper
test code. Otherwise, progs and global vars are defined anyway, as
stubs; this means that the skeleton user code still builds.

The atomics_test.o userspace object is built once and used for both
test_progs and test_progs-no_alu32. A variable called skip_tests is
defined in the BPF object's data section, which tells the userspace
object whether to skip the atomics test.
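
To illustrate the pattern (a rough sketch only, not the literal contents
of progs/atomics.c; the exact mechanism for keeping skip_tests in the
.data section is an assumption based on the description above):

  #include <linux/types.h>
  #include <linux/bpf.h>
  #include <stdbool.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  #ifdef ENABLE_ATOMICS_TESTS
  /* Kept in .data (not .bss) so the skeleton exposes it via skel->data */
  bool skip_tests __attribute__((__section__(".data"))) = false;
  #else
  bool skip_tests = true;
  #endif

  __u64 add64_value = 1;
  __u64 add64_result = 0;

  SEC("fentry/bpf_fentry_test1")
  int BPF_PROG(add, int a)
  {
  #ifdef ENABLE_ATOMICS_TESTS
  	/* Real test body, only when Clang can emit BPF atomics */
  	add64_result = __sync_fetch_and_add(&add64_value, 2);
  #endif
  	/* Otherwise the prog and globals still exist as empty stubs,
  	 * so atomics.skel.h and the userspace test code still build. */
  	return 0;
  }

  char _license[] SEC("license") = "GPL";

The userspace side can then read skel->data->skip_tests and bail out
early when the BPF object was built without the define.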

Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 tools/testing/selftests/bpf/Makefile          |  10 +
 .../selftests/bpf/prog_tests/atomics.c        | 246 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/atomics.c   | 154 +++++++++++
 .../selftests/bpf/verifier/atomic_and.c       |  77 ++++++
 .../selftests/bpf/verifier/atomic_cmpxchg.c   |  96 +++++++
 .../selftests/bpf/verifier/atomic_fetch_add.c | 106 ++++++++
 .../selftests/bpf/verifier/atomic_or.c        |  77 ++++++
 .../selftests/bpf/verifier/atomic_xchg.c      |  46 ++++
 .../selftests/bpf/verifier/atomic_xor.c       |  77 ++++++
 9 files changed, 889 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/atomics.c
 create mode 100644 tools/testing/selftests/bpf/progs/atomics.c
 create mode 100644 tools/testing/selftests/bpf/verifier/atomic_and.c
 create mode 100644 tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
 create mode 100644 tools/testing/selftests/bpf/verifier/atomic_fetch_add.c
 create mode 100644 tools/testing/selftests/bpf/verifier/atomic_or.c
 create mode 100644 tools/testing/selftests/bpf/verifier/atomic_xchg.c
 create mode 100644 tools/testing/selftests/bpf/verifier/atomic_xor.c

Comments

Yonghong Song Dec. 8, 2020, 3:18 a.m. UTC | #1
On 12/7/20 8:07 AM, Brendan Jackman wrote:
> The prog_test that's added depends on Clang/LLVM features added by
> Yonghong in commit 286daafd6512 (was https://reviews.llvm.org/D72184 ).
> 
> Note the use of a define called ENABLE_ATOMICS_TESTS: this is used
> to:
> 
>   - Avoid breaking the build for people on old versions of Clang
>   - Avoid needing separate lists of test objects for no_alu32, where
>     atomics are not supported even if Clang has the feature.
> 
> The atomics_test.o BPF object is built unconditionally both for
> test_progs and test_progs-no_alu32. For test_progs, if Clang supports
> atomics, ENABLE_ATOMICS_TESTS is defined, so it includes the proper
> test code. Otherwise, progs and global vars are defined anyway, as
> stubs; this means that the skeleton user code still builds.
> 
> The atomics_test.o userspace object is built once and used for both
> test_progs and test_progs-no_alu32. A variable called skip_tests is
> defined in the BPF object's data section, which tells the userspace
> object whether to skip the atomics test.
> 
> Signed-off-by: Brendan Jackman <jackmanb@google.com>

Ack with minor comments below.

Acked-by: Yonghong Song <yhs@fb.com>

> ---
>   tools/testing/selftests/bpf/Makefile          |  10 +
>   .../selftests/bpf/prog_tests/atomics.c        | 246 ++++++++++++++++++
>   tools/testing/selftests/bpf/progs/atomics.c   | 154 +++++++++++
>   .../selftests/bpf/verifier/atomic_and.c       |  77 ++++++
>   .../selftests/bpf/verifier/atomic_cmpxchg.c   |  96 +++++++
>   .../selftests/bpf/verifier/atomic_fetch_add.c | 106 ++++++++
>   .../selftests/bpf/verifier/atomic_or.c        |  77 ++++++
>   .../selftests/bpf/verifier/atomic_xchg.c      |  46 ++++
>   .../selftests/bpf/verifier/atomic_xor.c       |  77 ++++++
>   9 files changed, 889 insertions(+)
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/atomics.c
>   create mode 100644 tools/testing/selftests/bpf/progs/atomics.c
>   create mode 100644 tools/testing/selftests/bpf/verifier/atomic_and.c
>   create mode 100644 tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
>   create mode 100644 tools/testing/selftests/bpf/verifier/atomic_fetch_add.c
>   create mode 100644 tools/testing/selftests/bpf/verifier/atomic_or.c
>   create mode 100644 tools/testing/selftests/bpf/verifier/atomic_xchg.c
>   create mode 100644 tools/testing/selftests/bpf/verifier/atomic_xor.c
> 
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index ac25ba5d0d6c..13bc1d736164 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -239,6 +239,12 @@ BPF_CFLAGS = -g -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN) 			\
>   	     -I$(INCLUDE_DIR) -I$(CURDIR) -I$(APIDIR)			\
>   	     -I$(abspath $(OUTPUT)/../usr/include)
>   
> +# BPF atomics support was added to Clang in llvm-project commit 286daafd6512
> +# (release 12.0.0).
> +BPF_ATOMICS_SUPPORTED = $(shell \
> +	echo "int x = 0; int foo(void) { return __sync_val_compare_and_swap(&x, 1, 2); }" \
> +	| $(CLANG) -x cpp-output -S -target bpf -mcpu=v3 - -o /dev/null && echo 1 || echo 0)

'-x c' here more intuitive?

> +
>   CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
>   	       -Wno-compare-distinct-pointer-types
>   
> @@ -399,11 +405,15 @@ TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko	\
>   		       $(wildcard progs/btf_dump_test_case_*.c)
>   TRUNNER_BPF_BUILD_RULE := CLANG_BPF_BUILD_RULE
>   TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS)
> +ifeq ($(BPF_ATOMICS_SUPPORTED),1)
> +  TRUNNER_BPF_CFLAGS += -DENABLE_ATOMICS_TESTS
> +endif
>   TRUNNER_BPF_LDFLAGS := -mattr=+alu32
>   $(eval $(call DEFINE_TEST_RUNNER,test_progs))
>   
>   # Define test_progs-no_alu32 test runner.
>   TRUNNER_BPF_BUILD_RULE := CLANG_NOALU32_BPF_BUILD_RULE
> +TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS)
>   TRUNNER_BPF_LDFLAGS :=
>   $(eval $(call DEFINE_TEST_RUNNER,test_progs,no_alu32))
>   
> diff --git a/tools/testing/selftests/bpf/prog_tests/atomics.c b/tools/testing/selftests/bpf/prog_tests/atomics.c
> new file mode 100644
> index 000000000000..c841a3abc2f7
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/atomics.c
> @@ -0,0 +1,246 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <test_progs.h>
> +
> +#include "atomics.skel.h"
> +
> +static void test_add(struct atomics *skel)
> +{
> +	int err, prog_fd;
> +	__u32 duration = 0, retval;
> +	struct bpf_link *link;
> +
> +	link = bpf_program__attach(skel->progs.add);
> +	if (CHECK(IS_ERR(link), "attach(add)", "err: %ld\n", PTR_ERR(link)))
> +		return;
> +
> +	prog_fd = bpf_program__fd(skel->progs.add);
> +	err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
> +				NULL, NULL, &retval, &duration);
> +	if (CHECK(err || retval, "test_run add",
> +		  "err %d errno %d retval %d duration %d\n", err, errno, retval, duration))
> +		goto cleanup;
> +
> +	ASSERT_EQ(skel->data->add64_value, 3, "add64_value");
> +	ASSERT_EQ(skel->bss->add64_result, 1, "add64_result");
> +
> +	ASSERT_EQ(skel->data->add32_value, 3, "add32_value");
> +	ASSERT_EQ(skel->bss->add32_result, 1, "add32_result");
> +
> +	ASSERT_EQ(skel->bss->add_stack_value_copy, 3, "add_stack_value");
> +	ASSERT_EQ(skel->bss->add_stack_result, 1, "add_stack_result");
> +
> +	ASSERT_EQ(skel->data->add_noreturn_value, 3, "add_noreturn_value");
> +
> +cleanup:
> +	bpf_link__destroy(link);
> +}
> +
[...]
> +
> +__u64 xchg64_value = 1;
> +__u64 xchg64_result = 0;
> +__u32 xchg32_value = 1;
> +__u32 xchg32_result = 0;
> +
> +SEC("fentry/bpf_fentry_test1")
> +int BPF_PROG(xchg, int a)
> +{
> +#ifdef ENABLE_ATOMICS_TESTS
> +	__u64 val64 = 2;
> +	__u32 val32 = 2;
> +
> +	__atomic_exchange(&xchg64_value, &val64, &xchg64_result, __ATOMIC_RELAXED);
> +	__atomic_exchange(&xchg32_value, &val32, &xchg32_result, __ATOMIC_RELAXED);

Interesting to see this also works. I guess we probably won't advertise 
this, right? Currently for LLVM, the memory ordering parameter is ignored.

> +#endif
> +
> +	return 0;
> +}
[...]
Brendan Jackman Dec. 8, 2020, 12:41 p.m. UTC | #2
On Mon, Dec 07, 2020 at 07:18:57PM -0800, Yonghong Song wrote:
> 
> 
> On 12/7/20 8:07 AM, Brendan Jackman wrote:
> [...]
> > +
> > +__u64 xchg64_value = 1;
> > +__u64 xchg64_result = 0;
> > +__u32 xchg32_value = 1;
> > +__u32 xchg32_result = 0;
> > +
> > +SEC("fentry/bpf_fentry_test1")
> > +int BPF_PROG(xchg, int a)
> > +{
> > +#ifdef ENABLE_ATOMICS_TESTS
> > +	__u64 val64 = 2;
> > +	__u32 val32 = 2;
> > +
> > +	__atomic_exchange(&xchg64_value, &val64, &xchg64_result, __ATOMIC_RELAXED);
> > +	__atomic_exchange(&xchg32_value, &val32, &xchg32_result, __ATOMIC_RELAXED);
> 
> Interesting to see this also works. I guess we probably won't advertise
> this, right? Currently for LLVM, the memory ordering parameter is ignored.

Well IIUC this specific case is fine: the ordering that you get with
BPF_[CMP]XCHG (via kernel atomic_[cmpxchg]) is sequential consistency,
and it's fine to provide a stronger ordering than the one requested. I
should change it to say __ATOMIC_SEQ_CST to avoid confusing readers,
though.

(I wrote it this way because I didn't see a __sync* function for
unconditional atomic exchange, and I didn't see an __atomic* function
where you don't need to specify the ordering).

However... this led me to double-check the semantics and realise that we
do have a problem with ordering: The kernel's atomic_{add,and,or,xor} do
not imply memory barriers and therefore neither do the corresponding BPF
instructions. That means Clang can compile this:

 (void)__atomic_fetch_add(&val, 1, __ATOMIC_SEQ_CST)

to a {.code = (BPF_STX | BPF_DW | BPF_ATOMIC), .imm = BPF_ADD},
which is implemented with atomic_add, which doesn't actually satisfy
__ATOMIC_SEQ_CST.

In fact... I think this is a pre-existing issue with BPF_XADD.

If all I've written here is correct, the fix is to use
(void)atomic_fetch_add etc (these imply barriers) even when BPF_FETCH is
not set. And that change ought to be backported to fix BPF_XADD.
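
Roughly something like this (an untested sketch against the pre-series
interpreter in kernel/bpf/core.c; the JITs would stay as they are, and
similarly for STX_XADD_W with atomic_fetch_add()):

	STX_XADD_DW: /* lock xadd *(u64 *)(dst_reg + off16) += src_reg */
		/* Result is discarded: atomic64_fetch_add() is called
		 * only for its implied full barrier. */
		(void) atomic64_fetch_add((u64) SRC, (atomic64_t *)(unsigned long)
					  (DST + insn->off));
		CONT;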
Yonghong Song Dec. 8, 2020, 4:38 p.m. UTC | #3
On 12/8/20 4:41 AM, Brendan Jackman wrote:
> On Mon, Dec 07, 2020 at 07:18:57PM -0800, Yonghong Song wrote:
>>
>>
>> On 12/7/20 8:07 AM, Brendan Jackman wrote:
>> [...]
>>> +
>>> +__u64 xchg64_value = 1;
>>> +__u64 xchg64_result = 0;
>>> +__u32 xchg32_value = 1;
>>> +__u32 xchg32_result = 0;
>>> +
>>> +SEC("fentry/bpf_fentry_test1")
>>> +int BPF_PROG(xchg, int a)
>>> +{
>>> +#ifdef ENABLE_ATOMICS_TESTS
>>> +	__u64 val64 = 2;
>>> +	__u32 val32 = 2;
>>> +
>>> +	__atomic_exchange(&xchg64_value, &val64, &xchg64_result, __ATOMIC_RELAXED);
>>> +	__atomic_exchange(&xchg32_value, &val32, &xchg32_result, __ATOMIC_RELAXED);
>>
>> Interesting to see this also works. I guess we probably won't advertise
>> this, right? Currently for LLVM, the memory ordering parameter is ignored.
> 
> Well IIUC this specific case is fine: the ordering that you get with
> BPF_[CMP]XCHG (via kernel atomic_[cmpxchg]) is sequential consistency,
> and it's fine to provide a stronger ordering than the one requested. I
> should change it to say __ATOMIC_SEQ_CST to avoid confusing readers,
> though.
> 
> (I wrote it this way because I didn't see a __sync* function for
> unconditional atomic exchange, and I didn't see an __atomic* function
> where you don't need to specify the ordering).

For the above code,
    __atomic_exchange(&xchg64_value, &val64, &xchg64_result, __ATOMIC_RELAXED);
It tries to do an atomic exchange between &xchg64_value and &val64, and
store the old value of xchg64_value to &xchg64_result. So it is
equivalent to
     xchg64_result = __sync_lock_test_and_set(&xchg64_value, val64);

So I think this test case can be dropped.

> 
> However... this led me to double-check the semantics and realise that we
> do have a problem with ordering: The kernel's atomic_{add,and,or,xor} do
> not imply memory barriers and therefore neither do the corresponding BPF
> instructions. That means Clang can compile this:
> 
>   (void)__atomic_fetch_add(&val, 1, __ATOMIC_SEQ_CST)
> 
> to a {.code = (BPF_STX | BPF_DW | BPF_ATOMIC), .imm = BPF_ADD},
> which is implemented with atomic_add, which doesn't actually satisfy
> __ATOMIC_SEQ_CST.

This is the main reason that, in all my llvm selftests, I did not use
__atomic_* intrinsics: we cannot handle *different* memory orderings
properly.

> 
> In fact... I think this is a pre-existing issue with BPF_XADD.
> 
> If all I've written here is correct, the fix is to use
> (void)atomic_fetch_add etc (these imply barriers) even when BPF_FETCH is
> not set. And that change ought to be backported to fix BPF_XADD.

We cannot change BPF_XADD behavior. If we change BPF_XADD to use
atomic_fetch_add, then old code compiled with llvm12 will suddenly
require the latest kernel, which will break userland very badly.
Brendan Jackman Dec. 8, 2020, 4:59 p.m. UTC | #4
On Tue, Dec 08, 2020 at 08:38:04AM -0800, Yonghong Song wrote:
> 
> 
> On 12/8/20 4:41 AM, Brendan Jackman wrote:
> > On Mon, Dec 07, 2020 at 07:18:57PM -0800, Yonghong Song wrote:
> > > 
> > > 
> > > On 12/7/20 8:07 AM, Brendan Jackman wrote:
> > > [...]
> > > > +
> > > > +__u64 xchg64_value = 1;
> > > > +__u64 xchg64_result = 0;
> > > > +__u32 xchg32_value = 1;
> > > > +__u32 xchg32_result = 0;
> > > > +
> > > > +SEC("fentry/bpf_fentry_test1")
> > > > +int BPF_PROG(xchg, int a)
> > > > +{
> > > > +#ifdef ENABLE_ATOMICS_TESTS
> > > > +	__u64 val64 = 2;
> > > > +	__u32 val32 = 2;
> > > > +
> > > > +	__atomic_exchange(&xchg64_value, &val64, &xchg64_result, __ATOMIC_RELAXED);
> > > > +	__atomic_exchange(&xchg32_value, &val32, &xchg32_result, __ATOMIC_RELAXED);
> > > 
> > > Interesting to see this also works. I guess we probably won't advertise
> > > this, right? Currently for LLVM, the memory ordering parameter is ignored.
> > 
> > Well IIUC this specific case is fine: the ordering that you get with
> > BPF_[CMP]XCHG (via kernel atomic_[cmpxchg]) is sequential consistency,
> > and it's fine to provide a stronger ordering than the one requested. I
> > should change it to say __ATOMIC_SEQ_CST to avoid confusing readers,
> > though.
> > 
> > (I wrote it this way because I didn't see a __sync* function for
> > unconditional atomic exchange, and I didn't see an __atomic* function
> > where you don't need to specify the ordering).
> 
> For the above code,
>    __atomic_exchange(&xchg64_value, &val64, &xchg64_result,
> __ATOMIC_RELAXED);
> It tries to do an atomic exchange between &xchg64_value and
>  &val64, and store the old &xchg64_value to &xchg64_result. So it is
> equivalent to
>     xchg64_result = __sync_lock_test_and_set(&xchg64_value, val64);
> 
> So I think this test case can be dropped.

Ah nice, I didn't know about __sync_lock_test_and_set, let's switch to
that I think.

> > However... this led me to double-check the semantics and realise that we
> > do have a problem with ordering: The kernel's atomic_{add,and,or,xor} do
> > not imply memory barriers and therefore neither do the corresponding BPF
> > instructions. That means Clang can compile this:
> > 
> >   (void)__atomic_fetch_add(&val, 1, __ATOMIC_SEQ_CST)
> > 
> > to a {.code = (BPF_STX | BPF_DW | BPF_ATOMIC), .imm = BPF_ADD},
> > which is implemented with atomic_add, which doesn't actually satisfy
> > __ATOMIC_SEQ_CST.
> 
> This is the main reason in all my llvm selftests I did not use
> __atomic_* intrinsics because we cannot handle *different* memory
> ordering properly.
> 
> > 
> > In fact... I think this is a pre-existing issue with BPF_XADD.
> > 
> > If all I've written here is correct, the fix is to use
> > (void)atomic_fetch_add etc (these imply barriers) even when BPF_FETCH is
> > not set. And that change ought to be backported to fix BPF_XADD.
> 
> We cannot change BPF_XADD behavior. If we change BPF_XADD to use
> atomic_fetch_add, then old code compiled with llvm12 will suddenly
> require the latest kernel, which will break userland very badly.

Sorry I should have been more explicit: I meant that the fix would be to
call atomic_fetch_add but discard the return value, purely for the
implied barrier. The x86 JIT would stay the same. It would not break any
existing code, only add ordering guarantees that the user probably
already expected (since these builtins come from GCC originally and the
GCC docs say "these builtins are considered a full barrier" [1])

[1] https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html
Yonghong Song Dec. 8, 2020, 6:15 p.m. UTC | #5
On 12/8/20 8:59 AM, Brendan Jackman wrote:
> On Tue, Dec 08, 2020 at 08:38:04AM -0800, Yonghong Song wrote:
>>
>>
>> On 12/8/20 4:41 AM, Brendan Jackman wrote:
>>> On Mon, Dec 07, 2020 at 07:18:57PM -0800, Yonghong Song wrote:
>>>>
>>>>
>>>> On 12/7/20 8:07 AM, Brendan Jackman wrote:
>>>> [...]
>>>>> +
>>>>> +__u64 xchg64_value = 1;
>>>>> +__u64 xchg64_result = 0;
>>>>> +__u32 xchg32_value = 1;
>>>>> +__u32 xchg32_result = 0;
>>>>> +
>>>>> +SEC("fentry/bpf_fentry_test1")
>>>>> +int BPF_PROG(xchg, int a)
>>>>> +{
>>>>> +#ifdef ENABLE_ATOMICS_TESTS
>>>>> +	__u64 val64 = 2;
>>>>> +	__u32 val32 = 2;
>>>>> +
>>>>> +	__atomic_exchange(&xchg64_value, &val64, &xchg64_result, __ATOMIC_RELAXED);
>>>>> +	__atomic_exchange(&xchg32_value, &val32, &xchg32_result, __ATOMIC_RELAXED);
>>>>
>>>> Interesting to see this also works. I guess we probably won't advertise
>>>> this, right? Currently for LLVM, the memory ordering parameter is ignored.
>>>
>>> Well IIUC this specific case is fine: the ordering that you get with
>>> BPF_[CMP]XCHG (via kernel atomic_[cmpxchg]) is sequential consistency,
>>> and it's fine to provide a stronger ordering than the one requested. I
>>> should change it to say __ATOMIC_SEQ_CST to avoid confusing readers,
>>> though.
>>>
>>> (I wrote it this way because I didn't see a __sync* function for
>>> unconditional atomic exchange, and I didn't see an __atomic* function
>>> where you don't need to specify the ordering).
>>
>> For the above code,
>>     __atomic_exchange(&xchg64_value, &val64, &xchg64_result,
>> __ATOMIC_RELAXED);
>> It tries to do an atomic exchange between &xchg64_value and
>>   &val64, and store the old &xchg64_value to &xchg64_result. So it is
>> equivalent to
>>      xchg64_result = __sync_lock_test_and_set(&xchg64_value, val64);
>>
>> So I think this test case can be dropped.
> 
> Ah nice, I didn't know about __sync_lock_test_and_set, let's switch to
> that I think.
> 
>>> However... this led me to double-check the semantics and realise that we
>>> do have a problem with ordering: The kernel's atomic_{add,and,or,xor} do
>>> not imply memory barriers and therefore neither do the corresponding BPF
>>> instructions. That means Clang can compile this:
>>>
>>>    (void)__atomic_fetch_add(&val, 1, __ATOMIC_SEQ_CST)
>>>
>>> to a {.code = (BPF_STX | BPF_DW | BPF_ATOMIC), .imm = BPF_ADD},
>>> which is implemented with atomic_add, which doesn't actually satisfy
>>> __ATOMIC_SEQ_CST.
>>
>> This is the main reason in all my llvm selftests I did not use
>> __atomic_* intrinsics because we cannot handle *different* memory
>> ordering properly.
>>
>>>
>>> In fact... I think this is a pre-existing issue with BPF_XADD.
>>>
>>> If all I've written here is correct, the fix is to use
>>> (void)atomic_fetch_add etc (these imply barriers) even when BPF_FETCH is
>>> not set. And that change ought to be backported to fix BPF_XADD.
>>
>> We cannot change BPF_XADD behavior. If we change BPF_XADD to use
>> atomic_fetch_add, then old code compiled with llvm12 will suddenly
>> require the latest kernel, which will break userland very badly.
> 
> Sorry I should have been more explicit: I meant that the fix would be to
> call atomic_fetch_add but discard the return value, purely for the
> implied barrier. The x86 JIT would stay the same. It would not break any
> existing code, only add ordering guarantees that the user probably
> already expected (since these builtins come from GCC originally and the
> GCC docs say "these builtins are considered a full barrier" [1])
> 
> [1] https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html

This is indeed the issue. In the past, people already used gcc's
__sync_fetch_and_add() for the xadd instruction, for which the
generated code does not imply a barrier.

The new atomics support has the following logic:
   . if return value is used, it is atomic_fetch_add
   . if return value is not used, it is xadd
The reason to do this is to preserve backward compatibility
and this way, we can get rid of -mcpu=v4.
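
In other words (illustrative C only, not taken from the selftests):

  #include <linux/types.h>

  __u64 v = 0;

  void add_no_fetch(void)
  {
  	/* Return value ignored: the compiler keeps the legacy xadd
  	 * encoding (BPF_ATOMIC + BPF_ADD, no BPF_FETCH). */
  	__sync_fetch_and_add(&v, 1);
  }

  __u64 add_with_fetch(void)
  {
  	/* Return value used: the compiler emits BPF_ADD | BPF_FETCH,
  	 * which the kernel implements with atomic_fetch_add(). */
  	return __sync_fetch_and_add(&v, 1);
  }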

The barrier issue is tricky and, as we discussed earlier, let us
delay it until after basic atomics support has landed. We may not
100% conform to gcc __sync_fetch_and_add() or __atomic_*()
semantics. We do need to clearly document what is expected
in llvm and the kernel.
Brendan Jackman Dec. 15, 2020, 11:12 a.m. UTC | #6
On Tue, Dec 08, 2020 at 10:15:35AM -0800, Yonghong Song wrote:
> 
> 
> On 12/8/20 8:59 AM, Brendan Jackman wrote:
> > On Tue, Dec 08, 2020 at 08:38:04AM -0800, Yonghong Song wrote:
> > > 
> > > 
> > > On 12/8/20 4:41 AM, Brendan Jackman wrote:
> > > > On Mon, Dec 07, 2020 at 07:18:57PM -0800, Yonghong Song wrote:
> > > > > 
> > > > > 
> > > > > On 12/7/20 8:07 AM, Brendan Jackman wrote:
> > > > > [...]
> > > > > > +
> > > > > > +__u64 xchg64_value = 1;
> > > > > > +__u64 xchg64_result = 0;
> > > > > > +__u32 xchg32_value = 1;
> > > > > > +__u32 xchg32_result = 0;
> > > > > > +
> > > > > > +SEC("fentry/bpf_fentry_test1")
> > > > > > +int BPF_PROG(xchg, int a)
> > > > > > +{
> > > > > > +#ifdef ENABLE_ATOMICS_TESTS
> > > > > > +	__u64 val64 = 2;
> > > > > > +	__u32 val32 = 2;
> > > > > > +
> > > > > > +	__atomic_exchange(&xchg64_value, &val64, &xchg64_result, __ATOMIC_RELAXED);
> > > > > > +	__atomic_exchange(&xchg32_value, &val32, &xchg32_result, __ATOMIC_RELAXED);
> > > > > 
> > > > > Interesting to see this also works. I guess we probably won't advertise
> > > > > this, right? Currently for LLVM, the memory ordering parameter is ignored.
> > > > 
> > > > Well IIUC this specific case is fine: the ordering that you get with
> > > > BPF_[CMP]XCHG (via kernel atomic_[cmpxchg]) is sequential consistency,
> > > > and it's fine to provide a stronger ordering than the one requested. I
> > > > should change it to say __ATOMIC_SEQ_CST to avoid confusing readers,
> > > > though.
> > > > 
> > > > (I wrote it this way because I didn't see a __sync* function for
> > > > unconditional atomic exchange, and I didn't see an __atomic* function
> > > > where you don't need to specify the ordering).
> > > 
> > > For the above code,
> > >     __atomic_exchange(&xchg64_value, &val64, &xchg64_result,
> > > __ATOMIC_RELAXED);
> > > It tries to do an atomic exchange between &xchg64_value and
> > >   &val64, and store the old &xchg64_value to &xchg64_result. So it is
> > > equivalent to
> > >      xchg64_result = __sync_lock_test_and_set(&xchg64_value, val64);
> > > 
> > > So I think this test case can be dropped.
> > 
> > Ah nice, I didn't know about __sync_lock_test_and_set, let's switch to
> > that I think.
> > 
> > > > However... this led me to double-check the semantics and realise that we
> > > > do have a problem with ordering: The kernel's atomic_{add,and,or,xor} do
> > > > not imply memory barriers and therefore neither do the corresponding BPF
> > > > instructions. That means Clang can compile this:
> > > > 
> > > >    (void)__atomic_fetch_add(&val, 1, __ATOMIC_SEQ_CST)
> > > > 
> > > > to a {.code = (BPF_STX | BPF_DW | BPF_ATOMIC), .imm = BPF_ADD},
> > > > which is implemented with atomic_add, which doesn't actually satisfy
> > > > __ATOMIC_SEQ_CST.
> > > 
> > > This is the main reason in all my llvm selftests I did not use
> > > __atomic_* intrinsics because we cannot handle *different* memory
> > > ordering properly.
> > > 
> > > > 
> > > > In fact... I think this is a pre-existing issue with BPF_XADD.
> > > > 
> > > > If all I've written here is correct, the fix is to use
> > > > (void)atomic_fetch_add etc (these imply barriers) even when BPF_FETCH is
> > > > not set. And that change ought to be backported to fix BPF_XADD.
> > > 
> > > We cannot change BPF_XADD behavior. If we change BPF_XADD to use
> > > atomic_fetch_add, then old code compiled with llvm12 will suddenly
> > > require the latest kernel, which will break userland very badly.
> > 
> > Sorry I should have been more explicit: I meant that the fix would be to
> > call atomic_fetch_add but discard the return value, purely for the
> > implied barrier. The x86 JIT would stay the same. It would not break any
> > existing code, only add ordering guarantees that the user probably
> > already expected (since these builtins come from GCC originally and the
> > GCC docs say "these builtins are considered a full barrier" [1])
> > 
> > [1] https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html
> 
> This is indeed the issue. In the past, people already used gcc's
> __sync_fetch_and_add() for the xadd instruction, for which the
> generated code does not imply a barrier.
> 
> The new atomics support has the following logic:
>   . if return value is used, it is atomic_fetch_add
>   . if return value is not used, it is xadd
> The reason to do this is to preserve backward compabiility
> and this way, we can get rid of -mcpu=v4.
> 
> barrier issue is tricky and as we discussed earlier let us
> delay this after basic atomics support landed. We may not
> 100% conform to gcc __sync_fetch_and_add() or __atomic_*()
> semantics. We do need to clearly document what is expected
> in llvm and kernel.

OK, then I think we can probably justify not conforming to the
__sync_fetch_and_add() semantics since that API is under-specified
anyway.

However IMO it's unambiguously a bug for

  (void)__atomic_fetch_add(&x, y, __ATOMIC_SEQ_CST);

to compile down to a kernel atomic_add. I think for that specific API
Clang really ought to always use BPF_FETCH | BPF_ADD when
anything stronger than __ATOMIC_RELAXED is requested, or even just
refuse to compile when the return value is ignored and a
non-relaxed memory ordering is specified.
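
To make the concern concrete, here is a minimal sketch of the two
source-level forms, assuming the lowering described in this thread
(the counter variable is made up):

    static __u64 counter;

    /* Result discarded: lowered to BPF_ADD without BPF_FETCH, i.e. the
     * kernel's atomic_add(), which implies no barrier even though
     * __ATOMIC_SEQ_CST was requested.
     */
    (void)__atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST);

    /* Result used: with the new Clang support this is lowered to
     * BPF_ADD | BPF_FETCH, i.e. atomic_fetch_add(), which does imply
     * a full barrier.
     */
    __u64 old = __atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST);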
Yonghong Song Dec. 16, 2020, 7:18 a.m. UTC | #7
On 12/15/20 3:12 AM, Brendan Jackman wrote:
> On Tue, Dec 08, 2020 at 10:15:35AM -0800, Yonghong Song wrote:
>>
>>
>> On 12/8/20 8:59 AM, Brendan Jackman wrote:
>>> On Tue, Dec 08, 2020 at 08:38:04AM -0800, Yonghong Song wrote:
>>>>
>>>>
>>>> On 12/8/20 4:41 AM, Brendan Jackman wrote:
>>>>> On Mon, Dec 07, 2020 at 07:18:57PM -0800, Yonghong Song wrote:
>>>>>>
>>>>>>
>>>>>> On 12/7/20 8:07 AM, Brendan Jackman wrote:
>>>>>>> The prog_test that's added depends on Clang/LLVM features added by
>>>>>>> Yonghong in commit 286daafd6512 (was https://reviews.llvm.org/D72184    ).
>>>>>>>
>>>>>>> Note the use of a define called ENABLE_ATOMICS_TESTS: this is used
>>>>>>> to:
>>>>>>>
>>>>>>>      - Avoid breaking the build for people on old versions of Clang
>>>>>>>      - Avoid needing separate lists of test objects for no_alu32, where
>>>>>>>        atomics are not supported even if Clang has the feature.
>>>>>>>
>>>>>>> The atomics_test.o BPF object is built unconditionally both for
>>>>>>> test_progs and test_progs-no_alu32. For test_progs, if Clang supports
>>>>>>> atomics, ENABLE_ATOMICS_TESTS is defined, so it includes the proper
>>>>>>> test code. Otherwise, progs and global vars are defined anyway, as
>>>>>>> stubs; this means that the skeleton user code still builds.
>>>>>>>
>>>>>>> The atomics_test.o userspace object is built once and used for both
>>>>>>> test_progs and test_progs-no_alu32. A variable called skip_tests is
>>>>>>> defined in the BPF object's data section, which tells the userspace
>>>>>>> object whether to skip the atomics test.
>>>>>>>
>>>>>>> Signed-off-by: Brendan Jackman <jackmanb@google.com>
>>>>>>
>>>>>> Ack with minor comments below.
>>>>>>
>>>>>> Acked-by: Yonghong Song <yhs@fb.com>
>>>>>>
>>>>>>> ---
>>>>>>>      tools/testing/selftests/bpf/Makefile          |  10 +
>>>>>>>      .../selftests/bpf/prog_tests/atomics.c        | 246 ++++++++++++++++++
>>>>>>>      tools/testing/selftests/bpf/progs/atomics.c   | 154 +++++++++++
>>>>>>>      .../selftests/bpf/verifier/atomic_and.c       |  77 ++++++
>>>>>>>      .../selftests/bpf/verifier/atomic_cmpxchg.c   |  96 +++++++
>>>>>>>      .../selftests/bpf/verifier/atomic_fetch_add.c | 106 ++++++++
>>>>>>>      .../selftests/bpf/verifier/atomic_or.c        |  77 ++++++
>>>>>>>      .../selftests/bpf/verifier/atomic_xchg.c      |  46 ++++
>>>>>>>      .../selftests/bpf/verifier/atomic_xor.c       |  77 ++++++
>>>>>>>      9 files changed, 889 insertions(+)
>>>>>>>      create mode 100644 tools/testing/selftests/bpf/prog_tests/atomics.c
>>>>>>>      create mode 100644 tools/testing/selftests/bpf/progs/atomics.c
>>>>>>>      create mode 100644 tools/testing/selftests/bpf/verifier/atomic_and.c
>>>>>>>      create mode 100644 tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
>>>>>>>      create mode 100644 tools/testing/selftests/bpf/verifier/atomic_fetch_add.c
>>>>>>>      create mode 100644 tools/testing/selftests/bpf/verifier/atomic_or.c
>>>>>>>      create mode 100644 tools/testing/selftests/bpf/verifier/atomic_xchg.c
>>>>>>>      create mode 100644 tools/testing/selftests/bpf/verifier/atomic_xor.c
>>>>>>>
>>>>>>> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
>>>>>>> index ac25ba5d0d6c..13bc1d736164 100644
>>>>>>> --- a/tools/testing/selftests/bpf/Makefile
>>>>>>> +++ b/tools/testing/selftests/bpf/Makefile
>>>>>>> @@ -239,6 +239,12 @@ BPF_CFLAGS = -g -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN) 			\
>>>>>>>      	     -I$(INCLUDE_DIR) -I$(CURDIR) -I$(APIDIR)			\
>>>>>>>      	     -I$(abspath $(OUTPUT)/../usr/include)
>>>>>>> +# BPF atomics support was added to Clang in llvm-project commit 286daafd6512
>>>>>>> +# (release 12.0.0).
>>>>>>> +BPF_ATOMICS_SUPPORTED = $(shell \
>>>>>>> +	echo "int x = 0; int foo(void) { return __sync_val_compare_and_swap(&x, 1, 2); }" \
>>>>>>> +	| $(CLANG) -x cpp-output -S -target bpf -mcpu=v3 - -o /dev/null && echo 1 || echo 0)
>>>>>>
>>>>>> '-x c' here more intuitive?
>>>>>>
>>>>>>> +
>>>>>>>      CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
>>>>>>>      	       -Wno-compare-distinct-pointer-types
>>>>>>> @@ -399,11 +405,15 @@ TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko	\
>>>>>>>      		       $(wildcard progs/btf_dump_test_case_*.c)
>>>>>>>      TRUNNER_BPF_BUILD_RULE := CLANG_BPF_BUILD_RULE
>>>>>>>      TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS)
>>>>>>> +ifeq ($(BPF_ATOMICS_SUPPORTED),1)
>>>>>>> +  TRUNNER_BPF_CFLAGS += -DENABLE_ATOMICS_TESTS
>>>>>>> +endif
>>>>>>>      TRUNNER_BPF_LDFLAGS := -mattr=+alu32
>>>>>>>      $(eval $(call DEFINE_TEST_RUNNER,test_progs))
>>>>>>>      # Define test_progs-no_alu32 test runner.
>>>>>>>      TRUNNER_BPF_BUILD_RULE := CLANG_NOALU32_BPF_BUILD_RULE
>>>>>>> +TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS)
>>>>>>>      TRUNNER_BPF_LDFLAGS :=
>>>>>>>      $(eval $(call DEFINE_TEST_RUNNER,test_progs,no_alu32))
>>>>>>> diff --git a/tools/testing/selftests/bpf/prog_tests/atomics.c b/tools/testing/selftests/bpf/prog_tests/atomics.c
>>>>>>> new file mode 100644
>>>>>>> index 000000000000..c841a3abc2f7
>>>>>>> --- /dev/null
>>>>>>> +++ b/tools/testing/selftests/bpf/prog_tests/atomics.c
>>>>>>> @@ -0,0 +1,246 @@
>>>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>>>> +
>>>>>>> +#include <test_progs.h>
>>>>>>> +
>>>>>>> +#include "atomics.skel.h"
>>>>>>> +
>>>>>>> +static void test_add(struct atomics *skel)
>>>>>>> +{
>>>>>>> +	int err, prog_fd;
>>>>>>> +	__u32 duration = 0, retval;
>>>>>>> +	struct bpf_link *link;
>>>>>>> +
>>>>>>> +	link = bpf_program__attach(skel->progs.add);
>>>>>>> +	if (CHECK(IS_ERR(link), "attach(add)", "err: %ld\n", PTR_ERR(link)))
>>>>>>> +		return;
>>>>>>> +
>>>>>>> +	prog_fd = bpf_program__fd(skel->progs.add);
>>>>>>> +	err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
>>>>>>> +				NULL, NULL, &retval, &duration);
>>>>>>> +	if (CHECK(err || retval, "test_run add",
>>>>>>> +		  "err %d errno %d retval %d duration %d\n", err, errno, retval, duration))
>>>>>>> +		goto cleanup;
>>>>>>> +
>>>>>>> +	ASSERT_EQ(skel->data->add64_value, 3, "add64_value");
>>>>>>> +	ASSERT_EQ(skel->bss->add64_result, 1, "add64_result");
>>>>>>> +
>>>>>>> +	ASSERT_EQ(skel->data->add32_value, 3, "add32_value");
>>>>>>> +	ASSERT_EQ(skel->bss->add32_result, 1, "add32_result");
>>>>>>> +
>>>>>>> +	ASSERT_EQ(skel->bss->add_stack_value_copy, 3, "add_stack_value");
>>>>>>> +	ASSERT_EQ(skel->bss->add_stack_result, 1, "add_stack_result");
>>>>>>> +
>>>>>>> +	ASSERT_EQ(skel->data->add_noreturn_value, 3, "add_noreturn_value");
>>>>>>> +
>>>>>>> +cleanup:
>>>>>>> +	bpf_link__destroy(link);
>>>>>>> +}
>>>>>>> +
>>>>>> [...]
>>>>>>> +
>>>>>>> +__u64 xchg64_value = 1;
>>>>>>> +__u64 xchg64_result = 0;
>>>>>>> +__u32 xchg32_value = 1;
>>>>>>> +__u32 xchg32_result = 0;
>>>>>>> +
>>>>>>> +SEC("fentry/bpf_fentry_test1")
>>>>>>> +int BPF_PROG(xchg, int a)
>>>>>>> +{
>>>>>>> +#ifdef ENABLE_ATOMICS_TESTS
>>>>>>> +	__u64 val64 = 2;
>>>>>>> +	__u32 val32 = 2;
>>>>>>> +
>>>>>>> +	__atomic_exchange(&xchg64_value, &val64, &xchg64_result, __ATOMIC_RELAXED);
>>>>>>> +	__atomic_exchange(&xchg32_value, &val32, &xchg32_result, __ATOMIC_RELAXED);
>>>>>>
>>>>>> Interesting to see this also works. I guess we probably won't advertise
>>>>>> this, right? Currently for LLVM, the memory ordering parameter is ignored.
>>>>>
>>>>> Well IIUC this specific case is fine: the ordering that you get with
>>>>> BPF_[CMP]XCHG (via kernel atomic_[cmpxchg]) is sequential consistency,
>>>>> and its' fine to provide a stronger ordering than the one requested. I
>>>>> should change it to say __ATOMIC_SEQ_CST to avoid confusing readers,
>>>>> though.
>>>>>
>>>>> (I wrote it this way because I didn't see a __sync* function for
>>>>> unconditional atomic exchange, and I didn't see an __atomic* function
>>>>> where you don't need to specify the ordering).
>>>>
>>>> For the above code,
>>>>      __atomic_exchange(&xchg64_value, &val64, &xchg64_result,
>>>> __ATOMIC_RELAXED);
>>>> It tries to do an atomic exchange between &xchg64_value and
>>>>    &val64, and store the old &xchg64_value to &xchg64_result. So it is
>>>> equivalent to
>>>>       xchg64_result = __sync_lock_test_and_set(&xchg64_value, val64);
>>>>
>>>> So I think this test case can be dropped.
>>>
>>> Ah nice, I didn't know about __sync_lock_test_and_set, let's switch to
>>> that I think.
>>>
>>>>> However... this led me to double-check the semantics and realise that we
>>>>> do have a problem with ordering: The kernel's atomic_{add,and,or,xor} do
>>>>> not imply memory barriers and therefore neither do the corresponding BPF
>>>>> instructions. That means Clang can compile this:
>>>>>
>>>>>     (void)__atomic_fetch_add(&val, 1, __ATOMIC_SEQ_CST)
>>>>>
>>>>> to a {.code = (BPF_STX | BPF_DW | BPF_ATOMIC), .imm = BPF_ADD},
>>>>> which is implemented with atomic_add, which doesn't actually satisfy
>>>>> __ATOMIC_SEQ_CST.
>>>>
>>>> This is the main reason in all my llvm selftests I did not use
>>>> __atomic_* intrinsics because we cannot handle *different* memory
>>>> ordering properly.
>>>>
>>>>>
>>>>> In fact... I think this is a pre-existing issue with BPF_XADD.
>>>>>
>>>>> If all I've written here is correct, the fix is to use
>>>>> (void)atomic_fetch_add etc (these imply barriers) even when BPF_FETCH is
>>>>> not set. And that change ought to be backported to fix BPF_XADD.
>>>>
>>>> We cannot change BPF_XADD behavior. If we change BPF_XADD to use
>>>> atomic_fetch_add, then suddenly old code compiled with llvm12 will
>>>> suddenly requires latest kernel, which will break userland very badly.
>>>
>>> Sorry I should have been more explicit: I meant that the fix would be to
>>> call atomic_fetch_add but discard the return value, purely for the
>>> implied barrier. The x86 JIT would stay the same. It would not break any
>>> existing code, only add ordering guarantees that the user probably
>>> already expected (since these builtins come from GCC originally and the
>>> GCC docs say "these builtins are considered a full barrier" [1])
>>>
>>> [1] https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html
>>
>> This is indeed the issue. In the past, people already use gcc
>> __sync_fetch_and_add() for xadd instruction for which git generated
>> code does not implying barrier.
>>
>> The new atomics support has the following logic:
>>    . if return value is used, it is atomic_fetch_add
>>    . if return value is not used, it is xadd
>> The reason to do this is to preserve backward compabiility
>> and this way, we can get rid of -mcpu=v4.
>>
>> barrier issue is tricky and as we discussed earlier let us
>> delay this after basic atomics support landed. We may not
>> 100% conform to gcc __sync_fetch_and_add() or __atomic_*()
>> semantics. We do need to clearly document what is expected
>> in llvm and kernel.
> 
> OK, then I think we can probably justify not conforming to the
> __sync_fetch_and_add() semantics since that API is under-specified
> anyway.
> 
> However IMO it's unambiguously a bug for
> 
>    (void)__atomic_fetch_add(&x, y, __ATOMIC_SEQ_CST);
> 
> to compile down to a kernel atomic_add. I think for that specific API
> Clang really ought to always use BPF_FETCH | BPF_ADD when
> anything stronger than __ATOMIC_RELAXED is requested, or even just
> refuse to compile with when the return value is ignored and a
> none-relaxed memory ordering is specified.

Both of the following code snippets:
    (void)__sync_fetch_and_add(p, a);
    (void)__atomic_fetch_add(p, a, __ATOMIC_SEQ_CST);

will generate the same IR:
    %0 = atomicrmw add i32* %p, i32 %a seq_cst

Basically that means that, for old compilers (<= llvm11),
   (void)__atomic_fetch_add(&x, y, __ATOMIC_SEQ_CST)
already generates xadd.
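
For reference, a minimal sketch of the xchg test body rewritten with
__sync_lock_test_and_set(), per the equivalence above (assuming it is
lowered to BPF_XCHG in the same way as __atomic_exchange()):

    __u64 val64 = 2;
    __u32 val32 = 2;

    /* Store val* into xchg*_value and return the old value, matching
     * the __atomic_exchange() calls in the current test.
     */
    xchg64_result = __sync_lock_test_and_set(&xchg64_value, val64);
    xchg32_result = __sync_lock_test_and_set(&xchg32_value, val32);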
Brendan Jackman Dec. 16, 2020, 11:51 a.m. UTC | #8
On Wed, 16 Dec 2020 at 08:19, Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 12/15/20 3:12 AM, Brendan Jackman wrote:
> > On Tue, Dec 08, 2020 at 10:15:35AM -0800, Yonghong Song wrote:
> >>
> >>
> >> On 12/8/20 8:59 AM, Brendan Jackman wrote:
> >>> On Tue, Dec 08, 2020 at 08:38:04AM -0800, Yonghong Song wrote:
> >>>>
> >>>>
> >>>> On 12/8/20 4:41 AM, Brendan Jackman wrote:
> >>>>> On Mon, Dec 07, 2020 at 07:18:57PM -0800, Yonghong Song wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 12/7/20 8:07 AM, Brendan Jackman wrote:
> >>>>>>> The prog_test that's added depends on Clang/LLVM features added by
> >>>>>>> Yonghong in commit 286daafd6512 (was https://reviews.llvm.org/D72184    ).
> >>>>>>>
> >>>>>>> Note the use of a define called ENABLE_ATOMICS_TESTS: this is used
> >>>>>>> to:
> >>>>>>>
> >>>>>>>      - Avoid breaking the build for people on old versions of Clang
> >>>>>>>      - Avoid needing separate lists of test objects for no_alu32, where
> >>>>>>>        atomics are not supported even if Clang has the feature.
> >>>>>>>
> >>>>>>> The atomics_test.o BPF object is built unconditionally both for
> >>>>>>> test_progs and test_progs-no_alu32. For test_progs, if Clang supports
> >>>>>>> atomics, ENABLE_ATOMICS_TESTS is defined, so it includes the proper
> >>>>>>> test code. Otherwise, progs and global vars are defined anyway, as
> >>>>>>> stubs; this means that the skeleton user code still builds.
> >>>>>>>
> >>>>>>> The atomics_test.o userspace object is built once and used for both
> >>>>>>> test_progs and test_progs-no_alu32. A variable called skip_tests is
> >>>>>>> defined in the BPF object's data section, which tells the userspace
> >>>>>>> object whether to skip the atomics test.
> >>>>>>>
> >>>>>>> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> >>>>>>
> >>>>>> Ack with minor comments below.
> >>>>>>
> >>>>>> Acked-by: Yonghong Song <yhs@fb.com>
> >>>>>>
> >>>>>>> ---
> >>>>>>>      tools/testing/selftests/bpf/Makefile          |  10 +
> >>>>>>>      .../selftests/bpf/prog_tests/atomics.c        | 246 ++++++++++++++++++
> >>>>>>>      tools/testing/selftests/bpf/progs/atomics.c   | 154 +++++++++++
> >>>>>>>      .../selftests/bpf/verifier/atomic_and.c       |  77 ++++++
> >>>>>>>      .../selftests/bpf/verifier/atomic_cmpxchg.c   |  96 +++++++
> >>>>>>>      .../selftests/bpf/verifier/atomic_fetch_add.c | 106 ++++++++
> >>>>>>>      .../selftests/bpf/verifier/atomic_or.c        |  77 ++++++
> >>>>>>>      .../selftests/bpf/verifier/atomic_xchg.c      |  46 ++++
> >>>>>>>      .../selftests/bpf/verifier/atomic_xor.c       |  77 ++++++
> >>>>>>>      9 files changed, 889 insertions(+)
> >>>>>>>      create mode 100644 tools/testing/selftests/bpf/prog_tests/atomics.c
> >>>>>>>      create mode 100644 tools/testing/selftests/bpf/progs/atomics.c
> >>>>>>>      create mode 100644 tools/testing/selftests/bpf/verifier/atomic_and.c
> >>>>>>>      create mode 100644 tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> >>>>>>>      create mode 100644 tools/testing/selftests/bpf/verifier/atomic_fetch_add.c
> >>>>>>>      create mode 100644 tools/testing/selftests/bpf/verifier/atomic_or.c
> >>>>>>>      create mode 100644 tools/testing/selftests/bpf/verifier/atomic_xchg.c
> >>>>>>>      create mode 100644 tools/testing/selftests/bpf/verifier/atomic_xor.c
> >>>>>>>
> >>>>>>> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> >>>>>>> index ac25ba5d0d6c..13bc1d736164 100644
> >>>>>>> --- a/tools/testing/selftests/bpf/Makefile
> >>>>>>> +++ b/tools/testing/selftests/bpf/Makefile
> >>>>>>> @@ -239,6 +239,12 @@ BPF_CFLAGS = -g -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN)                      \
> >>>>>>>              -I$(INCLUDE_DIR) -I$(CURDIR) -I$(APIDIR)                   \
> >>>>>>>              -I$(abspath $(OUTPUT)/../usr/include)
> >>>>>>> +# BPF atomics support was added to Clang in llvm-project commit 286daafd6512
> >>>>>>> +# (release 12.0.0).
> >>>>>>> +BPF_ATOMICS_SUPPORTED = $(shell \
> >>>>>>> +       echo "int x = 0; int foo(void) { return __sync_val_compare_and_swap(&x, 1, 2); }" \
> >>>>>>> +       | $(CLANG) -x cpp-output -S -target bpf -mcpu=v3 - -o /dev/null && echo 1 || echo 0)
> >>>>>>
> >>>>>> '-x c' here more intuitive?
> >>>>>>
> >>>>>>> +
> >>>>>>>      CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
> >>>>>>>                -Wno-compare-distinct-pointer-types
> >>>>>>> @@ -399,11 +405,15 @@ TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko    \
> >>>>>>>                        $(wildcard progs/btf_dump_test_case_*.c)
> >>>>>>>      TRUNNER_BPF_BUILD_RULE := CLANG_BPF_BUILD_RULE
> >>>>>>>      TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS)
> >>>>>>> +ifeq ($(BPF_ATOMICS_SUPPORTED),1)
> >>>>>>> +  TRUNNER_BPF_CFLAGS += -DENABLE_ATOMICS_TESTS
> >>>>>>> +endif
> >>>>>>>      TRUNNER_BPF_LDFLAGS := -mattr=+alu32
> >>>>>>>      $(eval $(call DEFINE_TEST_RUNNER,test_progs))
> >>>>>>>      # Define test_progs-no_alu32 test runner.
> >>>>>>>      TRUNNER_BPF_BUILD_RULE := CLANG_NOALU32_BPF_BUILD_RULE
> >>>>>>> +TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS)
> >>>>>>>      TRUNNER_BPF_LDFLAGS :=
> >>>>>>>      $(eval $(call DEFINE_TEST_RUNNER,test_progs,no_alu32))
> >>>>>>> diff --git a/tools/testing/selftests/bpf/prog_tests/atomics.c b/tools/testing/selftests/bpf/prog_tests/atomics.c
> >>>>>>> new file mode 100644
> >>>>>>> index 000000000000..c841a3abc2f7
> >>>>>>> --- /dev/null
> >>>>>>> +++ b/tools/testing/selftests/bpf/prog_tests/atomics.c
> >>>>>>> @@ -0,0 +1,246 @@
> >>>>>>> +// SPDX-License-Identifier: GPL-2.0
> >>>>>>> +
> >>>>>>> +#include <test_progs.h>
> >>>>>>> +
> >>>>>>> +#include "atomics.skel.h"
> >>>>>>> +
> >>>>>>> +static void test_add(struct atomics *skel)
> >>>>>>> +{
> >>>>>>> +       int err, prog_fd;
> >>>>>>> +       __u32 duration = 0, retval;
> >>>>>>> +       struct bpf_link *link;
> >>>>>>> +
> >>>>>>> +       link = bpf_program__attach(skel->progs.add);
> >>>>>>> +       if (CHECK(IS_ERR(link), "attach(add)", "err: %ld\n", PTR_ERR(link)))
> >>>>>>> +               return;
> >>>>>>> +
> >>>>>>> +       prog_fd = bpf_program__fd(skel->progs.add);
> >>>>>>> +       err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
> >>>>>>> +                               NULL, NULL, &retval, &duration);
> >>>>>>> +       if (CHECK(err || retval, "test_run add",
> >>>>>>> +                 "err %d errno %d retval %d duration %d\n", err, errno, retval, duration))
> >>>>>>> +               goto cleanup;
> >>>>>>> +
> >>>>>>> +       ASSERT_EQ(skel->data->add64_value, 3, "add64_value");
> >>>>>>> +       ASSERT_EQ(skel->bss->add64_result, 1, "add64_result");
> >>>>>>> +
> >>>>>>> +       ASSERT_EQ(skel->data->add32_value, 3, "add32_value");
> >>>>>>> +       ASSERT_EQ(skel->bss->add32_result, 1, "add32_result");
> >>>>>>> +
> >>>>>>> +       ASSERT_EQ(skel->bss->add_stack_value_copy, 3, "add_stack_value");
> >>>>>>> +       ASSERT_EQ(skel->bss->add_stack_result, 1, "add_stack_result");
> >>>>>>> +
> >>>>>>> +       ASSERT_EQ(skel->data->add_noreturn_value, 3, "add_noreturn_value");
> >>>>>>> +
> >>>>>>> +cleanup:
> >>>>>>> +       bpf_link__destroy(link);
> >>>>>>> +}
> >>>>>>> +
> >>>>>> [...]
> >>>>>>> +
> >>>>>>> +__u64 xchg64_value = 1;
> >>>>>>> +__u64 xchg64_result = 0;
> >>>>>>> +__u32 xchg32_value = 1;
> >>>>>>> +__u32 xchg32_result = 0;
> >>>>>>> +
> >>>>>>> +SEC("fentry/bpf_fentry_test1")
> >>>>>>> +int BPF_PROG(xchg, int a)
> >>>>>>> +{
> >>>>>>> +#ifdef ENABLE_ATOMICS_TESTS
> >>>>>>> +       __u64 val64 = 2;
> >>>>>>> +       __u32 val32 = 2;
> >>>>>>> +
> >>>>>>> +       __atomic_exchange(&xchg64_value, &val64, &xchg64_result, __ATOMIC_RELAXED);
> >>>>>>> +       __atomic_exchange(&xchg32_value, &val32, &xchg32_result, __ATOMIC_RELAXED);
> >>>>>>
> >>>>>> Interesting to see this also works. I guess we probably won't advertise
> >>>>>> this, right? Currently for LLVM, the memory ordering parameter is ignored.
> >>>>>
> >>>>> Well IIUC this specific case is fine: the ordering that you get with
> >>>>> BPF_[CMP]XCHG (via kernel atomic_[cmpxchg]) is sequential consistency,
> >>>>> and its' fine to provide a stronger ordering than the one requested. I
> >>>>> should change it to say __ATOMIC_SEQ_CST to avoid confusing readers,
> >>>>> though.
> >>>>>
> >>>>> (I wrote it this way because I didn't see a __sync* function for
> >>>>> unconditional atomic exchange, and I didn't see an __atomic* function
> >>>>> where you don't need to specify the ordering).
> >>>>
> >>>> For the above code,
> >>>>      __atomic_exchange(&xchg64_value, &val64, &xchg64_result,
> >>>> __ATOMIC_RELAXED);
> >>>> It tries to do an atomic exchange between &xchg64_value and
> >>>>    &val64, and store the old &xchg64_value to &xchg64_result. So it is
> >>>> equivalent to
> >>>>       xchg64_result = __sync_lock_test_and_set(&xchg64_value, val64);
> >>>>
> >>>> So I think this test case can be dropped.
> >>>
> >>> Ah nice, I didn't know about __sync_lock_test_and_set, let's switch to
> >>> that I think.
> >>>
> >>>>> However... this led me to double-check the semantics and realise that we
> >>>>> do have a problem with ordering: The kernel's atomic_{add,and,or,xor} do
> >>>>> not imply memory barriers and therefore neither do the corresponding BPF
> >>>>> instructions. That means Clang can compile this:
> >>>>>
> >>>>>     (void)__atomic_fetch_add(&val, 1, __ATOMIC_SEQ_CST)
> >>>>>
> >>>>> to a {.code = (BPF_STX | BPF_DW | BPF_ATOMIC), .imm = BPF_ADD},
> >>>>> which is implemented with atomic_add, which doesn't actually satisfy
> >>>>> __ATOMIC_SEQ_CST.
> >>>>
> >>>> This is the main reason in all my llvm selftests I did not use
> >>>> __atomic_* intrinsics because we cannot handle *different* memory
> >>>> ordering properly.
> >>>>
> >>>>>
> >>>>> In fact... I think this is a pre-existing issue with BPF_XADD.
> >>>>>
> >>>>> If all I've written here is correct, the fix is to use
> >>>>> (void)atomic_fetch_add etc (these imply barriers) even when BPF_FETCH is
> >>>>> not set. And that change ought to be backported to fix BPF_XADD.
> >>>>
> >>>> We cannot change BPF_XADD behavior. If we change BPF_XADD to use
> >>>> atomic_fetch_add, then suddenly old code compiled with llvm12 will
> >>>> suddenly requires latest kernel, which will break userland very badly.
> >>>
> >>> Sorry I should have been more explicit: I meant that the fix would be to
> >>> call atomic_fetch_add but discard the return value, purely for the
> >>> implied barrier. The x86 JIT would stay the same. It would not break any
> >>> existing code, only add ordering guarantees that the user probably
> >>> already expected (since these builtins come from GCC originally and the
> >>> GCC docs say "these builtins are considered a full barrier" [1])
> >>>
> >>> [1] https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html
> >>
> >> This is indeed the issue. In the past, people already use gcc
> >> __sync_fetch_and_add() for xadd instruction for which git generated
> >> code does not implying barrier.
> >>
> >> The new atomics support has the following logic:
> >>    . if return value is used, it is atomic_fetch_add
> >>    . if return value is not used, it is xadd
> >> The reason to do this is to preserve backward compabiility
> >> and this way, we can get rid of -mcpu=v4.
> >>
> >> barrier issue is tricky and as we discussed earlier let us
> >> delay this after basic atomics support landed. We may not
> >> 100% conform to gcc __sync_fetch_and_add() or __atomic_*()
> >> semantics. We do need to clearly document what is expected
> >> in llvm and kernel.
> >
> > OK, then I think we can probably justify not conforming to the
> > __sync_fetch_and_add() semantics since that API is under-specified
> > anyway.
> >
> > However IMO it's unambiguously a bug for
> >
> >    (void)__atomic_fetch_add(&x, y, __ATOMIC_SEQ_CST);
> >
> > to compile down to a kernel atomic_add. I think for that specific API
> > Clang really ought to always use BPF_FETCH | BPF_ADD when
> > anything stronger than __ATOMIC_RELAXED is requested, or even just
> > refuse to compile with when the return value is ignored and a
> > none-relaxed memory ordering is specified.
>
> Both the following codes:
>     (void)__sync_fetch_and_add(p, a);
>     (void)__atomic_fetch_add(p, a, __ATOMIC_SEQ_CST);
>
> will generate the same IR:
>     %0 = atomicrmw add i32* %p, i32 %a seq_cst
>
> Basically that means for old compiler (<= llvm11),
>    (void)__atomic_fetch_add(&x, y, __ATOMIC_SEQ_CST)
> already generates xadd.

Ah, I didn't realise that was already the case; that's unfortunate.

For users of newer Clang with alu32 enabled, unless I'm being naïve,
this could be fixed without breaking compatibility. Clang could just
start generating BPF_ADD | BPF_FETCH, and then handle the fact that
the src_reg is clobbered, right?
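
Roughly, a sketch of that idea at the instruction level, reusing the
insn macros from this series (register numbers are arbitrary); this is
only an illustration of the suggested lowering, not something any
released compiler emits today:

    /* r3 = &val, r1 = addend, as emitted today */
    BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),  /* scratch copy: BPF_FETCH clobbers src_reg */
    BPF_ATOMIC_FETCH_ADD(BPF_DW, BPF_REG_3, BPF_REG_2, 0),  /* old value lands in r2, ignored */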

For users without alu32 enabled, I would actually argue that the new
Clang should start failing to build that code - as a user I'd much
rather have my build suddenly fail than have my explicitly stated
ordering assumptions violated. But I understand if that doesn't seem
too palatable...
Yonghong Song Dec. 16, 2020, 8 p.m. UTC | #9
On 12/16/20 3:51 AM, Brendan Jackman wrote:
> On Wed, 16 Dec 2020 at 08:19, Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 12/15/20 3:12 AM, Brendan Jackman wrote:
>>> On Tue, Dec 08, 2020 at 10:15:35AM -0800, Yonghong Song wrote:
>>>>
>>>>
>>>> On 12/8/20 8:59 AM, Brendan Jackman wrote:
>>>>> On Tue, Dec 08, 2020 at 08:38:04AM -0800, Yonghong Song wrote:
>>>>>>
>>>>>>
>>>>>> On 12/8/20 4:41 AM, Brendan Jackman wrote:
>>>>>>> On Mon, Dec 07, 2020 at 07:18:57PM -0800, Yonghong Song wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/7/20 8:07 AM, Brendan Jackman wrote:
>>>>>>>>> The prog_test that's added depends on Clang/LLVM features added by
>>>>>>>>> Yonghong in commit 286daafd6512 (was https://reviews.llvm.org/D72184     ).
>>>>>>>>>
>>>>>>>>> Note the use of a define called ENABLE_ATOMICS_TESTS: this is used
>>>>>>>>> to:
>>>>>>>>>
>>>>>>>>>       - Avoid breaking the build for people on old versions of Clang
>>>>>>>>>       - Avoid needing separate lists of test objects for no_alu32, where
>>>>>>>>>         atomics are not supported even if Clang has the feature.
>>>>>>>>>
>>>>>>>>> The atomics_test.o BPF object is built unconditionally both for
>>>>>>>>> test_progs and test_progs-no_alu32. For test_progs, if Clang supports
>>>>>>>>> atomics, ENABLE_ATOMICS_TESTS is defined, so it includes the proper
>>>>>>>>> test code. Otherwise, progs and global vars are defined anyway, as
>>>>>>>>> stubs; this means that the skeleton user code still builds.
>>>>>>>>>
>>>>>>>>> The atomics_test.o userspace object is built once and used for both
>>>>>>>>> test_progs and test_progs-no_alu32. A variable called skip_tests is
>>>>>>>>> defined in the BPF object's data section, which tells the userspace
>>>>>>>>> object whether to skip the atomics test.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Brendan Jackman <jackmanb@google.com>
>>>>>>>>
>>>>>>>> Ack with minor comments below.
>>>>>>>>
>>>>>>>> Acked-by: Yonghong Song <yhs@fb.com>
>>>>>>>>
>>>>>>>>> ---
>>>>>>>>>       tools/testing/selftests/bpf/Makefile          |  10 +
>>>>>>>>>       .../selftests/bpf/prog_tests/atomics.c        | 246 ++++++++++++++++++
>>>>>>>>>       tools/testing/selftests/bpf/progs/atomics.c   | 154 +++++++++++
>>>>>>>>>       .../selftests/bpf/verifier/atomic_and.c       |  77 ++++++
>>>>>>>>>       .../selftests/bpf/verifier/atomic_cmpxchg.c   |  96 +++++++
>>>>>>>>>       .../selftests/bpf/verifier/atomic_fetch_add.c | 106 ++++++++
>>>>>>>>>       .../selftests/bpf/verifier/atomic_or.c        |  77 ++++++
>>>>>>>>>       .../selftests/bpf/verifier/atomic_xchg.c      |  46 ++++
>>>>>>>>>       .../selftests/bpf/verifier/atomic_xor.c       |  77 ++++++
>>>>>>>>>       9 files changed, 889 insertions(+)
>>>>>>>>>       create mode 100644 tools/testing/selftests/bpf/prog_tests/atomics.c
>>>>>>>>>       create mode 100644 tools/testing/selftests/bpf/progs/atomics.c
>>>>>>>>>       create mode 100644 tools/testing/selftests/bpf/verifier/atomic_and.c
>>>>>>>>>       create mode 100644 tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
>>>>>>>>>       create mode 100644 tools/testing/selftests/bpf/verifier/atomic_fetch_add.c
>>>>>>>>>       create mode 100644 tools/testing/selftests/bpf/verifier/atomic_or.c
>>>>>>>>>       create mode 100644 tools/testing/selftests/bpf/verifier/atomic_xchg.c
>>>>>>>>>       create mode 100644 tools/testing/selftests/bpf/verifier/atomic_xor.c
>>>>>>>>>
>>>>>>>>> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
>>>>>>>>> index ac25ba5d0d6c..13bc1d736164 100644
>>>>>>>>> --- a/tools/testing/selftests/bpf/Makefile
>>>>>>>>> +++ b/tools/testing/selftests/bpf/Makefile
>>>>>>>>> @@ -239,6 +239,12 @@ BPF_CFLAGS = -g -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN)                      \
>>>>>>>>>               -I$(INCLUDE_DIR) -I$(CURDIR) -I$(APIDIR)                   \
>>>>>>>>>               -I$(abspath $(OUTPUT)/../usr/include)
>>>>>>>>> +# BPF atomics support was added to Clang in llvm-project commit 286daafd6512
>>>>>>>>> +# (release 12.0.0).
>>>>>>>>> +BPF_ATOMICS_SUPPORTED = $(shell \
>>>>>>>>> +       echo "int x = 0; int foo(void) { return __sync_val_compare_and_swap(&x, 1, 2); }" \
>>>>>>>>> +       | $(CLANG) -x cpp-output -S -target bpf -mcpu=v3 - -o /dev/null && echo 1 || echo 0)
>>>>>>>>
>>>>>>>> '-x c' here more intuitive?
>>>>>>>>
>>>>>>>>> +
>>>>>>>>>       CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
>>>>>>>>>                 -Wno-compare-distinct-pointer-types
>>>>>>>>> @@ -399,11 +405,15 @@ TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko    \
>>>>>>>>>                         $(wildcard progs/btf_dump_test_case_*.c)
>>>>>>>>>       TRUNNER_BPF_BUILD_RULE := CLANG_BPF_BUILD_RULE
>>>>>>>>>       TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS)
>>>>>>>>> +ifeq ($(BPF_ATOMICS_SUPPORTED),1)
>>>>>>>>> +  TRUNNER_BPF_CFLAGS += -DENABLE_ATOMICS_TESTS
>>>>>>>>> +endif
>>>>>>>>>       TRUNNER_BPF_LDFLAGS := -mattr=+alu32
>>>>>>>>>       $(eval $(call DEFINE_TEST_RUNNER,test_progs))
>>>>>>>>>       # Define test_progs-no_alu32 test runner.
>>>>>>>>>       TRUNNER_BPF_BUILD_RULE := CLANG_NOALU32_BPF_BUILD_RULE
>>>>>>>>> +TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS)
>>>>>>>>>       TRUNNER_BPF_LDFLAGS :=
>>>>>>>>>       $(eval $(call DEFINE_TEST_RUNNER,test_progs,no_alu32))
>>>>>>>>> diff --git a/tools/testing/selftests/bpf/prog_tests/atomics.c b/tools/testing/selftests/bpf/prog_tests/atomics.c
>>>>>>>>> new file mode 100644
>>>>>>>>> index 000000000000..c841a3abc2f7
>>>>>>>>> --- /dev/null
>>>>>>>>> +++ b/tools/testing/selftests/bpf/prog_tests/atomics.c
>>>>>>>>> @@ -0,0 +1,246 @@
>>>>>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>>>>>> +
>>>>>>>>> +#include <test_progs.h>
>>>>>>>>> +
>>>>>>>>> +#include "atomics.skel.h"
>>>>>>>>> +
>>>>>>>>> +static void test_add(struct atomics *skel)
>>>>>>>>> +{
>>>>>>>>> +       int err, prog_fd;
>>>>>>>>> +       __u32 duration = 0, retval;
>>>>>>>>> +       struct bpf_link *link;
>>>>>>>>> +
>>>>>>>>> +       link = bpf_program__attach(skel->progs.add);
>>>>>>>>> +       if (CHECK(IS_ERR(link), "attach(add)", "err: %ld\n", PTR_ERR(link)))
>>>>>>>>> +               return;
>>>>>>>>> +
>>>>>>>>> +       prog_fd = bpf_program__fd(skel->progs.add);
>>>>>>>>> +       err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
>>>>>>>>> +                               NULL, NULL, &retval, &duration);
>>>>>>>>> +       if (CHECK(err || retval, "test_run add",
>>>>>>>>> +                 "err %d errno %d retval %d duration %d\n", err, errno, retval, duration))
>>>>>>>>> +               goto cleanup;
>>>>>>>>> +
>>>>>>>>> +       ASSERT_EQ(skel->data->add64_value, 3, "add64_value");
>>>>>>>>> +       ASSERT_EQ(skel->bss->add64_result, 1, "add64_result");
>>>>>>>>> +
>>>>>>>>> +       ASSERT_EQ(skel->data->add32_value, 3, "add32_value");
>>>>>>>>> +       ASSERT_EQ(skel->bss->add32_result, 1, "add32_result");
>>>>>>>>> +
>>>>>>>>> +       ASSERT_EQ(skel->bss->add_stack_value_copy, 3, "add_stack_value");
>>>>>>>>> +       ASSERT_EQ(skel->bss->add_stack_result, 1, "add_stack_result");
>>>>>>>>> +
>>>>>>>>> +       ASSERT_EQ(skel->data->add_noreturn_value, 3, "add_noreturn_value");
>>>>>>>>> +
>>>>>>>>> +cleanup:
>>>>>>>>> +       bpf_link__destroy(link);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>> [...]
>>>>>>>>> +
>>>>>>>>> +__u64 xchg64_value = 1;
>>>>>>>>> +__u64 xchg64_result = 0;
>>>>>>>>> +__u32 xchg32_value = 1;
>>>>>>>>> +__u32 xchg32_result = 0;
>>>>>>>>> +
>>>>>>>>> +SEC("fentry/bpf_fentry_test1")
>>>>>>>>> +int BPF_PROG(xchg, int a)
>>>>>>>>> +{
>>>>>>>>> +#ifdef ENABLE_ATOMICS_TESTS
>>>>>>>>> +       __u64 val64 = 2;
>>>>>>>>> +       __u32 val32 = 2;
>>>>>>>>> +
>>>>>>>>> +       __atomic_exchange(&xchg64_value, &val64, &xchg64_result, __ATOMIC_RELAXED);
>>>>>>>>> +       __atomic_exchange(&xchg32_value, &val32, &xchg32_result, __ATOMIC_RELAXED);
>>>>>>>>
>>>>>>>> Interesting to see this also works. I guess we probably won't advertise
>>>>>>>> this, right? Currently for LLVM, the memory ordering parameter is ignored.
>>>>>>>
>>>>>>> Well IIUC this specific case is fine: the ordering that you get with
>>>>>>> BPF_[CMP]XCHG (via kernel atomic_[cmpxchg]) is sequential consistency,
>>>>>>> and its' fine to provide a stronger ordering than the one requested. I
>>>>>>> should change it to say __ATOMIC_SEQ_CST to avoid confusing readers,
>>>>>>> though.
>>>>>>>
>>>>>>> (I wrote it this way because I didn't see a __sync* function for
>>>>>>> unconditional atomic exchange, and I didn't see an __atomic* function
>>>>>>> where you don't need to specify the ordering).
>>>>>>
>>>>>> For the above code,
>>>>>>       __atomic_exchange(&xchg64_value, &val64, &xchg64_result,
>>>>>> __ATOMIC_RELAXED);
>>>>>> It tries to do an atomic exchange between &xchg64_value and
>>>>>>     &val64, and store the old &xchg64_value to &xchg64_result. So it is
>>>>>> equivalent to
>>>>>>        xchg64_result = __sync_lock_test_and_set(&xchg64_value, val64);
>>>>>>
>>>>>> So I think this test case can be dropped.
>>>>>
>>>>> Ah nice, I didn't know about __sync_lock_test_and_set, let's switch to
>>>>> that I think.
>>>>>
>>>>>>> However... this led me to double-check the semantics and realise that we
>>>>>>> do have a problem with ordering: The kernel's atomic_{add,and,or,xor} do
>>>>>>> not imply memory barriers and therefore neither do the corresponding BPF
>>>>>>> instructions. That means Clang can compile this:
>>>>>>>
>>>>>>>      (void)__atomic_fetch_add(&val, 1, __ATOMIC_SEQ_CST)
>>>>>>>
>>>>>>> to a {.code = (BPF_STX | BPF_DW | BPF_ATOMIC), .imm = BPF_ADD},
>>>>>>> which is implemented with atomic_add, which doesn't actually satisfy
>>>>>>> __ATOMIC_SEQ_CST.
>>>>>>
>>>>>> This is the main reason in all my llvm selftests I did not use
>>>>>> __atomic_* intrinsics because we cannot handle *different* memory
>>>>>> ordering properly.
>>>>>>
>>>>>>>
>>>>>>> In fact... I think this is a pre-existing issue with BPF_XADD.
>>>>>>>
>>>>>>> If all I've written here is correct, the fix is to use
>>>>>>> (void)atomic_fetch_add etc (these imply barriers) even when BPF_FETCH is
>>>>>>> not set. And that change ought to be backported to fix BPF_XADD.
>>>>>>
>>>>>> We cannot change BPF_XADD behavior. If we change BPF_XADD to use
>>>>>> atomic_fetch_add, then suddenly old code compiled with llvm12 will
>>>>>> suddenly requires latest kernel, which will break userland very badly.
>>>>>
>>>>> Sorry I should have been more explicit: I meant that the fix would be to
>>>>> call atomic_fetch_add but discard the return value, purely for the
>>>>> implied barrier. The x86 JIT would stay the same. It would not break any
>>>>> existing code, only add ordering guarantees that the user probably
>>>>> already expected (since these builtins come from GCC originally and the
>>>>> GCC docs say "these builtins are considered a full barrier" [1])
>>>>>
>>>>> [1] https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html
>>>>
>>>> This is indeed the issue. In the past, people already use gcc
>>>> __sync_fetch_and_add() for xadd instruction for which git generated
>>>> code does not implying barrier.
>>>>
>>>> The new atomics support has the following logic:
>>>>     . if return value is used, it is atomic_fetch_add
>>>>     . if return value is not used, it is xadd
>>>> The reason to do this is to preserve backward compabiility
>>>> and this way, we can get rid of -mcpu=v4.
>>>>
>>>> barrier issue is tricky and as we discussed earlier let us
>>>> delay this after basic atomics support landed. We may not
>>>> 100% conform to gcc __sync_fetch_and_add() or __atomic_*()
>>>> semantics. We do need to clearly document what is expected
>>>> in llvm and kernel.
>>>
>>> OK, then I think we can probably justify not conforming to the
>>> __sync_fetch_and_add() semantics since that API is under-specified
>>> anyway.
>>>
>>> However IMO it's unambiguously a bug for
>>>
>>>     (void)__atomic_fetch_add(&x, y, __ATOMIC_SEQ_CST);
>>>
>>> to compile down to a kernel atomic_add. I think for that specific API
>>> Clang really ought to always use BPF_FETCH | BPF_ADD when
>>> anything stronger than __ATOMIC_RELAXED is requested, or even just
>>> refuse to compile with when the return value is ignored and a
>>> none-relaxed memory ordering is specified.
>>
>> Both the following codes:
>>      (void)__sync_fetch_and_add(p, a);
>>      (void)__atomic_fetch_add(p, a, __ATOMIC_SEQ_CST);
>>
>> will generate the same IR:
>>      %0 = atomicrmw add i32* %p, i32 %a seq_cst
>>
>> Basically that means for old compiler (<= llvm11),
>>     (void)__atomic_fetch_add(&x, y, __ATOMIC_SEQ_CST)
>> already generates xadd.
> 
> Ah, I didn't realise that was already the case, that's unfortunate.
> 
> For users of newer Clang with alu32 enabled, unless I'm being naïve
> this could be fixed without breaking compatibility. Clang could just
> start generating a BPF_ADD|BPF_FETCH, and then handle the fact that
> the src_reg is clobbered, right?

This may cause regressions, as old kernels won't support
BPF_ADD|BPF_FETCH. Since we do not have an llvm flag to control
this behavior, users with old kernels would not be able to use
llvm12 any more.
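
For comparison, a minimal sketch of the pattern that keeps the legacy
no-fetch encoding under the lowering rules above (the counter name is
made up), so the object stays loadable on old kernels:

    static __u64 hits;

    /* Return value unused, so Clang emits BPF_ADD without BPF_FETCH
     * (the existing xadd encoding), which old kernels already accept.
     */
    __sync_fetch_and_add(&hits, 1);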

> 
> For users without alu32 enabled I would actually argue that the new
> Clang should start failing to build that code - as a user I'd much
> rather have my build suddenly fail than my explicitly-stated ordering
> assumptions violated. But I understand if that doesn't seem too
> palatable...
>
diff mbox series

Patch

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index ac25ba5d0d6c..13bc1d736164 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -239,6 +239,12 @@  BPF_CFLAGS = -g -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN) 			\
 	     -I$(INCLUDE_DIR) -I$(CURDIR) -I$(APIDIR)			\
 	     -I$(abspath $(OUTPUT)/../usr/include)
 
+# BPF atomics support was added to Clang in llvm-project commit 286daafd6512
+# (release 12.0.0).
+BPF_ATOMICS_SUPPORTED = $(shell \
+	echo "int x = 0; int foo(void) { return __sync_val_compare_and_swap(&x, 1, 2); }" \
+	| $(CLANG) -x cpp-output -S -target bpf -mcpu=v3 - -o /dev/null && echo 1 || echo 0)
+
 CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
 	       -Wno-compare-distinct-pointer-types
 
@@ -399,11 +405,15 @@  TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko	\
 		       $(wildcard progs/btf_dump_test_case_*.c)
 TRUNNER_BPF_BUILD_RULE := CLANG_BPF_BUILD_RULE
 TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS)
+ifeq ($(BPF_ATOMICS_SUPPORTED),1)
+  TRUNNER_BPF_CFLAGS += -DENABLE_ATOMICS_TESTS
+endif
 TRUNNER_BPF_LDFLAGS := -mattr=+alu32
 $(eval $(call DEFINE_TEST_RUNNER,test_progs))
 
 # Define test_progs-no_alu32 test runner.
 TRUNNER_BPF_BUILD_RULE := CLANG_NOALU32_BPF_BUILD_RULE
+TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS)
 TRUNNER_BPF_LDFLAGS :=
 $(eval $(call DEFINE_TEST_RUNNER,test_progs,no_alu32))
 
diff --git a/tools/testing/selftests/bpf/prog_tests/atomics.c b/tools/testing/selftests/bpf/prog_tests/atomics.c
new file mode 100644
index 000000000000..c841a3abc2f7
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/atomics.c
@@ -0,0 +1,246 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+
+#include "atomics.skel.h"
+
+static void test_add(struct atomics *skel)
+{
+	int err, prog_fd;
+	__u32 duration = 0, retval;
+	struct bpf_link *link;
+
+	link = bpf_program__attach(skel->progs.add);
+	if (CHECK(IS_ERR(link), "attach(add)", "err: %ld\n", PTR_ERR(link)))
+		return;
+
+	prog_fd = bpf_program__fd(skel->progs.add);
+	err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+				NULL, NULL, &retval, &duration);
+	if (CHECK(err || retval, "test_run add",
+		  "err %d errno %d retval %d duration %d\n", err, errno, retval, duration))
+		goto cleanup;
+
+	ASSERT_EQ(skel->data->add64_value, 3, "add64_value");
+	ASSERT_EQ(skel->bss->add64_result, 1, "add64_result");
+
+	ASSERT_EQ(skel->data->add32_value, 3, "add32_value");
+	ASSERT_EQ(skel->bss->add32_result, 1, "add32_result");
+
+	ASSERT_EQ(skel->bss->add_stack_value_copy, 3, "add_stack_value");
+	ASSERT_EQ(skel->bss->add_stack_result, 1, "add_stack_result");
+
+	ASSERT_EQ(skel->data->add_noreturn_value, 3, "add_noreturn_value");
+
+cleanup:
+	bpf_link__destroy(link);
+}
+
+static void test_sub(struct atomics *skel)
+{
+	int err, prog_fd;
+	__u32 duration = 0, retval;
+	struct bpf_link *link;
+
+	link = bpf_program__attach(skel->progs.sub);
+	if (CHECK(IS_ERR(link), "attach(sub)", "err: %ld\n", PTR_ERR(link)))
+		return;
+
+	prog_fd = bpf_program__fd(skel->progs.sub);
+	err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+				NULL, NULL, &retval, &duration);
+	if (CHECK(err || retval, "test_run sub",
+		  "err %d errno %d retval %d duration %d\n",
+		  err, errno, retval, duration))
+		goto cleanup;
+
+	ASSERT_EQ(skel->data->sub64_value, -1, "sub64_value");
+	ASSERT_EQ(skel->bss->sub64_result, 1, "sub64_result");
+
+	ASSERT_EQ(skel->data->sub32_value, -1, "sub32_value");
+	ASSERT_EQ(skel->bss->sub32_result, 1, "sub32_result");
+
+	ASSERT_EQ(skel->bss->sub_stack_value_copy, -1, "sub_stack_value");
+	ASSERT_EQ(skel->bss->sub_stack_result, 1, "sub_stack_result");
+
+	ASSERT_EQ(skel->data->sub_noreturn_value, -1, "sub_noreturn_value");
+
+cleanup:
+	bpf_link__destroy(link);
+}
+
+static void test_and(struct atomics *skel)
+{
+	int err, prog_fd;
+	__u32 duration = 0, retval;
+	struct bpf_link *link;
+
+	link = bpf_program__attach(skel->progs.and);
+	if (CHECK(IS_ERR(link), "attach(and)", "err: %ld\n", PTR_ERR(link)))
+		return;
+
+	prog_fd = bpf_program__fd(skel->progs.and);
+	err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+				NULL, NULL, &retval, &duration);
+	if (CHECK(err || retval, "test_run and",
+		  "err %d errno %d retval %d duration %d\n", err, errno, retval, duration))
+		goto cleanup;
+
+	ASSERT_EQ(skel->data->and64_value, 0x010ull << 32, "and64_value");
+	ASSERT_EQ(skel->bss->and64_result, 0x110ull << 32, "and64_result");
+
+	ASSERT_EQ(skel->data->and32_value, 0x010, "and32_value");
+	ASSERT_EQ(skel->bss->and32_result, 0x110, "and32_result");
+
+	ASSERT_EQ(skel->data->and_noreturn_value, 0x010ull << 32, "and_noreturn_value");
+cleanup:
+	bpf_link__destroy(link);
+}
+
+static void test_or(struct atomics *skel)
+{
+	int err, prog_fd;
+	__u32 duration = 0, retval;
+	struct bpf_link *link;
+
+	link = bpf_program__attach(skel->progs.or);
+	if (CHECK(IS_ERR(link), "attach(or)", "err: %ld\n", PTR_ERR(link)))
+		return;
+
+	prog_fd = bpf_program__fd(skel->progs.or);
+	err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+				NULL, NULL, &retval, &duration);
+	if (CHECK(err || retval, "test_run or",
+		  "err %d errno %d retval %d duration %d\n",
+		  err, errno, retval, duration))
+		goto cleanup;
+
+	ASSERT_EQ(skel->data->or64_value, 0x111ull << 32, "or64_value");
+	ASSERT_EQ(skel->bss->or64_result, 0x110ull << 32, "or64_result");
+
+	ASSERT_EQ(skel->data->or32_value, 0x111, "or32_value");
+	ASSERT_EQ(skel->bss->or32_result, 0x110, "or32_result");
+
+	ASSERT_EQ(skel->data->or_noreturn_value, 0x111ull << 32, "or_noreturn_value");
+cleanup:
+	bpf_link__destroy(link);
+}
+
+static void test_xor(struct atomics *skel)
+{
+	int err, prog_fd;
+	__u32 duration = 0, retval;
+	struct bpf_link *link;
+
+	link = bpf_program__attach(skel->progs.xor);
+	if (CHECK(IS_ERR(link), "attach(xor)", "err: %ld\n", PTR_ERR(link)))
+		return;
+
+	prog_fd = bpf_program__fd(skel->progs.xor);
+	err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+				NULL, NULL, &retval, &duration);
+	if (CHECK(err || retval, "test_run xor",
+		  "err %d errno %d retval %d duration %d\n", err, errno, retval, duration))
+		goto cleanup;
+
+	ASSERT_EQ(skel->data->xor64_value, 0x101ull << 32, "xor64_value");
+	ASSERT_EQ(skel->bss->xor64_result, 0x110ull << 32, "xor64_result");
+
+	ASSERT_EQ(skel->data->xor32_value, 0x101, "xor32_value");
+	ASSERT_EQ(skel->bss->xor32_result, 0x110, "xor32_result");
+
+	ASSERT_EQ(skel->data->xor_noreturn_value, 0x101ull << 32, "xor_noreturn_value");
+cleanup:
+	bpf_link__destroy(link);
+}
+
+static void test_cmpxchg(struct atomics *skel)
+{
+	int err, prog_fd;
+	__u32 duration = 0, retval;
+	struct bpf_link *link;
+
+	link = bpf_program__attach(skel->progs.cmpxchg);
+	if (CHECK(IS_ERR(link), "attach(cmpxchg)", "err: %ld\n", PTR_ERR(link)))
+		return;
+
+	prog_fd = bpf_program__fd(skel->progs.cmpxchg);
+	err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+				NULL, NULL, &retval, &duration);
+	if (CHECK(err || retval, "test_run cmpxchg",
+		  "err %d errno %d retval %d duration %d\n", err, errno, retval, duration))
+		goto cleanup;
+
+	ASSERT_EQ(skel->data->cmpxchg64_value, 2, "cmpxchg64_value");
+	ASSERT_EQ(skel->bss->cmpxchg64_result_fail, 1, "cmpxchg_result_fail");
+	ASSERT_EQ(skel->bss->cmpxchg64_result_succeed, 1, "cmpxchg_result_succeed");
+
+	ASSERT_EQ(skel->data->cmpxchg32_value, 2, "cmpxchg32_value");
+	ASSERT_EQ(skel->bss->cmpxchg32_result_fail, 1, "cmpxchg_result_fail");
+	ASSERT_EQ(skel->bss->cmpxchg32_result_succeed, 1, "cmpxchg_result_succeed");
+
+cleanup:
+	bpf_link__destroy(link);
+}
+
+static void test_xchg(struct atomics *skel)
+{
+	int err, prog_fd;
+	__u32 duration = 0, retval;
+	struct bpf_link *link;
+
+	link = bpf_program__attach(skel->progs.xchg);
+	if (CHECK(IS_ERR(link), "attach(xchg)", "err: %ld\n", PTR_ERR(link)))
+		return;
+
+	prog_fd = bpf_program__fd(skel->progs.xchg);
+	err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+				NULL, NULL, &retval, &duration);
+	if (CHECK(err || retval, "test_run xchg",
+		  "err %d errno %d retval %d duration %d\n", err, errno, retval, duration))
+		goto cleanup;
+
+	ASSERT_EQ(skel->data->xchg64_value, 2, "xchg64_value");
+	ASSERT_EQ(skel->bss->xchg64_result, 1, "xchg_result");
+
+	ASSERT_EQ(skel->data->xchg32_value, 2, "xchg32_value");
+	ASSERT_EQ(skel->bss->xchg32_result, 1, "xchg_result");
+
+cleanup:
+	bpf_link__destroy(link);
+}
+
+void test_atomics(void)
+{
+	struct atomics *skel;
+	__u32 duration = 0;
+
+	skel = atomics__open_and_load();
+	if (CHECK(!skel, "skel_load", "atomics skeleton failed\n"))
+		return;
+
+	if (skel->data->skip_tests) {
+		printf("%s:SKIP:no ENABLE_ATOMICS_TESTS (missing Clang BPF atomics support)",
+		       __func__);
+		test__skip();
+		goto cleanup;
+	}
+
+	if (test__start_subtest("add"))
+		test_add(skel);
+	if (test__start_subtest("sub"))
+		test_sub(skel);
+	if (test__start_subtest("and"))
+		test_and(skel);
+	if (test__start_subtest("or"))
+		test_or(skel);
+	if (test__start_subtest("xor"))
+		test_xor(skel);
+	if (test__start_subtest("cmpxchg"))
+		test_cmpxchg(skel);
+	if (test__start_subtest("xchg"))
+		test_xchg(skel);
+
+cleanup:
+	atomics__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/atomics.c b/tools/testing/selftests/bpf/progs/atomics.c
new file mode 100644
index 000000000000..d40c93496843
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/atomics.c
@@ -0,0 +1,154 @@ 
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <stdbool.h>
+
+#ifdef ENABLE_ATOMICS_TESTS
+bool skip_tests __attribute((__section__(".data"))) = false;
+#else
+bool skip_tests = true;
+#endif
+
+__u64 add64_value = 1;
+__u64 add64_result = 0;
+__u32 add32_value = 1;
+__u32 add32_result = 0;
+__u64 add_stack_value_copy = 0;
+__u64 add_stack_result = 0;
+__u64 add_noreturn_value = 1;
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(add, int a)
+{
+#ifdef ENABLE_ATOMICS_TESTS
+	__u64 add_stack_value = 1;
+
+	add64_result = __sync_fetch_and_add(&add64_value, 2);
+	add32_result = __sync_fetch_and_add(&add32_value, 2);
+	add_stack_result = __sync_fetch_and_add(&add_stack_value, 2);
+	add_stack_value_copy = add_stack_value;
+	__sync_fetch_and_add(&add_noreturn_value, 2);
+#endif
+
+	return 0;
+}
+
+__s64 sub64_value = 1;
+__s64 sub64_result = 0;
+__s32 sub32_value = 1;
+__s32 sub32_result = 0;
+__s64 sub_stack_value_copy = 0;
+__s64 sub_stack_result = 0;
+__s64 sub_noreturn_value = 1;
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(sub, int a)
+{
+#ifdef ENABLE_ATOMICS_TESTS
+	__u64 sub_stack_value = 1;
+
+	sub64_result = __sync_fetch_and_sub(&sub64_value, 2);
+	sub32_result = __sync_fetch_and_sub(&sub32_value, 2);
+	sub_stack_result = __sync_fetch_and_sub(&sub_stack_value, 2);
+	sub_stack_value_copy = sub_stack_value;
+	__sync_fetch_and_sub(&sub_noreturn_value, 2);
+#endif
+
+	return 0;
+}
+
+__u64 and64_value = (0x110ull << 32);
+__u64 and64_result = 0;
+__u32 and32_value = 0x110;
+__u32 and32_result = 0;
+__u64 and_noreturn_value = (0x110ull << 32);
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(and, int a)
+{
+#ifdef ENABLE_ATOMICS_TESTS
+
+	and64_result = __sync_fetch_and_and(&and64_value, 0x011ull << 32);
+	and32_result = __sync_fetch_and_and(&and32_value, 0x011);
+	__sync_fetch_and_and(&and_noreturn_value, 0x011ull << 32);
+#endif
+
+	return 0;
+}
+
+__u64 or64_value = (0x110ull << 32);
+__u64 or64_result = 0;
+__u32 or32_value = 0x110;
+__u32 or32_result = 0;
+__u64 or_noreturn_value = (0x110ull << 32);
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(or, int a)
+{
+#ifdef ENABLE_ATOMICS_TESTS
+	or64_result = __sync_fetch_and_or(&or64_value, 0x011ull << 32);
+	or32_result = __sync_fetch_and_or(&or32_value, 0x011);
+	__sync_fetch_and_or(&or_noreturn_value, 0x011ull << 32);
+#endif
+
+	return 0;
+}
+
+__u64 xor64_value = (0x110ull << 32);
+__u64 xor64_result = 0;
+__u32 xor32_value = 0x110;
+__u32 xor32_result = 0;
+__u64 xor_noreturn_value = (0x110ull << 32);
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(xor, int a)
+{
+#ifdef ENABLE_ATOMICS_TESTS
+	xor64_result = __sync_fetch_and_xor(&xor64_value, 0x011ull << 32);
+	xor32_result = __sync_fetch_and_xor(&xor32_value, 0x011);
+	__sync_fetch_and_xor(&xor_noreturn_value, 0x011ull << 32);
+#endif
+
+	return 0;
+}
+
+__u64 cmpxchg64_value = 1;
+__u64 cmpxchg64_result_fail = 0;
+__u64 cmpxchg64_result_succeed = 0;
+__u32 cmpxchg32_value = 1;
+__u32 cmpxchg32_result_fail = 0;
+__u32 cmpxchg32_result_succeed = 0;
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(cmpxchg, int a)
+{
+#ifdef ENABLE_ATOMICS_TESTS
+	cmpxchg64_result_fail = __sync_val_compare_and_swap(&cmpxchg64_value, 0, 3);
+	cmpxchg64_result_succeed = __sync_val_compare_and_swap(&cmpxchg64_value, 1, 2);
+
+	cmpxchg32_result_fail = __sync_val_compare_and_swap(&cmpxchg32_value, 0, 3);
+	cmpxchg32_result_succeed = __sync_val_compare_and_swap(&cmpxchg32_value, 1, 2);
+#endif
+
+	return 0;
+}
+
+__u64 xchg64_value = 1;
+__u64 xchg64_result = 0;
+__u32 xchg32_value = 1;
+__u32 xchg32_result = 0;
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(xchg, int a)
+{
+#ifdef ENABLE_ATOMICS_TESTS
+	__u64 val64 = 2;
+	__u32 val32 = 2;
+
+	__atomic_exchange(&xchg64_value, &val64, &xchg64_result, __ATOMIC_RELAXED);
+	__atomic_exchange(&xchg32_value, &val32, &xchg32_result, __ATOMIC_RELAXED);
+#endif
+
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/verifier/atomic_and.c b/tools/testing/selftests/bpf/verifier/atomic_and.c
new file mode 100644
index 000000000000..7eea6d9dfd7d
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/atomic_and.c
@@ -0,0 +1,77 @@ 
+{
+	"BPF_ATOMIC_AND without fetch",
+	.insns = {
+		/* val = 0x110; */
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0x110),
+		/* atomic_and(&val, 0x011); */
+		BPF_MOV64_IMM(BPF_REG_1, 0x011),
+		BPF_ATOMIC_AND(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+		/* if (val != 0x010) exit(2); */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_10, -8),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0x010, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+		/* r1 should not be clobbered, no BPF_FETCH flag */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x011, 1),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
+{
+	"BPF_ATOMIC_AND with fetch",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 123),
+		/* val = 0x110; */
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0x110),
+		/* old = atomic_fetch_and(&val, 0x011); */
+		BPF_MOV64_IMM(BPF_REG_1, 0x011),
+		BPF_ATOMIC_FETCH_AND(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+		/* if (old != 0x110) exit(3); */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x110, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 3),
+		BPF_EXIT_INSN(),
+		/* if (val != 0x010) exit(2); */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -8),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x010, 2),
+		BPF_MOV64_IMM(BPF_REG_1, 2),
+		BPF_EXIT_INSN(),
+		/* Check R0 wasn't clobbered (for fear of x86 JIT bug) */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 123, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+		/* exit(0); */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
+{
+	"BPF_ATOMIC_AND with fetch 32bit",
+	.insns = {
+		/* r0 = (s64) -1 */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 1),
+		/* val = 0x110; */
+		BPF_ST_MEM(BPF_W, BPF_REG_10, -4, 0x110),
+		/* old = atomic_fetch_and(&val, 0x011); */
+		BPF_MOV32_IMM(BPF_REG_1, 0x011),
+		BPF_ATOMIC_FETCH_AND(BPF_W, BPF_REG_10, BPF_REG_1, -4),
+		/* if (old != 0x110) exit(3); */
+		BPF_JMP32_IMM(BPF_JEQ, BPF_REG_1, 0x110, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 3),
+		BPF_EXIT_INSN(),
+		/* if (val != 0x010) exit(2); */
+		BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_10, -4),
+		BPF_JMP32_IMM(BPF_JEQ, BPF_REG_1, 0x010, 2),
+		BPF_MOV32_IMM(BPF_REG_1, 2),
+		BPF_EXIT_INSN(),
+		/* Check R0 wasn't clobbered (for fear of x86 JIT bug)
+		 * It should be -1 so add 1 to get exit code.
+		 */
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
diff --git a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
new file mode 100644
index 000000000000..335e12690be7
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
@@ -0,0 +1,96 @@ 
+{
+	"atomic compare-and-exchange smoketest - 64bit",
+	.insns = {
+		/* val = 3; */
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 3),
+		/* old = atomic_cmpxchg(&val, 2, 4); */
+		BPF_MOV64_IMM(BPF_REG_1, 4),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_ATOMIC_CMPXCHG(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+		/* if (old != 3) exit(2); */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+		/* if (val != 3) exit(3); */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 3),
+		BPF_EXIT_INSN(),
+		/* old = atomic_cmpxchg(&val, 3, 4); */
+		BPF_MOV64_IMM(BPF_REG_1, 4),
+		BPF_MOV64_IMM(BPF_REG_0, 3),
+		BPF_ATOMIC_CMPXCHG(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+		/* if (old != 3) exit(4); */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 4),
+		BPF_EXIT_INSN(),
+		/* if (val != 4) exit(5); */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 4, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 5),
+		BPF_EXIT_INSN(),
+		/* exit(0); */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
+{
+	"atomic compare-and-exchange smoketest - 32bit",
+	.insns = {
+		/* val = 3; */
+		BPF_ST_MEM(BPF_W, BPF_REG_10, -4, 3),
+		/* old = atomic_cmpxchg(&val, 2, 4); */
+		BPF_MOV32_IMM(BPF_REG_1, 4),
+		BPF_MOV32_IMM(BPF_REG_0, 2),
+		BPF_ATOMIC_CMPXCHG(BPF_W, BPF_REG_10, BPF_REG_1, -4),
+		/* if (old != 3) exit(2); */
+		BPF_JMP32_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+		/* if (val != 3) exit(3); */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_10, -4),
+		BPF_JMP32_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 3),
+		BPF_EXIT_INSN(),
+		/* old = atomic_cmpxchg(&val, 3, 4); */
+		BPF_MOV32_IMM(BPF_REG_1, 4),
+		BPF_MOV32_IMM(BPF_REG_0, 3),
+		BPF_ATOMIC_CMPXCHG(BPF_W, BPF_REG_10, BPF_REG_1, -4),
+		/* if (old != 3) exit(4); */
+		BPF_JMP32_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 4),
+		BPF_EXIT_INSN(),
+		/* if (val != 4) exit(5); */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_10, -4),
+		BPF_JMP32_IMM(BPF_JEQ, BPF_REG_0, 4, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 5),
+		BPF_EXIT_INSN(),
+		/* exit(0); */
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
+{
+	"Can't use cmpxchg on uninit src reg",
+	.insns = {
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 3),
+		BPF_MOV64_IMM(BPF_REG_0, 3),
+		BPF_ATOMIC_CMPXCHG(BPF_DW, BPF_REG_10, BPF_REG_2, -8),
+		BPF_EXIT_INSN(),
+	},
+	.result = REJECT,
+	.errstr = "!read_ok",
+},
+{
+	"Can't use cmpxchg on uninit memory",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 3),
+		BPF_MOV64_IMM(BPF_REG_2, 4),
+		BPF_ATOMIC_CMPXCHG(BPF_DW, BPF_REG_10, BPF_REG_2, -8),
+		BPF_EXIT_INSN(),
+	},
+	.result = REJECT,
+	.errstr = "invalid read from stack",
+},
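In rough C terms the 64-bit cmpxchg smoketest does the following (a sketch,
assuming __sync_val_compare_and_swap() as the source-level form; in the
instruction encoding R0 carries the expected value in and receives the old
value back, which is why the test stages its operands in R0):

	#include <linux/types.h>

	static int cmpxchg_sketch(void)
	{
		__u64 val = 3;
		__u64 old;

		old = __sync_val_compare_and_swap(&val, 2, 4); /* no match: old == 3, val == 3 */
		old = __sync_val_compare_and_swap(&val, 3, 4); /* match: old == 3, val == 4 */
		return !(old == 3 && val == 4);
	}

The two REJECT cases then cover the operand checks: an uninitialized src
register and an uninitialized stack slot both have to be refused before the
exchange itself is considered.
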
diff --git a/tools/testing/selftests/bpf/verifier/atomic_fetch_add.c b/tools/testing/selftests/bpf/verifier/atomic_fetch_add.c
new file mode 100644
index 000000000000..7c87bc9a13de
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/atomic_fetch_add.c
@@ -0,0 +1,106 @@ 
+{
+	"BPF_ATOMIC_FETCH_ADD smoketest - 64bit",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		/* Write 3 to stack */
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 3),
+		/* Put a 1 in R1, add it to the 3 on the stack, and load the value back into R1 */
+		BPF_MOV64_IMM(BPF_REG_1, 1),
+		BPF_ATOMIC_FETCH_ADD(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+		/* Check the value we loaded back was 3 */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 3, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+		/* Load value from stack */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -8),
+		/* Check value loaded from stack was 4 */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 4, 1),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
+{
+	"BPF_ATOMIC_FETCH_ADD smoketest - 32bit",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		/* Write 3 to stack */
+		BPF_ST_MEM(BPF_W, BPF_REG_10, -4, 3),
+		/* Put a 1 in R1, add it to the 3 on the stack, and load the value back into R1 */
+		BPF_MOV32_IMM(BPF_REG_1, 1),
+		BPF_ATOMIC_FETCH_ADD(BPF_W, BPF_REG_10, BPF_REG_1, -4),
+		/* Check the value we loaded back was 3 */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 3, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+		/* Load value from stack */
+		BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_10, -4),
+		/* Check value loaded from stack was 4 */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 4, 1),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
+{
+	"Can't use ATM_FETCH_ADD on frame pointer",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 3),
+		BPF_ATOMIC_FETCH_ADD(BPF_DW, BPF_REG_10, BPF_REG_10, -8),
+		BPF_EXIT_INSN(),
+	},
+	.result = REJECT,
+	.errstr_unpriv = "R10 leaks addr into mem",
+	.errstr = "frame pointer is read only",
+},
+{
+	"Can't use ATM_FETCH_ADD on uninit src reg",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 3),
+		BPF_ATOMIC_FETCH_ADD(BPF_DW, BPF_REG_10, BPF_REG_2, -8),
+		BPF_EXIT_INSN(),
+	},
+	.result = REJECT,
+	/* The read of the uninitialized src register is what fails here,
+	 * hence "!read_ok".
+	 */
+	.errstr = "!read_ok",
+},
+{
+	"Can't use ATM_FETCH_ADD on uninit dst reg",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_ATOMIC_FETCH_ADD(BPF_DW, BPF_REG_2, BPF_REG_0, -8),
+		BPF_EXIT_INSN(),
+	},
+	.result = REJECT,
+	/* The read of the uninitialized dst (address) register is what
+	 * fails here, hence "!read_ok".
+	 */
+	.errstr = "!read_ok",
+},
+{
+	"Can't use ATM_FETCH_ADD on kernel memory",
+	.insns = {
+		/* This is an fentry prog; its context is an array of the args
+		 * of the kernel function being called. Load first arg into R2.
+		 */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, 0),
+		/* First arg of bpf_fentry_test7 is a pointer to a struct.
+		 * Attempt to modify that struct. Verifier shouldn't let us
+		 * because it's kernel memory.
+		 */
+		BPF_MOV64_IMM(BPF_REG_3, 1),
+		BPF_ATOMIC_FETCH_ADD(BPF_DW, BPF_REG_2, BPF_REG_3, 0),
+		/* Done */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_TRACING,
+	.expected_attach_type = BPF_TRACE_FENTRY,
+	.kfunc = "bpf_fentry_test7",
+	.result = REJECT,
+	.errstr = "only read is supported",
+},
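The last REJECT case is roughly the raw-insn form of an fentry program like
the one below (a sketch only: the program name is made up, and the ctx-array
access is an approximation of how the first argument of bpf_fentry_test7
reaches the program):

	#include <linux/types.h>
	#include <bpf/bpf_helpers.h>

	char _license[] SEC("license") = "GPL";

	SEC("fentry/bpf_fentry_test7")
	int add_to_kernel_arg(__u64 *ctx)
	{
		/* First argument is a pointer into kernel memory */
		__u64 *arg = (__u64 *)ctx[0];

		/* An atomic RMW on kernel memory must be refused:
		 * "only read is supported"
		 */
		__sync_fetch_and_add(arg, 1);
		return 0;
	}
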
diff --git a/tools/testing/selftests/bpf/verifier/atomic_or.c b/tools/testing/selftests/bpf/verifier/atomic_or.c
new file mode 100644
index 000000000000..1b22fb2881f0
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/atomic_or.c
@@ -0,0 +1,77 @@ 
+{
+	"BPF_ATOMIC_OR without fetch",
+	.insns = {
+		/* val = 0x110; */
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0x110),
+		/* atomic_or(&val, 0x011); */
+		BPF_MOV64_IMM(BPF_REG_1, 0x011),
+		BPF_ATOMIC_OR(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+		/* if (val != 0x111) exit(2); */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0x111, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+		/* r1 should not be clobbered, no BPF_FETCH flag */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x011, 1),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
+{
+	"BPF_ATOMIC_OR with fetch",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 123),
+		/* val = 0x110; */
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0x110),
+		/* old = atomic_fetch_or(&val, 0x011); */
+		BPF_MOV64_IMM(BPF_REG_1, 0x011),
+		BPF_ATOMIC_FETCH_OR(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+		/* if (old != 0x110) exit(3); */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x110, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 3),
+		BPF_EXIT_INSN(),
+		/* if (val != 0x111) exit(2); */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -8),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x111, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+		/* Check R0 wasn't clobbered (for fear of x86 JIT bug) */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 123, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+		/* exit(0); */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
+{
+	"BPF_ATOMIC_OR with fetch 32bit",
+	.insns = {
+		/* r0 = (s64) -1 */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 1),
+		/* val = 0x110; */
+		BPF_ST_MEM(BPF_W, BPF_REG_10, -4, 0x110),
+		/* old = atomic_fetch_or(&val, 0x011); */
+		BPF_MOV32_IMM(BPF_REG_1, 0x011),
+		BPF_ATOMIC_FETCH_OR(BPF_W, BPF_REG_10, BPF_REG_1, -4),
+		/* if (old != 0x110) exit(3); */
+		BPF_JMP32_IMM(BPF_JEQ, BPF_REG_1, 0x110, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 3),
+		BPF_EXIT_INSN(),
+		/* if (val != 0x111) exit(2); */
+		BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_10, -4),
+		BPF_JMP32_IMM(BPF_JEQ, BPF_REG_1, 0x111, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+		/* Check R0 wasn't clobbered (for fear of x86 JIT bug)
+		 * It should be -1 so add 1 to get exit code.
+		 */
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
diff --git a/tools/testing/selftests/bpf/verifier/atomic_xchg.c b/tools/testing/selftests/bpf/verifier/atomic_xchg.c
new file mode 100644
index 000000000000..9348ac490e24
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/atomic_xchg.c
@@ -0,0 +1,46 @@ 
+{
+	"atomic exchange smoketest - 64bit",
+	.insns = {
+		/* val = 3; */
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 3),
+		/* old = atomic_xchg(&val, 4); */
+		BPF_MOV64_IMM(BPF_REG_1, 4),
+		BPF_ATOMIC_XCHG(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+		/* if (old != 3) exit(1); */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 3, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+		/* if (val != 4) exit(2); */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 4, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+		/* exit(0); */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
+{
+	"atomic exchange smoketest - 32bit",
+	.insns = {
+		/* val = 3; */
+		BPF_ST_MEM(BPF_W, BPF_REG_10, -4, 3),
+		/* old = atomic_xchg(&val, 4); */
+		BPF_MOV32_IMM(BPF_REG_1, 4),
+		BPF_ATOMIC_XCHG(BPF_W, BPF_REG_10, BPF_REG_1, -4),
+		/* if (old != 3) exit(1); */
+		BPF_JMP32_IMM(BPF_JEQ, BPF_REG_1, 3, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+		/* if (val != 4) exit(2); */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_10, -4),
+		BPF_JMP32_IMM(BPF_JEQ, BPF_REG_0, 4, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+		/* exit(0); */
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
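The exchange smoketests boil down to the following C (a sketch, assuming
__sync_lock_test_and_set() as the builtin spelling for a plain atomic
exchange):

	#include <linux/types.h>

	static int xchg_sketch(void)
	{
		__u64 val = 3;
		__u64 old = __sync_lock_test_and_set(&val, 4);

		/* old == 3, val == 4 */
		return !(old == 3 && val == 4);
	}
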
diff --git a/tools/testing/selftests/bpf/verifier/atomic_xor.c b/tools/testing/selftests/bpf/verifier/atomic_xor.c
new file mode 100644
index 000000000000..d1315419a3a8
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/atomic_xor.c
@@ -0,0 +1,77 @@ 
+{
+	"BPF_ATOMIC_XOR without fetch",
+	.insns = {
+		/* val = 0x110; */
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0x110),
+		/* atomic_xor(&val, 0x011); */
+		BPF_MOV64_IMM(BPF_REG_1, 0x011),
+		BPF_ATOMIC_XOR(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+		/* if (val != 0x101) exit(2); */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0x101, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+		/* r1 should not be clobbered, no BPF_FETCH flag */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x011, 1),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
+{
+	"BPF_ATOMIC_XOR with fetch",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 123),
+		/* val = 0x110; */
+		BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0x110),
+		/* old = atomic_fetch_xor(&val, 0x011); */
+		BPF_MOV64_IMM(BPF_REG_1, 0x011),
+		BPF_ATOMIC_FETCH_XOR(BPF_DW, BPF_REG_10, BPF_REG_1, -8),
+		/* if (old != 0x110) exit(3); */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x110, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 3),
+		BPF_EXIT_INSN(),
+		/* if (val != 0x101) exit(2); */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -8),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x101, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+		/* Check R0 wasn't clobbered (for fear of x86 JIT bug) */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 123, 2),
+		BPF_MOV64_IMM(BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+		/* exit(0); */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
+{
+	"BPF_ATOMIC_XOR with fetch 32bit",
+	.insns = {
+		/* r0 = (s64) -1 */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 1),
+		/* val = 0x110; */
+		BPF_ST_MEM(BPF_W, BPF_REG_10, -4, 0x110),
+		/* old = atomic_fetch_xor(&val, 0x011); */
+		BPF_MOV32_IMM(BPF_REG_1, 0x011),
+		BPF_ATOMIC_FETCH_XOR(BPF_W, BPF_REG_10, BPF_REG_1, -4),
+		/* if (old != 0x110) exit(3); */
+		BPF_JMP32_IMM(BPF_JEQ, BPF_REG_1, 0x110, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 3),
+		BPF_EXIT_INSN(),
+		/* if (val != 0x101) exit(2); */
+		BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_10, -4),
+		BPF_JMP32_IMM(BPF_JEQ, BPF_REG_1, 0x101, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 2),
+		BPF_EXIT_INSN(),
+		/* Check R0 wasn't clobbered (for fear of x86 JIT bug)
+		 * It should be -1 so add 1 to get exit code.
+		 */
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 1),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+},
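The 0x110/0x011 operands shared by the and/or/xor files give a different
result for each operation (0x010, 0x111 and 0x101 respectively), so a
mixed-up opcode in the verifier or a JIT cannot pass unnoticed. For example
(a sketch):

	#include <linux/types.h>

	static int xor_fetch_sketch(void)
	{
		__u64 val = 0x110;
		__u64 old = __sync_fetch_and_xor(&val, 0x011);

		/* 0x110 & 0x011 == 0x010, 0x110 | 0x011 == 0x111,
		 * 0x110 ^ 0x011 == 0x101; here old == 0x110, val == 0x101.
		 */
		return !(old == 0x110 && val == 0x101);
	}
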