
[bpf-next,RFC,V1] selftests/bpf: xdp_hw_metadata clear metadata when -EOPNOTSUPP

Message ID 167482734243.892262.18210955230092032606.stgit@firesoul
State RFC
Delegated to: BPF
Series [bpf-next,RFC,V1] selftests/bpf: xdp_hw_metadata clear metadata when -EOPNOTSUPP

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-36 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-37 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-38 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_progs_parallel on s390x with gcc
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4 this patch: 4
netdev/cc_maintainers warning 10 maintainers not CCed: edumazet@google.com kpsingh@kernel.org jolsa@kernel.org mykolal@fb.com davem@davemloft.net linux-kselftest@vger.kernel.org pabeni@redhat.com shuah@kernel.org haoluo@google.com hawk@kernel.org
netdev/build_clang success Errors and warnings before: 1 this patch: 1
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning WARNING: line length of 83 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on s390x with gcc

Commit Message

Jesper Dangaard Brouer Jan. 27, 2023, 1:49 p.m. UTC
The AF_XDP userspace part of xdp_hw_metadata treats a non-zero value as a
signal that rx_timestamp and rx_hash are available in the data_meta area.
The kernel-side BPF-prog code doesn't initialize these members when the
kernel returns an error, e.g. -EOPNOTSUPP. This memory area is not
guaranteed to be zeroed and can contain garbage/previous values, which
are then read and interpreted by the AF_XDP userspace side.
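To make the failure mode concrete, here is a small userspace-only sketch of the consumer-side convention described above. The struct and helper names are hypothetical (the field names and widths follow the patch); this is not the actual xdp_hw_metadata code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the metadata layout the BPF program writes in
 * front of the packet (field names and widths follow the patch). */
struct xdp_meta_sketch {
	uint64_t rx_timestamp;
	uint32_t rx_hash;
};

#define META_HAS_TIMESTAMP (1u << 0)
#define META_HAS_HASH      (1u << 1)

/* Consumer-side convention from the commit message: a non-zero field
 * means "available". Nothing here can tell a real value apart from
 * stale bytes left over in the data_meta area. */
static unsigned int meta_fields_available(const struct xdp_meta_sketch *meta)
{
	unsigned int mask = 0;

	if (meta->rx_timestamp != 0)
		mask |= META_HAS_TIMESTAMP;
	if (meta->rx_hash != 0)
		mask |= META_HAS_HASH;
	return mask;
}
```

Stale values such as the ones in the ixgbe log below pass this check for both fields. The convention also means a genuine zero hash would be reported as unavailable, which is acceptable for a test tool but worth knowing before copying it.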

Tested this on different drivers. In my experience, the data_meta area
is zeroed for most packets, but occasionally it contains garbage data.

Example of failure tested on ixgbe:
 poll: 1 (0)
 xsk_ring_cons__peek: 1
 0x18ec788: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000
 rx_hash: 3697961069
 rx_timestamp:  9024981991734834796 (sec:9024981991.7348)
 0x18ec788: complete idx=8 addr=8000

Converting to date:
 date -d @9024981991
 2255-12-28T20:26:31 CET
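The conversion is plain integer arithmetic on the nanosecond value: divide by 10^9 for seconds, and the remainder is the fraction (the ".7348" in the log is the leading digits of 734834796 ns). A minimal sketch, assuming a 64-bit time_t so that gmtime() can represent dates past 2038; the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/* Whole seconds of a nanosecond timestamp. */
static uint64_t ns_to_sec(uint64_t ns)
{
	return ns / NSEC_PER_SEC;
}

/* Sub-second remainder, in nanoseconds. */
static uint64_t ns_frac(uint64_t ns)
{
	return ns % NSEC_PER_SEC;
}

/* Calendar year (UTC) for a seconds-since-epoch value.
 * Assumes a 64-bit time_t; returns -1 if gmtime() fails. */
static int utc_year(uint64_t sec)
{
	time_t t = (time_t)sec;
	struct tm *tm = gmtime(&t);

	return tm ? tm->tm_year + 1900 : -1;
}
```

A timestamp more than two centuries in the future is a strong hint that the field was never written by the kfunc.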

I chose a simple fix in this patch: when a kfunc fails or isn't
supported, assign zero to the corresponding struct meta member.
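The same logic can be exercised outside the kernel. In this sketch the two mock functions stand in for the kfuncs on a driver without support: like the default implementations in the patch, they return -EOPNOTSUPP and never write the output. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct xdp_meta_sketch {
	uint64_t rx_timestamp;
	uint32_t rx_hash;
};

/* Hypothetical stand-ins for unsupported kfuncs: fail, output untouched. */
static int mock_rx_timestamp(uint64_t *timestamp)
{
	(void)timestamp;
	return -EOPNOTSUPP;
}

static int mock_rx_hash(uint32_t *hash)
{
	(void)hash;
	return -EOPNOTSUPP;
}

/* Mirrors the patched logic: on any error, write an explicit zero so the
 * consumer sees "not available" instead of whatever was in the buffer. */
static void fill_meta(struct xdp_meta_sketch *meta)
{
	if (mock_rx_timestamp(&meta->rx_timestamp))
		meta->rx_timestamp = 0;
	if (mock_rx_hash(&meta->rx_hash))
		meta->rx_hash = 0;
}
```

Starting from the stale ixgbe values, both fields end up as zero after fill_meta().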

It's up to the individual BPF programmer to do something smarter that
fits their use case, e.g. getting a software timestamp and setting a
flag that indicates the type of timestamp.
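One way that "smarter" variant might look, sketched in plain C. The enum, the extra ts_source field, and the function name are all hypothetical. The software clock value is passed in as a parameter; in a real BPF program it could come from bpf_ktime_get_ns(), which reads CLOCK_MONOTONIC rather than wall-clock time.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag describing where rx_timestamp came from. */
enum ts_source {
	TS_NONE = 0,
	TS_HW   = 1,
	TS_SW   = 2,
};

/* Hypothetical layout with an explicit source field, so zero no longer
 * has to double as the "not available" signal. */
struct xdp_meta_ts {
	uint64_t rx_timestamp;
	uint32_t ts_source; /* holds an enum ts_source value */
};

/* hw_ret/hw_ts model the kfunc result; sw_now_ns models a software
 * clock read used as the fallback. */
static void fill_timestamp(struct xdp_meta_ts *meta, int hw_ret,
			   uint64_t hw_ts, uint64_t sw_now_ns)
{
	if (hw_ret == 0) {
		meta->rx_timestamp = hw_ts;
		meta->ts_source = TS_HW;
	} else {
		meta->rx_timestamp = sw_now_ns;
		meta->ts_source = TS_SW;
	}
}
```

The consumer then checks ts_source instead of testing the timestamp for zero, and knows the two clocks are not directly comparable.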

Another possibility is to change the kfuncs bpf_xdp_metadata_rx_timestamp
and bpf_xdp_metadata_rx_hash so that they are required to clear the
memory pointed to by their return-value pointer.
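A sketch of that alternative, written as a plain userspace C function rather than kernel code. The function name is hypothetical, and the behavior differs from the current default stub, which leaves the output untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical variant of the default (unsupported) kfunc: it zeroes the
 * output before failing, so callers that ignore the return code still
 * read a deterministic value. */
static int default_rx_hash_clearing(uint32_t *hash)
{
	*hash = 0;
	return -EOPNOTSUPP;
}
```

The trade-off is an extra store in every default implementation, and it masks the error for callers that skip the return-code check rather than forcing them to handle it.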

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/xdp.c                                     |    2 ++
 .../testing/selftests/bpf/progs/xdp_hw_metadata.c  |    6 +++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

Comments

Toke Høiland-Jørgensen Jan. 27, 2023, 1:58 p.m. UTC | #1
Jesper Dangaard Brouer <brouer@redhat.com> writes:

> [...]
>
> Another possibility is for the behavior of kfunc's
> bpf_xdp_metadata_rx_timestamp and bpf_xdp_metadata_rx_hash to require
> clearing return value pointer.

I definitely think we should leave it up to the BPF programmer to react
to failures; that's what the return code is there for, after all :)

-Toke
Stanislav Fomichev Jan. 27, 2023, 5:18 p.m. UTC | #2
On Fri, Jan 27, 2023 at 5:58 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>
> > [...]
>
> I definitely think we should leave it up to the BPF programmer to react
> to failures; that's what the return code is there for, after all :)

+1

Maybe we can unconditionally memset(meta, 0, sizeof(*meta)) in
tools/testing/selftests/bpf/progs/xdp_hw_metadata.c?
Since it's not a performance tool, it should be ok functionality-wise.
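For comparison, the memset suggestion could look roughly like this (memset takes the fill value before the size). This is a userspace mock with hypothetical names; in a BPF program the call would typically be __builtin_memset, since there is no libc.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct xdp_meta_sketch {
	uint64_t rx_timestamp;
	uint32_t rx_hash;
};

static int mock_rx_timestamp(uint64_t *timestamp)
{
	(void)timestamp;
	return -EOPNOTSUPP; /* driver without timestamp support */
}

static int mock_rx_hash(uint32_t *hash)
{
	*hash = 0x1234abcd; /* driver with hash support */
	return 0;
}

/* Clear the whole area first; a field whose kfunc fails stays zero. */
static void fill_meta_memset(struct xdp_meta_sketch *meta)
{
	memset(meta, 0, sizeof(*meta));
	(void)mock_rx_timestamp(&meta->rx_timestamp);
	(void)mock_rx_hash(&meta->rx_hash);
}
```

With the area cleared up front, the per-call error branches are no longer needed for this particular consumer.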

> -Toke
>
Jesper Dangaard Brouer Jan. 31, 2023, 1 p.m. UTC | #3
On 27/01/2023 18.18, Stanislav Fomichev wrote:
> On Fri, Jan 27, 2023 at 5:58 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>>
>>> [...]
>>
>> I definitely think we should leave it up to the BPF programmer to react
>> to failures; that's what the return code is there for, after all :)
> 
> +1

+1 I agree.
We should keep these default functions as simple as possible, for a
future "unroll" of BPF-bytecode.

In the -EOPNOTSUPP case (default functions for drivers not implementing
the kfunc), the return code will likely be used at runtime by the
BPF-prog to determine if the hardware has this offload hint, but it
comes with the overhead of a function pointer call.

I hope we can somehow "unroll" these default functions into BPF-bytecode
at BPF-load time, to remove this overhead, and perhaps even enable const
propagation and dead-code elimination on the BPF bytecode?
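A rough userspace model of the runtime probing described above: a hypothetical ops table whose missing entry falls back to a default returning -EOPNOTSUPP, so the caller learns from the return code whether the hint exists. All names are illustrative, and the explicit function-pointer dispatch only models the overhead being discussed; it is not a description of how the kernel resolves these kfuncs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-driver ops table; NULL means "not implemented". */
struct meta_ops_sketch {
	int (*rx_timestamp)(uint64_t *timestamp);
};

/* Default used when the driver has no implementation, modeled on the
 * stub in net/core/xdp.c. */
static int default_rx_timestamp(uint64_t *timestamp)
{
	(void)timestamp;
	return -EOPNOTSUPP;
}

static int hw_rx_timestamp(uint64_t *timestamp)
{
	*timestamp = 42; /* pretend hardware value */
	return 0;
}

/* Runtime dispatch through a function pointer: the cost that resolving
 * the target at load time would remove. */
static int call_rx_timestamp(const struct meta_ops_sketch *ops,
			     uint64_t *timestamp)
{
	int (*fn)(uint64_t *) = ops->rx_timestamp ? ops->rx_timestamp
						  : default_rx_timestamp;

	return fn(timestamp);
}

/* Returns 1 if the device provides an rx timestamp hint. */
static int has_rx_timestamp(const struct meta_ops_sketch *ops)
{
	uint64_t ts = 0;

	return call_rx_timestamp(ops, &ts) != -EOPNOTSUPP;
}
```

If the target were known at load time, a verifier or JIT could in principle fold has_rx_timestamp() to a constant and drop the dead branch, which is the kind of const propagation the paragraph hopes for.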


> Maybe we can unconditionally memset(meta, 0, sizeof(*meta)) in
> tools/testing/selftests/bpf/progs/xdp_hw_metadata.c?
> Since it's not a performance tool, it should be ok functionality-wise.

I know this isn't a performance test, but IMHO always memsetting the
metadata area is a misleading example. We know from experience that
developers simply copy-paste code examples, even quick-n-dirty testing
example code.

The specific issue in this example can lead to hard-to-find bugs, as my
testing shows that the data_meta area only occasionally contains
garbage. We could do a memset, but it deserves a large code comment
explaining why it is needed, so people copy-pasting understand. I chose
the current approach to keep the code close to the code people will
copy-paste.

--Jesper
Stanislav Fomichev Jan. 31, 2023, 7:01 p.m. UTC | #4
On Tue, Jan 31, 2023 at 5:00 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
>
> On 27/01/2023 18.18, Stanislav Fomichev wrote:
> > On Fri, Jan 27, 2023 at 5:58 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Jesper Dangaard Brouer <brouer@redhat.com> writes:
> >>
> >>> [...]
> >>
> >> I definitely think we should leave it up to the BPF programmer to react
> >> to failures; that's what the return code is there for, after all :)
> >
> > +1
>
> +1 I agree.
> We should keep this default functions as simple as possible, for future
> "unroll" of BPF-bytecode.
>
> I the -EOPNOTSUPP case (default functions for drivers not implementing
> kfunc), will likely be used runtime by BPF-prog to determine if the
> hardware have this offload hint, but it comes with the overhead of a
> function pointer call.
>
> I hope we can somehow BPF-bytecode "unroll" these (default functions) at
> BPF-load time, to remove this overhead, and perhaps even let BPF
> bytecode do const propagation and code elimination?
>
>
> > Maybe we can unconditionally memset(meta, 0, sizeof(*meta)) in
> > tools/testing/selftests/bpf/progs/xdp_hw_metadata.c?
> > Since it's not a performance tool, it should be ok functionality-wise.
>
> I know this isn't a performance test, but IMHO always memsetting
> metadata area is a misleading example.  We know from experience that
> developer simply copy-paste code examples, even quick-n-dirty testing
> example code.
>
> The specific issue in this example can lead to hard-to-find bugs, as my
> testing shows it is only occasionally that data_meta area contains
> garbage. We could do a memset, but it deserves a large code comment, why
> this is needed, so people copy-pasting understand. I choose current
> approach to keep code close to code people will copy-paste.

SG, I don't think it matters, but agreed that having this stated
explicitly could help with a blind copy-paste :-)
Then maybe repost with the TODOs removed from the kfuncs? We seem to
agree that it's the user's job to manage the final buffer.

> --Jesper
>

Patch

diff --git a/net/core/xdp.c b/net/core/xdp.c
index a5a7ecf6391c..5ea13554c080 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -724,6 +724,7 @@  __diag_ignore_all("-Wmissing-prototypes",
  */
 int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
 {
+	// XXX: Question: Should we clear mem pointed to by @timestamp ?
 	return -EOPNOTSUPP;
 }
 
@@ -736,6 +737,7 @@  int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
  */
 int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash)
 {
+	// XXX: Question: Should we clear mem pointed to by @hash ?
 	return -EOPNOTSUPP;
 }
 
diff --git a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
index 25b8178735ee..4c55b4d79d3d 100644
--- a/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_hw_metadata.c
@@ -70,10 +70,14 @@  int rx(struct xdp_md *ctx)
 	}
 
 	if (!bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp))
-		bpf_printk("populated rx_timestamp with %u", meta->rx_timestamp);
+		bpf_printk("populated rx_timestamp with %llu", meta->rx_timestamp);
+	else
+		meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
 
 	if (!bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash))
 		bpf_printk("populated rx_hash with %u", meta->rx_hash);
+	else
+		meta->rx_hash = 0; /* Used by AF_XDP as not avail signal */
 
 	return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
 }