mbox series

[bpf-next,v1,0/5] Introduce bpf_packet_pointer helper

Message ID 20220306234311.452206-1-memxor@gmail.com (mailing list archive)
Headers show
Series Introduce bpf_packet_pointer helper | expand

Message

Kumar Kartikeya Dwivedi March 6, 2022, 11:43 p.m. UTC
Expose existing 'bpf_xdp_pointer' as a BPF helper named 'bpf_packet_pointer'
returning a packet pointer with a fixed immutable range. This can be useful to
enable DPA without having to use memcpy (currently the case in
bpf_xdp_load_bytes and bpf_xdp_store_bytes).

The intended usage to read and write data for multi-buff XDP is:

	int err = 0;
	char buf[N];

	off &= 0xffff;
	ptr = bpf_packet_pointer(ctx, off, sizeof(buf), &err);
	if (unlikely(!ptr)) {
		if (err < 0)
			return XDP_ABORTED;
		err = bpf_xdp_load_bytes(ctx, off, buf, sizeof(buf));
		if (err < 0)
			return XDP_ABORTED;
		ptr = buf;
	}
	...
	// Do some stores and loads in [ptr, ptr + N) region
	...
	if (unlikely(ptr == buf)) {
		err = bpf_xdp_store_bytes(ctx, off, buf, sizeof(buf));
		if (err < 0)
			return XDP_ABORTED;
	}

Note that bpf_packet_pointer returns a PTR_TO_PACKET, not PTR_TO_MEM, because
these pointers need to be invalidated on clear_all_pkt_pointers invocation, and
it is also more meaningful to the user to see return value as R0=pkt.

This series is meant to collect feedback on the approach, next version can
include a bpf_skb_pointer and exposing it as bpf_packet_pointer helper for TC
hooks, and explore not resetting range to zero on r0 += rX, instead check access
like check_mem_region_access (var_off + off < range), since there would be no
data_end to compare against and obtain a new range.

The common name and func_id is supposed to allow writing generic code using
bpf_packet_pointer that works for both XDP and TC programs.

Please see the individual patches for implementation details.

Kumar Kartikeya Dwivedi (5):
  bpf: Add ARG_SCALAR and ARG_CONSTANT
  bpf: Introduce pkt_uid concept for PTR_TO_PACKET
  bpf: Introduce bpf_packet_pointer helper to do DPA
  selftests/bpf: Add verifier tests for pkt pointer with pkt_uid
  selftests/bpf: Update xdp_adjust_frags to use bpf_packet_pointer

 include/linux/bpf.h                           |   4 +
 include/linux/bpf_verifier.h                  |   9 +-
 include/uapi/linux/bpf.h                      |  12 ++
 kernel/bpf/verifier.c                         |  97 ++++++++++--
 net/core/filter.c                             |  48 +++---
 tools/include/uapi/linux/bpf.h                |  12 ++
 .../bpf/prog_tests/xdp_adjust_frags.c         |  46 ++++--
 .../bpf/progs/test_xdp_update_frags.c         |  46 ++++--
 tools/testing/selftests/bpf/verifier/xdp.c    | 146 ++++++++++++++++++
 9 files changed, 358 insertions(+), 62 deletions(-)

Comments

Andrii Nakryiko March 8, 2022, 5:48 a.m. UTC | #1
On Sun, Mar 6, 2022 at 3:43 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> Expose existing 'bpf_xdp_pointer' as a BPF helper named 'bpf_packet_pointer'
> returning a packet pointer with a fixed immutable range. This can be useful to
> enable DPA without having to use memcpy (currently the case in
> bpf_xdp_load_bytes and bpf_xdp_store_bytes).
>
> The intended usage to read and write data for multi-buff XDP is:
>
>         int err = 0;
>         char buf[N];
>
>         off &= 0xffff;
>         ptr = bpf_packet_pointer(ctx, off, sizeof(buf), &err);
>         if (unlikely(!ptr)) {
>                 if (err < 0)
>                         return XDP_ABORTED;
>                 err = bpf_xdp_load_bytes(ctx, off, buf, sizeof(buf));
>                 if (err < 0)
>                         return XDP_ABORTED;
>                 ptr = buf;
>         }
>         ...
>         // Do some stores and loads in [ptr, ptr + N) region
>         ...
>         if (unlikely(ptr == buf)) {
>                 err = bpf_xdp_store_bytes(ctx, off, buf, sizeof(buf));
>                 if (err < 0)
>                         return XDP_ABORTED;
>         }
>
> Note that bpf_packet_pointer returns a PTR_TO_PACKET, not PTR_TO_MEM, because
> these pointers need to be invalidated on clear_all_pkt_pointers invocation, and
> it is also more meaningful to the user to see return value as R0=pkt.
>
> This series is meant to collect feedback on the approach, next version can
> include a bpf_skb_pointer and exposing it as bpf_packet_pointer helper for TC
> hooks, and explore not resetting range to zero on r0 += rX, instead check access
> like check_mem_region_access (var_off + off < range), since there would be no
> data_end to compare against and obtain a new range.
>
> The common name and func_id is supposed to allow writing generic code using
> bpf_packet_pointer that works for both XDP and TC programs.
>
> Please see the individual patches for implementation details.
>

Joanne is working on a "bpf_dynptr" framework that will support
exactly this feature, in addition to working with dynamically
allocated memory, working with memory of statically unknown size (but
safe and checked at runtime), etc. And all that within a generic
common feature implemented uniformly within the verifier. E.g., it
won't need any of the custom bits of logic added in patch #2 and #3.
So I'm thinking that instead of custom-implementing a partial case of
bpf_dynptr just for skb and xdp packets, let's maybe wait for dynptr
and do it only once there?

See also my ARG_CONSTANT comment. It seems like a pretty common thing
where input constant is used to characterize some pointer returned
from the helper (e.g., bpf_ringbuf_reserve() case), and we'll need
that for bpf_dynptr for exactly this "give me direct access of N
bytes, if possible" case. So improving/generalizing it now before
dynptr lands makes a lot of sense, outside of bpf_packet_pointer()
feature itself.

> Kumar Kartikeya Dwivedi (5):
>   bpf: Add ARG_SCALAR and ARG_CONSTANT
>   bpf: Introduce pkt_uid concept for PTR_TO_PACKET
>   bpf: Introduce bpf_packet_pointer helper to do DPA
>   selftests/bpf: Add verifier tests for pkt pointer with pkt_uid
>   selftests/bpf: Update xdp_adjust_frags to use bpf_packet_pointer
>
>  include/linux/bpf.h                           |   4 +
>  include/linux/bpf_verifier.h                  |   9 +-
>  include/uapi/linux/bpf.h                      |  12 ++
>  kernel/bpf/verifier.c                         |  97 ++++++++++--
>  net/core/filter.c                             |  48 +++---
>  tools/include/uapi/linux/bpf.h                |  12 ++
>  .../bpf/prog_tests/xdp_adjust_frags.c         |  46 ++++--
>  .../bpf/progs/test_xdp_update_frags.c         |  46 ++++--
>  tools/testing/selftests/bpf/verifier/xdp.c    | 146 ++++++++++++++++++
>  9 files changed, 358 insertions(+), 62 deletions(-)
>
> --
> 2.35.1
>
Kumar Kartikeya Dwivedi March 8, 2022, 7:08 a.m. UTC | #2
On Tue, Mar 08, 2022 at 11:18:52AM IST, Andrii Nakryiko wrote:
> On Sun, Mar 6, 2022 at 3:43 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > Expose existing 'bpf_xdp_pointer' as a BPF helper named 'bpf_packet_pointer'
> > returning a packet pointer with a fixed immutable range. This can be useful to
> > enable DPA without having to use memcpy (currently the case in
> > bpf_xdp_load_bytes and bpf_xdp_store_bytes).
> >
> > The intended usage to read and write data for multi-buff XDP is:
> >
> >         int err = 0;
> >         char buf[N];
> >
> >         off &= 0xffff;
> >         ptr = bpf_packet_pointer(ctx, off, sizeof(buf), &err);
> >         if (unlikely(!ptr)) {
> >                 if (err < 0)
> >                         return XDP_ABORTED;
> >                 err = bpf_xdp_load_bytes(ctx, off, buf, sizeof(buf));
> >                 if (err < 0)
> >                         return XDP_ABORTED;
> >                 ptr = buf;
> >         }
> >         ...
> >         // Do some stores and loads in [ptr, ptr + N) region
> >         ...
> >         if (unlikely(ptr == buf)) {
> >                 err = bpf_xdp_store_bytes(ctx, off, buf, sizeof(buf));
> >                 if (err < 0)
> >                         return XDP_ABORTED;
> >         }
> >
> > Note that bpf_packet_pointer returns a PTR_TO_PACKET, not PTR_TO_MEM, because
> > these pointers need to be invalidated on clear_all_pkt_pointers invocation, and
> > it is also more meaningful to the user to see return value as R0=pkt.
> >
> > This series is meant to collect feedback on the approach, next version can
> > include a bpf_skb_pointer and exposing it as bpf_packet_pointer helper for TC
> > hooks, and explore not resetting range to zero on r0 += rX, instead check access
> > like check_mem_region_access (var_off + off < range), since there would be no
> > data_end to compare against and obtain a new range.
> >
> > The common name and func_id is supposed to allow writing generic code using
> > bpf_packet_pointer that works for both XDP and TC programs.
> >
> > Please see the individual patches for implementation details.
> >
>
> Joanne is working on a "bpf_dynptr" framework that will support
> exactly this feature, in addition to working with dynamically
> allocated memory, working with memory of statically unknown size (but
> safe and checked at runtime), etc. And all that within a generic
> common feature implemented uniformly within the verifier. E.g., it
> won't need any of the custom bits of logic added in patch #2 and #3.
> So I'm thinking that instead of custom-implementing a partial case of
> bpf_dynptr just for skb and xdp packets, let's maybe wait for dynptr
> and do it only once there?
>

Interesting stuff, looking forward to it.

> See also my ARG_CONSTANT comment. It seems like a pretty common thing
> where input constant is used to characterize some pointer returned
> from the helper (e.g., bpf_ringbuf_reserve() case), and we'll need
> that for bpf_dynptr for exactly this "give me direct access of N
> bytes, if possible" case. So improving/generalizing it now before
> dynptr lands makes a lot of sense, outside of bpf_packet_pointer()
> feature itself.

No worries, we can continue the discussion in patch 1, I'll split out the arg
changes into a separate patch, and wait for dynptr to be posted before reworking
this.

>
> > Kumar Kartikeya Dwivedi (5):
> >   bpf: Add ARG_SCALAR and ARG_CONSTANT
> >   bpf: Introduce pkt_uid concept for PTR_TO_PACKET
> >   bpf: Introduce bpf_packet_pointer helper to do DPA
> >   selftests/bpf: Add verifier tests for pkt pointer with pkt_uid
> >   selftests/bpf: Update xdp_adjust_frags to use bpf_packet_pointer
> >
> >  include/linux/bpf.h                           |   4 +
> >  include/linux/bpf_verifier.h                  |   9 +-
> >  include/uapi/linux/bpf.h                      |  12 ++
> >  kernel/bpf/verifier.c                         |  97 ++++++++++--
> >  net/core/filter.c                             |  48 +++---
> >  tools/include/uapi/linux/bpf.h                |  12 ++
> >  .../bpf/prog_tests/xdp_adjust_frags.c         |  46 ++++--
> >  .../bpf/progs/test_xdp_update_frags.c         |  46 ++++--
> >  tools/testing/selftests/bpf/verifier/xdp.c    | 146 ++++++++++++++++++
> >  9 files changed, 358 insertions(+), 62 deletions(-)
> >
> > --
> > 2.35.1
> >

--
Kartikeya
Toke Høiland-Jørgensen March 8, 2022, 1:40 p.m. UTC | #3
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:

> On Tue, Mar 08, 2022 at 11:18:52AM IST, Andrii Nakryiko wrote:
>> On Sun, Mar 6, 2022 at 3:43 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>> >
>> > Expose existing 'bpf_xdp_pointer' as a BPF helper named 'bpf_packet_pointer'
>> > returning a packet pointer with a fixed immutable range. This can be useful to
>> > enable DPA without having to use memcpy (currently the case in
>> > bpf_xdp_load_bytes and bpf_xdp_store_bytes).
>> >
>> > The intended usage to read and write data for multi-buff XDP is:
>> >
>> >         int err = 0;
>> >         char buf[N];
>> >
>> >         off &= 0xffff;
>> >         ptr = bpf_packet_pointer(ctx, off, sizeof(buf), &err);
>> >         if (unlikely(!ptr)) {
>> >                 if (err < 0)
>> >                         return XDP_ABORTED;
>> >                 err = bpf_xdp_load_bytes(ctx, off, buf, sizeof(buf));
>> >                 if (err < 0)
>> >                         return XDP_ABORTED;
>> >                 ptr = buf;
>> >         }
>> >         ...
>> >         // Do some stores and loads in [ptr, ptr + N) region
>> >         ...
>> >         if (unlikely(ptr == buf)) {
>> >                 err = bpf_xdp_store_bytes(ctx, off, buf, sizeof(buf));
>> >                 if (err < 0)
>> >                         return XDP_ABORTED;
>> >         }
>> >
>> > Note that bpf_packet_pointer returns a PTR_TO_PACKET, not PTR_TO_MEM, because
>> > these pointers need to be invalidated on clear_all_pkt_pointers invocation, and
>> > it is also more meaningful to the user to see return value as R0=pkt.
>> >
>> > This series is meant to collect feedback on the approach, next version can
>> > include a bpf_skb_pointer and exposing it as bpf_packet_pointer helper for TC
>> > hooks, and explore not resetting range to zero on r0 += rX, instead check access
>> > like check_mem_region_access (var_off + off < range), since there would be no
>> > data_end to compare against and obtain a new range.
>> >
>> > The common name and func_id is supposed to allow writing generic code using
>> > bpf_packet_pointer that works for both XDP and TC programs.
>> >
>> > Please see the individual patches for implementation details.
>> >
>>
>> Joanne is working on a "bpf_dynptr" framework that will support
>> exactly this feature, in addition to working with dynamically
>> allocated memory, working with memory of statically unknown size (but
>> safe and checked at runtime), etc. And all that within a generic
>> common feature implemented uniformly within the verifier. E.g., it
>> won't need any of the custom bits of logic added in patch #2 and #3.
>> So I'm thinking that instead of custom-implementing a partial case of
>> bpf_dynptr just for skb and xdp packets, let's maybe wait for dynptr
>> and do it only once there?
>>
>
> Interesting stuff, looking forward to it.
>
>> See also my ARG_CONSTANT comment. It seems like a pretty common thing
>> where input constant is used to characterize some pointer returned
>> from the helper (e.g., bpf_ringbuf_reserve() case), and we'll need
>> that for bpf_dynptr for exactly this "give me direct access of N
>> bytes, if possible" case. So improving/generalizing it now before
>> dynptr lands makes a lot of sense, outside of bpf_packet_pointer()
>> feature itself.
>
> No worries, we can continue the discussion in patch 1, I'll split out the arg
> changes into a separate patch, and wait for dynptr to be posted before reworking
> this.

This does raise the question of what we do in the meantime, though? Your
patch includes a change to bpf_xdp_{load,store}_bytes() which, if we're
making it, really has to go in before those hit a release and become
UAPI.

One option would be to still make the change to those other helpers;
they'd become a bit slower, but if we have a solution for that coming,
that may be OK for a single release? WDYT?

-Toke
Andrii Nakryiko March 10, 2022, 11:12 p.m. UTC | #4
On Tue, Mar 8, 2022 at 5:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
>
> > On Tue, Mar 08, 2022 at 11:18:52AM IST, Andrii Nakryiko wrote:
> >> On Sun, Mar 6, 2022 at 3:43 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >> >
> >> > Expose existing 'bpf_xdp_pointer' as a BPF helper named 'bpf_packet_pointer'
> >> > returning a packet pointer with a fixed immutable range. This can be useful to
> >> > enable DPA without having to use memcpy (currently the case in
> >> > bpf_xdp_load_bytes and bpf_xdp_store_bytes).
> >> >
> >> > The intended usage to read and write data for multi-buff XDP is:
> >> >
> >> >         int err = 0;
> >> >         char buf[N];
> >> >
> >> >         off &= 0xffff;
> >> >         ptr = bpf_packet_pointer(ctx, off, sizeof(buf), &err);
> >> >         if (unlikely(!ptr)) {
> >> >                 if (err < 0)
> >> >                         return XDP_ABORTED;
> >> >                 err = bpf_xdp_load_bytes(ctx, off, buf, sizeof(buf));
> >> >                 if (err < 0)
> >> >                         return XDP_ABORTED;
> >> >                 ptr = buf;
> >> >         }
> >> >         ...
> >> >         // Do some stores and loads in [ptr, ptr + N) region
> >> >         ...
> >> >         if (unlikely(ptr == buf)) {
> >> >                 err = bpf_xdp_store_bytes(ctx, off, buf, sizeof(buf));
> >> >                 if (err < 0)
> >> >                         return XDP_ABORTED;
> >> >         }
> >> >
> >> > Note that bpf_packet_pointer returns a PTR_TO_PACKET, not PTR_TO_MEM, because
> >> > these pointers need to be invalidated on clear_all_pkt_pointers invocation, and
> >> > it is also more meaningful to the user to see return value as R0=pkt.
> >> >
> >> > This series is meant to collect feedback on the approach, next version can
> >> > include a bpf_skb_pointer and exposing it as bpf_packet_pointer helper for TC
> >> > hooks, and explore not resetting range to zero on r0 += rX, instead check access
> >> > like check_mem_region_access (var_off + off < range), since there would be no
> >> > data_end to compare against and obtain a new range.
> >> >
> >> > The common name and func_id is supposed to allow writing generic code using
> >> > bpf_packet_pointer that works for both XDP and TC programs.
> >> >
> >> > Please see the individual patches for implementation details.
> >> >
> >>
> >> Joanne is working on a "bpf_dynptr" framework that will support
> >> exactly this feature, in addition to working with dynamically
> >> allocated memory, working with memory of statically unknown size (but
> >> safe and checked at runtime), etc. And all that within a generic
> >> common feature implemented uniformly within the verifier. E.g., it
> >> won't need any of the custom bits of logic added in patch #2 and #3.
> >> So I'm thinking that instead of custom-implementing a partial case of
> >> bpf_dynptr just for skb and xdp packets, let's maybe wait for dynptr
> >> and do it only once there?
> >>
> >
> > Interesting stuff, looking forward to it.
> >
> >> See also my ARG_CONSTANT comment. It seems like a pretty common thing
> >> where input constant is used to characterize some pointer returned
> >> from the helper (e.g., bpf_ringbuf_reserve() case), and we'll need
> >> that for bpf_dynptr for exactly this "give me direct access of N
> >> bytes, if possible" case. So improving/generalizing it now before
> >> dynptr lands makes a lot of sense, outside of bpf_packet_pointer()
> >> feature itself.
> >
> > No worries, we can continue the discussion in patch 1, I'll split out the arg
> > changes into a separate patch, and wait for dynptr to be posted before reworking
> > this.
>
> This does raise the question of what we do in the meantime, though? Your
> patch includes a change to bpf_xdp_{load,store}_bytes() which, if we're
> making it, really has to go in before those hit a release and become
> UAPI.
>
> One option would be to still make the change to those other helpers;
> they'd become a bit slower, but if we have a solution for that coming,
> that may be OK for a single release? WDYT?

I must have missed important changes to bpf_xdp_{load,store}_bytes().
Does anything change about its behavior? If there are some fixes
specific to those helpers, we should fix them as well as a separate
patch. My main objection is adding a bpf_packet_pointer() special case
when we have a generic mechanism in the works that will come this use
case (among other use cases).

>
> -Toke
>
Toke Høiland-Jørgensen March 10, 2022, 11:30 p.m. UTC | #5
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Mar 8, 2022 at 5:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
>>
>> > On Tue, Mar 08, 2022 at 11:18:52AM IST, Andrii Nakryiko wrote:
>> >> On Sun, Mar 6, 2022 at 3:43 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>> >> >
>> >> > Expose existing 'bpf_xdp_pointer' as a BPF helper named 'bpf_packet_pointer'
>> >> > returning a packet pointer with a fixed immutable range. This can be useful to
>> >> > enable DPA without having to use memcpy (currently the case in
>> >> > bpf_xdp_load_bytes and bpf_xdp_store_bytes).
>> >> >
>> >> > The intended usage to read and write data for multi-buff XDP is:
>> >> >
>> >> >         int err = 0;
>> >> >         char buf[N];
>> >> >
>> >> >         off &= 0xffff;
>> >> >         ptr = bpf_packet_pointer(ctx, off, sizeof(buf), &err);
>> >> >         if (unlikely(!ptr)) {
>> >> >                 if (err < 0)
>> >> >                         return XDP_ABORTED;
>> >> >                 err = bpf_xdp_load_bytes(ctx, off, buf, sizeof(buf));
>> >> >                 if (err < 0)
>> >> >                         return XDP_ABORTED;
>> >> >                 ptr = buf;
>> >> >         }
>> >> >         ...
>> >> >         // Do some stores and loads in [ptr, ptr + N) region
>> >> >         ...
>> >> >         if (unlikely(ptr == buf)) {
>> >> >                 err = bpf_xdp_store_bytes(ctx, off, buf, sizeof(buf));
>> >> >                 if (err < 0)
>> >> >                         return XDP_ABORTED;
>> >> >         }
>> >> >
>> >> > Note that bpf_packet_pointer returns a PTR_TO_PACKET, not PTR_TO_MEM, because
>> >> > these pointers need to be invalidated on clear_all_pkt_pointers invocation, and
>> >> > it is also more meaningful to the user to see return value as R0=pkt.
>> >> >
>> >> > This series is meant to collect feedback on the approach, next version can
>> >> > include a bpf_skb_pointer and exposing it as bpf_packet_pointer helper for TC
>> >> > hooks, and explore not resetting range to zero on r0 += rX, instead check access
>> >> > like check_mem_region_access (var_off + off < range), since there would be no
>> >> > data_end to compare against and obtain a new range.
>> >> >
>> >> > The common name and func_id is supposed to allow writing generic code using
>> >> > bpf_packet_pointer that works for both XDP and TC programs.
>> >> >
>> >> > Please see the individual patches for implementation details.
>> >> >
>> >>
>> >> Joanne is working on a "bpf_dynptr" framework that will support
>> >> exactly this feature, in addition to working with dynamically
>> >> allocated memory, working with memory of statically unknown size (but
>> >> safe and checked at runtime), etc. And all that within a generic
>> >> common feature implemented uniformly within the verifier. E.g., it
>> >> won't need any of the custom bits of logic added in patch #2 and #3.
>> >> So I'm thinking that instead of custom-implementing a partial case of
>> >> bpf_dynptr just for skb and xdp packets, let's maybe wait for dynptr
>> >> and do it only once there?
>> >>
>> >
>> > Interesting stuff, looking forward to it.
>> >
>> >> See also my ARG_CONSTANT comment. It seems like a pretty common thing
>> >> where input constant is used to characterize some pointer returned
>> >> from the helper (e.g., bpf_ringbuf_reserve() case), and we'll need
>> >> that for bpf_dynptr for exactly this "give me direct access of N
>> >> bytes, if possible" case. So improving/generalizing it now before
>> >> dynptr lands makes a lot of sense, outside of bpf_packet_pointer()
>> >> feature itself.
>> >
>> > No worries, we can continue the discussion in patch 1, I'll split out the arg
>> > changes into a separate patch, and wait for dynptr to be posted before reworking
>> > this.
>>
>> This does raise the question of what we do in the meantime, though? Your
>> patch includes a change to bpf_xdp_{load,store}_bytes() which, if we're
>> making it, really has to go in before those hit a release and become
>> UAPI.
>>
>> One option would be to still make the change to those other helpers;
>> they'd become a bit slower, but if we have a solution for that coming,
>> that may be OK for a single release? WDYT?
>
> I must have missed important changes to bpf_xdp_{load,store}_bytes().
> Does anything change about its behavior? If there are some fixes
> specific to those helpers, we should fix them as well as a separate
> patch. My main objection is adding a bpf_packet_pointer() special case
> when we have a generic mechanism in the works that will come this use
> case (among other use cases).

Well it's not a functional change per se, but Kartikeya's patch is
removing an optimisation from bpf_xdp_{load_store}_bytes() (i.e., the
use of the bpf_xdp_pointer()) in favour of making it available directly
to BPF. So if we don't do that change before those helpers are
finalised, we will end up either introducing a performance regression
for code using those helpers, or being stuck with the bpf_xdp_pointer()
use inside them even though it makes more sense to move it out to BPF.

So the "safe" thing to do would do the change to the store/load helpers
now, and get rid of the bpf_xdp_pointer() entirely until it can be
introduced as a BPF helper in a generic way. Of course this depends on
whether you consider performance regressions to be something to avoid,
but this being XDP IMO we should :)

-Toke
Alexei Starovoitov March 10, 2022, 11:35 p.m. UTC | #6
On Thu, Mar 10, 2022 at 3:30 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Tue, Mar 8, 2022 at 5:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> >>
> >> > On Tue, Mar 08, 2022 at 11:18:52AM IST, Andrii Nakryiko wrote:
> >> >> On Sun, Mar 6, 2022 at 3:43 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >> >> >
> >> >> > Expose existing 'bpf_xdp_pointer' as a BPF helper named 'bpf_packet_pointer'
> >> >> > returning a packet pointer with a fixed immutable range. This can be useful to
> >> >> > enable DPA without having to use memcpy (currently the case in
> >> >> > bpf_xdp_load_bytes and bpf_xdp_store_bytes).
> >> >> >
> >> >> > The intended usage to read and write data for multi-buff XDP is:
> >> >> >
> >> >> >         int err = 0;
> >> >> >         char buf[N];
> >> >> >
> >> >> >         off &= 0xffff;
> >> >> >         ptr = bpf_packet_pointer(ctx, off, sizeof(buf), &err);
> >> >> >         if (unlikely(!ptr)) {
> >> >> >                 if (err < 0)
> >> >> >                         return XDP_ABORTED;
> >> >> >                 err = bpf_xdp_load_bytes(ctx, off, buf, sizeof(buf));
> >> >> >                 if (err < 0)
> >> >> >                         return XDP_ABORTED;
> >> >> >                 ptr = buf;
> >> >> >         }
> >> >> >         ...
> >> >> >         // Do some stores and loads in [ptr, ptr + N) region
> >> >> >         ...
> >> >> >         if (unlikely(ptr == buf)) {
> >> >> >                 err = bpf_xdp_store_bytes(ctx, off, buf, sizeof(buf));
> >> >> >                 if (err < 0)
> >> >> >                         return XDP_ABORTED;
> >> >> >         }
> >> >> >
> >> >> > Note that bpf_packet_pointer returns a PTR_TO_PACKET, not PTR_TO_MEM, because
> >> >> > these pointers need to be invalidated on clear_all_pkt_pointers invocation, and
> >> >> > it is also more meaningful to the user to see return value as R0=pkt.
> >> >> >
> >> >> > This series is meant to collect feedback on the approach, next version can
> >> >> > include a bpf_skb_pointer and exposing it as bpf_packet_pointer helper for TC
> >> >> > hooks, and explore not resetting range to zero on r0 += rX, instead check access
> >> >> > like check_mem_region_access (var_off + off < range), since there would be no
> >> >> > data_end to compare against and obtain a new range.
> >> >> >
> >> >> > The common name and func_id is supposed to allow writing generic code using
> >> >> > bpf_packet_pointer that works for both XDP and TC programs.
> >> >> >
> >> >> > Please see the individual patches for implementation details.
> >> >> >
> >> >>
> >> >> Joanne is working on a "bpf_dynptr" framework that will support
> >> >> exactly this feature, in addition to working with dynamically
> >> >> allocated memory, working with memory of statically unknown size (but
> >> >> safe and checked at runtime), etc. And all that within a generic
> >> >> common feature implemented uniformly within the verifier. E.g., it
> >> >> won't need any of the custom bits of logic added in patch #2 and #3.
> >> >> So I'm thinking that instead of custom-implementing a partial case of
> >> >> bpf_dynptr just for skb and xdp packets, let's maybe wait for dynptr
> >> >> and do it only once there?
> >> >>
> >> >
> >> > Interesting stuff, looking forward to it.
> >> >
> >> >> See also my ARG_CONSTANT comment. It seems like a pretty common thing
> >> >> where input constant is used to characterize some pointer returned
> >> >> from the helper (e.g., bpf_ringbuf_reserve() case), and we'll need
> >> >> that for bpf_dynptr for exactly this "give me direct access of N
> >> >> bytes, if possible" case. So improving/generalizing it now before
> >> >> dynptr lands makes a lot of sense, outside of bpf_packet_pointer()
> >> >> feature itself.
> >> >
> >> > No worries, we can continue the discussion in patch 1, I'll split out the arg
> >> > changes into a separate patch, and wait for dynptr to be posted before reworking
> >> > this.
> >>
> >> This does raise the question of what we do in the meantime, though? Your
> >> patch includes a change to bpf_xdp_{load,store}_bytes() which, if we're
> >> making it, really has to go in before those hit a release and become
> >> UAPI.
> >>
> >> One option would be to still make the change to those other helpers;
> >> they'd become a bit slower, but if we have a solution for that coming,
> >> that may be OK for a single release? WDYT?
> >
> > I must have missed important changes to bpf_xdp_{load,store}_bytes().
> > Does anything change about its behavior? If there are some fixes
> > specific to those helpers, we should fix them as well as a separate
> > patch. My main objection is adding a bpf_packet_pointer() special case
> > when we have a generic mechanism in the works that will come this use
> > case (among other use cases).
>
> Well it's not a functional change per se, but Kartikeya's patch is
> removing an optimisation from bpf_xdp_{load_store}_bytes() (i.e., the
> use of the bpf_xdp_pointer()) in favour of making it available directly
> to BPF. So if we don't do that change before those helpers are
> finalised, we will end up either introducing a performance regression
> for code using those helpers, or being stuck with the bpf_xdp_pointer()
> use inside them even though it makes more sense to move it out to BPF.
>
> So the "safe" thing to do would do the change to the store/load helpers
> now, and get rid of the bpf_xdp_pointer() entirely until it can be
> introduced as a BPF helper in a generic way. Of course this depends on
> whether you consider performance regressions to be something to avoid,
> but this being XDP IMO we should :)

I don't follow this logic.
Would you mean by "get rid of the bpf_xdp_pointer" ?
It's just an internal static function.

Also I don't believe that this patch set and exposing
bpf_xdp_pointer as a helper actually gives measurable performance wins.
It looks quirky to me and hard to use.
Kumar Kartikeya Dwivedi March 11, 2022, 7:19 a.m. UTC | #7
On Fri, Mar 11, 2022 at 05:00:42AM IST, Toke Høiland-Jørgensen wrote:
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Tue, Mar 8, 2022 at 5:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> >>
> >> > On Tue, Mar 08, 2022 at 11:18:52AM IST, Andrii Nakryiko wrote:
> >> >> On Sun, Mar 6, 2022 at 3:43 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >> >> >
> >> >> > Expose existing 'bpf_xdp_pointer' as a BPF helper named 'bpf_packet_pointer'
> >> >> > returning a packet pointer with a fixed immutable range. This can be useful to
> >> >> > enable DPA without having to use memcpy (currently the case in
> >> >> > bpf_xdp_load_bytes and bpf_xdp_store_bytes).
> >> >> >
> >> >> > The intended usage to read and write data for multi-buff XDP is:
> >> >> >
> >> >> >         int err = 0;
> >> >> >         char buf[N];
> >> >> >
> >> >> >         off &= 0xffff;
> >> >> >         ptr = bpf_packet_pointer(ctx, off, sizeof(buf), &err);
> >> >> >         if (unlikely(!ptr)) {
> >> >> >                 if (err < 0)
> >> >> >                         return XDP_ABORTED;
> >> >> >                 err = bpf_xdp_load_bytes(ctx, off, buf, sizeof(buf));
> >> >> >                 if (err < 0)
> >> >> >                         return XDP_ABORTED;
> >> >> >                 ptr = buf;
> >> >> >         }
> >> >> >         ...
> >> >> >         // Do some stores and loads in [ptr, ptr + N) region
> >> >> >         ...
> >> >> >         if (unlikely(ptr == buf)) {
> >> >> >                 err = bpf_xdp_store_bytes(ctx, off, buf, sizeof(buf));
> >> >> >                 if (err < 0)
> >> >> >                         return XDP_ABORTED;
> >> >> >         }
> >> >> >
> >> >> > Note that bpf_packet_pointer returns a PTR_TO_PACKET, not PTR_TO_MEM, because
> >> >> > these pointers need to be invalidated on clear_all_pkt_pointers invocation, and
> >> >> > it is also more meaningful to the user to see return value as R0=pkt.
> >> >> >
> >> >> > This series is meant to collect feedback on the approach, next version can
> >> >> > include a bpf_skb_pointer and exposing it as bpf_packet_pointer helper for TC
> >> >> > hooks, and explore not resetting range to zero on r0 += rX, instead check access
> >> >> > like check_mem_region_access (var_off + off < range), since there would be no
> >> >> > data_end to compare against and obtain a new range.
> >> >> >
> >> >> > The common name and func_id is supposed to allow writing generic code using
> >> >> > bpf_packet_pointer that works for both XDP and TC programs.
> >> >> >
> >> >> > Please see the individual patches for implementation details.
> >> >> >
> >> >>
> >> >> Joanne is working on a "bpf_dynptr" framework that will support
> >> >> exactly this feature, in addition to working with dynamically
> >> >> allocated memory, working with memory of statically unknown size (but
> >> >> safe and checked at runtime), etc. And all that within a generic
> >> >> common feature implemented uniformly within the verifier. E.g., it
> >> >> won't need any of the custom bits of logic added in patch #2 and #3.
> >> >> So I'm thinking that instead of custom-implementing a partial case of
> >> >> bpf_dynptr just for skb and xdp packets, let's maybe wait for dynptr
> >> >> and do it only once there?
> >> >>
> >> >
> >> > Interesting stuff, looking forward to it.
> >> >
> >> >> See also my ARG_CONSTANT comment. It seems like a pretty common thing
> >> >> where input constant is used to characterize some pointer returned
> >> >> from the helper (e.g., bpf_ringbuf_reserve() case), and we'll need
> >> >> that for bpf_dynptr for exactly this "give me direct access of N
> >> >> bytes, if possible" case. So improving/generalizing it now before
> >> >> dynptr lands makes a lot of sense, outside of bpf_packet_pointer()
> >> >> feature itself.
> >> >
> >> > No worries, we can continue the discussion in patch 1, I'll split out the arg
> >> > changes into a separate patch, and wait for dynptr to be posted before reworking
> >> > this.
> >>
> >> This does raise the question of what we do in the meantime, though? Your
> >> patch includes a change to bpf_xdp_{load,store}_bytes() which, if we're
> >> making it, really has to go in before those hit a release and become
> >> UAPI.
> >>
> >> One option would be to still make the change to those other helpers;
> >> they'd become a bit slower, but if we have a solution for that coming,
> >> that may be OK for a single release? WDYT?
> >
> > I must have missed important changes to bpf_xdp_{load,store}_bytes().
> > Does anything change about its behavior? If there are some fixes
> > specific to those helpers, we should fix them as well as a separate
> > patch. My main objection is adding a bpf_packet_pointer() special case
> > when we have a generic mechanism in the works that will come this use
> > case (among other use cases).
>
> Well it's not a functional change per se, but Kartikeya's patch is
> removing an optimisation from bpf_xdp_{load_store}_bytes() (i.e., the
> use of the bpf_xdp_pointer()) in favour of making it available directly
> to BPF. So if we don't do that change before those helpers are
> finalised, we will end up either introducing a performance regression
> for code using those helpers, or being stuck with the bpf_xdp_pointer()
> use inside them even though it makes more sense to move it out to BPF.
>

So IIUC, the case we're worried about is when a linear region is in head or a
frag and bpf_xdp_pointer can be used to do a direct memcpy for it. But in my
testing there doesn't seem to be any difference. With or without the call, the
time taken e.g. for bpf_xdp_load_bytes lies in the 30-40ns range. It would make
sense, because for this case the code in bpf_xdp_pointer and bpf_xdp_copy_buf
are almost the same, just that the latter has a conditional jump out of the loop
based on len. bpf_xdp_copy_buf is still only doing a single memcpy, the cost
seems to be dominated by that.

Otoh, removing it would improve the case for the other scenario (when region
touches two or more frags) because we wouldn't spend time in bpf_xdp_pointer and
returning NULL from it failing to find a linear region, but that shouldn't be a
regression.

Please let me know if I missed something.

> So the "safe" thing to do would do the change to the store/load helpers
> now, and get rid of the bpf_xdp_pointer() entirely until it can be
> introduced as a BPF helper in a generic way. Of course this depends on
> whether you consider performance regressions to be something to avoid,
> but this being XDP IMO we should :)
>
> -Toke
>

--
Kartikeya
Kumar Kartikeya Dwivedi March 11, 2022, 7:25 a.m. UTC | #8
On Fri, Mar 11, 2022 at 05:05:24AM IST, Alexei Starovoitov wrote:
> On Thu, Mar 10, 2022 at 3:30 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >
> > > On Tue, Mar 8, 2022 at 5:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >>
> > >> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> > >>
> > >> > On Tue, Mar 08, 2022 at 11:18:52AM IST, Andrii Nakryiko wrote:
> > >> >> On Sun, Mar 6, 2022 at 3:43 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > >> >> >
> > >> >> > Expose existing 'bpf_xdp_pointer' as a BPF helper named 'bpf_packet_pointer'
> > >> >> > returning a packet pointer with a fixed immutable range. This can be useful to
> > >> >> > enable DPA without having to use memcpy (currently the case in
> > >> >> > bpf_xdp_load_bytes and bpf_xdp_store_bytes).
> > >> >> >
> > >> >> > The intended usage to read and write data for multi-buff XDP is:
> > >> >> >
> > >> >> >         int err = 0;
> > >> >> >         char buf[N];
> > >> >> >
> > >> >> >         off &= 0xffff;
> > >> >> >         ptr = bpf_packet_pointer(ctx, off, sizeof(buf), &err);
> > >> >> >         if (unlikely(!ptr)) {
> > >> >> >                 if (err < 0)
> > >> >> >                         return XDP_ABORTED;
> > >> >> >                 err = bpf_xdp_load_bytes(ctx, off, buf, sizeof(buf));
> > >> >> >                 if (err < 0)
> > >> >> >                         return XDP_ABORTED;
> > >> >> >                 ptr = buf;
> > >> >> >         }
> > >> >> >         ...
> > >> >> >         // Do some stores and loads in [ptr, ptr + N) region
> > >> >> >         ...
> > >> >> >         if (unlikely(ptr == buf)) {
> > >> >> >                 err = bpf_xdp_store_bytes(ctx, off, buf, sizeof(buf));
> > >> >> >                 if (err < 0)
> > >> >> >                         return XDP_ABORTED;
> > >> >> >         }
> > >> >> >
> > >> >> > Note that bpf_packet_pointer returns a PTR_TO_PACKET, not PTR_TO_MEM, because
> > >> >> > these pointers need to be invalidated on clear_all_pkt_pointers invocation, and
> > >> >> > it is also more meaningful to the user to see return value as R0=pkt.
> > >> >> >
> > >> >> > This series is meant to collect feedback on the approach, next version can
> > >> >> > include a bpf_skb_pointer and exposing it as bpf_packet_pointer helper for TC
> > >> >> > hooks, and explore not resetting range to zero on r0 += rX, instead check access
> > >> >> > like check_mem_region_access (var_off + off < range), since there would be no
> > >> >> > data_end to compare against and obtain a new range.
> > >> >> >
> > >> >> > The common name and func_id is supposed to allow writing generic code using
> > >> >> > bpf_packet_pointer that works for both XDP and TC programs.
> > >> >> >
> > >> >> > Please see the individual patches for implementation details.
> > >> >> >
> > >> >>
> > >> >> Joanne is working on a "bpf_dynptr" framework that will support
> > >> >> exactly this feature, in addition to working with dynamically
> > >> >> allocated memory, working with memory of statically unknown size (but
> > >> >> safe and checked at runtime), etc. And all that within a generic
> > >> >> common feature implemented uniformly within the verifier. E.g., it
> > >> >> won't need any of the custom bits of logic added in patch #2 and #3.
> > >> >> So I'm thinking that instead of custom-implementing a partial case of
> > >> >> bpf_dynptr just for skb and xdp packets, let's maybe wait for dynptr
> > >> >> and do it only once there?
> > >> >>
> > >> >
> > >> > Interesting stuff, looking forward to it.
> > >> >
> > >> >> See also my ARG_CONSTANT comment. It seems like a pretty common thing
> > >> >> where input constant is used to characterize some pointer returned
> > >> >> from the helper (e.g., bpf_ringbuf_reserve() case), and we'll need
> > >> >> that for bpf_dynptr for exactly this "give me direct access of N
> > >> >> bytes, if possible" case. So improving/generalizing it now before
> > >> >> dynptr lands makes a lot of sense, outside of bpf_packet_pointer()
> > >> >> feature itself.
> > >> >
> > >> > No worries, we can continue the discussion in patch 1, I'll split out the arg
> > >> > changes into a separate patch, and wait for dynptr to be posted before reworking
> > >> > this.
> > >>
> > >> This does raise the question of what we do in the meantime, though? Your
> > >> patch includes a change to bpf_xdp_{load,store}_bytes() which, if we're
> > >> making it, really has to go in before those hit a release and become
> > >> UAPI.
> > >>
> > >> One option would be to still make the change to those other helpers;
> > >> they'd become a bit slower, but if we have a solution for that coming,
> > >> that may be OK for a single release? WDYT?
> > >
> > > I must have missed important changes to bpf_xdp_{load,store}_bytes().
> > > Does anything change about its behavior? If there are some fixes
> > > specific to those helpers, we should fix them as well as a separate
> > > patch. My main objection is adding a bpf_packet_pointer() special case
> > > when we have a generic mechanism in the works that will come this use
> > > case (among other use cases).
> >
> > Well it's not a functional change per se, but Kartikeya's patch is
> > removing an optimisation from bpf_xdp_{load_store}_bytes() (i.e., the
> > use of the bpf_xdp_pointer()) in favour of making it available directly
> > to BPF. So if we don't do that change before those helpers are
> > finalised, we will end up either introducing a performance regression
> > for code using those helpers, or being stuck with the bpf_xdp_pointer()
> > use inside them even though it makes more sense to move it out to BPF.
> >
> > So the "safe" thing to do would do the change to the store/load helpers
> > now, and get rid of the bpf_xdp_pointer() entirely until it can be
> > introduced as a BPF helper in a generic way. Of course this depends on
> > whether you consider performance regressions to be something to avoid,
> > but this being XDP IMO we should :)
>
> I don't follow this logic.
> Would you mean by "get rid of the bpf_xdp_pointer" ?
> It's just an internal static function.
>
> Also I don't believe that this patch set and exposing
> bpf_xdp_pointer as a helper actually gives measurable performance wins.
> It looks quirky to me and hard to use.

This is actually inspired from your idea to avoid memcpy when reading and
writing to multi-buff XDP [0]. But instead of passing in the stack or mem
pointer (as discussed in that thread), I let the user set it and detect it
themselves, which makes the implementation simpler.

I am sure accessing a few bytes directly is going to be faster than first
memcpy'ing it to a local buffer, reading, and then possibly writing things
back again using a memcpy, but I will be happy to come with some numbers when
I respin this later, when Joanne posts the dynptr series.

Ofcourse, we could just make return value PTR_TO_MEM even for the 'pass buf
pointer' idea, but then we have to conservatively invalidate the pointer even if
it points to stack buffer on clear_all_pkt_pointers. The current approach looked
better to me.

  [0]: https://lore.kernel.org/bpf/CAADnVQKbrkOxfNoixUx-RLJEWULJLyhqjZ=M_X2cFG_APwNyCg@mail.gmail.com

--
Kartikeya
Kumar Kartikeya Dwivedi March 11, 2022, 7:32 a.m. UTC | #9
On Fri, Mar 11, 2022 at 12:55:45PM IST, Kumar Kartikeya Dwivedi wrote:
> On Fri, Mar 11, 2022 at 05:05:24AM IST, Alexei Starovoitov wrote:
> > On Thu, Mar 10, 2022 at 3:30 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> > >
> > > > On Tue, Mar 8, 2022 at 5:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > > >>
> > > >> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> > > >>
> > > >> > On Tue, Mar 08, 2022 at 11:18:52AM IST, Andrii Nakryiko wrote:
> > > >> >> On Sun, Mar 6, 2022 at 3:43 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > > >> >> >
> > > >> >> > Expose existing 'bpf_xdp_pointer' as a BPF helper named 'bpf_packet_pointer'
> > > >> >> > returning a packet pointer with a fixed immutable range. This can be useful to
> > > >> >> > enable DPA without having to use memcpy (currently the case in
> > > >> >> > bpf_xdp_load_bytes and bpf_xdp_store_bytes).
> > > >> >> >
> > > >> >> > The intended usage to read and write data for multi-buff XDP is:
> > > >> >> >
> > > >> >> >         int err = 0;
> > > >> >> >         char buf[N];
> > > >> >> >
> > > >> >> >         off &= 0xffff;
> > > >> >> >         ptr = bpf_packet_pointer(ctx, off, sizeof(buf), &err);
> > > >> >> >         if (unlikely(!ptr)) {
> > > >> >> >                 if (err < 0)
> > > >> >> >                         return XDP_ABORTED;
> > > >> >> >                 err = bpf_xdp_load_bytes(ctx, off, buf, sizeof(buf));
> > > >> >> >                 if (err < 0)
> > > >> >> >                         return XDP_ABORTED;
> > > >> >> >                 ptr = buf;
> > > >> >> >         }
> > > >> >> >         ...
> > > >> >> >         // Do some stores and loads in [ptr, ptr + N) region
> > > >> >> >         ...
> > > >> >> >         if (unlikely(ptr == buf)) {
> > > >> >> >                 err = bpf_xdp_store_bytes(ctx, off, buf, sizeof(buf));
> > > >> >> >                 if (err < 0)
> > > >> >> >                         return XDP_ABORTED;
> > > >> >> >         }
> > > >> >> >
> > > >> >> > Note that bpf_packet_pointer returns a PTR_TO_PACKET, not PTR_TO_MEM, because
> > > >> >> > these pointers need to be invalidated on clear_all_pkt_pointers invocation, and
> > > >> >> > it is also more meaningful to the user to see return value as R0=pkt.
> > > >> >> >
> > > >> >> > This series is meant to collect feedback on the approach, next version can
> > > >> >> > include a bpf_skb_pointer and exposing it as bpf_packet_pointer helper for TC
> > > >> >> > hooks, and explore not resetting range to zero on r0 += rX, instead check access
> > > >> >> > like check_mem_region_access (var_off + off < range), since there would be no
> > > >> >> > data_end to compare against and obtain a new range.
> > > >> >> >
> > > >> >> > The common name and func_id is supposed to allow writing generic code using
> > > >> >> > bpf_packet_pointer that works for both XDP and TC programs.
> > > >> >> >
> > > >> >> > Please see the individual patches for implementation details.
> > > >> >> >
> > > >> >>
> > > >> >> Joanne is working on a "bpf_dynptr" framework that will support
> > > >> >> exactly this feature, in addition to working with dynamically
> > > >> >> allocated memory, working with memory of statically unknown size (but
> > > >> >> safe and checked at runtime), etc. And all that within a generic
> > > >> >> common feature implemented uniformly within the verifier. E.g., it
> > > >> >> won't need any of the custom bits of logic added in patch #2 and #3.
> > > >> >> So I'm thinking that instead of custom-implementing a partial case of
> > > >> >> bpf_dynptr just for skb and xdp packets, let's maybe wait for dynptr
> > > >> >> and do it only once there?
> > > >> >>
> > > >> >
> > > >> > Interesting stuff, looking forward to it.
> > > >> >
> > > >> >> See also my ARG_CONSTANT comment. It seems like a pretty common thing
> > > >> >> where input constant is used to characterize some pointer returned
> > > >> >> from the helper (e.g., bpf_ringbuf_reserve() case), and we'll need
> > > >> >> that for bpf_dynptr for exactly this "give me direct access of N
> > > >> >> bytes, if possible" case. So improving/generalizing it now before
> > > >> >> dynptr lands makes a lot of sense, outside of bpf_packet_pointer()
> > > >> >> feature itself.
> > > >> >
> > > >> > No worries, we can continue the discussion in patch 1, I'll split out the arg
> > > >> > changes into a separate patch, and wait for dynptr to be posted before reworking
> > > >> > this.
> > > >>
> > > >> This does raise the question of what we do in the meantime, though? Your
> > > >> patch includes a change to bpf_xdp_{load,store}_bytes() which, if we're
> > > >> making it, really has to go in before those hit a release and become
> > > >> UAPI.
> > > >>
> > > >> One option would be to still make the change to those other helpers;
> > > >> they'd become a bit slower, but if we have a solution for that coming,
> > > >> that may be OK for a single release? WDYT?
> > > >
> > > > I must have missed important changes to bpf_xdp_{load,store}_bytes().
> > > > Does anything change about its behavior? If there are some fixes
> > > > specific to those helpers, we should fix them as well as a separate
> > > > patch. My main objection is adding a bpf_packet_pointer() special case
> > > > when we have a generic mechanism in the works that will come this use
> > > > case (among other use cases).
> > >
> > > Well it's not a functional change per se, but Kartikeya's patch is
> > > removing an optimisation from bpf_xdp_{load_store}_bytes() (i.e., the
> > > use of the bpf_xdp_pointer()) in favour of making it available directly
> > > to BPF. So if we don't do that change before those helpers are
> > > finalised, we will end up either introducing a performance regression
> > > for code using those helpers, or being stuck with the bpf_xdp_pointer()
> > > use inside them even though it makes more sense to move it out to BPF.
> > >
> > > So the "safe" thing to do would do the change to the store/load helpers
> > > now, and get rid of the bpf_xdp_pointer() entirely until it can be
> > > introduced as a BPF helper in a generic way. Of course this depends on
> > > whether you consider performance regressions to be something to avoid,
> > > but this being XDP IMO we should :)
> >
> > I don't follow this logic.
> > Would you mean by "get rid of the bpf_xdp_pointer" ?
> > It's just an internal static function.
> >
> > Also I don't believe that this patch set and exposing
> > bpf_xdp_pointer as a helper actually gives measurable performance wins.
> > It looks quirky to me and hard to use.
>
> This is actually inspired from your idea to avoid memcpy when reading and
> writing to multi-buff XDP [0]. But instead of passing in the stack or mem
> pointer (as discussed in that thread), I let the user set it and detect it
> themselves, which makes the implementation simpler.
>
> I am sure accessing a few bytes directly is going to be faster than first
> memcpy'ing it to a local buffer, reading, and then possibly writing things
> back again using a memcpy, but I will be happy to come with some numbers when
> I respin this later, when Joanne posts the dynptr series.
>
> Ofcourse, we could just make return value PTR_TO_MEM even for the 'pass buf
> pointer' idea, but then we have to conservatively invalidate the pointer even if
> it points to stack buffer on clear_all_pkt_pointers. The current approach looked
> better to me.
>
>   [0]: https://lore.kernel.org/bpf/CAADnVQKbrkOxfNoixUx-RLJEWULJLyhqjZ=M_X2cFG_APwNyCg@mail.gmail.com
>

This is probably the correct link:
https://lore.kernel.org/bpf/CAADnVQ+XXGUxzqMdbPMYf+t_ViDkqvGDdogrmv-wH-dckzujLw@mail.gmail.com

> --
> Kartikeya

--
Kartikeya
Toke Høiland-Jørgensen March 11, 2022, 10:34 a.m. UTC | #10
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:

> On Fri, Mar 11, 2022 at 05:00:42AM IST, Toke Høiland-Jørgensen wrote:
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Tue, Mar 8, 2022 at 5:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
>> >>
>> >> > On Tue, Mar 08, 2022 at 11:18:52AM IST, Andrii Nakryiko wrote:
>> >> >> On Sun, Mar 6, 2022 at 3:43 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>> >> >> >
>> >> >> > Expose existing 'bpf_xdp_pointer' as a BPF helper named 'bpf_packet_pointer'
>> >> >> > returning a packet pointer with a fixed immutable range. This can be useful to
>> >> >> > enable DPA without having to use memcpy (currently the case in
>> >> >> > bpf_xdp_load_bytes and bpf_xdp_store_bytes).
>> >> >> >
>> >> >> > The intended usage to read and write data for multi-buff XDP is:
>> >> >> >
>> >> >> >         int err = 0;
>> >> >> >         char buf[N];
>> >> >> >
>> >> >> >         off &= 0xffff;
>> >> >> >         ptr = bpf_packet_pointer(ctx, off, sizeof(buf), &err);
>> >> >> >         if (unlikely(!ptr)) {
>> >> >> >                 if (err < 0)
>> >> >> >                         return XDP_ABORTED;
>> >> >> >                 err = bpf_xdp_load_bytes(ctx, off, buf, sizeof(buf));
>> >> >> >                 if (err < 0)
>> >> >> >                         return XDP_ABORTED;
>> >> >> >                 ptr = buf;
>> >> >> >         }
>> >> >> >         ...
>> >> >> >         // Do some stores and loads in [ptr, ptr + N) region
>> >> >> >         ...
>> >> >> >         if (unlikely(ptr == buf)) {
>> >> >> >                 err = bpf_xdp_store_bytes(ctx, off, buf, sizeof(buf));
>> >> >> >                 if (err < 0)
>> >> >> >                         return XDP_ABORTED;
>> >> >> >         }
>> >> >> >
>> >> >> > Note that bpf_packet_pointer returns a PTR_TO_PACKET, not PTR_TO_MEM, because
>> >> >> > these pointers need to be invalidated on clear_all_pkt_pointers invocation, and
>> >> >> > it is also more meaningful to the user to see return value as R0=pkt.
>> >> >> >
>> >> >> > This series is meant to collect feedback on the approach, next version can
>> >> >> > include a bpf_skb_pointer and exposing it as bpf_packet_pointer helper for TC
>> >> >> > hooks, and explore not resetting range to zero on r0 += rX, instead check access
>> >> >> > like check_mem_region_access (var_off + off < range), since there would be no
>> >> >> > data_end to compare against and obtain a new range.
>> >> >> >
>> >> >> > The common name and func_id is supposed to allow writing generic code using
>> >> >> > bpf_packet_pointer that works for both XDP and TC programs.
>> >> >> >
>> >> >> > Please see the individual patches for implementation details.
>> >> >> >
>> >> >>
>> >> >> Joanne is working on a "bpf_dynptr" framework that will support
>> >> >> exactly this feature, in addition to working with dynamically
>> >> >> allocated memory, working with memory of statically unknown size (but
>> >> >> safe and checked at runtime), etc. And all that within a generic
>> >> >> common feature implemented uniformly within the verifier. E.g., it
>> >> >> won't need any of the custom bits of logic added in patch #2 and #3.
>> >> >> So I'm thinking that instead of custom-implementing a partial case of
>> >> >> bpf_dynptr just for skb and xdp packets, let's maybe wait for dynptr
>> >> >> and do it only once there?
>> >> >>
>> >> >
>> >> > Interesting stuff, looking forward to it.
>> >> >
>> >> >> See also my ARG_CONSTANT comment. It seems like a pretty common thing
>> >> >> where input constant is used to characterize some pointer returned
>> >> >> from the helper (e.g., bpf_ringbuf_reserve() case), and we'll need
>> >> >> that for bpf_dynptr for exactly this "give me direct access of N
>> >> >> bytes, if possible" case. So improving/generalizing it now before
>> >> >> dynptr lands makes a lot of sense, outside of bpf_packet_pointer()
>> >> >> feature itself.
>> >> >
>> >> > No worries, we can continue the discussion in patch 1, I'll split out the arg
>> >> > changes into a separate patch, and wait for dynptr to be posted before reworking
>> >> > this.
>> >>
>> >> This does raise the question of what we do in the meantime, though? Your
>> >> patch includes a change to bpf_xdp_{load,store}_bytes() which, if we're
>> >> making it, really has to go in before those hit a release and become
>> >> UAPI.
>> >>
>> >> One option would be to still make the change to those other helpers;
>> >> they'd become a bit slower, but if we have a solution for that coming,
>> >> that may be OK for a single release? WDYT?
>> >
>> > I must have missed important changes to bpf_xdp_{load,store}_bytes().
>> > Does anything change about its behavior? If there are some fixes
>> > specific to those helpers, we should fix them as well as a separate
>> > patch. My main objection is adding a bpf_packet_pointer() special case
>> > when we have a generic mechanism in the works that will come this use
>> > case (among other use cases).
>>
>> Well it's not a functional change per se, but Kartikeya's patch is
>> removing an optimisation from bpf_xdp_{load_store}_bytes() (i.e., the
>> use of the bpf_xdp_pointer()) in favour of making it available directly
>> to BPF. So if we don't do that change before those helpers are
>> finalised, we will end up either introducing a performance regression
>> for code using those helpers, or being stuck with the bpf_xdp_pointer()
>> use inside them even though it makes more sense to move it out to BPF.
>>
>
> So IIUC, the case we're worried about is when a linear region is in head or a
> frag and bpf_xdp_pointer can be used to do a direct memcpy for it. But in my
> testing there doesn't seem to be any difference. With or without the call, the
> time taken e.g. for bpf_xdp_load_bytes lies in the 30-40ns range. It would make
> sense, because for this case the code in bpf_xdp_pointer and bpf_xdp_copy_buf
> are almost the same, just that the latter has a conditional jump out of the loop
> based on len. bpf_xdp_copy_buf is still only doing a single memcpy, the cost
> seems to be dominated by that.
>
> Otoh, removing it would improve the case for the other scenario (when region
> touches two or more frags) because we wouldn't spend time in bpf_xdp_pointer and
> returning NULL from it failing to find a linear region, but that shouldn't be a
> regression.

Yeah, that was basically what I was worried about; thanks for testing!
So this implies that the current use of the bpf_xdp_pointer() helper
function is pretty pointless, right? But at least it's an internal
detail so there's no hurry in fixing it...

-Toke
Alexei Starovoitov March 11, 2022, 5:27 p.m. UTC | #11
On Thu, Mar 10, 2022 at 11:25 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
> >
> > Also I don't believe that this patch set and exposing
> > bpf_xdp_pointer as a helper actually gives measurable performance wins.
> > It looks quirky to me and hard to use.
>
> This is actually inspired from your idea to avoid memcpy when reading and
> writing to multi-buff XDP [0]. But instead of passing in the stack or mem
> pointer (as discussed in that thread), I let the user set it and detect it
> themselves, which makes the implementation simpler.
>
> I am sure accessing a few bytes directly is going to be faster than first
> memcpy'ing it to a local buffer, reading, and then possibly writing things
> back again using a memcpy, but I will be happy to come with some numbers when
> I respin this later, when Joanne posts the dynptr series.
>
> Ofcourse, we could just make return value PTR_TO_MEM even for the 'pass buf
> pointer' idea, but then we have to conservatively invalidate the pointer even if
> it points to stack buffer on clear_all_pkt_pointers. The current approach looked
> better to me.
>
>   [0]: https://lore.kernel.org/bpf/CAADnVQKbrkOxfNoixUx-RLJEWULJLyhqjZ=M_X2cFG_APwNyCg@mail.gmail.com

There is a big difference in what was proposed earlier
vs what you've implemented.
In that proposal bpf_packet_pointer would behave similar to
skb_header_pointer. The kernel developers are familiar with it
and the concept is tested by time.
While your bpf_packet_pointer is different.
It fails on frag boundaries, so the user has to open code
the checks and add explicit bpf_xdp_load_bytes.
I guess you're arguing that it's the same.
My point that it's a big conceptual difference in api.
This kind of difference should be discussed instead
of dumping a patch set. Especially without RFC tag.

Please focus on kptr patches. They were close enough and have
a chance of landing in this release.
This bpf_packet_pointer stuff is certainly not going to make it.