[bpf-next,v2,0/3] Add XDP support for bpf_load_hdr_opt

Message ID	20211006230543.3928580-1-joannekoong@fb.com (mailing list archive)
Headers	show Return-Path: <netdev-owner@kernel.org> From: Joanne Koong <joannekoong@fb.com> To: <bpf@vger.kernel.org> CC: <kafai@fb.com>, <netdev@vger.kernel.org>, <Kernel-team@fb.com>, Joanne Koong <joannekoong@fb.com> Subject: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt Date: Wed, 6 Oct 2021 16:05:40 -0700 Message-ID: <20211006230543.3928580-1-joannekoong@fb.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain Precedence: bulk
Series	Add XDP support for bpf_load_hdr_opt \| expand [bpf-next,v2,0/3] Add XDP support for bpf_load_hdr_opt [bpf-next,v2,1/3] bpf/xdp: Add bpf_load_hdr_opt support for xdp [bpf-next,v2,2/3] bpf/selftests: Rename test_tcp_hdr_options to test_sockops_tcp_hdr_options [bpf-next,v2,3/3] bpf/selftests: Add xdp bpf_load_tcp_hdr_options tests

Joanne Koong Oct. 6, 2021, 11:05 p.m. UTC

Currently, bpf_sockops programs have been using bpf_load_hdr_opt() to
parse the tcp header option. It will be useful to allow other bpf prog
types to have a similar way of handling tcp hdr options.

This series adds XDP support for bpf_load_hdr_opt(). At a high level,
these patches are:

1/3 patch - Add functionality for xdp bpf_load_hdr_opt in net/core.

2/3 patch - Rename existing test_tcp_hdr_options to test_sockops_tcp_hdr_options.

3/3 patch - Add tests for xdp bpf_load_hdr_opt (test_xdp_tcp_hdr_options).

v1 -> v2:
* 1/3 - Change "(flags & 0xffffffffffff)" to
(flags & ((1ULL << BPF_LOAD_HDR_OPT_TCP_OFFSET_SHIFT) - 1))
* 3/3 - Skip check_opt_out on failures
* 3/3 - Change CHECKs to ASSERTs

Joanne Koong (3):
  bpf/xdp: Add bpf_load_hdr_opt support for xdp
  bpf/selftests: Rename test_tcp_hdr_options to
    test_sockops_tcp_hdr_options
  bpf/selftests: Add xdp bpf_load_tcp_hdr_options tests

 include/uapi/linux/bpf.h                      |  26 ++-
 net/core/filter.c                             |  88 ++++++--
 tools/include/uapi/linux/bpf.h                |  26 ++-
 ...dr_options.c => sockops_tcp_hdr_options.c} |  18 +-
 .../bpf/prog_tests/xdp_tcp_hdr_options.c      | 144 +++++++++++++
 ....c => test_sockops_misc_tcp_hdr_options.c} |   0
 ...tions.c => test_sockops_tcp_hdr_options.c} |   0
 .../bpf/progs/test_xdp_tcp_hdr_options.c      | 198 ++++++++++++++++++
 8 files changed, 454 insertions(+), 46 deletions(-)
 rename tools/testing/selftests/bpf/prog_tests/{tcp_hdr_options.c => sockops_tcp_hdr_options.c} (96%)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_tcp_hdr_options.c
 rename tools/testing/selftests/bpf/progs/{test_misc_tcp_hdr_options.c => test_sockops_misc_tcp_hdr_options.c} (100%)
 rename tools/testing/selftests/bpf/progs/{test_tcp_hdr_options.c => test_sockops_tcp_hdr_options.c} (100%)
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_tcp_hdr_options.c

Toke Høiland-Jørgensen Oct. 7, 2021, 2:41 p.m. UTC | #1

Joanne Koong <joannekoong@fb.com> writes:

> Currently, bpf_sockops programs have been using bpf_load_hdr_opt() to
> parse the tcp header option. It will be useful to allow other bpf prog
> types to have a similar way of handling tcp hdr options.
>
> This series adds XDP support for bpf_load_hdr_opt(). At a high level,
> these patches are:

Why is this needed? Why not just parse the header directly in XDP? Seems
a bit arbitrary to add a helper for this particular type of packet
payload parsing to this particular program type. I.e., what about other
headers (IP options?)? Are we going to have a whole bunch of
special-purpose parsing helpers to pick out protocol data from packets?

Also, why only enable this for XDP (and not, say the TC hook as well)?

-Toke

Joanne Koong Oct. 7, 2021, 8:57 p.m. UTC | #2

On 10/7/21 7:41 AM, Toke Høiland-Jørgensen wrote:

> Joanne Koong <joannekoong@fb.com> writes:
>
>> Currently, bpf_sockops programs have been using bpf_load_hdr_opt() to
>> parse the tcp header option. It will be useful to allow other bpf prog
>> types to have a similar way of handling tcp hdr options.
>>
>> This series adds XDP support for bpf_load_hdr_opt(). At a high level,
>> these patches are:
> Why is this needed? Why not just parse the header directly in XDP?
Parsing a variable number of TCP options is challenging for the verifier.
Some programs are using #pragma unroll as a temporary workaround
(https://github.com/xdp-project/bpf-examples/blob/master/pping/pping_kern.c#L95)
I believe Christian Deacon also recently posted about this on the xdp 
mailing list
with a link to his bpf fail logs in 
https://github.com/gamemann/XDP-TCP-Header-Options
which showcases some of the difficulties involved

> Seems
> a bit arbitrary to add a helper for this particular type of packet
> payload parsing to this particular program type. I.e., what about other
> headers (IP options?)?
The current use case needs so far have been for parsing tcp headers, but
in the future, when there are needs for parsing other types, they
can be supported as well through bpf_load_hdr_opt.

> Are we going to have a whole bunch of
> special-purpose parsing helpers to pick out protocol data from packets?

I think bpf_load_hdr_opt is generic enough to support parsing
any kind of protocol data (as specified through flags) in the packets
> Also, why only enable this for XDP (and not, say the TC hook as well)?
The plan is to also support this in tc as well (this will be in a separate
patchset)
> -Toke
>

Daniel Borkmann Oct. 7, 2021, 9:25 p.m. UTC | #3

On 10/7/21 10:57 PM, Joanne Koong wrote:
> On 10/7/21 7:41 AM, Toke Høiland-Jørgensen wrote:
>> Joanne Koong <joannekoong@fb.com> writes:
>>
>>> Currently, bpf_sockops programs have been using bpf_load_hdr_opt() to
>>> parse the tcp header option. It will be useful to allow other bpf prog
>>> types to have a similar way of handling tcp hdr options.
>>>
>>> This series adds XDP support for bpf_load_hdr_opt(). At a high level,
>>> these patches are:
>> Why is this needed? Why not just parse the header directly in XDP?
> Parsing a variable number of TCP options is challenging for the verifier.
> Some programs are using #pragma unroll as a temporary workaround
> (https://github.com/xdp-project/bpf-examples/blob/master/pping/pping_kern.c#L95)
> I believe Christian Deacon also recently posted about this on the xdp mailing list
> with a link to his bpf fail logs in https://github.com/gamemann/XDP-TCP-Header-Options
> which showcases some of the difficulties involved
> 
>> Seems
>> a bit arbitrary to add a helper for this particular type of packet
>> payload parsing to this particular program type. I.e., what about other
>> headers (IP options?)?
> The current use case needs so far have been for parsing tcp headers, but
> in the future, when there are needs for parsing other types, they
> can be supported as well through bpf_load_hdr_opt.
> 
>> Are we going to have a whole bunch of
>> special-purpose parsing helpers to pick out protocol data from packets?
> 
> I think bpf_load_hdr_opt is generic enough to support parsing
> any kind of protocol data (as specified through flags) in the packets

I tend to agree with Toke here that this is not generic. What has been tried
to improve the verifier instead before submitting the series? It would be much
more preferable to improve the developer experience with regards to a generic
solution, so that other/similar problems can be tackled in one go as well such
as IP options, extension headers, etc.

>> Also, why only enable this for XDP (and not, say the TC hook as well)?
> The plan is to also support this in tc as well (this will be in a separate
> patchset)

Martin KaFai Lau Oct. 7, 2021, 11:52 p.m. UTC | #4

On Thu, Oct 07, 2021 at 11:25:29PM +0200, Daniel Borkmann wrote:
> I tend to agree with Toke here that this is not generic. What has been tried
> to improve the verifier instead before submitting the series? It would be much
> more preferable to improve the developer experience with regards to a generic
> solution, so that other/similar problems can be tackled in one go as well such
> as IP options, extension headers, etc.
It would be nice to improve verifier to recognize it more smoothly.  Would
love to hear idea how to do it.

When adding the tcp header options for bpf_sockops, a bpf_store_hdr_opt()
is needed to ensure the header option is sane.  When writing test to parse
variable length header option, I also pulled in tricks (e.g. "#pragma unroll"
is easier to get it work.  Tried bounded loop but then hits max insns and
then moved some cases into subprog...etc).  Most (if not all) TCP headers
has some options (e.g. tstamp), so it will be useful to have an easy way
to search a particular option and bpf_load_hdr_opt() was also added to
bpf_sockops.

When bpf-tcp-hdr-opt was added, the initial use case was only useful in the
TCP stack because it wants to change the behavior of a tcp_sock.  When
bpf allows to add tcp-hdr-opt easily at tcp stack, it opens up other use
cases and one of them is to load-balance by the tcp-hdr-opt "server_id" in
XDP [1].  The same user that writes the bpf_sockops prog to add/parse
tcp-opt needs to do extra tricks to get this work in xdp prog.  The user
had repeated a similar try-and-error exercise (e.g. while the logical thinking
is to somehow bounded by the max tcp header size but that does not work
in unroll.  With the current cases in the loop, 15 is the max
magic number that works and hoping it will continue to work).

[1]: https://linuxplumbersconf.org/event/11/contributions/950/

Toke Høiland-Jørgensen Oct. 8, 2021, 10:20 p.m. UTC | #5

Martin KaFai Lau <kafai@fb.com> writes:

> On Thu, Oct 07, 2021 at 11:25:29PM +0200, Daniel Borkmann wrote:
>> I tend to agree with Toke here that this is not generic. What has been tried
>> to improve the verifier instead before submitting the series? It would be much
>> more preferable to improve the developer experience with regards to a generic
>> solution, so that other/similar problems can be tackled in one go as well such
>> as IP options, extension headers, etc.
> It would be nice to improve verifier to recognize it more smoothly.  Would
> love to hear idea how to do it.

So as far as I could tell, the verifier blows up in part because when
there's multiple bounded loops in sequence the verifier gets into a
combinatorial explosion of exploring all paths through the first loop
combined with all paths through the second. So if we could teach the
verifier to recognise that each loop is a separate entity to avoid this,
I think looping through headers would be a lot easier.

As you can probably tell, though, there is quite a bit of handwaving in
the above, and I have no idea how to actually do this. Some kind of
invariant analysis, maybe? But is this possible in general?

> When adding the tcp header options for bpf_sockops, a bpf_store_hdr_opt()
> is needed to ensure the header option is sane.  When writing test to parse
> variable length header option, I also pulled in tricks (e.g. "#pragma unroll"
> is easier to get it work.  Tried bounded loop but then hits max insns and
> then moved some cases into subprog...etc).  Most (if not all) TCP headers
> has some options (e.g. tstamp), so it will be useful to have an easy way
> to search a particular option and bpf_load_hdr_opt() was also added to
> bpf_sockops.

So if we can't fix the verifier, maybe we could come up with a more
general helper for packet parsing? Something like:

bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
{
  ptr = ctx->data + offset;
  while (ptr < ctx->data_end) {
    offset = callback_fn(ptr, ctx->data_end, callback_arg);
    if (offset == 0)
      return 0;
    ptr += offset;
  }

  // out of bounds before callback was done
  return -EINVAL;
}

This would work for parsing any kind of packet header or TLV-style data
without having to teach the kernel about each header type. It'll have
quite a bit of overhead if all the callbacks happen via indirect calls,
but maybe the verifier can inline the calls (or at least turn them into
direct CALL instructions)?

-Toke

Martin KaFai Lau Oct. 11, 2021, 6:43 p.m. UTC | #6

On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
> So if we can't fix the verifier, maybe we could come up with a more
> general helper for packet parsing? Something like:
> 
> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
> {
>   ptr = ctx->data + offset;
>   while (ptr < ctx->data_end) {
>     offset = callback_fn(ptr, ctx->data_end, callback_arg);
>     if (offset == 0)
>       return 0;
>     ptr += offset;
>   }
>   
>   // out of bounds before callback was done
>   return -EINVAL;
> }
>    
> This would work for parsing any kind of packet header or TLV-style data
> without having to teach the kernel about each header type. It'll have
> quite a bit of overhead if all the callbacks happen via indirect calls,
> but maybe the verifier can inline the calls (or at least turn them into
> direct CALL instructions)?
Direct call different callback_fn?  bpf_for_each_pkt_chunk() is a kernel
function.  It would be nice if the verifier could do that.

This for_each helper had been considered also.  Other than the need
to callback in a loop, the thought was to extend the existing
bpf_load_hdr_opt() because our initial feedback is the same
header handling logic cannot be used in xdp which is confusing.

I don't mind to go with the for_each helper.  However, with another
thought, if it needs to call a function in the loop anyway, I think
it could also be done in bpf by putting a global function in a loop.
Need to try and double check.

Toke Høiland-Jørgensen Oct. 12, 2021, 2:11 p.m. UTC | #7

Martin KaFai Lau <kafai@fb.com> writes:

> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
>> So if we can't fix the verifier, maybe we could come up with a more
>> general helper for packet parsing? Something like:
>> 
>> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
>> {
>>   ptr = ctx->data + offset;
>>   while (ptr < ctx->data_end) {
>>     offset = callback_fn(ptr, ctx->data_end, callback_arg);
>>     if (offset == 0)
>>       return 0;
>>     ptr += offset;
>>   }
>>   
>>   // out of bounds before callback was done
>>   return -EINVAL;
>> }
>>    
>> This would work for parsing any kind of packet header or TLV-style data
>> without having to teach the kernel about each header type. It'll have
>> quite a bit of overhead if all the callbacks happen via indirect calls,
>> but maybe the verifier can inline the calls (or at least turn them into
>> direct CALL instructions)?
> Direct call different callback_fn?  bpf_for_each_pkt_chunk() is a kernel
> function.  It would be nice if the verifier could do that.

Ohh, right, think-o on my part. It could be done if the helper was
inlined in its entirety, though? Not sure if that's feasible?

> This for_each helper had been considered also. Other than the need to
> callback in a loop, the thought was to extend the existing
> bpf_load_hdr_opt() because our initial feedback is the same header
> handling logic cannot be used in xdp which is confusing.

TBH, I had not noticed this helper before. Now that I have, it does
seems like the kind of thing that belongs as a BPF library function
rather than a helper in the first place :)

> I don't mind to go with the for_each helper.  However, with another
> thought, if it needs to call a function in the loop anyway, I think
> it could also be done in bpf by putting a global function in a loop.
> Need to try and double check.

Hmm, that would be interesting if possible!

-Toke

Joanne Koong Oct. 12, 2021, 8:51 p.m. UTC | #8

On 10/12/21 7:11 AM, Toke Høiland-Jørgensen wrote:

> Martin KaFai Lau <kafai@fb.com> writes:
>> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
>> [...] 
>> I don't mind to go with the for_each helper.  However, with another
>> thought, if it needs to call a function in the loop anyway, I think
>> it could also be done in bpf by putting a global function in a loop.
>> Need to try and double check.
> Hmm, that would be interesting if possible!
>
> -Toke
>
Martin and I tried this out. When we moved out the logic for parsing the
options into a global non-inlined function, the verifier approved the
program.

As such, I will abandon this patchset and submit a new separate patch
that will add a test to ensure we're always able to parse header options
through a global function.

Thanks for the conversation on this, Toke, Martin, and Daniel!

Toke Høiland-Jørgensen Oct. 13, 2021, 10:19 a.m. UTC | #9

Joanne Koong <joannekoong@fb.com> writes:

> On 10/12/21 7:11 AM, Toke Høiland-Jørgensen wrote:
>
>> Martin KaFai Lau <kafai@fb.com> writes:
>>> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
>>> [...] 
>>> I don't mind to go with the for_each helper.  However, with another
>>> thought, if it needs to call a function in the loop anyway, I think
>>> it could also be done in bpf by putting a global function in a loop.
>>> Need to try and double check.
>> Hmm, that would be interesting if possible!
>>
>> -Toke
>>
> Martin and I tried this out. When we moved out the logic for parsing the
> options into a global non-inlined function, the verifier approved the
> program.

Excellent!

> As such, I will abandon this patchset and submit a new separate patch
> that will add a test to ensure we're always able to parse header options
> through a global function.
>
> Thanks for the conversation on this, Toke, Martin, and Daniel!

Sounds good - you're welcome :)

-Toke

Alexei Starovoitov Oct. 19, 2021, midnight UTC | #10

On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
> 
> So if we can't fix the verifier, maybe we could come up with a more
> general helper for packet parsing? Something like:
> 
> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
> {
>   ptr = ctx->data + offset;
>   while (ptr < ctx->data_end) {
>     offset = callback_fn(ptr, ctx->data_end, callback_arg);
>     if (offset == 0)
>       return 0;
>     ptr += offset;
>   }
>   
>   // out of bounds before callback was done
>   return -EINVAL;
> }

We're starting to work on this since it will be useful not only for
packet parsing, TLV parsing, but potentially any kind of 'for' loop iteration.

> This would work for parsing any kind of packet header or TLV-style data
> without having to teach the kernel about each header type. It'll have
> quite a bit of overhead if all the callbacks happen via indirect calls,
> but maybe the verifier can inline the calls (or at least turn them into
> direct CALL instructions)?

Right. That's the main downside.
If the bpf_for_each*() helper is simple enough the verifier can inline it
similar to map_gen_lookup. In such case the indirect call will be a direct call,
so the overhead won't be that bad, but it's still a function call and
static function will have full prologue+epilogue.
Converting static function into direct jump would be really challenging
for the verifier and won't provide much benefit, since r6-r9 save/restore
would need to happen anyway even for such 'inlined' static func, since
llvm will be freely using r6-r9 for insns inside function body
assuming that it's a normal function.

May be there is a way to avoid call overhead with with clang extensions.
If we want to do:
int mem_eq(char *p1, char *p2, int size)
{
  int i;
  for (i = 0; i < size; i++)
    if (p1[i] != p2[i])
      return 0;
  return 1;
}

With clang extension we might write it as:
int bpf_mem_eq(char *p1, char *p2, int size)
{
  int i = 0;
  int iter;

  iter = __builtin_for_until(i, size, ({
      if (p1[i] != p2[i])
        goto out;
  }));
  out:
  if (iter != size)
    return 0;
  return 1;
}

The llvm will generate pseudo insns for this __builtin_for.
The verifier will analyze the loop body for the range [0, size)
and replace pseudo insns with normal branches after the verification.
We might even keep the normal C syntax for loops and use
llvm HardwareLoops pass to add pseudo insns.
It's more or less the same ideas for loops we discussed before
bounded loops were introduced.
The main problem with bounded loops is that the loop body will
typically be verified the number of times equal to the number of iterations.
So for any non-trivial loop such iteration count is not much more
than 100. The verifier can do scalar evolution analysis, but
it's likely won't work for many cases and user experience
will suffer. With __builtin_for the scalar evolution is not necessary,
since induction variable is one and explicit and its range is explicit too.
That enables single pass over loop body.
One might argue that for (i = 0; i < 10000; i += 10) loops are
necessary too, but instead of complicating the verifier with sparse
ranges it's better to put that on users that can do:
  iter = __builtin_for_until(i, 10000 / 10, ({
      j = i * 10;
      use j;
  }));
Single explicit induction variable makes the verification practical.
The loop body won't be as heavily optimized as normal loop,
but it's a good thing.

Yonghong Song Oct. 19, 2021, 4:02 p.m. UTC | #11

On 10/18/21 5:00 PM, Alexei Starovoitov wrote:
> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
>>
>> So if we can't fix the verifier, maybe we could come up with a more
>> general helper for packet parsing? Something like:
>>
>> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
>> {
>>    ptr = ctx->data + offset;
>>    while (ptr < ctx->data_end) {
>>      offset = callback_fn(ptr, ctx->data_end, callback_arg);
>>      if (offset == 0)
>>        return 0;
>>      ptr += offset;
>>    }
>>    
>>    // out of bounds before callback was done
>>    return -EINVAL;
>> }
> 
> We're starting to work on this since it will be useful not only for
> packet parsing, TLV parsing, but potentially any kind of 'for' loop iteration.
> 
>> This would work for parsing any kind of packet header or TLV-style data
>> without having to teach the kernel about each header type. It'll have
>> quite a bit of overhead if all the callbacks happen via indirect calls,
>> but maybe the verifier can inline the calls (or at least turn them into
>> direct CALL instructions)?
> 
> Right. That's the main downside.
> If the bpf_for_each*() helper is simple enough the verifier can inline it
> similar to map_gen_lookup. In such case the indirect call will be a direct call,
> so the overhead won't be that bad, but it's still a function call and
> static function will have full prologue+epilogue.
> Converting static function into direct jump would be really challenging
> for the verifier and won't provide much benefit, since r6-r9 save/restore
> would need to happen anyway even for such 'inlined' static func, since
> llvm will be freely using r6-r9 for insns inside function body
> assuming that it's a normal function.
> 
> May be there is a way to avoid call overhead with with clang extensions.
> If we want to do:
> int mem_eq(char *p1, char *p2, int size)
> {
>    int i;
>    for (i = 0; i < size; i++)
>      if (p1[i] != p2[i])
>        return 0;
>    return 1;
> }
> 
> With clang extension we might write it as:
> int bpf_mem_eq(char *p1, char *p2, int size)
> {
>    int i = 0;
>    int iter;
> 
>    iter = __builtin_for_until(i, size, ({
>        if (p1[i] != p2[i])
>          goto out;
>    }));
>    out:
>    if (iter != size)
>      return 0;
>    return 1;
> }
> 
> The llvm will generate pseudo insns for this __builtin_for.
> The verifier will analyze the loop body for the range [0, size)
> and replace pseudo insns with normal branches after the verification.
> We might even keep the normal C syntax for loops and use
> llvm HardwareLoops pass to add pseudo insns.
> It's more or less the same ideas for loops we discussed before
> bounded loops were introduced.
> The main problem with bounded loops is that the loop body will
> typically be verified the number of times equal to the number of iterations.
> So for any non-trivial loop such iteration count is not much more
> than 100. The verifier can do scalar evolution analysis, but
> it's likely won't work for many cases and user experience
> will suffer. With __builtin_for the scalar evolution is not necessary,
> since induction variable is one and explicit and its range is explicit too.
> That enables single pass over loop body.
> One might argue that for (i = 0; i < 10000; i += 10) loops are
> necessary too, but instead of complicating the verifier with sparse
> ranges it's better to put that on users that can do:
>    iter = __builtin_for_until(i, 10000 / 10, ({
>        j = i * 10;
>        use j;
>    }));
> Single explicit induction variable makes the verification practical.
> The loop body won't be as heavily optimized as normal loop,
> but it's a good thing.

We have discussed how to verify *well-formed* loops back in 2018.
(BPF control flow, supporting loops and other patterns:
  https://www.linuxplumbersconf.org/event/2/contributions/116/)
Now probably the time to revisit it again!

I think Alexei's proposal is the right direction to have
compiler preserving the loop structure with some pseudo instructions
and verifier is able to range-based verification
instead of iterating all loop iterations.

LLVM already has some IR loop intrinsics like below:
   def int_set_loop_iterations :
   def int_start_loop_iterations :
   def int_test_set_loop_iterations :
   def int_test_start_loop_iterations :
   def int_loop_decrement :
   def int_loop_decrement_reg :
to facilitate hardware loop. BPF target can define its own
intrinsics to help define a well structured loop.

I will look into this.

Toke Høiland-Jørgensen Oct. 19, 2021, 4:10 p.m. UTC | #12

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
>> 
>> So if we can't fix the verifier, maybe we could come up with a more
>> general helper for packet parsing? Something like:
>> 
>> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
>> {
>>   ptr = ctx->data + offset;
>>   while (ptr < ctx->data_end) {
>>     offset = callback_fn(ptr, ctx->data_end, callback_arg);
>>     if (offset == 0)
>>       return 0;
>>     ptr += offset;
>>   }
>>   
>>   // out of bounds before callback was done
>>   return -EINVAL;
>> }
>
> We're starting to work on this since it will be useful not only for
> packet parsing, TLV parsing, but potentially any kind of 'for' loop iteration.

Awesome! :)

>> This would work for parsing any kind of packet header or TLV-style data
>> without having to teach the kernel about each header type. It'll have
>> quite a bit of overhead if all the callbacks happen via indirect calls,
>> but maybe the verifier can inline the calls (or at least turn them into
>> direct CALL instructions)?
>
> Right. That's the main downside.
> If the bpf_for_each*() helper is simple enough the verifier can inline it
> similar to map_gen_lookup. In such case the indirect call will be a direct call,
> so the overhead won't be that bad, but it's still a function call and
> static function will have full prologue+epilogue.
> Converting static function into direct jump would be really challenging
> for the verifier and won't provide much benefit, since r6-r9 save/restore
> would need to happen anyway even for such 'inlined' static func, since
> llvm will be freely using r6-r9 for insns inside function body
> assuming that it's a normal function.

I reckon it could be acceptable to have the overhead of a regular
function call per iteration, but obviously it would be better to avoid
it.

> May be there is a way to avoid call overhead with with clang extensions.
> If we want to do:
> int mem_eq(char *p1, char *p2, int size)
> {
>   int i;
>   for (i = 0; i < size; i++)
>     if (p1[i] != p2[i])
>       return 0;
>   return 1;
> }
>
> With clang extension we might write it as:
> int bpf_mem_eq(char *p1, char *p2, int size)
> {
>   int i = 0;
>   int iter;
>
>   iter = __builtin_for_until(i, size, ({
>       if (p1[i] != p2[i])
>         goto out;
>   }));
>   out:
>   if (iter != size)
>     return 0;
>   return 1;
> }
>
> The llvm will generate pseudo insns for this __builtin_for.
> The verifier will analyze the loop body for the range [0, size)
> and replace pseudo insns with normal branches after the verification.

That seems like an interesting approach! The __builtin_for thing is a
little awkward, but not more than turning the loop into a separate
function + callback.

What about backwards compatibility? Would you have to ensure your kernel
understands the loop instructions before you put them into your code, or
could libbpf be taught to rewrite them if the kernel doesn't understand
them (say, to a separate function that is called in a regular bounded
loop)?

> We might even keep the normal C syntax for loops and use
> llvm HardwareLoops pass to add pseudo insns.

Now *this* would be cool!

> It's more or less the same ideas for loops we discussed before
> bounded loops were introduced.

Why was it rejected at the time?

> The main problem with bounded loops is that the loop body will
> typically be verified the number of times equal to the number of iterations.
> So for any non-trivial loop such iteration count is not much more
> than 100. The verifier can do scalar evolution analysis, but
> it's likely won't work for many cases and user experience
> will suffer. With __builtin_for the scalar evolution is not necessary,
> since induction variable is one and explicit and its range is explicit too.
> That enables single pass over loop body.
> One might argue that for (i = 0; i < 10000; i += 10) loops are
> necessary too, but instead of complicating the verifier with sparse
> ranges it's better to put that on users that can do:
>   iter = __builtin_for_until(i, 10000 / 10, ({
>       j = i * 10;
>       use j;
>   }));
> Single explicit induction variable makes the verification practical.
> The loop body won't be as heavily optimized as normal loop,
> but it's a good thing.

Agreed, limiting things to single-step loops would be acceptable.

-Toke

[bpf-next,v2,0/3] Add XDP support for bpf_load_hdr_opt

Message

Comments