Message ID | 20211006230543.3928580-1-joannekoong@fb.com (mailing list archive) |
---|---|
Headers | show |
Series | Add XDP support for bpf_load_hdr_opt | expand |
Joanne Koong <joannekoong@fb.com> writes: > Currently, bpf_sockops programs have been using bpf_load_hdr_opt() to > parse the tcp header option. It will be useful to allow other bpf prog > types to have a similar way of handling tcp hdr options. > > This series adds XDP support for bpf_load_hdr_opt(). At a high level, > these patches are: Why is this needed? Why not just parse the header directly in XDP? Seems a bit arbitrary to add a helper for this particular type of packet payload parsing to this particular program type. I.e., what about other headers (IP options?)? Are we going to have a whole bunch of special-purpose parsing helpers to pick out protocol data from packets? Also, why only enable this for XDP (and not, say the TC hook as well)? -Toke
On 10/7/21 7:41 AM, Toke Høiland-Jørgensen wrote: > Joanne Koong <joannekoong@fb.com> writes: > >> Currently, bpf_sockops programs have been using bpf_load_hdr_opt() to >> parse the tcp header option. It will be useful to allow other bpf prog >> types to have a similar way of handling tcp hdr options. >> >> This series adds XDP support for bpf_load_hdr_opt(). At a high level, >> these patches are: > Why is this needed? Why not just parse the header directly in XDP? Parsing a variable number of TCP options is challenging for the verifier. Some programs are using #pragma unroll as a temporary workaround (https://github.com/xdp-project/bpf-examples/blob/master/pping/pping_kern.c#L95) I believe Christian Deacon also recently posted about this on the xdp mailing list with a link to his bpf fail logs in https://github.com/gamemann/XDP-TCP-Header-Options which showcases some of the difficulties involved > Seems > a bit arbitrary to add a helper for this particular type of packet > payload parsing to this particular program type. I.e., what about other > headers (IP options?)? The current use case needs so far have been for parsing tcp headers, but in the future, when there are needs for parsing other types, they can be supported as well through bpf_load_hdr_opt. > Are we going to have a whole bunch of > special-purpose parsing helpers to pick out protocol data from packets? I think bpf_load_hdr_opt is generic enough to support parsing any kind of protocol data (as specified through flags) in the packets > Also, why only enable this for XDP (and not, say the TC hook as well)? The plan is to also support this in tc as well (this will be in a separate patchset) > -Toke >
On 10/7/21 10:57 PM, Joanne Koong wrote: > On 10/7/21 7:41 AM, Toke Høiland-Jørgensen wrote: >> Joanne Koong <joannekoong@fb.com> writes: >> >>> Currently, bpf_sockops programs have been using bpf_load_hdr_opt() to >>> parse the tcp header option. It will be useful to allow other bpf prog >>> types to have a similar way of handling tcp hdr options. >>> >>> This series adds XDP support for bpf_load_hdr_opt(). At a high level, >>> these patches are: >> Why is this needed? Why not just parse the header directly in XDP? > Parsing a variable number of TCP options is challenging for the verifier. > Some programs are using #pragma unroll as a temporary workaround > (https://github.com/xdp-project/bpf-examples/blob/master/pping/pping_kern.c#L95) > I believe Christian Deacon also recently posted about this on the xdp mailing list > with a link to his bpf fail logs in https://github.com/gamemann/XDP-TCP-Header-Options > which showcases some of the difficulties involved > >> Seems >> a bit arbitrary to add a helper for this particular type of packet >> payload parsing to this particular program type. I.e., what about other >> headers (IP options?)? > The current use case needs so far have been for parsing tcp headers, but > in the future, when there are needs for parsing other types, they > can be supported as well through bpf_load_hdr_opt. > >> Are we going to have a whole bunch of >> special-purpose parsing helpers to pick out protocol data from packets? > > I think bpf_load_hdr_opt is generic enough to support parsing > any kind of protocol data (as specified through flags) in the packets I tend to agree with Toke here that this is not generic. What has been tried to improve the verifier instead before submitting the series? It would be much more preferable to improve the developer experience with regards to a generic solution, so that other/similar problems can be tackled in one go as well such as IP options, extension headers, etc. >> Also, why only enable this for XDP (and not, say the TC hook as well)? > The plan is to also support this in tc as well (this will be in a separate > patchset)
On Thu, Oct 07, 2021 at 11:25:29PM +0200, Daniel Borkmann wrote: > I tend to agree with Toke here that this is not generic. What has been tried > to improve the verifier instead before submitting the series? It would be much > more preferable to improve the developer experience with regards to a generic > solution, so that other/similar problems can be tackled in one go as well such > as IP options, extension headers, etc. It would be nice to improve verifier to recognize it more smoothly. Would love to hear idea how to do it. When adding the tcp header options for bpf_sockops, a bpf_store_hdr_opt() is needed to ensure the header option is sane. When writing test to parse variable length header option, I also pulled in tricks (e.g. "#pragma unroll" is easier to get it work. Tried bounded loop but then hits max insns and then moved some cases into subprog...etc). Most (if not all) TCP headers has some options (e.g. tstamp), so it will be useful to have an easy way to search a particular option and bpf_load_hdr_opt() was also added to bpf_sockops. When bpf-tcp-hdr-opt was added, the initial use case was only useful in the TCP stack because it wants to change the behavior of a tcp_sock. When bpf allows to add tcp-hdr-opt easily at tcp stack, it opens up other use cases and one of them is to load-balance by the tcp-hdr-opt "server_id" in XDP [1]. The same user that writes the bpf_sockops prog to add/parse tcp-opt needs to do extra tricks to get this work in xdp prog. The user had repeated a similar try-and-error exercise (e.g. while the logical thinking is to somehow bounded by the max tcp header size but that does not work in unroll. With the current cases in the loop, 15 is the max magic number that works and hoping it will continue to work). [1]: https://linuxplumbersconf.org/event/11/contributions/950/
Martin KaFai Lau <kafai@fb.com> writes: > On Thu, Oct 07, 2021 at 11:25:29PM +0200, Daniel Borkmann wrote: >> I tend to agree with Toke here that this is not generic. What has been tried >> to improve the verifier instead before submitting the series? It would be much >> more preferable to improve the developer experience with regards to a generic >> solution, so that other/similar problems can be tackled in one go as well such >> as IP options, extension headers, etc. > It would be nice to improve verifier to recognize it more smoothly. Would > love to hear idea how to do it. So as far as I could tell, the verifier blows up in part because when there's multiple bounded loops in sequence the verifier gets into a combinatorial explosion of exploring all paths through the first loop combined with all paths through the second. So if we could teach the verifier to recognise that each loop is a separate entity to avoid this, I think looping through headers would be a lot easier. As you can probably tell, though, there is quite a bit of handwaving in the above, and I have no idea how to actually do this. Some kind of invariant analysis, maybe? But is this possible in general? > When adding the tcp header options for bpf_sockops, a bpf_store_hdr_opt() > is needed to ensure the header option is sane. When writing test to parse > variable length header option, I also pulled in tricks (e.g. "#pragma unroll" > is easier to get it work. Tried bounded loop but then hits max insns and > then moved some cases into subprog...etc). Most (if not all) TCP headers > has some options (e.g. tstamp), so it will be useful to have an easy way > to search a particular option and bpf_load_hdr_opt() was also added to > bpf_sockops. So if we can't fix the verifier, maybe we could come up with a more general helper for packet parsing? Something like: bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg) { ptr = ctx->data + offset; while (ptr < ctx->data_end) { offset = callback_fn(ptr, ctx->data_end, callback_arg); if (offset == 0) return 0; ptr += offset; } // out of bounds before callback was done return -EINVAL; } This would work for parsing any kind of packet header or TLV-style data without having to teach the kernel about each header type. It'll have quite a bit of overhead if all the callbacks happen via indirect calls, but maybe the verifier can inline the calls (or at least turn them into direct CALL instructions)? -Toke
On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote: > So if we can't fix the verifier, maybe we could come up with a more > general helper for packet parsing? Something like: > > bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg) > { > ptr = ctx->data + offset; > while (ptr < ctx->data_end) { > offset = callback_fn(ptr, ctx->data_end, callback_arg); > if (offset == 0) > return 0; > ptr += offset; > } > > // out of bounds before callback was done > return -EINVAL; > } > > This would work for parsing any kind of packet header or TLV-style data > without having to teach the kernel about each header type. It'll have > quite a bit of overhead if all the callbacks happen via indirect calls, > but maybe the verifier can inline the calls (or at least turn them into > direct CALL instructions)? Direct call different callback_fn? bpf_for_each_pkt_chunk() is a kernel function. It would be nice if the verifier could do that. This for_each helper had been considered also. Other than the need to callback in a loop, the thought was to extend the existing bpf_load_hdr_opt() because our initial feedback is the same header handling logic cannot be used in xdp which is confusing. I don't mind to go with the for_each helper. However, with another thought, if it needs to call a function in the loop anyway, I think it could also be done in bpf by putting a global function in a loop. Need to try and double check.
Martin KaFai Lau <kafai@fb.com> writes: > On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote: >> So if we can't fix the verifier, maybe we could come up with a more >> general helper for packet parsing? Something like: >> >> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg) >> { >> ptr = ctx->data + offset; >> while (ptr < ctx->data_end) { >> offset = callback_fn(ptr, ctx->data_end, callback_arg); >> if (offset == 0) >> return 0; >> ptr += offset; >> } >> >> // out of bounds before callback was done >> return -EINVAL; >> } >> >> This would work for parsing any kind of packet header or TLV-style data >> without having to teach the kernel about each header type. It'll have >> quite a bit of overhead if all the callbacks happen via indirect calls, >> but maybe the verifier can inline the calls (or at least turn them into >> direct CALL instructions)? > Direct call different callback_fn? bpf_for_each_pkt_chunk() is a kernel > function. It would be nice if the verifier could do that. Ohh, right, think-o on my part. It could be done if the helper was inlined in its entirety, though? Not sure if that's feasible? > This for_each helper had been considered also. Other than the need to > callback in a loop, the thought was to extend the existing > bpf_load_hdr_opt() because our initial feedback is the same header > handling logic cannot be used in xdp which is confusing. TBH, I had not noticed this helper before. Now that I have, it does seems like the kind of thing that belongs as a BPF library function rather than a helper in the first place :) > I don't mind to go with the for_each helper. However, with another > thought, if it needs to call a function in the loop anyway, I think > it could also be done in bpf by putting a global function in a loop. > Need to try and double check. Hmm, that would be interesting if possible! -Toke
On 10/12/21 7:11 AM, Toke Høiland-Jørgensen wrote: > Martin KaFai Lau <kafai@fb.com> writes: >> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote: >> [...] >> I don't mind to go with the for_each helper. However, with another >> thought, if it needs to call a function in the loop anyway, I think >> it could also be done in bpf by putting a global function in a loop. >> Need to try and double check. > Hmm, that would be interesting if possible! > > -Toke > Martin and I tried this out. When we moved out the logic for parsing the options into a global non-inlined function, the verifier approved the program. As such, I will abandon this patchset and submit a new separate patch that will add a test to ensure we're always able to parse header options through a global function. Thanks for the conversation on this, Toke, Martin, and Daniel!
Joanne Koong <joannekoong@fb.com> writes: > On 10/12/21 7:11 AM, Toke Høiland-Jørgensen wrote: > >> Martin KaFai Lau <kafai@fb.com> writes: >>> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote: >>> [...] >>> I don't mind to go with the for_each helper. However, with another >>> thought, if it needs to call a function in the loop anyway, I think >>> it could also be done in bpf by putting a global function in a loop. >>> Need to try and double check. >> Hmm, that would be interesting if possible! >> >> -Toke >> > Martin and I tried this out. When we moved out the logic for parsing the > options into a global non-inlined function, the verifier approved the > program. Excellent! > As such, I will abandon this patchset and submit a new separate patch > that will add a test to ensure we're always able to parse header options > through a global function. > > Thanks for the conversation on this, Toke, Martin, and Daniel! Sounds good - you're welcome :) -Toke
On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote: > > So if we can't fix the verifier, maybe we could come up with a more > general helper for packet parsing? Something like: > > bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg) > { > ptr = ctx->data + offset; > while (ptr < ctx->data_end) { > offset = callback_fn(ptr, ctx->data_end, callback_arg); > if (offset == 0) > return 0; > ptr += offset; > } > > // out of bounds before callback was done > return -EINVAL; > } We're starting to work on this since it will be useful not only for packet parsing, TLV parsing, but potentially any kind of 'for' loop iteration. > This would work for parsing any kind of packet header or TLV-style data > without having to teach the kernel about each header type. It'll have > quite a bit of overhead if all the callbacks happen via indirect calls, > but maybe the verifier can inline the calls (or at least turn them into > direct CALL instructions)? Right. That's the main downside. If the bpf_for_each*() helper is simple enough the verifier can inline it similar to map_gen_lookup. In such case the indirect call will be a direct call, so the overhead won't be that bad, but it's still a function call and static function will have full prologue+epilogue. Converting static function into direct jump would be really challenging for the verifier and won't provide much benefit, since r6-r9 save/restore would need to happen anyway even for such 'inlined' static func, since llvm will be freely using r6-r9 for insns inside function body assuming that it's a normal function. May be there is a way to avoid call overhead with with clang extensions. If we want to do: int mem_eq(char *p1, char *p2, int size) { int i; for (i = 0; i < size; i++) if (p1[i] != p2[i]) return 0; return 1; } With clang extension we might write it as: int bpf_mem_eq(char *p1, char *p2, int size) { int i = 0; int iter; iter = __builtin_for_until(i, size, ({ if (p1[i] != p2[i]) goto out; })); out: if (iter != size) return 0; return 1; } The llvm will generate pseudo insns for this __builtin_for. The verifier will analyze the loop body for the range [0, size) and replace pseudo insns with normal branches after the verification. We might even keep the normal C syntax for loops and use llvm HardwareLoops pass to add pseudo insns. It's more or less the same ideas for loops we discussed before bounded loops were introduced. The main problem with bounded loops is that the loop body will typically be verified the number of times equal to the number of iterations. So for any non-trivial loop such iteration count is not much more than 100. The verifier can do scalar evolution analysis, but it's likely won't work for many cases and user experience will suffer. With __builtin_for the scalar evolution is not necessary, since induction variable is one and explicit and its range is explicit too. That enables single pass over loop body. One might argue that for (i = 0; i < 10000; i += 10) loops are necessary too, but instead of complicating the verifier with sparse ranges it's better to put that on users that can do: iter = __builtin_for_until(i, 10000 / 10, ({ j = i * 10; use j; })); Single explicit induction variable makes the verification practical. The loop body won't be as heavily optimized as normal loop, but it's a good thing.
On 10/18/21 5:00 PM, Alexei Starovoitov wrote: > On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote: >> >> So if we can't fix the verifier, maybe we could come up with a more >> general helper for packet parsing? Something like: >> >> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg) >> { >> ptr = ctx->data + offset; >> while (ptr < ctx->data_end) { >> offset = callback_fn(ptr, ctx->data_end, callback_arg); >> if (offset == 0) >> return 0; >> ptr += offset; >> } >> >> // out of bounds before callback was done >> return -EINVAL; >> } > > We're starting to work on this since it will be useful not only for > packet parsing, TLV parsing, but potentially any kind of 'for' loop iteration. > >> This would work for parsing any kind of packet header or TLV-style data >> without having to teach the kernel about each header type. It'll have >> quite a bit of overhead if all the callbacks happen via indirect calls, >> but maybe the verifier can inline the calls (or at least turn them into >> direct CALL instructions)? > > Right. That's the main downside. > If the bpf_for_each*() helper is simple enough the verifier can inline it > similar to map_gen_lookup. In such case the indirect call will be a direct call, > so the overhead won't be that bad, but it's still a function call and > static function will have full prologue+epilogue. > Converting static function into direct jump would be really challenging > for the verifier and won't provide much benefit, since r6-r9 save/restore > would need to happen anyway even for such 'inlined' static func, since > llvm will be freely using r6-r9 for insns inside function body > assuming that it's a normal function. > > May be there is a way to avoid call overhead with with clang extensions. > If we want to do: > int mem_eq(char *p1, char *p2, int size) > { > int i; > for (i = 0; i < size; i++) > if (p1[i] != p2[i]) > return 0; > return 1; > } > > With clang extension we might write it as: > int bpf_mem_eq(char *p1, char *p2, int size) > { > int i = 0; > int iter; > > iter = __builtin_for_until(i, size, ({ > if (p1[i] != p2[i]) > goto out; > })); > out: > if (iter != size) > return 0; > return 1; > } > > The llvm will generate pseudo insns for this __builtin_for. > The verifier will analyze the loop body for the range [0, size) > and replace pseudo insns with normal branches after the verification. > We might even keep the normal C syntax for loops and use > llvm HardwareLoops pass to add pseudo insns. > It's more or less the same ideas for loops we discussed before > bounded loops were introduced. > The main problem with bounded loops is that the loop body will > typically be verified the number of times equal to the number of iterations. > So for any non-trivial loop such iteration count is not much more > than 100. The verifier can do scalar evolution analysis, but > it's likely won't work for many cases and user experience > will suffer. With __builtin_for the scalar evolution is not necessary, > since induction variable is one and explicit and its range is explicit too. > That enables single pass over loop body. > One might argue that for (i = 0; i < 10000; i += 10) loops are > necessary too, but instead of complicating the verifier with sparse > ranges it's better to put that on users that can do: > iter = __builtin_for_until(i, 10000 / 10, ({ > j = i * 10; > use j; > })); > Single explicit induction variable makes the verification practical. > The loop body won't be as heavily optimized as normal loop, > but it's a good thing. We have discussed how to verify *well-formed* loops back in 2018. (BPF control flow, supporting loops and other patterns: https://www.linuxplumbersconf.org/event/2/contributions/116/) Now probably the time to revisit it again! I think Alexei's proposal is the right direction to have compiler preserving the loop structure with some pseudo instructions and verifier is able to range-based verification instead of iterating all loop iterations. LLVM already has some IR loop intrinsics like below: def int_set_loop_iterations : def int_start_loop_iterations : def int_test_set_loop_iterations : def int_test_start_loop_iterations : def int_loop_decrement : def int_loop_decrement_reg : to facilitate hardware loop. BPF target can define its own intrinsics to help define a well structured loop. I will look into this.
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote: >> >> So if we can't fix the verifier, maybe we could come up with a more >> general helper for packet parsing? Something like: >> >> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg) >> { >> ptr = ctx->data + offset; >> while (ptr < ctx->data_end) { >> offset = callback_fn(ptr, ctx->data_end, callback_arg); >> if (offset == 0) >> return 0; >> ptr += offset; >> } >> >> // out of bounds before callback was done >> return -EINVAL; >> } > > We're starting to work on this since it will be useful not only for > packet parsing, TLV parsing, but potentially any kind of 'for' loop iteration. Awesome! :) >> This would work for parsing any kind of packet header or TLV-style data >> without having to teach the kernel about each header type. It'll have >> quite a bit of overhead if all the callbacks happen via indirect calls, >> but maybe the verifier can inline the calls (or at least turn them into >> direct CALL instructions)? > > Right. That's the main downside. > If the bpf_for_each*() helper is simple enough the verifier can inline it > similar to map_gen_lookup. In such case the indirect call will be a direct call, > so the overhead won't be that bad, but it's still a function call and > static function will have full prologue+epilogue. > Converting static function into direct jump would be really challenging > for the verifier and won't provide much benefit, since r6-r9 save/restore > would need to happen anyway even for such 'inlined' static func, since > llvm will be freely using r6-r9 for insns inside function body > assuming that it's a normal function. I reckon it could be acceptable to have the overhead of a regular function call per iteration, but obviously it would be better to avoid it. > May be there is a way to avoid call overhead with with clang extensions. > If we want to do: > int mem_eq(char *p1, char *p2, int size) > { > int i; > for (i = 0; i < size; i++) > if (p1[i] != p2[i]) > return 0; > return 1; > } > > With clang extension we might write it as: > int bpf_mem_eq(char *p1, char *p2, int size) > { > int i = 0; > int iter; > > iter = __builtin_for_until(i, size, ({ > if (p1[i] != p2[i]) > goto out; > })); > out: > if (iter != size) > return 0; > return 1; > } > > The llvm will generate pseudo insns for this __builtin_for. > The verifier will analyze the loop body for the range [0, size) > and replace pseudo insns with normal branches after the verification. That seems like an interesting approach! The __builtin_for thing is a little awkward, but not more than turning the loop into a separate function + callback. What about backwards compatibility? Would you have to ensure your kernel understands the loop instructions before you put them into your code, or could libbpf be taught to rewrite them if the kernel doesn't understand them (say, to a separate function that is called in a regular bounded loop)? > We might even keep the normal C syntax for loops and use > llvm HardwareLoops pass to add pseudo insns. Now *this* would be cool! > It's more or less the same ideas for loops we discussed before > bounded loops were introduced. Why was it rejected at the time? > The main problem with bounded loops is that the loop body will > typically be verified the number of times equal to the number of iterations. > So for any non-trivial loop such iteration count is not much more > than 100. The verifier can do scalar evolution analysis, but > it's likely won't work for many cases and user experience > will suffer. With __builtin_for the scalar evolution is not necessary, > since induction variable is one and explicit and its range is explicit too. > That enables single pass over loop body. > One might argue that for (i = 0; i < 10000; i += 10) loops are > necessary too, but instead of complicating the verifier with sparse > ranges it's better to put that on users that can do: > iter = __builtin_for_until(i, 10000 / 10, ({ > j = i * 10; > use j; > })); > Single explicit induction variable makes the verification practical. > The loop body won't be as heavily optimized as normal loop, > but it's a good thing. Agreed, limiting things to single-step loops would be acceptable. -Toke