Message ID | 20231229081409.1276386-1-menglong8.dong@gmail.com |
---|---|
Series | bpf: add csum/ip_summed fields to __sk_buff |
On 12/29, Menglong Dong wrote:
> For now, we have to call some helpers when we need to update the csum,
> such as bpf_l4_csum_replace, bpf_l3_csum_replace, etc. These helpers are
> not inlined, which causes poor performance.
>
> In fact, we can define our own csum update functions in the BPF program
> instead of bpf_l3_csum_replace, which is fully inlined and efficient.
> However, we can't do this for bpf_l4_csum_replace for now, as we can't
> update skb->csum, which can leave skb->csum invalid on the rx path in
> CHECKSUM_COMPLETE mode.
>
> What's more, we can't use direct data access and have to use
> skb_store_bytes() with the BPF_F_RECOMPUTE_CSUM flag in some cases, such
> as modifying the vni in the vxlan header when the underlay udp header has
> no checksum.
>
> In the first patch, we make skb->csum readable and writable, and we make
> skb->ip_summed readable. For now, for tc only. With these 2 fields, we
> don't need to call bpf helpers for csum updates any more.
>
> In the second patch, we add some testcases for read/write access to
> skb->csum and skb->ip_summed.
>
> If this series is acceptable, we can define the inlined functions for csum
> update in libbpf in the next step.

One downside of exposing those as __sk_buff fields is that all this
skb internal csum stuff now becomes UAPI. And I'm not sure we want
that :-) Should we add a lightweight kfunc to reset the fields instead?
Or will it still have an unacceptable overhead?
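For readers who want to see what such a self-defined, fully inlined csum update can look like, here is a minimal sketch of an RFC 1624 incremental update for a 16-bit field. The function name and the caller mentioned after it are illustrative only, not an existing kernel or libbpf API:

#include <linux/types.h>
#include <bpf/bpf_helpers.h>    /* __always_inline */

/* RFC 1624 (eq. 3) incremental checksum update: HC' = ~(~HC + ~m + m').
 * All values stay in network byte order; __always_inline avoids the
 * helper-call overhead the cover letter complains about.
 */
static __always_inline __u16 csum_replace16(__u16 old_csum, __u16 old_val,
                                            __u16 new_val)
{
        __u32 sum = (__u16)~old_csum + (__u16)~old_val + new_val;

        /* fold the carries back into 16 bits; two folds are always enough */
        sum = (sum & 0xffff) + (sum >> 16);
        sum = (sum & 0xffff) + (sum >> 16);
        return ~sum;
}

A program rewriting a 16-bit L3 field would then do something like iph->check = csum_replace16(iph->check, old, new); right after the store, with no helper call involved.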
On 1/2/24 10:11 AM, Stanislav Fomichev wrote:
> On 12/29, Menglong Dong wrote:
>> For now, we have to call some helpers when we need to update the csum,
>> such as bpf_l4_csum_replace, bpf_l3_csum_replace, etc. These helpers are
>> not inlined, which causes poor performance.
>>
>> In fact, we can define our own csum update functions in the BPF program
>> instead of bpf_l3_csum_replace, which is fully inlined and efficient.
>> However, we can't do this for bpf_l4_csum_replace for now, as we can't
>> update skb->csum, which can leave skb->csum invalid on the rx path in
>> CHECKSUM_COMPLETE mode.
>>
>> What's more, we can't use direct data access and have to use
>> skb_store_bytes() with the BPF_F_RECOMPUTE_CSUM flag in some cases, such
>> as modifying the vni in the vxlan header when the underlay udp header has
>> no checksum.

There is bpf_csum_update(), does it work?
A helper call should be acceptable compared with the csum calculation itself.

>>
>> In the first patch, we make skb->csum readable and writable, and we make
>> skb->ip_summed readable. For now, for tc only. With these 2 fields, we
>> don't need to call bpf helpers for csum updates any more.
>>
>> In the second patch, we add some testcases for read/write access to
>> skb->csum and skb->ip_summed.
>>
>> If this series is acceptable, we can define the inlined functions for csum
>> update in libbpf in the next step.
>
> One downside of exposing those as __sk_buff fields is that all this
> skb internal csum stuff now becomes UAPI. And I'm not sure we want

+1. Please no new __sk_buff extension and no new conversion in
bpf_convert_ctx_access().

> that :-) Should we add a lightweight kfunc to reset the fields instead?
> Or will it still have an unacceptable overhead?
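For reference, a minimal sketch of the pattern being pointed at: rewrite the bytes with direct packet access and fold the delta into skb->csum with bpf_csum_diff() + bpf_csum_update(), which only has an effect for CHECKSUM_COMPLETE skbs (it returns -ENOENT for other modes). The header offsets and the new VNI value are assumptions for illustration, and a real program may need bpf_skb_pull_data() first if these bytes are not in the linear area:

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Offset of the VXLAN VNI word, assuming untagged Ethernet(14) + IPv4(20) +
 * UDP(8) + VXLAN flags word(4); adjust for the real encapsulation.
 */
#define VNI_OFF (14 + 20 + 8 + 4)
#define NEW_VNI 42                      /* hypothetical value */

SEC("tc")
int rewrite_vni(struct __sk_buff *skb)
{
        void *data = (void *)(long)skb->data;
        void *data_end = (void *)(long)skb->data_end;
        __be32 *vni, old_val, new_val;
        __s64 diff;

        if (data + VNI_OFF + sizeof(*vni) > data_end)
                return TC_ACT_OK;

        vni = data + VNI_OFF;
        old_val = *vni;
        new_val = bpf_htonl(NEW_VNI << 8);      /* VNI sits in the top 24 bits */
        *vni = new_val;

        /* Keep skb->csum consistent for CHECKSUM_COMPLETE skbs on the rx
         * path after the direct write above.
         */
        diff = bpf_csum_diff(&old_val, sizeof(old_val),
                             &new_val, sizeof(new_val), 0);
        if (diff >= 0)
                bpf_csum_update(skb, (__wsum)diff);

        return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";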
On Wed, Jan 3, 2024 at 8:52 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/2/24 10:11 AM, Stanislav Fomichev wrote:
> > On 12/29, Menglong Dong wrote:
> >> For now, we have to call some helpers when we need to update the csum,
> >> such as bpf_l4_csum_replace, bpf_l3_csum_replace, etc. These helpers are
> >> not inlined, which causes poor performance.
> >>
> >> In fact, we can define our own csum update functions in the BPF program
> >> instead of bpf_l3_csum_replace, which is fully inlined and efficient.
> >> However, we can't do this for bpf_l4_csum_replace for now, as we can't
> >> update skb->csum, which can leave skb->csum invalid on the rx path in
> >> CHECKSUM_COMPLETE mode.
> >>
> >> What's more, we can't use direct data access and have to use
> >> skb_store_bytes() with the BPF_F_RECOMPUTE_CSUM flag in some cases, such
> >> as modifying the vni in the vxlan header when the underlay udp header has
> >> no checksum.
>
> There is bpf_csum_update(), does it work?
> A helper call should be acceptable compared with the csum calculation itself.

Yeah, this helper works in this case! Now we miss the last
piece for the tx path: ip_summed. We need to know if it is
CHECKSUM_PARTIAL to decide if we should update the
csum in the packet. In the tx path, the csum in the L4 header is the
pseudo-header csum only if skb->ip_summed is CHECKSUM_PARTIAL.

Maybe we can introduce a lightweight kfunc to get its
value? Such as bpf_skb_csum_mode(). As we only need to call
it once, there shouldn't be any overhead.

Thanks!
Menglong Dong

>
> >>
> >> In the first patch, we make skb->csum readable and writable, and we make
> >> skb->ip_summed readable. For now, for tc only. With these 2 fields, we
> >> don't need to call bpf helpers for csum updates any more.
> >>
> >> In the second patch, we add some testcases for read/write access to
> >> skb->csum and skb->ip_summed.
> >>
> >> If this series is acceptable, we can define the inlined functions for csum
> >> update in libbpf in the next step.
> >
> > One downside of exposing those as __sk_buff fields is that all this
> > skb internal csum stuff now becomes UAPI. And I'm not sure we want
>
> +1. Please no new __sk_buff extension and no new conversion in
> bpf_convert_ctx_access().
>
> > that :-) Should we add a lightweight kfunc to reset the fields instead?
> > Or will it still have an unacceptable overhead?
>
On 1/2/24 6:54 PM, Menglong Dong wrote:
> On Wed, Jan 3, 2024 at 8:52 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>> On 1/2/24 10:11 AM, Stanislav Fomichev wrote:
>>> On 12/29, Menglong Dong wrote:
>>>> For now, we have to call some helpers when we need to update the csum,
>>>> such as bpf_l4_csum_replace, bpf_l3_csum_replace, etc. These helpers are
>>>> not inlined, which causes poor performance.
>>>>
>>>> In fact, we can define our own csum update functions in the BPF program
>>>> instead of bpf_l3_csum_replace, which is fully inlined and efficient.
>>>> However, we can't do this for bpf_l4_csum_replace for now, as we can't
>>>> update skb->csum, which can leave skb->csum invalid on the rx path in
>>>> CHECKSUM_COMPLETE mode.
>>>>
>>>> What's more, we can't use direct data access and have to use
>>>> skb_store_bytes() with the BPF_F_RECOMPUTE_CSUM flag in some cases, such
>>>> as modifying the vni in the vxlan header when the underlay udp header has
>>>> no checksum.
>> There is bpf_csum_update(), does it work?
>> A helper call should be acceptable compared with the csum calculation itself.
> Yeah, this helper works in this case! Now we miss the last
> piece for the tx path: ip_summed. We need to know if it is
> CHECKSUM_PARTIAL to decide if we should update the
> csum in the packet. In the tx path, the csum in the L4 header is the
> pseudo-header csum only if skb->ip_summed is CHECKSUM_PARTIAL.
>
> Maybe we can introduce a lightweight kfunc to get its
> value? Such as bpf_skb_csum_mode(). As we only need to call
> it once, there shouldn't be any overhead.

You don't need a kfunc, you can do checking like

struct sk_buff *kskb = bpf_cast_to_kern_ctx(skb);
if (kskb->ip_summed == CHECKSUM_PARTIAL) ...
...

>
> Thanks!
> Menglong Dong
>
>>>> In the first patch, we make skb->csum readable and writable, and we make
>>>> skb->ip_summed readable. For now, for tc only. With these 2 fields, we
>>>> don't need to call bpf helpers for csum updates any more.
>>>>
>>>> In the second patch, we add some testcases for read/write access to
>>>> skb->csum and skb->ip_summed.
>>>>
>>>> If this series is acceptable, we can define the inlined functions for csum
>>>> update in libbpf in the next step.
>>> One downside of exposing those as __sk_buff fields is that all this
>>> skb internal csum stuff now becomes UAPI. And I'm not sure we want
>> +1. Please no new __sk_buff extension and no new conversion in
>> bpf_convert_ctx_access().
>>
>>> that :-) Should we add a lightweight kfunc to reset the fields instead?
>>> Or will it still have an unacceptable overhead?
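Spelled out a little more, and assuming a CO-RE build against vmlinux.h, that suggestion could look like the sketch below. CHECKSUM_PARTIAL is a macro from include/linux/skbuff.h and is not part of BTF, so it has to be redefined locally, and the bpf_cast_to_kern_ctx() declaration mirrors how the BPF selftests declare the kfunc:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* from include/linux/skbuff.h; macros are not emitted into vmlinux.h */
#define CHECKSUM_PARTIAL 3

/* kfunc that converts the __sk_buff ctx into the underlying struct sk_buff */
extern void *bpf_cast_to_kern_ctx(void *obj) __ksym;

SEC("tc")
int csum_mode_check(struct __sk_buff *skb)
{
        struct sk_buff *kskb = bpf_cast_to_kern_ctx(skb);

        /* On the tx path the L4 checksum field holds only the pseudo-header
         * sum when ip_summed is CHECKSUM_PARTIAL, so only the other modes
         * need an explicit fixup after rewriting payload bytes.
         * Note: ip_summed is a bitfield; BPF_CORE_READ_BITFIELD() from
         * <bpf/bpf_core_read.h> can be used if a relocatable read is wanted.
         */
        if (kskb->ip_summed != CHECKSUM_PARTIAL) {
                /* fix the packet csum here, e.g. incrementally or via
                 * bpf_l4_csum_replace()
                 */
        }

        return 0;       /* TC_ACT_OK */
}

char _license[] SEC("license") = "GPL";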
On Wed, Jan 3, 2024 at 11:55 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 1/2/24 6:54 PM, Menglong Dong wrote:
> > On Wed, Jan 3, 2024 at 8:52 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >> On 1/2/24 10:11 AM, Stanislav Fomichev wrote:
> >>> On 12/29, Menglong Dong wrote:
> >>>> For now, we have to call some helpers when we need to update the csum,
> >>>> such as bpf_l4_csum_replace, bpf_l3_csum_replace, etc. These helpers are
> >>>> not inlined, which causes poor performance.
> >>>>
> >>>> In fact, we can define our own csum update functions in the BPF program
> >>>> instead of bpf_l3_csum_replace, which is fully inlined and efficient.
> >>>> However, we can't do this for bpf_l4_csum_replace for now, as we can't
> >>>> update skb->csum, which can leave skb->csum invalid on the rx path in
> >>>> CHECKSUM_COMPLETE mode.
> >>>>
> >>>> What's more, we can't use direct data access and have to use
> >>>> skb_store_bytes() with the BPF_F_RECOMPUTE_CSUM flag in some cases, such
> >>>> as modifying the vni in the vxlan header when the underlay udp header has
> >>>> no checksum.
> >> There is bpf_csum_update(), does it work?
> >> A helper call should be acceptable compared with the csum calculation itself.
> > Yeah, this helper works in this case! Now we miss the last
> > piece for the tx path: ip_summed. We need to know if it is
> > CHECKSUM_PARTIAL to decide if we should update the
> > csum in the packet. In the tx path, the csum in the L4 header is the
> > pseudo-header csum only if skb->ip_summed is CHECKSUM_PARTIAL.
> >
> > Maybe we can introduce a lightweight kfunc to get its
> > value? Such as bpf_skb_csum_mode(). As we only need to call
> > it once, there shouldn't be any overhead.
>
> You don't need a kfunc, you can do checking like
>
> struct sk_buff *kskb = bpf_cast_to_kern_ctx(skb);
> if (kskb->ip_summed == CHECKSUM_PARTIAL) ...
> ...
>

Great, this is exactly what I need! Thanks~