Message ID | 168554475365.3262482.9868965521545045945.stgit@firesoul (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | [bpf-next] bpf/xdp: optimize bpf_xdp_pointer to avoid reading sinfo | expand |
> Currently we observed a significant performance degradation in > samples/bpf xdp1 and xdp2, due XDP multibuffer "xdp.frags" handling, > added in commit 772251742262 ("samples/bpf: fixup some tools to be able > to support xdp multibuffer"). > > This patch reduce the overhead by avoiding to read/load shared_info > (sinfo) memory area, when XDP packet don't have any frags. This improves > performance because sinfo is located in another cacheline. > > Function bpf_xdp_pointer() is used by BPF helpers bpf_xdp_load_bytes() > and bpf_xdp_store_bytes(). As a help to reviewers, xdp_get_buff_len() can > potentially access sinfo. > > Perf report show bpf_xdp_pointer() percentage utilization being reduced > from 4,19% to 3,37% (on CPU E5-1650 @3.60GHz). > > The BPF kfunc bpf_dynptr_slice() also use bpf_xdp_pointer(). Thus, it > should also take effect for that. > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> > --- > net/core/filter.c | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/net/core/filter.c b/net/core/filter.c > index 968139f4a1ac..a635f537d499 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -3948,20 +3948,24 @@ void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, > > void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) > { > - struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); > u32 size = xdp->data_end - xdp->data; > + struct skb_shared_info *sinfo; > void *addr = xdp->data; > int i; > > if (unlikely(offset > 0xffff || len > 0xffff)) > return ERR_PTR(-EFAULT); > > - if (offset + len > xdp_get_buff_len(xdp)) > - return ERR_PTR(-EINVAL); > + if (likely((offset < size))) /* linear area */ > + goto out; Hi Jesper, please correct me if I am wrong but looking at the code, in this way bpf_xdp_pointer() will return NULL (and not ERR_PTR(-EINVAL)) if: - offset < size - offset + len > xdp_get_buff_len() doing so I would say bpf_xdp_copy_buf() will copy the full packet starting from offset leaving some part of the auxiliary buffer possible uninitialized. Do you think it is an issue? Regards, Lorenzo > > - if (offset < size) /* linear area */ > + if (likely(!xdp_buff_has_frags(xdp))) > goto out; > > + if (offset + len > xdp_get_buff_len(xdp)) > + return ERR_PTR(-EINVAL); > + > + sinfo = xdp_get_shared_info_from_buff(xdp); > offset -= size; > for (i = 0; i < sinfo->nr_frags; i++) { /* paged area */ > u32 frag_size = skb_frag_size(&sinfo->frags[i]); > >
Lorenzo Bianconi <lorenzo@kernel.org> writes: >> Currently we observed a significant performance degradation in >> samples/bpf xdp1 and xdp2, due XDP multibuffer "xdp.frags" handling, >> added in commit 772251742262 ("samples/bpf: fixup some tools to be able >> to support xdp multibuffer"). >> >> This patch reduce the overhead by avoiding to read/load shared_info >> (sinfo) memory area, when XDP packet don't have any frags. This improves >> performance because sinfo is located in another cacheline. >> >> Function bpf_xdp_pointer() is used by BPF helpers bpf_xdp_load_bytes() >> and bpf_xdp_store_bytes(). As a help to reviewers, xdp_get_buff_len() can >> potentially access sinfo. >> >> Perf report show bpf_xdp_pointer() percentage utilization being reduced >> from 4,19% to 3,37% (on CPU E5-1650 @3.60GHz). >> >> The BPF kfunc bpf_dynptr_slice() also use bpf_xdp_pointer(). Thus, it >> should also take effect for that. >> >> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> >> --- >> net/core/filter.c | 12 ++++++++---- >> 1 file changed, 8 insertions(+), 4 deletions(-) >> >> diff --git a/net/core/filter.c b/net/core/filter.c >> index 968139f4a1ac..a635f537d499 100644 >> --- a/net/core/filter.c >> +++ b/net/core/filter.c >> @@ -3948,20 +3948,24 @@ void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, >> >> void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) >> { >> - struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); >> u32 size = xdp->data_end - xdp->data; >> + struct skb_shared_info *sinfo; >> void *addr = xdp->data; >> int i; >> >> if (unlikely(offset > 0xffff || len > 0xffff)) >> return ERR_PTR(-EFAULT); >> >> - if (offset + len > xdp_get_buff_len(xdp)) >> - return ERR_PTR(-EINVAL); >> + if (likely((offset < size))) /* linear area */ >> + goto out; > > Hi Jesper, > > please correct me if I am wrong but looking at the code, in this way > bpf_xdp_pointer() will return NULL (and not ERR_PTR(-EINVAL)) if: > - offset < size > - offset + len > xdp_get_buff_len() > > doing so I would say bpf_xdp_copy_buf() will copy the full packet starting from > offset leaving some part of the auxiliary buffer possible uninitialized. > Do you think it is an issue? Yeah, you're right, bpf_xdp_load_bytes() should fail if trying to read beyond the frame, and in this case it won't for non-frags; that's a change in behaviour we probably shouldn't be making... -Toke
On 31/05/2023 18.24, Toke Høiland-Jørgensen wrote: > Lorenzo Bianconi <lorenzo@kernel.org> writes: > >>> Currently we observed a significant performance degradation in >>> samples/bpf xdp1 and xdp2, due XDP multibuffer "xdp.frags" handling, >>> added in commit 772251742262 ("samples/bpf: fixup some tools to be able >>> to support xdp multibuffer"). >>> >>> This patch reduce the overhead by avoiding to read/load shared_info >>> (sinfo) memory area, when XDP packet don't have any frags. This improves >>> performance because sinfo is located in another cacheline. >>> >>> Function bpf_xdp_pointer() is used by BPF helpers bpf_xdp_load_bytes() >>> and bpf_xdp_store_bytes(). As a help to reviewers, xdp_get_buff_len() can >>> potentially access sinfo. >>> >>> Perf report show bpf_xdp_pointer() percentage utilization being reduced >>> from 4,19% to 3,37% (on CPU E5-1650 @3.60GHz). >>> >>> The BPF kfunc bpf_dynptr_slice() also use bpf_xdp_pointer(). Thus, it >>> should also take effect for that. >>> >>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> >>> --- >>> net/core/filter.c | 12 ++++++++---- >>> 1 file changed, 8 insertions(+), 4 deletions(-) >>> >>> diff --git a/net/core/filter.c b/net/core/filter.c >>> index 968139f4a1ac..a635f537d499 100644 >>> --- a/net/core/filter.c >>> +++ b/net/core/filter.c >>> @@ -3948,20 +3948,24 @@ void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, >>> >>> void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) >>> { >>> - struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); >>> u32 size = xdp->data_end - xdp->data; >>> + struct skb_shared_info *sinfo; >>> void *addr = xdp->data; >>> int i; >>> >>> if (unlikely(offset > 0xffff || len > 0xffff)) >>> return ERR_PTR(-EFAULT); >>> >>> - if (offset + len > xdp_get_buff_len(xdp)) >>> - return ERR_PTR(-EINVAL); >>> + if (likely((offset < size))) /* linear area */ >>> + goto out; >> >> Hi Jesper, >> >> please correct me if I am wrong but looking at the code, in this way >> bpf_xdp_pointer() will return NULL (and not ERR_PTR(-EINVAL)) if: >> - offset < size >> - offset + len > xdp_get_buff_len() >> >> doing so I would say bpf_xdp_copy_buf() will copy the full packet starting from >> offset leaving some part of the auxiliary buffer possible uninitialized. >> Do you think it is an issue? > > Yeah, you're right, bpf_xdp_load_bytes() should fail if trying to read > beyond the frame, and in this case it won't for non-frags; that's a > change in behaviour we probably shouldn't be making... > Thanks for spotting this! I will work on a V2 tomorrow. --Jesper
diff --git a/net/core/filter.c b/net/core/filter.c index 968139f4a1ac..a635f537d499 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3948,20 +3948,24 @@ void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len) { - struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); u32 size = xdp->data_end - xdp->data; + struct skb_shared_info *sinfo; void *addr = xdp->data; int i; if (unlikely(offset > 0xffff || len > 0xffff)) return ERR_PTR(-EFAULT); - if (offset + len > xdp_get_buff_len(xdp)) - return ERR_PTR(-EINVAL); + if (likely((offset < size))) /* linear area */ + goto out; - if (offset < size) /* linear area */ + if (likely(!xdp_buff_has_frags(xdp))) goto out; + if (offset + len > xdp_get_buff_len(xdp)) + return ERR_PTR(-EINVAL); + + sinfo = xdp_get_shared_info_from_buff(xdp); offset -= size; for (i = 0; i < sinfo->nr_frags; i++) { /* paged area */ u32 frag_size = skb_frag_size(&sinfo->frags[i]);
Currently we observed a significant performance degradation in samples/bpf xdp1 and xdp2, due XDP multibuffer "xdp.frags" handling, added in commit 772251742262 ("samples/bpf: fixup some tools to be able to support xdp multibuffer"). This patch reduce the overhead by avoiding to read/load shared_info (sinfo) memory area, when XDP packet don't have any frags. This improves performance because sinfo is located in another cacheline. Function bpf_xdp_pointer() is used by BPF helpers bpf_xdp_load_bytes() and bpf_xdp_store_bytes(). As a help to reviewers, xdp_get_buff_len() can potentially access sinfo. Perf report show bpf_xdp_pointer() percentage utilization being reduced from 4,19% to 3,37% (on CPU E5-1650 @3.60GHz). The BPF kfunc bpf_dynptr_slice() also use bpf_xdp_pointer(). Thus, it should also take effect for that. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> --- net/core/filter.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)