
[RFC,6/9] virtio_net: construct multi-buffer xdp in mergeable

Message ID 20221122074348.88601-7-hengqi@linux.alibaba.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Series virtio_net: support multi buffer xdp

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1 this patch: 0
netdev/cc_maintainers warning 2 maintainers not CCed: virtualization@lists.linux-foundation.org hawk@kernel.org
netdev/build_clang success Errors and warnings before: 2 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 0
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Heng Qi Nov. 22, 2022, 7:43 a.m. UTC
Build multi-buffer xdp using virtnet_build_xdp_buff() in the mergeable path.

Buffers prefilled before xdp is set could be cleared with a vq reset,
but most devices do not support vq reset at present. To avoid
disturbing users who are already running xdp normally, we do not use
vq reset for the time being. At the same time, virtio-net currently
uses compound pages, and bpf_xdp_frags_increase_tail() derives the
tailroom of the last frag from the frag's offset within its page,
which can become negative for compound pages, so we disable tail
growth by not setting xdp_rxq->frag_size.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 29 deletions(-)
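
For context, the tailroom calculation mentioned in the commit message lives
in bpf_xdp_frags_increase_tail() (net/core/filter.c). The fragment below is
a paraphrase of that logic, not code from this series; it shows why leaving
xdp_rxq->frag_size at zero makes the helper bail out before it can compute
an underflowed tailroom for a frag sitting at a large offset inside a
compound page:

	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
	skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1];
	struct xdp_rxq_info *rxq = xdp->rxq;
	unsigned int tailroom;

	/* frag_size == 0 means the driver opted out of tail growth. */
	if (!rxq->frag_size || rxq->frag_size > xdp->frame_sz)
		return -EOPNOTSUPP;

	/* With compound pages skb_frag_off() can exceed frag_size, so this
	 * subtraction would wrap around if we got here.
	 */
	tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
	if (unlikely(offset > tailroom))
		return -EINVAL;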

Comments

Jason Wang Dec. 6, 2022, 6:33 a.m. UTC | #1
On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable.
>
> For the prefilled buffer before xdp is set, vq reset can be
> used to clear it, but most devices do not support it at present.
> In order not to bother users who are using xdp normally, we do
> not use vq reset for the time being.

I guess you could tweak that part to say we will probably use vq reset in the future.

> At the same time, virtio
> net currently uses comp pages, and bpf_xdp_frags_increase_tail()
> needs to calculate the tailroom of the last frag, which will
> involve the offset of the corresponding page and cause a negative
> value, so we disable tail increase by not setting xdp_rxq->frag_size.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
>  1 file changed, 38 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 20784b1d8236..83e6933ae62b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>                                          unsigned int *xdp_xmit,
>                                          struct virtnet_rq_stats *stats)
>  {
> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>         struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>         u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>         struct page *page = virt_to_head_page(buf);
> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>         rcu_read_lock();
>         xdp_prog = rcu_dereference(rq->xdp_prog);
>         if (xdp_prog) {
> +               unsigned int xdp_frags_truesz = 0;
> +               struct skb_shared_info *shinfo;
>                 struct xdp_frame *xdpf;
>                 struct page *xdp_page;
>                 struct xdp_buff xdp;
>                 void *data;
>                 u32 act;
> +               int i;
>
> -               /* Transient failure which in theory could occur if
> -                * in-flight packets from before XDP was enabled reach
> -                * the receive path after XDP is loaded.
> -                */
> -               if (unlikely(hdr->hdr.gso_type))
> -                       goto err_xdp;

Two questions:

1) should we keep this check for the XDP program that can't deal with XDP frags?
2) how could we guarantee that the vnet header (gso_type/csum_start
etc) is still valid after XDP (where XDP program can choose to
override the header)?

> -
> -               /* Buffers with headroom use PAGE_SIZE as alloc size,
> -                * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> +               /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
> +                * with headroom may add hole in truesize, which
> +                * make their length exceed PAGE_SIZE. So we disabled the
> +                * hole mechanism for xdp. See add_recvbuf_mergeable().
>                  */
>                 frame_sz = headroom ? PAGE_SIZE : truesize;
>
> -               /* This happens when rx buffer size is underestimated
> -                * or headroom is not enough because of the buffer
> -                * was refilled before XDP is set. This should only
> -                * happen for the first several packets, so we don't
> -                * care much about its performance.
> +               /* This happens when headroom is not enough because
> +                * of the buffer was prefilled before XDP is set.
> +                * This should only happen for the first several packets.
> +                * In fact, vq reset can be used here to help us clean up
> +                * the prefilled buffers, but many existing devices do not
> +                * support it, and we don't want to bother users who are
> +                * using xdp normally.
>                  */
> -               if (unlikely(num_buf > 1 ||
> -                            headroom < virtnet_get_headroom(vi))) {
> -                       /* linearize data for XDP */
> -                       xdp_page = xdp_linearize_page(rq, &num_buf,
> -                                                     page, offset,
> -                                                     VIRTIO_XDP_HEADROOM,
> -                                                     &len);
> -                       frame_sz = PAGE_SIZE;
> +               if (unlikely(headroom < virtnet_get_headroom(vi))) {
> +                       if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
> +                               goto err_xdp;
>
> +                       xdp_page = alloc_page(GFP_ATOMIC);
>                         if (!xdp_page)
>                                 goto err_xdp;
> +
> +                       memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
> +                              page_address(page) + offset, len);
> +                       frame_sz = PAGE_SIZE;

How can we know a single page is sufficient here? (before XDP is set,
we reserve neither headroom nor tailroom).

>                         offset = VIRTIO_XDP_HEADROOM;

I think we should still try to do linearization for the XDP program
that doesn't support XDP frags.

Thanks

>                 } else {
>                         xdp_page = page;
>                 }
> -
> -               /* Allow consuming headroom but reserve enough space to push
> -                * the descriptor on if we get an XDP_TX return code.
> -                */
>                 data = page_address(xdp_page) + offset;
> -               xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
> -               xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
> -                                VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
> +               err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
> +                                            &num_buf, &xdp_frags_truesz, stats);
> +               if (unlikely(err))
> +                       goto err_xdp_frags;
>
>                 act = bpf_prog_run_xdp(xdp_prog, &xdp);
>                 stats->xdp_packets++;
> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>                                 __free_pages(xdp_page, 0);
>                         goto err_xdp;
>                 }
> +err_xdp_frags:
> +               shinfo = xdp_get_shared_info_from_buff(&xdp);
> +
> +               if (unlikely(xdp_page != page))
> +                       __free_pages(xdp_page, 0);
> +
> +               for (i = 0; i < shinfo->nr_frags; i++) {
> +                       xdp_page = skb_frag_page(&shinfo->frags[i]);
> +                       put_page(xdp_page);
> +               }
> +               goto err_xdp;
>         }
>         rcu_read_unlock();
>
> --
> 2.19.1.6.gb485710b
>
Heng Qi Dec. 8, 2022, 8:30 a.m. UTC | #2
On 2022/12/6 2:33 PM, Jason Wang wrote:
> On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable.
>>
>> For the prefilled buffer before xdp is set, vq reset can be
>> used to clear it, but most devices do not support it at present.
>> In order not to bother users who are using xdp normally, we do
>> not use vq reset for the time being.
> I guess to tweak the part to say we will probably use vq reset in the future.

OK, it works.

>
>> At the same time, virtio
>> net currently uses comp pages, and bpf_xdp_frags_increase_tail()
>> needs to calculate the tailroom of the last frag, which will
>> involve the offset of the corresponding page and cause a negative
>> value, so we disable tail increase by not setting xdp_rxq->frag_size.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
>>   1 file changed, 38 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 20784b1d8236..83e6933ae62b 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>                                           unsigned int *xdp_xmit,
>>                                           struct virtnet_rq_stats *stats)
>>   {
>> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>>          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>>          u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>>          struct page *page = virt_to_head_page(buf);
>> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>          rcu_read_lock();
>>          xdp_prog = rcu_dereference(rq->xdp_prog);
>>          if (xdp_prog) {
>> +               unsigned int xdp_frags_truesz = 0;
>> +               struct skb_shared_info *shinfo;
>>                  struct xdp_frame *xdpf;
>>                  struct page *xdp_page;
>>                  struct xdp_buff xdp;
>>                  void *data;
>>                  u32 act;
>> +               int i;
>>
>> -               /* Transient failure which in theory could occur if
>> -                * in-flight packets from before XDP was enabled reach
>> -                * the receive path after XDP is loaded.
>> -                */
>> -               if (unlikely(hdr->hdr.gso_type))
>> -                       goto err_xdp;
> Two questions:
>
> 1) should we keep this check for the XDP program that can't deal with XDP frags?

Yes, the problem is the same as for an xdp program without the xdp.frags
flag when GRO_HW is enabled; I will correct it.

> 2) how could we guarantee that the vnet header (gso_type/csum_start
> etc) is still valid after XDP (where XDP program can choose to
> override the header)?

We can save the vnet header before the driver builds the xdp_buff from
the received packet, and then use the pre-saved value in the subsequent
processing.
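
(Just to illustrate the idea, a rough sketch with hypothetical variable
names, not code from this series:)

	/* Sketch: snapshot the vnet header from the first buffer before the
	 * XDP program runs, since the program may rewrite that memory.
	 */
	struct virtio_net_hdr_mrg_rxbuf hdr_saved;

	memcpy(&hdr_saved, buf, sizeof(hdr_saved));

	/* ... build the xdp_buff and call bpf_prog_run_xdp() ... */

	/* When constructing the skb afterwards, consume hdr_saved instead
	 * of re-reading the (possibly clobbered) header from the buffer.
	 */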

>> -
>> -               /* Buffers with headroom use PAGE_SIZE as alloc size,
>> -                * see add_recvbuf_mergeable() + get_mergeable_buf_len()
>> +               /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
>> +                * with headroom may add hole in truesize, which
>> +                * make their length exceed PAGE_SIZE. So we disabled the
>> +                * hole mechanism for xdp. See add_recvbuf_mergeable().
>>                   */
>>                  frame_sz = headroom ? PAGE_SIZE : truesize;
>>
>> -               /* This happens when rx buffer size is underestimated
>> -                * or headroom is not enough because of the buffer
>> -                * was refilled before XDP is set. This should only
>> -                * happen for the first several packets, so we don't
>> -                * care much about its performance.
>> +               /* This happens when headroom is not enough because
>> +                * of the buffer was prefilled before XDP is set.
>> +                * This should only happen for the first several packets.
>> +                * In fact, vq reset can be used here to help us clean up
>> +                * the prefilled buffers, but many existing devices do not
>> +                * support it, and we don't want to bother users who are
>> +                * using xdp normally.
>>                   */
>> -               if (unlikely(num_buf > 1 ||
>> -                            headroom < virtnet_get_headroom(vi))) {
>> -                       /* linearize data for XDP */
>> -                       xdp_page = xdp_linearize_page(rq, &num_buf,
>> -                                                     page, offset,
>> -                                                     VIRTIO_XDP_HEADROOM,
>> -                                                     &len);
>> -                       frame_sz = PAGE_SIZE;
>> +               if (unlikely(headroom < virtnet_get_headroom(vi))) {
>> +                       if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
>> +                               goto err_xdp;
>>
>> +                       xdp_page = alloc_page(GFP_ATOMIC);
>>                          if (!xdp_page)
>>                                  goto err_xdp;
>> +
>> +                       memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
>> +                              page_address(page) + offset, len);
>> +                       frame_sz = PAGE_SIZE;
> How can we know a single page is sufficient here? (before XDP is set,
> we reserve neither headroom nor tailroom).

This is only for the first buffer; refer to add_recvbuf_mergeable() and
get_mergeable_buf_len(). A buffer is never larger than a page.
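
(Paraphrasing the relevant part of get_mergeable_buf_len(), which clamps
the buffer length so it cannot exceed a page; a simplified restatement,
not a verbatim quote:)

	/* EWMA-estimated packet length, clamped so that hdr + data never
	 * exceeds PAGE_SIZE; hence a single mergeable buffer fits in a page.
	 */
	len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
				rq->min_buf_len, PAGE_SIZE - hdr_len);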

>
>>                          offset = VIRTIO_XDP_HEADROOM;
> I think we should still try to do linearization for the XDP program
> that doesn't support XDP frags.

Yes, you are right.

Thanks.

>
> Thanks
>
>>                  } else {
>>                          xdp_page = page;
>>                  }
>> -
>> -               /* Allow consuming headroom but reserve enough space to push
>> -                * the descriptor on if we get an XDP_TX return code.
>> -                */
>>                  data = page_address(xdp_page) + offset;
>> -               xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
>> -               xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
>> -                                VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
>> +               err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
>> +                                            &num_buf, &xdp_frags_truesz, stats);
>> +               if (unlikely(err))
>> +                       goto err_xdp_frags;
>>
>>                  act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>                  stats->xdp_packets++;
>> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>                                  __free_pages(xdp_page, 0);
>>                          goto err_xdp;
>>                  }
>> +err_xdp_frags:
>> +               shinfo = xdp_get_shared_info_from_buff(&xdp);
>> +
>> +               if (unlikely(xdp_page != page))
>> +                       __free_pages(xdp_page, 0);
>> +
>> +               for (i = 0; i < shinfo->nr_frags; i++) {
>> +                       xdp_page = skb_frag_page(&shinfo->frags[i]);
>> +                       put_page(xdp_page);
>> +               }
>> +               goto err_xdp;
>>          }
>>          rcu_read_unlock();
>>
>> --
>> 2.19.1.6.gb485710b
>>
Jason Wang Dec. 13, 2022, 7:08 a.m. UTC | #3
On Thu, Dec 8, 2022 at 4:30 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
>
> On 2022/12/6 2:33 PM, Jason Wang wrote:
> > On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable.
> >>
> >> For the prefilled buffer before xdp is set, vq reset can be
> >> used to clear it, but most devices do not support it at present.
> >> In order not to bother users who are using xdp normally, we do
> >> not use vq reset for the time being.
> > I guess to tweak the part to say we will probably use vq reset in the future.
>
> OK, it works.
>
> >
> >> At the same time, virtio
> >> net currently uses comp pages, and bpf_xdp_frags_increase_tail()
> >> needs to calculate the tailroom of the last frag, which will
> >> involve the offset of the corresponding page and cause a negative
> >> value, so we disable tail increase by not setting xdp_rxq->frag_size.
> >>
> >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> >> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >> ---
> >>   drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
> >>   1 file changed, 38 insertions(+), 29 deletions(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index 20784b1d8236..83e6933ae62b 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >>                                           unsigned int *xdp_xmit,
> >>                                           struct virtnet_rq_stats *stats)
> >>   {
> >> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >>          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> >>          u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> >>          struct page *page = virt_to_head_page(buf);
> >> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >>          rcu_read_lock();
> >>          xdp_prog = rcu_dereference(rq->xdp_prog);
> >>          if (xdp_prog) {
> >> +               unsigned int xdp_frags_truesz = 0;
> >> +               struct skb_shared_info *shinfo;
> >>                  struct xdp_frame *xdpf;
> >>                  struct page *xdp_page;
> >>                  struct xdp_buff xdp;
> >>                  void *data;
> >>                  u32 act;
> >> +               int i;
> >>
> >> -               /* Transient failure which in theory could occur if
> >> -                * in-flight packets from before XDP was enabled reach
> >> -                * the receive path after XDP is loaded.
> >> -                */
> >> -               if (unlikely(hdr->hdr.gso_type))
> >> -                       goto err_xdp;
> > Two questions:
> >
> > 1) should we keep this check for the XDP program that can't deal with XDP frags?
>
> Yes, the problem is the same as the xdp program without xdp.frags when
> GRO_HW, I will correct it.
>
> > 2) how could we guarantee that the vnet header (gso_type/csum_start
> > etc) is still valid after XDP (where XDP program can choose to
> > override the header)?
>
> We can save the vnet headr before the driver receives the packet and
> build xdp_buff, and then use
> the pre-saved value in the subsequent process.

The problem is that XDP may modify the packet (header), so some fields
may no longer be valid (e.g. csum_start/offset?).

If I'm not wrong, there's no way for the XDP program to access those
fields -- or does it support that right now?

>
> >> -
> >> -               /* Buffers with headroom use PAGE_SIZE as alloc size,
> >> -                * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> >> +               /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
> >> +                * with headroom may add hole in truesize, which
> >> +                * make their length exceed PAGE_SIZE. So we disabled the
> >> +                * hole mechanism for xdp. See add_recvbuf_mergeable().
> >>                   */
> >>                  frame_sz = headroom ? PAGE_SIZE : truesize;
> >>
> >> -               /* This happens when rx buffer size is underestimated
> >> -                * or headroom is not enough because of the buffer
> >> -                * was refilled before XDP is set. This should only
> >> -                * happen for the first several packets, so we don't
> >> -                * care much about its performance.
> >> +               /* This happens when headroom is not enough because
> >> +                * of the buffer was prefilled before XDP is set.
> >> +                * This should only happen for the first several packets.
> >> +                * In fact, vq reset can be used here to help us clean up
> >> +                * the prefilled buffers, but many existing devices do not
> >> +                * support it, and we don't want to bother users who are
> >> +                * using xdp normally.
> >>                   */
> >> -               if (unlikely(num_buf > 1 ||
> >> -                            headroom < virtnet_get_headroom(vi))) {
> >> -                       /* linearize data for XDP */
> >> -                       xdp_page = xdp_linearize_page(rq, &num_buf,
> >> -                                                     page, offset,
> >> -                                                     VIRTIO_XDP_HEADROOM,
> >> -                                                     &len);
> >> -                       frame_sz = PAGE_SIZE;
> >> +               if (unlikely(headroom < virtnet_get_headroom(vi))) {
> >> +                       if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
> >> +                               goto err_xdp;
> >>
> >> +                       xdp_page = alloc_page(GFP_ATOMIC);
> >>                          if (!xdp_page)
> >>                                  goto err_xdp;
> >> +
> >> +                       memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
> >> +                              page_address(page) + offset, len);
> >> +                       frame_sz = PAGE_SIZE;
> > How can we know a single page is sufficient here? (before XDP is set,
> > we reserve neither headroom nor tailroom).
>
> This is only for the first buffer, refer to add_recvbuf_mergeable() and
> get_mergeable_buf_len() A buffer is always no larger than a page.

Ok.

Thanks

>
> >
> >>                          offset = VIRTIO_XDP_HEADROOM;
> > I think we should still try to do linearization for the XDP program
> > that doesn't support XDP frags.
>
> Yes, you are right.
>
> Thanks.
>
> >
> > Thanks
> >
> >>                  } else {
> >>                          xdp_page = page;
> >>                  }
> >> -
> >> -               /* Allow consuming headroom but reserve enough space to push
> >> -                * the descriptor on if we get an XDP_TX return code.
> >> -                */
> >>                  data = page_address(xdp_page) + offset;
> >> -               xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
> >> -               xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
> >> -                                VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
> >> +               err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
> >> +                                            &num_buf, &xdp_frags_truesz, stats);
> >> +               if (unlikely(err))
> >> +                       goto err_xdp_frags;
> >>
> >>                  act = bpf_prog_run_xdp(xdp_prog, &xdp);
> >>                  stats->xdp_packets++;
> >> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >>                                  __free_pages(xdp_page, 0);
> >>                          goto err_xdp;
> >>                  }
> >> +err_xdp_frags:
> >> +               shinfo = xdp_get_shared_info_from_buff(&xdp);
> >> +
> >> +               if (unlikely(xdp_page != page))
> >> +                       __free_pages(xdp_page, 0);
> >> +
> >> +               for (i = 0; i < shinfo->nr_frags; i++) {
> >> +                       xdp_page = skb_frag_page(&shinfo->frags[i]);
> >> +                       put_page(xdp_page);
> >> +               }
> >> +               goto err_xdp;
> >>          }
> >>          rcu_read_unlock();
> >>
> >> --
> >> 2.19.1.6.gb485710b
> >>
>
Heng Qi Dec. 14, 2022, 8:37 a.m. UTC | #4
On Tue, Dec 13, 2022 at 03:08:46PM +0800, Jason Wang wrote:
> On Thu, Dec 8, 2022 at 4:30 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >
> >
> >
> > On 2022/12/6 2:33 PM, Jason Wang wrote:
> > > On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> > >> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable.
> > >>
> > >> For the prefilled buffer before xdp is set, vq reset can be
> > >> used to clear it, but most devices do not support it at present.
> > >> In order not to bother users who are using xdp normally, we do
> > >> not use vq reset for the time being.
> > > I guess to tweak the part to say we will probably use vq reset in the future.
> >
> > OK, it works.
> >
> > >
> > >> At the same time, virtio
> > >> net currently uses comp pages, and bpf_xdp_frags_increase_tail()
> > >> needs to calculate the tailroom of the last frag, which will
> > >> involve the offset of the corresponding page and cause a negative
> > >> value, so we disable tail increase by not setting xdp_rxq->frag_size.
> > >>
> > >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > >> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > >> ---
> > >>   drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
> > >>   1 file changed, 38 insertions(+), 29 deletions(-)
> > >>
> > >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > >> index 20784b1d8236..83e6933ae62b 100644
> > >> --- a/drivers/net/virtio_net.c
> > >> +++ b/drivers/net/virtio_net.c
> > >> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >>                                           unsigned int *xdp_xmit,
> > >>                                           struct virtnet_rq_stats *stats)
> > >>   {
> > >> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > >>          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> > >>          u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> > >>          struct page *page = virt_to_head_page(buf);
> > >> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >>          rcu_read_lock();
> > >>          xdp_prog = rcu_dereference(rq->xdp_prog);
> > >>          if (xdp_prog) {
> > >> +               unsigned int xdp_frags_truesz = 0;
> > >> +               struct skb_shared_info *shinfo;
> > >>                  struct xdp_frame *xdpf;
> > >>                  struct page *xdp_page;
> > >>                  struct xdp_buff xdp;
> > >>                  void *data;
> > >>                  u32 act;
> > >> +               int i;
> > >>
> > >> -               /* Transient failure which in theory could occur if
> > >> -                * in-flight packets from before XDP was enabled reach
> > >> -                * the receive path after XDP is loaded.
> > >> -                */
> > >> -               if (unlikely(hdr->hdr.gso_type))
> > >> -                       goto err_xdp;
> > > Two questions:
> > >
> > > 1) should we keep this check for the XDP program that can't deal with XDP frags?
> >
> > Yes, the problem is the same as the xdp program without xdp.frags when
> > GRO_HW, I will correct it.
> >
> > > 2) how could we guarantee that the vnet header (gso_type/csum_start
> > > etc) is still valid after XDP (where XDP program can choose to
> > > override the header)?
> >
> > We can save the vnet headr before the driver receives the packet and
> > build xdp_buff, and then use
> > the pre-saved value in the subsequent process.
> 
> The problem is that XDP may modify the packet (header) so some fields
> are not valid any more (e.g csum_start/offset ?).
> 
> If I was not wrong, there's no way for the XDP program to access those
> fields or does it support it right now?
> 

When the guest_csum feature is negotiated, xdp cannot be set, because the
metadata of xdp_{buff, frame} may be adjusted by the bpf program and
csum_{start, offset} therefore becomes invalid. At the same time,
multi-buffer xdp programs should only receive packets over a larger MTU, so
we no longer need the gso related information and need to disable GRO_HW.
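
(For reference, roughly the existing check in virtnet_xdp_set() that
enforces this today; paraphrased, the exact feature list and message may
differ:)

	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN)  ||
	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)  ||
	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM)) {
		NL_SET_ERR_MSG_MOD(extack,
				   "Can't set XDP while host is implementing GRO_HW/CSUM, disable GRO_HW/CSUM first");
		return -EOPNOTSUPP;
	}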

Thanks.

> >
> > >> -
> > >> -               /* Buffers with headroom use PAGE_SIZE as alloc size,
> > >> -                * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> > >> +               /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
> > >> +                * with headroom may add hole in truesize, which
> > >> +                * make their length exceed PAGE_SIZE. So we disabled the
> > >> +                * hole mechanism for xdp. See add_recvbuf_mergeable().
> > >>                   */
> > >>                  frame_sz = headroom ? PAGE_SIZE : truesize;
> > >>
> > >> -               /* This happens when rx buffer size is underestimated
> > >> -                * or headroom is not enough because of the buffer
> > >> -                * was refilled before XDP is set. This should only
> > >> -                * happen for the first several packets, so we don't
> > >> -                * care much about its performance.
> > >> +               /* This happens when headroom is not enough because
> > >> +                * of the buffer was prefilled before XDP is set.
> > >> +                * This should only happen for the first several packets.
> > >> +                * In fact, vq reset can be used here to help us clean up
> > >> +                * the prefilled buffers, but many existing devices do not
> > >> +                * support it, and we don't want to bother users who are
> > >> +                * using xdp normally.
> > >>                   */
> > >> -               if (unlikely(num_buf > 1 ||
> > >> -                            headroom < virtnet_get_headroom(vi))) {
> > >> -                       /* linearize data for XDP */
> > >> -                       xdp_page = xdp_linearize_page(rq, &num_buf,
> > >> -                                                     page, offset,
> > >> -                                                     VIRTIO_XDP_HEADROOM,
> > >> -                                                     &len);
> > >> -                       frame_sz = PAGE_SIZE;
> > >> +               if (unlikely(headroom < virtnet_get_headroom(vi))) {
> > >> +                       if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
> > >> +                               goto err_xdp;
> > >>
> > >> +                       xdp_page = alloc_page(GFP_ATOMIC);
> > >>                          if (!xdp_page)
> > >>                                  goto err_xdp;
> > >> +
> > >> +                       memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
> > >> +                              page_address(page) + offset, len);
> > >> +                       frame_sz = PAGE_SIZE;
> > > How can we know a single page is sufficient here? (before XDP is set,
> > > we reserve neither headroom nor tailroom).
> >
> > This is only for the first buffer, refer to add_recvbuf_mergeable() and
> > get_mergeable_buf_len() A buffer is always no larger than a page.
> 
> Ok.
> 
> Thanks
> 
> >
> > >
> > >>                          offset = VIRTIO_XDP_HEADROOM;
> > > I think we should still try to do linearization for the XDP program
> > > that doesn't support XDP frags.
> >
> > Yes, you are right.
> >
> > Thanks.
> >
> > >
> > > Thanks
> > >
> > >>                  } else {
> > >>                          xdp_page = page;
> > >>                  }
> > >> -
> > >> -               /* Allow consuming headroom but reserve enough space to push
> > >> -                * the descriptor on if we get an XDP_TX return code.
> > >> -                */
> > >>                  data = page_address(xdp_page) + offset;
> > >> -               xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
> > >> -               xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
> > >> -                                VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
> > >> +               err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
> > >> +                                            &num_buf, &xdp_frags_truesz, stats);
> > >> +               if (unlikely(err))
> > >> +                       goto err_xdp_frags;
> > >>
> > >>                  act = bpf_prog_run_xdp(xdp_prog, &xdp);
> > >>                  stats->xdp_packets++;
> > >> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >>                                  __free_pages(xdp_page, 0);
> > >>                          goto err_xdp;
> > >>                  }
> > >> +err_xdp_frags:
> > >> +               shinfo = xdp_get_shared_info_from_buff(&xdp);
> > >> +
> > >> +               if (unlikely(xdp_page != page))
> > >> +                       __free_pages(xdp_page, 0);
> > >> +
> > >> +               for (i = 0; i < shinfo->nr_frags; i++) {
> > >> +                       xdp_page = skb_frag_page(&shinfo->frags[i]);
> > >> +                       put_page(xdp_page);
> > >> +               }
> > >> +               goto err_xdp;
> > >>          }
> > >>          rcu_read_unlock();
> > >>
> > >> --
> > >> 2.19.1.6.gb485710b
> > >>
> >
Jason Wang Dec. 16, 2022, 3:46 a.m. UTC | #5
On Wed, Dec 14, 2022 at 4:38 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> On Tue, Dec 13, 2022 at 03:08:46PM +0800, Jason Wang wrote:
> > On Thu, Dec 8, 2022 at 4:30 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> > >
> > >
> > >
> > > On 2022/12/6 2:33 PM, Jason Wang wrote:
> > > > On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> > > >> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable.
> > > >>
> > > >> For the prefilled buffer before xdp is set, vq reset can be
> > > >> used to clear it, but most devices do not support it at present.
> > > >> In order not to bother users who are using xdp normally, we do
> > > >> not use vq reset for the time being.
> > > > I guess to tweak the part to say we will probably use vq reset in the future.
> > >
> > > OK, it works.
> > >
> > > >
> > > >> At the same time, virtio
> > > >> net currently uses comp pages, and bpf_xdp_frags_increase_tail()
> > > >> needs to calculate the tailroom of the last frag, which will
> > > >> involve the offset of the corresponding page and cause a negative
> > > >> value, so we disable tail increase by not setting xdp_rxq->frag_size.
> > > >>
> > > >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > > >> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > >> ---
> > > >>   drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
> > > >>   1 file changed, 38 insertions(+), 29 deletions(-)
> > > >>
> > > >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > >> index 20784b1d8236..83e6933ae62b 100644
> > > >> --- a/drivers/net/virtio_net.c
> > > >> +++ b/drivers/net/virtio_net.c
> > > >> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > >>                                           unsigned int *xdp_xmit,
> > > >>                                           struct virtnet_rq_stats *stats)
> > > >>   {
> > > >> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > >>          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> > > >>          u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> > > >>          struct page *page = virt_to_head_page(buf);
> > > >> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > >>          rcu_read_lock();
> > > >>          xdp_prog = rcu_dereference(rq->xdp_prog);
> > > >>          if (xdp_prog) {
> > > >> +               unsigned int xdp_frags_truesz = 0;
> > > >> +               struct skb_shared_info *shinfo;
> > > >>                  struct xdp_frame *xdpf;
> > > >>                  struct page *xdp_page;
> > > >>                  struct xdp_buff xdp;
> > > >>                  void *data;
> > > >>                  u32 act;
> > > >> +               int i;
> > > >>
> > > >> -               /* Transient failure which in theory could occur if
> > > >> -                * in-flight packets from before XDP was enabled reach
> > > >> -                * the receive path after XDP is loaded.
> > > >> -                */
> > > >> -               if (unlikely(hdr->hdr.gso_type))
> > > >> -                       goto err_xdp;
> > > > Two questions:
> > > >
> > > > 1) should we keep this check for the XDP program that can't deal with XDP frags?
> > >
> > > Yes, the problem is the same as the xdp program without xdp.frags when
> > > GRO_HW, I will correct it.
> > >
> > > > 2) how could we guarantee that the vnet header (gso_type/csum_start
> > > > etc) is still valid after XDP (where XDP program can choose to
> > > > override the header)?
> > >
> > > We can save the vnet headr before the driver receives the packet and
> > > build xdp_buff, and then use
> > > the pre-saved value in the subsequent process.
> >
> > The problem is that XDP may modify the packet (header) so some fields
> > are not valid any more (e.g csum_start/offset ?).
> >
> > If I was not wrong, there's no way for the XDP program to access those
> > fields or does it support it right now?
> >
>
> When guest_csum feature is negotiated, xdp cannot be set, because the metadata
> of xdp_{buff, frame} may be adjusted by the bpf program, therefore,
> csum_{start, offset} itself is invalid. And at the same time,
> multi-buffer xdp programs should only Receive packets over larger MTU, so
> we don't need gso related information anymore and need to disable GRO_HW.

Ok, that's fine.

(But it requires a large pMTU).

Thanks

>
> Thanks.
>
> > >
> > > >> -
> > > >> -               /* Buffers with headroom use PAGE_SIZE as alloc size,
> > > >> -                * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> > > >> +               /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
> > > >> +                * with headroom may add hole in truesize, which
> > > >> +                * make their length exceed PAGE_SIZE. So we disabled the
> > > >> +                * hole mechanism for xdp. See add_recvbuf_mergeable().
> > > >>                   */
> > > >>                  frame_sz = headroom ? PAGE_SIZE : truesize;
> > > >>
> > > >> -               /* This happens when rx buffer size is underestimated
> > > >> -                * or headroom is not enough because of the buffer
> > > >> -                * was refilled before XDP is set. This should only
> > > >> -                * happen for the first several packets, so we don't
> > > >> -                * care much about its performance.
> > > >> +               /* This happens when headroom is not enough because
> > > >> +                * of the buffer was prefilled before XDP is set.
> > > >> +                * This should only happen for the first several packets.
> > > >> +                * In fact, vq reset can be used here to help us clean up
> > > >> +                * the prefilled buffers, but many existing devices do not
> > > >> +                * support it, and we don't want to bother users who are
> > > >> +                * using xdp normally.
> > > >>                   */
> > > >> -               if (unlikely(num_buf > 1 ||
> > > >> -                            headroom < virtnet_get_headroom(vi))) {
> > > >> -                       /* linearize data for XDP */
> > > >> -                       xdp_page = xdp_linearize_page(rq, &num_buf,
> > > >> -                                                     page, offset,
> > > >> -                                                     VIRTIO_XDP_HEADROOM,
> > > >> -                                                     &len);
> > > >> -                       frame_sz = PAGE_SIZE;
> > > >> +               if (unlikely(headroom < virtnet_get_headroom(vi))) {
> > > >> +                       if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
> > > >> +                               goto err_xdp;
> > > >>
> > > >> +                       xdp_page = alloc_page(GFP_ATOMIC);
> > > >>                          if (!xdp_page)
> > > >>                                  goto err_xdp;
> > > >> +
> > > >> +                       memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
> > > >> +                              page_address(page) + offset, len);
> > > >> +                       frame_sz = PAGE_SIZE;
> > > > How can we know a single page is sufficient here? (before XDP is set,
> > > > we reserve neither headroom nor tailroom).
> > >
> > > This is only for the first buffer, refer to add_recvbuf_mergeable() and
> > > get_mergeable_buf_len() A buffer is always no larger than a page.
> >
> > Ok.
> >
> > Thanks
> >
> > >
> > > >
> > > >>                          offset = VIRTIO_XDP_HEADROOM;
> > > > I think we should still try to do linearization for the XDP program
> > > > that doesn't support XDP frags.
> > >
> > > Yes, you are right.
> > >
> > > Thanks.
> > >
> > > >
> > > > Thanks
> > > >
> > > >>                  } else {
> > > >>                          xdp_page = page;
> > > >>                  }
> > > >> -
> > > >> -               /* Allow consuming headroom but reserve enough space to push
> > > >> -                * the descriptor on if we get an XDP_TX return code.
> > > >> -                */
> > > >>                  data = page_address(xdp_page) + offset;
> > > >> -               xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
> > > >> -               xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
> > > >> -                                VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
> > > >> +               err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
> > > >> +                                            &num_buf, &xdp_frags_truesz, stats);
> > > >> +               if (unlikely(err))
> > > >> +                       goto err_xdp_frags;
> > > >>
> > > >>                  act = bpf_prog_run_xdp(xdp_prog, &xdp);
> > > >>                  stats->xdp_packets++;
> > > >> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > >>                                  __free_pages(xdp_page, 0);
> > > >>                          goto err_xdp;
> > > >>                  }
> > > >> +err_xdp_frags:
> > > >> +               shinfo = xdp_get_shared_info_from_buff(&xdp);
> > > >> +
> > > >> +               if (unlikely(xdp_page != page))
> > > >> +                       __free_pages(xdp_page, 0);
> > > >> +
> > > >> +               for (i = 0; i < shinfo->nr_frags; i++) {
> > > >> +                       xdp_page = skb_frag_page(&shinfo->frags[i]);
> > > >> +                       put_page(xdp_page);
> > > >> +               }
> > > >> +               goto err_xdp;
> > > >>          }
> > > >>          rcu_read_unlock();
> > > >>
> > > >> --
> > > >> 2.19.1.6.gb485710b
> > > >>
> > >
>
Heng Qi Dec. 16, 2022, 9:42 a.m. UTC | #6
On 2022/12/16 11:46 AM, Jason Wang wrote:
> On Wed, Dec 14, 2022 at 4:38 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>> On Tue, Dec 13, 2022 at 03:08:46PM +0800, Jason Wang wrote:
>>> On Thu, Dec 8, 2022 at 4:30 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>
>>>>
>>>> On 2022/12/6 2:33 PM, Jason Wang wrote:
>>>>> On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>>> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable.
>>>>>>
>>>>>> For the prefilled buffer before xdp is set, vq reset can be
>>>>>> used to clear it, but most devices do not support it at present.
>>>>>> In order not to bother users who are using xdp normally, we do
>>>>>> not use vq reset for the time being.
>>>>> I guess to tweak the part to say we will probably use vq reset in the future.
>>>> OK, it works.
>>>>
>>>>>> At the same time, virtio
>>>>>> net currently uses comp pages, and bpf_xdp_frags_increase_tail()
>>>>>> needs to calculate the tailroom of the last frag, which will
>>>>>> involve the offset of the corresponding page and cause a negative
>>>>>> value, so we disable tail increase by not setting xdp_rxq->frag_size.
>>>>>>
>>>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>>>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>>>> ---
>>>>>>    drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
>>>>>>    1 file changed, 38 insertions(+), 29 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>> index 20784b1d8236..83e6933ae62b 100644
>>>>>> --- a/drivers/net/virtio_net.c
>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>>>>                                            unsigned int *xdp_xmit,
>>>>>>                                            struct virtnet_rq_stats *stats)
>>>>>>    {
>>>>>> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>>>>>>           struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>>>>>>           u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>>>>>>           struct page *page = virt_to_head_page(buf);
>>>>>> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>>>>           rcu_read_lock();
>>>>>>           xdp_prog = rcu_dereference(rq->xdp_prog);
>>>>>>           if (xdp_prog) {
>>>>>> +               unsigned int xdp_frags_truesz = 0;
>>>>>> +               struct skb_shared_info *shinfo;
>>>>>>                   struct xdp_frame *xdpf;
>>>>>>                   struct page *xdp_page;
>>>>>>                   struct xdp_buff xdp;
>>>>>>                   void *data;
>>>>>>                   u32 act;
>>>>>> +               int i;
>>>>>>
>>>>>> -               /* Transient failure which in theory could occur if
>>>>>> -                * in-flight packets from before XDP was enabled reach
>>>>>> -                * the receive path after XDP is loaded.
>>>>>> -                */
>>>>>> -               if (unlikely(hdr->hdr.gso_type))
>>>>>> -                       goto err_xdp;
>>>>> Two questions:
>>>>>
>>>>> 1) should we keep this check for the XDP program that can't deal with XDP frags?
>>>> Yes, the problem is the same as the xdp program without xdp.frags when
>>>> GRO_HW, I will correct it.
>>>>
>>>>> 2) how could we guarantee that the vnet header (gso_type/csum_start
>>>>> etc) is still valid after XDP (where XDP program can choose to
>>>>> override the header)?
>>>> We can save the vnet headr before the driver receives the packet and
>>>> build xdp_buff, and then use
>>>> the pre-saved value in the subsequent process.
>>> The problem is that XDP may modify the packet (header) so some fields
>>> are not valid any more (e.g csum_start/offset ?).
>>>
>>> If I was not wrong, there's no way for the XDP program to access those
>>> fields or does it support it right now?
>>>
>> When guest_csum feature is negotiated, xdp cannot be set, because the metadata
>> of xdp_{buff, frame} may be adjusted by the bpf program, therefore,
>> csum_{start, offset} itself is invalid. And at the same time,
>> multi-buffer xdp programs should only Receive packets over larger MTU, so
>> we don't need gso related information anymore and need to disable GRO_HW.
> Ok, that's fine.
>
> (But it requires a large pMTU).

Yes. Like a jumbo frame larger than 4K.

Thanks.

>
> Thanks
>
>> Thanks.
>>
>>>>>> -
>>>>>> -               /* Buffers with headroom use PAGE_SIZE as alloc size,
>>>>>> -                * see add_recvbuf_mergeable() + get_mergeable_buf_len()
>>>>>> +               /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
>>>>>> +                * with headroom may add hole in truesize, which
>>>>>> +                * make their length exceed PAGE_SIZE. So we disabled the
>>>>>> +                * hole mechanism for xdp. See add_recvbuf_mergeable().
>>>>>>                    */
>>>>>>                   frame_sz = headroom ? PAGE_SIZE : truesize;
>>>>>>
>>>>>> -               /* This happens when rx buffer size is underestimated
>>>>>> -                * or headroom is not enough because of the buffer
>>>>>> -                * was refilled before XDP is set. This should only
>>>>>> -                * happen for the first several packets, so we don't
>>>>>> -                * care much about its performance.
>>>>>> +               /* This happens when headroom is not enough because
>>>>>> +                * of the buffer was prefilled before XDP is set.
>>>>>> +                * This should only happen for the first several packets.
>>>>>> +                * In fact, vq reset can be used here to help us clean up
>>>>>> +                * the prefilled buffers, but many existing devices do not
>>>>>> +                * support it, and we don't want to bother users who are
>>>>>> +                * using xdp normally.
>>>>>>                    */
>>>>>> -               if (unlikely(num_buf > 1 ||
>>>>>> -                            headroom < virtnet_get_headroom(vi))) {
>>>>>> -                       /* linearize data for XDP */
>>>>>> -                       xdp_page = xdp_linearize_page(rq, &num_buf,
>>>>>> -                                                     page, offset,
>>>>>> -                                                     VIRTIO_XDP_HEADROOM,
>>>>>> -                                                     &len);
>>>>>> -                       frame_sz = PAGE_SIZE;
>>>>>> +               if (unlikely(headroom < virtnet_get_headroom(vi))) {
>>>>>> +                       if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
>>>>>> +                               goto err_xdp;
>>>>>>
>>>>>> +                       xdp_page = alloc_page(GFP_ATOMIC);
>>>>>>                           if (!xdp_page)
>>>>>>                                   goto err_xdp;
>>>>>> +
>>>>>> +                       memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
>>>>>> +                              page_address(page) + offset, len);
>>>>>> +                       frame_sz = PAGE_SIZE;
>>>>> How can we know a single page is sufficient here? (before XDP is set,
>>>>> we reserve neither headroom nor tailroom).
>>>> This is only for the first buffer, refer to add_recvbuf_mergeable() and
>>>> get_mergeable_buf_len() A buffer is always no larger than a page.
>>> Ok.
>>>
>>> Thanks
>>>
>>>>>>                           offset = VIRTIO_XDP_HEADROOM;
>>>>> I think we should still try to do linearization for the XDP program
>>>>> that doesn't support XDP frags.
>>>> Yes, you are right.
>>>>
>>>> Thanks.
>>>>
>>>>> Thanks
>>>>>
>>>>>>                   } else {
>>>>>>                           xdp_page = page;
>>>>>>                   }
>>>>>> -
>>>>>> -               /* Allow consuming headroom but reserve enough space to push
>>>>>> -                * the descriptor on if we get an XDP_TX return code.
>>>>>> -                */
>>>>>>                   data = page_address(xdp_page) + offset;
>>>>>> -               xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
>>>>>> -               xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
>>>>>> -                                VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
>>>>>> +               err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
>>>>>> +                                            &num_buf, &xdp_frags_truesz, stats);
>>>>>> +               if (unlikely(err))
>>>>>> +                       goto err_xdp_frags;
>>>>>>
>>>>>>                   act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>>>>                   stats->xdp_packets++;
>>>>>> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>>>>                                   __free_pages(xdp_page, 0);
>>>>>>                           goto err_xdp;
>>>>>>                   }
>>>>>> +err_xdp_frags:
>>>>>> +               shinfo = xdp_get_shared_info_from_buff(&xdp);
>>>>>> +
>>>>>> +               if (unlikely(xdp_page != page))
>>>>>> +                       __free_pages(xdp_page, 0);
>>>>>> +
>>>>>> +               for (i = 0; i < shinfo->nr_frags; i++) {
>>>>>> +                       xdp_page = skb_frag_page(&shinfo->frags[i]);
>>>>>> +                       put_page(xdp_page);
>>>>>> +               }
>>>>>> +               goto err_xdp;
>>>>>>           }
>>>>>>           rcu_read_unlock();
>>>>>>
>>>>>> --
>>>>>> 2.19.1.6.gb485710b
>>>>>>

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 20784b1d8236..83e6933ae62b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -994,6 +994,7 @@  static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 unsigned int *xdp_xmit,
 					 struct virtnet_rq_stats *stats)
 {
+	unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
 	u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
 	struct page *page = virt_to_head_page(buf);
@@ -1024,53 +1025,50 @@  static struct sk_buff *receive_mergeable(struct net_device *dev,
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
+		unsigned int xdp_frags_truesz = 0;
+		struct skb_shared_info *shinfo;
 		struct xdp_frame *xdpf;
 		struct page *xdp_page;
 		struct xdp_buff xdp;
 		void *data;
 		u32 act;
+		int i;
 
-		/* Transient failure which in theory could occur if
-		 * in-flight packets from before XDP was enabled reach
-		 * the receive path after XDP is loaded.
-		 */
-		if (unlikely(hdr->hdr.gso_type))
-			goto err_xdp;
-
-		/* Buffers with headroom use PAGE_SIZE as alloc size,
-		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
+		/* Now XDP core assumes frag size is PAGE_SIZE, but buffers
+		 * with headroom may add hole in truesize, which
+		 * make their length exceed PAGE_SIZE. So we disabled the
+		 * hole mechanism for xdp. See add_recvbuf_mergeable().
 		 */
 		frame_sz = headroom ? PAGE_SIZE : truesize;
 
-		/* This happens when rx buffer size is underestimated
-		 * or headroom is not enough because of the buffer
-		 * was refilled before XDP is set. This should only
-		 * happen for the first several packets, so we don't
-		 * care much about its performance.
+		/* This happens when headroom is not enough because
+		 * of the buffer was prefilled before XDP is set.
+		 * This should only happen for the first several packets.
+		 * In fact, vq reset can be used here to help us clean up
+		 * the prefilled buffers, but many existing devices do not
+		 * support it, and we don't want to bother users who are
+		 * using xdp normally.
 		 */
-		if (unlikely(num_buf > 1 ||
-			     headroom < virtnet_get_headroom(vi))) {
-			/* linearize data for XDP */
-			xdp_page = xdp_linearize_page(rq, &num_buf,
-						      page, offset,
-						      VIRTIO_XDP_HEADROOM,
-						      &len);
-			frame_sz = PAGE_SIZE;
+		if (unlikely(headroom < virtnet_get_headroom(vi))) {
+			if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
+				goto err_xdp;
 
+			xdp_page = alloc_page(GFP_ATOMIC);
 			if (!xdp_page)
 				goto err_xdp;
+
+			memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
+			       page_address(page) + offset, len);
+			frame_sz = PAGE_SIZE;
 			offset = VIRTIO_XDP_HEADROOM;
 		} else {
 			xdp_page = page;
 		}
-
-		/* Allow consuming headroom but reserve enough space to push
-		 * the descriptor on if we get an XDP_TX return code.
-		 */
 		data = page_address(xdp_page) + offset;
-		xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
-		xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
-				 VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
+		err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
+					     &num_buf, &xdp_frags_truesz, stats);
+		if (unlikely(err))
+			goto err_xdp_frags;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		stats->xdp_packets++;
@@ -1164,6 +1162,17 @@  static struct sk_buff *receive_mergeable(struct net_device *dev,
 				__free_pages(xdp_page, 0);
 			goto err_xdp;
 		}
+err_xdp_frags:
+		shinfo = xdp_get_shared_info_from_buff(&xdp);
+
+		if (unlikely(xdp_page != page))
+			__free_pages(xdp_page, 0);
+
+		for (i = 0; i < shinfo->nr_frags; i++) {
+			xdp_page = skb_frag_page(&shinfo->frags[i]);
+			put_page(xdp_page);
+		}
+		goto err_xdp;
 	}
 	rcu_read_unlock();