Message ID: 20221122074348.88601-7-hengqi@linux.alibaba.com
State: RFC
Delegated to: Netdev Maintainers
Series: virtio_net: support multi buffer xdp
On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable. > > For the prefilled buffer before xdp is set, vq reset can be > used to clear it, but most devices do not support it at present. > In order not to bother users who are using xdp normally, we do > not use vq reset for the time being. I guess to tweak the part to say we will probably use vq reset in the future. > At the same time, virtio > net currently uses comp pages, and bpf_xdp_frags_increase_tail() > needs to calculate the tailroom of the last frag, which will > involve the offset of the corresponding page and cause a negative > value, so we disable tail increase by not setting xdp_rxq->frag_size. > > Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > --- > drivers/net/virtio_net.c | 67 +++++++++++++++++++++++----------------- > 1 file changed, 38 insertions(+), 29 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 20784b1d8236..83e6933ae62b 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > unsigned int *xdp_xmit, > struct virtnet_rq_stats *stats) > { > + unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > struct virtio_net_hdr_mrg_rxbuf *hdr = buf; > u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers); > struct page *page = virt_to_head_page(buf); > @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > rcu_read_lock(); > xdp_prog = rcu_dereference(rq->xdp_prog); > if (xdp_prog) { > + unsigned int xdp_frags_truesz = 0; > + struct skb_shared_info *shinfo; > struct xdp_frame *xdpf; > struct page *xdp_page; > struct xdp_buff xdp; > void *data; > u32 act; > + int i; > > - /* Transient failure which in theory could occur if > - * in-flight packets from before XDP was enabled reach > - * the receive path after XDP is loaded. > - */ > - if (unlikely(hdr->hdr.gso_type)) > - goto err_xdp; Two questions: 1) should we keep this check for the XDP program that can't deal with XDP frags? 2) how could we guarantee that the vnet header (gso_type/csum_start etc) is still valid after XDP (where XDP program can choose to override the header)? > - > - /* Buffers with headroom use PAGE_SIZE as alloc size, > - * see add_recvbuf_mergeable() + get_mergeable_buf_len() > + /* Now XDP core assumes frag size is PAGE_SIZE, but buffers > + * with headroom may add hole in truesize, which > + * make their length exceed PAGE_SIZE. So we disabled the > + * hole mechanism for xdp. See add_recvbuf_mergeable(). > */ > frame_sz = headroom ? PAGE_SIZE : truesize; > > - /* This happens when rx buffer size is underestimated > - * or headroom is not enough because of the buffer > - * was refilled before XDP is set. This should only > - * happen for the first several packets, so we don't > - * care much about its performance. > + /* This happens when headroom is not enough because > + * of the buffer was prefilled before XDP is set. > + * This should only happen for the first several packets. > + * In fact, vq reset can be used here to help us clean up > + * the prefilled buffers, but many existing devices do not > + * support it, and we don't want to bother users who are > + * using xdp normally. 
> */ > - if (unlikely(num_buf > 1 || > - headroom < virtnet_get_headroom(vi))) { > - /* linearize data for XDP */ > - xdp_page = xdp_linearize_page(rq, &num_buf, > - page, offset, > - VIRTIO_XDP_HEADROOM, > - &len); > - frame_sz = PAGE_SIZE; > + if (unlikely(headroom < virtnet_get_headroom(vi))) { > + if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE) > + goto err_xdp; > > + xdp_page = alloc_page(GFP_ATOMIC); > if (!xdp_page) > goto err_xdp; > + > + memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM, > + page_address(page) + offset, len); > + frame_sz = PAGE_SIZE; How can we know a single page is sufficient here? (before XDP is set, we reserve neither headroom nor tailroom). > offset = VIRTIO_XDP_HEADROOM; I think we should still try to do linearization for the XDP program that doesn't support XDP frags. Thanks > } else { > xdp_page = page; > } > - > - /* Allow consuming headroom but reserve enough space to push > - * the descriptor on if we get an XDP_TX return code. > - */ > data = page_address(xdp_page) + offset; > - xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq); > - xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len, > - VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true); > + err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz, > + &num_buf, &xdp_frags_truesz, stats); > + if (unlikely(err)) > + goto err_xdp_frags; > > act = bpf_prog_run_xdp(xdp_prog, &xdp); > stats->xdp_packets++; > @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > __free_pages(xdp_page, 0); > goto err_xdp; > } > +err_xdp_frags: > + shinfo = xdp_get_shared_info_from_buff(&xdp); > + > + if (unlikely(xdp_page != page)) > + __free_pages(xdp_page, 0); > + > + for (i = 0; i < shinfo->nr_frags; i++) { > + xdp_page = skb_frag_page(&shinfo->frags[i]); > + put_page(xdp_page); > + } > + goto err_xdp; > } > rcu_read_unlock(); > > -- > 2.19.1.6.gb485710b >
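On the first question, a minimal sketch of what keeping the check only for non-frags-aware programs could look like, assuming the prog->aux->xdp_has_frags flag that the XDP multi-buffer API sets for programs loaded with BPF_F_XDP_HAS_FRAGS (hypothetical placement in receive_mergeable(), not the posted patch):

	/* A program that did not declare frags support cannot safely see
	 * a packet spanning multiple buffers, so keep dropping GSO
	 * packets for such programs only.
	 */
	if (unlikely(hdr->hdr.gso_type && !xdp_prog->aux->xdp_has_frags))
		goto err_xdp;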
On 2022/12/6 2:33 PM, Jason Wang wrote: > On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote: >> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable. >> >> For the prefilled buffer before xdp is set, vq reset can be >> used to clear it, but most devices do not support it at present. >> In order not to bother users who are using xdp normally, we do >> not use vq reset for the time being. > I guess to tweak the part to say we will probably use vq reset in the future. OK, it works. > >> At the same time, virtio >> net currently uses comp pages, and bpf_xdp_frags_increase_tail() >> needs to calculate the tailroom of the last frag, which will >> involve the offset of the corresponding page and cause a negative >> value, so we disable tail increase by not setting xdp_rxq->frag_size. >> >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> >> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> >> --- >> drivers/net/virtio_net.c | 67 +++++++++++++++++++++++----------------- >> 1 file changed, 38 insertions(+), 29 deletions(-) >> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >> index 20784b1d8236..83e6933ae62b 100644 >> --- a/drivers/net/virtio_net.c >> +++ b/drivers/net/virtio_net.c >> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, >> unsigned int *xdp_xmit, >> struct virtnet_rq_stats *stats) >> { >> + unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); >> struct virtio_net_hdr_mrg_rxbuf *hdr = buf; >> u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers); >> struct page *page = virt_to_head_page(buf); >> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, >> rcu_read_lock(); >> xdp_prog = rcu_dereference(rq->xdp_prog); >> if (xdp_prog) { >> + unsigned int xdp_frags_truesz = 0; >> + struct skb_shared_info *shinfo; >> struct xdp_frame *xdpf; >> struct page *xdp_page; >> struct xdp_buff xdp; >> void *data; >> u32 act; >> + int i; >> >> - /* Transient failure which in theory could occur if >> - * in-flight packets from before XDP was enabled reach >> - * the receive path after XDP is loaded. >> - */ >> - if (unlikely(hdr->hdr.gso_type)) >> - goto err_xdp; > Two questions: > > 1) should we keep this check for the XDP program that can't deal with XDP frags? Yes, the problem is the same as for an xdp program without xdp.frags when GRO_HW is enabled. I will correct it. > 2) how could we guarantee that the vnet header (gso_type/csum_start > etc) is still valid after XDP (where XDP program can choose to > override the header)? We can save the vnet header before the driver receives the packet and builds the xdp_buff, and then use the pre-saved value in the subsequent process. >> - >> - /* Buffers with headroom use PAGE_SIZE as alloc size, >> - * see add_recvbuf_mergeable() + get_mergeable_buf_len() >> + /* Now XDP core assumes frag size is PAGE_SIZE, but buffers >> + * with headroom may add hole in truesize, which >> + * make their length exceed PAGE_SIZE. So we disabled the >> + * hole mechanism for xdp. See add_recvbuf_mergeable(). >> */ >> frame_sz = headroom ? PAGE_SIZE : truesize; >> >> - /* This happens when rx buffer size is underestimated >> - * or headroom is not enough because of the buffer >> - * was refilled before XDP is set. This should only >> - * happen for the first several packets, so we don't >> - * care much about its performance. >> + /* This happens when headroom is not enough because >> + * of the buffer was prefilled before XDP is set. 
>> + * This should only happen for the first several packets. >> + * In fact, vq reset can be used here to help us clean up >> + * the prefilled buffers, but many existing devices do not >> + * support it, and we don't want to bother users who are >> + * using xdp normally. >> */ >> - if (unlikely(num_buf > 1 || >> - headroom < virtnet_get_headroom(vi))) { >> - /* linearize data for XDP */ >> - xdp_page = xdp_linearize_page(rq, &num_buf, >> - page, offset, >> - VIRTIO_XDP_HEADROOM, >> - &len); >> - frame_sz = PAGE_SIZE; >> + if (unlikely(headroom < virtnet_get_headroom(vi))) { >> + if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE) >> + goto err_xdp; >> >> + xdp_page = alloc_page(GFP_ATOMIC); >> if (!xdp_page) >> goto err_xdp; >> + >> + memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM, >> + page_address(page) + offset, len); >> + frame_sz = PAGE_SIZE; > How can we know a single page is sufficient here? (before XDP is set, > we reserve neither headroom nor tailroom). This is only for the first buffer, refer to add_recvbuf_mergeable() and get_mergeable_buf_len(). A buffer is always no larger than a page. > >> offset = VIRTIO_XDP_HEADROOM; > I think we should still try to do linearization for the XDP program > that doesn't support XDP frags. Yes, you are right. Thanks > } else { > xdp_page = page; > } > - > - /* Allow consuming headroom but reserve enough space to push > - * the descriptor on if we get an XDP_TX return code. > - */ data = page_address(xdp_page) + offset; > - xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq); > - xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len, > - VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true); > + err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz, > + &num_buf, &xdp_frags_truesz, stats); > + if (unlikely(err)) > + goto err_xdp_frags; > > act = bpf_prog_run_xdp(xdp_prog, &xdp); > stats->xdp_packets++; > @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > __free_pages(xdp_page, 0); > goto err_xdp; > } > +err_xdp_frags: > + shinfo = xdp_get_shared_info_from_buff(&xdp); > + > + if (unlikely(xdp_page != page)) > + __free_pages(xdp_page, 0); > + > + for (i = 0; i < shinfo->nr_frags; i++) { > + xdp_page = skb_frag_page(&shinfo->frags[i]); > + put_page(xdp_page); > + } > + goto err_xdp; > } > rcu_read_unlock(); > > -- > 2.19.1.6.gb485710b >
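A minimal sketch of the header-saving idea, assuming a local hdr_save copy taken before the program runs (illustrative only, not the posted patch):

	struct virtio_net_hdr_mrg_rxbuf hdr_save;

	/* Snapshot the vnet header first: it sits in buffer memory that
	 * the XDP program may later rewrite, e.g. via bpf_xdp_adjust_head().
	 */
	memcpy(&hdr_save, buf, sizeof(hdr_save));

	/* ... build the xdp_buff ... */
	act = bpf_prog_run_xdp(xdp_prog, &xdp);

	/* From here on, consult hdr_save rather than the in-buffer header. */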
On Thu, Dec 8, 2022 at 4:30 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > > > On 2022/12/6 2:33 PM, Jason Wang wrote: > > On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > >> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable. > >> > >> For the prefilled buffer before xdp is set, vq reset can be > >> used to clear it, but most devices do not support it at present. > >> In order not to bother users who are using xdp normally, we do > >> not use vq reset for the time being. > > I guess to tweak the part to say we will probably use vq reset in the future. > > OK, it works. > > > > >> At the same time, virtio > >> net currently uses comp pages, and bpf_xdp_frags_increase_tail() > >> needs to calculate the tailroom of the last frag, which will > >> involve the offset of the corresponding page and cause a negative > >> value, so we disable tail increase by not setting xdp_rxq->frag_size. > >> > >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> > >> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > >> --- > >> drivers/net/virtio_net.c | 67 +++++++++++++++++++++++----------------- > >> 1 file changed, 38 insertions(+), 29 deletions(-) > >> > >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > >> index 20784b1d8236..83e6933ae62b 100644 > >> --- a/drivers/net/virtio_net.c > >> +++ b/drivers/net/virtio_net.c > >> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > >> unsigned int *xdp_xmit, > >> struct virtnet_rq_stats *stats) > >> { > >> + unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > >> struct virtio_net_hdr_mrg_rxbuf *hdr = buf; > >> u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers); > >> struct page *page = virt_to_head_page(buf); > >> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > >> rcu_read_lock(); > >> xdp_prog = rcu_dereference(rq->xdp_prog); > >> if (xdp_prog) { > >> + unsigned int xdp_frags_truesz = 0; > >> + struct skb_shared_info *shinfo; > >> struct xdp_frame *xdpf; > >> struct page *xdp_page; > >> struct xdp_buff xdp; > >> void *data; > >> u32 act; > >> + int i; > >> > >> - /* Transient failure which in theory could occur if > >> - * in-flight packets from before XDP was enabled reach > >> - * the receive path after XDP is loaded. > >> - */ > >> - if (unlikely(hdr->hdr.gso_type)) > >> - goto err_xdp; > > Two questions: > > > > 1) should we keep this check for the XDP program that can't deal with XDP frags? > > Yes, the problem is the same as the xdp program without xdp.frags when > GRO_HW, I will correct it. > > > 2) how could we guarantee that the vnet header (gso_type/csum_start > > etc) is still valid after XDP (where XDP program can choose to > > override the header)? > > We can save the vnet headr before the driver receives the packet and > build xdp_buff, and then use > the pre-saved value in the subsequent process. The problem is that XDP may modify the packet (header) so some fields are not valid any more (e.g csum_start/offset ?). If I was not wrong, there's no way for the XDP program to access those fields or does it support it right now? > > >> - > >> - /* Buffers with headroom use PAGE_SIZE as alloc size, > >> - * see add_recvbuf_mergeable() + get_mergeable_buf_len() > >> + /* Now XDP core assumes frag size is PAGE_SIZE, but buffers > >> + * with headroom may add hole in truesize, which > >> + * make their length exceed PAGE_SIZE. So we disabled the > >> + * hole mechanism for xdp. 
See add_recvbuf_mergeable(). > >> */ > >> frame_sz = headroom ? PAGE_SIZE : truesize; > >> > >> - /* This happens when rx buffer size is underestimated > >> - * or headroom is not enough because of the buffer > >> - * was refilled before XDP is set. This should only > >> - * happen for the first several packets, so we don't > >> - * care much about its performance. > >> + /* This happens when headroom is not enough because > >> + * of the buffer was prefilled before XDP is set. > >> + * This should only happen for the first several packets. > >> + * In fact, vq reset can be used here to help us clean up > >> + * the prefilled buffers, but many existing devices do not > >> + * support it, and we don't want to bother users who are > >> + * using xdp normally. > >> */ > >> - if (unlikely(num_buf > 1 || > >> - headroom < virtnet_get_headroom(vi))) { > >> - /* linearize data for XDP */ > >> - xdp_page = xdp_linearize_page(rq, &num_buf, > >> - page, offset, > >> - VIRTIO_XDP_HEADROOM, > >> - &len); > >> - frame_sz = PAGE_SIZE; > >> + if (unlikely(headroom < virtnet_get_headroom(vi))) { > >> + if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE) > >> + goto err_xdp; > >> > >> + xdp_page = alloc_page(GFP_ATOMIC); > >> if (!xdp_page) > >> goto err_xdp; > >> + > >> + memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM, > >> + page_address(page) + offset, len); > >> + frame_sz = PAGE_SIZE; > > How can we know a single page is sufficient here? (before XDP is set, > > we reserve neither headroom nor tailroom). > > This is only for the first buffer, refer to add_recvbuf_mergeable() and > get_mergeable_buf_len() A buffer is always no larger than a page. Ok. Thanks > > > > >> offset = VIRTIO_XDP_HEADROOM; > > I think we should still try to do linearization for the XDP program > > that doesn't support XDP frags. > > Yes, you are right. > > Thanks. > > > > > Thanks > > > >> } else { > >> xdp_page = page; > >> } > >> - > >> - /* Allow consuming headroom but reserve enough space to push > >> - * the descriptor on if we get an XDP_TX return code. > >> - */ > >> data = page_address(xdp_page) + offset; > >> - xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq); > >> - xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len, > >> - VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true); > >> + err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz, > >> + &num_buf, &xdp_frags_truesz, stats); > >> + if (unlikely(err)) > >> + goto err_xdp_frags; > >> > >> act = bpf_prog_run_xdp(xdp_prog, &xdp); > >> stats->xdp_packets++; > >> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > >> __free_pages(xdp_page, 0); > >> goto err_xdp; > >> } > >> +err_xdp_frags: > >> + shinfo = xdp_get_shared_info_from_buff(&xdp); > >> + > >> + if (unlikely(xdp_page != page)) > >> + __free_pages(xdp_page, 0); > >> + > >> + for (i = 0; i < shinfo->nr_frags; i++) { > >> + xdp_page = skb_frag_page(&shinfo->frags[i]); > >> + put_page(xdp_page); > >> + } > >> + goto err_xdp; > >> } > >> rcu_read_unlock(); > >> > >> -- > >> 2.19.1.6.gb485710b > >> >
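A sketch of the linearization fallback suggested above, reusing the existing xdp_linearize_page() helper and gating it on frags support (hypothetical; the condition mirrors the code the patch removes):

	/* Programs without frags support must see one linear buffer, so
	 * collapse the fragments as the pre-multi-buffer code did;
	 * frags-aware programs take the virtnet_build_xdp_buff() path.
	 */
	if (!xdp_prog->aux->xdp_has_frags &&
	    (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
		xdp_page = xdp_linearize_page(rq, &num_buf,
					      page, offset,
					      VIRTIO_XDP_HEADROOM,
					      &len);
		if (!xdp_page)
			goto err_xdp;
		frame_sz = PAGE_SIZE;
		offset = VIRTIO_XDP_HEADROOM;
	}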
On Tue, Dec 13, 2022 at 03:08:46PM +0800, Jason Wang wrote: > On Thu, Dec 8, 2022 at 4:30 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > > > > > > > On 2022/12/6 2:33 PM, Jason Wang wrote: > > > On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > >> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable. > > >> > > >> For the prefilled buffer before xdp is set, vq reset can be > > >> used to clear it, but most devices do not support it at present. > > >> In order not to bother users who are using xdp normally, we do > > >> not use vq reset for the time being. > > > I guess to tweak the part to say we will probably use vq reset in the future. > > > > OK, it works. > > > > > > > > >> At the same time, virtio > > >> net currently uses comp pages, and bpf_xdp_frags_increase_tail() > > >> needs to calculate the tailroom of the last frag, which will > > >> involve the offset of the corresponding page and cause a negative > > >> value, so we disable tail increase by not setting xdp_rxq->frag_size. > > >> > > >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> > > >> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > >> --- > > >> drivers/net/virtio_net.c | 67 +++++++++++++++++++++++----------------- > > >> 1 file changed, 38 insertions(+), 29 deletions(-) > > >> > > >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > >> index 20784b1d8236..83e6933ae62b 100644 > > >> --- a/drivers/net/virtio_net.c > > >> +++ b/drivers/net/virtio_net.c > > >> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > > >> unsigned int *xdp_xmit, > > >> struct virtnet_rq_stats *stats) > > >> { > > >> + unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > >> struct virtio_net_hdr_mrg_rxbuf *hdr = buf; > > >> u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers); > > >> struct page *page = virt_to_head_page(buf); > > >> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > > >> rcu_read_lock(); > > >> xdp_prog = rcu_dereference(rq->xdp_prog); > > >> if (xdp_prog) { > > >> + unsigned int xdp_frags_truesz = 0; > > >> + struct skb_shared_info *shinfo; > > >> struct xdp_frame *xdpf; > > >> struct page *xdp_page; > > >> struct xdp_buff xdp; > > >> void *data; > > >> u32 act; > > >> + int i; > > >> > > >> - /* Transient failure which in theory could occur if > > >> - * in-flight packets from before XDP was enabled reach > > >> - * the receive path after XDP is loaded. > > >> - */ > > >> - if (unlikely(hdr->hdr.gso_type)) > > >> - goto err_xdp; > > > Two questions: > > > > > > 1) should we keep this check for the XDP program that can't deal with XDP frags? > > > > Yes, the problem is the same as the xdp program without xdp.frags when > > GRO_HW, I will correct it. > > > > > 2) how could we guarantee that the vnet header (gso_type/csum_start > > > etc) is still valid after XDP (where XDP program can choose to > > > override the header)? > > > > We can save the vnet headr before the driver receives the packet and > > build xdp_buff, and then use > > the pre-saved value in the subsequent process. > > The problem is that XDP may modify the packet (header) so some fields > are not valid any more (e.g csum_start/offset ?). > > If I was not wrong, there's no way for the XDP program to access those > fields or does it support it right now? 
> When the guest_csum feature is negotiated, xdp cannot be set, because the metadata of xdp_{buff, frame} may be adjusted by the bpf program; therefore, csum_{start, offset} itself is invalid. At the same time, multi-buffer xdp programs should only receive packets over a larger MTU, so we don't need gso-related information anymore and need to disable GRO_HW. Thanks. > > > > >> - > > >> - /* Buffers with headroom use PAGE_SIZE as alloc size, > > >> - * see add_recvbuf_mergeable() + get_mergeable_buf_len() > > >> + /* Now XDP core assumes frag size is PAGE_SIZE, but buffers > > >> + * with headroom may add hole in truesize, which > > >> + * make their length exceed PAGE_SIZE. So we disabled the > > >> + * hole mechanism for xdp. See add_recvbuf_mergeable(). > > >> */ > > >> frame_sz = headroom ? PAGE_SIZE : truesize; > > >> > > >> - /* This happens when rx buffer size is underestimated > > >> - * or headroom is not enough because of the buffer > > >> - * was refilled before XDP is set. This should only > > >> - * happen for the first several packets, so we don't > > >> - * care much about its performance. > > >> + /* This happens when headroom is not enough because > > >> + * of the buffer was prefilled before XDP is set. > > >> + * This should only happen for the first several packets. > > >> + * In fact, vq reset can be used here to help us clean up > > >> + * the prefilled buffers, but many existing devices do not > > >> + * support it, and we don't want to bother users who are > > >> + * using xdp normally. > > >> */ > > >> - if (unlikely(num_buf > 1 || > > >> - headroom < virtnet_get_headroom(vi))) { > > >> - /* linearize data for XDP */ > > >> - xdp_page = xdp_linearize_page(rq, &num_buf, > > >> - page, offset, > > >> - VIRTIO_XDP_HEADROOM, > > >> - &len); > > >> - frame_sz = PAGE_SIZE; > > >> + if (unlikely(headroom < virtnet_get_headroom(vi))) { > > >> + if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE) > > >> + goto err_xdp; > > >> > > >> + xdp_page = alloc_page(GFP_ATOMIC); > > >> if (!xdp_page) > > >> goto err_xdp; > > >> + > > >> + memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM, > > >> + page_address(page) + offset, len); > > >> + frame_sz = PAGE_SIZE; > > > How can we know a single page is sufficient here? (before XDP is set, > > > we reserve neither headroom nor tailroom). > > > > This is only for the first buffer, refer to add_recvbuf_mergeable() and > > get_mergeable_buf_len() A buffer is always no larger than a page. > > Ok. > > Thanks > > > > > > > >> offset = VIRTIO_XDP_HEADROOM; > > > I think we should still try to do linearization for the XDP program > > > that doesn't support XDP frags. > > > > Yes, you are right. > > > > Thanks. > > > > > > > > Thanks > > > > > >> } else { > > >> xdp_page = page; > > >> } > > >> - > > >> - /* Allow consuming headroom but reserve enough space to push > > >> - * the descriptor on if we get an XDP_TX return code. 
> > >> - */ > > >> data = page_address(xdp_page) + offset; > > >> - xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq); > > >> - xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len, > > >> - VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true); > > >> + err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz, > > >> + &num_buf, &xdp_frags_truesz, stats); > > >> + if (unlikely(err)) > > >> + goto err_xdp_frags; > > >> > > >> act = bpf_prog_run_xdp(xdp_prog, &xdp); > > >> stats->xdp_packets++; > > >> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > > >> __free_pages(xdp_page, 0); > > >> goto err_xdp; > > >> } > > >> +err_xdp_frags: > > >> + shinfo = xdp_get_shared_info_from_buff(&xdp); > > >> + > > >> + if (unlikely(xdp_page != page)) > > >> + __free_pages(xdp_page, 0); > > >> + > > >> + for (i = 0; i < shinfo->nr_frags; i++) { > > >> + xdp_page = skb_frag_page(&shinfo->frags[i]); > > >> + put_page(xdp_page); > > >> + } > > >> + goto err_xdp; > > >> } > > >> rcu_read_unlock(); > > >> > > >> -- > > >> 2.19.1.6.gb485710b > > >> > >
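For context, this combination is already rejected at attach time; the guard in virtnet_xdp_set() reads approximately as follows in kernels of this era:

	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) ||
	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM)) {
		NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing GRO_HW/CSUM, disable GRO_HW/CSUM first");
		return -EOPNOTSUPP;
	}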
On Wed, Dec 14, 2022 at 4:38 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > On Tue, Dec 13, 2022 at 03:08:46PM +0800, Jason Wang wrote: > > On Thu, Dec 8, 2022 at 4:30 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > > > > > > > > > > > On 2022/12/6 2:33 PM, Jason Wang wrote: > > > > On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > > >> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable. > > > >> > > > >> For the prefilled buffer before xdp is set, vq reset can be > > > >> used to clear it, but most devices do not support it at present. > > > >> In order not to bother users who are using xdp normally, we do > > > >> not use vq reset for the time being. > > > > I guess to tweak the part to say we will probably use vq reset in the future. > > > > > > OK, it works. > > > > > > > > > > >> At the same time, virtio > > > >> net currently uses comp pages, and bpf_xdp_frags_increase_tail() > > > >> needs to calculate the tailroom of the last frag, which will > > > >> involve the offset of the corresponding page and cause a negative > > > >> value, so we disable tail increase by not setting xdp_rxq->frag_size. > > > >> > > > >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> > > > >> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > >> --- > > > >> drivers/net/virtio_net.c | 67 +++++++++++++++++++++++----------------- > > > >> 1 file changed, 38 insertions(+), 29 deletions(-) > > > >> > > > >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > >> index 20784b1d8236..83e6933ae62b 100644 > > > >> --- a/drivers/net/virtio_net.c > > > >> +++ b/drivers/net/virtio_net.c > > > >> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > > > >> unsigned int *xdp_xmit, > > > >> struct virtnet_rq_stats *stats) > > > >> { > > > >> + unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > > >> struct virtio_net_hdr_mrg_rxbuf *hdr = buf; > > > >> u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers); > > > >> struct page *page = virt_to_head_page(buf); > > > >> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > > > >> rcu_read_lock(); > > > >> xdp_prog = rcu_dereference(rq->xdp_prog); > > > >> if (xdp_prog) { > > > >> + unsigned int xdp_frags_truesz = 0; > > > >> + struct skb_shared_info *shinfo; > > > >> struct xdp_frame *xdpf; > > > >> struct page *xdp_page; > > > >> struct xdp_buff xdp; > > > >> void *data; > > > >> u32 act; > > > >> + int i; > > > >> > > > >> - /* Transient failure which in theory could occur if > > > >> - * in-flight packets from before XDP was enabled reach > > > >> - * the receive path after XDP is loaded. > > > >> - */ > > > >> - if (unlikely(hdr->hdr.gso_type)) > > > >> - goto err_xdp; > > > > Two questions: > > > > > > > > 1) should we keep this check for the XDP program that can't deal with XDP frags? > > > > > > Yes, the problem is the same as the xdp program without xdp.frags when > > > GRO_HW, I will correct it. > > > > > > > 2) how could we guarantee that the vnet header (gso_type/csum_start > > > > etc) is still valid after XDP (where XDP program can choose to > > > > override the header)? > > > > > > We can save the vnet headr before the driver receives the packet and > > > build xdp_buff, and then use > > > the pre-saved value in the subsequent process. > > > > The problem is that XDP may modify the packet (header) so some fields > > are not valid any more (e.g csum_start/offset ?). 
> > > > If I was not wrong, there's no way for the XDP program to access those > > fields or does it support it right now? > > > > When guest_csum feature is negotiated, xdp cannot be set, because the metadata > of xdp_{buff, frame} may be adjusted by the bpf program, therefore, > csum_{start, offset} itself is invalid. And at the same time, > multi-buffer xdp programs should only Receive packets over larger MTU, so > we don't need gso related information anymore and need to disable GRO_HW. Ok, that's fine. (But it requires a large pMTU). Thanks > > Thanks. > > > > > > > >> - > > > >> - /* Buffers with headroom use PAGE_SIZE as alloc size, > > > >> - * see add_recvbuf_mergeable() + get_mergeable_buf_len() > > > >> + /* Now XDP core assumes frag size is PAGE_SIZE, but buffers > > > >> + * with headroom may add hole in truesize, which > > > >> + * make their length exceed PAGE_SIZE. So we disabled the > > > >> + * hole mechanism for xdp. See add_recvbuf_mergeable(). > > > >> */ > > > >> frame_sz = headroom ? PAGE_SIZE : truesize; > > > >> > > > >> - /* This happens when rx buffer size is underestimated > > > >> - * or headroom is not enough because of the buffer > > > >> - * was refilled before XDP is set. This should only > > > >> - * happen for the first several packets, so we don't > > > >> - * care much about its performance. > > > >> + /* This happens when headroom is not enough because > > > >> + * of the buffer was prefilled before XDP is set. > > > >> + * This should only happen for the first several packets. > > > >> + * In fact, vq reset can be used here to help us clean up > > > >> + * the prefilled buffers, but many existing devices do not > > > >> + * support it, and we don't want to bother users who are > > > >> + * using xdp normally. > > > >> */ > > > >> - if (unlikely(num_buf > 1 || > > > >> - headroom < virtnet_get_headroom(vi))) { > > > >> - /* linearize data for XDP */ > > > >> - xdp_page = xdp_linearize_page(rq, &num_buf, > > > >> - page, offset, > > > >> - VIRTIO_XDP_HEADROOM, > > > >> - &len); > > > >> - frame_sz = PAGE_SIZE; > > > >> + if (unlikely(headroom < virtnet_get_headroom(vi))) { > > > >> + if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE) > > > >> + goto err_xdp; > > > >> > > > >> + xdp_page = alloc_page(GFP_ATOMIC); > > > >> if (!xdp_page) > > > >> goto err_xdp; > > > >> + > > > >> + memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM, > > > >> + page_address(page) + offset, len); > > > >> + frame_sz = PAGE_SIZE; > > > > How can we know a single page is sufficient here? (before XDP is set, > > > > we reserve neither headroom nor tailroom). > > > > > > This is only for the first buffer, refer to add_recvbuf_mergeable() and > > > get_mergeable_buf_len() A buffer is always no larger than a page. > > > > Ok. > > > > Thanks > > > > > > > > > > > > >> offset = VIRTIO_XDP_HEADROOM; > > > > I think we should still try to do linearization for the XDP program > > > > that doesn't support XDP frags. > > > > > > Yes, you are right. > > > > > > Thanks. > > > > > > > > > > > Thanks > > > > > > > >> } else { > > > >> xdp_page = page; > > > >> } > > > >> - > > > >> - /* Allow consuming headroom but reserve enough space to push > > > >> - * the descriptor on if we get an XDP_TX return code. 
> > > >> - */ > > > >> data = page_address(xdp_page) + offset; > > > >> - xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq); > > > >> - xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len, > > > >> - VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true); > > > >> + err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz, > > > >> + &num_buf, &xdp_frags_truesz, stats); > > > >> + if (unlikely(err)) > > > >> + goto err_xdp_frags; > > > >> > > > >> act = bpf_prog_run_xdp(xdp_prog, &xdp); > > > >> stats->xdp_packets++; > > > >> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > > > >> __free_pages(xdp_page, 0); > > > >> goto err_xdp; > > > >> } > > > >> +err_xdp_frags: > > > >> + shinfo = xdp_get_shared_info_from_buff(&xdp); > > > >> + > > > >> + if (unlikely(xdp_page != page)) > > > >> + __free_pages(xdp_page, 0); > > > >> + > > > >> + for (i = 0; i < shinfo->nr_frags; i++) { > > > >> + xdp_page = skb_frag_page(&shinfo->frags[i]); > > > >> + put_page(xdp_page); > > > >> + } > > > >> + goto err_xdp; > > > >> } > > > >> rcu_read_unlock(); > > > >> > > > >> -- > > > >> 2.19.1.6.gb485710b > > > >> > > > >
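To put numbers on the single-page bound discussed above: with 4 KiB pages, VIRTIO_XDP_HEADROOM is 256 bytes and the reserved tailroom SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) is roughly 320 bytes on common x86_64 configurations, so the copy path accepts a first buffer of up to about 4096 - 256 - 320 = 3520 bytes. Since add_recvbuf_mergeable() never posts a buffer larger than a page, only first buffers close to the full page size hit the new err_xdp bail-out.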
On 2022/12/16 11:46 AM, Jason Wang wrote: > On Wed, Dec 14, 2022 at 4:38 PM Heng Qi <hengqi@linux.alibaba.com> wrote: >> On Tue, Dec 13, 2022 at 03:08:46PM +0800, Jason Wang wrote: >>> On Thu, Dec 8, 2022 at 4:30 PM Heng Qi <hengqi@linux.alibaba.com> wrote: >>>> >>>> >>>> On 2022/12/6 2:33 PM, Jason Wang wrote: >>>>> On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote: >>>>>> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable. >>>>>> >>>>>> For the prefilled buffer before xdp is set, vq reset can be >>>>>> used to clear it, but most devices do not support it at present. >>>>>> In order not to bother users who are using xdp normally, we do >>>>>> not use vq reset for the time being. >>>>> I guess to tweak the part to say we will probably use vq reset in the future. >>>> OK, it works. >>>> >>>>>> At the same time, virtio >>>>>> net currently uses comp pages, and bpf_xdp_frags_increase_tail() >>>>>> needs to calculate the tailroom of the last frag, which will >>>>>> involve the offset of the corresponding page and cause a negative >>>>>> value, so we disable tail increase by not setting xdp_rxq->frag_size. >>>>>> >>>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> >>>>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> >>>>>> --- >>>>>> drivers/net/virtio_net.c | 67 +++++++++++++++++++++++----------------- >>>>>> 1 file changed, 38 insertions(+), 29 deletions(-) >>>>>> >>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >>>>>> index 20784b1d8236..83e6933ae62b 100644 >>>>>> --- a/drivers/net/virtio_net.c >>>>>> +++ b/drivers/net/virtio_net.c >>>>>> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, >>>>>> unsigned int *xdp_xmit, >>>>>> struct virtnet_rq_stats *stats) >>>>>> { >>>>>> + unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); >>>>>> struct virtio_net_hdr_mrg_rxbuf *hdr = buf; >>>>>> u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers); >>>>>> struct page *page = virt_to_head_page(buf); >>>>>> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, >>>>>> rcu_read_lock(); >>>>>> xdp_prog = rcu_dereference(rq->xdp_prog); >>>>>> if (xdp_prog) { >>>>>> + unsigned int xdp_frags_truesz = 0; >>>>>> + struct skb_shared_info *shinfo; >>>>>> struct xdp_frame *xdpf; >>>>>> struct page *xdp_page; >>>>>> struct xdp_buff xdp; >>>>>> void *data; >>>>>> u32 act; >>>>>> + int i; >>>>>> >>>>>> - /* Transient failure which in theory could occur if >>>>>> - * in-flight packets from before XDP was enabled reach >>>>>> - * the receive path after XDP is loaded. >>>>>> - */ >>>>>> - if (unlikely(hdr->hdr.gso_type)) >>>>>> - goto err_xdp; >>>>> Two questions: >>>>> >>>>> 1) should we keep this check for the XDP program that can't deal with XDP frags? >>>> Yes, the problem is the same as the xdp program without xdp.frags when >>>> GRO_HW, I will correct it. >>>> >>>>> 2) how could we guarantee that the vnet header (gso_type/csum_start >>>>> etc) is still valid after XDP (where XDP program can choose to >>>>> override the header)? >>>> We can save the vnet headr before the driver receives the packet and >>>> build xdp_buff, and then use >>>> the pre-saved value in the subsequent process. >>> The problem is that XDP may modify the packet (header) so some fields >>> are not valid any more (e.g csum_start/offset ?). >>> >>> If I was not wrong, there's no way for the XDP program to access those >>> fields or does it support it right now? 
>>> >> When guest_csum feature is negotiated, xdp cannot be set, because the metadata >> of xdp_{buff, frame} may be adjusted by the bpf program, therefore, >> csum_{start, offset} itself is invalid. And at the same time, >> multi-buffer xdp programs should only Receive packets over larger MTU, so >> we don't need gso related information anymore and need to disable GRO_HW. > Ok, that's fine. > > (But it requires a large pMTU). Yes, like a jumbo frame larger than 4K. Thanks. > > Thanks > >> Thanks. >> >>>>>> - >>>>>> - /* Buffers with headroom use PAGE_SIZE as alloc size, >>>>>> - * see add_recvbuf_mergeable() + get_mergeable_buf_len() >>>>>> + /* Now XDP core assumes frag size is PAGE_SIZE, but buffers >>>>>> + * with headroom may add hole in truesize, which >>>>>> + * make their length exceed PAGE_SIZE. So we disabled the >>>>>> + * hole mechanism for xdp. See add_recvbuf_mergeable(). >>>>>> */ >>>>>> frame_sz = headroom ? PAGE_SIZE : truesize; >>>>>> >>>>>> - /* This happens when rx buffer size is underestimated >>>>>> - * or headroom is not enough because of the buffer >>>>>> - * was refilled before XDP is set. This should only >>>>>> - * happen for the first several packets, so we don't >>>>>> - * care much about its performance. >>>>>> + /* This happens when headroom is not enough because >>>>>> + * of the buffer was prefilled before XDP is set. >>>>>> + * This should only happen for the first several packets. >>>>>> + * In fact, vq reset can be used here to help us clean up >>>>>> + * the prefilled buffers, but many existing devices do not >>>>>> + * support it, and we don't want to bother users who are >>>>>> + * using xdp normally. >>>>>> */ >>>>>> - if (unlikely(num_buf > 1 || >>>>>> - headroom < virtnet_get_headroom(vi))) { >>>>>> - /* linearize data for XDP */ >>>>>> - xdp_page = xdp_linearize_page(rq, &num_buf, >>>>>> - page, offset, >>>>>> - VIRTIO_XDP_HEADROOM, >>>>>> - &len); >>>>>> - frame_sz = PAGE_SIZE; >>>>>> + if (unlikely(headroom < virtnet_get_headroom(vi))) { >>>>>> + if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE) >>>>>> + goto err_xdp; >>>>>> >>>>>> + xdp_page = alloc_page(GFP_ATOMIC); >>>>>> if (!xdp_page) >>>>>> goto err_xdp; >>>>>> + >>>>>> + memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM, >>>>>> + page_address(page) + offset, len); >>>>>> + frame_sz = PAGE_SIZE; >>>>> How can we know a single page is sufficient here? (before XDP is set, >>>>> we reserve neither headroom nor tailroom). >>>> This is only for the first buffer, refer to add_recvbuf_mergeable() and >>>> get_mergeable_buf_len() A buffer is always no larger than a page. >>> Ok. >>> >>> Thanks >>> >>>>>> offset = VIRTIO_XDP_HEADROOM; >>>>> I think we should still try to do linearization for the XDP program >>>>> that doesn't support XDP frags. >>>> Yes, you are right. >>>> >>>> Thanks. >>>> >>>>> Thanks >>>>> >>>>>> } else { >>>>>> xdp_page = page; >>>>>> } >>>>>> - >>>>>> - /* Allow consuming headroom but reserve enough space to push >>>>>> - * the descriptor on if we get an XDP_TX return code. 
>>>>>> - */ >>>>>> data = page_address(xdp_page) + offset; >>>>>> - xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq); >>>>>> - xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len, >>>>>> - VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true); >>>>>> + err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz, >>>>>> + &num_buf, &xdp_frags_truesz, stats); >>>>>> + if (unlikely(err)) >>>>>> + goto err_xdp_frags; >>>>>> >>>>>> act = bpf_prog_run_xdp(xdp_prog, &xdp); >>>>>> stats->xdp_packets++; >>>>>> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, >>>>>> __free_pages(xdp_page, 0); >>>>>> goto err_xdp; >>>>>> } >>>>>> +err_xdp_frags: >>>>>> + shinfo = xdp_get_shared_info_from_buff(&xdp); >>>>>> + >>>>>> + if (unlikely(xdp_page != page)) >>>>>> + __free_pages(xdp_page, 0); >>>>>> + >>>>>> + for (i = 0; i < shinfo->nr_frags; i++) { >>>>>> + xdp_page = skb_frag_page(&shinfo->frags[i]); >>>>>> + put_page(xdp_page); >>>>>> + } >>>>>> + goto err_xdp; >>>>>> } >>>>>> rcu_read_unlock(); >>>>>> >>>>>> -- >>>>>> 2.19.1.6.gb485710b >>>>>>
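For completeness, the BPF side must opt into multi-buffer delivery; under libbpf conventions the "xdp.frags" section name sets BPF_F_XDP_HAS_FRAGS at load time. A minimal illustrative program (hypothetical, not part of this series):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")
int xdp_pass_mb(struct xdp_md *ctx)
{
	/* May be attached while the driver delivers packets whose data
	 * spans several buffers; direct packet access covers only the
	 * linear part, the rest is reachable via bpf_xdp_load_bytes().
	 */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";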
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 20784b1d8236..83e6933ae62b 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, unsigned int *xdp_xmit, struct virtnet_rq_stats *stats) { + unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); struct virtio_net_hdr_mrg_rxbuf *hdr = buf; u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers); struct page *page = virt_to_head_page(buf); @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, rcu_read_lock(); xdp_prog = rcu_dereference(rq->xdp_prog); if (xdp_prog) { + unsigned int xdp_frags_truesz = 0; + struct skb_shared_info *shinfo; struct xdp_frame *xdpf; struct page *xdp_page; struct xdp_buff xdp; void *data; u32 act; + int i; - /* Transient failure which in theory could occur if - * in-flight packets from before XDP was enabled reach - * the receive path after XDP is loaded. - */ - if (unlikely(hdr->hdr.gso_type)) - goto err_xdp; - - /* Buffers with headroom use PAGE_SIZE as alloc size, - * see add_recvbuf_mergeable() + get_mergeable_buf_len() + /* Now XDP core assumes frag size is PAGE_SIZE, but buffers + * with headroom may add hole in truesize, which + * make their length exceed PAGE_SIZE. So we disabled the + * hole mechanism for xdp. See add_recvbuf_mergeable(). */ frame_sz = headroom ? PAGE_SIZE : truesize; - /* This happens when rx buffer size is underestimated - * or headroom is not enough because of the buffer - * was refilled before XDP is set. This should only - * happen for the first several packets, so we don't - * care much about its performance. + /* This happens when headroom is not enough because + * of the buffer was prefilled before XDP is set. + * This should only happen for the first several packets. + * In fact, vq reset can be used here to help us clean up + * the prefilled buffers, but many existing devices do not + * support it, and we don't want to bother users who are + * using xdp normally. */ - if (unlikely(num_buf > 1 || - headroom < virtnet_get_headroom(vi))) { - /* linearize data for XDP */ - xdp_page = xdp_linearize_page(rq, &num_buf, - page, offset, - VIRTIO_XDP_HEADROOM, - &len); - frame_sz = PAGE_SIZE; + if (unlikely(headroom < virtnet_get_headroom(vi))) { + if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE) + goto err_xdp; + xdp_page = alloc_page(GFP_ATOMIC); if (!xdp_page) goto err_xdp; + + memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM, + page_address(page) + offset, len); + frame_sz = PAGE_SIZE; offset = VIRTIO_XDP_HEADROOM; } else { xdp_page = page; } - - /* Allow consuming headroom but reserve enough space to push - * the descriptor on if we get an XDP_TX return code. 
- */ data = page_address(xdp_page) + offset; - xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq); - xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len, - VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true); + err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz, + &num_buf, &xdp_frags_truesz, stats); + if (unlikely(err)) + goto err_xdp_frags; act = bpf_prog_run_xdp(xdp_prog, &xdp); stats->xdp_packets++; @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, __free_pages(xdp_page, 0); goto err_xdp; } +err_xdp_frags: + shinfo = xdp_get_shared_info_from_buff(&xdp); + + if (unlikely(xdp_page != page)) + __free_pages(xdp_page, 0); + + for (i = 0; i < shinfo->nr_frags; i++) { + xdp_page = skb_frag_page(&shinfo->frags[i]); + put_page(xdp_page); + } + goto err_xdp; } rcu_read_unlock();
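The virtnet_build_xdp_buff() helper called above is introduced earlier in this series and is not part of this hunk. For orientation, a simplified sketch of what such a helper has to do (hypothetical reconstruction, not the posted code): initialize the head xdp_buff over the first buffer, then pull each remaining buffer of the packet off the virtqueue and attach it as a frag in the shared_info area.

/* Sketch: assumes the caller passes the first buffer (data pointer at the
 * vnet header) and the buffer count from hdr->num_buffers.
 */
static int virtnet_build_xdp_buff(struct net_device *dev,
				  struct virtnet_info *vi,
				  struct receive_queue *rq,
				  struct xdp_buff *xdp, void *buf,
				  unsigned int len, unsigned int frame_sz,
				  u16 *num_buf,
				  unsigned int *xdp_frags_truesize,
				  struct virtnet_rq_stats *stats)
{
	struct skb_shared_info *shinfo;
	struct page *page;
	void *ctx;

	/* Head buffer: packet data starts right after the vnet header. */
	xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
	xdp_prepare_buff(xdp, buf - VIRTIO_XDP_HEADROOM + vi->hdr_len,
			 VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);

	if (*num_buf > 1)
		xdp_buff_set_frags_flag(xdp);

	shinfo = xdp_get_shared_info_from_buff(xdp);
	shinfo->nr_frags = 0;
	shinfo->xdp_frags_size = 0;

	/* Each remaining buffer becomes a frag of the head xdp_buff. */
	while (--*num_buf > 0) {
		skb_frag_t *frag;

		if (shinfo->nr_frags == MAX_SKB_FRAGS)
			return -EINVAL;

		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
		if (unlikely(!buf)) {
			dev->stats.rx_length_errors++;
			return -EINVAL;
		}

		stats->bytes += len;
		page = virt_to_head_page(buf);
		*xdp_frags_truesize += mergeable_ctx_to_truesize(ctx);

		frag = &shinfo->frags[shinfo->nr_frags++];
		__skb_frag_set_page(frag, page);
		skb_frag_off_set(frag, buf - page_address(page));
		skb_frag_size_set(frag, len);
		shinfo->xdp_frags_size += len;
	}

	return 0;
}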