Message ID: 20241014031234.7659-2-xuanzhuo@linux.alibaba.com (mailing list archive)
State: Not Applicable
Delegated to: Netdev Maintainers
Series: virtio_net: enable premapped mode by default
On 10/14/24 05:12, Xuan Zhuo wrote:
> When the frag just got a page, then may lead to regression on VM.
> Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> then the frag always get a page when do refill.
>
> Which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
>
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur.
>
> Here, when the frag size is not enough, we reduce the buffer len to fix
> this problem.
>
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

This looks like a fix that should target the net tree, but the following
patches look like net-next material. Any special reason to bundle them
together?

Also, please explicitly include the target tree in the subj on next
submissions, thanks!

Paolo
On Mon, Oct 14, 2024 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> When the frag just got a page, then may lead to regression on VM.
> Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> then the frag always get a page when do refill.
>
> Which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
>
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur.
>
> Here, when the frag size is not enough, we reduce the buffer len to fix
> this problem.
>
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Though this may fix the problem and we need it now, I would like to
have some tweaks in the future. Basically because of the security
implication of sharing driver metadata with the device (most IOMMUs
can only do protection at the granularity of a page). So we may end up
with device-triggerable behaviour. For safety, we should use driver
private memory for the DMA metadata.
> ---
>  drivers/net/virtio_net.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f8131f92a392..59a99bbaf852 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -926,9 +926,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>  	void *buf, *head;
>  	dma_addr_t addr;
>
> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> -		return NULL;
> -
>  	head = page_address(alloc_frag->page);
>
>  	if (rq->do_dma) {
> @@ -2423,6 +2420,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	len = SKB_DATA_ALIGN(len) +
>  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> +		return -ENOMEM;
> +
>  	buf = virtnet_rq_alloc(rq, len, gfp);
>  	if (unlikely(!buf))
>  		return -ENOMEM;
> @@ -2525,6 +2525,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  	 */
>  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>
> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> +		return -ENOMEM;

This makes me think of another question: how could we guarantee len +
room is less than or equal to PAGE_SIZE, especially considering we need
extra head and tailroom for XDP? If we can't, it is a bug:

"""
/**
 * skb_page_frag_refill - check that a page_frag contains enough room
 * @sz: minimum size of the fragment we want to get
 * @pfrag: pointer to page_frag
 * @gfp: priority for memory allocation
 *
 * Note: While this allocator tries to use high order pages, there is
 * no guarantee that allocations succeed. Therefore, @sz MUST be
 * less or equal than PAGE_SIZE.
 */
"""

> +
> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> +		len -= sizeof(struct virtnet_rq_dma);

Any reason we need to check alloc_frag->offset?
> +
>  	buf = virtnet_rq_alloc(rq, len + room, gfp);

Btw, as pointed out in previous review, we should have a consistent API:

1) hide the alloc_frag like virtnet_rq_alloc()

or

2) pass the alloc_frag to virtnet_rq_alloc()

>  	if (unlikely(!buf))
>  		return -ENOMEM;
> --
> 2.32.0.3.g01195cf9f
>

Thanks
On Thu, Oct 17, 2024 at 03:42:59PM +0200, Paolo Abeni wrote:
>
>
> On 10/14/24 05:12, Xuan Zhuo wrote:
> > When the frag just got a page, then may lead to regression on VM.
> > Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> > then the frag always get a page when do refill.
> >
> > Which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> >
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur.
> >
> > Here, when the frag size is not enough, we reduce the buffer len to fix
> > this problem.
> >
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> This looks like a fix that should target the net tree, but the following
> patches looks like net-next material. Any special reason to bundle them
> together?
>
> Also, please explicitly include the the target tree in the subj on next
> submissions, thanks!
>
> Paolo

The issue is that net-next is not rebased on net, so if patches 2-5 land
in next and 1 only in net, next will be broken. I think the simplest fix
is just to have 1 on net-next, backport it to net too, and git will
figure it out. Right?
On Thu, 17 Oct 2024 15:42:59 +0200, Paolo Abeni <pabeni@redhat.com> wrote:
>
>
> On 10/14/24 05:12, Xuan Zhuo wrote:
> > When the frag just got a page, then may lead to regression on VM.
> > Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> > then the frag always get a page when do refill.
> >
> > Which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> >
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur.
> >
> > Here, when the frag size is not enough, we reduce the buffer len to fix
> > this problem.
> >
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> This looks like a fix that should target the net tree, but the following
> patches looks like net-next material. Any special reason to bundle them
> together?

Sorry, I forgot to add net-next as a target tree.

This may look like a fix, but the feature was disabled in the last Linux
version, so the bug cannot be triggered and we do not need to push this
to the net tree.

Thanks.

>
> Also, please explicitly include the the target tree in the subj on next
> submissions, thanks!
>
> Paolo
On Fri, 18 Oct 2024 15:41:41 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Oct 14, 2024 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > When the frag just got a page, then may lead to regression on VM.
> > Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> > then the frag always get a page when do refill.
> >
> > Which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> >
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur.
> >
> > Here, when the frag size is not enough, we reduce the buffer len to fix
> > this problem.
> >
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> Though this may fix the problem and we need it now, I would like to
> have some tweaks in the future. Basically because of the security
> implication for sharing driver metadata with the device (most IOMMU
> can only do protection at the granule at the page). So we may end up
> with device-triggerable behaviour. For safety, we should use driver
> private memory for DMA metadata.
> > ---
> >  drivers/net/virtio_net.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index f8131f92a392..59a99bbaf852 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -926,9 +926,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> >  	void *buf, *head;
> >  	dma_addr_t addr;
> >
> > -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > -		return NULL;
> > -
> >  	head = page_address(alloc_frag->page);
> >
> >  	if (rq->do_dma) {
> > @@ -2423,6 +2420,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >  	len = SKB_DATA_ALIGN(len) +
> >  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >
> > +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > +		return -ENOMEM;
> > +
> >  	buf = virtnet_rq_alloc(rq, len, gfp);
> >  	if (unlikely(!buf))
> >  		return -ENOMEM;
> > @@ -2525,6 +2525,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >  	 */
> >  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> >
> > +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > +		return -ENOMEM;
>
> This makes me think of another question, how could we guarantee len +
> room is less or equal to PAGE_SIZE. Especially considering we need
> extra head and tailroom for XDP? If we can't it is a bug:

get_mergeable_buf_len() does this.

> """
> /**
>  * skb_page_frag_refill - check that a page_frag contains enough room
>  * @sz: minimum size of the fragment we want to get
>  * @pfrag: pointer to page_frag
>  * @gfp: priority for memory allocation
>  *
>  * Note: While this allocator tries to use high order pages, there is
>  * no guarantee that allocations succeed. Therefore, @sz MUST be
>  * less or equal than PAGE_SIZE.
>  */
> """
>
> > +
> > +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > +		len -= sizeof(struct virtnet_rq_dma);
>
> Any reason we need to check alloc_frag->offset?

We only need to do this when the page of the alloc frag is new, because
that is when we must reserve space to store the dma metadata. If the
offset is > 0, the metadata was already allocated when the page was
fresh, so we do not need extra space and it is safe.

>
> > +
> >  	buf = virtnet_rq_alloc(rq, len + room, gfp);
>
> Btw, as pointed out in previous review, we should have a consistent API:
>
> 1) hide the alloc_frag like virtnet_rq_alloc()
>
> or
>
> 2) pass the alloc_frag to virtnet_rq_alloc()

Now we need to check len + room before calling skb_page_frag_refill(),
so we must move skb_page_frag_refill() outside virtnet_rq_alloc().

Thanks

>
> >  	if (unlikely(!buf))
> >  		return -ENOMEM;
> > --
> > 2.32.0.3.g01195cf9f
> >
>
> Thanks
>
On Fri, Oct 25, 2024 at 10:35:53AM +0800, Xuan Zhuo wrote:
> On Thu, 17 Oct 2024 15:42:59 +0200, Paolo Abeni <pabeni@redhat.com> wrote:
> >
> >
> > On 10/14/24 05:12, Xuan Zhuo wrote:
> > > When the frag just got a page, then may lead to regression on VM.
> > > Specially if the sysctl net.core.high_order_alloc_disable value is 1,
> > > then the frag always get a page when do refill.
> > >
> > > Which could see reliable crashes or scp failure (scp a file 100M in size
> > > to VM):
> > >
> > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > everything is fine. However, if the frag is only one page and the
> > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > overflow may occur.
> > >
> > > Here, when the frag size is not enough, we reduce the buffer len to fix
> > > this problem.
> > >
> > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >
> > This looks like a fix that should target the net tree, but the following
> > patches looks like net-next material. Any special reason to bundle them
> > together?
>
> Sorry, I forgot to add net-next as a target tree.
>
> This may look like a fix. But the feature was disabled in the last Linux
> version. So the bug cannot be triggered, so we don't need to push to the net
> tree.

I think it would be useful to be clear in the commit message, use of
tags, and target tree regarding fixes and non-fixes.

Please describe in the commit message why this is not fixing a bug, as
you have described above. And please do not include Fixes tags in
patches that are not bug fixes, which seems to be the case here.

If you want to refer to the patch that introduced the problem, you can
use the following syntax, in the commit message, before the tags.
Unlike Fixes tags, this may be line wrapped.

This problem is not a bug fix for net because... It was introduced by
commit f9dac92ba908 ("virtio_ring: enable premapped mode whatever
use_dma_api").

Reported-by: ...
...
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f8131f92a392..59a99bbaf852 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -926,9 +926,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
 	void *buf, *head;
 	dma_addr_t addr;
 
-	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
-		return NULL;
-
 	head = page_address(alloc_frag->page);
 
 	if (rq->do_dma) {
@@ -2423,6 +2420,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 	len = SKB_DATA_ALIGN(len) +
 	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
+	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
+		return -ENOMEM;
+
 	buf = virtnet_rq_alloc(rq, len, gfp);
 	if (unlikely(!buf))
 		return -ENOMEM;
@@ -2525,6 +2525,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	 */
 	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
 
+	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
+		return -ENOMEM;
+
+	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
+		len -= sizeof(struct virtnet_rq_dma);
+
 	buf = virtnet_rq_alloc(rq, len + room, gfp);
 	if (unlikely(!buf))
 		return -ENOMEM;
When the frag has just been refilled with a fresh page, it may lead to
a regression on a VM. Especially if the sysctl
net.core.high_order_alloc_disable value is 1, the frag always gets a
single page on refill.

This can result in reliable crashes or scp failures (scp of a file 100M
in size to the VM):

The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
of a new frag. When the frag size is larger than PAGE_SIZE, everything
is fine. However, if the frag is only one page and the total size of
the buffer and virtnet_rq_dma is larger than one page, an overflow may
occur.

Here, when the frag size is not enough, we reduce the buffer len to fix
this problem.

Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)