| Message ID | 20240820071913.68004-1-xuanzhuo@linux.alibaba.com (mailing list archive) |
|---|---|
| State | Changes Requested |
| Delegated to | Netdev Maintainers |
| Series | [net] virtio-net: fix overflow inside virtnet_rq_alloc |
From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Tue, 20 Aug 2024 15:19:13 +0800

> leads to regression on VM with the sysctl value of:

Where's the beginning of the sentence? You mean, "This overflow leads"?

> - net.core.high_order_alloc_disable=1

This `- ` can be removed - at least some syntax highlighters color it in
red :D

But I think you can just drop this reference to
high_order_alloc_disable. Just write that if the backing page is
order-0, then this happens.

> which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
>
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur. In this case, if an overflow is possible, I adjust
> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> the first buffer of the frag is affected.
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c6af18948092..e5286a6da863 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>  	void *buf, *head;
>  	dma_addr_t addr;
>
> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> -		return NULL;
> -
>  	head = page_address(alloc_frag->page);
>
>  	dma = head;
> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	len = SKB_DATA_ALIGN(len) +
>  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> +		return -ENOMEM;

Why did you move this call into the call sites?

> +
>  	buf = virtnet_rq_alloc(rq, len, gfp);
>  	if (unlikely(!buf))
>  		return -ENOMEM;
> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  	 */
>  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>
> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> +		return -ENOMEM;
> +
> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> +		len -= sizeof(struct virtnet_rq_dma);
> +
>  	buf = virtnet_rq_alloc(rq, len + room, gfp);

`len + room` is referenced 3 times here, perhaps it is worth a variable?
Is it fine that you decrease @len here?

>  	if (unlikely(!buf))
>  		return -ENOMEM;

Thanks,
Olek
On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> leads to regression on VM with the sysctl value of:
>
> - net.core.high_order_alloc_disable=1
>
> which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
>
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur. In this case, if an overflow is possible, I adjust
> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> the first buffer of the frag is affected.
>
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Darren, could you pls test and confirm?
> ---
>  drivers/net/virtio_net.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c6af18948092..e5286a6da863 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>  	void *buf, *head;
>  	dma_addr_t addr;
>
> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> -		return NULL;
> -
>  	head = page_address(alloc_frag->page);
>
>  	dma = head;
> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	len = SKB_DATA_ALIGN(len) +
>  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> +		return -ENOMEM;
> +
>  	buf = virtnet_rq_alloc(rq, len, gfp);
>  	if (unlikely(!buf))
>  		return -ENOMEM;
> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  	 */
>  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>
> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> +		return -ENOMEM;
> +
> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> +		len -= sizeof(struct virtnet_rq_dma);
> +
>  	buf = virtnet_rq_alloc(rq, len + room, gfp);
>  	if (unlikely(!buf))
>  		return -ENOMEM;
> --
> 2.32.0.3.g01195cf9f
On 8/20/2024 12:19 AM, Xuan Zhuo wrote:
> leads to regression on VM with the sysctl value of:
>
> - net.core.high_order_alloc_disable=1
>
> which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
>
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur. In this case, if an overflow is possible, I adjust
> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> the first buffer of the frag is affected.
>
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c6af18948092..e5286a6da863 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>  	void *buf, *head;
>  	dma_addr_t addr;
>
> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> -		return NULL;
> -
>  	head = page_address(alloc_frag->page);
>
>  	dma = head;
> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	len = SKB_DATA_ALIGN(len) +
>  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> +		return -ENOMEM;
> +

Do you want to document the assumption that small packet case won't end up
crossing the page frag boundary unlike the
mergeable case? Add a comment block to explain or a WARN_ON() check against potential overflow would work with me. > buf = virtnet_rq_alloc(rq, len, gfp); > if (unlikely(!buf)) > return -ENOMEM; > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > */ > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > + return -ENOMEM; > + > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > + len -= sizeof(struct virtnet_rq_dma); > + This could address my previous concern for possibly regressing every buffer size for the mergeable case, thanks. Though I still don't get why carving up a small chunk from page_frag for storing the virtnet_rq_dma metadata, this would cause perf regression on certain MTU size that happens to end up with one more base page (and an extra descriptor as well) to be allocated compared to the previous code without the extra virtnet_rq_dma content. How hard would it be to allocate a dedicated struct to store the related information without affecting the (size of) datapath pages? FWIW, out of the code review perspective, I've looked up the past conversations but didn't see comprehensive benchmark was done before removing the old code and making premap the sole default mode. Granted this would reduce the footprint of additional code and the associated maintaining cost immediately, but I would assume at least there should have been thorough performance runs upfront to guarantee no regression is seen with every possible use case, or the negative effect is comparatively negligible even though there's slight regression in some limited case. If that kind of perf measurement hadn't been done before getting accepted/merged, I think at least it should allow both modes to coexist for a while such that every user could gauge the performance effect. 
Thanks, -Siwei > buf = virtnet_rq_alloc(rq, len + room, gfp); > if (unlikely(!buf)) > return -ENOMEM;
On Tue, Aug 20, 2024 at 12:44:46PM -0700, Si-Wei Liu wrote: > > > On 8/20/2024 12:19 AM, Xuan Zhuo wrote: > > leads to regression on VM with the sysctl value of: > > > > - net.core.high_order_alloc_disable=1 > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > to VM): > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > of a new frag. When the frag size is larger than PAGE_SIZE, > > everything is fine. However, if the frag is only one page and the > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > overflow may occur. In this case, if an overflow is possible, I adjust > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > the first buffer of the frag is affected. > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > --- > > drivers/net/virtio_net.c | 12 +++++++++--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index c6af18948092..e5286a6da863 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > > void *buf, *head; > > dma_addr_t addr; > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > > - return NULL; > > - > > head = page_address(alloc_frag->page); > > dma = head; > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > > len = SKB_DATA_ALIGN(len) + > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > > 
+ return -ENOMEM; > > + > Do you want to document the assumption that small packet case won't end up > crossing the page frag boundary unlike the mergeable case? Add a comment > block to explain or a WARN_ON() check against potential overflow would work > with me. > > > buf = virtnet_rq_alloc(rq, len, gfp); > > if (unlikely(!buf)) > > return -ENOMEM; > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > > */ > > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > > + return -ENOMEM; > > + > > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > > + len -= sizeof(struct virtnet_rq_dma); > > + > This could address my previous concern for possibly regressing every buffer > size for the mergeable case, thanks. Though I still don't get why carving up > a small chunk from page_frag for storing the virtnet_rq_dma metadata, this > would cause perf regression on certain MTU size 4Kbyte MTU exactly? > that happens to end up with > one more base page (and an extra descriptor as well) to be allocated > compared to the previous code without the extra virtnet_rq_dma content. How > hard would it be to allocate a dedicated struct to store the related > information without affecting the (size of) datapath pages? > > FWIW, out of the code review perspective, I've looked up the past > conversations but didn't see comprehensive benchmark was done before > removing the old code and making premap the sole default mode. Granted this > would reduce the footprint of additional code and the associated maintaining > cost immediately, but I would assume at least there should have been > thorough performance runs upfront to guarantee no regression is seen with > every possible use case, or the negative effect is comparatively negligible > even though there's slight regression in some limited case. 
If that kind of > perf measurement hadn't been done before getting accepted/merged, I think at > least it should allow both modes to coexist for a while such that every user > could gauge the performance effect. > > Thanks, > -Siwei > > > buf = virtnet_rq_alloc(rq, len + room, gfp); > > if (unlikely(!buf)) > > return -ENOMEM;
On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > leads to regression on VM with the sysctl value of: > > - net.core.high_order_alloc_disable=1 > > which could see reliable crashes or scp failure (scp a file 100M in size > to VM): > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > of a new frag. When the frag size is larger than PAGE_SIZE, > everything is fine. However, if the frag is only one page and the > total size of the buffer and virtnet_rq_dma is larger than one page, an > overflow may occur. In this case, if an overflow is possible, I adjust > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > the first buffer of the frag is affected. > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > --- > drivers/net/virtio_net.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index c6af18948092..e5286a6da863 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > void *buf, *head; > dma_addr_t addr; > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > - return NULL; > - > head = page_address(alloc_frag->page); > > dma = head; From API POV, I don't like it that virtnet_rq_alloc relies on the caller to invoke skb_page_frag_refill. It's better to pass in the length to refill. 
> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > len = SKB_DATA_ALIGN(len) + > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > + return -ENOMEM; > + > buf = virtnet_rq_alloc(rq, len, gfp); > if (unlikely(!buf)) > return -ENOMEM; > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > */ > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > + return -ENOMEM; > + > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > + len -= sizeof(struct virtnet_rq_dma); > + > buf = virtnet_rq_alloc(rq, len + room, gfp); > if (unlikely(!buf)) > return -ENOMEM; > -- > 2.32.0.3.g01195cf9f
On 8/20/2024 1:09 PM, Michael S. Tsirkin wrote: > On Tue, Aug 20, 2024 at 12:44:46PM -0700, Si-Wei Liu wrote: >> >> On 8/20/2024 12:19 AM, Xuan Zhuo wrote: >>> leads to regression on VM with the sysctl value of: >>> >>> - net.core.high_order_alloc_disable=1 >>> >>> which could see reliable crashes or scp failure (scp a file 100M in size >>> to VM): >>> >>> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning >>> of a new frag. When the frag size is larger than PAGE_SIZE, >>> everything is fine. However, if the frag is only one page and the >>> total size of the buffer and virtnet_rq_dma is larger than one page, an >>> overflow may occur. In this case, if an overflow is possible, I adjust >>> the buffer size. If net.core.high_order_alloc_disable=1, the maximum >>> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only >>> the first buffer of the frag is affected. >>> >>> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") >>> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> >>> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com >>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> >>> --- >>> drivers/net/virtio_net.c | 12 +++++++++--- >>> 1 file changed, 9 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >>> index c6af18948092..e5286a6da863 100644 >>> --- a/drivers/net/virtio_net.c >>> +++ b/drivers/net/virtio_net.c >>> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) >>> void *buf, *head; >>> dma_addr_t addr; >>> - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) >>> - return NULL; >>> - >>> head = page_address(alloc_frag->page); >>> dma = head; >>> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, >>> len = SKB_DATA_ALIGN(len) + >>> SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); >>> + if 
(unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
>>> +		return -ENOMEM;
>>> +
>> Do you want to document the assumption that small packet case won't end up
>> crossing the page frag boundary unlike the mergeable case? Add a comment
>> block to explain or a WARN_ON() check against potential overflow would work
>> with me.
>>
>>>  	buf = virtnet_rq_alloc(rq, len, gfp);
>>>  	if (unlikely(!buf))
>>>  		return -ENOMEM;
>>> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>>>  	 */
>>>  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>>>
>>> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
>>> +		return -ENOMEM;
>>> +
>>> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
>>> +		len -= sizeof(struct virtnet_rq_dma);
>>> +
>> This could address my previous concern for possibly regressing every buffer
>> size for the mergeable case, thanks. Though I still don't get why carving up
>> a small chunk from page_frag for storing the virtnet_rq_dma metadata, this
>> would cause perf regression on certain MTU size
> 4Kbyte MTU exactly?

Close to that, excluding all headers upfront (depending on virtio
features and header layout). The size of struct virtnet_rq_dma is now 16
bytes; this could lead to a performance impact of roughly 16 / 4096 =
0.4% across all MTU sizes, more obviously seen with order-0 page
allocations.

-Siwei

>
>> that happens to end up with
>> one more base page (and an extra descriptor as well) to be allocated
>> compared to the previous code without the extra virtnet_rq_dma content. How
>> hard would it be to allocate a dedicated struct to store the related
>> information without affecting the (size of) datapath pages?
>>
>> FWIW, out of the code review perspective, I've looked up the past
>> conversations but didn't see comprehensive benchmark was done before
>> removing the old code and making premap the sole default mode.
Granted this >> would reduce the footprint of additional code and the associated maintaining >> cost immediately, but I would assume at least there should have been >> thorough performance runs upfront to guarantee no regression is seen with >> every possible use case, or the negative effect is comparatively negligible >> even though there's slight regression in some limited case. If that kind of >> perf measurement hadn't been done before getting accepted/merged, I think at >> least it should allow both modes to coexist for a while such that every user >> could gauge the performance effect. >> >> Thanks, >> -Siwei >> >>> buf = virtnet_rq_alloc(rq, len + room, gfp); >>> if (unlikely(!buf)) >>> return -ENOMEM;
Hi Michael,

On Tuesday, 2024-08-20 at 12:50:39 -04, Michael S. Tsirkin wrote:
> On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
>> leads to regression on VM with the sysctl value of:
>>
>> - net.core.high_order_alloc_disable=1
>>
>> which could see reliable crashes or scp failure (scp a file 100M in size
>> to VM):
>>
>> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
>> of a new frag. When the frag size is larger than PAGE_SIZE,
>> everything is fine. However, if the frag is only one page and the
>> total size of the buffer and virtnet_rq_dma is larger than one page, an
>> overflow may occur. In this case, if an overflow is possible, I adjust
>> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
>> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
>> the first buffer of the frag is affected.
>>
>> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
>> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
>> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> Darren, could you pls test and confirm?

Unfortunately with this change I seem to still get a panic as soon as I start a
download using wget:

[  144.055630] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[  144.056249] CPU: 8 PID: 37894 Comm: sleep Kdump: loaded Not tainted 6.10.0-1.el8uek.x86_64 #2
[  144.056850] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
[  144.057585] Call Trace:
[  144.057791]  <TASK>
[  144.057973]  panic+0x347/0x370
[  144.058223]  schedule_debug.isra.0+0xfb/0x100
[  144.058565]  __schedule+0x58/0x6a0
[  144.058838]  ? refill_stock+0x26/0x50
[  144.059120]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.059473]  do_task_dead+0x42/0x50
[  144.059752]  do_exit+0x31e/0x4b0
[  144.060011]  ? __audit_syscall_entry+0xee/0x150
[  144.060352]  do_group_exit+0x30/0x80
[  144.060633]  __x64_sys_exit_group+0x18/0x20
[  144.060946]  do_syscall_64+0x8c/0x1c0
[  144.061228]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.061570]  ? __audit_filter_op+0xbe/0x140
[  144.061873]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.062204]  ? audit_reset_context+0x232/0x310
[  144.062514]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.062851]  ? syscall_exit_work+0x103/0x130
[  144.063148]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.063473]  ? syscall_exit_to_user_mode+0x77/0x220
[  144.063813]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.064142]  ? do_syscall_64+0xb9/0x1c0
[  144.064411]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.064747]  ? do_syscall_64+0xb9/0x1c0
[  144.065018]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.065345]  ? do_read_fault+0x109/0x1b0
[  144.065628]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.065961]  ? do_fault+0x1aa/0x2f0
[  144.066212]  ? handle_pte_fault+0x102/0x1a0
[  144.066503]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.066836]  ? __handle_mm_fault+0x5ed/0x710
[  144.067137]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.067464]  ? __count_memcg_events+0x72/0x110
[  144.067779]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.068106]  ? count_memcg_events.constprop.0+0x26/0x50
[  144.068457]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.068788]  ? handle_mm_fault+0xae/0x320
[  144.069068]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.069395]  ? do_user_addr_fault+0x34a/0x6b0
[  144.069708]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  144.070049] RIP: 0033:0x7fc5524f9c66
[  144.070307] Code: Unable to access opcode bytes at 0x7fc5524f9c3c.
[  144.070720] RSP: 002b:00007ffee052beb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[  144.071214] RAX: ffffffffffffffda RBX: 00007fc5527bb860 RCX: 00007fc5524f9c66
[  144.071684] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[  144.072146] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
[  144.072608] R10: 00007ffee052bdef R11: 0000000000000246 R12: 00007fc5527bb860
[  144.073076] R13: 0000000000000002 R14: 00007fc5527c4528 R15: 0000000000000000
[  144.073543]  </TASK>
[  144.074780] Kernel Offset: 0x37c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Thanks,

Darren.

>> ---
>>  drivers/net/virtio_net.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index c6af18948092..e5286a6da863 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>>  	void *buf, *head;
>>  	dma_addr_t addr;
>>
>> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
>> -		return NULL;
>> -
>>  	head = page_address(alloc_frag->page);
>>
>>  	dma = head;
>> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>>  	len = SKB_DATA_ALIGN(len) +
>>  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>>
>> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
>> +		return -ENOMEM;
>> +
>>  	buf = virtnet_rq_alloc(rq, len, gfp);
>>  	if (unlikely(!buf))
>>  		return -ENOMEM;
>> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>>  	 */
>>  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>>
>> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
>> +		return -ENOMEM;
>> +
>> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
>> +		len -= sizeof(struct virtnet_rq_dma);
>> +
>>  	buf = virtnet_rq_alloc(rq, len + room, gfp);
>>  	if (unlikely(!buf))
>>  		return -ENOMEM;
>> --
>> 2.32.0.3.g01195cf9f
On Tue, Aug 20, 2024 at 3:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > leads to regression on VM with the sysctl value of: > > - net.core.high_order_alloc_disable=1 > > which could see reliable crashes or scp failure (scp a file 100M in size > to VM): > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > of a new frag. When the frag size is larger than PAGE_SIZE, > everything is fine. However, if the frag is only one page and the > total size of the buffer and virtnet_rq_dma is larger than one page, an > overflow may occur. In this case, if an overflow is possible, I adjust > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > the first buffer of the frag is affected. I wonder instead of trying to make use of headroom, would it be simpler if we allocate dedicated arrays of virtnet_rq_dma? Btw, I see it has a need_sync, I wonder if it can help for performance or not? If not, any reason to keep that? 
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > --- > drivers/net/virtio_net.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index c6af18948092..e5286a6da863 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > void *buf, *head; > dma_addr_t addr; > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > - return NULL; > - > head = page_address(alloc_frag->page); > > dma = head; > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > len = SKB_DATA_ALIGN(len) + > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > + return -ENOMEM; > + > buf = virtnet_rq_alloc(rq, len, gfp); > if (unlikely(!buf)) > return -ENOMEM; > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > */ > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > + return -ENOMEM; > + > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > + len -= sizeof(struct virtnet_rq_dma); > + > buf = virtnet_rq_alloc(rq, len + room, gfp); > if (unlikely(!buf)) > return -ENOMEM; > -- > 2.32.0.3.g01195cf9f Thanks >
On Tue, 27 Aug 2024 11:38:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Aug 20, 2024 at 3:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > leads to regression on VM with the sysctl value of:
> >
> > - net.core.high_order_alloc_disable=1
> >
> > which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> >
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur. In this case, if an overflow is possible, I adjust
> > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > the first buffer of the frag is affected.
>
> I wonder instead of trying to make use of headroom, would it be
> simpler if we allocate dedicated arrays of virtnet_rq_dma?

Sorry for the late reply. My mailbox was full, so I missed the reply to
this thread. Thanks to Si-Wei for reminding me.

If the virtnet_rq_dma is at the headroom, we can get the virtnet_rq_dma
by buf:

	struct page *page = virt_to_head_page(buf);

	head = page_address(page);

If we use a dedicated array, then we need to pass the virtnet_rq_dma
pointer to the virtio core, and the array has the same size as the rx
ring. The virtnet_rq_dma would become:

	struct virtnet_rq_dma {
		dma_addr_t addr;
		u32 ref;
		u16 len;
		u16 need_sync;
	+	void *buf;
	};

That will be simpler.

> Btw, I see it has a need_sync, I wonder if it can help for performance
> or not? If not, any reason to keep that?

I think yes, we can skip the cpu sync when we do not need it.

Thanks.
> > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > --- > > drivers/net/virtio_net.c | 12 +++++++++--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index c6af18948092..e5286a6da863 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > > void *buf, *head; > > dma_addr_t addr; > > > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > > - return NULL; > > - > > head = page_address(alloc_frag->page); > > > > dma = head; > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > > len = SKB_DATA_ALIGN(len) + > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > > + return -ENOMEM; > > + > > buf = virtnet_rq_alloc(rq, len, gfp); > > if (unlikely(!buf)) > > return -ENOMEM; > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > > */ > > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > > + return -ENOMEM; > > + > > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > > + len -= sizeof(struct virtnet_rq_dma); > > + > > buf = virtnet_rq_alloc(rq, len + room, gfp); > > if (unlikely(!buf)) > > return -ENOMEM; > > -- > > 2.32.0.3.g01195cf9f > > Thanks > > > > > >
Just in case Xuan missed the last email while his email server kept
rejecting incoming emails in the last week: the patch doesn't seem to fix
the regression.

Xuan, given this is not very hard to reproduce and we have clearly
stated how to, could you try to get the patch verified in house before
posting to upstream? Or were you unable to reproduce it locally?

Thanks,
-Siwei

On 8/21/2024 9:47 AM, Darren Kenny wrote:
> Hi Michael,
>
> On Tuesday, 2024-08-20 at 12:50:39 -04, Michael S. Tsirkin wrote:
>> On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
>>> leads to regression on VM with the sysctl value of:
>>>
>>> - net.core.high_order_alloc_disable=1
>>>
>>> [...]
>>
>> Darren, could you pls test and confirm?
> Unfortunately with this change I seem to still get a panic as soon as I start a
> download using wget:
>
> [  144.055630] Kernel panic - not syncing: corrupted stack end detected inside scheduler
> [  144.056249] CPU: 8 PID: 37894 Comm: sleep Kdump: loaded Not tainted 6.10.0-1.el8uek.x86_64 #2
> [  144.056850] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
> [  144.057585] Call Trace:
> [  144.057791]  <TASK>
> [  144.057973]  panic+0x347/0x370
> [  144.058223]  schedule_debug.isra.0+0xfb/0x100
> [  144.058565]  __schedule+0x58/0x6a0
> [  144.058838]  ? refill_stock+0x26/0x50
> [  144.059120]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.059473]  do_task_dead+0x42/0x50
> [  144.059752]  do_exit+0x31e/0x4b0
> [  144.060011]  ? __audit_syscall_entry+0xee/0x150
> [  144.060352]  do_group_exit+0x30/0x80
> [  144.060633]  __x64_sys_exit_group+0x18/0x20
> [  144.060946]  do_syscall_64+0x8c/0x1c0
> [  144.061228]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.061570]  ? __audit_filter_op+0xbe/0x140
> [  144.061873]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.062204]  ? audit_reset_context+0x232/0x310
> [  144.062514]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.062851]  ? syscall_exit_work+0x103/0x130
> [  144.063148]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.063473]  ? syscall_exit_to_user_mode+0x77/0x220
> [  144.063813]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.064142]  ? do_syscall_64+0xb9/0x1c0
> [  144.064411]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.064747]  ? do_syscall_64+0xb9/0x1c0
> [  144.065018]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.065345]  ? do_read_fault+0x109/0x1b0
> [  144.065628]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.065961]  ? do_fault+0x1aa/0x2f0
> [  144.066212]  ? handle_pte_fault+0x102/0x1a0
> [  144.066503]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.066836]  ? __handle_mm_fault+0x5ed/0x710
> [  144.067137]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.067464]  ? __count_memcg_events+0x72/0x110
> [  144.067779]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.068106]  ? count_memcg_events.constprop.0+0x26/0x50
> [  144.068457]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.068788]  ? handle_mm_fault+0xae/0x320
> [  144.069068]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.069395]  ? do_user_addr_fault+0x34a/0x6b0
> [  144.069708]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  144.070049] RIP: 0033:0x7fc5524f9c66
> [  144.070307] Code: Unable to access opcode bytes at 0x7fc5524f9c3c.
> [  144.070720] RSP: 002b:00007ffee052beb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> [  144.071214] RAX: ffffffffffffffda RBX: 00007fc5527bb860 RCX: 00007fc5524f9c66
> [  144.071684] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
> [  144.072146] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
> [  144.072608] R10: 00007ffee052bdef R11: 0000000000000246 R12: 00007fc5527bb860
> [  144.073076] R13: 0000000000000002 R14: 00007fc5527c4528 R15: 0000000000000000
> [  144.073543]  </TASK>
> [  144.074780] Kernel Offset: 0x37c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
> Thanks,
>
> Darren.
On Wed, Aug 28, 2024 at 7:21 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 27 Aug 2024 11:38:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > I wonder instead of trying to make use of headroom, would it be
> > simpler if we allocate dedicated arrays of virtnet_rq_dma?
>
> [...]
>
> If we use a dedicated array, then we need to pass the virtnet_rq_dma
> pointer to the virtio core; the array has the same size as the rx ring.
>
> [...]
>
> That will be simpler.

I'm not sure I understand here, did you mean using a dedicated array is simpler?

> > Btw, I see it has a need_sync, I wonder if it can help for performance
> > or not? If not, any reason to keep that?
>
> I think yes, we can skip the cpu sync when we do not need it.

I meant it looks to me the need_sync is not necessary in the
structure as we can call need_sync() any time if we had the dma addr.

Thanks
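The two bookkeeping schemes being compared can be sketched as follows. This is a toy contrast, not the real driver API: all names and layouts are illustrative, and the headroom lookup assumes an order-0 page (the real code uses virt_to_head_page to handle compound pages).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>   /* aligned_alloc/free, used by the usage example */

#define RING_SIZE 256u
#define PAGE_MASK (~(uintptr_t)4095)

/* Illustrative metadata record. */
struct rq_dma_meta {
	uint64_t addr;
	void *buf;        /* the extra member the dedicated-array scheme needs */
};

/* Scheme A (the patch under discussion): metadata lives in the frag
 * headroom, so it is recovered from the buffer pointer alone by masking
 * down to the page address. */
static struct rq_dma_meta *meta_from_headroom(void *buf)
{
	return (struct rq_dma_meta *)((uintptr_t)buf & PAGE_MASK);
}

/* Scheme B (Jason's suggestion): a flat array sized like the rx ring,
 * indexed by ring slot; no headroom is consumed, but the slot index (or
 * a buf back-pointer) must travel through the virtio core. */
static struct rq_dma_meta *meta_from_array(struct rq_dma_meta *arr,
					   unsigned int head)
{
	return &arr[head % RING_SIZE];
}
```

For example, placing a buffer 16 bytes into a page-aligned allocation lets scheme A recover the metadata at the page start without any side table, which is exactly the property the headroom approach trades the 16 bytes for.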
On Wed, 28 Aug 2024 12:57:56 -0700, "Si-Wei Liu" <si-wei.liu@oracle.com> wrote:
> Just in case Xuan missed the last email while his email server kept
> rejecting incoming emails in the last week: the patch doesn't seem to fix
> the regression.
>
> Xuan, given this is not very hard to reproduce and we have clearly
> stated how to, could you try to get the patch verified in house before
> posting to upstream? Or were you unable to reproduce it locally?

Of course I have tested it locally first. And the crash reported this time
is different from the previous one. I really don't know where the problem is.

I may make a new patch based on Jason's suggestion, which should solve the
problem of occupied frag space that you have been worried about.

Of course, before sending the patch, I will reproduce and test it.

Thanks.

> [...]
On Thu, 29 Aug 2024 12:51:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Aug 28, 2024 at 7:21 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > [...]
> >
> > That will be simpler.
>
> I'm not sure I understand here, did you mean using a dedicated array is simpler?

YES. I will post a patch soon.

> > > Btw, I see it has a need_sync, I wonder if it can help for performance
> > > or not? If not, any reason to keep that?
> >
> > I think yes, we can skip the cpu sync when we do not need it.
>
> I meant it looks to me the needs_sync is not necessary in the
> structure as we can call need_sync() any time if we had dma addr.

I see. I think it is ok, but if you do not like it, I will remove it.

Thanks.
On Thu, 29 Aug 2024 12:51:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Aug 28, 2024 at 7:21 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > [...]
> >
> > That will be simpler.
>
> I'm not sure I understand here, did you mean using a dedicated array is simpler?

I found the old version (that used a dedicated array):

http://lore.kernel.org/all/20230710034237.12391-11-xuanzhuo@linux.alibaba.com

If you think that is ok, I can port a new version based on that.

Thanks.
On Thu, Aug 29, 2024 at 03:26:00PM +0800, Xuan Zhuo wrote:
> On Thu, 29 Aug 2024 12:51:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Aug 28, 2024 at 7:21 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > > > [...] In this case, if an overflow is possible, I adjust
> > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > > the first buffer of the frag is affected.

I don't exactly get it, when you say "only the first buffer of the frag
is affected" what do you mean? Affected how?

> > > > I wonder instead of trying to make use of headroom, would it be
> > > > simpler if we allocate dedicated arrays of virtnet_rq_dma?
> > >
> > > [...]
>
> I found the old version (that used a dedicated array):
>
> http://lore.kernel.org/all/20230710034237.12391-11-xuanzhuo@linux.alibaba.com
>
> If you think that is ok, I can port a new version based on that.

That one got a bunch of comments that likely still apply.
And this looks like a much bigger change than what this
patch proposes.
On Thu, 29 Aug 2024 03:35:58 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Aug 29, 2024 at 03:26:00PM +0800, Xuan Zhuo wrote:
> >
> > > > > > [...] If net.core.high_order_alloc_disable=1, the maximum
> > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > > > the first buffer of the frag is affected.
>
> I don't exactly get it, when you say "only the first buffer of the frag
> is affected" what do you mean? Affected how?

I should say that a 32k frag is safe; only a 4k frag is not.

Thanks.
sizeof(struct virtnet_rq_dma); > > > > > > + > > > > > > buf = virtnet_rq_alloc(rq, len + room, gfp); > > > > > > if (unlikely(!buf)) > > > > > > return -ENOMEM; > > > > > > -- > > > > > > 2.32.0.3.g01195cf9f > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
On Thu, Aug 29, 2024 at 03:38:07PM +0800, Xuan Zhuo wrote: > On Thu, 29 Aug 2024 03:35:58 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > On Thu, Aug 29, 2024 at 03:26:00PM +0800, Xuan Zhuo wrote: > > > On Thu, 29 Aug 2024 12:51:31 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > On Wed, Aug 28, 2024 at 7:21 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > On Tue, 27 Aug 2024 11:38:45 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > On Tue, Aug 20, 2024 at 3:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > leads to regression on VM with the sysctl value of: > > > > > > > > > > > > > > - net.core.high_order_alloc_disable=1 > > > > > > > > > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > > > > > > to VM): > > > > > > > > > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > > > > > > of a new frag. When the frag size is larger than PAGE_SIZE, > > > > > > > everything is fine. However, if the frag is only one page and the > > > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > > > > > > overflow may occur. In this case, if an overflow is possible, I adjust > > > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > > > > > > the first buffer of the frag is affected. > > > > I don't exactly get it, when you say "only the first buffer of the frag > > is affected" what do you mean? Affected how? > > > I should say that if the frag is 32k, that is safe. > Only when that frag is 4k, that is not safe. > > Thanks. It looks like nothing changes when net.core.high_order_alloc_disable=0 (which is the default) but maybe I am missing something. It is worth testing performance with mtu set to 4k, just to make sure we did not miss anything. 
> > > > > > > > > > > > > > > I wonder instead of trying to make use of headroom, would it be > > > > > > simpler if we allocate dedicated arrays of virtnet_rq_dma? > > > > > > > > > > Sorry for the late reply. My mailbox was full, so I missed the reply to this > > > > > thread. Thanks to Si-Wei for reminding me. > > > > > > > > > > If the virtnet_rq_dma is at the headroom, we can get the virtnet_rq_dma by buf. > > > > > > > > > > struct page *page = virt_to_head_page(buf); > > > > > > > > > > head = page_address(page); > > > > > > > > > > If we use a dedicated array, then we need pass the virtnet_rq_dma pointer to > > > > > virtio core, the array has the same size with the rx ring. > > > > > > > > > > The virtnet_rq_dma will be: > > > > > > > > > > struct virtnet_rq_dma { > > > > > dma_addr_t addr; > > > > > u32 ref; > > > > > u16 len; > > > > > u16 need_sync; > > > > > + void *buf; > > > > > }; > > > > > > > > > > That will be simpler. > > > > > > > > I'm not sure I understand here, did you mean using a dedicated array is simpler? > > > > > > I found the old version(that used a dedicated array): > > > > > > http://lore.kernel.org/all/20230710034237.12391-11-xuanzhuo@linux.alibaba.com > > > > > > If you think that is ok, I can port a new version based that. > > > > > > Thanks. > > > > > > > That one got a bunch of comments that likely still apply. > > And this looks like a much bigger change than what this > > patch proposes. > > > > > > > > > > > > > > > > > > > > > > > Btw, I see it has a need_sync, I wonder if it can help for performance > > > > > > or not? If not, any reason to keep that? > > > > > > > > > > I think yes, we can skip the cpu sync when we do not need it. > > > > > > > > I meant it looks to me the needs_sync is not necessary in the > > > > structure as we can call need_sync() any time if we had dma addr. > > > > > > > > Thanks > > > > > > > > > > > > > > Thanks. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > > --- > > > > > > > drivers/net/virtio_net.c | 12 +++++++++--- > > > > > > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > > > > index c6af18948092..e5286a6da863 100644 > > > > > > > --- a/drivers/net/virtio_net.c > > > > > > > +++ b/drivers/net/virtio_net.c > > > > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > > > > > > > void *buf, *head; > > > > > > > dma_addr_t addr; > > > > > > > > > > > > > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > > > > > > > - return NULL; > > > > > > > - > > > > > > > head = page_address(alloc_frag->page); > > > > > > > > > > > > > > dma = head; > > > > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > > > > > > > len = SKB_DATA_ALIGN(len) + > > > > > > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > > > > > > > > > > > > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > > > > > > > + return -ENOMEM; > > > > > > > + > > > > > > > buf = virtnet_rq_alloc(rq, len, gfp); > > > > > > > if (unlikely(!buf)) > > > > > > > return -ENOMEM; > > > > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > > > > > > > */ > > > > > > > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > > > > > > > > > > > > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > > > > > > > + return -ENOMEM; > > > > > > > + > > > > > > > + if (!alloc_frag->offset && 
len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > > > > > > > + len -= sizeof(struct virtnet_rq_dma); > > > > > > > + > > > > > > > buf = virtnet_rq_alloc(rq, len + room, gfp); > > > > > > > if (unlikely(!buf)) > > > > > > > return -ENOMEM; > > > > > > > -- > > > > > > > 2.32.0.3.g01195cf9f > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
On Wed, 21 Aug 2024 17:47:06 +0100, Darren Kenny <darren.kenny@oracle.com> wrote: > > Hi Michael, > > On Tuesday, 2024-08-20 at 12:50:39 -04, Michael S. Tsirkin wrote: > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > >> leads to regression on VM with the sysctl value of: > >> > >> - net.core.high_order_alloc_disable=1 > > > > > > > > > >> which could see reliable crashes or scp failure (scp a file 100M in size > >> to VM): > >> > >> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > >> of a new frag. When the frag size is larger than PAGE_SIZE, > >> everything is fine. However, if the frag is only one page and the > >> total size of the buffer and virtnet_rq_dma is larger than one page, an > >> overflow may occur. In this case, if an overflow is possible, I adjust > >> the buffer size. If net.core.high_order_alloc_disable=1, the maximum > >> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > >> the first buffer of the frag is affected. > >> > >> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > >> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > >> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > >> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > Darren, could you pls test and confirm? > > Unfortunately with this change I seem to still get a panic as soon as I start a > download using wget: It's strange that I can't reproduce your panic. My test method is to run, on the net branch code with net.core.high_order_alloc_disable=1: while true; do scp vmlinux root@192.168.122.100:; done An exception soon occurred (the connection disconnected), and then the kernel panicked. After using this patch to recompile virtio_net.ko, restarting, and configuring net.core.high_order_alloc_disable=1 again, I tested for a long time without problems. So, could you re-test? Thanks. 
> > [ 144.055630] Kernel panic - not syncing: corrupted stack end detected inside scheduler > [ 144.056249] CPU: 8 PID: 37894 Comm: sleep Kdump: loaded Not tainted 6.10.0-1.el8uek.x86_64 #2 > [ 144.056850] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014 > [ 144.057585] Call Trace: > [ 144.057791] <TASK> > [ 144.057973] panic+0x347/0x370 > [ 144.058223] schedule_debug.isra.0+0xfb/0x100 > [ 144.058565] __schedule+0x58/0x6a0 > [ 144.058838] ? refill_stock+0x26/0x50 > [ 144.059120] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.059473] do_task_dead+0x42/0x50 > [ 144.059752] do_exit+0x31e/0x4b0 > [ 144.060011] ? __audit_syscall_entry+0xee/0x150 > [ 144.060352] do_group_exit+0x30/0x80 > [ 144.060633] __x64_sys_exit_group+0x18/0x20 > [ 144.060946] do_syscall_64+0x8c/0x1c0 > [ 144.061228] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.061570] ? __audit_filter_op+0xbe/0x140 > [ 144.061873] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.062204] ? audit_reset_context+0x232/0x310 > [ 144.062514] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.062851] ? syscall_exit_work+0x103/0x130 > [ 144.063148] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.063473] ? syscall_exit_to_user_mode+0x77/0x220 > [ 144.063813] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.064142] ? do_syscall_64+0xb9/0x1c0 > [ 144.064411] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.064747] ? do_syscall_64+0xb9/0x1c0 > [ 144.065018] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.065345] ? do_read_fault+0x109/0x1b0 > [ 144.065628] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.065961] ? do_fault+0x1aa/0x2f0 > [ 144.066212] ? handle_pte_fault+0x102/0x1a0 > [ 144.066503] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.066836] ? __handle_mm_fault+0x5ed/0x710 > [ 144.067137] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.067464] ? __count_memcg_events+0x72/0x110 > [ 144.067779] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.068106] ? 
count_memcg_events.constprop.0+0x26/0x50 > [ 144.068457] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.068788] ? handle_mm_fault+0xae/0x320 > [ 144.069068] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 144.069395] ? do_user_addr_fault+0x34a/0x6b0 > [ 144.069708] entry_SYSCALL_64_after_hwframe+0x76/0x7e > [ 144.070049] RIP: 0033:0x7fc5524f9c66 > [ 144.070307] Code: Unable to access opcode bytes at 0x7fc5524f9c3c. > [ 144.070720] RSP: 002b:00007ffee052beb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > [ 144.071214] RAX: ffffffffffffffda RBX: 00007fc5527bb860 RCX: 00007fc5524f9c66 > [ 144.071684] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000 > [ 144.072146] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78 > [ 144.072608] R10: 00007ffee052bdef R11: 0000000000000246 R12: 00007fc5527bb860 > [ 144.073076] R13: 0000000000000002 R14: 00007fc5527c4528 R15: 0000000000000000 > [ 144.073543] </TASK> > [ 144.074780] Kernel Offset: 0x37c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > > Thanks, > > Darren. 
> > >> --- > >> drivers/net/virtio_net.c | 12 +++++++++--- > >> 1 file changed, 9 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > >> index c6af18948092..e5286a6da863 100644 > >> --- a/drivers/net/virtio_net.c > >> +++ b/drivers/net/virtio_net.c > >> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > >> void *buf, *head; > >> dma_addr_t addr; > >> > >> - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > >> - return NULL; > >> - > >> head = page_address(alloc_frag->page); > >> > >> dma = head; > >> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > >> len = SKB_DATA_ALIGN(len) + > >> SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > >> > >> + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > >> + return -ENOMEM; > >> + > >> buf = virtnet_rq_alloc(rq, len, gfp); > >> if (unlikely(!buf)) > >> return -ENOMEM; > >> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > >> */ > >> len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > >> > >> + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > >> + return -ENOMEM; > >> + > >> + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > >> + len -= sizeof(struct virtnet_rq_dma); > >> + > >> buf = virtnet_rq_alloc(rq, len + room, gfp); > >> if (unlikely(!buf)) > >> return -ENOMEM; > >> -- > >> 2.32.0.3.g01195cf9f > >
On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > leads to regression on VM with the sysctl value of: > > - net.core.high_order_alloc_disable=1 > > which could see reliable crashes or scp failure (scp a file 100M in size > to VM): > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > of a new frag. When the frag size is larger than PAGE_SIZE, > everything is fine. However, if the frag is only one page and the > total size of the buffer and virtnet_rq_dma is larger than one page, an > overflow may occur. In this case, if an overflow is possible, I adjust > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > the first buffer of the frag is affected. > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Guys where are we going with this? We have a crasher right now, if this is not fixed ASAP I'd have to revert a ton of work Xuan Zhuo just did. 
> --- > drivers/net/virtio_net.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index c6af18948092..e5286a6da863 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > void *buf, *head; > dma_addr_t addr; > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > - return NULL; > - > head = page_address(alloc_frag->page); > > dma = head; > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > len = SKB_DATA_ALIGN(len) + > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > + return -ENOMEM; > + > buf = virtnet_rq_alloc(rq, len, gfp); > if (unlikely(!buf)) > return -ENOMEM; > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > */ > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > + return -ENOMEM; > + > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > + len -= sizeof(struct virtnet_rq_dma); > + > buf = virtnet_rq_alloc(rq, len + room, gfp); > if (unlikely(!buf)) > return -ENOMEM; > -- > 2.32.0.3.g01195cf9f
On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > > leads to regression on VM with the sysctl value of: > > > > - net.core.high_order_alloc_disable=1 > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > to VM): > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > of a new frag. When the frag size is larger than PAGE_SIZE, > > everything is fine. However, if the frag is only one page and the > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > overflow may occur. In this case, if an overflow is possible, I adjust > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > the first buffer of the frag is affected. > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > Guys where are we going with this? We have a crasher right now, > if this is not fixed ASAP I'd have to revert a ton of > work Xuan Zhuo just did. I think this patch can fix it and I tested it. But Darren said this patch did not work. I need more info about the crash that Darren encountered. Thanks. 
> > > > --- > > drivers/net/virtio_net.c | 12 +++++++++--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index c6af18948092..e5286a6da863 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > > void *buf, *head; > > dma_addr_t addr; > > > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > > - return NULL; > > - > > head = page_address(alloc_frag->page); > > > > dma = head; > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > > len = SKB_DATA_ALIGN(len) + > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > > + return -ENOMEM; > > + > > buf = virtnet_rq_alloc(rq, len, gfp); > > if (unlikely(!buf)) > > return -ENOMEM; > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > > */ > > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > > + return -ENOMEM; > > + > > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > > + len -= sizeof(struct virtnet_rq_dma); > > + > > buf = virtnet_rq_alloc(rq, len + room, gfp); > > if (unlikely(!buf)) > > return -ENOMEM; > > -- > > 2.32.0.3.g01195cf9f >
On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote: > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > > > leads to regression on VM with the sysctl value of: > > > > > > - net.core.high_order_alloc_disable=1 > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > > to VM): > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > > of a new frag. When the frag size is larger than PAGE_SIZE, > > > everything is fine. However, if the frag is only one page and the > > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > > overflow may occur. In this case, if an overflow is possible, I adjust > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > > the first buffer of the frag is affected. > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > Guys where are we going with this? We have a crasher right now, > > if this is not fixed ASAP I'd have to revert a ton of > > work Xuan Zhuo just did. > > I think this patch can fix it and I tested it. > But Darren said this patch did not work. > I need more info about the crash that Darren encountered. > > Thanks. So what are we doing? Revert the whole pile for now? Seems to be a bit of a pity, but maybe that's the best we can do for this release. 
> > > > > > > --- > > > drivers/net/virtio_net.c | 12 +++++++++--- > > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > index c6af18948092..e5286a6da863 100644 > > > --- a/drivers/net/virtio_net.c > > > +++ b/drivers/net/virtio_net.c > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > > > void *buf, *head; > > > dma_addr_t addr; > > > > > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > > > - return NULL; > > > - > > > head = page_address(alloc_frag->page); > > > > > > dma = head; > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > > > len = SKB_DATA_ALIGN(len) + > > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > > > > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > > > + return -ENOMEM; > > > + > > > buf = virtnet_rq_alloc(rq, len, gfp); > > > if (unlikely(!buf)) > > > return -ENOMEM; > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > > > */ > > > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > > > > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > > > + return -ENOMEM; > > > + > > > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > > > + len -= sizeof(struct virtnet_rq_dma); > > > + > > > buf = virtnet_rq_alloc(rq, len + room, gfp); > > > if (unlikely(!buf)) > > > return -ENOMEM; > > > -- > > > 2.32.0.3.g01195cf9f > >
On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote: > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > > > > leads to regression on VM with the sysctl value of: > > > > > > > > - net.core.high_order_alloc_disable=1 > > > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > > > to VM): > > > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > > > of a new frag. When the frag size is larger than PAGE_SIZE, > > > > everything is fine. However, if the frag is only one page and the > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > > > overflow may occur. In this case, if an overflow is possible, I adjust > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > > > the first buffer of the frag is affected. > > > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > > > > Guys where are we going with this? We have a crasher right now, > > > if this is not fixed ASAP I'd have to revert a ton of > > > work Xuan Zhuo just did. > > > > I think this patch can fix it and I tested it. > > But Darren said this patch did not work. > > I need more info about the crash that Darren encountered. > > > > Thanks. > > So what are we doing? Revert the whole pile for now? > Seems to be a bit of a pity, but maybe that's the best we can do > for this release. @Jason Could you review this? 
I think this problem is clear, though I do not know why it did not work for Darren. Thanks. > > > > > > > > > > > > --- > > > > drivers/net/virtio_net.c | 12 +++++++++--- > > > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > index c6af18948092..e5286a6da863 100644 > > > > --- a/drivers/net/virtio_net.c > > > > +++ b/drivers/net/virtio_net.c > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > > > > void *buf, *head; > > > > dma_addr_t addr; > > > > > > > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > > > > - return NULL; > > > > - > > > > head = page_address(alloc_frag->page); > > > > > > > > dma = head; > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > > > > len = SKB_DATA_ALIGN(len) + > > > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > > > > > > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > > > > + return -ENOMEM; > > > > + > > > > buf = virtnet_rq_alloc(rq, len, gfp); > > > > if (unlikely(!buf)) > > > > return -ENOMEM; > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > > > > */ > > > > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > > > > > > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > > > > + return -ENOMEM; > > > > + > > > > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > > > > + len -= sizeof(struct virtnet_rq_dma); > > > > + > > > > buf = virtnet_rq_alloc(rq, len + room, gfp); > > > > if (unlikely(!buf)) > > > > return -ENOMEM; > > > > -- > > > > 2.32.0.3.g01195cf9f > > > >
On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote: > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote: > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > > > > > leads to regression on VM with the sysctl value of: > > > > > > > > > > - net.core.high_order_alloc_disable=1 > > > > > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > > > > to VM): > > > > > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > > > > of a new frag. When the frag size is larger than PAGE_SIZE, > > > > > everything is fine. However, if the frag is only one page and the > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > > > > overflow may occur. In this case, if an overflow is possible, I adjust > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > > > > the first buffer of the frag is affected. > > > > > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > > > > > > > Guys where are we going with this? We have a crasher right now, > > > > if this is not fixed ASAP I'd have to revert a ton of > > > > work Xuan Zhuo just did. > > > > > > I think this patch can fix it and I tested it. > > > But Darren said this patch did not work. > > > I need more info about the crash that Darren encountered. > > > > > > Thanks. > > > > So what are we doing? Revert the whole pile for now? 
> > Seems to be a bit of a pity, but maybe that's the best we can do > > for this release. > > @Jason Could you review this? > > I think this problem is clear, though I do not know why it did not work > for Darren. > > Thanks. > No regressions is a hard rule. If we can't figure out the regression now, we should revert and you can try again for the next release. > > > > > > > > > > > > > > > > > --- > > > > > drivers/net/virtio_net.c | 12 +++++++++--- > > > > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > > index c6af18948092..e5286a6da863 100644 > > > > > --- a/drivers/net/virtio_net.c > > > > > +++ b/drivers/net/virtio_net.c > > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > > > > > void *buf, *head; > > > > > dma_addr_t addr; > > > > > > > > > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > > > > > - return NULL; > > > > > - > > > > > head = page_address(alloc_frag->page); > > > > > > > > > > dma = head; > > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > > > > > len = SKB_DATA_ALIGN(len) + > > > > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > > > > > > > > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > > > > > + return -ENOMEM; > > > > > + > > > > > buf = virtnet_rq_alloc(rq, len, gfp); > > > > > if (unlikely(!buf)) > > > > > return -ENOMEM; > > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > > > > > */ > > > > > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > > > > > > > > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > > > > > + return -ENOMEM; > > > > > + > > > > > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > > > > > + len -= sizeof(struct virtnet_rq_dma); > > > > > + > 
> > > > buf = virtnet_rq_alloc(rq, len + room, gfp); > > > > > if (unlikely(!buf)) > > > > > return -ENOMEM; > > > > > -- > > > > > 2.32.0.3.g01195cf9f > > > > > >
On Fri, 6 Sep 2024 05:44:27 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote: > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote: > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > > > > > > leads to regression on VM with the sysctl value of: > > > > > > > > > > > > - net.core.high_order_alloc_disable=1 > > > > > > > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > > > > > to VM): > > > > > > > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > > > > > of a new frag. When the frag size is larger than PAGE_SIZE, > > > > > > everything is fine. However, if the frag is only one page and the > > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > > > > > overflow may occur. In this case, if an overflow is possible, I adjust > > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > > > > > the first buffer of the frag is affected. > > > > > > > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > > > > > > > > > > Guys where are we going with this? We have a crasher right now, > > > > > if this is not fixed ASAP I'd have to revert a ton of > > > > > work Xuan Zhuo just did. > > > > > > > > I think this patch can fix it and I tested it. > > > > But Darren said this patch did not work. 
> > > > I need more info about the crash that Darren encountered. > > > > > > > > Thanks. > > > > > > So what are we doing? Revert the whole pile for now? > > > Seems to be a bit of a pity, but maybe that's the best we can do > > > for this release. > > > > @Jason Could you review this? > > > > I think this problem is clear, though I do not know why it did not work > > for Darren. > > > > Thanks. > > > > No regressions is a hard rule. If we can't figure out the regression > now, we should revert and you can try again for the next release. I see. I think I fixed it. Hope Darren can reply before you post the revert patches. Thanks. > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > drivers/net/virtio_net.c | 12 +++++++++--- > > > > > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > > > index c6af18948092..e5286a6da863 100644 > > > > > > --- a/drivers/net/virtio_net.c > > > > > > +++ b/drivers/net/virtio_net.c > > > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > > > > > > void *buf, *head; > > > > > > dma_addr_t addr; > > > > > > > > > > > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > > > > > > - return NULL; > > > > > > - > > > > > > head = page_address(alloc_frag->page); > > > > > > > > > > > > dma = head; > > > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > > > > > > len = SKB_DATA_ALIGN(len) + > > > > > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > > > > > > > > > > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > > > > > > + return -ENOMEM; > > > > > > + > > > > > > buf = virtnet_rq_alloc(rq, len, gfp); > > > > > > if (unlikely(!buf)) > > > > > > return -ENOMEM; > > > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > > > > > > */ > > > > > > len = 
get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > > > > > > > > > > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > > > > > > + return -ENOMEM; > > > > > > + > > > > > > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > > > > > > + len -= sizeof(struct virtnet_rq_dma); > > > > > > + > > > > > > buf = virtnet_rq_alloc(rq, len + room, gfp); > > > > > > if (unlikely(!buf)) > > > > > > return -ENOMEM; > > > > > > -- > > > > > > 2.32.0.3.g01195cf9f > > > > > > > > >
On Fri, Sep 06, 2024 at 05:46:02PM +0800, Xuan Zhuo wrote: > On Fri, 6 Sep 2024 05:44:27 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote: > > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote: > > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > > > > > > > leads to regression on VM with the sysctl value of: > > > > > > > > > > > > > > - net.core.high_order_alloc_disable=1 > > > > > > > > > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > > > > > > to VM): > > > > > > > > > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > > > > > > of a new frag. When the frag size is larger than PAGE_SIZE, > > > > > > > everything is fine. However, if the frag is only one page and the > > > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > > > > > > overflow may occur. In this case, if an overflow is possible, I adjust > > > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > > > > > > the first buffer of the frag is affected. > > > > > > > > > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > > > > > > > > > > > > > Guys where are we going with this? We have a crasher right now, > > > > > > if this is not fixed ASAP I'd have to revert a ton of > > > > > > work Xuan Zhuo just did. 
> > > > > > > > > > I think this patch can fix it and I tested it. > > > > > But Darren said this patch did not work. > > > > > I need more info about the crash that Darren encountered. > > > > > > > > > > Thanks. > > > > > > > > So what are we doing? Revert the whole pile for now? > > > > Seems to be a bit of a pity, but maybe that's the best we can do > > > > for this release. > > > > > > @Jason Could you review this? > > > > > > I think this problem is clear, though I do not know why it did not work > > > for Darren. > > > > > > Thanks. > > > > > > > No regressions is a hard rule. If we can't figure out the regression > > now, we should revert and you can try again for the next release. > > I see. I think I fixed it. > > Hope Darren can reply before you post the revert patches. > > Thanks. > It's very rushed anyway. I posted the reverts, but as RFC for now. You should post a debugging patch for Darren to help you figure out what is going on. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > drivers/net/virtio_net.c | 12 +++++++++--- > > > > > > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > > > > index c6af18948092..e5286a6da863 100644 > > > > > > > --- a/drivers/net/virtio_net.c > > > > > > > +++ b/drivers/net/virtio_net.c > > > > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > > > > > > > void *buf, *head; > > > > > > > dma_addr_t addr; > > > > > > > > > > > > > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > > > > > > > - return NULL; > > > > > > > - > > > > > > > head = page_address(alloc_frag->page); > > > > > > > > > > > > > > dma = head; > > > > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > > > > > > > len = SKB_DATA_ALIGN(len) + > > > > > > > SKB_DATA_ALIGN(sizeof(struct 
skb_shared_info)); > > > > > > > > > > > > > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > > > > > > > + return -ENOMEM; > > > > > > > + > > > > > > > buf = virtnet_rq_alloc(rq, len, gfp); > > > > > > > if (unlikely(!buf)) > > > > > > > return -ENOMEM; > > > > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > > > > > > > */ > > > > > > > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > > > > > > > > > > > > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > > > > > > > + return -ENOMEM; > > > > > > > + > > > > > > > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > > > > > > > + len -= sizeof(struct virtnet_rq_dma); > > > > > > > + > > > > > > > buf = virtnet_rq_alloc(rq, len + room, gfp); > > > > > > > if (unlikely(!buf)) > > > > > > > return -ENOMEM; > > > > > > > -- > > > > > > > 2.32.0.3.g01195cf9f > > > > > > > > > > > >
2024年9月6日(金) 18:55 Michael S. Tsirkin <mst@redhat.com>: > > On Fri, Sep 06, 2024 at 05:46:02PM +0800, Xuan Zhuo wrote: > > On Fri, 6 Sep 2024 05:44:27 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote: > > > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote: > > > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > > > > > > > > leads to regression on VM with the sysctl value of: > > > > > > > > > > > > > > > > - net.core.high_order_alloc_disable=1 > > > > > > > > > > > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > > > > > > > to VM): > > > > > > > > > > > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > > > > > > > of a new frag. When the frag size is larger than PAGE_SIZE, > > > > > > > > everything is fine. However, if the frag is only one page and the > > > > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > > > > > > > overflow may occur. In this case, if an overflow is possible, I adjust > > > > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > > > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > > > > > > > the first buffer of the frag is affected. > > > > > > > > > > > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > > > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > > > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > > > > > > > > > > > > > > > > Guys where are we going with this? 
We have a crasher right now, > > > > > > > if this is not fixed ASAP I'd have to revert a ton of > > > > > > > work Xuan Zhuo just did. > > > > > > > > > > > > I think this patch can fix it and I tested it. > > > > > > But Darren said this patch did not work. > > > > > > I need more info about the crash that Darren encountered. > > > > > > > > > > > > Thanks. > > > > > > > > > > So what are we doing? Revert the whole pile for now? > > > > > Seems to be a bit of a pity, but maybe that's the best we can do > > > > > for this release. > > > > > > > > @Jason Could you review this? > > > > > > > > I think this problem is clear, though I do not know why it did not work > > > > for Darren. > > > > > > > > Thanks. > > > > > > > > > > No regressions is a hard rule. If we can't figure out the regression > > > now, we should revert and you can try again for the next release. > > > > I see. I think I fixed it. > > > > Hope Darren can reply before you post the revert patches. > > > > Thanks. > > > > It's very rushed anyway. I posted the reverts, but as RFC for now. > You should post a debugging patch for Darren to help you figure > out what is going on. > > Hello, My issue [1], which bisected to the commit f9dac92ba908, was resolved after applying the patch on v6.11-rc6. [1] https://bugzilla.kernel.org/show_bug.cgi?id=219154 In my case, random crashes occur when receiving large data under heavy memory/IO load. Although the crash details differ, the memory corruption during data transfers is consistent. If Darren is unable to confirm the fix, would it be possible to consider merging this patch to close [1] instead? Thanks.
On Sat, Sep 07, 2024 at 12:16:24PM +0900, Takero Funaki wrote: > 2024年9月6日(金) 18:55 Michael S. Tsirkin <mst@redhat.com>: > > > > On Fri, Sep 06, 2024 at 05:46:02PM +0800, Xuan Zhuo wrote: > > > On Fri, 6 Sep 2024 05:44:27 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote: > > > > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote: > > > > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > > > > > > > > > leads to regression on VM with the sysctl value of: > > > > > > > > > > > > > > > > > > - net.core.high_order_alloc_disable=1 > > > > > > > > > > > > > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > > > > > > > > to VM): > > > > > > > > > > > > > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > > > > > > > > of a new frag. When the frag size is larger than PAGE_SIZE, > > > > > > > > > everything is fine. However, if the frag is only one page and the > > > > > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > > > > > > > > overflow may occur. In this case, if an overflow is possible, I adjust > > > > > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > > > > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > > > > > > > > the first buffer of the frag is affected. 
> > > > > > > > > > > > > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > > > > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > > > > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > > > > > > > > > > > > > > > > > > > Guys where are we going with this? We have a crasher right now, > > > > > > > > if this is not fixed ASAP I'd have to revert a ton of > > > > > > > > work Xuan Zhuo just did. > > > > > > > > > > > > > > I think this patch can fix it and I tested it. > > > > > > > But Darren said this patch did not work. > > > > > > > I need more info about the crash that Darren encountered. > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > So what are we doing? Revert the whole pile for now? > > > > > > Seems to be a bit of a pity, but maybe that's the best we can do > > > > > > for this release. > > > > > > > > > > @Jason Could you review this? > > > > > > > > > > I think this problem is clear, though I do not know why it did not work > > > > > for Darren. > > > > > > > > > > Thanks. > > > > > > > > > > > > > No regressions is a hard rule. If we can't figure out the regression > > > > now, we should revert and you can try again for the next release. > > > > > > I see. I think I fixed it. > > > > > > Hope Darren can reply before you post the revert patches. > > > > > > Thanks. > > > > > > > It's very rushed anyway. I posted the reverts, but as RFC for now. > > You should post a debugging patch for Darren to help you figure > > out what is going on. > > > > > > Hello, > > My issue [1], which bisected to the commit f9dac92ba908, was resolved > after applying the patch on v6.11-rc6. > [1] https://bugzilla.kernel.org/show_bug.cgi?id=219154 > > In my case, random crashes occur when receiving large data under heavy > memory/IO load. 
Although the crash details differ, the memory > corruption during data transfers is consistent. > > If Darren is unable to confirm the fix, would it be possible to > consider merging this patch to close [1] instead? > > Thanks. Could you also test https://lore.kernel.org/all/cover.1725616135.git.mst@redhat.com/ please?
2024年9月8日(日) 19:19 Michael S. Tsirkin <mst@redhat.com>: > > On Sat, Sep 07, 2024 at 12:16:24PM +0900, Takero Funaki wrote: > > 2024年9月6日(金) 18:55 Michael S. Tsirkin <mst@redhat.com>: > > > > > > On Fri, Sep 06, 2024 at 05:46:02PM +0800, Xuan Zhuo wrote: > > > > On Fri, 6 Sep 2024 05:44:27 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote: > > > > > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote: > > > > > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > > > > > > > > > > leads to regression on VM with the sysctl value of: > > > > > > > > > > > > > > > > > > > > - net.core.high_order_alloc_disable=1 > > > > > > > > > > > > > > > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > > > > > > > > > to VM): > > > > > > > > > > > > > > > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > > > > > > > > > of a new frag. When the frag size is larger than PAGE_SIZE, > > > > > > > > > > everything is fine. However, if the frag is only one page and the > > > > > > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > > > > > > > > > overflow may occur. In this case, if an overflow is possible, I adjust > > > > > > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > > > > > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > > > > > > > > > the first buffer of the frag is affected. 
> > > > > > > > > > > > > > > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > > > > > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > > > > > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > > > > > > > > > > > > > > > > > > > > > > Guys where are we going with this? We have a crasher right now, > > > > > > > > > if this is not fixed ASAP I'd have to revert a ton of > > > > > > > > > work Xuan Zhuo just did. > > > > > > > > > > > > > > > > I think this patch can fix it and I tested it. > > > > > > > > But Darren said this patch did not work. > > > > > > > > I need more info about the crash that Darren encountered. > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > So what are we doing? Revert the whole pile for now? > > > > > > > Seems to be a bit of a pity, but maybe that's the best we can do > > > > > > > for this release. > > > > > > > > > > > > @Jason Could you review this? > > > > > > > > > > > > I think this problem is clear, though I do not know why it did not work > > > > > > for Darren. > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > No regressions is a hard rule. If we can't figure out the regression > > > > > now, we should revert and you can try again for the next release. > > > > > > > > I see. I think I fixed it. > > > > > > > > Hope Darren can reply before you post the revert patches. > > > > > > > > Thanks. > > > > > > > > > > It's very rushed anyway. I posted the reverts, but as RFC for now. > > > You should post a debugging patch for Darren to help you figure > > > out what is going on. > > > > > > > > > > Hello, > > > > My issue [1], which bisected to the commit f9dac92ba908, was resolved > > after applying the patch on v6.11-rc6. 
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=219154 > > > > In my case, random crashes occur when receiving large data under heavy > > memory/IO load. Although the crash details differ, the memory > > corruption during data transfers is consistent. > > > > If Darren is unable to confirm the fix, would it be possible to > > consider merging this patch to close [1] instead? > > > > Thanks. > > > Could you also test > https://lore.kernel.org/all/cover.1725616135.git.mst@redhat.com/ > > please? > I have confirmed that both your patchset [1] and Xuan's patchset [2] on v6.11-rc6 successfully resolved my issue. [1] https://lore.kernel.org/all/cover.1725616135.git.mst@redhat.com/ [2] https://lore.kernel.org/netdev/20240906123137.108741-1-xuanzhuo@linux.alibaba.com/ It seems that simply reverting the original patchset would suffice to resolve the crash.
On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > leads to regression on VM with the sysctl value of: > > - net.core.high_order_alloc_disable=1 > > which could see reliable crashes or scp failure (scp a file 100M in size > to VM): > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > of a new frag. When the frag size is larger than PAGE_SIZE, > everything is fine. However, if the frag is only one page and the > total size of the buffer and virtnet_rq_dma is larger than one page, an > overflow may occur. In this case, if an overflow is possible, I adjust > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > the first buffer of the frag is affected. > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> BTW why isn't it needed if we revert f9dac92ba908? 
> --- > drivers/net/virtio_net.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index c6af18948092..e5286a6da863 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > void *buf, *head; > dma_addr_t addr; > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > - return NULL; > - > head = page_address(alloc_frag->page); > > dma = head; > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > len = SKB_DATA_ALIGN(len) + > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > + return -ENOMEM; > + > buf = virtnet_rq_alloc(rq, len, gfp); > if (unlikely(!buf)) > return -ENOMEM; > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > */ > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > + return -ENOMEM; > + > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > + len -= sizeof(struct virtnet_rq_dma); > + > buf = virtnet_rq_alloc(rq, len + room, gfp); > if (unlikely(!buf)) > return -ENOMEM; > -- > 2.32.0.3.g01195cf9f
On Sun, 8 Sep 2024 15:40:32 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > > leads to regression on VM with the sysctl value of: > > > > - net.core.high_order_alloc_disable=1 > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > to VM): > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > of a new frag. When the frag size is larger than PAGE_SIZE, > > everything is fine. However, if the frag is only one page and the > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > overflow may occur. In this case, if an overflow is possible, I adjust > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > the first buffer of the frag is affected. > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > BTW why isn't it needed if we revert f9dac92ba908? This patch fixes the bug in premapped mode. The revert operation just disables premapped mode. So I think this patch is enough to fix the bug, and we can enable premapped mode by default. If you worry about the premapped mode, I advise you to merge this patch and do the revert[1]. Then the bug is fixed, the premapped mode is disabled by default, and we can enable it just for AF_XDP. [1]: http://lore.kernel.org/all/20240906123137.108741-1-xuanzhuo@linux.alibaba.com Thanks. 
> > > --- > > drivers/net/virtio_net.c | 12 +++++++++--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index c6af18948092..e5286a6da863 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > > void *buf, *head; > > dma_addr_t addr; > > > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > > - return NULL; > > - > > head = page_address(alloc_frag->page); > > > > dma = head; > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > > len = SKB_DATA_ALIGN(len) + > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > > + return -ENOMEM; > > + > > buf = virtnet_rq_alloc(rq, len, gfp); > > if (unlikely(!buf)) > > return -ENOMEM; > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > > */ > > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > > + return -ENOMEM; > > + > > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > > + len -= sizeof(struct virtnet_rq_dma); > > + > > buf = virtnet_rq_alloc(rq, len + room, gfp); > > if (unlikely(!buf)) > > return -ENOMEM; > > -- > > 2.32.0.3.g01195cf9f >
On Fri, Sep 6, 2024 at 5:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote: > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > > > > > leads to regression on VM with the sysctl value of: > > > > > > > > > > - net.core.high_order_alloc_disable=1 > > > > > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > > > > to VM): > > > > > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > > > > of a new frag. When the frag size is larger than PAGE_SIZE, > > > > > everything is fine. However, if the frag is only one page and the > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > > > > overflow may occur. In this case, if an overflow is possible, I adjust > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > > > > the first buffer of the frag is affected. > > > > > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > > > > > > > Guys where are we going with this? We have a crasher right now, > > > > if this is not fixed ASAP I'd have to revert a ton of > > > > work Xuan Zhuo just did. > > > > > > I think this patch can fix it and I tested it. > > > But Darren said this patch did not work. > > > I need more info about the crash that Darren encountered. > > > > > > Thanks. > > > > So what are we doing? Revert the whole pile for now? 
> > Seems to be a bit of a pity, but maybe that's the best we can do > > for this release. > > @Jason Could you review this? I think we probably need some tweaks for this patch. For example, the changelog is not easy to understand, especially considering it starts with something like: " leads to regression on VM with the sysctl value of: - net.core.high_order_alloc_disable=1 which could see reliable crashes or scp failure (scp a file 100M in size to VM): " It needs some context, and actually the sysctl is not a must to reproduce the issue; it can also happen when memory is fragmented. Another issue is that, if we move skb_page_frag_refill() out of virtnet_rq_alloc(), the function semantics turn out to be weird: skb_page_frag_refill(len, &rq->alloc_frag, gfp); ... virtnet_rq_alloc(rq, len, gfp); I wonder, instead of subtracting the dma->len, how about simply counting the dma->len in len if we call virtnet_rq_alloc() in add_recvbuf_small()? > > I think this problem is clear, though I do not know why it did not work > > for Darren. I had a try. This issue could be reproduced easily, and this patch seems to fix the issue with a KASAN-enabled kernel. Thanks > > Thanks. 
> > > > > > > > > > > > > > > > > > > --- > > > > > drivers/net/virtio_net.c | 12 +++++++++--- > > > > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > > index c6af18948092..e5286a6da863 100644 > > > > > --- a/drivers/net/virtio_net.c > > > > > +++ b/drivers/net/virtio_net.c > > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp) > > > > > void *buf, *head; > > > > > dma_addr_t addr; > > > > > > > > > > - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > > > > > - return NULL; > > > > > - > > > > > head = page_address(alloc_frag->page); > > > > > > > > > > dma = head; > > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq, > > > > > len = SKB_DATA_ALIGN(len) + > > > > > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > > > > > > > > > + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp))) > > > > > + return -ENOMEM; > > > > > + > > > > > buf = virtnet_rq_alloc(rq, len, gfp); > > > > > if (unlikely(!buf)) > > > > > return -ENOMEM; > > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > > > > > */ > > > > > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > > > > > > > > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > > > > > + return -ENOMEM; > > > > > + > > > > > + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size) > > > > > + len -= sizeof(struct virtnet_rq_dma); > > > > > + > > > > > buf = virtnet_rq_alloc(rq, len + room, gfp); > > > > > if (unlikely(!buf)) > > > > > return -ENOMEM; > > > > > -- > > > > > 2.32.0.3.g01195cf9f > > > > > > >
On Mon, 9 Sep 2024 16:38:16 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Sep 6, 2024 at 5:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > > > leads to regression on VM with the sysctl value of:
> > > > > >
> > > > > > - net.core.high_order_alloc_disable=1
> > > > > >
> > > > > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > > > > to VM):
> > > > > >
> > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > > > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > > > > everything is fine. However, if the frag is only one page and the
> > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > > > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > > > the first buffer of the frag is affected.
> > > > > >
> > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > >
> > > > > Guys where are we going with this? We have a crasher right now,
> > > > > if this is not fixed ASAP I'd have to revert a ton of
> > > > > work Xuan Zhuo just did.
> > > >
> > > > I think this patch can fix it and I tested it.
> > > > But Darren said this patch did not work.
> > > > I need more info about the crash that Darren encountered.
> > > >
> > > > Thanks.
> > >
> > > So what are we doing? Revert the whole pile for now?
> > > Seems to be a bit of a pity, but maybe that's the best we can do
> > > for this release.
> >
> > @Jason Could you review this?
>
> I think we probably need some tweaks for this patch.
>
> For example, the changelog is not easy to understand, especially
> considering it starts with something like:
>
> "
> leads to regression on VM with the sysctl value of:
>
> - net.core.high_order_alloc_disable=1
>
> which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
> "
>
> Need some context, and actually the sysctl is not a must to reproduce
> the issue; it can also happen when memory is fragmented.

OK.

> Another issue is that, if we move the skb_page_frag_refill() out of
> virtnet_rq_alloc(), the function semantics turn out to be weird:
>
> skb_page_frag_refill(len, &rq->alloc_frag, gfp);
> ...
> virtnet_rq_alloc(rq, len, gfp);

YES.

> I wonder, instead of subtracting the dma->len, how about simply
> counting the dma->len in len if we call virtnet_rq_alloc() in
> add_recvbuf_small()?

1. For the small mode, it is safe. The problem just happens in the merge mode.
2. In the merge mode, if we count the dma->len in len, we need to know
whether the frag->offset is zero or not. We cannot do that before
skb_page_frag_refill(), because skb_page_frag_refill() may allocate a
new page, in which case the frag->offset is zero. So the check must come
after skb_page_frag_refill().

Thanks.

> > I think this problem is clear, though I do not know why it did not work
> > for Darren.
>
> I had a try. This issue could be reproduced easily, and this patch
> seems to fix the issue with a KASAN-enabled kernel.
>
> Thanks
On Mon, Sep 9, 2024 at 11:16 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Sun, 8 Sep 2024 15:40:32 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> >
> > BTW why isn't it needed if we revert f9dac92ba908?
>
> This patch fixes the bug in premapped mode.
> The revert operation just disables premapped mode.
>
> So I think this patch is enough to fix the bug, and we can enable
> premapped by default.
>
> If you worry about the premapped mode, I advise you to merge this patch
> and do the revert[1]. Then the bug is fixed, and the premapped mode is
> disabled by default; we can just enable it for af-xdp.
>
> [1]: http://lore.kernel.org/all/20240906123137.108741-1-xuanzhuo@linux.alibaba.com

Though I think this is a good balance, if we can't get more input from
Darren it seems safer to merge what he had tested.

Thanks
On Mon, 9 Sep 2024 16:47:02 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Sep 9, 2024 at 11:16 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Sun, 8 Sep 2024 15:40:32 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > >
> > > BTW why isn't it needed if we revert f9dac92ba908?
> >
> > This patch fixes the bug in premapped mode.
> > The revert operation just disables premapped mode.
> >
> > So I think this patch is enough to fix the bug, and we can enable
> > premapped by default.
> >
> > If you worry about the premapped mode, I advise you to merge this patch
> > and do the revert[1]. Then the bug is fixed, and the premapped mode is
> > disabled by default; we can just enable it for af-xdp.
> >
> > [1]: http://lore.kernel.org/all/20240906123137.108741-1-xuanzhuo@linux.alibaba.com
>
> Though I think this is a good balance, if we can't get more input from
> Darren it seems safer to merge what he had tested.

So you mean just merge this:

http://lore.kernel.org/all/20240906123137.108741-1-xuanzhuo@linux.alibaba.com

Right?

Thanks
Hi,

Apologies, I've been OOTO for the best part of 2 weeks. I'll try to get
to this as soon as I can, but I am playing catch-up right now.

Thanks,
Darren.

On Saturday, 2024-09-07 at 12:16:24 +09, Takero Funaki wrote:
> On Fri, Sep 6, 2024 at 18:55, Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> On Fri, Sep 06, 2024 at 05:46:02PM +0800, Xuan Zhuo wrote:
>> > On Fri, 6 Sep 2024 05:44:27 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> > > On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote:
>> > > > @Jason Could you review this?
>> > > >
>> > > > I think this problem is clear, though I do not know why it did not work
>> > > > for Darren.
>> > > >
>> > > > Thanks.
>> > >
>> > > No regressions is a hard rule. If we can't figure out the regression
>> > > now, we should revert and you can try again for the next release.
>> >
>> > I see. I think I fixed it.
>> >
>> > Hope Darren can reply before you post the revert patches.
>> >
>> > Thanks.
>>
>> It's very rushed anyway. I posted the reverts, but as RFC for now.
>> You should post a debugging patch for Darren to help you figure
>> out what is going on.
>
> Hello,
>
> My issue [1], which bisected to the commit f9dac92ba908, was resolved
> after applying the patch on v6.11-rc6.
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=219154
>
> In my case, random crashes occur when receiving large data under heavy
> memory/IO load. Although the crash details differ, the memory
> corruption during data transfers is consistent.
>
> If Darren is unable to confirm the fix, would it be possible to
> consider merging this patch to close [1] instead?
>
> Thanks.
On Mon, Sep 9, 2024 at 4:52 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 9 Sep 2024 16:47:02 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Sep 9, 2024 at 11:16 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > If you worry about the premapped mode, I advise you to merge this patch
> > > and do the revert[1]. Then the bug is fixed, and the premapped mode is
> > > disabled by default; we can just enable it for af-xdp.
> > >
> > > [1]: http://lore.kernel.org/all/20240906123137.108741-1-xuanzhuo@linux.alibaba.com
> >
> > Though I think this is a good balance, if we can't get more input from
> > Darren it seems safer to merge what he had tested.
>
> So you mean just merge this:
>
> http://lore.kernel.org/all/20240906123137.108741-1-xuanzhuo@linux.alibaba.com
>
> Right?

It looks to me it hasn't been tested by Darren yet.

Thanks
On Mon, Sep 9, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 9 Sep 2024 16:38:16 +0800, Jason Wang <jasowang@redhat.com> wrote:
> >
> > I wonder, instead of subtracting the dma->len, how about simply
> > counting the dma->len in len if we call virtnet_rq_alloc() in
> > add_recvbuf_small()?
>
> 1. For the small mode, it is safe. The problem just happens in the merge mode.
> 2. In the merge mode, if we count the dma->len in len, we need to know
> whether the frag->offset is zero or not. We cannot do that before
> skb_page_frag_refill(), because skb_page_frag_refill() may allocate a
> new page, in which case the frag->offset is zero. So the check must come
> after skb_page_frag_refill().

I may miss something. I mean always reserving dma->len for each frag.

But anyway, we need to tweak the function API: either explicitly pass
the frag, or use the rq->frag implicitly.

Thanks
On Tue, 10 Sep 2024 14:18:37 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Sep 9, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > On Mon, 9 Sep 2024 16:38:16 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > On Fri, Sep 6, 2024 at 5:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote: > > > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote: > > > > > > > > leads to regression on VM with the sysctl value of: > > > > > > > > > > > > > > > > - net.core.high_order_alloc_disable=1 > > > > > > > > > > > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > > > > > > > to VM): > > > > > > > > > > > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning > > > > > > > > of a new frag. When the frag size is larger than PAGE_SIZE, > > > > > > > > everything is fine. However, if the frag is only one page and the > > > > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an > > > > > > > > overflow may occur. In this case, if an overflow is possible, I adjust > > > > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum > > > > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only > > > > > > > > the first buffer of the frag is affected. 
> > > > > > > > > > > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api") > > > > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com> > > > > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > > > > > > > > > > > > > > > > > > > > Guys where are we going with this? We have a crasher right now, > > > > > > > if this is not fixed ASAP I'd have to revert a ton of > > > > > > > work Xuan Zhuo just did. > > > > > > > > > > > > I think this patch can fix it and I tested it. > > > > > > But Darren said this patch did not work. > > > > > > I need more info about the crash that Darren encountered. > > > > > > > > > > > > Thanks. > > > > > > > > > > So what are we doing? Revert the whole pile for now? > > > > > Seems to be a bit of a pity, but maybe that's the best we can do > > > > > for this release. > > > > > > > > @Jason Could you review this? > > > > > > I think we probably need some tweaks for this patch. > > > > > > For example, the changelog is not easy to be understood especially > > > consider it starts something like: > > > > > > " > > > leads to regression on VM with the sysctl value of: > > > > > > - net.core.high_order_alloc_disable=1 > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size > > > to VM): > > > " > > > > > > Need some context and actually sysctl is not a must to reproduce the > > > issue, it can also happen when memory is fragmented. > > > > OK. > > > > > > > > > > Another issue is that, if we move the skb_page_frag_refill() out of > > > the virtnet_rq_alloc(). The function semantics turns out to be weird: > > > > > > skb_page_frag_refill(len, &rq->alloc_frag, gfp); > > > ... > > > virtnet_rq_alloc(rq, len, gfp); > > > > YES. 
> > > > > > > > I wonder instead of subtracting the dma->len, how about simply count > > > the dma->len in len if we call virtnet_rq_aloc() in > > > add_recvbuf_small()? > > > > 1. For the small mode, it is safe. That just happens in the merge mode. > > 2. In the merge mode, if we count the dma->len in len, we should know > > if the frag->offset is zero or not. We can not do that before > > skb_page_frag_refill(), because skb_page_frag_refill() may allocate > > new page, the frag->offset is zero. So the judgment must is after > > skb_page_frag_refill(). > > I may miss something. I mean always reserve dma->len for each frag. This problem is a little complex. We need to consider one point, if the frag reserves sizeof(dma), then the len must minus that, otherwise we can not allocate from the frag when frag is only one page and 'len' is PAGE_SIZE. Thanks. > > But anyway, we need to tweak the function API, either explicitly pass > the frag or use the rq->frag implicitly. > > Thanks > > > > > Thanks. > > > > > > > > > > > > > > > I think this problem is clear, though I do not know why it did not work > > > > for Darren. > > > > > > I had a try. This issue could be reproduced easily and this patch > > > seems to fix the issue with a KASAN enabled kernel. > > > > > > Thanks > > > > > > > > > > > Thanks. 
> > > > > > > > ---
> > > > > > > >  drivers/net/virtio_net.c | 12 +++++++++---
> > > > > > > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > index c6af18948092..e5286a6da863 100644
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > > > > > > >  	void *buf, *head;
> > > > > > > >  	dma_addr_t addr;
> > > > > > > >
> > > > > > > > -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > > > > > > -		return NULL;
> > > > > > > > -
> > > > > > > >  	head = page_address(alloc_frag->page);
> > > > > > > >
> > > > > > > >  	dma = head;
> > > > > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > >  	len = SKB_DATA_ALIGN(len) +
> > > > > > > >  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > > > > > >
> > > > > > > > +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > > > > > > +		return -ENOMEM;
> > > > > > > > +
> > > > > > > >  	buf = virtnet_rq_alloc(rq, len, gfp);
> > > > > > > >  	if (unlikely(!buf))
> > > > > > > >  		return -ENOMEM;
> > > > > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > >  	 */
> > > > > > > >  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > > > > > >
> > > > > > > > +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > > > > +		return -ENOMEM;
> > > > > > > > +
> > > > > > > > +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > > > > > > +		len -= sizeof(struct virtnet_rq_dma);
> > > > > > > > +
> > > > > > > >  	buf = virtnet_rq_alloc(rq, len + room, gfp);
> > > > > > > >  	if (unlikely(!buf))
> > > > > > > >  		return -ENOMEM;
> > > > > > > > --
> > > > > > > > 2.32.0.3.g01195cf9f
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c6af18948092..e5286a6da863 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
 	void *buf, *head;
 	dma_addr_t addr;
 
-	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
-		return NULL;
-
 	head = page_address(alloc_frag->page);
 
 	dma = head;
@@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 	len = SKB_DATA_ALIGN(len) +
 	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
+	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
+		return -ENOMEM;
+
 	buf = virtnet_rq_alloc(rq, len, gfp);
 	if (unlikely(!buf))
 		return -ENOMEM;
@@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	 */
 	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
 
+	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
+		return -ENOMEM;
+
+	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
+		len -= sizeof(struct virtnet_rq_dma);
+
 	buf = virtnet_rq_alloc(rq, len + room, gfp);
 	if (unlikely(!buf))
 		return -ENOMEM;
leads to regression on VM with the sysctl value of:

- net.core.high_order_alloc_disable=1

which could see reliable crashes or scp failure (scp a file 100M in size
to VM):

The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
of a new frag. When the frag size is larger than PAGE_SIZE,
everything is fine. However, if the frag is only one page and the
total size of the buffer and virtnet_rq_dma is larger than one page, an
overflow may occur. In this case, if an overflow is possible, I adjust
the buffer size. If net.core.high_order_alloc_disable=1, the maximum
buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
the first buffer of the frag is affected.

Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)