
[net] virtio-net: fix overflow inside virtnet_rq_alloc

Message ID 20240820071913.68004-1-xuanzhuo@linux.alibaba.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series [net] virtio-net: fix overflow inside virtnet_rq_alloc

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 16 this patch: 16
netdev/checkpatch warning WARNING: line length of 97 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-08-20--09-00 (tests: 712)

Commit Message

Xuan Zhuo Aug. 20, 2024, 7:19 a.m. UTC
This overflow leads to a regression on VMs with the sysctl value of:

- net.core.high_order_alloc_disable=1

which could see reliable crashes or scp failures (scp of a file 100M in
size to the VM).

The issue is that struct virtnet_rq_dma takes up 16 bytes at the
beginning of a new frag. When the frag size is larger than PAGE_SIZE,
everything is fine. However, if the frag is only one page and the total
size of the buffer plus virtnet_rq_dma is larger than one page, an
overflow may occur. If an overflow is possible, this patch adjusts the
buffer size. With net.core.high_order_alloc_disable=1, the maximum
buffer size is 4096 - 16. With net.core.high_order_alloc_disable=0,
only the first buffer of the frag is affected.
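The arithmetic behind the overflow can be sketched with a small
userspace model (assumptions: 4 KB pages and a 16-byte struct
virtnet_rq_dma as on 64-bit; the metadata is only stored at the head of
a fresh frag):

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SZ   4096u
#define RQ_DMA_SZ 16u	/* assumed sizeof(struct virtnet_rq_dma) on 64-bit */

/* Would placing a buffer of @size bytes at @offset of a frag of
 * @frag_size bytes run past the frag's end, once the 16-byte DMA
 * metadata stored at the start of a fresh frag is accounted for? */
static bool rq_alloc_overflows(unsigned int offset, unsigned int size,
			       unsigned int frag_size)
{
	unsigned int meta = offset ? 0 : RQ_DMA_SZ;

	return offset + meta + size > frag_size;
}
```

With high_order_alloc_disable=1 the frag is a single page, so a full
4096-byte buffer overflows by exactly 16 bytes; capping the buffer at
4096 - 16 is the adjustment the patch makes.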

Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Alexander Lobakin Aug. 20, 2024, 3:12 p.m. UTC | #1
From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Tue, 20 Aug 2024 15:19:13 +0800

> leads to regression on VM with the sysctl value of:

Where's the beginning of the sentence? You mean, "This overflow leads"?

> 
> - net.core.high_order_alloc_disable=1

This `- ` can be removed - at least some syntax highlighters color it in
red :D

But I think you can just drop this reference to
high_order_alloc_disable. Just write that if the backing page is
order-0, then this happens.

> 
> which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
> 
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur. In this case, if an overflow is possible, I adjust
> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> the first buffer of the frag is affected.
> 
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c6af18948092..e5286a6da863 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>  	void *buf, *head;
>  	dma_addr_t addr;
>  
> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> -		return NULL;
> -
>  	head = page_address(alloc_frag->page);
>  
>  	dma = head;
> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	len = SKB_DATA_ALIGN(len) +
>  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  
> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> +		return -ENOMEM;

Why did you move this call into the call sites?

> +
>  	buf = virtnet_rq_alloc(rq, len, gfp);
>  	if (unlikely(!buf))
>  		return -ENOMEM;
> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  	 */
>  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>  
> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> +		return -ENOMEM;
> +
> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> +		len -= sizeof(struct virtnet_rq_dma);
> +
>  	buf = virtnet_rq_alloc(rq, len + room, gfp);

`len + room` is referenced 3 times here, perhaps it's worth a variable?

Is it fine that you decrease @len here?

>  	if (unlikely(!buf))
>  		return -ENOMEM;
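A userspace model of the adjustment under discussion, folding `len +
room` into the single `total` variable suggested above (constants
assumed; the real code operates on struct page_frag state):

```c
#include <assert.h>

#define RQ_DMA_SZ 16u	/* assumed sizeof(struct virtnet_rq_dma) */

/* Mirror of the patch's len adjustment for the mergeable path: when a
 * fresh frag (offset 0) cannot hold the metadata plus the full buffer,
 * shrink len so everything still fits inside the frag. */
static unsigned int adjust_mrg_len(unsigned int len, unsigned int room,
				   unsigned int frag_offset,
				   unsigned int frag_size)
{
	unsigned int total = len + room;

	if (!frag_offset && total + RQ_DMA_SZ > frag_size)
		len -= RQ_DMA_SZ;
	return len;
}
```

This also shows the answer to the question above: @len is only decreased
on a fresh single-page frag, where the full size would not fit anyway.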

Thanks,
Olek
Michael S. Tsirkin Aug. 20, 2024, 4:50 p.m. UTC | #2
On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> leads to regression on VM with the sysctl value of:
> 
> - net.core.high_order_alloc_disable=1




> which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
> 
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur. In this case, if an overflow is possible, I adjust
> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> the first buffer of the frag is affected.
> 
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>


Darren, could you pls test and confirm?

> ---
>  drivers/net/virtio_net.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c6af18948092..e5286a6da863 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>  	void *buf, *head;
>  	dma_addr_t addr;
>  
> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> -		return NULL;
> -
>  	head = page_address(alloc_frag->page);
>  
>  	dma = head;
> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	len = SKB_DATA_ALIGN(len) +
>  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  
> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> +		return -ENOMEM;
> +
>  	buf = virtnet_rq_alloc(rq, len, gfp);
>  	if (unlikely(!buf))
>  		return -ENOMEM;
> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  	 */
>  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>  
> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> +		return -ENOMEM;
> +
> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> +		len -= sizeof(struct virtnet_rq_dma);
> +
>  	buf = virtnet_rq_alloc(rq, len + room, gfp);
>  	if (unlikely(!buf))
>  		return -ENOMEM;
> -- 
> 2.32.0.3.g01195cf9f
Si-Wei Liu Aug. 20, 2024, 7:44 p.m. UTC | #3
On 8/20/2024 12:19 AM, Xuan Zhuo wrote:
> leads to regression on VM with the sysctl value of:
>
> - net.core.high_order_alloc_disable=1
>
> which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
>
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur. In this case, if an overflow is possible, I adjust
> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> the first buffer of the frag is affected.
>
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 12 +++++++++---
>   1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c6af18948092..e5286a6da863 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>   	void *buf, *head;
>   	dma_addr_t addr;
>   
> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> -		return NULL;
> -
>   	head = page_address(alloc_frag->page);
>   
>   	dma = head;
> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>   	len = SKB_DATA_ALIGN(len) +
>   	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>   
> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> +		return -ENOMEM;
> +
Do you want to document the assumption that the small packet case won't 
end up crossing the page-frag boundary, unlike the mergeable case? A 
comment block explaining it or a WARN_ON() check against potential 
overflow would work for me.
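The assumption in question can be sketched as a model invariant (assumed
values: 64-byte cache lines behind SKB_DATA_ALIGN, a roughly 1.5 KB
GOOD_PACKET_LEN, and a ~320-byte skb_shared_info on 64-bit):

```c
#include <assert.h>

/* Model of SKB_DATA_ALIGN assuming 64-byte cache lines. */
#define CACHE_ALIGN(x) (((x) + 63u) & ~63u)

/* Small-receive buffer length as computed in add_recvbuf_small():
 * aligned packet length plus aligned shared-info. */
static unsigned int small_buf_len(unsigned int pkt_len,
				  unsigned int shinfo_sz)
{
	return CACHE_ALIGN(pkt_len) + CACHE_ALIGN(shinfo_sz);
}
```

For MTU-sized packets this stays well below 4096 - 16, so the small path
cannot cross a single-page frag even with the 16-byte metadata at the
head; a WARN_ON() in the driver would encode the same invariant.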

>   	buf = virtnet_rq_alloc(rq, len, gfp);
>   	if (unlikely(!buf))
>   		return -ENOMEM;
> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>   	 */
>   	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>   
> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> +		return -ENOMEM;
> +
> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> +		len -= sizeof(struct virtnet_rq_dma);
> +
This could address my previous concern about possibly regressing every 
buffer size for the mergeable case, thanks. Though I still don't get why 
we carve a small chunk out of the page_frag to store the virtnet_rq_dma 
metadata: this can cause a perf regression at certain MTU sizes that 
happen to end up with one more base page (and an extra descriptor as 
well) allocated compared to the previous code without the extra 
virtnet_rq_dma content. How hard would it be to allocate a dedicated 
struct to store the related information without affecting the (size of) 
datapath pages?

FWIW, from a code review perspective, I've looked up the past 
conversations but didn't see that a comprehensive benchmark was done 
before removing the old code and making premap the sole default mode. 
Granted this would reduce the footprint of additional code and the 
associated maintenance cost immediately, but I would assume at least 
there should have been thorough performance runs upfront to guarantee no 
regression is seen with every possible use case, or that the negative 
effect is comparatively negligible even though there's slight regression 
in some limited case. If that kind of perf measurement hadn't been done 
before getting accepted/merged, I think it should at least allow both 
modes to coexist for a while such that every user could gauge the 
performance effect.

Thanks,
-Siwei

>   	buf = virtnet_rq_alloc(rq, len + room, gfp);
>   	if (unlikely(!buf))
>   		return -ENOMEM;
Michael S. Tsirkin Aug. 20, 2024, 8:09 p.m. UTC | #4
On Tue, Aug 20, 2024 at 12:44:46PM -0700, Si-Wei Liu wrote:
> 
> 
> On 8/20/2024 12:19 AM, Xuan Zhuo wrote:
> > leads to regression on VM with the sysctl value of:
> > 
> > - net.core.high_order_alloc_disable=1
> > 
> > which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> > 
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur. In this case, if an overflow is possible, I adjust
> > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > the first buffer of the frag is affected.
> > 
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >   drivers/net/virtio_net.c | 12 +++++++++---
> >   1 file changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index c6af18948092..e5286a6da863 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> >   	void *buf, *head;
> >   	dma_addr_t addr;
> > -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > -		return NULL;
> > -
> >   	head = page_address(alloc_frag->page);
> >   	dma = head;
> > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >   	len = SKB_DATA_ALIGN(len) +
> >   	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > +		return -ENOMEM;
> > +
> Do you want to document the assumption that small packet case won't end up
> crossing the page frag boundary unlike the mergeable case? Add a comment
> block to explain or a WARN_ON() check against potential overflow would work
> with me.
> 
> >   	buf = virtnet_rq_alloc(rq, len, gfp);
> >   	if (unlikely(!buf))
> >   		return -ENOMEM;
> > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >   	 */
> >   	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > +		return -ENOMEM;
> > +
> > +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > +		len -= sizeof(struct virtnet_rq_dma);
> > +
> This could address my previous concern for possibly regressing every buffer
> size for the mergeable case, thanks. Though I still don't get why carving up
> a small chunk from page_frag for storing the virtnet_rq_dma metadata, this
> would cause perf regression on certain MTU size

4Kbyte MTU exactly?

> that happens to end up with
> one more base page (and an extra descriptor as well) to be allocated
> compared to the previous code without the extra virtnet_rq_dma content. How
> hard would it be to allocate a dedicated struct to store the related
> information without affecting the (size of) datapath pages?
> 
> FWIW, out of the code review perspective, I've looked up the past
> conversations but didn't see comprehensive benchmark was done before
> removing the old code and making premap the sole default mode. Granted this
> would reduce the footprint of additional code and the associated maintaining
> cost immediately, but I would assume at least there should have been
> thorough performance runs upfront to guarantee no regression is seen with
> every possible use case, or the negative effect is comparatively negligible
> even though there's slight regression in some limited case. If that kind of
> perf measurement hadn't been done before getting accepted/merged, I think at
> least it should allow both modes to coexist for a while such that every user
> could gauge the performance effect.
> 
> Thanks,
> -Siwei
> 
> >   	buf = virtnet_rq_alloc(rq, len + room, gfp);
> >   	if (unlikely(!buf))
> >   		return -ENOMEM;
Michael S. Tsirkin Aug. 20, 2024, 8:11 p.m. UTC | #5
On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> leads to regression on VM with the sysctl value of:
> 
> - net.core.high_order_alloc_disable=1
> 
> which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
> 
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur. In this case, if an overflow is possible, I adjust
> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> the first buffer of the frag is affected.
> 
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c6af18948092..e5286a6da863 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>  	void *buf, *head;
>  	dma_addr_t addr;
>  
> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> -		return NULL;
> -
>  	head = page_address(alloc_frag->page);
>  
>  	dma = head;

From API POV, I don't like it that virtnet_rq_alloc relies on
the caller to invoke skb_page_frag_refill.
It's better to pass in the length to refill.

> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	len = SKB_DATA_ALIGN(len) +
>  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  
> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> +		return -ENOMEM;
> +
>  	buf = virtnet_rq_alloc(rq, len, gfp);
>  	if (unlikely(!buf))
>  		return -ENOMEM;
> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  	 */
>  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>  
> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> +		return -ENOMEM;
> +
> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> +		len -= sizeof(struct virtnet_rq_dma);
> +
>  	buf = virtnet_rq_alloc(rq, len + room, gfp);
>  	if (unlikely(!buf))
>  		return -ENOMEM;
> -- 
> 2.32.0.3.g01195cf9f
Si-Wei Liu Aug. 20, 2024, 11:05 p.m. UTC | #6
On 8/20/2024 1:09 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 20, 2024 at 12:44:46PM -0700, Si-Wei Liu wrote:
>>
>> On 8/20/2024 12:19 AM, Xuan Zhuo wrote:
>>> leads to regression on VM with the sysctl value of:
>>>
>>> - net.core.high_order_alloc_disable=1
>>>
>>> which could see reliable crashes or scp failure (scp a file 100M in size
>>> to VM):
>>>
>>> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
>>> of a new frag. When the frag size is larger than PAGE_SIZE,
>>> everything is fine. However, if the frag is only one page and the
>>> total size of the buffer and virtnet_rq_dma is larger than one page, an
>>> overflow may occur. In this case, if an overflow is possible, I adjust
>>> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
>>> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
>>> the first buffer of the frag is affected.
>>>
>>> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
>>> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
>>> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>> ---
>>>    drivers/net/virtio_net.c | 12 +++++++++---
>>>    1 file changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index c6af18948092..e5286a6da863 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>>>    	void *buf, *head;
>>>    	dma_addr_t addr;
>>> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
>>> -		return NULL;
>>> -
>>>    	head = page_address(alloc_frag->page);
>>>    	dma = head;
>>> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>>>    	len = SKB_DATA_ALIGN(len) +
>>>    	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>>> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
>>> +		return -ENOMEM;
>>> +
>> Do you want to document the assumption that small packet case won't end up
>> crossing the page frag boundary unlike the mergeable case? Add a comment
>> block to explain or a WARN_ON() check against potential overflow would work
>> with me.
>>
>>>    	buf = virtnet_rq_alloc(rq, len, gfp);
>>>    	if (unlikely(!buf))
>>>    		return -ENOMEM;
>>> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>>>    	 */
>>>    	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>>> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
>>> +		return -ENOMEM;
>>> +
>>> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
>>> +		len -= sizeof(struct virtnet_rq_dma);
>>> +
>> This could address my previous concern for possibly regressing every buffer
>> size for the mergeable case, thanks. Though I still don't get why carving up
>> a small chunk from page_frag for storing the virtnet_rq_dma metadata, this
>> would cause perf regression on certain MTU size
> 4Kbyte MTU exactly?
Close to that, excluding all headers upfront (depending on virtio 
features and header layout). The size of struct virtnet_rq_dma is now 16 
bytes, which could lead to a performance impact of roughly 16 / 4096 = 
0.4% across all MTU sizes, seen more obviously with order-0 page 
allocations.
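The 16/4096 figure can be made concrete: the carve-out only costs an
extra base page when the buffer size sits within 16 bytes of a page
multiple (a model of a single fresh frag; across a larger frag the
metadata cost is amortized):

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SZ   4096u
#define RQ_DMA_SZ 16u	/* assumed sizeof(struct virtnet_rq_dma) */

static unsigned int pages_for(unsigned int bytes)
{
	return (bytes + PAGE_SZ - 1) / PAGE_SZ;
}

/* Does prepending the 16-byte metadata push the allocation over a page
 * boundary, costing one more base page (and one more descriptor)? */
static bool costs_extra_page(unsigned int buf_size)
{
	return pages_for(buf_size + RQ_DMA_SZ) > pages_for(buf_size);
}
```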

-Siwei

>
>> that happens to end up with
>> one more base page (and an extra descriptor as well) to be allocated
>> compared to the previous code without the extra virtnet_rq_dma content. How
>> hard would it be to allocate a dedicated struct to store the related
>> information without affecting the (size of) datapath pages?
>>
>> FWIW, out of the code review perspective, I've looked up the past
>> conversations but didn't see comprehensive benchmark was done before
>> removing the old code and making premap the sole default mode. Granted this
>> would reduce the footprint of additional code and the associated maintaining
>> cost immediately, but I would assume at least there should have been
>> thorough performance runs upfront to guarantee no regression is seen with
>> every possible use case, or the negative effect is comparatively negligible
>> even though there's slight regression in some limited case. If that kind of
>> perf measurement hadn't been done before getting accepted/merged, I think at
>> least it should allow both modes to coexist for a while such that every user
>> could gauge the performance effect.
>>
>> Thanks,
>> -Siwei
>>
>>>    	buf = virtnet_rq_alloc(rq, len + room, gfp);
>>>    	if (unlikely(!buf))
>>>    		return -ENOMEM;
Darren Kenny Aug. 21, 2024, 4:47 p.m. UTC | #7
Hi Michael,

On Tuesday, 2024-08-20 at 12:50:39 -04, Michael S. Tsirkin wrote:
> On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
>> leads to regression on VM with the sysctl value of:
>> 
>> - net.core.high_order_alloc_disable=1
>
>
>
>
>> which could see reliable crashes or scp failure (scp a file 100M in size
>> to VM):
>> 
>> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
>> of a new frag. When the frag size is larger than PAGE_SIZE,
>> everything is fine. However, if the frag is only one page and the
>> total size of the buffer and virtnet_rq_dma is larger than one page, an
>> overflow may occur. In this case, if an overflow is possible, I adjust
>> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
>> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
>> the first buffer of the frag is affected.
>> 
>> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
>> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
>> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
>
> Darren, could you pls test and confirm?

Unfortunately with this change I seem to still get a panic as soon as I start a
download using wget:

[  144.055630] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[  144.056249] CPU: 8 PID: 37894 Comm: sleep Kdump: loaded Not tainted 6.10.0-1.el8uek.x86_64 #2
[  144.056850] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
[  144.057585] Call Trace:
[  144.057791]  <TASK>
[  144.057973]  panic+0x347/0x370
[  144.058223]  schedule_debug.isra.0+0xfb/0x100
[  144.058565]  __schedule+0x58/0x6a0
[  144.058838]  ? refill_stock+0x26/0x50
[  144.059120]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.059473]  do_task_dead+0x42/0x50
[  144.059752]  do_exit+0x31e/0x4b0
[  144.060011]  ? __audit_syscall_entry+0xee/0x150
[  144.060352]  do_group_exit+0x30/0x80
[  144.060633]  __x64_sys_exit_group+0x18/0x20
[  144.060946]  do_syscall_64+0x8c/0x1c0
[  144.061228]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.061570]  ? __audit_filter_op+0xbe/0x140
[  144.061873]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.062204]  ? audit_reset_context+0x232/0x310
[  144.062514]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.062851]  ? syscall_exit_work+0x103/0x130
[  144.063148]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.063473]  ? syscall_exit_to_user_mode+0x77/0x220
[  144.063813]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.064142]  ? do_syscall_64+0xb9/0x1c0
[  144.064411]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.064747]  ? do_syscall_64+0xb9/0x1c0
[  144.065018]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.065345]  ? do_read_fault+0x109/0x1b0
[  144.065628]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.065961]  ? do_fault+0x1aa/0x2f0
[  144.066212]  ? handle_pte_fault+0x102/0x1a0
[  144.066503]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.066836]  ? __handle_mm_fault+0x5ed/0x710
[  144.067137]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.067464]  ? __count_memcg_events+0x72/0x110
[  144.067779]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.068106]  ? count_memcg_events.constprop.0+0x26/0x50
[  144.068457]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.068788]  ? handle_mm_fault+0xae/0x320
[  144.069068]  ? srso_alias_return_thunk+0x5/0xfbef5
[  144.069395]  ? do_user_addr_fault+0x34a/0x6b0
[  144.069708]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  144.070049] RIP: 0033:0x7fc5524f9c66
[  144.070307] Code: Unable to access opcode bytes at 0x7fc5524f9c3c.
[  144.070720] RSP: 002b:00007ffee052beb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[  144.071214] RAX: ffffffffffffffda RBX: 00007fc5527bb860 RCX: 00007fc5524f9c66
[  144.071684] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[  144.072146] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
[  144.072608] R10: 00007ffee052bdef R11: 0000000000000246 R12: 00007fc5527bb860
[  144.073076] R13: 0000000000000002 R14: 00007fc5527c4528 R15: 0000000000000000
[  144.073543]  </TASK>
[  144.074780] Kernel Offset: 0x37c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Thanks,

Darren.

>> ---
>>  drivers/net/virtio_net.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index c6af18948092..e5286a6da863 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>>  	void *buf, *head;
>>  	dma_addr_t addr;
>>  
>> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
>> -		return NULL;
>> -
>>  	head = page_address(alloc_frag->page);
>>  
>>  	dma = head;
>> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>>  	len = SKB_DATA_ALIGN(len) +
>>  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>>  
>> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
>> +		return -ENOMEM;
>> +
>>  	buf = virtnet_rq_alloc(rq, len, gfp);
>>  	if (unlikely(!buf))
>>  		return -ENOMEM;
>> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>>  	 */
>>  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>>  
>> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
>> +		return -ENOMEM;
>> +
>> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
>> +		len -= sizeof(struct virtnet_rq_dma);
>> +
>>  	buf = virtnet_rq_alloc(rq, len + room, gfp);
>>  	if (unlikely(!buf))
>>  		return -ENOMEM;
>> -- 
>> 2.32.0.3.g01195cf9f
Jason Wang Aug. 27, 2024, 3:38 a.m. UTC | #8
On Tue, Aug 20, 2024 at 3:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> leads to regression on VM with the sysctl value of:
>
> - net.core.high_order_alloc_disable=1
>
> which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
>
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur. In this case, if an overflow is possible, I adjust
> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> the first buffer of the frag is affected.

I wonder instead of trying to make use of headroom, would it be
simpler if we allocate dedicated arrays of virtnet_rq_dma?

Btw, I see it has a need_sync, I wonder if it can help for performance
or not? If not, any reason to keep that?
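The alternative raised here — a dedicated array of virtnet_rq_dma
records instead of carving them out of the data pages — might be
sketched like this (all names hypothetical; the real records would hold
a dma_addr_t plus the need_sync flag):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-queue DMA record, kept out of the packet pages. */
struct rq_dma_rec {
	unsigned long long addr;	/* stands in for dma_addr_t */
	unsigned int len;
	unsigned int ref;
	_Bool need_sync;
};

/* One record per ring entry, sized with the queue and indexed like the
 * descriptor table, so lookup on completion is O(1). */
static struct rq_dma_rec *rq_dma_table_alloc(unsigned int ring_size)
{
	return calloc(ring_size, sizeof(struct rq_dma_rec));
}
```

The data pages would then carry only packet bytes, avoiding the 16-byte
headroom and its overflow corner cases, at the cost of one extra
allocation per queue.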

>
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c6af18948092..e5286a6da863 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>         void *buf, *head;
>         dma_addr_t addr;
>
> -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> -               return NULL;
> -
>         head = page_address(alloc_frag->page);
>
>         dma = head;
> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>         len = SKB_DATA_ALIGN(len) +
>               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> +               return -ENOMEM;
> +
>         buf = virtnet_rq_alloc(rq, len, gfp);
>         if (unlikely(!buf))
>                 return -ENOMEM;
> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>          */
>         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>
> +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> +               return -ENOMEM;
> +
> +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> +               len -= sizeof(struct virtnet_rq_dma);
> +
>         buf = virtnet_rq_alloc(rq, len + room, gfp);
>         if (unlikely(!buf))
>                 return -ENOMEM;
> --
> 2.32.0.3.g01195cf9f

Thanks

>
Xuan Zhuo Aug. 28, 2024, 11:11 a.m. UTC | #9
On Tue, 27 Aug 2024 11:38:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Aug 20, 2024 at 3:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > leads to regression on VM with the sysctl value of:
> >
> > - net.core.high_order_alloc_disable=1
> >
> > which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> >
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur. In this case, if an overflow is possible, I adjust
> > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > the first buffer of the frag is affected.
>
> I wonder instead of trying to make use of headroom, would it be
> simpler if we allocate dedicated arrays of virtnet_rq_dma?

Sorry for the late reply. My mailbox was full, so I missed the reply to this
thread. Thanks to Si-Wei for reminding me.

If the virtnet_rq_dma is in the headroom, we can get the virtnet_rq_dma from the buf:

	struct page *page = virt_to_head_page(buf);

	head = page_address(page);

If we use a dedicated array, then we need to pass the virtnet_rq_dma pointer to
the virtio core; the array has the same size as the rx ring.

The virtnet_rq_dma will be:

struct virtnet_rq_dma {
	dma_addr_t addr;
	u32 ref;
	u16 len;
	u16 need_sync;
+	void *buf;
};

That will be simpler.
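The dedicated-array idea above can be sketched as a userspace model. Everything here is a simplified stand-in, not the kernel's types: dma_addr_t is modeled as uint64_t, and the "ring" is a plain array sized like the rx ring, indexed by descriptor id. The point is that the buf-to-metadata lookup the headroom scheme did via virt_to_head_page() becomes a plain index lookup, at the cost of storing the extra void *buf back-pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef uint64_t dma_addr_t;	/* stand-in for the kernel type */

struct virtnet_rq_dma {
	dma_addr_t addr;
	uint32_t ref;
	uint16_t len;
	uint16_t need_sync;
	void *buf;	/* back-pointer the headroom scheme got for free */
};

#define RING_SIZE 256	/* illustrative; matches the rx ring size */

static struct virtnet_rq_dma rq_dma[RING_SIZE];

/* Metadata lookup by descriptor id instead of virt_to_head_page(buf). */
static struct virtnet_rq_dma *rq_dma_get(uint16_t desc_id)
{
	return &rq_dma[desc_id % RING_SIZE];
}
```

Since the array lives outside the frag, no buffer bytes are consumed by metadata and the single-page overflow cannot occur by construction.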

>
> Btw, I see it has a need_sync, I wonder if it can help for performance
> or not? If not, any reason to keep that?

I think yes: we can skip the CPU sync when we do not need it.

Thanks.
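The sync-skipping point above can be shown with a small model: the flag is computed once at map time and consulted on every completion, so the (possibly costly) sync path only runs when needed. The helpers below are stubs invented for illustration; the real kernel helpers (dma_need_sync(), dma_sync_single_for_cpu()) behave differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int sync_calls;	/* counts how often the sync path is taken */

/* Stub: pretend only odd addresses need syncing. */
static bool model_dma_need_sync(uint64_t addr)
{
	return addr & 1;
}

struct rq_dma {
	uint64_t addr;
	uint16_t need_sync;	/* cached once at map time */
};

static void rq_map(struct rq_dma *dma, uint64_t addr)
{
	dma->addr = addr;
	dma->need_sync = model_dma_need_sync(addr);
}

static void rq_complete(struct rq_dma *dma)
{
	if (dma->need_sync)
		sync_calls++;	/* stands in for the real sync call */
}
```

Whether caching beats recomputing the check per completion is exactly the trade-off being discussed here.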


>
> >
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index c6af18948092..e5286a6da863 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> >         void *buf, *head;
> >         dma_addr_t addr;
> >
> > -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > -               return NULL;
> > -
> >         head = page_address(alloc_frag->page);
> >
> >         dma = head;
> > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >         len = SKB_DATA_ALIGN(len) +
> >               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >
> > +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > +               return -ENOMEM;
> > +
> >         buf = virtnet_rq_alloc(rq, len, gfp);
> >         if (unlikely(!buf))
> >                 return -ENOMEM;
> > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >          */
> >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> >
> > +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > +               return -ENOMEM;
> > +
> > +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > +               len -= sizeof(struct virtnet_rq_dma);
> > +
> >         buf = virtnet_rq_alloc(rq, len + room, gfp);
> >         if (unlikely(!buf))
> >                 return -ENOMEM;
> > --
> > 2.32.0.3.g01195cf9f
>
> Thanks
>
> >
>
>
>
Si-Wei Liu Aug. 28, 2024, 7:57 p.m. UTC | #10
Just in case Xuan missed the last email while his email server kept 
rejecting incoming emails in the last week: the patch doesn't seem to 
fix the regression.

Xuan, given this is not very hard to reproduce and we have clearly 
stated how to, could you try to get the patch verified in-house before 
posting it upstream? Or were you unable to reproduce it locally?

Thanks,
-Siwei

On 8/21/2024 9:47 AM, Darren Kenny wrote:
> Hi Michael,
>
> On Tuesday, 2024-08-20 at 12:50:39 -04, Michael S. Tsirkin wrote:
>> On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
>>> leads to regression on VM with the sysctl value of:
>>>
>>> - net.core.high_order_alloc_disable=1
>>
>>
>>
>>> which could see reliable crashes or scp failure (scp a file 100M in size
>>> to VM):
>>>
>>> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
>>> of a new frag. When the frag size is larger than PAGE_SIZE,
>>> everything is fine. However, if the frag is only one page and the
>>> total size of the buffer and virtnet_rq_dma is larger than one page, an
>>> overflow may occur. In this case, if an overflow is possible, I adjust
>>> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
>>> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
>>> the first buffer of the frag is affected.
>>>
>>> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
>>> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
>>> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>
>> Darren, could you pls test and confirm?
> Unfortunately with this change I seem to still get a panic as soon as I start a
> download using wget:
>
> [  144.055630] Kernel panic - not syncing: corrupted stack end detected inside scheduler
> [  144.056249] CPU: 8 PID: 37894 Comm: sleep Kdump: loaded Not tainted 6.10.0-1.el8uek.x86_64 #2
> [  144.056850] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
> [  144.057585] Call Trace:
> [  144.057791]  <TASK>
> [  144.057973]  panic+0x347/0x370
> [  144.058223]  schedule_debug.isra.0+0xfb/0x100
> [  144.058565]  __schedule+0x58/0x6a0
> [  144.058838]  ? refill_stock+0x26/0x50
> [  144.059120]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.059473]  do_task_dead+0x42/0x50
> [  144.059752]  do_exit+0x31e/0x4b0
> [  144.060011]  ? __audit_syscall_entry+0xee/0x150
> [  144.060352]  do_group_exit+0x30/0x80
> [  144.060633]  __x64_sys_exit_group+0x18/0x20
> [  144.060946]  do_syscall_64+0x8c/0x1c0
> [  144.061228]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.061570]  ? __audit_filter_op+0xbe/0x140
> [  144.061873]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.062204]  ? audit_reset_context+0x232/0x310
> [  144.062514]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.062851]  ? syscall_exit_work+0x103/0x130
> [  144.063148]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.063473]  ? syscall_exit_to_user_mode+0x77/0x220
> [  144.063813]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.064142]  ? do_syscall_64+0xb9/0x1c0
> [  144.064411]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.064747]  ? do_syscall_64+0xb9/0x1c0
> [  144.065018]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.065345]  ? do_read_fault+0x109/0x1b0
> [  144.065628]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.065961]  ? do_fault+0x1aa/0x2f0
> [  144.066212]  ? handle_pte_fault+0x102/0x1a0
> [  144.066503]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.066836]  ? __handle_mm_fault+0x5ed/0x710
> [  144.067137]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.067464]  ? __count_memcg_events+0x72/0x110
> [  144.067779]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.068106]  ? count_memcg_events.constprop.0+0x26/0x50
> [  144.068457]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.068788]  ? handle_mm_fault+0xae/0x320
> [  144.069068]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.069395]  ? do_user_addr_fault+0x34a/0x6b0
> [  144.069708]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  144.070049] RIP: 0033:0x7fc5524f9c66
> [  144.070307] Code: Unable to access opcode bytes at 0x7fc5524f9c3c.
> [  144.070720] RSP: 002b:00007ffee052beb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> [  144.071214] RAX: ffffffffffffffda RBX: 00007fc5527bb860 RCX: 00007fc5524f9c66
> [  144.071684] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
> [  144.072146] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
> [  144.072608] R10: 00007ffee052bdef R11: 0000000000000246 R12: 00007fc5527bb860
> [  144.073076] R13: 0000000000000002 R14: 00007fc5527c4528 R15: 0000000000000000
> [  144.073543]  </TASK>
> [  144.074780] Kernel Offset: 0x37c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
> Thanks,
>
> Darren.
>
>>> ---
>>>   drivers/net/virtio_net.c | 12 +++++++++---
>>>   1 file changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index c6af18948092..e5286a6da863 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>>>   	void *buf, *head;
>>>   	dma_addr_t addr;
>>>   
>>> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
>>> -		return NULL;
>>> -
>>>   	head = page_address(alloc_frag->page);
>>>   
>>>   	dma = head;
>>> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>>>   	len = SKB_DATA_ALIGN(len) +
>>>   	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>>>   
>>> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
>>> +		return -ENOMEM;
>>> +
>>>   	buf = virtnet_rq_alloc(rq, len, gfp);
>>>   	if (unlikely(!buf))
>>>   		return -ENOMEM;
>>> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>>>   	 */
>>>   	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>>>   
>>> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
>>> +		return -ENOMEM;
>>> +
>>> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
>>> +		len -= sizeof(struct virtnet_rq_dma);
>>> +
>>>   	buf = virtnet_rq_alloc(rq, len + room, gfp);
>>>   	if (unlikely(!buf))
>>>   		return -ENOMEM;
>>> -- 
>>> 2.32.0.3.g01195cf9f
Jason Wang Aug. 29, 2024, 4:51 a.m. UTC | #11
On Wed, Aug 28, 2024 at 7:21 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 27 Aug 2024 11:38:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Aug 20, 2024 at 3:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > leads to regression on VM with the sysctl value of:
> > >
> > > - net.core.high_order_alloc_disable=1
> > >
> > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > to VM):
> > >
> > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > everything is fine. However, if the frag is only one page and the
> > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > the first buffer of the frag is affected.
> >
> > I wonder instead of trying to make use of headroom, would it be
> > simpler if we allocate dedicated arrays of virtnet_rq_dma?
>
> Sorry for the late reply. My mailbox was full, so I missed the reply to this
> thread. Thanks to Si-Wei for reminding me.
>
> If the virtnet_rq_dma is at the headroom, we can get the virtnet_rq_dma by buf.
>
>         struct page *page = virt_to_head_page(buf);
>
>         head = page_address(page);
>
> If we use a dedicated array, then we need pass the virtnet_rq_dma pointer to
> virtio core, the array has the same size with the rx ring.
>
> The virtnet_rq_dma will be:
>
> struct virtnet_rq_dma {
>         dma_addr_t addr;
>         u32 ref;
>         u16 len;
>         u16 need_sync;
> +       void *buf;
> };
>
> That will be simpler.

I'm not sure I understand here: did you mean that using a dedicated array is simpler?

>
> >
> > Btw, I see it has a need_sync, I wonder if it can help for performance
> > or not? If not, any reason to keep that?
>
> I think yes, we can skip the cpu sync when we do not need it.

I meant it looks to me that the need_sync is not necessary in the
structure, as we can call need_sync() at any time if we have the dma addr.

Thanks

>
> Thanks.
>
>
> >
> > >
> > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/net/virtio_net.c | 12 +++++++++---
> > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index c6af18948092..e5286a6da863 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > >         void *buf, *head;
> > >         dma_addr_t addr;
> > >
> > > -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > -               return NULL;
> > > -
> > >         head = page_address(alloc_frag->page);
> > >
> > >         dma = head;
> > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > >         len = SKB_DATA_ALIGN(len) +
> > >               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > >
> > > +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > +               return -ENOMEM;
> > > +
> > >         buf = virtnet_rq_alloc(rq, len, gfp);
> > >         if (unlikely(!buf))
> > >                 return -ENOMEM;
> > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > >          */
> > >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > >
> > > +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > +               return -ENOMEM;
> > > +
> > > +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > +               len -= sizeof(struct virtnet_rq_dma);
> > > +
> > >         buf = virtnet_rq_alloc(rq, len + room, gfp);
> > >         if (unlikely(!buf))
> > >                 return -ENOMEM;
> > > --
> > > 2.32.0.3.g01195cf9f
> >
> > Thanks
> >
> > >
> >
> >
> >
>
Xuan Zhuo Aug. 29, 2024, 6:18 a.m. UTC | #12
On Wed, 28 Aug 2024 12:57:56 -0700, "Si-Wei Liu" <si-wei.liu@oracle.com> wrote:
> Just in case Xuan missed the last email while his email server kept
> rejecting incoming emails in the last week.: the patch doesn't seem fix
> the regression.
>
> Xuan, given this is not very hard to reproduce and we have clearly
> stated how to, could you try to get the patch verified in house before
> posting to upstream? Or you were unable to reproduce locally?

Of course, I tested it locally first. And the crash reported this time is
different from the previous one; I really don't know where the problem is. I may
make a new patch based on Jason's suggestion, which should solve the problem of
the occupied frag space that you have been worried about.

Of course, before sending the patch, I will reproduce and test it.

Thanks.


>
> Thanks,
> -Siwei
>
> On 8/21/2024 9:47 AM, Darren Kenny wrote:
> > Hi Michael,
> >
> > On Tuesday, 2024-08-20 at 12:50:39 -04, Michael S. Tsirkin wrote:
> >> On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> >>> leads to regression on VM with the sysctl value of:
> >>>
> >>> - net.core.high_order_alloc_disable=1
> >>
> >>
> >>
> >>> which could see reliable crashes or scp failure (scp a file 100M in size
> >>> to VM):
> >>>
> >>> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> >>> of a new frag. When the frag size is larger than PAGE_SIZE,
> >>> everything is fine. However, if the frag is only one page and the
> >>> total size of the buffer and virtnet_rq_dma is larger than one page, an
> >>> overflow may occur. In this case, if an overflow is possible, I adjust
> >>> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> >>> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> >>> the first buffer of the frag is affected.
> >>>
> >>> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> >>> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> >>> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> >>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >>
> >> Darren, could you pls test and confirm?
> > Unfortunately with this change I seem to still get a panic as soon as I start a
> > download using wget:
> >
> > [  144.055630] Kernel panic - not syncing: corrupted stack end detected inside scheduler
> > [  144.056249] CPU: 8 PID: 37894 Comm: sleep Kdump: loaded Not tainted 6.10.0-1.el8uek.x86_64 #2
> > [  144.056850] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
> > [  144.057585] Call Trace:
> > [  144.057791]  <TASK>
> > [  144.057973]  panic+0x347/0x370
> > [  144.058223]  schedule_debug.isra.0+0xfb/0x100
> > [  144.058565]  __schedule+0x58/0x6a0
> > [  144.058838]  ? refill_stock+0x26/0x50
> > [  144.059120]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.059473]  do_task_dead+0x42/0x50
> > [  144.059752]  do_exit+0x31e/0x4b0
> > [  144.060011]  ? __audit_syscall_entry+0xee/0x150
> > [  144.060352]  do_group_exit+0x30/0x80
> > [  144.060633]  __x64_sys_exit_group+0x18/0x20
> > [  144.060946]  do_syscall_64+0x8c/0x1c0
> > [  144.061228]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.061570]  ? __audit_filter_op+0xbe/0x140
> > [  144.061873]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.062204]  ? audit_reset_context+0x232/0x310
> > [  144.062514]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.062851]  ? syscall_exit_work+0x103/0x130
> > [  144.063148]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.063473]  ? syscall_exit_to_user_mode+0x77/0x220
> > [  144.063813]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.064142]  ? do_syscall_64+0xb9/0x1c0
> > [  144.064411]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.064747]  ? do_syscall_64+0xb9/0x1c0
> > [  144.065018]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.065345]  ? do_read_fault+0x109/0x1b0
> > [  144.065628]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.065961]  ? do_fault+0x1aa/0x2f0
> > [  144.066212]  ? handle_pte_fault+0x102/0x1a0
> > [  144.066503]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.066836]  ? __handle_mm_fault+0x5ed/0x710
> > [  144.067137]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.067464]  ? __count_memcg_events+0x72/0x110
> > [  144.067779]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.068106]  ? count_memcg_events.constprop.0+0x26/0x50
> > [  144.068457]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.068788]  ? handle_mm_fault+0xae/0x320
> > [  144.069068]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [  144.069395]  ? do_user_addr_fault+0x34a/0x6b0
> > [  144.069708]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [  144.070049] RIP: 0033:0x7fc5524f9c66
> > [  144.070307] Code: Unable to access opcode bytes at 0x7fc5524f9c3c.
> > [  144.070720] RSP: 002b:00007ffee052beb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> > [  144.071214] RAX: ffffffffffffffda RBX: 00007fc5527bb860 RCX: 00007fc5524f9c66
> > [  144.071684] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
> > [  144.072146] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
> > [  144.072608] R10: 00007ffee052bdef R11: 0000000000000246 R12: 00007fc5527bb860
> > [  144.073076] R13: 0000000000000002 R14: 00007fc5527c4528 R15: 0000000000000000
> > [  144.073543]  </TASK>
> > [  144.074780] Kernel Offset: 0x37c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >
> > Thanks,
> >
> > Darren.
> >
> >>> ---
> >>>   drivers/net/virtio_net.c | 12 +++++++++---
> >>>   1 file changed, 9 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>> index c6af18948092..e5286a6da863 100644
> >>> --- a/drivers/net/virtio_net.c
> >>> +++ b/drivers/net/virtio_net.c
> >>> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> >>>   	void *buf, *head;
> >>>   	dma_addr_t addr;
> >>>
> >>> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> >>> -		return NULL;
> >>> -
> >>>   	head = page_address(alloc_frag->page);
> >>>
> >>>   	dma = head;
> >>> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >>>   	len = SKB_DATA_ALIGN(len) +
> >>>   	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >>>
> >>> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> >>> +		return -ENOMEM;
> >>> +
> >>>   	buf = virtnet_rq_alloc(rq, len, gfp);
> >>>   	if (unlikely(!buf))
> >>>   		return -ENOMEM;
> >>> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >>>   	 */
> >>>   	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> >>>
> >>> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> >>> +		return -ENOMEM;
> >>> +
> >>> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> >>> +		len -= sizeof(struct virtnet_rq_dma);
> >>> +
> >>>   	buf = virtnet_rq_alloc(rq, len + room, gfp);
> >>>   	if (unlikely(!buf))
> >>>   		return -ENOMEM;
> >>> --
> >>> 2.32.0.3.g01195cf9f
>
Xuan Zhuo Aug. 29, 2024, 6:21 a.m. UTC | #13
On Thu, 29 Aug 2024 12:51:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Aug 28, 2024 at 7:21 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 27 Aug 2024 11:38:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Aug 20, 2024 at 3:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > leads to regression on VM with the sysctl value of:
> > > >
> > > > - net.core.high_order_alloc_disable=1
> > > >
> > > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > > to VM):
> > > >
> > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > > everything is fine. However, if the frag is only one page and the
> > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > the first buffer of the frag is affected.
> > >
> > > I wonder instead of trying to make use of headroom, would it be
> > > simpler if we allocate dedicated arrays of virtnet_rq_dma?
> >
> > Sorry for the late reply. My mailbox was full, so I missed the reply to this
> > thread. Thanks to Si-Wei for reminding me.
> >
> > If the virtnet_rq_dma is at the headroom, we can get the virtnet_rq_dma by buf.
> >
> >         struct page *page = virt_to_head_page(buf);
> >
> >         head = page_address(page);
> >
> > If we use a dedicated array, then we need pass the virtnet_rq_dma pointer to
> > virtio core, the array has the same size with the rx ring.
> >
> > The virtnet_rq_dma will be:
> >
> > struct virtnet_rq_dma {
> >         dma_addr_t addr;
> >         u32 ref;
> >         u16 len;
> >         u16 need_sync;
> > +       void *buf;
> > };
> >
> > That will be simpler.
>
> I'm not sure I understand here, did you mean using a dedicated array is simpler?

YES. I will post a patch soon.


>
> >
> > >
> > > Btw, I see it has a need_sync, I wonder if it can help for performance
> > > or not? If not, any reason to keep that?
> >
> > I think yes, we can skip the cpu sync when we do not need it.
>
> I meant it looks to me the needs_sync is not necessary in the
> structure as we can call need_sync() any time if we had dma addr.

I see. I think it is ok, but if you do not like it, I will remove it.

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > >
> > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/net/virtio_net.c | 12 +++++++++---
> > > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index c6af18948092..e5286a6da863 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > > >         void *buf, *head;
> > > >         dma_addr_t addr;
> > > >
> > > > -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > > -               return NULL;
> > > > -
> > > >         head = page_address(alloc_frag->page);
> > > >
> > > >         dma = head;
> > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > >         len = SKB_DATA_ALIGN(len) +
> > > >               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > >
> > > > +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > > +               return -ENOMEM;
> > > > +
> > > >         buf = virtnet_rq_alloc(rq, len, gfp);
> > > >         if (unlikely(!buf))
> > > >                 return -ENOMEM;
> > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > >          */
> > > >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > >
> > > > +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > +               return -ENOMEM;
> > > > +
> > > > +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > > +               len -= sizeof(struct virtnet_rq_dma);
> > > > +
> > > >         buf = virtnet_rq_alloc(rq, len + room, gfp);
> > > >         if (unlikely(!buf))
> > > >                 return -ENOMEM;
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > >
> > > Thanks
> > >
> > > >
> > >
> > >
> > >
> >
>
Xuan Zhuo Aug. 29, 2024, 7:26 a.m. UTC | #14
On Thu, 29 Aug 2024 12:51:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Aug 28, 2024 at 7:21 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 27 Aug 2024 11:38:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Aug 20, 2024 at 3:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > leads to regression on VM with the sysctl value of:
> > > >
> > > > - net.core.high_order_alloc_disable=1
> > > >
> > > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > > to VM):
> > > >
> > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > > everything is fine. However, if the frag is only one page and the
> > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > the first buffer of the frag is affected.
> > >
> > > I wonder instead of trying to make use of headroom, would it be
> > > simpler if we allocate dedicated arrays of virtnet_rq_dma?
> >
> > Sorry for the late reply. My mailbox was full, so I missed the reply to this
> > thread. Thanks to Si-Wei for reminding me.
> >
> > If the virtnet_rq_dma is at the headroom, we can get the virtnet_rq_dma by buf.
> >
> >         struct page *page = virt_to_head_page(buf);
> >
> >         head = page_address(page);
> >
> > If we use a dedicated array, then we need pass the virtnet_rq_dma pointer to
> > virtio core, the array has the same size with the rx ring.
> >
> > The virtnet_rq_dma will be:
> >
> > struct virtnet_rq_dma {
> >         dma_addr_t addr;
> >         u32 ref;
> >         u16 len;
> >         u16 need_sync;
> > +       void *buf;
> > };
> >
> > That will be simpler.
>
> I'm not sure I understand here, did you mean using a dedicated array is simpler?

I found the old version (that used a dedicated array):

http://lore.kernel.org/all/20230710034237.12391-11-xuanzhuo@linux.alibaba.com

If you think that is ok, I can port a new version based on that.

Thanks.


>
> >
> > >
> > > Btw, I see it has a need_sync, I wonder if it can help for performance
> > > or not? If not, any reason to keep that?
> >
> > I think yes, we can skip the cpu sync when we do not need it.
>
> I meant it looks to me the needs_sync is not necessary in the
> structure as we can call need_sync() any time if we had dma addr.
>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > >
> > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/net/virtio_net.c | 12 +++++++++---
> > > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index c6af18948092..e5286a6da863 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > > >         void *buf, *head;
> > > >         dma_addr_t addr;
> > > >
> > > > -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > > -               return NULL;
> > > > -
> > > >         head = page_address(alloc_frag->page);
> > > >
> > > >         dma = head;
> > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > >         len = SKB_DATA_ALIGN(len) +
> > > >               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > >
> > > > +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > > +               return -ENOMEM;
> > > > +
> > > >         buf = virtnet_rq_alloc(rq, len, gfp);
> > > >         if (unlikely(!buf))
> > > >                 return -ENOMEM;
> > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > >          */
> > > >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > >
> > > > +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > +               return -ENOMEM;
> > > > +
> > > > +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > > +               len -= sizeof(struct virtnet_rq_dma);
> > > > +
> > > >         buf = virtnet_rq_alloc(rq, len + room, gfp);
> > > >         if (unlikely(!buf))
> > > >                 return -ENOMEM;
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > >
> > > Thanks
> > >
> > > >
> > >
> > >
> > >
> >
>
Michael S. Tsirkin Aug. 29, 2024, 7:35 a.m. UTC | #15
On Thu, Aug 29, 2024 at 03:26:00PM +0800, Xuan Zhuo wrote:
> On Thu, 29 Aug 2024 12:51:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Aug 28, 2024 at 7:21 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 27 Aug 2024 11:38:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Aug 20, 2024 at 3:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > leads to regression on VM with the sysctl value of:
> > > > >
> > > > > - net.core.high_order_alloc_disable=1
> > > > >
> > > > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > > > to VM):
> > > > >
> > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > > > everything is fine. However, if the frag is only one page and the
> > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > > the first buffer of the frag is affected.

I don't exactly get it: when you say "only the first buffer of the frag
is affected", what do you mean? Affected how?

> > > >
> > > > I wonder instead of trying to make use of headroom, would it be
> > > > simpler if we allocate dedicated arrays of virtnet_rq_dma?
> > >
> > > Sorry for the late reply. My mailbox was full, so I missed the reply to this
> > > thread. Thanks to Si-Wei for reminding me.
> > >
> > > If the virtnet_rq_dma is at the headroom, we can get the virtnet_rq_dma by buf.
> > >
> > >         struct page *page = virt_to_head_page(buf);
> > >
> > >         head = page_address(page);
> > >
> > > If we use a dedicated array, then we need pass the virtnet_rq_dma pointer to
> > > virtio core, the array has the same size with the rx ring.
> > >
> > > The virtnet_rq_dma will be:
> > >
> > > struct virtnet_rq_dma {
> > >         dma_addr_t addr;
> > >         u32 ref;
> > >         u16 len;
> > >         u16 need_sync;
> > > +       void *buf;
> > > };
> > >
> > > That will be simpler.
> >
> > I'm not sure I understand here, did you mean using a dedicated array is simpler?
> 
> I found the old version(that used a dedicated array):
> 
> http://lore.kernel.org/all/20230710034237.12391-11-xuanzhuo@linux.alibaba.com
> 
> If you think that is ok, I can port a new version based that.
> 
> Thanks.
> 

That one got a bunch of comments that likely still apply.
And this looks like a much bigger change than what this
patch proposes.

> >
> > >
> > > >
> > > > Btw, I see it has a need_sync, I wonder if it can help for performance
> > > > or not? If not, any reason to keep that?
> > >
> > > I think yes, we can skip the cpu sync when we do not need it.
> >
> > I meant it looks to me the needs_sync is not necessary in the
> > structure as we can call need_sync() any time if we had dma addr.
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > >
> > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 12 +++++++++---
> > > > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index c6af18948092..e5286a6da863 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > > > >         void *buf, *head;
> > > > >         dma_addr_t addr;
> > > > >
> > > > > -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > > > -               return NULL;
> > > > > -
> > > > >         head = page_address(alloc_frag->page);
> > > > >
> > > > >         dma = head;
> > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > >         len = SKB_DATA_ALIGN(len) +
> > > > >               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > > >
> > > > > +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > > > +               return -ENOMEM;
> > > > > +
> > > > >         buf = virtnet_rq_alloc(rq, len, gfp);
> > > > >         if (unlikely(!buf))
> > > > >                 return -ENOMEM;
> > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > >          */
> > > > >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > > >
> > > > > +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > +               return -ENOMEM;
> > > > > +
> > > > > +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > > > +               len -= sizeof(struct virtnet_rq_dma);
> > > > > +
> > > > >         buf = virtnet_rq_alloc(rq, len + room, gfp);
> > > > >         if (unlikely(!buf))
> > > > >                 return -ENOMEM;
> > > > > --
> > > > > 2.32.0.3.g01195cf9f
> > > >
> > > > Thanks
> > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
Xuan Zhuo Aug. 29, 2024, 7:38 a.m. UTC | #16
On Thu, 29 Aug 2024 03:35:58 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Aug 29, 2024 at 03:26:00PM +0800, Xuan Zhuo wrote:
> > On Thu, 29 Aug 2024 12:51:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Aug 28, 2024 at 7:21 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Tue, 27 Aug 2024 11:38:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Tue, Aug 20, 2024 at 3:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > leads to regression on VM with the sysctl value of:
> > > > > >
> > > > > > - net.core.high_order_alloc_disable=1
> > > > > >
> > > > > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > > > > to VM):
> > > > > >
> > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > > > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > > > > everything is fine. However, if the frag is only one page and the
> > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > > > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > > > the first buffer of the frag is affected.
>
> I don't exactly get it, when you say "only the first buffer of the frag
> is affected" what do you mean? Affected how?


I should have said: if the frag is 32k, that is safe.
Only when the frag is 4k is it not safe.

Thanks.


>
> > > > >
> > > > > I wonder instead of trying to make use of headroom, would it be
> > > > > simpler if we allocate dedicated arrays of virtnet_rq_dma?
> > > >
> > > > Sorry for the late reply. My mailbox was full, so I missed the reply to this
> > > > thread. Thanks to Si-Wei for reminding me.
> > > >
> > > > If the virtnet_rq_dma is at the headroom, we can get the virtnet_rq_dma by buf.
> > > >
> > > >         struct page *page = virt_to_head_page(buf);
> > > >
> > > >         head = page_address(page);
> > > >
> > > > If we use a dedicated array, then we need pass the virtnet_rq_dma pointer to
> > > > virtio core, the array has the same size with the rx ring.
> > > >
> > > > The virtnet_rq_dma will be:
> > > >
> > > > struct virtnet_rq_dma {
> > > >         dma_addr_t addr;
> > > >         u32 ref;
> > > >         u16 len;
> > > >         u16 need_sync;
> > > > +       void *buf;
> > > > };
> > > >
> > > > That will be simpler.
> > >
> > > I'm not sure I understand here, did you mean using a dedicated array is simpler?
> >
> > I found the old version(that used a dedicated array):
> >
> > http://lore.kernel.org/all/20230710034237.12391-11-xuanzhuo@linux.alibaba.com
> >
> > If you think that is ok, I can port a new version based that.
> >
> > Thanks.
> >
>
> That one got a bunch of comments that likely still apply.
> And this looks like a much bigger change than what this
> patch proposes.
>
> > >
> > > >
> > > > >
> > > > > Btw, I see it has a need_sync, I wonder if it can help for performance
> > > > > or not? If not, any reason to keep that?
> > > >
> > > > I think yes, we can skip the cpu sync when we do not need it.
> > >
> > > I meant it looks to me the needs_sync is not necessary in the
> > > structure as we can call need_sync() any time if we had dma addr.
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 12 +++++++++---
> > > > > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index c6af18948092..e5286a6da863 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > > > > >         void *buf, *head;
> > > > > >         dma_addr_t addr;
> > > > > >
> > > > > > -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > > > > -               return NULL;
> > > > > > -
> > > > > >         head = page_address(alloc_frag->page);
> > > > > >
> > > > > >         dma = head;
> > > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > >               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > > > >
> > > > > > +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > > > > +               return -ENOMEM;
> > > > > > +
> > > > > >         buf = virtnet_rq_alloc(rq, len, gfp);
> > > > > >         if (unlikely(!buf))
> > > > > >                 return -ENOMEM;
> > > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > >          */
> > > > > >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > > > >
> > > > > > +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > > +               return -ENOMEM;
> > > > > > +
> > > > > > +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > > > > +               len -= sizeof(struct virtnet_rq_dma);
> > > > > > +
> > > > > >         buf = virtnet_rq_alloc(rq, len + room, gfp);
> > > > > >         if (unlikely(!buf))
> > > > > >                 return -ENOMEM;
> > > > > > --
> > > > > > 2.32.0.3.g01195cf9f
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
>
Michael S. Tsirkin Aug. 29, 2024, 7:48 a.m. UTC | #17
On Thu, Aug 29, 2024 at 03:38:07PM +0800, Xuan Zhuo wrote:
> On Thu, 29 Aug 2024 03:35:58 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, Aug 29, 2024 at 03:26:00PM +0800, Xuan Zhuo wrote:
> > > On Thu, 29 Aug 2024 12:51:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Wed, Aug 28, 2024 at 7:21 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Tue, 27 Aug 2024 11:38:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Tue, Aug 20, 2024 at 3:19 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > leads to regression on VM with the sysctl value of:
> > > > > > >
> > > > > > > - net.core.high_order_alloc_disable=1
> > > > > > >
> > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > > > > > to VM):
> > > > > > >
> > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > > > > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > > > > > everything is fine. However, if the frag is only one page and the
> > > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > > > > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > > > > the first buffer of the frag is affected.
> >
> > I don't exactly get it, when you say "only the first buffer of the frag
> > is affected" what do you mean? Affected how?
> 
> 
> I should say that if the frag is 32k, that is safe.
> Only when that frag is 4k, that is not safe.
> 
> Thanks.


It looks like nothing changes when net.core.high_order_alloc_disable=0
(which is the default) but maybe I am missing something.
It is worth testing performance with the MTU set to 4k, just to make sure
we did not miss anything.

> 
> >
> > > > > >
> > > > > > I wonder instead of trying to make use of headroom, would it be
> > > > > > simpler if we allocate dedicated arrays of virtnet_rq_dma?
> > > > >
> > > > > Sorry for the late reply. My mailbox was full, so I missed the reply to this
> > > > > thread. Thanks to Si-Wei for reminding me.
> > > > >
> > > > > If the virtnet_rq_dma is at the headroom, we can get the virtnet_rq_dma by buf.
> > > > >
> > > > >         struct page *page = virt_to_head_page(buf);
> > > > >
> > > > >         head = page_address(page);
> > > > >
> > > > > If we use a dedicated array, then we need pass the virtnet_rq_dma pointer to
> > > > > virtio core, the array has the same size with the rx ring.
> > > > >
> > > > > The virtnet_rq_dma will be:
> > > > >
> > > > > struct virtnet_rq_dma {
> > > > >         dma_addr_t addr;
> > > > >         u32 ref;
> > > > >         u16 len;
> > > > >         u16 need_sync;
> > > > > +       void *buf;
> > > > > };
> > > > >
> > > > > That will be simpler.
> > > >
> > > > I'm not sure I understand here, did you mean using a dedicated array is simpler?
> > >
> > > I found the old version(that used a dedicated array):
> > >
> > > http://lore.kernel.org/all/20230710034237.12391-11-xuanzhuo@linux.alibaba.com
> > >
> > > If you think that is ok, I can port a new version based that.
> > >
> > > Thanks.
> > >
> >
> > That one got a bunch of comments that likely still apply.
> > And this looks like a much bigger change than what this
> > patch proposes.
> >
> > > >
> > > > >
> > > > > >
> > > > > > Btw, I see it has a need_sync, I wonder if it can help for performance
> > > > > > or not? If not, any reason to keep that?
> > > > >
> > > > > I think yes, we can skip the cpu sync when we do not need it.
> > > >
> > > > I meant it looks to me the needs_sync is not necessary in the
> > > > structure as we can call need_sync() any time if we had dma addr.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > ---
> > > > > > >  drivers/net/virtio_net.c | 12 +++++++++---
> > > > > > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > index c6af18948092..e5286a6da863 100644
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > > > > > >         void *buf, *head;
> > > > > > >         dma_addr_t addr;
> > > > > > >
> > > > > > > -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > > > > > -               return NULL;
> > > > > > > -
> > > > > > >         head = page_address(alloc_frag->page);
> > > > > > >
> > > > > > >         dma = head;
> > > > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > >               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > > > > >
> > > > > > > +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > > > > > +               return -ENOMEM;
> > > > > > > +
> > > > > > >         buf = virtnet_rq_alloc(rq, len, gfp);
> > > > > > >         if (unlikely(!buf))
> > > > > > >                 return -ENOMEM;
> > > > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > >          */
> > > > > > >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > > > > >
> > > > > > > +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > > > +               return -ENOMEM;
> > > > > > > +
> > > > > > > +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > > > > > +               len -= sizeof(struct virtnet_rq_dma);
> > > > > > > +
> > > > > > >         buf = virtnet_rq_alloc(rq, len + room, gfp);
> > > > > > >         if (unlikely(!buf))
> > > > > > >                 return -ENOMEM;
> > > > > > > --
> > > > > > > 2.32.0.3.g01195cf9f
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> >
Xuan Zhuo Aug. 30, 2024, 12:54 a.m. UTC | #18
On Wed, 21 Aug 2024 17:47:06 +0100, Darren Kenny <darren.kenny@oracle.com> wrote:
>
> Hi Michael,
>
> On Tuesday, 2024-08-20 at 12:50:39 -04, Michael S. Tsirkin wrote:
> > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> >> leads to regression on VM with the sysctl value of:
> >>
> >> - net.core.high_order_alloc_disable=1
> >
> >
> >
> >
> >> which could see reliable crashes or scp failure (scp a file 100M in size
> >> to VM):
> >>
> >> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> >> of a new frag. When the frag size is larger than PAGE_SIZE,
> >> everything is fine. However, if the frag is only one page and the
> >> total size of the buffer and virtnet_rq_dma is larger than one page, an
> >> overflow may occur. In this case, if an overflow is possible, I adjust
> >> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> >> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> >> the first buffer of the frag is affected.
> >>
> >> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> >> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> >> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> >> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >
> >
> > Darren, could you pls test and confirm?
>
> Unfortunately with this change I seem to still get a panic as soon as I start a
> download using wget:

It's strange that I can't reproduce your panic.

My test method was to run the net branch code with
net.core.high_order_alloc_disable=1. An exception soon occurred (the
connection dropped), and then the kernel panicked:

	while true; do scp vmlinux root@192.168.122.100:; done

I then recompiled virtio_net.ko with this patch, rebooted, set
net.core.high_order_alloc_disable=1 again, and tested for a long time
without problems.

So, could you re-test?

Thanks.




>
> [  144.055630] Kernel panic - not syncing: corrupted stack end detected inside scheduler
> [  144.056249] CPU: 8 PID: 37894 Comm: sleep Kdump: loaded Not tainted 6.10.0-1.el8uek.x86_64 #2
> [  144.056850] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
> [  144.057585] Call Trace:
> [  144.057791]  <TASK>
> [  144.057973]  panic+0x347/0x370
> [  144.058223]  schedule_debug.isra.0+0xfb/0x100
> [  144.058565]  __schedule+0x58/0x6a0
> [  144.058838]  ? refill_stock+0x26/0x50
> [  144.059120]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.059473]  do_task_dead+0x42/0x50
> [  144.059752]  do_exit+0x31e/0x4b0
> [  144.060011]  ? __audit_syscall_entry+0xee/0x150
> [  144.060352]  do_group_exit+0x30/0x80
> [  144.060633]  __x64_sys_exit_group+0x18/0x20
> [  144.060946]  do_syscall_64+0x8c/0x1c0
> [  144.061228]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.061570]  ? __audit_filter_op+0xbe/0x140
> [  144.061873]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.062204]  ? audit_reset_context+0x232/0x310
> [  144.062514]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.062851]  ? syscall_exit_work+0x103/0x130
> [  144.063148]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.063473]  ? syscall_exit_to_user_mode+0x77/0x220
> [  144.063813]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.064142]  ? do_syscall_64+0xb9/0x1c0
> [  144.064411]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.064747]  ? do_syscall_64+0xb9/0x1c0
> [  144.065018]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.065345]  ? do_read_fault+0x109/0x1b0
> [  144.065628]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.065961]  ? do_fault+0x1aa/0x2f0
> [  144.066212]  ? handle_pte_fault+0x102/0x1a0
> [  144.066503]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.066836]  ? __handle_mm_fault+0x5ed/0x710
> [  144.067137]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.067464]  ? __count_memcg_events+0x72/0x110
> [  144.067779]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.068106]  ? count_memcg_events.constprop.0+0x26/0x50
> [  144.068457]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.068788]  ? handle_mm_fault+0xae/0x320
> [  144.069068]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  144.069395]  ? do_user_addr_fault+0x34a/0x6b0
> [  144.069708]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  144.070049] RIP: 0033:0x7fc5524f9c66
> [  144.070307] Code: Unable to access opcode bytes at 0x7fc5524f9c3c.
> [  144.070720] RSP: 002b:00007ffee052beb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> [  144.071214] RAX: ffffffffffffffda RBX: 00007fc5527bb860 RCX: 00007fc5524f9c66
> [  144.071684] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
> [  144.072146] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
> [  144.072608] R10: 00007ffee052bdef R11: 0000000000000246 R12: 00007fc5527bb860
> [  144.073076] R13: 0000000000000002 R14: 00007fc5527c4528 R15: 0000000000000000
> [  144.073543]  </TASK>
> [  144.074780] Kernel Offset: 0x37c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
> Thanks,
>
> Darren.
>
> >> ---
> >>  drivers/net/virtio_net.c | 12 +++++++++---
> >>  1 file changed, 9 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index c6af18948092..e5286a6da863 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> >>  	void *buf, *head;
> >>  	dma_addr_t addr;
> >>
> >> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> >> -		return NULL;
> >> -
> >>  	head = page_address(alloc_frag->page);
> >>
> >>  	dma = head;
> >> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >>  	len = SKB_DATA_ALIGN(len) +
> >>  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >>
> >> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> >> +		return -ENOMEM;
> >> +
> >>  	buf = virtnet_rq_alloc(rq, len, gfp);
> >>  	if (unlikely(!buf))
> >>  		return -ENOMEM;
> >> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >>  	 */
> >>  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> >>
> >> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> >> +		return -ENOMEM;
> >> +
> >> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> >> +		len -= sizeof(struct virtnet_rq_dma);
> >> +
> >>  	buf = virtnet_rq_alloc(rq, len + room, gfp);
> >>  	if (unlikely(!buf))
> >>  		return -ENOMEM;
> >> --
> >> 2.32.0.3.g01195cf9f
>
>
Michael S. Tsirkin Sept. 6, 2024, 8:43 a.m. UTC | #19
On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> leads to regression on VM with the sysctl value of:
> 
> - net.core.high_order_alloc_disable=1
> 
> which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
> 
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur. In this case, if an overflow is possible, I adjust
> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> the first buffer of the frag is affected.
> 
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>


Guys, where are we going with this? We have a crasher right now;
if this is not fixed ASAP, I'd have to revert a ton of the work
Xuan Zhuo just did.


> ---
>  drivers/net/virtio_net.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c6af18948092..e5286a6da863 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>  	void *buf, *head;
>  	dma_addr_t addr;
>  
> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> -		return NULL;
> -
>  	head = page_address(alloc_frag->page);
>  
>  	dma = head;
> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	len = SKB_DATA_ALIGN(len) +
>  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  
> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> +		return -ENOMEM;
> +
>  	buf = virtnet_rq_alloc(rq, len, gfp);
>  	if (unlikely(!buf))
>  		return -ENOMEM;
> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  	 */
>  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>  
> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> +		return -ENOMEM;
> +
> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> +		len -= sizeof(struct virtnet_rq_dma);
> +
>  	buf = virtnet_rq_alloc(rq, len + room, gfp);
>  	if (unlikely(!buf))
>  		return -ENOMEM;
> -- 
> 2.32.0.3.g01195cf9f
Xuan Zhuo Sept. 6, 2024, 8:53 a.m. UTC | #20
On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > leads to regression on VM with the sysctl value of:
> >
> > - net.core.high_order_alloc_disable=1
> >
> > which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> >
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur. In this case, if an overflow is possible, I adjust
> > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > the first buffer of the frag is affected.
> >
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
>
> Guys where are we going with this? We have a crasher right now,
> if this is not fixed ASAP I'd have to revert a ton of
> work Xuan Zhuo just did.

I think this patch can fix it and I tested it.
But Darren said this patch did not work.
I need more info about the crash that Darren encountered.

Thanks.

>
>
> > ---
> >  drivers/net/virtio_net.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index c6af18948092..e5286a6da863 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> >  	void *buf, *head;
> >  	dma_addr_t addr;
> >
> > -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > -		return NULL;
> > -
> >  	head = page_address(alloc_frag->page);
> >
> >  	dma = head;
> > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >  	len = SKB_DATA_ALIGN(len) +
> >  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >
> > +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > +		return -ENOMEM;
> > +
> >  	buf = virtnet_rq_alloc(rq, len, gfp);
> >  	if (unlikely(!buf))
> >  		return -ENOMEM;
> > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >  	 */
> >  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> >
> > +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > +		return -ENOMEM;
> > +
> > +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > +		len -= sizeof(struct virtnet_rq_dma);
> > +
> >  	buf = virtnet_rq_alloc(rq, len + room, gfp);
> >  	if (unlikely(!buf))
> >  		return -ENOMEM;
> > --
> > 2.32.0.3.g01195cf9f
>
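[Editorial note: the overflow arithmetic discussed in this thread can be sketched as a minimal userspace model. This is not the driver's code; `adjusted_len`, `PAGE_SZ`, and `RQ_DMA_SZ` are illustrative names, assuming a 4096-byte page and the 16-byte `struct virtnet_rq_dma` mentioned in the commit message.]

```c
#include <stddef.h>

#define PAGE_SZ   4096 /* assumed page size, as in the commit message */
#define RQ_DMA_SZ 16   /* assumed sizeof(struct virtnet_rq_dma), per the commit message */

/* Model of the guard the patch adds in add_recvbuf_mergeable():
 * if this buffer would be the first one placed in a fresh frag
 * (offset == 0) and buffer + room + DMA metadata cannot all fit
 * in the frag, shrink the buffer by the metadata size so nothing
 * runs past the end of the frag. */
static size_t adjusted_len(size_t len, size_t room,
			   size_t frag_offset, size_t frag_size)
{
	if (frag_offset == 0 && len + room + RQ_DMA_SZ > frag_size)
		len -= RQ_DMA_SZ;
	return len;
}
```

Under this model, with net.core.high_order_alloc_disable=1 the frag is a single page, so a full-page request shrinks to 4096 - 16 = 4080 bytes; with a larger (high-order) frag only the first buffer of the frag (offset 0) is affected, which matches the commit message.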
Michael S. Tsirkin Sept. 6, 2024, 9:08 a.m. UTC | #21
On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > leads to regression on VM with the sysctl value of:
> > >
> > > - net.core.high_order_alloc_disable=1
> > >
> > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > to VM):
> > >
> > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > everything is fine. However, if the frag is only one page and the
> > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > the first buffer of the frag is affected.
> > >
> > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >
> >
> > Guys where are we going with this? We have a crasher right now,
> > if this is not fixed ASAP I'd have to revert a ton of
> > work Xuan Zhuo just did.
> 
> I think this patch can fix it and I tested it.
> But Darren said this patch did not work.
> I need more info about the crash that Darren encountered.
> 
> Thanks.

So what are we doing? Revert the whole pile for now?
Seems to be a bit of a pity, but maybe that's the best we can do
for this release.


Xuan Zhuo Sept. 6, 2024, 9:25 a.m. UTC | #22
On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > [...]
> > >
> > >
> > > Guys where are we going with this? We have a crasher right now,
> > > if this is not fixed ASAP I'd have to revert a ton of
> > > work Xuan Zhuo just did.
> >
> > I think this patch can fix it and I tested it.
> > But Darren said this patch did not work.
> > I need more info about the crash that Darren encountered.
> >
> > Thanks.
>
> So what are we doing? Revert the whole pile for now?
> Seems to be a bit of a pity, but maybe that's the best we can do
> for this release.

@Jason Could you review this?

I think this problem is clear, though I do not know why it did not work
for Darren.

Thanks.


Michael S. Tsirkin Sept. 6, 2024, 9:44 a.m. UTC | #23
On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote:
> On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > > [...]
> > > >
> > > >
> > > > Guys where are we going with this? We have a crasher right now,
> > > > if this is not fixed ASAP I'd have to revert a ton of
> > > > work Xuan Zhuo just did.
> > >
> > > I think this patch can fix it and I tested it.
> > > But Darren said this patch did not work.
> > > I need more info about the crash that Darren encountered.
> > >
> > > Thanks.
> >
> > So what are we doing? Revert the whole pile for now?
> > Seems to be a bit of a pity, but maybe that's the best we can do
> > for this release.
> 
> @Jason Could you review this?
> 
> I think this problem is clear, though I do not know why it did not work
> for Darren.
> 
> Thanks.
> 

No regressions is a hard rule. If we can't figure out the regression
now, we should revert and you can try again for the next release.


Xuan Zhuo Sept. 6, 2024, 9:46 a.m. UTC | #24
On Fri, 6 Sep 2024 05:44:27 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote:
> > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > > > [...]
> > > > >
> > > > >
> > > > > Guys where are we going with this? We have a crasher right now,
> > > > > if this is not fixed ASAP I'd have to revert a ton of
> > > > > work Xuan Zhuo just did.
> > > >
> > > > I think this patch can fix it and I tested it.
> > > > But Darren said this patch did not work.
> > > > I need more info about the crash that Darren encountered.
> > > >
> > > > Thanks.
> > >
> > > So what are we doing? Revert the whole pile for now?
> > > Seems to be a bit of a pity, but maybe that's the best we can do
> > > for this release.
> >
> > @Jason Could you review this?
> >
> > I think this problem is clear, though I do not know why it did not work
> > for Darren.
> >
> > Thanks.
> >
>
> No regressions is a hard rule. If we can't figure out the regression
> now, we should revert and you can try again for the next release.

I see. I think I fixed it.

Hope Darren can reply before you post the revert patches.

Thanks.


Michael S. Tsirkin Sept. 6, 2024, 9:53 a.m. UTC | #25
On Fri, Sep 06, 2024 at 05:46:02PM +0800, Xuan Zhuo wrote:
> On Fri, 6 Sep 2024 05:44:27 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote:
> > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > > > > [...]
> > > > > >
> > > > > >
> > > > > > Guys where are we going with this? We have a crasher right now,
> > > > > > if this is not fixed ASAP I'd have to revert a ton of
> > > > > > work Xuan Zhuo just did.
> > > > >
> > > > > I think this patch can fix it and I tested it.
> > > > > But Darren said this patch did not work.
> > > > > I need more info about the crash that Darren encountered.
> > > > >
> > > > > Thanks.
> > > >
> > > > So what are we doing? Revert the whole pile for now?
> > > > Seems to be a bit of a pity, but maybe that's the best we can do
> > > > for this release.
> > >
> > > @Jason Could you review this?
> > >
> > > I think this problem is clear, though I do not know why it did not work
> > > for Darren.
> > >
> > > Thanks.
> > >
> >
> > No regressions is a hard rule. If we can't figure out the regression
> > now, we should revert and you can try again for the next release.
> 
> I see. I think I fixed it.
> 
> Hope Darren can reply before you post the revert patches.
> 
> Thanks.
> 

It's very rushed anyway. I posted the reverts, but as RFC for now.
You should post a debugging patch for Darren to help you figure
out what is going on.


Takero Funaki Sept. 7, 2024, 3:16 a.m. UTC | #26
On Fri, Sep 6, 2024 at 18:55 Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Sep 06, 2024 at 05:46:02PM +0800, Xuan Zhuo wrote:
> > On Fri, 6 Sep 2024 05:44:27 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote:
> > > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> > > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > > > > > [...]
> > > > > > >
> > > > > > >
> > > > > > > Guys where are we going with this? We have a crasher right now,
> > > > > > > if this is not fixed ASAP I'd have to revert a ton of
> > > > > > > work Xuan Zhuo just did.
> > > > > >
> > > > > > I think this patch can fix it and I tested it.
> > > > > > But Darren said this patch did not work.
> > > > > > I need more info about the crash that Darren encountered.
> > > > > >
> > > > > > Thanks.
> > > > >
> > > > > So what are we doing? Revert the whole pile for now?
> > > > > Seems to be a bit of a pity, but maybe that's the best we can do
> > > > > for this release.
> > > >
> > > > @Jason Could you review this?
> > > >
> > > > I think this problem is clear, though I do not know why it did not work
> > > > for Darren.
> > > >
> > > > Thanks.
> > > >
> > >
> > > No regressions is a hard rule. If we can't figure out the regression
> > > now, we should revert and you can try again for the next release.
> >
> > I see. I think I fixed it.
> >
> > Hope Darren can reply before you post the revert patches.
> >
> > Thanks.
> >
>
> It's very rushed anyway. I posted the reverts, but as RFC for now.
> You should post a debugging patch for Darren to help you figure
> out what is going on.
>
>

Hello,

My issue [1], which bisected to the commit f9dac92ba908, was resolved
after applying the patch on v6.11-rc6.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=219154

In my case, random crashes occur when receiving large data under heavy
memory/IO load. Although the crash details differ, the memory
corruption during data transfers is consistent.

If Darren is unable to confirm the fix, would it be possible to
consider merging this patch to close [1] instead?

Thanks.
Michael S. Tsirkin Sept. 8, 2024, 10:18 a.m. UTC | #27
On Sat, Sep 07, 2024 at 12:16:24PM +0900, Takero Funaki wrote:
> On Fri, Sep 6, 2024 at 18:55 Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Sep 06, 2024 at 05:46:02PM +0800, Xuan Zhuo wrote:
> > > On Fri, 6 Sep 2024 05:44:27 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote:
> > > > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> > > > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > > > > > > [...]
> > > > > > > >
> > > > > > > >
> > > > > > > > Guys where are we going with this? We have a crasher right now,
> > > > > > > > if this is not fixed ASAP I'd have to revert a ton of
> > > > > > > > work Xuan Zhuo just did.
> > > > > > >
> > > > > > > I think this patch can fix it and I tested it.
> > > > > > > But Darren said this patch did not work.
> > > > > > > I need more info about the crash that Darren encountered.
> > > > > > >
> > > > > > > Thanks.
> > > > > >
> > > > > > So what are we doing? Revert the whole pile for now?
> > > > > > Seems to be a bit of a pity, but maybe that's the best we can do
> > > > > > for this release.
> > > > >
> > > > > @Jason Could you review this?
> > > > >
> > > > > I think this problem is clear, though I do not know why it did not work
> > > > > for Darren.
> > > > >
> > > > > Thanks.
> > > > >
> > > >
> > > > No regressions is a hard rule. If we can't figure out the regression
> > > > now, we should revert and you can try again for the next release.
> > >
> > > I see. I think I fixed it.
> > >
> > > Hope Darren can reply before you post the revert patches.
> > >
> > > Thanks.
> > >
> >
> > It's very rushed anyway. I posted the reverts, but as RFC for now.
> > You should post a debugging patch for Darren to help you figure
> > out what is going on.
> >
> >
> 
> Hello,
> 
> My issue [1], which bisected to the commit f9dac92ba908, was resolved
> after applying the patch on v6.11-rc6.
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=219154
> 
> In my case, random crashes occur when receiving large data under heavy
> memory/IO load. Although the crash details differ, the memory
> corruption during data transfers is consistent.
> 
> If Darren is unable to confirm the fix, would it be possible to
> consider merging this patch to close [1] instead?
> 
> Thanks.


Could you also test
https://lore.kernel.org/all/cover.1725616135.git.mst@redhat.com/

please?
Takero Funaki Sept. 8, 2024, 5:56 p.m. UTC | #28
On Sun, Sep 8, 2024 at 19:19 Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Sat, Sep 07, 2024 at 12:16:24PM +0900, Takero Funaki wrote:
> > On Fri, Sep 6, 2024 at 18:55 Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Fri, Sep 06, 2024 at 05:46:02PM +0800, Xuan Zhuo wrote:
> > > > On Fri, 6 Sep 2024 05:44:27 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote:
> > > > > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> > > > > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > [...]
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Guys where are we going with this? We have a crasher right now,
> > > > > > > > > if this is not fixed ASAP I'd have to revert a ton of
> > > > > > > > > work Xuan Zhuo just did.
> > > > > > > >
> > > > > > > > I think this patch can fix it and I tested it.
> > > > > > > > But Darren said this patch did not work.
> > > > > > > > I need more info about the crash that Darren encountered.
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > >
> > > > > > > So what are we doing? Revert the whole pile for now?
> > > > > > > Seems to be a bit of a pity, but maybe that's the best we can do
> > > > > > > for this release.
> > > > > >
> > > > > > @Jason Could you review this?
> > > > > >
> > > > > > I think this problem is clear, though I do not know why it did not work
> > > > > > for Darren.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > >
> > > > > No regressions is a hard rule. If we can't figure out the regression
> > > > > now, we should revert and you can try again for the next release.
> > > >
> > > > I see. I think I fixed it.
> > > >
> > > > Hope Darren can reply before you post the revert patches.
> > > >
> > > > Thanks.
> > > >
> > >
> > > It's very rushed anyway. I posted the reverts, but as RFC for now.
> > > You should post a debugging patch for Darren to help you figure
> > > out what is going on.
> > >
> > >
> >
> > Hello,
> >
> > My issue [1], which bisected to the commit f9dac92ba908, was resolved
> > after applying the patch on v6.11-rc6.
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=219154
> >
> > In my case, random crashes occur when receiving large data under heavy
> > memory/IO load. Although the crash details differ, the memory
> > corruption during data transfers is consistent.
> >
> > If Darren is unable to confirm the fix, would it be possible to
> > consider merging this patch to close [1] instead?
> >
> > Thanks.
>
>
> Could you also test
> https://lore.kernel.org/all/cover.1725616135.git.mst@redhat.com/
>
> please?
>

I have confirmed that both your patchset [1] and Xuan's patchset [2]
on v6.11-rc6 successfully resolved my issue.
[1] https://lore.kernel.org/all/cover.1725616135.git.mst@redhat.com/
[2] https://lore.kernel.org/netdev/20240906123137.108741-1-xuanzhuo@linux.alibaba.com/

It seems that simply reverting the original patchset would suffice to
resolve the crash.
Michael S. Tsirkin Sept. 8, 2024, 7:40 p.m. UTC | #29
On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> leads to regression on VM with the sysctl value of:
> 
> - net.core.high_order_alloc_disable=1
> 
> which could see reliable crashes or scp failure (scp a file 100M in size
> to VM):
> 
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur. In this case, if an overflow is possible, I adjust
> the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> the first buffer of the frag is affected.
> 
> Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>


BTW why isn't it needed if we revert f9dac92ba908?

> ---
>  drivers/net/virtio_net.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c6af18948092..e5286a6da863 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
>  	void *buf, *head;
>  	dma_addr_t addr;
>  
> -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> -		return NULL;
> -
>  	head = page_address(alloc_frag->page);
>  
>  	dma = head;
> @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	len = SKB_DATA_ALIGN(len) +
>  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  
> +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> +		return -ENOMEM;
> +
>  	buf = virtnet_rq_alloc(rq, len, gfp);
>  	if (unlikely(!buf))
>  		return -ENOMEM;
> @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  	 */
>  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>  
> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> +		return -ENOMEM;
> +
> +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> +		len -= sizeof(struct virtnet_rq_dma);
> +
>  	buf = virtnet_rq_alloc(rq, len + room, gfp);
>  	if (unlikely(!buf))
>  		return -ENOMEM;
> -- 
> 2.32.0.3.g01195cf9f
Xuan Zhuo Sept. 9, 2024, 3:08 a.m. UTC | #30
On Sun, 8 Sep 2024 15:40:32 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > leads to regression on VM with the sysctl value of:
> >
> > - net.core.high_order_alloc_disable=1
> >
> > which could see reliable crashes or scp failure (scp a file 100M in size
> > to VM):
> >
> > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > of a new frag. When the frag size is larger than PAGE_SIZE,
> > everything is fine. However, if the frag is only one page and the
> > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > overflow may occur. In this case, if an overflow is possible, I adjust
> > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > the first buffer of the frag is affected.
> >
> > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
>
> BTW why isn't it needed if we revert f9dac92ba908?


This patch fixes the bug in premapped mode.

The revert operation just disables premapped mode.

So I think this patch is enough to fix the bug, and we can enable
premapped by default.

If you worry about premapped mode, I advise you to merge this patch and
also apply the revert[1]. Then the bug is fixed, premapped mode is
disabled by default, and we can enable it just for AF_XDP.

[1]: http://lore.kernel.org/all/20240906123137.108741-1-xuanzhuo@linux.alibaba.com

Thanks.


>
> > ---
> >  drivers/net/virtio_net.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index c6af18948092..e5286a6da863 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> >  	void *buf, *head;
> >  	dma_addr_t addr;
> >
> > -	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > -		return NULL;
> > -
> >  	head = page_address(alloc_frag->page);
> >
> >  	dma = head;
> > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >  	len = SKB_DATA_ALIGN(len) +
> >  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >
> > +	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > +		return -ENOMEM;
> > +
> >  	buf = virtnet_rq_alloc(rq, len, gfp);
> >  	if (unlikely(!buf))
> >  		return -ENOMEM;
> > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >  	 */
> >  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> >
> > +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > +		return -ENOMEM;
> > +
> > +	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > +		len -= sizeof(struct virtnet_rq_dma);
> > +
> >  	buf = virtnet_rq_alloc(rq, len + room, gfp);
> >  	if (unlikely(!buf))
> >  		return -ENOMEM;
> > --
> > 2.32.0.3.g01195cf9f
>
Jason Wang Sept. 9, 2024, 8:38 a.m. UTC | #31
On Fri, Sep 6, 2024 at 5:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > > leads to regression on VM with the sysctl value of:
> > > > >
> > > > > - net.core.high_order_alloc_disable=1
> > > > >
> > > > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > > > to VM):
> > > > >
> > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > > > everything is fine. However, if the frag is only one page and the
> > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > > the first buffer of the frag is affected.
> > > > >
> > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > >
> > > >
> > > > Guys where are we going with this? We have a crasher right now,
> > > > if this is not fixed ASAP I'd have to revert a ton of
> > > > work Xuan Zhuo just did.
> > >
> > > I think this patch can fix it and I tested it.
> > > But Darren said this patch did not work.
> > > I need more info about the crash that Darren encountered.
> > >
> > > Thanks.
> >
> > So what are we doing? Revert the whole pile for now?
> > Seems to be a bit of a pity, but maybe that's the best we can do
> > for this release.
>
> @Jason Could you review this?

I think we probably need some tweaks to this patch.

For example, the changelog is hard to understand, especially
considering that it starts with something like:

"
    leads to regression on VM with the sysctl value of:

    - net.core.high_order_alloc_disable=1

    which could see reliable crashes or scp failure (scp a file 100M in size
    to VM):
"

We need some context, and actually the sysctl is not required to
reproduce the issue; it can also happen when memory is fragmented.

Another issue is that if we move skb_page_frag_refill() out of
virtnet_rq_alloc(), the function semantics become weird:

skb_page_frag_refill(len, &rq->alloc_frag, gfp);
...
virtnet_rq_alloc(rq, len, gfp);

I wonder, instead of subtracting dma->len, how about simply counting
dma->len in len when we call virtnet_rq_alloc() in
add_recvbuf_small()?

>
> I think this problem is clear, though I do not know why it did not work
> for Darren.

I had a try. This issue could be reproduced easily and this patch
seems to fix the issue with a KASAN enabled kernel.

Thanks

>
> Thanks.
>
>
> >
> >
> > > >
> > > >
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 12 +++++++++---
> > > > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index c6af18948092..e5286a6da863 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > > > >         void *buf, *head;
> > > > >         dma_addr_t addr;
> > > > >
> > > > > -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > > > -               return NULL;
> > > > > -
> > > > >         head = page_address(alloc_frag->page);
> > > > >
> > > > >         dma = head;
> > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > >         len = SKB_DATA_ALIGN(len) +
> > > > >               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > > >
> > > > > +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > > > +               return -ENOMEM;
> > > > > +
> > > > >         buf = virtnet_rq_alloc(rq, len, gfp);
> > > > >         if (unlikely(!buf))
> > > > >                 return -ENOMEM;
> > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > >          */
> > > > >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > > >
> > > > > +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > +               return -ENOMEM;
> > > > > +
> > > > > +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > > > +               len -= sizeof(struct virtnet_rq_dma);
> > > > > +
> > > > >         buf = virtnet_rq_alloc(rq, len + room, gfp);
> > > > >         if (unlikely(!buf))
> > > > >                 return -ENOMEM;
> > > > > --
> > > > > 2.32.0.3.g01195cf9f
> > > >
> >
>
Xuan Zhuo Sept. 9, 2024, 8:43 a.m. UTC | #32
On Mon, 9 Sep 2024 16:38:16 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Sep 6, 2024 at 5:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > > > leads to regression on VM with the sysctl value of:
> > > > > >
> > > > > > - net.core.high_order_alloc_disable=1
> > > > > >
> > > > > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > > > > to VM):
> > > > > >
> > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > > > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > > > > everything is fine. However, if the frag is only one page and the
> > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > > > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > > > the first buffer of the frag is affected.
> > > > > >
> > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > >
> > > > >
> > > > > Guys where are we going with this? We have a crasher right now,
> > > > > if this is not fixed ASAP I'd have to revert a ton of
> > > > > work Xuan Zhuo just did.
> > > >
> > > > I think this patch can fix it and I tested it.
> > > > But Darren said this patch did not work.
> > > > I need more info about the crash that Darren encountered.
> > > >
> > > > Thanks.
> > >
> > > So what are we doing? Revert the whole pile for now?
> > > Seems to be a bit of a pity, but maybe that's the best we can do
> > > for this release.
> >
> > @Jason Could you review this?
>
> I think we probably need some tweaks for this patch.
>
> For example, the changelog is not easy to be understood especially
> consider it starts something like:
>
> "
>     leads to regression on VM with the sysctl value of:
>
>     - net.core.high_order_alloc_disable=1
>
>     which could see reliable crashes or scp failure (scp a file 100M in size
>     to VM):
> "
>
> Need some context and actually sysctl is not a must to reproduce the
> issue, it can also happen when memory is fragmented.

OK.


>
> Another issue is that, if we move the skb_page_frag_refill() out of
> the virtnet_rq_alloc(). The function semantics turns out to be weird:
>
> skb_page_frag_refill(len, &rq->alloc_frag, gfp);
> ...
> virtnet_rq_alloc(rq, len, gfp);

YES.

>
> I wonder instead of subtracting the dma->len, how about simply count
> the dma->len in len if we call virtnet_rq_aloc() in
> add_recvbuf_small()?

1. For the small mode, it is safe. The problem only happens in mergeable mode.
2. In mergeable mode, if we count dma->len in len, we need to know
   whether frag->offset is zero or not. We cannot check that before
   skb_page_frag_refill(), because skb_page_frag_refill() may allocate a
   new page, in which case frag->offset becomes zero. So the check must
   come after skb_page_frag_refill().

Thanks.


>
> >
> > I think this problem is clear, though I do not know why it did not work
> > for Darren.
>
> I had a try. This issue could be reproduced easily and this patch
> seems to fix the issue with a KASAN enabled kernel.
>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > >
> > > > >
> > > > >
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 12 +++++++++---
> > > > > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index c6af18948092..e5286a6da863 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > > > > >         void *buf, *head;
> > > > > >         dma_addr_t addr;
> > > > > >
> > > > > > -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > > > > -               return NULL;
> > > > > > -
> > > > > >         head = page_address(alloc_frag->page);
> > > > > >
> > > > > >         dma = head;
> > > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > >               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > > > >
> > > > > > +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > > > > +               return -ENOMEM;
> > > > > > +
> > > > > >         buf = virtnet_rq_alloc(rq, len, gfp);
> > > > > >         if (unlikely(!buf))
> > > > > >                 return -ENOMEM;
> > > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > >          */
> > > > > >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > > > >
> > > > > > +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > > +               return -ENOMEM;
> > > > > > +
> > > > > > +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > > > > +               len -= sizeof(struct virtnet_rq_dma);
> > > > > > +
> > > > > >         buf = virtnet_rq_alloc(rq, len + room, gfp);
> > > > > >         if (unlikely(!buf))
> > > > > >                 return -ENOMEM;
> > > > > > --
> > > > > > 2.32.0.3.g01195cf9f
> > > > >
> > >
> >
>
Jason Wang Sept. 9, 2024, 8:47 a.m. UTC | #33
On Mon, Sep 9, 2024 at 11:16 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Sun, 8 Sep 2024 15:40:32 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > leads to regression on VM with the sysctl value of:
> > >
> > > - net.core.high_order_alloc_disable=1
> > >
> > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > to VM):
> > >
> > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > everything is fine. However, if the frag is only one page and the
> > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > the first buffer of the frag is affected.
> > >
> > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >
> >
> > BTW why isn't it needed if we revert f9dac92ba908?
>
>
> This patch fixes the bug in premapped mode.
>
> The revert operation just disables premapped mode.
>
> So I think this patch is enough to fix the bug, and we can enable
> premapped by default.
>
> If you worry about the premapped mode, I advice you merge this patch and do
> the revert[1]. Then the bug is fixed, and the premapped mode is
> disabled by default, we can just enable it for af-xdp.
>
> [1]: http://lore.kernel.org/all/20240906123137.108741-1-xuanzhuo@linux.alibaba.com

Though I think this is a good balance, if we can't get more input from
Darren it seems safer to merge what he has already tested.

Thanks

>
> Thanks.
>
>
> >
> > > ---
> > >  drivers/net/virtio_net.c | 12 +++++++++---
> > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index c6af18948092..e5286a6da863 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > >     void *buf, *head;
> > >     dma_addr_t addr;
> > >
> > > -   if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > -           return NULL;
> > > -
> > >     head = page_address(alloc_frag->page);
> > >
> > >     dma = head;
> > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > >     len = SKB_DATA_ALIGN(len) +
> > >           SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > >
> > > +   if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > +           return -ENOMEM;
> > > +
> > >     buf = virtnet_rq_alloc(rq, len, gfp);
> > >     if (unlikely(!buf))
> > >             return -ENOMEM;
> > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > >      */
> > >     len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > >
> > > +   if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > +           return -ENOMEM;
> > > +
> > > +   if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > +           len -= sizeof(struct virtnet_rq_dma);
> > > +
> > >     buf = virtnet_rq_alloc(rq, len + room, gfp);
> > >     if (unlikely(!buf))
> > >             return -ENOMEM;
> > > --
> > > 2.32.0.3.g01195cf9f
> >
>
Xuan Zhuo Sept. 9, 2024, 8:51 a.m. UTC | #34
On Mon, 9 Sep 2024 16:47:02 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Sep 9, 2024 at 11:16 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Sun, 8 Sep 2024 15:40:32 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > leads to regression on VM with the sysctl value of:
> > > >
> > > > - net.core.high_order_alloc_disable=1
> > > >
> > > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > > to VM):
> > > >
> > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > > everything is fine. However, if the frag is only one page and the
> > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > the first buffer of the frag is affected.
> > > >
> > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > >
> > >
> > > BTW why isn't it needed if we revert f9dac92ba908?
> >
> >
> > This patch fixes the bug in premapped mode.
> >
> > The revert operation just disables premapped mode.
> >
> > So I think this patch is enough to fix the bug, and we can enable
> > premapped by default.
> >
> > If you worry about the premapped mode, I advice you merge this patch and do
> > the revert[1]. Then the bug is fixed, and the premapped mode is
> > disabled by default, we can just enable it for af-xdp.
> >
> > [1]: http://lore.kernel.org/all/20240906123137.108741-1-xuanzhuo@linux.alibaba.com
>
> Though I think this is a good balance but if we can't get more inputs
> from Darren. It seems safer to merge what he had tested.

So you mean just merge this:

	http://lore.kernel.org/all/20240906123137.108741-1-xuanzhuo@linux.alibaba.com

Right?

Thanks


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > > ---
> > > >  drivers/net/virtio_net.c | 12 +++++++++---
> > > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index c6af18948092..e5286a6da863 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > > >     void *buf, *head;
> > > >     dma_addr_t addr;
> > > >
> > > > -   if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > > -           return NULL;
> > > > -
> > > >     head = page_address(alloc_frag->page);
> > > >
> > > >     dma = head;
> > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > >     len = SKB_DATA_ALIGN(len) +
> > > >           SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > >
> > > > +   if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > > +           return -ENOMEM;
> > > > +
> > > >     buf = virtnet_rq_alloc(rq, len, gfp);
> > > >     if (unlikely(!buf))
> > > >             return -ENOMEM;
> > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > >      */
> > > >     len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > >
> > > > +   if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > +           return -ENOMEM;
> > > > +
> > > > +   if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > > +           len -= sizeof(struct virtnet_rq_dma);
> > > > +
> > > >     buf = virtnet_rq_alloc(rq, len + room, gfp);
> > > >     if (unlikely(!buf))
> > > >             return -ENOMEM;
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > >
> >
>
Darren Kenny Sept. 9, 2024, 9:05 a.m. UTC | #35
Hi,

Apologies, I've been OOTO for the best part of 2 weeks. I'll try to get
to this as soon as I can, but I'm playing catch-up right now.

Thanks,

Darren.

On Saturday, 2024-09-07 at 12:16:24 +09, Takero Funaki wrote:
> 2024年9月6日(金) 18:55 Michael S. Tsirkin <mst@redhat.com>:
>>
>> On Fri, Sep 06, 2024 at 05:46:02PM +0800, Xuan Zhuo wrote:
>> > On Fri, 6 Sep 2024 05:44:27 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> > > On Fri, Sep 06, 2024 at 05:25:36PM +0800, Xuan Zhuo wrote:
>> > > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> > > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
>> > > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> > > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
>> > > > > > > > leads to regression on VM with the sysctl value of:
>> > > > > > > >
>> > > > > > > > - net.core.high_order_alloc_disable=1
>> > > > > > > >
>> > > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size
>> > > > > > > > to VM):
>> > > > > > > >
>> > > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
>> > > > > > > > of a new frag. When the frag size is larger than PAGE_SIZE,
>> > > > > > > > everything is fine. However, if the frag is only one page and the
>> > > > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
>> > > > > > > > overflow may occur. In this case, if an overflow is possible, I adjust
>> > > > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
>> > > > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
>> > > > > > > > the first buffer of the frag is affected.
>> > > > > > > >
>> > > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
>> > > > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
>> > > > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
>> > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> > > > > > >
>> > > > > > >
>> > > > > > > Guys where are we going with this? We have a crasher right now,
>> > > > > > > if this is not fixed ASAP I'd have to revert a ton of
>> > > > > > > work Xuan Zhuo just did.
>> > > > > >
>> > > > > > I think this patch can fix it and I tested it.
>> > > > > > But Darren said this patch did not work.
>> > > > > > I need more info about the crash that Darren encountered.
>> > > > > >
>> > > > > > Thanks.
>> > > > >
>> > > > > So what are we doing? Revert the whole pile for now?
>> > > > > Seems to be a bit of a pity, but maybe that's the best we can do
>> > > > > for this release.
>> > > >
>> > > > @Jason Could you review this?
>> > > >
>> > > > I think this problem is clear, though I do not know why it did not work
>> > > > for Darren.
>> > > >
>> > > > Thanks.
>> > > >
>> > >
>> > > No regressions is a hard rule. If we can't figure out the regression
>> > > now, we should revert and you can try again for the next release.
>> >
>> > I see. I think I fixed it.
>> >
>> > Hope Darren can reply before you post the revert patches.
>> >
>> > Thanks.
>> >
>>
>> It's very rushed anyway. I posted the reverts, but as RFC for now.
>> You should post a debugging patch for Darren to help you figure
>> out what is going on.
>>
>>
>
> Hello,
>
> My issue [1], which bisected to the commit f9dac92ba908, was resolved
> after applying the patch on v6.11-rc6.
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=219154
>
> In my case, random crashes occur when receiving large data under heavy
> memory/IO load. Although the crash details differ, the memory
> corruption during data transfers is consistent.
>
> If Darren is unable to confirm the fix, would it be possible to
> consider merging this patch to close [1] instead?
>
> Thanks.
Jason Wang Sept. 10, 2024, 6:17 a.m. UTC | #36
On Mon, Sep 9, 2024 at 4:52 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 9 Sep 2024 16:47:02 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Sep 9, 2024 at 11:16 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Sun, 8 Sep 2024 15:40:32 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > > leads to regression on VM with the sysctl value of:
> > > > >
> > > > > - net.core.high_order_alloc_disable=1
> > > > >
> > > > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > > > to VM):
> > > > >
> > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > > > everything is fine. However, if the frag is only one page and the
> > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > > the first buffer of the frag is affected.
> > > > >
> > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > >
> > > >
> > > > BTW why isn't it needed if we revert f9dac92ba908?
> > >
> > >
> > > This patch fixes the bug in premapped mode.
> > >
> > > The revert operation just disables premapped mode.
> > >
> > > So I think this patch is enough to fix the bug, and we can enable
> > > premapped by default.
> > >
> > > If you worry about the premapped mode, I advise you to merge this patch
> > > and do the revert[1]. Then the bug is fixed, the premapped mode is
> > > disabled by default, and we can just enable it for AF_XDP.
> > >
> > > [1]: http://lore.kernel.org/all/20240906123137.108741-1-xuanzhuo@linux.alibaba.com
> >
> > Though I think this is a good balance, if we can't get more input
> > from Darren it seems safer to merge what he has tested.
>
> So you mean just merge this:
>
>         http://lore.kernel.org/all/20240906123137.108741-1-xuanzhuo@linux.alibaba.com
>
> Right?

It looks to me like it hasn't been tested by Darren yet.

Thanks

>
> Thanks
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 12 +++++++++---
> > > > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index c6af18948092..e5286a6da863 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > > > >     void *buf, *head;
> > > > >     dma_addr_t addr;
> > > > >
> > > > > -   if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > > > -           return NULL;
> > > > > -
> > > > >     head = page_address(alloc_frag->page);
> > > > >
> > > > >     dma = head;
> > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > >     len = SKB_DATA_ALIGN(len) +
> > > > >           SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > > >
> > > > > +   if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > > > +           return -ENOMEM;
> > > > > +
> > > > >     buf = virtnet_rq_alloc(rq, len, gfp);
> > > > >     if (unlikely(!buf))
> > > > >             return -ENOMEM;
> > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > >      */
> > > > >     len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > > >
> > > > > +   if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > +           return -ENOMEM;
> > > > > +
> > > > > +   if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > > > +           len -= sizeof(struct virtnet_rq_dma);
> > > > > +
> > > > >     buf = virtnet_rq_alloc(rq, len + room, gfp);
> > > > >     if (unlikely(!buf))
> > > > >             return -ENOMEM;
> > > > > --
> > > > > 2.32.0.3.g01195cf9f
> > > >
> > >
> >
>
Jason Wang Sept. 10, 2024, 6:18 a.m. UTC | #37
On Mon, Sep 9, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 9 Sep 2024 16:38:16 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Fri, Sep 6, 2024 at 5:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > > > > leads to regression on VM with the sysctl value of:
> > > > > > >
> > > > > > > - net.core.high_order_alloc_disable=1
> > > > > > >
> > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > > > > > to VM):
> > > > > > >
> > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > > > > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > > > > > everything is fine. However, if the frag is only one page and the
> > > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > > > > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > > > > the first buffer of the frag is affected.
> > > > > > >
> > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > >
> > > > > >
> > > > > > Guys where are we going with this? We have a crasher right now,
> > > > > > if this is not fixed ASAP I'd have to revert a ton of
> > > > > > work Xuan Zhuo just did.
> > > > >
> > > > > I think this patch can fix it and I tested it.
> > > > > But Darren said this patch did not work.
> > > > > I need more info about the crash that Darren encountered.
> > > > >
> > > > > Thanks.
> > > >
> > > > So what are we doing? Revert the whole pile for now?
> > > > Seems to be a bit of a pity, but maybe that's the best we can do
> > > > for this release.
> > >
> > > @Jason Could you review this?
> >
> > I think we probably need some tweaks for this patch.
> >
> > For example, the changelog is not easy to understand, especially
> > considering it starts with something like:
> >
> > "
> >     leads to regression on VM with the sysctl value of:
> >
> >     - net.core.high_order_alloc_disable=1
> >
> >     which could see reliable crashes or scp failure (scp a file 100M in size
> >     to VM):
> > "
> >
> > It needs some context, and actually the sysctl is not a must to reproduce
> > the issue; it can also happen when memory is fragmented.
>
> OK.
>
>
> >
> > Another issue is that, if we move skb_page_frag_refill() out of
> > virtnet_rq_alloc(), the function's semantics turn out to be weird:
> >
> > skb_page_frag_refill(len, &rq->alloc_frag, gfp);
> > ...
> > virtnet_rq_alloc(rq, len, gfp);
>
> YES.
>
> >
> > I wonder, instead of subtracting dma->len, how about simply counting
> > dma->len in len when we call virtnet_rq_alloc() in
> > add_recvbuf_small()?
>
> 1. For the small mode it is safe; that only happens in the merge mode.
> 2. In the merge mode, if we count dma->len in len, we need to know
>    whether frag->offset is zero. We cannot do that before
>    skb_page_frag_refill(), because skb_page_frag_refill() may allocate
>    a new page, in which case frag->offset is zero. So the check must
>    come after skb_page_frag_refill().

I may have missed something. I mean to always reserve dma->len for each frag.

But anyway, we need to tweak the function API, either explicitly pass
the frag or use the rq->frag implicitly.

Thanks
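
To picture the API tweak being discussed (passing the frag explicitly so the refill/alloc pairing is visible at the call site), here is a hypothetical, simplified sketch. The types and names (`model_frag`, `model_rq_alloc`) are stand-ins invented for illustration, not the driver's actual definitions:

```c
#include <assert.h>

/* Hypothetical stand-in for struct page_frag: bytes available in the
 * frag page and the current allocation offset within it. */
struct model_frag {
	unsigned int size;
	unsigned int offset;
};

/* Carve len bytes out of an explicitly-passed frag; returns the offset
 * of the new buffer, or -1 when the frag has no room left (i.e. the
 * caller forgot to refill first). */
static int model_rq_alloc(struct model_frag *frag, unsigned int len)
{
	if (frag->offset + len > frag->size)
		return -1;
	frag->offset += len;
	return (int)(frag->offset - len);
}
```

With this shape, a caller must hold the same frag across the refill and the alloc, which makes the dependency between the two calls explicit rather than hidden inside rq.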

>
> Thanks.
>
>
> >
> > >
> > > I think this problem is clear, though I do not know why it did not work
> > > for Darren.
> >
> > I had a try. This issue could be reproduced easily, and this patch
> > seems to fix the issue on a KASAN-enabled kernel.
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > >
> > > > > >
> > > > > >
> > > > > > > ---
> > > > > > >  drivers/net/virtio_net.c | 12 +++++++++---
> > > > > > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > index c6af18948092..e5286a6da863 100644
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > > > > > >         void *buf, *head;
> > > > > > >         dma_addr_t addr;
> > > > > > >
> > > > > > > -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > > > > > -               return NULL;
> > > > > > > -
> > > > > > >         head = page_address(alloc_frag->page);
> > > > > > >
> > > > > > >         dma = head;
> > > > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > >               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > > > > >
> > > > > > > +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > > > > > +               return -ENOMEM;
> > > > > > > +
> > > > > > >         buf = virtnet_rq_alloc(rq, len, gfp);
> > > > > > >         if (unlikely(!buf))
> > > > > > >                 return -ENOMEM;
> > > > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > >          */
> > > > > > >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > > > > >
> > > > > > > +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > > > +               return -ENOMEM;
> > > > > > > +
> > > > > > > +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > > > > > +               len -= sizeof(struct virtnet_rq_dma);
> > > > > > > +
> > > > > > >         buf = virtnet_rq_alloc(rq, len + room, gfp);
> > > > > > >         if (unlikely(!buf))
> > > > > > >                 return -ENOMEM;
> > > > > > > --
> > > > > > > 2.32.0.3.g01195cf9f
> > > > > >
> > > >
> > >
> >
>
Xuan Zhuo Sept. 10, 2024, 7:20 a.m. UTC | #38
On Tue, 10 Sep 2024 14:18:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Sep 9, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 9 Sep 2024 16:38:16 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Fri, Sep 6, 2024 at 5:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Fri, 6 Sep 2024 05:08:56 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Fri, Sep 06, 2024 at 04:53:38PM +0800, Xuan Zhuo wrote:
> > > > > > On Fri, 6 Sep 2024 04:43:29 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > On Tue, Aug 20, 2024 at 03:19:13PM +0800, Xuan Zhuo wrote:
> > > > > > > > leads to regression on VM with the sysctl value of:
> > > > > > > >
> > > > > > > > - net.core.high_order_alloc_disable=1
> > > > > > > >
> > > > > > > > which could see reliable crashes or scp failure (scp a file 100M in size
> > > > > > > > to VM):
> > > > > > > >
> > > > > > > > The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> > > > > > > > of a new frag. When the frag size is larger than PAGE_SIZE,
> > > > > > > > everything is fine. However, if the frag is only one page and the
> > > > > > > > total size of the buffer and virtnet_rq_dma is larger than one page, an
> > > > > > > > overflow may occur. In this case, if an overflow is possible, I adjust
> > > > > > > > the buffer size. If net.core.high_order_alloc_disable=1, the maximum
> > > > > > > > buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
> > > > > > > > the first buffer of the frag is affected.
> > > > > > > >
> > > > > > > > Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
> > > > > > > > Reported-by: "Si-Wei Liu" <si-wei.liu@oracle.com>
> > > > > > > > Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com
> > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > >
> > > > > > >
> > > > > > > Guys where are we going with this? We have a crasher right now,
> > > > > > > if this is not fixed ASAP I'd have to revert a ton of
> > > > > > > work Xuan Zhuo just did.
> > > > > >
> > > > > > I think this patch can fix it and I tested it.
> > > > > > But Darren said this patch did not work.
> > > > > > I need more info about the crash that Darren encountered.
> > > > > >
> > > > > > Thanks.
> > > > >
> > > > > So what are we doing? Revert the whole pile for now?
> > > > > Seems to be a bit of a pity, but maybe that's the best we can do
> > > > > for this release.
> > > >
> > > > @Jason Could you review this?
> > >
> > > I think we probably need some tweaks for this patch.
> > >
> > > For example, the changelog is not easy to understand, especially
> > > considering it starts with something like:
> > >
> > > "
> > >     leads to regression on VM with the sysctl value of:
> > >
> > >     - net.core.high_order_alloc_disable=1
> > >
> > >     which could see reliable crashes or scp failure (scp a file 100M in size
> > >     to VM):
> > > "
> > >
> > > It needs some context, and actually the sysctl is not a must to reproduce
> > > the issue; it can also happen when memory is fragmented.
> >
> > OK.
> >
> >
> > >
> > > Another issue is that, if we move skb_page_frag_refill() out of
> > > virtnet_rq_alloc(), the function's semantics turn out to be weird:
> > >
> > > skb_page_frag_refill(len, &rq->alloc_frag, gfp);
> > > ...
> > > virtnet_rq_alloc(rq, len, gfp);
> >
> > YES.
> >
> > >
> > > I wonder, instead of subtracting dma->len, how about simply counting
> > > dma->len in len when we call virtnet_rq_alloc() in
> > > add_recvbuf_small()?
> >
> > 1. For the small mode it is safe; that only happens in the merge mode.
> > 2. In the merge mode, if we count dma->len in len, we need to know
> >    whether frag->offset is zero. We cannot do that before
> >    skb_page_frag_refill(), because skb_page_frag_refill() may allocate
> >    a new page, in which case frag->offset is zero. So the check must
> >    come after skb_page_frag_refill().
>
> I may have missed something. I mean to always reserve dma->len for each frag.

This problem is a little complex.

We need to consider one point: if the frag reserves sizeof(dma), then
len must be reduced by that amount; otherwise we cannot allocate from
the frag when the frag is only one page and 'len' is PAGE_SIZE.

Thanks.
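
To make the point above concrete: if every frag unconditionally reserved the header, a full-page request could never be satisfied from a one-page frag. A minimal model, using assumed illustrative sizes (4 KiB page, 16-byte header, matching the numbers in this thread but not taken from driver headers):

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed sizes for illustration only. */
#define MODEL_PAGE_SIZE 4096u
#define DMA_META_SIZE     16u

/* Models the "always reserve dma->len per frag" alternative: the
 * header claims DMA_META_SIZE bytes up front, so only
 * frag_size - DMA_META_SIZE bytes remain for buffers. */
static bool fits_with_reservation(unsigned int frag_size, unsigned int len)
{
	return len <= frag_size - DMA_META_SIZE;
}
```

So under unconditional reservation, len == PAGE_SIZE never fits in a single-page frag; the caller has to shrink len anyway, which is exactly the objection raised here.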


>
> But anyway, we need to tweak the function API, either explicitly pass
> the frag or use the rq->frag implicitly.
>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > >
> > > > I think this problem is clear, though I do not know why it did not
> > > > work for Darren.
> > >
> > > I had a try. This issue could be reproduced easily, and this patch
> > > seems to fix the issue on a KASAN-enabled kernel.
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > ---
> > > > > > > >  drivers/net/virtio_net.c | 12 +++++++++---
> > > > > > > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > index c6af18948092..e5286a6da863 100644
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > > > > > > >         void *buf, *head;
> > > > > > > >         dma_addr_t addr;
> > > > > > > >
> > > > > > > > -       if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > > > > > > > -               return NULL;
> > > > > > > > -
> > > > > > > >         head = page_address(alloc_frag->page);
> > > > > > > >
> > > > > > > >         dma = head;
> > > > > > > > @@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > >               SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > > > > > >
> > > > > > > > +       if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> > > > > > > > +               return -ENOMEM;
> > > > > > > > +
> > > > > > > >         buf = virtnet_rq_alloc(rq, len, gfp);
> > > > > > > >         if (unlikely(!buf))
> > > > > > > >                 return -ENOMEM;
> > > > > > > > @@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > >          */
> > > > > > > >         len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > > > > > >
> > > > > > > > +       if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > > > > +               return -ENOMEM;
> > > > > > > > +
> > > > > > > > +       if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
> > > > > > > > +               len -= sizeof(struct virtnet_rq_dma);
> > > > > > > > +
> > > > > > > >         buf = virtnet_rq_alloc(rq, len + room, gfp);
> > > > > > > >         if (unlikely(!buf))
> > > > > > > >                 return -ENOMEM;
> > > > > > > > --
> > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > >
> > > > >
> > > >
> > >
> >
>
Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c6af18948092..e5286a6da863 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -918,9 +918,6 @@  static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
 	void *buf, *head;
 	dma_addr_t addr;
 
-	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
-		return NULL;
-
 	head = page_address(alloc_frag->page);
 
 	dma = head;
@@ -2421,6 +2418,9 @@  static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 	len = SKB_DATA_ALIGN(len) +
 	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
+	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
+		return -ENOMEM;
+
 	buf = virtnet_rq_alloc(rq, len, gfp);
 	if (unlikely(!buf))
 		return -ENOMEM;
@@ -2521,6 +2521,12 @@  static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	 */
 	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
 
+	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
+		return -ENOMEM;
+
+	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
+		len -= sizeof(struct virtnet_rq_dma);
+
 	buf = virtnet_rq_alloc(rq, len + room, gfp);
 	if (unlikely(!buf))
 		return -ENOMEM;
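
To make the size accounting in the mergeable hunk above concrete, here is a minimal standalone model of the adjustment. It is not driver code: MODEL_PAGE_SIZE and DMA_META_SIZE are illustrative stand-ins for PAGE_SIZE and sizeof(struct virtnet_rq_dma) (16 bytes, per the commit message), and len here stands for the patch's len + room:

```c
#include <assert.h>

/* Illustrative stand-ins, not driver definitions. */
#define MODEL_PAGE_SIZE 4096u
#define DMA_META_SIZE     16u

/* Mirrors the patch's check in add_recvbuf_mergeable(): on a fresh
 * frag (offset == 0) whose page must also hold the 16-byte DMA header,
 * shrink the request so header + buffer still fit within frag_size.
 * On a frag already in use, the header was placed earlier, so the
 * request is left unchanged. */
static unsigned int clamp_buf_len(unsigned int frag_size,
				  unsigned int frag_offset,
				  unsigned int len)
{
	if (frag_offset == 0 && len + DMA_META_SIZE > frag_size)
		len -= DMA_META_SIZE;
	return len;
}
```

With net.core.high_order_alloc_disable=1 the frag is always a single page, so the first buffer carved from each page is capped at 4096 - 16 bytes; with higher-order frags only the first buffer of the frag is affected, matching the commit message.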