[net] vhost_net: fix high cpu load when sendmsg fails

Message ID	1607514504-20956-1-git-send-email-wangyunjian@huawei.com
State	Changes Requested
Delegated to:	Netdev Maintainers
Headers	Return-Path: <netdev-owner@kernel.org>
From: wangyunjian <wangyunjian@huawei.com>
To: <mst@redhat.com>, <jasowang@redhat.com>
CC: <virtualization@lists.linux-foundation.org>, <netdev@vger.kernel.org>, <jerry.lilijun@huawei.com>, <chenchanghu@huawei.com>, <xudingke@huawei.com>, Yunjian Wang <wangyunjian@huawei.com>
Subject: [PATCH net] vhost_net: fix high cpu load when sendmsg fails
Date: Wed, 9 Dec 2020 19:48:24 +0800
Message-ID: <1607514504-20956-1-git-send-email-wangyunjian@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain
Precedence: bulk
Series	[net] vhost_net: fix high cpu load when sendmsg fails

Context	Check	Description
netdev/cover_letter	success	Link
netdev/fixes_present	fail	Series targets non-next tree, but doesn't contain any Fixes tags
netdev/patch_count	success	Link
netdev/tree_selection	success	Clearly marked for net
netdev/subject_prefix	success	Link
netdev/source_inline	success	Was 0 now: 0
netdev/verify_signedoff	success	Link
netdev/module_param	success	Was 0 now: 0
netdev/build_32bit	success	Errors and warnings before: 0 this patch: 0
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/verify_fixes	success	Link
netdev/checkpatch	warning	WARNING: line length of 94 exceeds 80 columns
netdev/build_allmodconfig_warn	success	Errors and warnings before: 0 this patch: 0
netdev/header_inline	success	Link
netdev/stable	success	Stable not CCed

wangyunjian Dec. 9, 2020, 11:48 a.m. UTC

From: Yunjian Wang <wangyunjian@huawei.com>

Currently, when sendmsg fails, we put the descriptor back, break out of
the loop and wake up the vhost_worker again. When the worker runs again,
it retries the same descriptor and hits the same error, which causes
high CPU load. To fix this issue, skip the descriptor by ignoring the
error.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
---
 drivers/vhost/net.c | 24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)

Michael S. Tsirkin Dec. 9, 2020, 12:49 p.m. UTC | #1

On Wed, Dec 09, 2020 at 07:48:24PM +0800, wangyunjian wrote:
> From: Yunjian Wang <wangyunjian@huawei.com>
> 
> Currently we break the loop and wake up the vhost_worker when
> sendmsg fails. When the worker wakes up again, we'll meet the
> same error. This will cause high CPU load. To fix this issue,
> we can skip this description by ignoring the error.
> 
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> ---
>  drivers/vhost/net.c | 24 +++++-------------------
>  1 file changed, 5 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 531a00d703cd..ac950b1120f5 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -829,14 +829,8 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  
>  		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(sock, &msg, len);
> -		if (unlikely(err < 0)) {
> -			vhost_discard_vq_desc(vq, 1);
> -			vhost_net_enable_vq(net, vq);
> -			break;
> -		}
> -		if (err != len)
> -			pr_debug("Truncated TX packet: len %d != %zd\n",
> -				 err, len);
> +		if (unlikely(err < 0 || err != len))
> +			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
>  done:
>  		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
>  		vq->heads[nvq->done_idx].len = 0;

One of the reasons for sendmsg to fail is ENOBUFS.
In that case we certainly don't want to drop the packet.
There could be other transient errors.
Which error did you encounter, specifically?

> @@ -925,19 +919,11 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>  
>  		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(sock, &msg, len);
> -		if (unlikely(err < 0)) {
> -			if (zcopy_used) {
> +		if (unlikely(err < 0 || err != len)) {
> +			if (zcopy_used && err < 0)
>  				vhost_net_ubuf_put(ubufs);
> -				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
> -					% UIO_MAXIOV;
> -			}
> -			vhost_discard_vq_desc(vq, 1);
> -			vhost_net_enable_vq(net, vq);
> -			break;
> +			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
>  		}
> -		if (err != len)
> -			pr_debug("Truncated TX packet: "
> -				 " len %d != %zd\n", err, len);
>  		if (!zcopy_used)
>  			vhost_add_used_and_signal(&net->dev, vq, head, 0);
>  		else
> -- 
> 2.23.0

wangyunjian Dec. 9, 2020, 1:27 p.m. UTC | #2

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Wednesday, December 9, 2020 8:50 PM
> To: wangyunjian <wangyunjian@huawei.com>
> Cc: jasowang@redhat.com; virtualization@lists.linux-foundation.org;
> netdev@vger.kernel.org; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
> chenchanghu <chenchanghu@huawei.com>; xudingke <xudingke@huawei.com>
> Subject: Re: [PATCH net] vhost_net: fix high cpu load when sendmsg fails
> 
> On Wed, Dec 09, 2020 at 07:48:24PM +0800, wangyunjian wrote:
> > From: Yunjian Wang <wangyunjian@huawei.com>
> >
> > Currently we break the loop and wake up the vhost_worker when sendmsg
> > fails. When the worker wakes up again, we'll meet the same error. This
> > will cause high CPU load. To fix this issue, we can skip this
> > description by ignoring the error.
> >
> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > ---
> >  drivers/vhost/net.c | 24 +++++-------------------
> >  1 file changed, 5 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 531a00d703cd..ac950b1120f5 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -829,14 +829,8 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >
> >  		/* TODO: Check specific error and bomb out unless ENOBUFS? */
> >  		err = sock->ops->sendmsg(sock, &msg, len);
> > -		if (unlikely(err < 0)) {
> > -			vhost_discard_vq_desc(vq, 1);
> > -			vhost_net_enable_vq(net, vq);
> > -			break;
> > -		}
> > -		if (err != len)
> > -			pr_debug("Truncated TX packet: len %d != %zd\n",
> > -				 err, len);
> > +		if (unlikely(err < 0 || err != len))
> > +			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
> >  done:
> >  		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
> >  		vq->heads[nvq->done_idx].len = 0;
> 
> One of the reasons for sendmsg to fail is ENOBUFS.
> In that case for sure we don't want to drop packet.

Currently, tap_sendmsg() and tun_sendmsg() do not return ENOBUFS.

> There could be other transient errors.
> Which error did you encounter, specifically?

We hit this when a guest VM sends an skb longer than 64 KB.
A malformed virtio header triggers the same problem.

Thanks


Jason Wang Dec. 11, 2020, 2:52 a.m. UTC | #3

On 2020/12/9 9:27 PM, wangyunjian wrote:
>> -----Original Message-----
>> From: Michael S. Tsirkin [mailto:mst@redhat.com]
>> Sent: Wednesday, December 9, 2020 8:50 PM
>> To: wangyunjian <wangyunjian@huawei.com>
>> Cc: jasowang@redhat.com; virtualization@lists.linux-foundation.org;
>> netdev@vger.kernel.org; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
>> chenchanghu <chenchanghu@huawei.com>; xudingke <xudingke@huawei.com>
>> Subject: Re: [PATCH net] vhost_net: fix high cpu load when sendmsg fails
>>
>> On Wed, Dec 09, 2020 at 07:48:24PM +0800, wangyunjian wrote:
>>> From: Yunjian Wang <wangyunjian@huawei.com>
>>>
>>> Currently we break the loop and wake up the vhost_worker when sendmsg
>>> fails. When the worker wakes up again, we'll meet the same error. This
>>> will cause high CPU load. To fix this issue, we can skip this
>>> description by ignoring the error.
>>>
>>> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
>>> ---
>>>   drivers/vhost/net.c | 24 +++++-------------------
>>>   1 file changed, 5 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>> index 531a00d703cd..ac950b1120f5 100644
>>> --- a/drivers/vhost/net.c
>>> +++ b/drivers/vhost/net.c
>>> @@ -829,14 +829,8 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>>>
>>>   		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>>>   		err = sock->ops->sendmsg(sock, &msg, len);
>>> -		if (unlikely(err < 0)) {
>>> -			vhost_discard_vq_desc(vq, 1);
>>> -			vhost_net_enable_vq(net, vq);
>>> -			break;
>>> -		}
>>> -		if (err != len)
>>> -			pr_debug("Truncated TX packet: len %d != %zd\n",
>>> -				 err, len);
>>> +		if (unlikely(err < 0 || err != len))
>>> +			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
>>>   done:
>>>   		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
>>>   		vq->heads[nvq->done_idx].len = 0;
>> One of the reasons for sendmsg to fail is ENOBUFS.
>> In that case for sure we don't want to drop packet.
> Now the function tap_sendmsg()/tun_sendmsg() don't return ENOBUFS.


I don't think so; it can happen if we exceed sndbuf. E.g., see tun_alloc_skb().

Thanks



wangyunjian Dec. 11, 2020, 7:37 a.m. UTC | #4

> -----Original Message-----
> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Friday, December 11, 2020 10:53 AM
> To: wangyunjian <wangyunjian@huawei.com>; Michael S. Tsirkin
> <mst@redhat.com>
> Cc: virtualization@lists.linux-foundation.org; netdev@vger.kernel.org; Lilijun
> (Jerry) <jerry.lilijun@huawei.com>; chenchanghu <chenchanghu@huawei.com>;
> xudingke <xudingke@huawei.com>
> Subject: Re: [PATCH net] vhost_net: fix high cpu load when sendmsg fails
> 
> 
> On 2020/12/9 9:27 PM, wangyunjian wrote:
> >> -----Original Message-----
> >> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> >> Sent: Wednesday, December 9, 2020 8:50 PM
> >> To: wangyunjian <wangyunjian@huawei.com>
> >> Cc: jasowang@redhat.com; virtualization@lists.linux-foundation.org;
> >> netdev@vger.kernel.org; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
> >> chenchanghu <chenchanghu@huawei.com>; xudingke
> <xudingke@huawei.com>
> >> Subject: Re: [PATCH net] vhost_net: fix high cpu load when sendmsg
> >> fails
> >>
> >> On Wed, Dec 09, 2020 at 07:48:24PM +0800, wangyunjian wrote:
> >>> From: Yunjian Wang <wangyunjian@huawei.com>
> >>>
> >>> Currently we break the loop and wake up the vhost_worker when
> >>> sendmsg fails. When the worker wakes up again, we'll meet the same
> >>> error. This will cause high CPU load. To fix this issue, we can skip
> >>> this description by ignoring the error.
> >>>
> >>> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> >>> ---
> >>>   drivers/vhost/net.c | 24 +++++-------------------
> >>>   1 file changed, 5 insertions(+), 19 deletions(-)
> >>>
> >>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >>> index 531a00d703cd..ac950b1120f5 100644
> >>> --- a/drivers/vhost/net.c
> >>> +++ b/drivers/vhost/net.c
> >>> @@ -829,14 +829,8 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >>>
> >>>   		/* TODO: Check specific error and bomb out unless ENOBUFS? */
> >>>   		err = sock->ops->sendmsg(sock, &msg, len);
> >>> -		if (unlikely(err < 0)) {
> >>> -			vhost_discard_vq_desc(vq, 1);
> >>> -			vhost_net_enable_vq(net, vq);
> >>> -			break;
> >>> -		}
> >>> -		if (err != len)
> >>> -			pr_debug("Truncated TX packet: len %d != %zd\n",
> >>> -				 err, len);
> >>> +		if (unlikely(err < 0 || err != len))
> >>> +			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
> >>>   done:
> >>>   		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
> >>>   		vq->heads[nvq->done_idx].len = 0;
> >> One of the reasons for sendmsg to fail is ENOBUFS.
> >> In that case for sure we don't want to drop packet.
> > Now the function tap_sendmsg()/tun_sendmsg() don't return ENOBUFS.
> 
> 
> I think not, it can happen if we exceeds sndbuf. E.g see tun_alloc_skb().

The commit 'net: add alloc_skb_with_frags() helper' changed the return
value of sock_alloc_send_pskb() from -ENOBUFS to -EAGAIN when sndbuf is
exceeded, so the return value of tun_alloc_skb() has changed as well.

We could keep the packet instead of dropping it when sendmsg fails with
EAGAIN. How about this?

Thanks


Jason Wang Dec. 14, 2020, 3:13 a.m. UTC | #5

On 2020/12/11 3:37 PM, wangyunjian wrote:
>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang@redhat.com]
>> Sent: Friday, December 11, 2020 10:53 AM
>> To: wangyunjian <wangyunjian@huawei.com>; Michael S. Tsirkin
>> <mst@redhat.com>
>> Cc: virtualization@lists.linux-foundation.org; netdev@vger.kernel.org; Lilijun
>> (Jerry) <jerry.lilijun@huawei.com>; chenchanghu <chenchanghu@huawei.com>;
>> xudingke <xudingke@huawei.com>
>> Subject: Re: [PATCH net] vhost_net: fix high cpu load when sendmsg fails
>>
>>
>> On 2020/12/9 9:27 PM, wangyunjian wrote:
>>>> -----Original Message-----
>>>> From: Michael S. Tsirkin [mailto:mst@redhat.com]
>>>> Sent: Wednesday, December 9, 2020 8:50 PM
>>>> To: wangyunjian <wangyunjian@huawei.com>
>>>> Cc: jasowang@redhat.com; virtualization@lists.linux-foundation.org;
>>>> netdev@vger.kernel.org; Lilijun (Jerry) <jerry.lilijun@huawei.com>;
>>>> chenchanghu <chenchanghu@huawei.com>; xudingke
>> <xudingke@huawei.com>
>>>> Subject: Re: [PATCH net] vhost_net: fix high cpu load when sendmsg
>>>> fails
>>>>
>>>> On Wed, Dec 09, 2020 at 07:48:24PM +0800, wangyunjian wrote:
>>>>> From: Yunjian Wang <wangyunjian@huawei.com>
>>>>>
>>>>> Currently we break the loop and wake up the vhost_worker when
>>>>> sendmsg fails. When the worker wakes up again, we'll meet the same
>>>>> error. This will cause high CPU load. To fix this issue, we can skip
>>>>> this description by ignoring the error.
>>>>>
>>>>> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
>>>>> ---
>>>>>    drivers/vhost/net.c | 24 +++++-------------------
>>>>>    1 file changed, 5 insertions(+), 19 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>>>> index 531a00d703cd..ac950b1120f5 100644
>>>>> --- a/drivers/vhost/net.c
>>>>> +++ b/drivers/vhost/net.c
>>>>> @@ -829,14 +829,8 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>>>>>
>>>>>    		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>>>>>    		err = sock->ops->sendmsg(sock, &msg, len);
>>>>> -		if (unlikely(err < 0)) {
>>>>> -			vhost_discard_vq_desc(vq, 1);
>>>>> -			vhost_net_enable_vq(net, vq);
>>>>> -			break;
>>>>> -		}
>>>>> -		if (err != len)
>>>>> -			pr_debug("Truncated TX packet: len %d != %zd\n",
>>>>> -				 err, len);
>>>>> +		if (unlikely(err < 0 || err != len))
>>>>> +			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
>>>>>    done:
>>>>>    		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
>>>>>    		vq->heads[nvq->done_idx].len = 0;
>>>> One of the reasons for sendmsg to fail is ENOBUFS.
>>>> In that case for sure we don't want to drop packet.
>>> Now the function tap_sendmsg()/tun_sendmsg() don't return ENOBUFS.
>>
>> I think not, it can happen if we exceeds sndbuf. E.g see tun_alloc_skb().
> This patch 'net: add alloc_skb_with_frags() helper' modifys the return value
> of sock_alloc_send_pskb() from -ENOBUFS to -EAGAIN when we exceeds sndbuf.
> So the return value of tun_alloc_skb has been changed.


Ok.


>
> We don't drop packet if the reasons for sendmsg to fail is EAGAIN.
> How about this?


It should work.

Btw, the patch doesn't add the head to the used ring. This may confuse
the driver.

Thanks


