diff mbox series

[net,v2,2/2] vhost_net: fix high cpu load when sendmsg fails

Message ID 6b4c5fff8705dc4b5b6a25a45c50f36349350c73.1608065644.git.wangyunjian@huawei.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series: fixes for vhost_net

Checks

Context                         Check    Description
netdev/cover_letter             success  Link
netdev/fixes_present            success  Link
netdev/patch_count              success  Link
netdev/tree_selection           success  Clearly marked for net
netdev/subject_prefix           success  Link
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success  Link
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes             success  Link
netdev/checkpatch               warning  CHECK: braces {} should be used on all arms of this statement; WARNING: line length of 94 exceeds 80 columns
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/header_inline            success  Link
netdev/stable                   success  Stable not CCed

Commit Message

wangyunjian Dec. 16, 2020, 8:20 a.m. UTC
From: Yunjian Wang <wangyunjian@huawei.com>

Currently we break the loop and wake up the vhost_worker when
sendmsg fails. When the worker wakes up again, we'll meet the
same error. This will cause high CPU load. To fix this issue,
we can skip this description by ignoring the error. When we
exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
the case we don't skip the description and don't drop packet.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
---
 drivers/vhost/net.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

Comments

Michael S. Tsirkin Dec. 16, 2020, 9:23 a.m. UTC | #1
On Wed, Dec 16, 2020 at 04:20:37PM +0800, wangyunjian wrote:
> From: Yunjian Wang <wangyunjian@huawei.com>
> 
> Currently we break the loop and wake up the vhost_worker when
> sendmsg fails. When the worker wakes up again, we'll meet the
> same error. This will cause high CPU load. To fix this issue,
> we can skip this description by ignoring the error. When we
> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
> the case we don't skip the description and don't drop packet.

Question: with this patch, what happens if sendmsg is interrupted by a signal?


> 
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> ---
>  drivers/vhost/net.c | 21 +++++++++------------
>  1 file changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index c8784dfafdd7..3d33f3183abe 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -827,16 +827,13 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  				msg.msg_flags &= ~MSG_MORE;
>  		}
>  
> -		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(sock, &msg, len);
> -		if (unlikely(err < 0)) {
> +		if (unlikely(err == -EAGAIN)) {
>  			vhost_discard_vq_desc(vq, 1);
>  			vhost_net_enable_vq(net, vq);
>  			break;
> -		}
> -		if (err != len)
> -			pr_debug("Truncated TX packet: len %d != %zd\n",
> -				 err, len);
> +		} else if (unlikely(err != len))
> +			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
>  done:
>  		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
>  		vq->heads[nvq->done_idx].len = 0;
> @@ -922,7 +919,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>  			msg.msg_flags &= ~MSG_MORE;
>  		}
>  
> -		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(sock, &msg, len);
>  		if (unlikely(err < 0)) {
>  			if (zcopy_used) {
> @@ -931,13 +927,14 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>  				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
>  					% UIO_MAXIOV;
>  			}
> -			vhost_discard_vq_desc(vq, 1);
> -			vhost_net_enable_vq(net, vq);
> -			break;
> +			if (err == -EAGAIN) {
> +				vhost_discard_vq_desc(vq, 1);
> +				vhost_net_enable_vq(net, vq);
> +				break;
> +			}
>  		}
>  		if (err != len)
> -			pr_debug("Truncated TX packet: "
> -				 " len %d != %zd\n", err, len);
> +			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);

I'd rather make the pr_debug -> vq_err a separate change, with proper
commit log describing motivation.


>  		if (!zcopy_used)
>  			vhost_add_used_and_signal(&net->dev, vq, head, 0);
>  		else
> -- 
> 2.23.0
wangyunjian Dec. 17, 2020, 2:38 a.m. UTC | #2
> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Wednesday, December 16, 2020 5:23 PM
> To: wangyunjian <wangyunjian@huawei.com>
> Cc: netdev@vger.kernel.org; jasowang@redhat.com;
> willemdebruijn.kernel@gmail.com; virtualization@lists.linux-foundation.org;
> Lilijun (Jerry) <jerry.lilijun@huawei.com>; chenchanghu
> <chenchanghu@huawei.com>; xudingke <xudingke@huawei.com>; huangbin (J)
> <brian.huangbin@huawei.com>
> Subject: Re: [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails
> 
> On Wed, Dec 16, 2020 at 04:20:37PM +0800, wangyunjian wrote:
> > From: Yunjian Wang <wangyunjian@huawei.com>
> >
> > Currently we break the loop and wake up the vhost_worker when sendmsg
> > fails. When the worker wakes up again, we'll meet the same error. This
> > will cause high CPU load. To fix this issue, we can skip this
> > description by ignoring the error. When we exceeds sndbuf, the return
> > value of sendmsg is -EAGAIN. In the case we don't skip the description
> > and don't drop packet.
> 
> Question: with this patch, what happens if sendmsg is interrupted by a signal?

The descriptors are consumed as normal. However, the packet is discarded.
Could you explain the specific scenario?

> 
> 
> >
> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > ---
> >  drivers/vhost/net.c | 21 +++++++++------------
> >  1 file changed, 9 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index c8784dfafdd7..3d33f3183abe 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -827,16 +827,13 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >  				msg.msg_flags &= ~MSG_MORE;
> >  		}
> >
> > -		/* TODO: Check specific error and bomb out unless ENOBUFS? */
> >  		err = sock->ops->sendmsg(sock, &msg, len);
> > -		if (unlikely(err < 0)) {
> > +		if (unlikely(err == -EAGAIN)) {
> >  			vhost_discard_vq_desc(vq, 1);
> >  			vhost_net_enable_vq(net, vq);
> >  			break;
> > -		}
> > -		if (err != len)
> > -			pr_debug("Truncated TX packet: len %d != %zd\n",
> > -				 err, len);
> > +		} else if (unlikely(err != len))
> > +			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
> >  done:
> >  		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
> >  		vq->heads[nvq->done_idx].len = 0;
> > @@ -922,7 +919,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> >  			msg.msg_flags &= ~MSG_MORE;
> >  		}
> >
> > -		/* TODO: Check specific error and bomb out unless ENOBUFS? */
> >  		err = sock->ops->sendmsg(sock, &msg, len);
> >  		if (unlikely(err < 0)) {
> >  			if (zcopy_used) {
> > @@ -931,13 +927,14 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> >  				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
> >  					% UIO_MAXIOV;
> >  			}
> > -			vhost_discard_vq_desc(vq, 1);
> > -			vhost_net_enable_vq(net, vq);
> > -			break;
> > +			if (err == -EAGAIN) {
> > +				vhost_discard_vq_desc(vq, 1);
> > +				vhost_net_enable_vq(net, vq);
> > +				break;
> > +			}
> >  		}
> >  		if (err != len)
> > -			pr_debug("Truncated TX packet: "
> > -				 " len %d != %zd\n", err, len);
> > +			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
> 
> I'd rather make the pr_debug -> vq_err a separate change, with proper commit
> log describing motivation.

This log was originally triggered when packets were truncated. But after the
modification of this patch, other error scenarios will also trigger this log.
That's why I modified the content and level of this log together.
Now, should I just change the content of this patch?

Thanks

> 
> 
> >  		if (!zcopy_used)
> >  			vhost_add_used_and_signal(&net->dev, vq, head, 0);
> >  		else
> > --
> > 2.23.0
Jason Wang Dec. 17, 2020, 3:19 a.m. UTC | #3
On 2020/12/16 5:23 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 16, 2020 at 04:20:37PM +0800, wangyunjian wrote:
>> From: Yunjian Wang<wangyunjian@huawei.com>
>>
>> Currently we break the loop and wake up the vhost_worker when
>> sendmsg fails. When the worker wakes up again, we'll meet the
>> same error. This will cause high CPU load. To fix this issue,
>> we can skip this description by ignoring the error. When we
>> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
>> the case we don't skip the description and don't drop packet.
> Question: with this patch, what happens if sendmsg is interrupted by a signal?


Since we use MSG_DONTWAIT, I don't think we need to care about signals.
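
To illustrate the MSG_DONTWAIT point, here is a minimal userspace sketch (not vhost code). It assumes a Linux AF_UNIX stream socketpair as a stand-in for the tun/tap socket, and the helper name errno_when_send_buffer_full() is hypothetical. A non-blocking send fails immediately when the buffer is full instead of sleeping, so there is no sleep for a signal to interrupt:

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep sending on one end of a socketpair with
 * MSG_DONTWAIT while nobody reads the other end, and return the errno
 * that finally stops the loop.
 */
static int errno_when_send_buffer_full(void)
{
	int sv[2];
	char buf[4096];
	int saved = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return errno;

	memset(buf, 0, sizeof(buf));
	for (;;) {
		/* Never blocks: either queues some data or fails at once. */
		ssize_t n = send(sv[0], buf, sizeof(buf), MSG_DONTWAIT);

		if (n < 0) {
			saved = errno;
			break;
		}
	}

	close(sv[0]);
	close(sv[1]);
	return saved;
}
```

On Linux this is expected to return EAGAIN (the same value as EWOULDBLOCK there), which corresponds to the sndbuf case described in the commit message.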

Thanks


>
>
Willem de Bruijn Dec. 21, 2020, 11:07 p.m. UTC | #4
On Wed, Dec 16, 2020 at 3:20 AM wangyunjian <wangyunjian@huawei.com> wrote:
>
> From: Yunjian Wang <wangyunjian@huawei.com>
>
> Currently we break the loop and wake up the vhost_worker when
> sendmsg fails. When the worker wakes up again, we'll meet the
> same error.

The patch is based on the assumption that such error cases always
return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?

> This will cause high CPU load. To fix this issue,
> we can skip this description by ignoring the error. When we
> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
> the case we don't skip the description and don't drop packet.

the -> that

here and above: description -> descriptor

Perhaps slightly revise to more explicitly state that

1. in the case of persistent failure (i.e., bad packet), the driver
drops the packet
2. in the case of transient failure (e.g., memory pressure) the driver
schedules the worker to try again later
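
The two cases above can be sketched as a small classifier. This is only an illustration in userspace C: the enum, the function name classify_sendmsg_result() and the exact set of errno values treated as transient are assumptions (ENOBUFS is included only because the removed TODO comment mentions it), not something drivers/vhost/net.c defines.

```c
#include <errno.h>

/* Hypothetical names, for illustration only. */
enum tx_action {
	TX_DONE,         /* sendmsg accepted the packet */
	TX_RETRY_LATER,  /* transient failure: keep the descriptor, retry */
	TX_DROP          /* persistent failure: consume descriptor, drop packet */
};

/* err follows the kernel convention: bytes sent, or a negative errno. */
static enum tx_action classify_sendmsg_result(int err)
{
	if (err >= 0)
		return TX_DONE;

	switch (err) {
	case -EAGAIN:   /* e.g. sndbuf exceeded */
	case -ENOMEM:   /* e.g. allocation failure under memory pressure */
	case -ENOBUFS:  /* assumption: treated as transient as well */
		return TX_RETRY_LATER;
	default:
		return TX_DROP;  /* e.g. -EINVAL for a bad packet */
	}
}
```

Keeping the policy in one function makes it easy to apply the same rule in both handle_tx_copy() and handle_tx_zerocopy(), should the patch go that way.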


> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> ---
>  drivers/vhost/net.c | 21 +++++++++------------
>  1 file changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index c8784dfafdd7..3d33f3183abe 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -827,16 +827,13 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>                                 msg.msg_flags &= ~MSG_MORE;
>                 }
>
> -               /* TODO: Check specific error and bomb out unless ENOBUFS? */
>                 err = sock->ops->sendmsg(sock, &msg, len);
> -               if (unlikely(err < 0)) {
> +               if (unlikely(err == -EAGAIN)) {
>                         vhost_discard_vq_desc(vq, 1);
>                         vhost_net_enable_vq(net, vq);
>                         break;
> -               }
> -               if (err != len)
> -                       pr_debug("Truncated TX packet: len %d != %zd\n",
> -                                err, len);
> +               } else if (unlikely(err != len))
> +                       vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);

sending -> send

Even though vq_err is a wrapper around pr_debug, I agree with Michael
that such a change should be a separate patch to net-next and does not
belong in a fix.

More importantly, the error message is now the same for persistent
errors and for truncated packets. But on truncation the packet was
sent, so that is not entirely correct.
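
One way to keep the two situations apart when choosing a log message is sketched below. It is a userspace illustration under assumptions: the enum and the function tx_log_kind_for() are hypothetical names, and err is taken to be either the number of bytes sent or a negative errno, as returned by sendmsg in the diff.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum tx_log_kind {
	TX_LOG_NONE,       /* whole packet sent, nothing to report */
	TX_LOG_TRUNCATED,  /* packet was sent, but shorter than requested */
	TX_LOG_FAILED      /* negative errno: nothing was sent */
};

static enum tx_log_kind tx_log_kind_for(int err, size_t len)
{
	if (err < 0)
		return TX_LOG_FAILED;
	if ((size_t)err != len)
		return TX_LOG_TRUNCATED;
	return TX_LOG_NONE;
}
```

With this split, a "truncated" message is only printed when data actually went out, and a "failed to send" message only when it did not.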

>  done:
>                 vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
>                 vq->heads[nvq->done_idx].len = 0;
> @@ -922,7 +919,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>                         msg.msg_flags &= ~MSG_MORE;
>                 }
>
> -               /* TODO: Check specific error and bomb out unless ENOBUFS? */
>                 err = sock->ops->sendmsg(sock, &msg, len);
>                 if (unlikely(err < 0)) {
>                         if (zcopy_used) {
> @@ -931,13 +927,14 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>                                 nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
>                                         % UIO_MAXIOV;
>                         }
> -                       vhost_discard_vq_desc(vq, 1);
> -                       vhost_net_enable_vq(net, vq);
> -                       break;
> +                       if (err == -EAGAIN) {
> +                               vhost_discard_vq_desc(vq, 1);
> +                               vhost_net_enable_vq(net, vq);
> +                               break;
> +                       }
>                 }
>                 if (err != len)
> -                       pr_debug("Truncated TX packet: "
> -                                " len %d != %zd\n", err, len);
> +                       vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
>                 if (!zcopy_used)
>                         vhost_add_used_and_signal(&net->dev, vq, head, 0);
>                 else
> --
> 2.23.0
>
Jason Wang Dec. 22, 2020, 4:41 a.m. UTC | #5
On 2020/12/22 7:07 AM, Willem de Bruijn wrote:
> On Wed, Dec 16, 2020 at 3:20 AM wangyunjian<wangyunjian@huawei.com>  wrote:
>> From: Yunjian Wang<wangyunjian@huawei.com>
>>
>> Currently we break the loop and wake up the vhost_worker when
>> sendmsg fails. When the worker wakes up again, we'll meet the
>> same error.
> The patch is based on the assumption that such error cases always
> return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?
>
>> This will cause high CPU load. To fix this issue,
>> we can skip this description by ignoring the error. When we
>> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
>> the case we don't skip the description and don't drop packet.
> the -> that
>
> here and above: description -> descriptor
>
> Perhaps slightly revise to more explicitly state that
>
> 1. in the case of persistent failure (i.e., bad packet), the driver
> drops the packet
> 2. in the case of transient failure (e.g,. memory pressure) the driver
> schedules the worker to try again later


If we want to go this way, we need a better time to wake up the
worker. Otherwise it just puts more stress on the CPU, which is what
this patch tries to avoid.

Thanks


>
>
Willem de Bruijn Dec. 22, 2020, 2:24 p.m. UTC | #6
On Mon, Dec 21, 2020 at 11:41 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/12/22 7:07 AM, Willem de Bruijn wrote:
> > On Wed, Dec 16, 2020 at 3:20 AM wangyunjian<wangyunjian@huawei.com>  wrote:
> >> From: Yunjian Wang<wangyunjian@huawei.com>
> >>
> >> Currently we break the loop and wake up the vhost_worker when
> >> sendmsg fails. When the worker wakes up again, we'll meet the
> >> same error.
> > The patch is based on the assumption that such error cases always
> > return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?
> >
> >> This will cause high CPU load. To fix this issue,
> >> we can skip this description by ignoring the error. When we
> >> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
> >> the case we don't skip the description and don't drop packet.
> > the -> that
> >
> > here and above: description -> descriptor
> >
> > Perhaps slightly revise to more explicitly state that
> >
> > 1. in the case of persistent failure (i.e., bad packet), the driver
> > drops the packet
> > 2. in the case of transient failure (e.g,. memory pressure) the driver
> > schedules the worker to try again later
>
>
> If we want to go with this way, we need a better time to wakeup the
> worker. Otherwise it just produces more stress on the cpu that is what
> this patch tries to avoid.

Perhaps I misunderstood the purpose of the patch: is it to drop
everything, regardless of transient or persistent failure, until the
ring runs out of descriptors?

I can understand both a blocking and a drop strategy during memory
pressure. But a partial drop strategy until exceeding ring capacity
seems like a peculiar hybrid?
wangyunjian Dec. 23, 2020, 2:46 a.m. UTC | #7
> -----Original Message-----
> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Tuesday, December 22, 2020 12:41 PM
> To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>; wangyunjian
> <wangyunjian@huawei.com>
> Cc: Network Development <netdev@vger.kernel.org>; Michael S. Tsirkin
> <mst@redhat.com>; virtualization@lists.linux-foundation.org; Lilijun (Jerry)
> <jerry.lilijun@huawei.com>; chenchanghu <chenchanghu@huawei.com>;
> xudingke <xudingke@huawei.com>; huangbin (J)
> <brian.huangbin@huawei.com>
> Subject: Re: [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails
> 
> 
> On 2020/12/22 7:07 AM, Willem de Bruijn wrote:
> > On Wed, Dec 16, 2020 at 3:20 AM wangyunjian<wangyunjian@huawei.com>
> wrote:
> >> From: Yunjian Wang<wangyunjian@huawei.com>
> >>
> >> Currently we break the loop and wake up the vhost_worker when sendmsg
> >> fails. When the worker wakes up again, we'll meet the same error.
> > The patch is based on the assumption that such error cases always
> > return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?
> >
> >> This will cause high CPU load. To fix this issue, we can skip this
> >> description by ignoring the error. When we exceeds sndbuf, the return
> >> value of sendmsg is -EAGAIN. In the case we don't skip the
> >> description and don't drop packet.
> > the -> that
> >
> > here and above: description -> descriptor
> >
> > Perhaps slightly revise to more explicitly state that
> >
> > 1. in the case of persistent failure (i.e., bad packet), the driver
> > drops the packet
> > 2. in the case of transient failure (e.g,. memory pressure) the driver
> > schedules the worker to try again later
> 
> 
> If we want to go with this way, we need a better time to wakeup the worker.
> Otherwise it just produces more stress on the cpu that is what this patch tries
> to avoid.

The problem was initially discovered when a VM sent an abnormal packet,
which left the VM unable to send any more packets. After commit
feb8892cb441c7 ("vhost_net: conditionally enable tx polling"), there have
also been high CPU consumption issues.

It is the first problem that I am actually more concerned with and want
to solve.

Thanks

> 
> Thanks
> 
> 
> >
> >
Jason Wang Dec. 23, 2020, 2:53 a.m. UTC | #8
On 2020/12/22 10:24 PM, Willem de Bruijn wrote:
> On Mon, Dec 21, 2020 at 11:41 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2020/12/22 7:07 AM, Willem de Bruijn wrote:
>>> On Wed, Dec 16, 2020 at 3:20 AM wangyunjian<wangyunjian@huawei.com>  wrote:
>>>> From: Yunjian Wang<wangyunjian@huawei.com>
>>>>
>>>> Currently we break the loop and wake up the vhost_worker when
>>>> sendmsg fails. When the worker wakes up again, we'll meet the
>>>> same error.
>>> The patch is based on the assumption that such error cases always
>>> return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?
>>>
>>>> This will cause high CPU load. To fix this issue,
>>>> we can skip this description by ignoring the error. When we
>>>> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
>>>> the case we don't skip the description and don't drop packet.
>>> the -> that
>>>
>>> here and above: description -> descriptor
>>>
>>> Perhaps slightly revise to more explicitly state that
>>>
>>> 1. in the case of persistent failure (i.e., bad packet), the driver
>>> drops the packet
>>> 2. in the case of transient failure (e.g,. memory pressure) the driver
>>> schedules the worker to try again later
>>
>> If we want to go with this way, we need a better time to wakeup the
>> worker. Otherwise it just produces more stress on the cpu that is what
>> this patch tries to avoid.
> Perhaps I misunderstood the purpose of the patch: is it to drop
> everything, regardless of transient or persistent failure, until the
> ring runs out of descriptors?


My understanding is that the main motivation is to avoid high CPU
utilization when sendmsg() fails for guest-side reasons (e.g. a bad packet).


>
> I can understand both a blocking and drop strategy during memory
> pressure. But partial drop strategy until exceeding ring capacity
> seems like a peculiar hybrid?


Yes. So I wonder whether we can do better when we are under memory
pressure. E.g. can we let the socket wake us up instead of rescheduling
the workers here? At least in this case we know some memory might be freed?

Thanks


>
wangyunjian Dec. 23, 2020, 1:21 p.m. UTC | #9
> -----Original Message-----
> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Wednesday, December 23, 2020 10:54 AM
> To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Cc: wangyunjian <wangyunjian@huawei.com>; Network Development
> <netdev@vger.kernel.org>; Michael S. Tsirkin <mst@redhat.com>;
> virtualization@lists.linux-foundation.org; Lilijun (Jerry)
> <jerry.lilijun@huawei.com>; chenchanghu <chenchanghu@huawei.com>;
> xudingke <xudingke@huawei.com>; huangbin (J)
> <brian.huangbin@huawei.com>
> Subject: Re: [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails
> 
> 
> On 2020/12/22 10:24 PM, Willem de Bruijn wrote:
> > On Mon, Dec 21, 2020 at 11:41 PM Jason Wang <jasowang@redhat.com>
> wrote:
> >>
> >> On 2020/12/22 7:07 AM, Willem de Bruijn wrote:
> >>> On Wed, Dec 16, 2020 at 3:20 AM wangyunjian<wangyunjian@huawei.com>
> wrote:
> >>>> From: Yunjian Wang<wangyunjian@huawei.com>
> >>>>
> >>>> Currently we break the loop and wake up the vhost_worker when
> >>>> sendmsg fails. When the worker wakes up again, we'll meet the same
> >>>> error.
> >>> The patch is based on the assumption that such error cases always
> >>> return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?
> >>>
> >>>> This will cause high CPU load. To fix this issue, we can skip this
> >>>> description by ignoring the error. When we exceeds sndbuf, the
> >>>> return value of sendmsg is -EAGAIN. In the case we don't skip the
> >>>> description and don't drop packet.
> >>> the -> that
> >>>
> >>> here and above: description -> descriptor
> >>>
> >>> Perhaps slightly revise to more explicitly state that
> >>>
> >>> 1. in the case of persistent failure (i.e., bad packet), the driver
> >>> drops the packet
> >>> 2. in the case of transient failure (e.g,. memory pressure) the driver
> >>> schedules the worker to try again later
> >>
> >> If we want to go with this way, we need a better time to wakeup the
> >> worker. Otherwise it just produces more stress on the cpu that is
> >> what this patch tries to avoid.
> > Perhaps I misunderstood the purpose of the patch: is it to drop
> > everything, regardless of transient or persistent failure, until the
> > ring runs out of descriptors?
> 
> 
> My understanding is that the main motivation is to avoid high cpu utilization
> when sendmsg() fail due to guest reason (e.g bad packet).
> 

My main motivation is to avoid the tx queue getting stuck.

Should I describe it like this:
Currently the driver doesn't drop a packet which can't be sent by tun
(e.g. a bad packet). In this case, the driver will keep processing the
same packet, which leaves the tx queue stuck.

To fix this issue:
1. in the case of persistent failure (e.g. bad packet), the driver can skip
this descriptor by ignoring the error.
2. in the case of transient failure (e.g. -EAGAIN and -ENOMEM), the driver
schedules the worker to try again.
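
The effect of the proposed policy on a ring can be sketched with a small userspace simulation. Everything here is hypothetical: struct fake_pkt scripts the result that sendmsg would have returned for each descriptor, and process_tx() stands in for the handle_tx loop. It is not the vhost implementation.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one descriptor and its scripted sendmsg result. */
struct fake_pkt {
	int result;  /* bytes sent, or a negative errno */
};

/*
 * Walk the ring in order. Returns how many descriptors were consumed
 * (sent or dropped) before the loop stopped; *dropped counts the packets
 * that were skipped because of a persistent error.
 */
static size_t process_tx(const struct fake_pkt *ring, size_t n, size_t *dropped)
{
	size_t i;

	*dropped = 0;
	for (i = 0; i < n; i++) {
		int err = ring[i].result;

		/* Transient: leave this descriptor in place and retry later. */
		if (err == -EAGAIN || err == -ENOMEM)
			break;

		/* Persistent: consume the descriptor but drop the packet. */
		if (err < 0)
			(*dropped)++;
	}
	return i;
}
```

For a ring whose scripted results are { 100, -EINVAL, 100, -EAGAIN, 100 }, the loop consumes three descriptors, drops one packet, and stops at the fourth so it can be retried. The bad packet no longer blocks the packets behind it.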

Thanks

> 
> >
> > I can understand both a blocking and drop strategy during memory
> > pressure. But partial drop strategy until exceeding ring capacity
> > seems like a peculiar hybrid?
> 
> 
> Yes. So I wonder if we want to be do better when we are in the memory
> pressure. E.g can we let socket wake up us instead of rescheduling the
> workers here? At least in this case we know some memory might be freed?
> 
> Thanks
> 
> 
> >
Willem de Bruijn Dec. 23, 2020, 1:48 p.m. UTC | #10
On Wed, Dec 23, 2020 at 8:21 AM wangyunjian <wangyunjian@huawei.com> wrote:
>
> > -----Original Message-----
> > From: Jason Wang [mailto:jasowang@redhat.com]
> > Sent: Wednesday, December 23, 2020 10:54 AM
> > To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> > Cc: wangyunjian <wangyunjian@huawei.com>; Network Development
> > <netdev@vger.kernel.org>; Michael S. Tsirkin <mst@redhat.com>;
> > virtualization@lists.linux-foundation.org; Lilijun (Jerry)
> > <jerry.lilijun@huawei.com>; chenchanghu <chenchanghu@huawei.com>;
> > xudingke <xudingke@huawei.com>; huangbin (J)
> > <brian.huangbin@huawei.com>
> > Subject: Re: [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails
> >
> >
> > On 2020/12/22 10:24 PM, Willem de Bruijn wrote:
> > > On Mon, Dec 21, 2020 at 11:41 PM Jason Wang <jasowang@redhat.com>
> > wrote:
> > >>
> > >> On 2020/12/22 7:07 AM, Willem de Bruijn wrote:
> > >>> On Wed, Dec 16, 2020 at 3:20 AM wangyunjian<wangyunjian@huawei.com>
> > wrote:
> > >>>> From: Yunjian Wang<wangyunjian@huawei.com>
> > >>>>
> > >>>> Currently we break the loop and wake up the vhost_worker when
> > >>>> sendmsg fails. When the worker wakes up again, we'll meet the same
> > >>>> error.
> > >>> The patch is based on the assumption that such error cases always
> > >>> return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?
> > >>>
> > >>>> This will cause high CPU load. To fix this issue, we can skip this
> > >>>> description by ignoring the error. When we exceeds sndbuf, the
> > >>>> return value of sendmsg is -EAGAIN. In the case we don't skip the
> > >>>> description and don't drop packet.
> > >>> the -> that
> > >>>
> > >>> here and above: description -> descriptor
> > >>>
> > >>> Perhaps slightly revise to more explicitly state that
> > >>>
> > >>> 1. in the case of persistent failure (i.e., bad packet), the driver
> > >>> drops the packet
> > >>> 2. in the case of transient failure (e.g,. memory pressure) the driver
> > >>> schedules the worker to try again later
> > >>
> > >> If we want to go with this way, we need a better time to wakeup the
> > >> worker. Otherwise it just produces more stress on the cpu that is
> > >> what this patch tries to avoid.
> > > Perhaps I misunderstood the purpose of the patch: is it to drop
> > > everything, regardless of transient or persistent failure, until the
> > > ring runs out of descriptors?
> >
> >
> > My understanding is that the main motivation is to avoid high cpu utilization
> > when sendmsg() fail due to guest reason (e.g bad packet).
> >
>
> My main motivation is to avoid the tx queue stuck.
>
> Should I describe it like this:
> Currently the driver don't drop a packet which can't be send by tun
> (e.g bad packet). In this case, the driver will always process the
> same packet lead to the tx queue stuck.
>
> To fix this issue:
> 1. in the case of persistent failure (e.g bad packet), the driver can skip
> this descriptior by ignoring the error.
> 2. in the case of transient failure (e.g -EAGAIN and -ENOMEM), the driver
> schedules the worker to try again.

That sounds good to me, thanks.

> Thanks
>
> >
> > >
> > > I can understand both a blocking and drop strategy during memory
> > > pressure. But partial drop strategy until exceeding ring capacity
> > > seems like a peculiar hybrid?
> >
> >
> > Yes. So I wonder if we want to be do better when we are in the memory
> > pressure. E.g can we let socket wake up us instead of rescheduling the
> > workers here? At least in this case we know some memory might be freed?

I don't know whether a blocking or drop strategy is the better choice.
Either way, it probably deserves to be handled separately.

Patch

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index c8784dfafdd7..3d33f3183abe 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -827,16 +827,13 @@  static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 				msg.msg_flags &= ~MSG_MORE;
 		}
 
-		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
-		if (unlikely(err < 0)) {
+		if (unlikely(err == -EAGAIN)) {
 			vhost_discard_vq_desc(vq, 1);
 			vhost_net_enable_vq(net, vq);
 			break;
-		}
-		if (err != len)
-			pr_debug("Truncated TX packet: len %d != %zd\n",
-				 err, len);
+		} else if (unlikely(err != len))
+			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
 done:
 		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
 		vq->heads[nvq->done_idx].len = 0;
@@ -922,7 +919,6 @@  static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			msg.msg_flags &= ~MSG_MORE;
 		}
 
-		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
 		if (unlikely(err < 0)) {
 			if (zcopy_used) {
@@ -931,13 +927,14 @@  static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
 					% UIO_MAXIOV;
 			}
-			vhost_discard_vq_desc(vq, 1);
-			vhost_net_enable_vq(net, vq);
-			break;
+			if (err == -EAGAIN) {
+				vhost_discard_vq_desc(vq, 1);
+				vhost_net_enable_vq(net, vq);
+				break;
+			}
 		}
 		if (err != len)
-			pr_debug("Truncated TX packet: "
-				 " len %d != %zd\n", err, len);
+			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
 		if (!zcopy_used)
 			vhost_add_used_and_signal(&net->dev, vq, head, 0);
 		else