
[net-next,v7] vhost_net: avoid tx queue stuck when sendmsg fails

Message ID 1610685980-38608-1-git-send-email-wangyunjian@huawei.com
State Accepted
Delegated to: Netdev Maintainers
Series [net-next,v7] vhost_net: avoid tx queue stuck when sendmsg fails

Checks

Context                         Check    Description
netdev/cover_letter             success  Link
netdev/fixes_present            success  Link
netdev/patch_count              success  Link
netdev/tree_selection           success  Clearly marked for net-next
netdev/subject_prefix           success  Link
netdev/cc_maintainers           warning  1 maintainers not CCed: kvm@vger.kernel.org
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success  Link
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes             success  Link
netdev/checkpatch               warning  WARNING: line length of 82 exceeds 80 columns
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/header_inline            success  Link
netdev/stable                   success  Stable not CCed

Commit Message

wangyunjian Jan. 15, 2021, 4:46 a.m. UTC
From: Yunjian Wang <wangyunjian@huawei.com>

Currently the driver does not drop a packet that tun cannot send
(e.g. a bad packet). In that case the driver keeps reprocessing the
same packet, which leaves the tx queue stuck.

To fix this issue:
1. On a persistent failure (e.g. a bad packet), the driver skips
   the descriptor by ignoring the error.
2. On a transient failure (e.g. -ENOBUFS, -EAGAIN or -ENOMEM),
   the driver schedules the worker to try again.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
---
v7:
   * code rebase
v6:
   * update code styles and commit log
---
 drivers/vhost/net.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)
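The retry-versus-skip policy described above can be modeled outside the kernel. The following is a hypothetical user-space sketch, not vhost code: `tx_pass()`, `tx_drain()` and the `results` array are invented for illustration, with `results[i]` standing in for what `sock->ops->sendmsg()` would return for descriptor i. Only the `-EAGAIN`/`-ENOMEM`/`-ENOBUFS` test is taken from the patch. Transient errors never clear in this model, so it only shows that such a descriptor is kept for a retry rather than dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same test the patch adds: only these errors are treated as transient. */
static bool is_transient_tx_error(int err)
{
	return err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS;
}

/*
 * One worker run over a ring of n descriptors, starting at 'head'.
 * results[i] stands in for what sendmsg() returns for descriptor i.
 * Returns the head after the run, i.e. the next descriptor to send.
 */
static int tx_pass(const int *results, int n, int head, bool patched)
{
	while (head < n) {
		int err = results[head];

		/* Old code: any error. New code: transient errors only.
		 * Put the descriptor back and stop; the worker retries it
		 * on a later run. */
		if (err < 0 && (!patched || is_transient_tx_error(err)))
			break;

		/* Sent (possibly truncated), or, with the patch, a
		 * persistent error that is ignored: consume it. */
		head++;
	}
	return head;
}

/*
 * Reschedule the worker up to max_passes times and return how many
 * descriptors were consumed in total.
 */
static int tx_drain(const int *results, int n, bool patched, int max_passes)
{
	int head = 0;

	assert(n >= 0);
	for (int pass = 0; pass < max_passes && head < n; pass++)
		head = tx_pass(results, n, head, patched);
	return head;
}
```

For example, with `int ring[] = {64, -EINVAL, 64, 64};`, `tx_drain(ring, 4, false, 10)` returns 1 because every run stops at the bad descriptor, while `tx_drain(ring, 4, true, 10)` returns 4. With `-ENOBUFS` in the second slot instead, both policies return 1 and leave that descriptor in place for a retry.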

Comments

Jason Wang Jan. 15, 2021, 6:05 a.m. UTC | #1
On 2021/1/15 12:46 PM, wangyunjian wrote:
> From: Yunjian Wang <wangyunjian@huawei.com>
>
> Currently the driver doesn't drop a packet which can't be sent by tun
> (e.g bad packet). In this case, the driver will always process the
> same packet lead to the tx queue stuck.
>
> To fix this issue:
> 1. in the case of persistent failure (e.g bad packet), the driver
>     can skip this descriptor by ignoring the error.
> 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM),
>     the driver schedules the worker to try again.
>
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>


Acked-by: Jason Wang <jasowang@redhat.com>


Willem de Bruijn Jan. 15, 2021, 6:15 p.m. UTC | #2
On Fri, Jan 15, 2021 at 1:12 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/1/15 12:46 PM, wangyunjian wrote:
> > From: Yunjian Wang <wangyunjian@huawei.com>
> >
> > Currently the driver doesn't drop a packet which can't be sent by tun
> > (e.g bad packet). In this case, the driver will always process the
> > same packet lead to the tx queue stuck.
> >
> > To fix this issue:
> > 1. in the case of persistent failure (e.g bad packet), the driver
> >     can skip this descriptor by ignoring the error.
> > 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM),
> >     the driver schedules the worker to try again.
> >
> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
>
>
> Acked-by: Jason Wang <jasowang@redhat.com>

Acked-by: Willem de Bruijn <willemb@google.com>
Jakub Kicinski Jan. 18, 2021, 10:33 p.m. UTC | #3
On Fri, 15 Jan 2021 12:46:20 +0800 wangyunjian wrote:
> From: Yunjian Wang <wangyunjian@huawei.com>
> 
> Currently the driver doesn't drop a packet which can't be sent by tun
> (e.g bad packet). In this case, the driver will always process the
> same packet lead to the tx queue stuck.
> 
> To fix this issue:
> 1. in the case of persistent failure (e.g bad packet), the driver
>    can skip this descriptor by ignoring the error.
> 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM),
>    the driver schedules the worker to try again.
> 
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

Michael, LMK if you want to have a closer look otherwise I'll apply
tomorrow.
Michael S. Tsirkin Jan. 19, 2021, 9:55 a.m. UTC | #4
On Fri, Jan 15, 2021 at 12:46:20PM +0800, wangyunjian wrote:
> From: Yunjian Wang <wangyunjian@huawei.com>
> 
> Currently the driver doesn't drop a packet which can't be sent by tun
> (e.g bad packet). In this case, the driver will always process the
> same packet lead to the tx queue stuck.
> 
> To fix this issue:
> 1. in the case of persistent failure (e.g bad packet), the driver
>    can skip this descriptor by ignoring the error.
> 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM),
>    the driver schedules the worker to try again.
> 
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

Michael S. Tsirkin Jan. 19, 2021, 9:56 a.m. UTC | #5
On Mon, Jan 18, 2021 at 02:33:29PM -0800, Jakub Kicinski wrote:
> On Fri, 15 Jan 2021 12:46:20 +0800 wangyunjian wrote:
> > From: Yunjian Wang <wangyunjian@huawei.com>
> > 
> > Currently the driver doesn't drop a packet which can't be sent by tun
> > (e.g bad packet). In this case, the driver will always process the
> > same packet lead to the tx queue stuck.
> > 
> > To fix this issue:
> > 1. in the case of persistent failure (e.g bad packet), the driver
> >    can skip this descriptor by ignoring the error.
> > 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM),
> >    the driver schedules the worker to try again.
> > 
> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> 
> Michael, LMK if you want to have a closer look otherwise I'll apply
> tomorrow.

Thanks for the reminder. Acked.
Jakub Kicinski Jan. 19, 2021, 7:14 p.m. UTC | #6
On Tue, 19 Jan 2021 04:56:59 -0500 Michael S. Tsirkin wrote:
> On Mon, Jan 18, 2021 at 02:33:29PM -0800, Jakub Kicinski wrote:
> > On Fri, 15 Jan 2021 12:46:20 +0800 wangyunjian wrote:  
> > > From: Yunjian Wang <wangyunjian@huawei.com>
> > > 
> > > Currently the driver doesn't drop a packet which can't be sent by tun
> > > (e.g bad packet). In this case, the driver will always process the
> > > same packet lead to the tx queue stuck.
> > > 
> > > To fix this issue:
> > > 1. in the case of persistent failure (e.g bad packet), the driver
> > >    can skip this descriptor by ignoring the error.
> > > 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM),
> > >    the driver schedules the worker to try again.
> > > 
> > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>  
> > 
> > Michael, LMK if you want to have a closer look otherwise I'll apply
> > tomorrow.  
> 
> Thanks for the reminder. Acked.

Applied, thanks everyone!

Patch

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 3b744031ec8f..df82b124170e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -828,14 +828,15 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 				msg.msg_flags &= ~MSG_MORE;
 		}
 
-		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
 		if (unlikely(err < 0)) {
-			vhost_discard_vq_desc(vq, 1);
-			vhost_net_enable_vq(net, vq);
-			break;
-		}
-		if (err != len)
+			if (err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS) {
+				vhost_discard_vq_desc(vq, 1);
+				vhost_net_enable_vq(net, vq);
+				break;
+			}
+			pr_debug("Fail to send packet: err %d", err);
+		} else if (unlikely(err != len))
 			pr_debug("Truncated TX packet: len %d != %zd\n",
 				 err, len);
 done:
@@ -924,7 +925,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			msg.msg_flags &= ~MSG_MORE;
 		}
 
-		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
 		if (unlikely(err < 0)) {
 			if (zcopy_used) {
@@ -933,11 +933,13 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
 					% UIO_MAXIOV;
 			}
-			vhost_discard_vq_desc(vq, 1);
-			vhost_net_enable_vq(net, vq);
-			break;
-		}
-		if (err != len)
+			if (err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS) {
+				vhost_discard_vq_desc(vq, 1);
+				vhost_net_enable_vq(net, vq);
+				break;
+			}
+			pr_debug("Fail to send packet: err %d", err);
+		} else if (unlikely(err != len))
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
 		if (!zcopy_used)
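The zero-copy hunk keeps, as context, the rollback of `nvq->upend_idx` when a zero-copy send fails: `((unsigned)nvq->upend_idx - 1) % UIO_MAXIOV`. The user-space sketch below isolates that index arithmetic to show why the cast matters. `ring_idx_prev()` and `ring_idx_prev_signed()` are hypothetical helpers, and `UIO_MAXIOV` is defined locally to the value Linux uses (1024).

```c
#include <assert.h>

/* Local stand-in for the kernel constant (1024 in Linux). */
#define UIO_MAXIOV 1024

/*
 * Step a ring index back by one slot with the expression used in
 * handle_tx_zerocopy(). (unsigned)0 - 1 wraps to UINT_MAX; because
 * UIO_MAXIOV is a power of two no larger than UINT_MAX + 1, it divides
 * UINT_MAX + 1, so the remainder is UIO_MAXIOV - 1 and index 0 wraps
 * to the last slot.
 */
static int ring_idx_prev(int idx)
{
	assert(idx >= 0 && idx < UIO_MAXIOV);
	return ((unsigned)idx - 1) % UIO_MAXIOV;
}

/*
 * The same step without the cast, for comparison. C's % truncates
 * toward zero, so 0 steps back to -1, which is not a valid ring slot.
 */
static int ring_idx_prev_signed(int idx)
{
	return (idx - 1) % UIO_MAXIOV;
}
```

Both helpers agree for nonzero indices (for example, 5 steps back to 4); they differ only at 0, where the unsigned form yields 1023 and the signed form yields -1.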