diff mbox series

[1/2,V3,net] net: mana: Fix MANA VF unload when host is unresponsive

Message ID 1687771137-26911-1-git-send-email-schakrabarti@linux.microsoft.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [1/2,V3,net] net: mana: Fix MANA VF unload when host is unresponsive | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers fail 2 blamed authors not CCed: stephen@networkplumber.org shacharr@microsoft.com; 2 maintainers not CCed: stephen@networkplumber.org shacharr@microsoft.com
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes fail Problems with Fixes tag: 1
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch warning WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: 2768a5ea2083 ("net: mana: Fix MANA VF unload when host is unresponsive")'
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Souradeep Chakrabarti June 26, 2023, 9:18 a.m. UTC
From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>

This patch addresses the VF unload issue, where mana_dealloc_queues()
gets stuck in infinite while loop, because of host unresponsiveness.
It adds a timeout in the while loop, to fix it.

Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
Microsoft Azure Network Adapter)
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
---
V2 -> V3:
* Splitted the patch in two parts.
* Removed the unnecessary braces from mana_dealloc_queues().
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Praveen Kumar June 26, 2023, 2:20 p.m. UTC | #1
On 6/26/2023 2:48 PM, souradeep chakrabarti wrote:
> From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> 
> This patch addresses the VF unload issue, where mana_dealloc_queues()
> gets stuck in infinite while loop, because of host unresponsiveness.
> It adds a timeout in the while loop, to fix it.
> 
> Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
> Microsoft Azure Network Adapter)
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
> V2 -> V3:
> * Splitted the patch in two parts.
> * Removed the unnecessary braces from mana_dealloc_queues().
> ---
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index d907727c7b7a..cb5c43c3c47e 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -2329,7 +2329,10 @@ static int mana_dealloc_queues(struct net_device *ndev)
>  {
>  	struct mana_port_context *apc = netdev_priv(ndev);
>  	struct gdma_dev *gd = apc->ac->gdma_dev;
> +	unsigned long timeout;
>  	struct mana_txq *txq;
> +	struct sk_buff *skb;
> +	struct mana_cq *cq;
>  	int i, err;
>  
>  	if (apc->port_is_up)
> @@ -2348,13 +2351,25 @@ static int mana_dealloc_queues(struct net_device *ndev)
>  	 *
>  	 * Drain all the in-flight TX packets
>  	 */
> +
> +	timeout = jiffies + 120 * HZ;
>  	for (i = 0; i < apc->num_queues; i++) {
>  		txq = &apc->tx_qp[i].txq;
> -
> -		while (atomic_read(&txq->pending_sends) > 0)
> +		while (atomic_read(&txq->pending_sends) > 0 &&
> +		       time_before(jiffies, timeout))
>  			usleep_range(1000, 2000);
>  	}
>  
> +	for (i = 0; i < apc->num_queues; i++) {
> +		txq = &apc->tx_qp[i].txq;
> +		cq = &apc->tx_qp[i].tx_cq;
> +		while (atomic_read(&txq->pending_sends)) {
> +			skb = skb_dequeue(&txq->pending_skbs);
> +			mana_unmap_skb(skb, apc);
> +			napi_consume_skb(skb, cq->budget);
> +			atomic_sub(1, &txq->pending_sends);
> +		}
> +	}

Can we combine these 2 loops into 1 something like this ?

	for (i = 0; i < apc->num_queues; i++) {
		txq = &apc->tx_qp[i].txq;
		cq = &apc->tx_qp[i].tx_cq;
		while (atomic_read(&txq->pending_sends)) {
			if (time_before(jiffies, timeout)) {
				usleep_range(1000, 2000);
			} else {
				skb = skb_dequeue(&txq->pending_skbs);
				mana_unmap_skb(skb, apc);
				napi_consume_skb(skb, cq->budget);
				atomic_sub(1, &txq->pending_sends);
			}
		}
	}
>  	/* We're 100% sure the queues can no longer be woken up, because
>  	 * we're sure now mana_poll_tx_cq() can't be running.
>  	 */
Michael Kelley (LINUX) June 26, 2023, 3:35 p.m. UTC | #2
From: souradeep chakrabarti <schakrabarti@linux.microsoft.com> Sent: Monday, June 26, 2023 2:19 AM
> 
> This patch addresses the VF unload issue, where mana_dealloc_queues()
> gets stuck in infinite while loop, because of host unresponsiveness.
> It adds a timeout in the while loop, to fix it.

For a patch series, the cover letter (patch 0 of the series) does not get
included in the commit log anywhere.   The cover letter can provide
overall motivation and describe how the patches fit together, but the
commit message for each patch should be as self-contained as possible.

The commit message here refers to "the VF unload issue", and there's
no context for understanding what that issue is, though you do provide
some description in the text following "where". Could you provide a
commit message that is a bit more self-contained?

Same comment applies to commit message for the 2nd patch of this
series.

Also, avoid text like "this patch".   See the "Describe your changes"
section in Documentation/process/submitting-patches.rst where the
use of imperative mood is mentioned.  If you like, I can provide some
offline help on writing a good commit message.

Michael

> 
> Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
> Microsoft Azure Network Adapter)
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> ---
> V2 -> V3:
> * Splitted the patch in two parts.
> * Removed the unnecessary braces from mana_dealloc_queues().
> ---
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index d907727c7b7a..cb5c43c3c47e 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -2329,7 +2329,10 @@ static int mana_dealloc_queues(struct net_device *ndev)
>  {
>  	struct mana_port_context *apc = netdev_priv(ndev);
>  	struct gdma_dev *gd = apc->ac->gdma_dev;
> +	unsigned long timeout;
>  	struct mana_txq *txq;
> +	struct sk_buff *skb;
> +	struct mana_cq *cq;
>  	int i, err;
> 
>  	if (apc->port_is_up)
> @@ -2348,13 +2351,25 @@ static int mana_dealloc_queues(struct net_device *ndev)
>  	 *
>  	 * Drain all the in-flight TX packets
>  	 */
> +
> +	timeout = jiffies + 120 * HZ;
>  	for (i = 0; i < apc->num_queues; i++) {
>  		txq = &apc->tx_qp[i].txq;
> -
> -		while (atomic_read(&txq->pending_sends) > 0)
> +		while (atomic_read(&txq->pending_sends) > 0 &&
> +		       time_before(jiffies, timeout))
>  			usleep_range(1000, 2000);
>  	}
> 
> +	for (i = 0; i < apc->num_queues; i++) {
> +		txq = &apc->tx_qp[i].txq;
> +		cq = &apc->tx_qp[i].tx_cq;
> +		while (atomic_read(&txq->pending_sends)) {
> +			skb = skb_dequeue(&txq->pending_skbs);
> +			mana_unmap_skb(skb, apc);
> +			napi_consume_skb(skb, cq->budget);
> +			atomic_sub(1, &txq->pending_sends);
> +		}
> +	}
>  	/* We're 100% sure the queues can no longer be woken up, because
>  	 * we're sure now mana_poll_tx_cq() can't be running.
>  	 */
> --
> 2.34.1
Souradeep Chakrabarti June 27, 2023, 8:44 a.m. UTC | #3
On Mon, Jun 26, 2023 at 07:50:44PM +0530, Praveen Kumar wrote:
> On 6/26/2023 2:48 PM, souradeep chakrabarti wrote:
> > From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> > 
> > This patch addresses the VF unload issue, where mana_dealloc_queues()
> > gets stuck in infinite while loop, because of host unresponsiveness.
> > It adds a timeout in the while loop, to fix it.
> > 
> > Fixes: ca9c54d2d6a5ab2430c4eda364c77125d62e5e0f (net: mana: Add a driver for
> > Microsoft Azure Network Adapter)
> > Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> > ---
> > V2 -> V3:
> > * Splitted the patch in two parts.
> > * Removed the unnecessary braces from mana_dealloc_queues().
> > ---
> >  drivers/net/ethernet/microsoft/mana/mana_en.c | 19 +++++++++++++++++--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index d907727c7b7a..cb5c43c3c47e 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -2329,7 +2329,10 @@ static int mana_dealloc_queues(struct net_device *ndev)
> >  {
> >  	struct mana_port_context *apc = netdev_priv(ndev);
> >  	struct gdma_dev *gd = apc->ac->gdma_dev;
> > +	unsigned long timeout;
> >  	struct mana_txq *txq;
> > +	struct sk_buff *skb;
> > +	struct mana_cq *cq;
> >  	int i, err;
> >  
> >  	if (apc->port_is_up)
> > @@ -2348,13 +2351,25 @@ static int mana_dealloc_queues(struct net_device *ndev)
> >  	 *
> >  	 * Drain all the in-flight TX packets
> >  	 */
> > +
> > +	timeout = jiffies + 120 * HZ;
> >  	for (i = 0; i < apc->num_queues; i++) {
> >  		txq = &apc->tx_qp[i].txq;
> > -
> > -		while (atomic_read(&txq->pending_sends) > 0)
> > +		while (atomic_read(&txq->pending_sends) > 0 &&
> > +		       time_before(jiffies, timeout))
> >  			usleep_range(1000, 2000);
> >  	}
> >  
> > +	for (i = 0; i < apc->num_queues; i++) {
> > +		txq = &apc->tx_qp[i].txq;
> > +		cq = &apc->tx_qp[i].tx_cq;
> > +		while (atomic_read(&txq->pending_sends)) {
> > +			skb = skb_dequeue(&txq->pending_skbs);
> > +			mana_unmap_skb(skb, apc);
> > +			napi_consume_skb(skb, cq->budget);
> > +			atomic_sub(1, &txq->pending_sends);
> > +		}
> > +	}
> 
> Can we combine these 2 loops into 1 something like this ?
> 
> 	for (i = 0; i < apc->num_queues; i++) {
> 		txq = &apc->tx_qp[i].txq;
> 		cq = &apc->tx_qp[i].tx_cq;
> 		while (atomic_read(&txq->pending_sends)) {
> 			if (time_before(jiffies, timeout)) {
> 				usleep_range(1000, 2000);
> 			} else {
> 				skb = skb_dequeue(&txq->pending_skbs);
> 				mana_unmap_skb(skb, apc);
> 				napi_consume_skb(skb, cq->budget);
> 				atomic_sub(1, &txq->pending_sends);
> 			}
> 		}
> 	}
We should free up the skbs only after timeout has happened or after all the
queues are looped.
> >  	/* We're 100% sure the queues can no longer be woken up, because
> >  	 * we're sure now mana_poll_tx_cq() can't be running.
> >  	 */
diff mbox series

Patch

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index d907727c7b7a..cb5c43c3c47e 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2329,7 +2329,10 @@  static int mana_dealloc_queues(struct net_device *ndev)
 {
 	struct mana_port_context *apc = netdev_priv(ndev);
 	struct gdma_dev *gd = apc->ac->gdma_dev;
+	unsigned long timeout;
 	struct mana_txq *txq;
+	struct sk_buff *skb;
+	struct mana_cq *cq;
 	int i, err;
 
 	if (apc->port_is_up)
@@ -2348,13 +2351,25 @@  static int mana_dealloc_queues(struct net_device *ndev)
 	 *
 	 * Drain all the in-flight TX packets
 	 */
+
+	timeout = jiffies + 120 * HZ;
 	for (i = 0; i < apc->num_queues; i++) {
 		txq = &apc->tx_qp[i].txq;
-
-		while (atomic_read(&txq->pending_sends) > 0)
+		while (atomic_read(&txq->pending_sends) > 0 &&
+		       time_before(jiffies, timeout))
 			usleep_range(1000, 2000);
 	}
 
+	for (i = 0; i < apc->num_queues; i++) {
+		txq = &apc->tx_qp[i].txq;
+		cq = &apc->tx_qp[i].tx_cq;
+		while (atomic_read(&txq->pending_sends)) {
+			skb = skb_dequeue(&txq->pending_skbs);
+			mana_unmap_skb(skb, apc);
+			napi_consume_skb(skb, cq->budget);
+			atomic_sub(1, &txq->pending_sends);
+		}
+	}
 	/* We're 100% sure the queues can no longer be woken up, because
 	 * we're sure now mana_poll_tx_cq() can't be running.
 	 */