[net,1/4] ibmvnic: Allow extra failures before disabling

Message ID	20220122025921.199446-1-sukadev@linux.ibm.com (mailing list archive)
State	Accepted
Commit	db9f0e8bf79e6da7068b5818fea0ffd9d0d4b4da
Delegated to:	Netdev Maintainers
Headers	Return-Path: <netdev-owner@kernel.org>
	From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
	To: netdev@vger.kernel.org
	Cc: Brian King <brking@linux.ibm.com>, Dany Madden <drt@linux.ibm.com>, Rick Lindsley <ricklind@linux.ibm.com>
	Subject: [PATCH net 1/4] ibmvnic: Allow extra failures before disabling
	Date: Fri, 21 Jan 2022 18:59:18 -0800
	Message-Id: <20220122025921.199446-1-sukadev@linux.ibm.com>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Precedence: bulk
Series	[net,1/4] ibmvnic: Allow extra failures before disabling
	[net,2/4] ibmvnic: init ->running_cap_crqs early
	[net,3/4] ibmvnic: don't spin in tasklet
	[net,4/4] ibmvnic: remove unused ->wait_capability

Context	Check	Description
netdev/tree_selection	success	Clearly marked for net
netdev/fixes_present	success	Fixes tag present in non-next series
netdev/subject_prefix	success	Link
netdev/cover_letter	warning	Series does not have a cover letter
netdev/patch_count	success	Link
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers	fail	1 blamed authors not CCed: kuba@kernel.org; 7 maintainers not CCed: paulus@samba.org kuba@kernel.org benh@kernel.crashing.org mpe@ellerman.id.au linuxppc-dev@lists.ozlabs.org davem@davemloft.net tlfalcon@linux.ibm.com
netdev/build_clang	success	Errors and warnings before: 0 this patch: 0
netdev/module_param	success	Was 0 now: 0
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/verify_fixes	success	Fixes tag looks correct
netdev/build_allmodconfig_warn	success	Errors and warnings before: 0 this patch: 0
netdev/checkpatch	warning	WARNING: line length of 83 exceeds 80 columns
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0

Commit Message

Sukadev Bhattiprolu Jan. 22, 2022, 2:59 a.m. UTC

If auto-priority-failover (APF) is enabled and there are at least two
backing devices of different priorities, some resets like fail-over,
change-param etc. can cause at least two back-to-back failovers. (Failover
from high priority backing device to lower priority one and then back
to the higher priority one if that is still functional.)

Depending on the timing of the two failovers it is possible to trigger
a "hard" reset and for the hard reset to fail due to failovers. When this
occurs, the driver assumes that the network is unstable and disables the
VNIC for a 60-second "settling time". This in turn can cause the ethtool
command to fail with "No such device" while the vnic automatically recovers
a little while later.

Given that it's possible to have two back-to-back failures, allow for extra
failures before disabling the vnic for the settling time.

Fixes: f15fde9d47b8 ("ibmvnic: delay next reset if hard reset fails")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

Comments

Dany Madden Jan. 23, 2022, 12:22 a.m. UTC | #1

On 2022-01-21 18:59, Sukadev Bhattiprolu wrote:
> If auto-priority-failover (APF) is enabled and there are at least two
> backing devices of different priorities, some resets like fail-over,
> change-param etc. can cause at least two back-to-back failovers. (Failover
> from high priority backing device to lower priority one and then back
> to the higher priority one if that is still functional.)
> 
> Depending on the timing of the two failovers it is possible to trigger
> a "hard" reset and for the hard reset to fail due to failovers. When this
> occurs, the driver assumes that the network is unstable and disables the
> VNIC for a 60-second "settling time". This in turn can cause the ethtool
> command to fail with "No such device" while the vnic automatically recovers
> a little while later.
> 
> Given that it's possible to have two back-to-back failures, allow for extra
> failures before disabling the vnic for the settling time.
> 
> Fixes: f15fde9d47b8 ("ibmvnic: delay next reset if hard reset fails")
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>

> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 0bb3911dd014..9b2d16ad76f1 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -2598,6 +2598,7 @@ static void __ibmvnic_reset(struct work_struct *work)
>  	struct ibmvnic_rwi *rwi;
>  	unsigned long flags;
>  	u32 reset_state;
> +	int num_fails = 0;
>  	int rc = 0;
> 
>  	adapter = container_of(work, struct ibmvnic_adapter, ibmvnic_reset);
> @@ -2651,11 +2652,23 @@ static void __ibmvnic_reset(struct work_struct *work)
>  				rc = do_hard_reset(adapter, rwi, reset_state);
>  				rtnl_unlock();
>  			}
> -			if (rc) {
> -				/* give backing device time to settle down */
> +			if (rc)
> +				num_fails++;
> +			else
> +				num_fails = 0;
> +
> +			/* If auto-priority-failover is enabled we can get
> +			 * back to back failovers during resets, resulting
> +			 * in at least two failed resets (from high-priority
> +			 * backing device to low-priority one and then back)
> +			 * If resets continue to fail beyond that, give the
> +			 * adapter some time to settle down before retrying.
> +			 */
> +			if (num_fails >= 3) {
>  				netdev_dbg(adapter->netdev,
> -					   "[S:%s] Hard reset failed, waiting 60 secs\n",
> -					   adapter_state_to_string(adapter->state));
> +					   "[S:%s] Hard reset failed %d times, waiting 60 secs\n",
> +					   adapter_state_to_string(adapter->state),
> +					   num_fails);
>  				set_current_state(TASK_UNINTERRUPTIBLE);
>  				schedule_timeout(60 * HZ);
>  			}

patchwork-bot+netdevbpf@kernel.org Jan. 24, 2022, 12:10 p.m. UTC | #2

Hello:

This series was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 21 Jan 2022 18:59:18 -0800 you wrote:
> If auto-priority-failover (APF) is enabled and there are at least two
> backing devices of different priorities, some resets like fail-over,
> change-param etc. can cause at least two back-to-back failovers. (Failover
> from high priority backing device to lower priority one and then back
> to the higher priority one if that is still functional.)
> 
> Depending on the timing of the two failovers it is possible to trigger
> a "hard" reset and for the hard reset to fail due to failovers. When this
> occurs, the driver assumes that the network is unstable and disables the
> VNIC for a 60-second "settling time". This in turn can cause the ethtool
> command to fail with "No such device" while the vnic automatically recovers
> a little while later.
> 
> [...]

Here is the summary with links:
  - [net,1/4] ibmvnic: Allow extra failures before disabling
    https://git.kernel.org/netdev/net/c/db9f0e8bf79e
  - [net,2/4] ibmvnic: init ->running_cap_crqs early
    https://git.kernel.org/netdev/net/c/151b6a5c06b6
  - [net,3/4] ibmvnic: don't spin in tasklet
    https://git.kernel.org/netdev/net/c/48079e7fdd02
  - [net,4/4] ibmvnic: remove unused ->wait_capability
    https://git.kernel.org/netdev/net/c/3a5d9db7fbdf

You are awesome, thank you!

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 0bb3911dd014..9b2d16ad76f1 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2598,6 +2598,7 @@ static void __ibmvnic_reset(struct work_struct *work)
 	struct ibmvnic_rwi *rwi;
 	unsigned long flags;
 	u32 reset_state;
+	int num_fails = 0;
 	int rc = 0;
 
 	adapter = container_of(work, struct ibmvnic_adapter, ibmvnic_reset);
@@ -2651,11 +2652,23 @@ static void __ibmvnic_reset(struct work_struct *work)
 				rc = do_hard_reset(adapter, rwi, reset_state);
 				rtnl_unlock();
 			}
-			if (rc) {
-				/* give backing device time to settle down */
+			if (rc)
+				num_fails++;
+			else
+				num_fails = 0;
+
+			/* If auto-priority-failover is enabled we can get
+			 * back to back failovers during resets, resulting
+			 * in at least two failed resets (from high-priority
+			 * backing device to low-priority one and then back)
+			 * If resets continue to fail beyond that, give the
+			 * adapter some time to settle down before retrying.
+			 */
+			if (num_fails >= 3) {
 				netdev_dbg(adapter->netdev,
-					   "[S:%s] Hard reset failed, waiting 60 secs\n",
-					   adapter_state_to_string(adapter->state));
+					   "[S:%s] Hard reset failed %d times, waiting 60 secs\n",
+					   adapter_state_to_string(adapter->state),
+					   num_fails);
 				set_current_state(TASK_UNINTERRUPTIBLE);
 				schedule_timeout(60 * HZ);
 			}
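
As a rough illustration only (not part of the patch), the counting logic in the hunk above can be replayed in user space. The sketch below assumes the kernel convention that a zero rc means the hard reset succeeded and any nonzero rc means it failed. The names SETTLE_THRESHOLD, record_reset_result() and count_settle_waits() are hypothetical; the driver itself keeps num_fails as a local variable in __ibmvnic_reset() and hard-codes the value 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical name; the patch hard-codes 3 in __ibmvnic_reset(). */
#define SETTLE_THRESHOLD 3

/*
 * Update the consecutive-failure counter from one hard reset result.
 * rc == 0 means success and clears the counter; any other value
 * counts as one more failure in a row.
 * Returns true when the caller should wait for the settling time.
 */
static bool record_reset_result(int rc, int *num_fails)
{
	assert(num_fails != NULL);

	if (rc)
		(*num_fails)++;
	else
		*num_fails = 0;

	return *num_fails >= SETTLE_THRESHOLD;
}

/*
 * Replay a sequence of reset results and count how many times the
 * settling wait would be triggered. The counter is not cleared by
 * the wait itself, only by a successful reset, as in the patch.
 */
static int count_settle_waits(const int *results, size_t n)
{
	int num_fails = 0;
	int waits = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (record_reset_result(results[i], &num_fails))
			waits++;
	}

	return waits;
}
```

With this sketch, replaying two failures followed by a success triggers no settling wait, which is the back-to-back failover case the commit message describes. Three failures in a row trigger one wait, and because only a successful reset clears the counter, a fourth consecutive failure triggers a second wait.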
