From patchwork Thu Aug 3 20:20:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nick Child X-Patchwork-Id: 13340824 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 23C2B263B2 for ; Thu, 3 Aug 2023 20:20:17 +0000 (UTC) Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 98A084224 for ; Thu, 3 Aug 2023 13:20:15 -0700 (PDT) Received: from pps.filterd (m0356516.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 373KGZMc030544 for ; Thu, 3 Aug 2023 20:20:14 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=OTfdm3xeOC761Ksp9jMl6jDzPD+XsiyvyzBKqdJkuG8=; b=KcjtApKZ1DGKbaMbf2jzHcqdnLLs0m9c40BGENqhIPjPAsTEgG1Qtlh+SqVf5nKve08O MYE7ikznwGnqqBz9/GDKiSz71tB9c61fNHIdkNvSfmsaZNTAys1C6oK9g/2Wk1vIW5pL QX3bIAavYmd42mFNCuc7G5/EcLKhdG+QddgGpMVKzSESJ8Cvph7XLMX+j+e2IIu9yfVY NLpsl7qoymdFjDXLneDslbrxFAZ2enPDdJOnwSCVLLvk7QidHkqkqoon48jv8bXD9TA2 wy3evKlSdH7/Ktv4ZlNgF9afrxG8sJMPo4L+AxeDb+2/RFOlEhOM8BBkVdBZwUZjo7Pa bQ== Received: from ppma12.dal12v.mail.ibm.com (dc.9e.1632.ip4.static.sl-reverse.com [50.22.158.220]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3s8jw60gb8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 03 Aug 2023 20:20:14 +0000 Received: from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1]) by ppma12.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 373IiqCE006073 for ; Thu, 3 Aug 2023 20:20:13 GMT Received: from smtprelay02.dal12v.mail.ibm.com ([172.16.1.4]) by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 3s5d3t0yr0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 03 Aug 2023 20:20:13 +0000 Received: from smtpav02.dal12v.mail.ibm.com (smtpav02.dal12v.mail.ibm.com [10.241.53.101]) by smtprelay02.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 373KKClQ43975106 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 3 Aug 2023 20:20:12 GMT Received: from smtpav02.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 2196358051; Thu, 3 Aug 2023 20:20:12 +0000 (GMT) Received: from smtpav02.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id F031958060; Thu, 3 Aug 2023 20:20:11 +0000 (GMT) Received: from li-8d37cfcc-31b9-11b2-a85c-83226d7135c9.austin.ibm.com (unknown [9.24.4.46]) by smtpav02.dal12v.mail.ibm.com (Postfix) with ESMTP; Thu, 3 Aug 2023 20:20:11 +0000 (GMT) From: Nick Child To: netdev@vger.kernel.org Cc: haren@linux.ibm.com, ricklind@us.ibm.com, danymadden@us.ibm.com, tlfalcon@linux.ibm.com, bjking1@linux.ibm.com, Nick Child Subject: [PATCH net 4/5] ibmvnic: Do partial reset on login failure Date: Thu, 3 Aug 2023 15:20:09 -0500 Message-Id: <20230803202010.37149-4-nnac123@linux.ibm.com> X-Mailer: git-send-email 2.39.3 In-Reply-To: <20230803202010.37149-1-nnac123@linux.ibm.com> References: <20230803202010.37149-1-nnac123@linux.ibm.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: BCLJL_OL0BNYZzvl-_VU7s0byYLmdE82 X-Proofpoint-ORIG-GUID: BCLJL_OL0BNYZzvl-_VU7s0byYLmdE82 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.591,FMLib:17.11.176.26 definitions=2023-08-03_22,2023-08-03_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 priorityscore=1501 impostorscore=0 mlxlogscore=919 phishscore=0 mlxscore=0 adultscore=0 clxscore=1015 bulkscore=0 lowpriorityscore=0 suspectscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2306200000 definitions=main-2308030180 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,RCVD_IN_MSPIKE_H5,RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org Perform a partial reset before sending a login request if any of the following are true: 1. If a previous request times out. This can be dangerous because the VIOS could still receive the old login request at any point after the timeout. Therefore, it is best to re-register the CRQ's and sub-CRQ's before retrying. 2. If the previous request returns an error that is not described in PAPR. PAPR provides procedures if the login returns with partial success or aborted return codes (section L.5.1) but other values do not have a defined procedure. Previously, these conditions just returned error from the login function rather than trying to resolve the issue. This can cause further issues since most callers of the login function are not prepared to handle an error when logging in. This improper cleanup can lead to the device being permanently DOWN'd. For example, if the VIOS believes that the device is already logged in then it will return INVALID_STATE (-7). If we never re-register CRQ's then it will always think that the device is already logged in. This leaves the device inoperable. The partial reset involves freeing the sub-CRQs, freeing the CRQ then registering and initializing a new CRQ and sub-CRQs. This essentially restarts all communication with VIOS to allow for a fresh login attempt that will be unhindered by any previous failed attempts. Fixes: dff515a3e71d ("ibmvnic: Harden device login requests") Signed-off-by: Nick Child Reviewed-by: Simon Horman --- drivers/net/ethernet/ibm/ibmvnic.c | 46 ++++++++++++++++++++++++++---- 1 file changed, 40 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 718af76fd711..8fd9639665a0 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -97,6 +97,8 @@ static int pending_scrq(struct ibmvnic_adapter *, static union sub_crq *ibmvnic_next_scrq(struct ibmvnic_adapter *, struct ibmvnic_sub_crq_queue *); static int ibmvnic_poll(struct napi_struct *napi, int data); +static int reset_sub_crq_queues(struct ibmvnic_adapter *adapter); +static inline void reinit_init_done(struct ibmvnic_adapter *adapter); static void send_query_map(struct ibmvnic_adapter *adapter); static int send_request_map(struct ibmvnic_adapter *, dma_addr_t, u32, u8); static int send_request_unmap(struct ibmvnic_adapter *, u8); @@ -1527,11 +1529,9 @@ static int ibmvnic_login(struct net_device *netdev) if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { - netdev_warn(netdev, "Login timed out, retrying...\n"); - retry = true; - adapter->init_done_rc = 0; - retry_count++; - continue; + netdev_warn(netdev, "Login timed out\n"); + adapter->login_pending = false; + goto partial_reset; } if (adapter->init_done_rc == ABORTED) { @@ -1576,7 +1576,41 @@ static int ibmvnic_login(struct net_device *netdev) } else if (adapter->init_done_rc) { netdev_warn(netdev, "Adapter login failed, init_done_rc = %d\n", adapter->init_done_rc); - return -EIO; + +partial_reset: + /* adapter login failed, so free any CRQs or sub-CRQs + * and register again before attempting to login again. + * If we don't do this then the VIOS may think that + * we are already logged in and reject any subsequent + * attempts + */ + netdev_warn(netdev, + "Freeing and re-registering CRQs before attempting to login again\n"); + retry = true; + adapter->init_done_rc = 0; + retry_count++; + release_sub_crqs(adapter, true); + reinit_init_done(adapter); + release_crq_queue(adapter); + /* If we don't sleep here then we risk an unnecessary + * failover event from the VIOS. This is a known VIOS + * issue caused by a vnic device freeing and registering + * a CRQ too quickly. + */ + msleep(1500); + rc = init_crq_queue(adapter); + if (rc) { + netdev_err(netdev, "login recovery: init CRQ failed %d\n", + rc); + return -EIO; + } + + rc = ibmvnic_reset_init(adapter, false); + if (rc) { + netdev_err(netdev, "login recovery: Reset init failed %d\n", + rc); + return -EIO; + } } } while (retry);