From patchwork Thu Jan 21 17:16:28 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12037143
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 21 Jan 2021 12:16:28 -0500
Message-Id: <1611249422-556-6-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1611249422-556-1-git-send-email-jsimmons@infradead.org>
References: <1611249422-556-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 05/39] lnet: Correct handling of
 NETWORK_TIMEOUT status
List-Id: "For discussing Lustre software development."
Cc: Chris Horn, Lustre Development List

From: Chris Horn

The original intent of the LNET_MSG_STATUS_NETWORK_TIMEOUT health
status was to handle cases where the LND was unsure whether the
failure was due to the local or remote NI. In this case, we'll want
to decrement both the local and remote NI health and allow recovery
to ascertain which interface is actually healthy.

HPE-bug-id: LUS-9342
WC-bug-id: https://jira.whamcloud.com/browse/LU-13751
Lustre-commit: ffd4523f2d50ef ("LU-13571 lnet: Correct handling of NETWORK_TIMEOUT status")
Signed-off-by: Chris Horn
Reviewed-on: https://review.whamcloud.com/39898
Reviewed-by: Amir Shehata
Reviewed-by: Serguei Smirnov
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 net/lnet/lnet/lib-msg.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index e84cf02..d888090 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -925,9 +925,14 @@
 	case LNET_MSG_STATUS_REMOTE_ERROR:
 	case LNET_MSG_STATUS_REMOTE_TIMEOUT:
+		if (handle_remote_health)
+			lnet_handle_remote_failure(lpni);
+		return -1;
 	case LNET_MSG_STATUS_NETWORK_TIMEOUT:
 		if (handle_remote_health)
 			lnet_handle_remote_failure(lpni);
+		if (handle_local_health)
+			lnet_handle_local_failure(ni);
 		return -1;
 	default:
 		LBUG();
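
For readers without the tree at hand, the dispatch after this patch can be modelled as a small standalone sketch. Only the three status names and the two handle_*_health flags come from the patch. The struct, the helper, the function name sketch_health_check and the decrement-by-one penalty are hypothetical stand-ins, not the LNet implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Status names mirror the patch; the numeric values are arbitrary here. */
enum sketch_status {
	LNET_MSG_STATUS_REMOTE_ERROR,
	LNET_MSG_STATUS_REMOTE_TIMEOUT,
	LNET_MSG_STATUS_NETWORK_TIMEOUT,
};

/* Hypothetical stand-in for both the local NI and the remote peer NI. */
struct sketch_iface {
	int health;
};

/* Hypothetical penalty: drop health by one, never below zero. */
static void sketch_decrement_health(struct sketch_iface *iface)
{
	if (iface->health > 0)
		iface->health--;
}

/*
 * Models the switch arms touched by the patch. Remote statuses penalise
 * only the peer NI (lpni). NETWORK_TIMEOUT penalises both sides, because
 * the LND cannot tell which end failed. Returns -1 for every failure
 * status, as in the diff.
 */
static int sketch_health_check(enum sketch_status status,
			       struct sketch_iface *ni,
			       struct sketch_iface *lpni,
			       int handle_local_health,
			       int handle_remote_health)
{
	assert(ni != NULL && lpni != NULL);

	switch (status) {
	case LNET_MSG_STATUS_REMOTE_ERROR:
	case LNET_MSG_STATUS_REMOTE_TIMEOUT:
		if (handle_remote_health)
			sketch_decrement_health(lpni);
		return -1;
	case LNET_MSG_STATUS_NETWORK_TIMEOUT:
		if (handle_remote_health)
			sketch_decrement_health(lpni);
		if (handle_local_health)
			sketch_decrement_health(ni);
		return -1;
	}

	/* The real code calls LBUG() on an unknown status; the sketch
	 * just reports "nothing handled". */
	return 0;
}
```

With both flags set, a REMOTE_TIMEOUT leaves the local interface's health untouched, while a NETWORK_TIMEOUT lowers both. That is the behaviour the commit message describes: recovery is then left to work out which interface is actually healthy.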