From patchwork Wed Jul 15 20:45:14 2020
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11666265
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Wed, 15 Jul 2020 16:45:14 -0400
Message-Id: <1594845918-29027-34-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1594845918-29027-1-git-send-email-jsimmons@infradead.org>
References: <1594845918-29027-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 33/37] lnet: Set remote NI status in lnet_notify
List-Id: "For discussing Lustre software development."
Cc: Chris Horn, Lustre Development List

From: Chris Horn

The gnilnd receives node health information asynchronously from any tx
failure, so the aliveness of an lpni as reported by
lnet_is_peer_ni_alive() may not match what the LND is telling us. Use
the existing reset flag to set the cached NI status down so we can be
sure that remote NIs are correctly set down.

HPE-bug-id: LUS-8897
WC-bug-id: https://jira.whamcloud.com/browse/LU-13648
Lustre-commit: 8010dbb660766 ("LU-13648 lnet: Set remote NI status in lnet_notify")
Signed-off-by: Chris Horn
Reviewed-on: https://review.whamcloud.com/38862
Reviewed-by: Amir Shehata
Reviewed-by: Serguei Smirnov
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 net/lnet/lnet/router.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index c0578d9..e3b3e71 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -1671,8 +1671,7 @@ bool lnet_router_checker_active(void)
 
 	CDEBUG(D_NET, "%s notifying %s: %s\n",
 	       !ni ? "userspace" : libcfs_nid2str(ni->ni_nid),
-	       libcfs_nid2str(nid),
-	       alive ? "up" : "down");
+	       libcfs_nid2str(nid), alive ? "up" : "down");
 
 	if (ni &&
 	    LNET_NIDNET(ni->ni_nid) != LNET_NIDNET(nid)) {
@@ -1714,6 +1713,7 @@ bool lnet_router_checker_active(void)
 
 	if (alive) {
 		if (reset) {
+			lpni->lpni_ns_status = LNET_NI_STATUS_UP;
 			lnet_set_lpni_healthv_locked(lpni,
 						     LNET_MAX_HEALTH_VALUE);
 		} else {
@@ -1726,6 +1726,8 @@ bool lnet_router_checker_active(void)
 					(sensitivity) ?
 					sensitivity : lnet_health_sensitivity);
 		}
+	} else if (reset) {
+		lpni->lpni_ns_status = LNET_NI_STATUS_DOWN;
 	}
 
 	/* recalculate aliveness */
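
For readers following the logic outside the kernel tree, the branch
structure touched by this patch can be sketched as a small standalone
model. This is only an illustration under stated assumptions: the names
model_peer_ni, model_notify(), and the MODEL_* constants are hypothetical
stand-ins rather than the LNet definitions, the constant values are
illustrative, and the clamped increment is a simplified guess at what the
health helper does. Locking and the aliveness recalculation are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel definitions; values are illustrative. */
enum model_ni_status { MODEL_NI_STATUS_UP, MODEL_NI_STATUS_DOWN };
#define MODEL_MAX_HEALTH		1000
#define MODEL_DEFAULT_SENSITIVITY	100

struct model_peer_ni {
	enum model_ni_status ns_status;	/* cached remote NI status */
	int healthv;			/* health value */
};

/*
 * Models only the alive/reset branches shown in the diff:
 *  - alive + reset:  status goes up and health is restored to the maximum
 *  - alive, no reset: health is incremented (assumed to clamp at the maximum)
 *  - !alive + reset: status goes down; health is left untouched here
 *  - !alive, no reset: nothing changes
 */
static void model_notify(struct model_peer_ni *lpni, bool alive, bool reset,
			 int sensitivity)
{
	assert(lpni != NULL);

	if (alive) {
		if (reset) {
			lpni->ns_status = MODEL_NI_STATUS_UP;
			lpni->healthv = MODEL_MAX_HEALTH;
		} else {
			/* Fall back to a default when no sensitivity is set. */
			lpni->healthv += sensitivity ? sensitivity :
						       MODEL_DEFAULT_SENSITIVITY;
			if (lpni->healthv > MODEL_MAX_HEALTH)
				lpni->healthv = MODEL_MAX_HEALTH;
		}
	} else if (reset) {
		lpni->ns_status = MODEL_NI_STATUS_DOWN;
	}
}
```

In this model, a notification with alive false and reset true flips only
the cached status, which is the behavior the commit message describes for
LNDs that learn about node health independently of tx failures.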