From patchwork Wed Jul 15 20:45:18 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11666289
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EA3CF618
	for ; Wed, 15 Jul 2020 20:47:19 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id D27AE2065F
	for ; Wed, 15 Jul 2020 20:47:19 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D27AE2065F
Authentication-Results: mail.kernel.org;
	dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org;
	spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 65CA721FB5D;
	Wed, 15 Jul 2020 13:46:29 -0700 (PDT)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9E21B21F7F9
	for ; Wed, 15 Jul 2020 13:45:34 -0700 (PDT)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134])
	by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D3DE95E9;
	Wed, 15 Jul 2020 16:45:20 -0400 (EDT)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004)
	id D2AC88D; Wed, 15 Jul 2020 16:45:20 -0400 (EDT)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Wed, 15 Jul 2020 16:45:18 -0400
Message-Id: <1594845918-29027-38-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1594845918-29027-1-git-send-email-jsimmons@infradead.org>
References: <1594845918-29027-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 37/37] lnet: check rtr_nid is a gateway
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Amir Shehata, Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Amir Shehata

The rtr_nid is specified for all REPLY/ACK messages. However, the
route through the gateway specified by rtr_nid may have been removed.
In that case we don't want to use it and should look up alternative
paths instead.

This patch checks whether the peer that was looked up is indeed a
gateway. If it is not a gateway, we attempt to find another path.
There is no need to fail right away: an invalid default rtr_nid is
not a hard error.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13713
Lustre-commit: 07397a2e7473c ("LU-13713 lnet: check rtr_nid is a gateway")
Signed-off-by: Amir Shehata
Reviewed-on: https://review.whamcloud.com/39175
Reviewed-by: Chris Horn
Reviewed-by: Serguei Smirnov
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 net/lnet/lnet/lib-move.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 234fbb5..c0dd30c 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1777,6 +1777,7 @@ struct lnet_ni *
 	struct lnet_route *last_route = NULL;
 	struct lnet_peer_ni *lpni = NULL;
 	struct lnet_peer_ni *gwni = NULL;
+	bool route_found = false;
 	lnet_nid_t src_nid = (sd->sd_src_nid != LNET_NID_ANY) ?
 		sd->sd_src_nid : sd->sd_best_ni ?
 		sd->sd_best_ni->ni_nid : LNET_NID_ANY;
@@ -1790,15 +1791,20 @@ struct lnet_ni *
 	 */
 	if (sd->sd_rtr_nid != LNET_NID_ANY) {
 		gwni = lnet_find_peer_ni_locked(sd->sd_rtr_nid);
-		if (!gwni) {
-			CERROR("No peer NI for gateway %s\n",
+		if (gwni) {
+			gw = gwni->lpni_peer_net->lpn_peer;
+			lnet_peer_ni_decref_locked(gwni);
+			if (gw->lp_rtr_refcount) {
+				local_lnet = LNET_NIDNET(sd->sd_rtr_nid);
+				route_found = true;
+			}
+		} else {
+			CWARN("No peer NI for gateway %s. Attempting to find an alternative route.\n",
 			       libcfs_nid2str(sd->sd_rtr_nid));
-			return -EHOSTUNREACH;
 		}
-		gw = gwni->lpni_peer_net->lpn_peer;
-		lnet_peer_ni_decref_locked(gwni);
-		local_lnet = LNET_NIDNET(sd->sd_rtr_nid);
-	} else {
+	}
+
+	if (!route_found) {
 		/* we've already looked up the initial lpni using dst_nid */
 		lpni = sd->sd_best_lpni;
 		/* the peer tree must be in existence */
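
For readers following the control-flow change, the new behaviour can be sketched outside the kernel roughly as below. This is only an illustration under stated assumptions: `struct peer`, `NID_ANY`, `find_peer()` and `pick_gateway()` are hypothetical stand-ins, not LNet definitions, and the "first peer with a non-zero router refcount" fallback merely stands in for the real route selection in lib-move.c. What it aims to show is the shape of the fix: the preferred rtr_nid is used only if its peer still acts as a gateway, and any other outcome falls through to the alternative-route search rather than returning an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for LNET_NID_ANY; not the kernel definition. */
#define NID_ANY UINT64_MAX

/* Hypothetical, simplified peer record; not struct lnet_peer. */
struct peer {
	uint64_t nid;     /* NID identifying this peer */
	int rtr_refcount; /* non-zero while a route uses this peer as gateway */
};

/* Linear lookup standing in for the peer NI lookup done in the patch. */
static const struct peer *find_peer(const struct peer *peers, size_t n,
				    uint64_t nid)
{
	for (size_t i = 0; i < n; i++)
		if (peers[i].nid == nid)
			return &peers[i];
	return NULL;
}

/*
 * Return the gateway to use. The preferred rtr_nid wins only if its peer
 * exists and is still a gateway. Otherwise fall back to the first peer
 * with a non-zero router refcount (a placeholder for real route
 * selection). Returns NULL if no gateway is available at all.
 */
static const struct peer *pick_gateway(const struct peer *peers, size_t n,
				       uint64_t rtr_nid)
{
	const struct peer *gw = NULL;
	bool route_found = false;

	assert(peers != NULL || n == 0);

	if (rtr_nid != NID_ANY) {
		gw = find_peer(peers, n, rtr_nid);
		if (gw && gw->rtr_refcount)
			route_found = true;
	}

	if (!route_found) {
		/* Preferred gateway missing or no longer a router. */
		gw = NULL;
		for (size_t i = 0; i < n; i++) {
			if (peers[i].rtr_refcount) {
				gw = &peers[i];
				break;
			}
		}
	}

	return gw;
}
```

In this sketch, calling `pick_gateway()` with an rtr_nid whose peer has `rtr_refcount == 0`, or with an rtr_nid that has no peer at all, yields the same result as passing `NID_ANY`. That matches the intent stated in the commit message that an invalid default rtr_nid is not a reason to fail.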