From patchwork Thu Feb 27 21:17:58 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11410585
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C907017E0
	for ; Thu, 27 Feb 2020 21:41:48 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id B1CA6246A1
	for ; Thu, 27 Feb 2020 21:41:48 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B1CA6246A1
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E654034AA3F;
	Thu, 27 Feb 2020 13:33:55 -0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 234F6348A1A
	for ; Thu, 27 Feb 2020 13:21:29 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134])
	by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 71646A15D;
	Thu, 27 Feb 2020 16:18:20 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004)
	id 6EDDD47C; Thu, 27 Feb 2020 16:18:20 -0500 (EST)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:17:58 -0500
Message-Id: <1582838290-17243-611-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 610/622] lnet: Do not assume peers are MR capable
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
Cc: Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Chris Horn

If a peer has discovery disabled then it will not consolidate peer NI
information. This means we need to use a consistent source NI when
sending to it, just like we do for non-MR peers. A comment in
lnet_discovery_event_reply() indicates that this was a known issue,
but the situation is not handled properly.

Do not assume peers are multi-rail capable when peer objects are
allocated and initialized.

Do not mark a peer as multi-rail capable unless all of the following
conditions are satisfied:
1. The peer has the MR feature flag set.
2. The peer has discovery enabled.
3. We have discovery enabled locally.

Note: 1, 2, and 3 above are implemented in the code for
lnet_discovery_event_reply(), but code earlier in the function breaks
this behavior. Remove the offending code.

Update sanity-lnet tests 100 and 101 to reflect the fact that peers
added via the traffic path no longer have multi-rail by default.

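The three conditions listed above can be sketched as a standalone
decision function. This is only an illustration under stated
assumptions, not the code in peer.c: the SK_* flag names and values and
the function sk_apply_mr_reply() are hypothetical stand-ins for the
LNET_PEER_* and LNET_PING_FEAT_* flags, the caller is assumed to have
already set SK_PEER_NO_DISCOVERY for a peer with discovery disabled, and
replies that lack the MR feature are deliberately left unhandled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits reported in a ping reply (illustrative values). */
#define SK_PING_FEAT_MULTI_RAIL	(1u << 0)

/* Hypothetical peer state bits (illustrative values). */
#define SK_PEER_MULTI_RAIL	(1u << 0)
#define SK_PEER_CONFIGURED	(1u << 1)
#define SK_PEER_NO_DISCOVERY	(1u << 2)

/*
 * Return the peer state after processing a reply that advertises
 * 'features'. The peer only becomes Multi-Rail when:
 *   1. the reply carries the MR feature flag,
 *   2. the peer has discovery enabled (NO_DISCOVERY is clear), and
 *   3. discovery is enabled locally.
 * A peer configured through DLC is left untouched.
 */
uint32_t sk_apply_mr_reply(uint32_t state, uint32_t features,
			   bool local_discovery_disabled)
{
	if (!(features & SK_PING_FEAT_MULTI_RAIL))
		return state;	/* non-MR replies are outside this sketch */
	if (state & SK_PEER_MULTI_RAIL)
		return state;	/* already marked MR */
	if (state & SK_PEER_CONFIGURED)
		return state;	/* explicit configuration wins */
	if (local_discovery_disabled)
		return state;	/* condition 3 not met */
	if (state & SK_PEER_NO_DISCOVERY)
		return state;	/* condition 2 not met */
	return state | SK_PEER_MULTI_RAIL;
}
```

With these stand-ins, a freshly created peer (state 0) is marked
Multi-Rail only when the reply advertises MR and neither side has
discovery disabled; in every other case the state is returned unchanged.
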
Cray-bug-id: LUS-7918
WC-bug-id: https://jira.whamcloud.com/browse/LU-12889
Lustre-commit: 3c580c93b8d3 ("LU-12889 lnet: Do not assume peers are MR capable")
Signed-off-by: Chris Horn
Reviewed-on: https://review.whamcloud.com/36512
Reviewed-by: Amir Shehata
Reviewed-by: Serguei Smirnov
Reviewed-by: Neil Brown
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 net/lnet/lnet/peer.c | 45 ++++++++++++++++-----------------------------
 1 file changed, 16 insertions(+), 29 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index f987fff..0d7fbd4 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -1520,10 +1520,7 @@ struct lnet_peer_net *
 	struct lnet_peer *lp;
 	struct lnet_peer_net *lpn;
 	struct lnet_peer_ni *lpni;
-	/* Assume peer is Multi-Rail capable and let discovery find out
-	 * otherwise.
-	 */
-	unsigned int flags = LNET_PEER_MULTI_RAIL;
+	unsigned int flags = 0;
 	int rc = 0;
 
 	if (nid == LNET_NID_ANY) {
@@ -2298,20 +2295,7 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
 	}
 
 	/*
-	 * Only enable the multi-rail feature on the peer if both sides of
-	 * the connection have discovery on
-	 */
-	if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL) {
-		CDEBUG(D_NET, "Peer %s has Multi-Rail feature enabled\n",
-		       libcfs_nid2str(lp->lp_primary_nid));
-		lp->lp_state |= LNET_PEER_MULTI_RAIL;
-	} else {
-		CDEBUG(D_NET, "Peer %s has Multi-Rail feature disabled\n",
-		       libcfs_nid2str(lp->lp_primary_nid));
-		lp->lp_state &= ~LNET_PEER_MULTI_RAIL;
-	}
-
-	/* The peer may have discovery disabled at its end. Set
+	 * The peer may have discovery disabled at its end. Set
 	 * NO_DISCOVERY as appropriate.
 	 */
 	if ((pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY) &&
@@ -2332,21 +2316,24 @@ static void lnet_peer_clear_discovery_error(struct lnet_peer *lp)
 	 */
 	if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL) {
 		if (lp->lp_state & LNET_PEER_MULTI_RAIL) {
-			/* Everything's fine */
+			CDEBUG(D_NET, "peer %s(%p) is MR\n",
+			       libcfs_nid2str(lp->lp_primary_nid), lp);
 		} else if (lp->lp_state & LNET_PEER_CONFIGURED) {
 			CWARN("Reply says %s is Multi-Rail, DLC says not\n",
 			      libcfs_nid2str(lp->lp_primary_nid));
+		} else if (lnet_peer_discovery_disabled) {
+			CDEBUG(D_NET,
+			       "peer %s(%p) not MR: DD disabled locally\n",
+			       libcfs_nid2str(lp->lp_primary_nid), lp);
+		} else if (lp->lp_state & LNET_PEER_NO_DISCOVERY) {
+			CDEBUG(D_NET,
+			       "peer %s(%p) not MR: DD disabled remotely\n",
+			       libcfs_nid2str(lp->lp_primary_nid), lp);
 		} else {
-			/* if discovery is disabled then we don't want to
-			 * update the state of the peer. All we'll do is
-			 * update the peer_nis which were reported back in
-			 * the initial ping
-			 */
-
-			if (!lnet_is_discovery_disabled_locked(lp)) {
-				lp->lp_state |= LNET_PEER_MULTI_RAIL;
-				lnet_peer_clr_non_mr_pref_nids(lp);
-			}
+			CDEBUG(D_NET, "peer %s(%p) is MR capable\n",
+			       libcfs_nid2str(lp->lp_primary_nid), lp);
+			lp->lp_state |= LNET_PEER_MULTI_RAIL;
+			lnet_peer_clr_non_mr_pref_nids(lp);
 		}
 	} else if (lp->lp_state & LNET_PEER_MULTI_RAIL) {
 		if (lp->lp_state & LNET_PEER_CONFIGURED) {