From patchwork Thu Feb 27 21:13:40 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11410317
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:13:40 -0500
Message-Id: <1582838290-17243-353-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 352/622] lnet: push router interface updates
Cc: Amir Shehata, Lustre Development List

From: Amir Shehata

A router marks an interface down if it has not received any messages on
that interface for a configurable period (alive_router_ping_timeout),
and marks it up again once a message arrives on it. When either event
occurs, the router now pushes the status change to the peers it is
talking to. This lets nodes that use the router handle asymmetric
router failures more quickly.

WC-bug-id: https://jira.whamcloud.com/browse/LU-11664
Lustre-commit: 0fa02a7d81e7 ("LU-11664 lnet: push router interface updates")
Signed-off-by: Amir Shehata
Reviewed-on: https://review.whamcloud.com/33651
Reviewed-by: Sebastien Buisson
Reviewed-by: Alexey Lyashkov
Reviewed-by: Olaf Weber
Signed-off-by: James Simmons
---
 net/lnet/lnet/lib-move.c | 18 ++++++++++++------
 net/lnet/lnet/router.c   | 13 +++++++++++--
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 0ff1d38..d6cbcd1 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -3840,16 +3840,17 @@ void lnet_monitor_thr_stop(void)
 lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid,
 	   void *private, int rdma_req)
 {
-	int rc = 0;
-	int cpt;
-	int for_me;
+	struct lnet_peer_ni *lpni;
 	struct lnet_msg *msg;
+	u32 payload_length;
 	lnet_pid_t dest_pid;
 	lnet_nid_t dest_nid;
 	lnet_nid_t src_nid;
-	struct lnet_peer_ni *lpni;
-	u32 payload_length;
+	bool push = false;
+	int for_me;
 	u32 type;
+	int rc = 0;
+	int cpt;
 
 	LASSERT(!in_interrupt());
 
@@ -3907,11 +3908,16 @@ void lnet_monitor_thr_stop(void)
 		lnet_ni_lock(ni);
 		ni->ni_last_alive = ktime_get_real_seconds();
 		if (ni->ni_status &&
-		    ni->ni_status->ns_status == LNET_NI_STATUS_DOWN)
+		    ni->ni_status->ns_status == LNET_NI_STATUS_DOWN) {
 			ni->ni_status->ns_status = LNET_NI_STATUS_UP;
+			push = true;
+		}
 		lnet_ni_unlock(ni);
 	}
 
+	if (push)
+		lnet_push_update_to_peers(1);
+
 	/*
 	 * Regard a bad destination NID as a protocol error. Senders should
 	 * know what they're doing; if they don't they're misconfigured, buggy
diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index eb36df5..0a396d9 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -742,10 +742,11 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg)
 	}
 }
 
-static void
+static bool
 lnet_update_ni_status_locked(void)
 {
 	struct lnet_ni *ni = NULL;
+	bool push = false;
 	time64_t now;
 	time64_t timeout;
 
@@ -778,9 +779,12 @@ int lnet_get_rtr_pool_cfg(int cpt, struct lnet_ioctl_pool_cfg *pool_cfg)
 			 * NI status to "down"
 			 */
 			ni->ni_status->ns_status = LNET_NI_STATUS_DOWN;
+			push = true;
 		}
 		lnet_ni_unlock(ni);
 	}
+
+	return push;
 }
 
 void lnet_wait_router_start(void)
@@ -817,6 +821,7 @@ bool lnet_router_checker_active(void)
 {
 	struct lnet_peer_ni *lpni;
 	struct lnet_peer *rtr;
+	bool push = false;
 	u64 version;
 	time64_t now;
 	int cpt;
@@ -883,9 +888,13 @@ bool lnet_router_checker_active(void)
 	}
 
 	if (the_lnet.ln_routing)
-		lnet_update_ni_status_locked();
+		push = lnet_update_ni_status_locked();
 
 	lnet_net_unlock(cpt);
+
+	/* if the status of the ni changed update the peers */
+	if (push)
+		lnet_push_update_to_peers(1);
 }
 
 void
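
Note for readers: both call sites in this patch follow the same shape.
The status change is recorded in a local "push" flag while the lock is
held, and lnet_push_update_to_peers() is called only after the lock has
been dropped. The standalone sketch below illustrates that shape outside
the kernel. It is not LNet code: every toy_* name, the flag-based lock,
and the exact timeout comparison are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; these are not the LNet types. */
enum toy_state { TOY_NI_UP, TOY_NI_DOWN };

struct toy_ni {
	enum toy_state state;
	int64_t last_alive;	/* seconds, time of the last received message */
};

struct toy_router {
	struct toy_ni *nis;
	size_t nr_nis;
	int64_t timeout;	/* plays the role of alive_router_ping_timeout */
	bool locked;		/* toy lock: a flag, so call ordering can be checked */
	int pushes;		/* how many times peers were notified */
};

static void toy_lock(struct toy_router *r)
{
	assert(!r->locked);
	r->locked = true;
}

static void toy_unlock(struct toy_router *r)
{
	assert(r->locked);
	r->locked = false;
}

/* Stand-in for lnet_push_update_to_peers(); asserts the lock is not held. */
static void toy_push_update_to_peers(struct toy_router *r)
{
	assert(!r->locked);
	r->pushes++;
}

/* Mark silent interfaces down; report whether any status changed. */
static bool toy_update_ni_status_locked(struct toy_router *r, int64_t now)
{
	bool push = false;

	assert(r->locked);
	for (size_t i = 0; i < r->nr_nis; i++) {
		struct toy_ni *ni = &r->nis[i];

		/* Assumed comparison; the real check is not shown in the diff. */
		if (ni->state == TOY_NI_UP &&
		    now > ni->last_alive + r->timeout) {
			ni->state = TOY_NI_DOWN;
			push = true;
		}
	}
	return push;
}

/* Periodic check: decide under the lock, notify after unlocking. */
static void toy_check_interfaces(struct toy_router *r, int64_t now)
{
	bool push;

	toy_lock(r);
	push = toy_update_ni_status_locked(r, now);
	toy_unlock(r);

	if (push)
		toy_push_update_to_peers(r);
}

/* Receive path: a message on a down interface brings it back up. */
static void toy_message_received(struct toy_router *r, size_t idx, int64_t now)
{
	bool push = false;

	assert(idx < r->nr_nis);
	toy_lock(r);
	r->nis[idx].last_alive = now;
	if (r->nis[idx].state == TOY_NI_DOWN) {
		r->nis[idx].state = TOY_NI_UP;
		push = true;
	}
	toy_unlock(r);

	if (push)
		toy_push_update_to_peers(r);
}
```

In this sketch a push happens only on an actual transition, so repeated
checks of an interface that is already down do not notify peers again.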