From patchwork Sun Oct 7 23:19:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629789 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 82A1B112B for ; Sun, 7 Oct 2018 23:29:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7186A28A1D for ; Sun, 7 Oct 2018 23:29:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 65B7128CBF; Sun, 7 Oct 2018 23:29:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id C8E9D28A1D for ; Sun, 7 Oct 2018 23:29:40 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7E4F58616FF; Sun, 7 Oct 2018 16:29:40 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 17D4721FB9C for ; Sun, 7 Oct 2018 16:29:39 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 277D1AD93; Sun, 7 Oct 2018 23:29:38 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 
10:19:37 +1100 Message-ID: <153895437756.16383.14536895691182127915.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 01/24] lustre: lnet: add lnet_interfaces_max tunable X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP

From: Olaf Weber

Add an lnet_interfaces_max tunable that describes the maximum number of interfaces per node. This tunable is primarily useful for sanity checks prior to allocating memory.

Allow lnet_interfaces_max to be set and read through the sysfs interface.

Add LNET_INTERFACES_MIN, value 16, as the minimum value.

Add LNET_INTERFACES_MAX_DEFAULT, value 200, as the default value. This value was chosen to ensure that the size of an LNet ping message with any associated LND overhead would fit in 4096 bytes.

(The LNET_INTERFACES_MAX name was not reused, to allow for the early detection of issues when merging code that uses it.)
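As a rough illustration of the sizing argument and the clamp described above, here is a minimal userspace sketch. The struct layouts (`ni_status_model`, `ping_info_model`) and the helper names are assumptions made for this example and are not copied from the LNet headers; only the two constants come from the patch. It counts the ping payload alone and ignores any LND overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the patch. */
#define LNET_INTERFACES_MIN          16
#define LNET_INTERFACES_MAX_DEFAULT  200

/*
 * ASSUMPTION: simplified stand-ins for the LNet wire structures.
 * The field layout is illustrative only.
 */
struct ni_status_model {
	uint64_t ns_nid;
	uint32_t ns_status;
	uint32_t ns_unused;
};

struct ping_info_model {
	uint32_t pi_magic;
	uint32_t pi_features;
	uint32_t pi_pid;
	uint32_t pi_nnis;
	struct ni_status_model pi_ni[];	/* one entry per interface */
};

/* Same clamp as intf_max_set(): values below the minimum are raised to it. */
static int clamp_interfaces_max(int value)
{
	if (value < LNET_INTERFACES_MIN)
		value = LNET_INTERFACES_MIN;
	return value;
}

/* Bytes needed for a model ping info block describing nnis interfaces. */
static size_t ping_info_size_model(int nnis)
{
	assert(nnis >= 0);
	return offsetof(struct ping_info_model, pi_ni) +
	       (size_t)nnis * sizeof(struct ni_status_model);
}
```

Under this model layout each entry and the fixed header take 16 bytes, so `ping_info_size_model(LNET_INTERFACES_MAX_DEFAULT)` stays below 4096 with room left for transport overhead, and `clamp_interfaces_max(5)` yields `LNET_INTERFACES_MIN`.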
Rename LNET_NUM_INTERFACES to LNET_INTERFACES_NUM WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/25770 Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-types.h | 2 + .../lustre/include/uapi/linux/lnet/lnet-dlc.h | 4 +-- .../lustre/include/uapi/linux/lnet/lnet-types.h | 7 ++++ .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 2 + .../staging/lustre/lnet/klnds/socklnd/socklnd.c | 22 +++++++------- .../staging/lustre/lnet/klnds/socklnd/socklnd.h | 4 +-- .../staging/lustre/lnet/klnds/socklnd/socklnd_cb.c | 2 + .../lustre/lnet/klnds/socklnd/socklnd_proto.c | 4 +-- drivers/staging/lustre/lnet/lnet/api-ni.c | 32 +++++++++++++++++++- drivers/staging/lustre/lnet/lnet/config.c | 10 +++--- 10 files changed, 62 insertions(+), 27 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index 7219a7bacf6e..7b11c31f0029 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -371,7 +371,7 @@ struct lnet_ni { * equivalent interfaces to use * This is an array because socklnd bonding can still be configured */ - char *ni_interfaces[LNET_NUM_INTERFACES]; + char *ni_interfaces[LNET_INTERFACES_NUM]; /* original net namespace */ struct net *ni_net_ns; }; diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h index 8f03aa3c5676..d88b30d2e76c 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h @@ -81,7 +81,7 @@ struct lnet_ioctl_config_lnd_tunables { }; struct lnet_ioctl_net_config { - char ni_interfaces[LNET_NUM_INTERFACES][LNET_MAX_STR_LEN]; + char ni_interfaces[LNET_INTERFACES_NUM][LNET_MAX_STR_LEN]; __u32 ni_status; __u32 
ni_cpts[LNET_MAX_SHOW_NUM_CPT]; char cfg_bulk[0]; @@ -184,7 +184,7 @@ struct lnet_ioctl_element_msg_stats { struct lnet_ioctl_config_ni { struct libcfs_ioctl_hdr lic_cfg_hdr; lnet_nid_t lic_nid; - char lic_ni_intf[LNET_NUM_INTERFACES][LNET_MAX_STR_LEN]; + char lic_ni_intf[LNET_INTERFACES_NUM][LNET_MAX_STR_LEN]; char lic_legacy_ip2nets[LNET_MAX_STR_LEN]; __u32 lic_cpts[LNET_MAX_SHOW_NUM_CPT]; __u32 lic_ncpts; diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h index f8a873bab135..6ee60d07ff84 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h @@ -264,7 +264,12 @@ struct lnet_counters { #define LNET_NI_STATUS_DOWN 0xdeadface #define LNET_NI_STATUS_INVALID 0x00000000 -#define LNET_NUM_INTERFACES 16 +#define LNET_INTERFACES_NUM 16 + +/* The minimum number of interfaces per node supported by LNet. */ +#define LNET_INTERFACES_MIN 16 +/* The default - arbitrary - value of the lnet_interfaces_max tunable. */ +#define LNET_INTERFACES_MAX_DEFAULT 200 /** * Objects maintained by the LNet are accessed through handles. 
Handle types diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c index c20766379323..bf969b3891a9 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c @@ -2915,7 +2915,7 @@ static int kiblnd_startup(struct lnet_ni *ni) if (ni->ni_interfaces[0]) { /* Use the IPoIB interface specified in 'networks=' */ - BUILD_BUG_ON(LNET_NUM_INTERFACES <= 1); + BUILD_BUG_ON(LNET_INTERFACES_NUM <= 1); if (ni->ni_interfaces[1]) { CERROR("Multiple interfaces not supported\n"); goto failed; diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c index b2f0148d0087..ff8d73295fff 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c @@ -53,7 +53,7 @@ ksocknal_ip2iface(struct lnet_ni *ni, __u32 ip) struct ksock_interface *iface; for (i = 0; i < net->ksnn_ninterfaces; i++) { - LASSERT(i < LNET_NUM_INTERFACES); + LASSERT(i < LNET_INTERFACES_NUM); iface = &net->ksnn_interfaces[i]; if (iface->ksni_ipaddr == ip) @@ -221,7 +221,7 @@ ksocknal_unlink_peer_locked(struct ksock_peer *peer_ni) struct ksock_interface *iface; for (i = 0; i < peer_ni->ksnp_n_passive_ips; i++) { - LASSERT(i < LNET_NUM_INTERFACES); + LASSERT(i < LNET_INTERFACES_NUM); ip = peer_ni->ksnp_passive_ips[i]; iface = ksocknal_ip2iface(peer_ni->ksnp_ni, ip); @@ -689,7 +689,7 @@ ksocknal_local_ipvec(struct lnet_ni *ni, __u32 *ipaddrs) read_lock(&ksocknal_data.ksnd_global_lock); nip = net->ksnn_ninterfaces; - LASSERT(nip <= LNET_NUM_INTERFACES); + LASSERT(nip <= LNET_INTERFACES_NUM); /* * Only offer interfaces for additional connections if I have @@ -770,8 +770,8 @@ ksocknal_select_ips(struct ksock_peer *peer_ni, __u32 *peerips, int n_peerips) */ write_lock_bh(global_lock); - LASSERT(n_peerips <= LNET_NUM_INTERFACES); - LASSERT(net->ksnn_ninterfaces <= 
LNET_NUM_INTERFACES); + LASSERT(n_peerips <= LNET_INTERFACES_NUM); + LASSERT(net->ksnn_ninterfaces <= LNET_INTERFACES_NUM); /* * Only match interfaces for additional connections @@ -890,7 +890,7 @@ ksocknal_create_routes(struct ksock_peer *peer_ni, int port, return; } - LASSERT(npeer_ipaddrs <= LNET_NUM_INTERFACES); + LASSERT(npeer_ipaddrs <= LNET_INTERFACES_NUM); for (i = 0; i < npeer_ipaddrs; i++) { if (newroute) { @@ -919,7 +919,7 @@ ksocknal_create_routes(struct ksock_peer *peer_ni, int port, best_nroutes = 0; best_netmatch = 0; - LASSERT(net->ksnn_ninterfaces <= LNET_NUM_INTERFACES); + LASSERT(net->ksnn_ninterfaces <= LNET_INTERFACES_NUM); /* Select interface to connect from */ for (j = 0; j < net->ksnn_ninterfaces; j++) { @@ -1060,7 +1060,7 @@ ksocknal_create_conn(struct lnet_ni *ni, struct ksock_route *route, atomic_set(&conn->ksnc_tx_nob, 0); hello = kvzalloc(offsetof(struct ksock_hello_msg, - kshm_ips[LNET_NUM_INTERFACES]), + kshm_ips[LNET_INTERFACES_NUM]), GFP_KERNEL); if (!hello) { rc = -ENOMEM; @@ -1983,7 +1983,7 @@ ksocknal_add_interface(struct lnet_ni *ni, __u32 ipaddress, __u32 netmask) if (iface) { /* silently ignore dups */ rc = 0; - } else if (net->ksnn_ninterfaces == LNET_NUM_INTERFACES) { + } else if (net->ksnn_ninterfaces == LNET_INTERFACES_NUM) { rc = -ENOSPC; } else { iface = &net->ksnn_interfaces[net->ksnn_ninterfaces++]; @@ -2624,7 +2624,7 @@ ksocknal_enumerate_interfaces(struct ksock_net *net, char *iname) continue; } - if (j == LNET_NUM_INTERFACES) { + if (j == LNET_INTERFACES_NUM) { CWARN("Ignoring interface %s (too many interfaces)\n", name); continue; @@ -2812,7 +2812,7 @@ ksocknal_startup(struct lnet_ni *ni) net->ksnn_ninterfaces = rc; } else { - for (i = 0; i < LNET_NUM_INTERFACES; i++) { + for (i = 0; i < LNET_INTERFACES_NUM; i++) { if (!ni->ni_interfaces[i]) break; rc = ksocknal_enumerate_interfaces(net, ni->ni_interfaces[i]); diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h 
b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h index 82e3523f6463..297d1e5af1bd 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h @@ -173,7 +173,7 @@ struct ksock_net { int ksnn_npeers; /* # peers */ int ksnn_shutdown; /* shutting down? */ int ksnn_ninterfaces; /* IP interfaces */ - struct ksock_interface ksnn_interfaces[LNET_NUM_INTERFACES]; + struct ksock_interface ksnn_interfaces[LNET_INTERFACES_NUM]; }; /** connd timeout */ @@ -441,7 +441,7 @@ struct ksock_peer { int ksnp_n_passive_ips; /* # of... */ /* preferred local interfaces */ - u32 ksnp_passive_ips[LNET_NUM_INTERFACES]; + u32 ksnp_passive_ips[LNET_INTERFACES_NUM]; }; struct ksock_connreq { diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c index dc9a12910a8d..c401896bf649 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c @@ -1579,7 +1579,7 @@ ksocknal_send_hello(struct lnet_ni *ni, struct ksock_conn *conn, /* CAVEAT EMPTOR: this byte flips 'ipaddrs' */ struct ksock_net *net = (struct ksock_net *)ni->ni_data; - LASSERT(hello->kshm_nips <= LNET_NUM_INTERFACES); + LASSERT(hello->kshm_nips <= LNET_INTERFACES_NUM); /* rely on caller to hold a ref on socket so it wouldn't disappear */ LASSERT(conn->ksnc_proto); diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_proto.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_proto.c index 10a2757895f3..54ec5d0a85c8 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_proto.c +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_proto.c @@ -614,7 +614,7 @@ ksocknal_recv_hello_v1(struct ksock_conn *conn, struct ksock_hello_msg *hello, hello->kshm_nips = le32_to_cpu(hdr->payload_length) / sizeof(__u32); - if (hello->kshm_nips > LNET_NUM_INTERFACES) { + if (hello->kshm_nips > LNET_INTERFACES_NUM) { CERROR("Bad nips 
%d from ip %pI4h\n", hello->kshm_nips, &conn->ksnc_ipaddr); rc = -EPROTO; @@ -684,7 +684,7 @@ ksocknal_recv_hello_v2(struct ksock_conn *conn, struct ksock_hello_msg *hello, __swab32s(&hello->kshm_nips); } - if (hello->kshm_nips > LNET_NUM_INTERFACES) { + if (hello->kshm_nips > LNET_INTERFACES_NUM) { CERROR("Bad nips %d from ip %pI4h\n", hello->kshm_nips, &conn->ksnc_ipaddr); return -EPROTO; diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index b37abdedccaa..6a692d5c4608 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -34,6 +34,7 @@ #define DEBUG_SUBSYSTEM S_LNET #include #include +#include #include #include @@ -70,6 +71,13 @@ module_param(lnet_numa_range, uint, 0444); MODULE_PARM_DESC(lnet_numa_range, "NUMA range to consider during Multi-Rail selection"); +static int lnet_interfaces_max = LNET_INTERFACES_MAX_DEFAULT; +static int intf_max_set(const char *val, const struct kernel_param *kp); +module_param_call(lnet_interfaces_max, intf_max_set, param_get_int, + &lnet_interfaces_max, 0644); +MODULE_PARM_DESC(lnet_interfaces_max, + "Maximum number of interfaces in a node."); + /* * This sequence number keeps track of how many times DLC was used to * update the local NIs. 
It is incremented when a NI is added or @@ -82,6 +90,28 @@ static atomic_t lnet_dlc_seq_no = ATOMIC_INIT(0); static int lnet_ping(struct lnet_process_id id, signed long timeout, struct lnet_process_id __user *ids, int n_ids); +static int +intf_max_set(const char *val, const struct kernel_param *kp) +{ + int value, rc; + + rc = kstrtoint(val, 0, &value); + if (rc) { + CERROR("Invalid module parameter value for 'lnet_interfaces_max'\n"); + return rc; + } + + if (value < LNET_INTERFACES_MIN) { + CWARN("max interfaces provided are too small, setting to %d\n", + LNET_INTERFACES_MIN); + value = LNET_INTERFACES_MIN; + } + + *(int *)kp->arg = value; + + return 0; +} + static char * lnet_get_routes(void) { @@ -2924,7 +2954,7 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, infosz = offsetof(struct lnet_ping_info, pi_ni[n_ids]); /* n_ids limit is arbitrary */ - if (n_ids <= 0 || n_ids > 20 || id.nid == LNET_NID_ANY) + if (n_ids <= 0 || n_ids > lnet_interfaces_max || id.nid == LNET_NID_ANY) return -EINVAL; if (id.pid == LNET_PID_ANY) diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c index 3ea56c81ec0e..087d9a8a6b6a 100644 --- a/drivers/staging/lustre/lnet/lnet/config.c +++ b/drivers/staging/lustre/lnet/lnet/config.c @@ -123,10 +123,10 @@ lnet_ni_unique_net(struct list_head *nilist, char *iface) /* check that the NI is unique to the interfaces with in the same NI. 
* This is only a consideration if use_tcp_bonding is set */ static bool -lnet_ni_unique_ni(char *iface_list[LNET_NUM_INTERFACES], char *iface) +lnet_ni_unique_ni(char *iface_list[LNET_INTERFACES_NUM], char *iface) { int i; - for (i = 0; i < LNET_NUM_INTERFACES; i++) { + for (i = 0; i < LNET_INTERFACES_NUM; i++) { if (iface_list[i] && strncmp(iface_list[i], iface, strlen(iface)) == 0) return false; @@ -304,7 +304,7 @@ lnet_ni_free(struct lnet_ni *ni) kfree(ni->ni_cpts); - for (i = 0; i < LNET_NUM_INTERFACES && ni->ni_interfaces[i]; i++) + for (i = 0; i < LNET_INTERFACES_NUM && ni->ni_interfaces[i]; i++) kfree(ni->ni_interfaces[i]); /* release reference to net namespace */ @@ -397,11 +397,11 @@ lnet_ni_add_interface(struct lnet_ni *ni, char *iface) * can free the tokens at the end of the function. * The newly allocated ni_interfaces[] can be * freed when freeing the NI */ - while (niface < LNET_NUM_INTERFACES && + while (niface < LNET_INTERFACES_NUM && ni->ni_interfaces[niface]) niface++; - if (niface >= LNET_NUM_INTERFACES) { + if (niface >= LNET_INTERFACES_NUM) { LCONSOLE_ERROR_MSG(0x115, "Too many interfaces " "for net %s\n", libcfs_net2str(LNET_NIDNET(ni->ni_nid))); From patchwork Sun Oct 7 23:19:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629793 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9C5DA14D6 for ; Sun, 7 Oct 2018 23:29:49 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 89DF428A1D for ; Sun, 7 Oct 2018 23:29:49 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7C31C28CBF; Sun, 7 Oct 2018 23:29:49 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org 
X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 273B828A1D for ; Sun, 7 Oct 2018 23:29:49 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CF1FC8616F6; Sun, 7 Oct 2018 16:29:48 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 76E9E21FB9C for ; Sun, 7 Oct 2018 16:29:46 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 8B74BAE17; Sun, 7 Oct 2018 23:29:45 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:37 +1100 Message-ID: <153895437761.16383.2145517278397754849.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 02/24] lustre: lnet: configure lnet_interfaces_max tunable from dlc X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber Added the ability to configure lnet_interfaces_max from DLC. Combined the configure and show of numa range and max interfaces under a "global" YAML element when configuring using YAML. WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Amir Shehata Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25771 Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../lustre/include/uapi/linux/lnet/lnet-dlc.h | 6 +++--- drivers/staging/lustre/lnet/lnet/api-ni.c | 16 ++++++++-------- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h index d88b30d2e76c..706892ca7efb 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h @@ -230,9 +230,9 @@ struct lnet_ioctl_peer_cfg { void __user *prcfg_bulk; }; -struct lnet_ioctl_numa_range { - struct libcfs_ioctl_hdr nr_hdr; - __u32 nr_range; +struct lnet_ioctl_set_value { + struct libcfs_ioctl_hdr sv_hdr; + __u32 sv_value; }; struct lnet_ioctl_lnet_stats { diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 6a692d5c4608..8b6400da2836 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -2708,24 +2708,24 @@ LNetCtl(unsigned int cmd, void *arg) return rc; case IOC_LIBCFS_SET_NUMA_RANGE: { - struct lnet_ioctl_numa_range *numa; + struct lnet_ioctl_set_value *numa; numa = arg; - if (numa->nr_hdr.ioc_len != sizeof(*numa)) + if (numa->sv_hdr.ioc_len != sizeof(*numa)) return -EINVAL; - mutex_lock(&the_lnet.ln_api_mutex); - lnet_numa_range = 
numa->nr_range; - mutex_unlock(&the_lnet.ln_api_mutex); + lnet_net_lock(LNET_LOCK_EX); + lnet_numa_range = numa->sv_value; + lnet_net_unlock(LNET_LOCK_EX); return 0; } case IOC_LIBCFS_GET_NUMA_RANGE: { - struct lnet_ioctl_numa_range *numa; + struct lnet_ioctl_set_value *numa; numa = arg; - if (numa->nr_hdr.ioc_len != sizeof(*numa)) + if (numa->sv_hdr.ioc_len != sizeof(*numa)) return -EINVAL; - numa->nr_range = lnet_numa_range; + numa->sv_value = lnet_numa_range; return 0; } From patchwork Sun Oct 7 23:19:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629797 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 48CF114D6 for ; Sun, 7 Oct 2018 23:29:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 323AF28A1D for ; Sun, 7 Oct 2018 23:29:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 24E5D28CBF; Sun, 7 Oct 2018 23:29:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8BB2628A1D for ; Sun, 7 Oct 2018 23:29:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 425678616F7; Sun, 7 Oct 2018 16:29:56 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C80FD21FB9C for ; Sun, 7 Oct 2018 16:29:54 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id DF76AAD39; Sun, 7 Oct 2018 23:29:53 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:37 +1100 Message-ID: <153895437765.16383.7644707942018591818.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 03/24] lustre: lnet: add struct lnet_ping_buffer X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP

From: Olaf Weber

The Multi-Rail code will also use the ping target buffer as the source of data to push to other nodes. This means that there will be multiple MDs referencing the same buffer, and care must be taken to ensure that the buffer is not freed while any such reference remains.

Encapsulate the struct lnet_ping_info (aka lnet_ping_info_t) in a struct lnet_ping_buffer. The new struct adds a reference count and records the number of NIDs for which the encapsulated lnet_ping_info has been sized.

For sizing the buffer, the constant LNET_PINGINFO_SIZE is replaced with LNET_PING_INFO_SIZE(NNIS).
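The reference-counting scheme described above can be sketched in userspace C roughly as follows. This is a simplified model under stated assumptions, not the kernel code: `ping_buffer_model`, `pbuf_alloc`, `pbuf_addref` and `pbuf_decref` are hypothetical names, C11 atomics stand in for the kernel's `atomic_t`, and a plain array of NIDs stands in for the embedded `lnet_ping_info`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* ASSUMPTION: hypothetical model of a sized, reference-counted buffer. */
struct ping_buffer_model {
	int        pb_nnis;	/* number of entries the buffer was sized for */
	atomic_int pb_refcnt;	/* one reference per holder (owner, each MD) */
	uint64_t   pb_nids[];	/* stand-in for the embedded ping info */
};

/* Allocate a buffer for nnis entries; the caller owns the first reference. */
static struct ping_buffer_model *pbuf_alloc(int nnis)
{
	struct ping_buffer_model *pbuf;

	if (nnis < 0)
		return NULL;

	pbuf = malloc(sizeof(*pbuf) + (size_t)nnis * sizeof(pbuf->pb_nids[0]));
	if (pbuf) {
		pbuf->pb_nnis = nnis;
		atomic_init(&pbuf->pb_refcnt, 1);
	}
	return pbuf;
}

/* Take an extra reference, e.g. when another MD starts using the buffer. */
static void pbuf_addref(struct ping_buffer_model *pbuf)
{
	assert(pbuf != NULL);
	atomic_fetch_add(&pbuf->pb_refcnt, 1);
}

/*
 * Drop a reference. Returns 1 if this call released the last reference
 * and freed the buffer, 0 otherwise.
 */
static int pbuf_decref(struct ping_buffer_model *pbuf)
{
	assert(pbuf != NULL);
	if (atomic_fetch_sub(&pbuf->pb_refcnt, 1) == 1) {
		free(pbuf);
		return 1;
	}
	return 0;
}
```

The point of keeping the count and the size outside the payload area is that the payload may be overwritten by network data while the bookkeeping stays intact; the buffer is only freed once the owner and every attached MD have dropped their reference.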
WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25773 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 22 + .../staging/lustre/include/linux/lnet/lib-types.h | 40 ++ drivers/staging/lustre/lnet/lnet/api-ni.c | 345 +++++++++++--------- drivers/staging/lustre/lnet/lnet/router.c | 94 +++-- 4 files changed, 301 insertions(+), 200 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 16e64d83840d..2e2b5ed27116 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -634,7 +634,27 @@ int lnet_peer_buffer_credits(struct lnet_net *net); int lnet_router_checker_start(void); void lnet_router_checker_stop(void); void lnet_router_ni_update_locked(struct lnet_peer_ni *gw, __u32 net); -void lnet_swap_pinginfo(struct lnet_ping_info *info); +void lnet_swap_pinginfo(struct lnet_ping_buffer *pbuf); + +int lnet_ping_info_validate(struct lnet_ping_info *pinfo); +struct lnet_ping_buffer *lnet_ping_buffer_alloc(int nnis, gfp_t gfp); +void lnet_ping_buffer_free(struct lnet_ping_buffer *pbuf); + +static inline void lnet_ping_buffer_addref(struct lnet_ping_buffer *pbuf) +{ + atomic_inc(&pbuf->pb_refcnt); +} + +static inline void lnet_ping_buffer_decref(struct lnet_ping_buffer *pbuf) +{ + if (atomic_dec_and_test(&pbuf->pb_refcnt)) + lnet_ping_buffer_free(pbuf); +} + +static inline int lnet_ping_buffer_numref(struct lnet_ping_buffer *pbuf) +{ + return atomic_read(&pbuf->pb_refcnt); +} int lnet_parse_ip2nets(char **networksp, char *ip2nets); int lnet_parse_routes(char *route_str, int *im_a_router); diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index 
7b11c31f0029..ab8c6d66cdbf 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -387,12 +387,32 @@ struct lnet_ni { #define LNET_PING_FEAT_NI_STATUS BIT(1) /* return NI status */ #define LNET_PING_FEAT_RTE_DISABLED BIT(2) /* Routing enabled */ -#define LNET_PING_FEAT_MASK (LNET_PING_FEAT_BASE | \ - LNET_PING_FEAT_NI_STATUS) +#define LNET_PING_INFO_SIZE(NNIDS) \ + offsetof(struct lnet_ping_info, pi_ni[NNIDS]) +#define LNET_PING_INFO_LONI(PINFO) ((PINFO)->pi_ni[0].ns_nid) +#define LNET_PING_INFO_SEQNO(PINFO) ((PINFO)->pi_ni[0].ns_status) + +/* + * Descriptor of a ping info buffer: keep a separate indicator of the + * size and a reference count. The type is used both as a source and + * sink of data, so we need to keep some information outside of the + * area that may be overwritten by network data. + */ +struct lnet_ping_buffer { + int pb_nnis; + atomic_t pb_refcnt; + struct lnet_ping_info pb_info; +}; + +#define LNET_PING_BUFFER_SIZE(NNIDS) \ + offsetof(struct lnet_ping_buffer, pb_info.pi_ni[NNIDS]) +#define LNET_PING_BUFFER_LONI(PBUF) ((PBUF)->pb_info.pi_ni[0].ns_nid) +#define LNET_PING_BUFFER_SEQNO(PBUF) ((PBUF)->pb_info.pi_ni[0].ns_status) + /* router checker data, per router */ -#define LNET_MAX_RTR_NIS 16 -#define LNET_PINGINFO_SIZE offsetof(struct lnet_ping_info, pi_ni[LNET_MAX_RTR_NIS]) +#define LNET_MAX_RTR_NIS LNET_INTERFACES_MIN +#define LNET_RTR_PINGINFO_SIZE LNET_PING_INFO_SIZE(LNET_MAX_RTR_NIS) struct lnet_rc_data { /* chain on the_lnet.ln_zombie_rcd or ln_deathrow_rcd */ struct list_head rcd_list; @@ -401,7 +421,7 @@ struct lnet_rc_data { /* reference to gateway */ struct lnet_peer_ni *rcd_gateway; /* ping buffer */ - struct lnet_ping_info *rcd_pinginfo; + struct lnet_ping_buffer *rcd_pingbuffer; }; struct lnet_peer_ni { @@ -792,9 +812,17 @@ struct lnet { /* percpt router buffer pools */ struct lnet_rtrbufpool **ln_rtrpools; + /* + * Ping target / Push source + * + * The ping 
target and push source share a single buffer. The + * ln_ping_target is protected against concurrent updates by + * ln_api_mutex. + */ struct lnet_handle_md ln_ping_target_md; struct lnet_handle_eq ln_ping_target_eq; - struct lnet_ping_info *ln_ping_info; + struct lnet_ping_buffer *ln_ping_target; + atomic_t ln_ping_target_seqno; /* router checker startup/shutdown state */ enum lnet_rc_state ln_rc_state; diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 8b6400da2836..ca28ad75fe2b 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -902,25 +902,44 @@ lnet_count_acceptor_nets(void) return count; } -static struct lnet_ping_info * -lnet_ping_info_create(int num_ni) +struct lnet_ping_buffer * +lnet_ping_buffer_alloc(int nnis, gfp_t gfp) { - struct lnet_ping_info *ping_info; - unsigned int infosz; + struct lnet_ping_buffer *pbuf; - infosz = offsetof(struct lnet_ping_info, pi_ni[num_ni]); - ping_info = kvzalloc(infosz, GFP_KERNEL); - if (!ping_info) { - CERROR("Can't allocate ping info[%d]\n", num_ni); + pbuf = kmalloc(LNET_PING_BUFFER_SIZE(nnis), gfp); + if (pbuf) { + pbuf->pb_nnis = nnis; + atomic_set(&pbuf->pb_refcnt, 1); + } + + return pbuf; +} + +void +lnet_ping_buffer_free(struct lnet_ping_buffer *pbuf) +{ + LASSERT(lnet_ping_buffer_numref(pbuf) == 0); + kfree(pbuf); +} + +static struct lnet_ping_buffer * +lnet_ping_target_create(int nnis) +{ + struct lnet_ping_buffer *pbuf; + + pbuf = lnet_ping_buffer_alloc(nnis, GFP_KERNEL); + if (!pbuf) { + CERROR("Can't allocate ping source [%d]\n", nnis); return NULL; } - ping_info->pi_nnis = num_ni; - ping_info->pi_pid = the_lnet.ln_pid; - ping_info->pi_magic = LNET_PROTO_PING_MAGIC; - ping_info->pi_features = LNET_PING_FEAT_NI_STATUS; + pbuf->pb_info.pi_nnis = nnis; + pbuf->pb_info.pi_pid = the_lnet.ln_pid; + pbuf->pb_info.pi_magic = LNET_PROTO_PING_MAGIC; + pbuf->pb_info.pi_features = LNET_PING_FEAT_NI_STATUS; - 
return ping_info; + return pbuf; } static inline int @@ -966,14 +985,25 @@ lnet_get_ni_count(void) return count; } -static inline void -lnet_ping_info_free(struct lnet_ping_info *pinfo) +int +lnet_ping_info_validate(struct lnet_ping_info *pinfo) { - kvfree(pinfo); + if (!pinfo) + return -EINVAL; + if (pinfo->pi_magic != LNET_PROTO_PING_MAGIC) + return -EPROTO; + if (!(pinfo->pi_features & LNET_PING_FEAT_NI_STATUS)) + return -EPROTO; + /* Loopback is guaranteed to be present */ + if (pinfo->pi_nnis < 1 || pinfo->pi_nnis > lnet_interfaces_max) + return -ERANGE; + if (LNET_NETTYP(LNET_NIDNET(LNET_PING_INFO_LONI(pinfo))) != LOLND) + return -EPROTO; + return 0; } static void -lnet_ping_info_destroy(void) +lnet_ping_target_destroy(void) { struct lnet_net *net; struct lnet_ni *ni; @@ -988,25 +1018,25 @@ lnet_ping_info_destroy(void) } } - lnet_ping_info_free(the_lnet.ln_ping_info); - the_lnet.ln_ping_info = NULL; + lnet_ping_buffer_decref(the_lnet.ln_ping_target); + the_lnet.ln_ping_target = NULL; lnet_net_unlock(LNET_LOCK_EX); } static void -lnet_ping_event_handler(struct lnet_event *event) +lnet_ping_target_event_handler(struct lnet_event *event) { - struct lnet_ping_info *pinfo = event->md.user_ptr; + struct lnet_ping_buffer *pbuf = event->md.user_ptr; if (event->unlinked) - pinfo->pi_features = LNET_PING_FEAT_INVAL; + lnet_ping_buffer_decref(pbuf); } static int -lnet_ping_info_setup(struct lnet_ping_info **ppinfo, - struct lnet_handle_md *md_handle, - int ni_count, bool set_eq) +lnet_ping_target_setup(struct lnet_ping_buffer **ppbuf, + struct lnet_handle_md *ping_mdh, + int ni_count, bool set_eq) { struct lnet_process_id id = { .nid = LNET_NID_ANY, .pid = LNET_PID_ANY }; @@ -1015,94 +1045,98 @@ lnet_ping_info_setup(struct lnet_ping_info **ppinfo, int rc, rc2; if (set_eq) { - rc = LNetEQAlloc(0, lnet_ping_event_handler, + rc = LNetEQAlloc(0, lnet_ping_target_event_handler, &the_lnet.ln_ping_target_eq); if (rc) { - CERROR("Can't allocate ping EQ: %d\n", rc); + 
CERROR("Can't allocate ping buffer EQ: %d\n", rc); return rc; } } - *ppinfo = lnet_ping_info_create(ni_count); - if (!*ppinfo) { + *ppbuf = lnet_ping_target_create(ni_count); + if (!*ppbuf) { rc = -ENOMEM; - goto failed_0; + goto fail_free_eq; } + /* Ping target ME/MD */ rc = LNetMEAttach(LNET_RESERVED_PORTAL, id, LNET_PROTO_PING_MATCHBITS, 0, LNET_UNLINK, LNET_INS_AFTER, &me_handle); if (rc) { - CERROR("Can't create ping ME: %d\n", rc); - goto failed_1; + CERROR("Can't create ping target ME: %d\n", rc); + goto fail_decref_ping_buffer; } /* initialize md content */ - md.start = *ppinfo; - md.length = offsetof(struct lnet_ping_info, - pi_ni[(*ppinfo)->pi_nnis]); + md.start = &(*ppbuf)->pb_info; + md.length = LNET_PING_INFO_SIZE((*ppbuf)->pb_nnis); md.threshold = LNET_MD_THRESH_INF; md.max_size = 0; md.options = LNET_MD_OP_GET | LNET_MD_TRUNCATE | LNET_MD_MANAGE_REMOTE; - md.user_ptr = NULL; md.eq_handle = the_lnet.ln_ping_target_eq; - md.user_ptr = *ppinfo; + md.user_ptr = *ppbuf; - rc = LNetMDAttach(me_handle, md, LNET_RETAIN, md_handle); + rc = LNetMDAttach(me_handle, md, LNET_RETAIN, ping_mdh); if (rc) { - CERROR("Can't attach ping MD: %d\n", rc); - goto failed_2; + CERROR("Can't attach ping target MD: %d\n", rc); + goto fail_unlink_ping_me; } + lnet_ping_buffer_addref(*ppbuf); return 0; -failed_2: +fail_unlink_ping_me: rc2 = LNetMEUnlink(me_handle); LASSERT(!rc2); -failed_1: - lnet_ping_info_free(*ppinfo); - *ppinfo = NULL; -failed_0: - if (set_eq) - LNetEQFree(the_lnet.ln_ping_target_eq); +fail_decref_ping_buffer: + LASSERT(lnet_ping_buffer_numref(*ppbuf) == 1); + lnet_ping_buffer_decref(*ppbuf); + *ppbuf = NULL; +fail_free_eq: + if (set_eq) { + rc2 = LNetEQFree(the_lnet.ln_ping_target_eq); + LASSERT(rc2 == 0); + } return rc; } static void -lnet_ping_md_unlink(struct lnet_ping_info *pinfo, - struct lnet_handle_md *md_handle) +lnet_ping_md_unlink(struct lnet_ping_buffer *pbuf, + struct lnet_handle_md *ping_mdh) { - LNetMDUnlink(*md_handle); - 
LNetInvalidateMDHandle(md_handle); + LNetMDUnlink(*ping_mdh); + LNetInvalidateMDHandle(ping_mdh); - /* NB md could be busy; this just starts the unlink */ - while (pinfo->pi_features != LNET_PING_FEAT_INVAL) { - CDEBUG(D_NET, "Still waiting for ping MD to unlink\n"); + /* NB the MD could be busy; this just starts the unlink */ + while (lnet_ping_buffer_numref(pbuf) > 1) { + CDEBUG(D_NET, "Still waiting for ping data MD to unlink\n"); schedule_timeout_idle(HZ); } } static void -lnet_ping_info_install_locked(struct lnet_ping_info *ping_info) +lnet_ping_target_install_locked(struct lnet_ping_buffer *pbuf) { struct lnet_ni_status *ns; struct lnet_ni *ni; struct lnet_net *net; int i = 0; + int rc; list_for_each_entry(net, &the_lnet.ln_nets, net_list) { list_for_each_entry(ni, &net->net_ni_list, ni_netlist) { - LASSERT(i < ping_info->pi_nnis); + LASSERT(i < pbuf->pb_nnis); - ns = &ping_info->pi_ni[i]; + ns = &pbuf->pb_info.pi_ni[i]; ns->ns_nid = ni->ni_nid; lnet_ni_lock(ni); ns->ns_status = ni->ni_status ? - ni->ni_status->ns_status : + ni->ni_status->ns_status : LNET_NI_STATUS_UP; ni->ni_status = ns; lnet_ni_unlock(ni); @@ -1110,35 +1144,47 @@ lnet_ping_info_install_locked(struct lnet_ping_info *ping_info) i++; } } + /* + * We (ab)use the ns_status of the loopback interface to + * transmit the sequence number. The first interface listed + * must be the loopback interface. 
+ */ + rc = lnet_ping_info_validate(&pbuf->pb_info); + if (rc) { + LCONSOLE_EMERG("Invalid ping target: %d\n", rc); + LBUG(); + } + LNET_PING_BUFFER_SEQNO(pbuf) = + atomic_inc_return(&the_lnet.ln_ping_target_seqno); } static void -lnet_ping_target_update(struct lnet_ping_info *pinfo, - struct lnet_handle_md md_handle) +lnet_ping_target_update(struct lnet_ping_buffer *pbuf, + struct lnet_handle_md ping_mdh) { - struct lnet_ping_info *old_pinfo = NULL; - struct lnet_handle_md old_md; + struct lnet_ping_buffer *old_pbuf = NULL; + struct lnet_handle_md old_ping_md; /* switch the NIs to point to the new ping info created */ lnet_net_lock(LNET_LOCK_EX); if (!the_lnet.ln_routing) - pinfo->pi_features |= LNET_PING_FEAT_RTE_DISABLED; - lnet_ping_info_install_locked(pinfo); + pbuf->pb_info.pi_features |= LNET_PING_FEAT_RTE_DISABLED; + lnet_ping_target_install_locked(pbuf); - if (the_lnet.ln_ping_info) { - old_pinfo = the_lnet.ln_ping_info; - old_md = the_lnet.ln_ping_target_md; + if (the_lnet.ln_ping_target) { + old_pbuf = the_lnet.ln_ping_target; + old_ping_md = the_lnet.ln_ping_target_md; } - the_lnet.ln_ping_target_md = md_handle; - the_lnet.ln_ping_info = pinfo; + the_lnet.ln_ping_target_md = ping_mdh; + the_lnet.ln_ping_target = pbuf; lnet_net_unlock(LNET_LOCK_EX); - if (old_pinfo) { - /* unlink the old ping info */ - lnet_ping_md_unlink(old_pinfo, &old_md); - lnet_ping_info_free(old_pinfo); + if (old_pbuf) { + /* unlink and free the old ping info */ + lnet_ping_md_unlink(old_pbuf, &old_ping_md); + lnet_ping_buffer_decref(old_pbuf); } } @@ -1147,13 +1193,13 @@ lnet_ping_target_fini(void) { int rc; - lnet_ping_md_unlink(the_lnet.ln_ping_info, + lnet_ping_md_unlink(the_lnet.ln_ping_target, &the_lnet.ln_ping_target_md); rc = LNetEQFree(the_lnet.ln_ping_target_eq); LASSERT(!rc); - lnet_ping_info_destroy(); + lnet_ping_target_destroy(); } static int @@ -1745,8 +1791,8 @@ LNetNIInit(lnet_pid_t requested_pid) int im_a_router = 0; int rc; int ni_count; - struct lnet_ping_info 
*pinfo; - struct lnet_handle_md md_handle; + struct lnet_ping_buffer *pbuf; + struct lnet_handle_md ping_mdh; struct list_head net_head; struct lnet_net *net; @@ -1823,11 +1869,11 @@ LNetNIInit(lnet_pid_t requested_pid) the_lnet.ln_refcount = 1; /* Now I may use my own API functions... */ - rc = lnet_ping_info_setup(&pinfo, &md_handle, ni_count, true); + rc = lnet_ping_target_setup(&pbuf, &ping_mdh, ni_count, true); if (rc) goto err_acceptor_stop; - lnet_ping_target_update(pinfo, md_handle); + lnet_ping_target_update(pbuf, ping_mdh); rc = lnet_router_checker_start(); if (rc) @@ -1936,7 +1982,10 @@ lnet_fill_ni_info(struct lnet_ni *ni, struct lnet_ioctl_config_ni *cfg_ni, } cfg_ni->lic_nid = ni->ni_nid; - cfg_ni->lic_status = ni->ni_status->ns_status; + if (LNET_NETTYP(LNET_NIDNET(ni->ni_nid)) == LOLND) + cfg_ni->lic_status = LNET_NI_STATUS_UP; + else + cfg_ni->lic_status = ni->ni_status->ns_status; cfg_ni->lic_tcp_bonding = use_tcp_bonding; cfg_ni->lic_dev_cpt = ni->ni_dev_cpt; @@ -2021,7 +2070,10 @@ lnet_fill_ni_info_legacy(struct lnet_ni *ni, config->cfg_config_u.cfg_net.net_peer_rtr_credits = ni->ni_net->net_tunables.lct_peer_rtr_credits; - net_config->ni_status = ni->ni_status->ns_status; + if (LNET_NETTYP(LNET_NIDNET(ni->ni_nid)) == LOLND) + net_config->ni_status = LNET_NI_STATUS_UP; + else + net_config->ni_status = ni->ni_status->ns_status; if (ni->ni_cpts) { int num_cpts = min(ni->ni_ncpts, LNET_MAX_SHOW_NUM_CPT); @@ -2172,8 +2224,8 @@ static int lnet_add_net_common(struct lnet_net *net, struct lnet_ioctl_config_lnd_tunables *tun) { u32 net_id; - struct lnet_ping_info *pinfo; - struct lnet_handle_md md_handle; + struct lnet_ping_buffer *pbuf; + struct lnet_handle_md ping_mdh; int rc; struct lnet_remotenet *rnet; int net_ni_count; @@ -2195,7 +2247,7 @@ static int lnet_add_net_common(struct lnet_net *net, /* * make sure you calculate the correct number of slots in the ping - * info. Since the ping info is a flattened list of all the NIs, + * buffer. 
Since the ping info is a flattened list of all the NIs, * we should allocate enough slots to accomodate the number of NIs * which will be added. * @@ -2204,9 +2256,9 @@ static int lnet_add_net_common(struct lnet_net *net, */ net_ni_count = lnet_get_net_ni_count_pre(net); - rc = lnet_ping_info_setup(&pinfo, &md_handle, - net_ni_count + lnet_get_ni_count(), - false); + rc = lnet_ping_target_setup(&pbuf, &ping_mdh, + net_ni_count + lnet_get_ni_count(), + false); if (rc < 0) { lnet_net_free(net); return rc; @@ -2257,13 +2309,13 @@ static int lnet_add_net_common(struct lnet_net *net, lnet_peer_net_added(net); lnet_net_unlock(LNET_LOCK_EX); - lnet_ping_target_update(pinfo, md_handle); + lnet_ping_target_update(pbuf, ping_mdh); return 0; failed: - lnet_ping_md_unlink(pinfo, &md_handle); - lnet_ping_info_free(pinfo); + lnet_ping_md_unlink(pbuf, &ping_mdh); + lnet_ping_buffer_decref(pbuf); return rc; } @@ -2354,8 +2406,8 @@ int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf) struct lnet_net *net; struct lnet_ni *ni; u32 net_id = LNET_NIDNET(conf->lic_nid); - struct lnet_ping_info *pinfo; - struct lnet_handle_md md_handle; + struct lnet_ping_buffer *pbuf; + struct lnet_handle_md ping_mdh; int rc; int net_count; u32 addr; @@ -2373,7 +2425,7 @@ int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf) CERROR("net %s not found\n", libcfs_net2str(net_id)); rc = -ENOENT; - goto net_unlock; + goto unlock_net; } addr = LNET_NIDADDR(conf->lic_nid); @@ -2384,20 +2436,20 @@ int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf) lnet_net_unlock(0); /* create and link a new ping info, before removing the old one */ - rc = lnet_ping_info_setup(&pinfo, &md_handle, - lnet_get_ni_count() - net_count, - false); + rc = lnet_ping_target_setup(&pbuf, &ping_mdh, + lnet_get_ni_count() - net_count, + false); if (rc != 0) - goto out; + goto unlock_api_mutex; lnet_shutdown_lndnet(net); if (lnet_count_acceptor_nets() == 0) lnet_acceptor_stop(); - lnet_ping_target_update(pinfo, md_handle); + 
lnet_ping_target_update(pbuf, ping_mdh); - goto out; + goto unlock_api_mutex; } ni = lnet_nid2ni_locked(conf->lic_nid, 0); @@ -2405,7 +2457,7 @@ int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf) CERROR("nid %s not found\n", libcfs_nid2str(conf->lic_nid)); rc = -ENOENT; - goto net_unlock; + goto unlock_net; } net_count = lnet_get_net_ni_count_locked(net); @@ -2413,27 +2465,27 @@ int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf) lnet_net_unlock(0); /* create and link a new ping info, before removing the old one */ - rc = lnet_ping_info_setup(&pinfo, &md_handle, - lnet_get_ni_count() - 1, false); + rc = lnet_ping_target_setup(&pbuf, &ping_mdh, + lnet_get_ni_count() - 1, false); if (rc != 0) - goto out; + goto unlock_api_mutex; lnet_shutdown_lndni(ni); if (lnet_count_acceptor_nets() == 0) lnet_acceptor_stop(); - lnet_ping_target_update(pinfo, md_handle); + lnet_ping_target_update(pbuf, ping_mdh); /* check if the net is empty and remove it if it is */ if (net_count == 1) lnet_shutdown_lndnet(net); - goto out; + goto unlock_api_mutex; -net_unlock: +unlock_net: lnet_net_unlock(0); -out: +unlock_api_mutex: mutex_unlock(&the_lnet.ln_api_mutex); return rc; @@ -2501,8 +2553,8 @@ int lnet_dyn_del_net(__u32 net_id) { struct lnet_net *net; - struct lnet_ping_info *pinfo; - struct lnet_handle_md md_handle; + struct lnet_ping_buffer *pbuf; + struct lnet_handle_md ping_mdh; int rc; int net_ni_count; @@ -2525,8 +2577,8 @@ lnet_dyn_del_net(__u32 net_id) lnet_net_unlock(0); /* create and link a new ping info, before removing the old one */ - rc = lnet_ping_info_setup(&pinfo, &md_handle, - lnet_get_ni_count() - net_ni_count, false); + rc = lnet_ping_target_setup(&pbuf, &ping_mdh, + lnet_get_ni_count() - net_ni_count, false); if (rc) goto out; @@ -2535,7 +2587,7 @@ lnet_dyn_del_net(__u32 net_id) if (!lnet_count_acceptor_nets()) lnet_acceptor_stop(); - lnet_ping_target_update(pinfo, md_handle); + lnet_ping_target_update(pbuf, ping_mdh); out: 
mutex_unlock(&the_lnet.ln_api_mutex); @@ -2943,16 +2995,13 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, int unlinked = 0; int replied = 0; const signed long a_long_time = 60*HZ; - int infosz; - struct lnet_ping_info *info; + struct lnet_ping_buffer *pbuf; struct lnet_process_id tmpid; int i; int nob; int rc; int rc2; - infosz = offsetof(struct lnet_ping_info, pi_ni[n_ids]); - /* n_ids limit is arbitrary */ if (n_ids <= 0 || n_ids > lnet_interfaces_max || id.nid == LNET_NID_ANY) return -EINVAL; @@ -2960,20 +3009,20 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, if (id.pid == LNET_PID_ANY) id.pid = LNET_PID_LUSTRE; - info = kzalloc(infosz, GFP_KERNEL); - if (!info) + pbuf = lnet_ping_buffer_alloc(n_ids, GFP_NOFS); + if (!pbuf) return -ENOMEM; /* NB 2 events max (including any unlink event) */ rc = LNetEQAlloc(2, LNET_EQ_HANDLER_NONE, &eqh); if (rc) { CERROR("Can't allocate EQ: %d\n", rc); - goto out_0; + goto fail_ping_buffer_decref; } /* initialize md content */ - md.start = info; - md.length = infosz; + md.start = &pbuf->pb_info; + md.length = LNET_PING_INFO_SIZE(n_ids); md.threshold = 2; /*GET/REPLY*/ md.max_size = 0; md.options = LNET_MD_TRUNCATE; @@ -2983,7 +3032,7 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, rc = LNetMDBind(md, LNET_UNLINK, &mdh); if (rc) { CERROR("Can't bind MD: %d\n", rc); - goto out_1; + goto fail_free_eq; } rc = LNetGet(LNET_NID_ANY, mdh, id, @@ -3044,11 +3093,11 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, CWARN("%s: Unexpected rc >= 0 but no reply!\n", libcfs_id2str(id)); rc = -EIO; - goto out_1; + goto fail_free_eq; } nob = rc; - LASSERT(nob >= 0 && nob <= infosz); + LASSERT(nob >= 0 && nob <= LNET_PING_INFO_SIZE(n_ids)); rc = -EPROTO; /* if I can't parse... 
*/ @@ -3056,56 +3105,56 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, /* can't check magic/version */ CERROR("%s: ping info too short %d\n", libcfs_id2str(id), nob); - goto out_1; + goto fail_free_eq; } - if (info->pi_magic == __swab32(LNET_PROTO_PING_MAGIC)) { - lnet_swap_pinginfo(info); - } else if (info->pi_magic != LNET_PROTO_PING_MAGIC) { + if (pbuf->pb_info.pi_magic == __swab32(LNET_PROTO_PING_MAGIC)) { + lnet_swap_pinginfo(pbuf); + } else if (pbuf->pb_info.pi_magic != LNET_PROTO_PING_MAGIC) { CERROR("%s: Unexpected magic %08x\n", - libcfs_id2str(id), info->pi_magic); - goto out_1; + libcfs_id2str(id), pbuf->pb_info.pi_magic); + goto fail_free_eq; } - if (!(info->pi_features & LNET_PING_FEAT_NI_STATUS)) { + if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_NI_STATUS)) { CERROR("%s: ping w/o NI status: 0x%x\n", - libcfs_id2str(id), info->pi_features); - goto out_1; + libcfs_id2str(id), pbuf->pb_info.pi_features); + goto fail_free_eq; } - if (nob < offsetof(struct lnet_ping_info, pi_ni[0])) { + if (nob < LNET_PING_INFO_SIZE(0)) { CERROR("%s: Short reply %d(%d min)\n", libcfs_id2str(id), - nob, (int)offsetof(struct lnet_ping_info, pi_ni[0])); - goto out_1; + nob, (int)LNET_PING_INFO_SIZE(0)); + goto fail_free_eq; } - if (info->pi_nnis < n_ids) - n_ids = info->pi_nnis; + if (pbuf->pb_info.pi_nnis < n_ids) + n_ids = pbuf->pb_info.pi_nnis; - if (nob < offsetof(struct lnet_ping_info, pi_ni[n_ids])) { + if (nob < LNET_PING_INFO_SIZE(n_ids)) { CERROR("%s: Short reply %d(%d expected)\n", libcfs_id2str(id), - nob, (int)offsetof(struct lnet_ping_info, pi_ni[n_ids])); - goto out_1; + nob, (int)LNET_PING_INFO_SIZE(n_ids)); + goto fail_free_eq; } rc = -EFAULT; /* If I SEGV... 
*/ memset(&tmpid, 0, sizeof(tmpid)); for (i = 0; i < n_ids; i++) { - tmpid.pid = info->pi_pid; - tmpid.nid = info->pi_ni[i].ns_nid; + tmpid.pid = pbuf->pb_info.pi_pid; + tmpid.nid = pbuf->pb_info.pi_ni[i].ns_nid; if (copy_to_user(&ids[i], &tmpid, sizeof(tmpid))) - goto out_1; + goto fail_free_eq; } - rc = info->pi_nnis; + rc = pbuf->pb_info.pi_nnis; - out_1: + fail_free_eq: rc2 = LNetEQFree(eqh); if (rc2) CERROR("rc2 %d\n", rc2); LASSERT(!rc2); - out_0: - kfree(info); + fail_ping_buffer_decref: + lnet_ping_buffer_decref(pbuf); return rc; } diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c index b31a383fe974..e97957ce9252 100644 --- a/drivers/staging/lustre/lnet/lnet/router.c +++ b/drivers/staging/lustre/lnet/lnet/router.c @@ -618,17 +618,21 @@ lnet_get_route(int idx, __u32 *net, __u32 *hops, } void -lnet_swap_pinginfo(struct lnet_ping_info *info) +lnet_swap_pinginfo(struct lnet_ping_buffer *pbuf) { - int i; struct lnet_ni_status *stat; + int nnis; + int i; - __swab32s(&info->pi_magic); - __swab32s(&info->pi_features); - __swab32s(&info->pi_pid); - __swab32s(&info->pi_nnis); - for (i = 0; i < info->pi_nnis && i < LNET_MAX_RTR_NIS; i++) { - stat = &info->pi_ni[i]; + __swab32s(&pbuf->pb_info.pi_magic); + __swab32s(&pbuf->pb_info.pi_features); + __swab32s(&pbuf->pb_info.pi_pid); + __swab32s(&pbuf->pb_info.pi_nnis); + nnis = pbuf->pb_info.pi_nnis; + if (nnis > pbuf->pb_nnis) + nnis = pbuf->pb_nnis; + for (i = 0; i < nnis; i++) { + stat = &pbuf->pb_info.pi_ni[i]; __swab64s(&stat->ns_nid); __swab32s(&stat->ns_status); } @@ -641,11 +645,12 @@ lnet_swap_pinginfo(struct lnet_ping_info *info) static void lnet_parse_rc_info(struct lnet_rc_data *rcd) { - struct lnet_ping_info *info = rcd->rcd_pinginfo; + struct lnet_ping_buffer *pbuf = rcd->rcd_pingbuffer; struct lnet_peer_ni *gw = rcd->rcd_gateway; struct lnet_route *rte; + int nnis; - if (!gw->lpni_alive) + if (!gw->lpni_alive || !pbuf) return; /* @@ -654,51 +659,48 @@ 
lnet_parse_rc_info(struct lnet_rc_data *rcd) */ spin_lock(&gw->lpni_lock); - if (info->pi_magic == __swab32(LNET_PROTO_PING_MAGIC)) - lnet_swap_pinginfo(info); + if (pbuf->pb_info.pi_magic == __swab32(LNET_PROTO_PING_MAGIC)) + lnet_swap_pinginfo(pbuf); /* NB always racing with network! */ - if (info->pi_magic != LNET_PROTO_PING_MAGIC) { + if (pbuf->pb_info.pi_magic != LNET_PROTO_PING_MAGIC) { CDEBUG(D_NET, "%s: Unexpected magic %08x\n", - libcfs_nid2str(gw->lpni_nid), info->pi_magic); + libcfs_nid2str(gw->lpni_nid), pbuf->pb_info.pi_magic); gw->lpni_ping_feats = LNET_PING_FEAT_INVAL; - spin_unlock(&gw->lpni_lock); - return; + goto out; } - gw->lpni_ping_feats = info->pi_features; - if (!(gw->lpni_ping_feats & LNET_PING_FEAT_MASK)) { - CDEBUG(D_NET, "%s: Unexpected features 0x%x\n", - libcfs_nid2str(gw->lpni_nid), gw->lpni_ping_feats); - spin_unlock(&gw->lpni_lock); - return; /* nothing I can understand */ - } + gw->lpni_ping_feats = pbuf->pb_info.pi_features; - if (!(gw->lpni_ping_feats & LNET_PING_FEAT_NI_STATUS)) { - spin_unlock(&gw->lpni_lock); - return; /* can't carry NI status info */ - } + /* Without NI status info there's nothing more to do. */ + if (!(gw->lpni_ping_feats & LNET_PING_FEAT_NI_STATUS)) + goto out; + + /* Determine the number of NIs for which there is data. */ + nnis = pbuf->pb_info.pi_nnis; + if (pbuf->pb_nnis < nnis) + nnis = pbuf->pb_nnis; list_for_each_entry(rte, &gw->lpni_routes, lr_gwlist) { int down = 0; int up = 0; int i; + /* If routing disabled then the route is down. 
*/ if (gw->lpni_ping_feats & LNET_PING_FEAT_RTE_DISABLED) { rte->lr_downis = 1; continue; } - for (i = 0; i < info->pi_nnis && i < LNET_MAX_RTR_NIS; i++) { - struct lnet_ni_status *stat = &info->pi_ni[i]; + for (i = 0; i < nnis; i++) { + struct lnet_ni_status *stat = &pbuf->pb_info.pi_ni[i]; lnet_nid_t nid = stat->ns_nid; if (nid == LNET_NID_ANY) { CDEBUG(D_NET, "%s: unexpected LNET_NID_ANY\n", libcfs_nid2str(gw->lpni_nid)); gw->lpni_ping_feats = LNET_PING_FEAT_INVAL; - spin_unlock(&gw->lpni_lock); - return; + goto out; } if (LNET_NETTYP(LNET_NIDNET(nid)) == LOLND) @@ -720,8 +722,7 @@ lnet_parse_rc_info(struct lnet_rc_data *rcd) CDEBUG(D_NET, "%s: Unexpected status 0x%x\n", libcfs_nid2str(gw->lpni_nid), stat->ns_status); gw->lpni_ping_feats = LNET_PING_FEAT_INVAL; - spin_unlock(&gw->lpni_lock); - return; + goto out; } if (up) { /* ignore downed NIs if NI for dest network is up */ @@ -737,7 +738,7 @@ lnet_parse_rc_info(struct lnet_rc_data *rcd) rte->lr_downis = down; } - +out: spin_unlock(&gw->lpni_lock); } @@ -903,7 +904,8 @@ lnet_destroy_rc_data(struct lnet_rc_data *rcd) lnet_net_unlock(cpt); } - kfree(rcd->rcd_pinginfo); + if (rcd->rcd_pingbuffer) + lnet_ping_buffer_decref(rcd->rcd_pingbuffer); kfree(rcd); } @@ -912,7 +914,7 @@ static struct lnet_rc_data * lnet_create_rc_data_locked(struct lnet_peer_ni *gateway) { struct lnet_rc_data *rcd = NULL; - struct lnet_ping_info *pi; + struct lnet_ping_buffer *pbuf; struct lnet_md md; int rc; int i; @@ -926,19 +928,19 @@ lnet_create_rc_data_locked(struct lnet_peer_ni *gateway) LNetInvalidateMDHandle(&rcd->rcd_mdh); INIT_LIST_HEAD(&rcd->rcd_list); - pi = kzalloc(LNET_PINGINFO_SIZE, GFP_NOFS); - if (!pi) + pbuf = lnet_ping_buffer_alloc(LNET_MAX_RTR_NIS, GFP_NOFS); + if (!pbuf) goto out; for (i = 0; i < LNET_MAX_RTR_NIS; i++) { - pi->pi_ni[i].ns_nid = LNET_NID_ANY; - pi->pi_ni[i].ns_status = LNET_NI_STATUS_INVALID; + pbuf->pb_info.pi_ni[i].ns_nid = LNET_NID_ANY; + pbuf->pb_info.pi_ni[i].ns_status = LNET_NI_STATUS_INVALID; } 
- rcd->rcd_pinginfo = pi; + rcd->rcd_pingbuffer = pbuf; - md.start = pi; + md.start = &pbuf->pb_info; md.user_ptr = rcd; - md.length = LNET_PINGINFO_SIZE; + md.length = LNET_RTR_PINGINFO_SIZE; md.threshold = LNET_MD_THRESH_INF; md.options = LNET_MD_TRUNCATE; md.eq_handle = the_lnet.ln_rc_eqh; @@ -1714,7 +1716,8 @@ lnet_rtrpools_enable(void) lnet_net_lock(LNET_LOCK_EX); the_lnet.ln_routing = 1; - the_lnet.ln_ping_info->pi_features &= ~LNET_PING_FEAT_RTE_DISABLED; + the_lnet.ln_ping_target->pb_info.pi_features &= + ~LNET_PING_FEAT_RTE_DISABLED; lnet_net_unlock(LNET_LOCK_EX); return rc; @@ -1728,7 +1731,8 @@ lnet_rtrpools_disable(void) lnet_net_lock(LNET_LOCK_EX); the_lnet.ln_routing = 0; - the_lnet.ln_ping_info->pi_features |= LNET_PING_FEAT_RTE_DISABLED; + the_lnet.ln_ping_target->pb_info.pi_features |= + LNET_PING_FEAT_RTE_DISABLED; tiny_router_buffers = 0; small_router_buffers = 0; From patchwork Sun Oct 7 23:19:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629799 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 13F3514DB for ; Sun, 7 Oct 2018 23:30:05 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 03A7528AD0 for ; Sun, 7 Oct 2018 23:30:05 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EBF3128CC0; Sun, 7 Oct 2018 23:30:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No 
client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 6FBE428AD0 for ; Sun, 7 Oct 2018 23:30:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2A346861766; Sun, 7 Oct 2018 16:30:04 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 88BAD8616E5 for ; Sun, 7 Oct 2018 16:30:02 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 9BCE8AD2C; Sun, 7 Oct 2018 23:30:01 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:37 +1100 Message-ID: <153895437770.16383.3391026679795909640.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 04/24] lustre: lnet: automatic sizing of router pinger buffers X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber The router pinger uses fixed-size buffers to receive the data returned by a ping. When a router has more than 16 interfaces (including loopback) this means the data for some interfaces is dropped. Detect this situation, and track the number of remote NIs in the lnet_rc_data_t structure. 
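As a rough illustration of the size tracking described above, the following userspace sketch models the idea: when a reply reports more NIs than the current buffer holds, the desired size is recorded, and the buffer is reallocated before the next ping. The names `rc_data`, `rc_note_reply()` and `rc_prepare_ping()` are hypothetical stand-ins, not the kernel symbols, and locking and MD handling are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Initial slot count; mirrors the role of LNET_INTERFACES_MIN (16). */
#define INTERFACES_MIN 16

/* Simplified per-interface status record (illustrative layout only). */
struct ni_status {
	unsigned long long nid;
	unsigned int status;
};

/* Hypothetical model of the per-router checker data. */
struct rc_data {
	int wanted_nnis;        /* desired size, in the role of rcd_nnis */
	int buf_nnis;           /* capacity of the buffer, in the role of pb_nnis */
	struct ni_status *buf;  /* the ping buffer itself */
};

/*
 * Called when a reply arrives. If the peer reports more NIs than the
 * buffer can hold, remember the larger count and only use what fits.
 * Returns the number of entries that can safely be parsed.
 */
static int rc_note_reply(struct rc_data *rcd, int reported_nnis)
{
	if (rcd->buf_nnis < reported_nnis) {
		if (rcd->wanted_nnis < reported_nnis)
			rcd->wanted_nnis = reported_nnis;
		return rcd->buf_nnis;
	}
	return reported_nnis;
}

/*
 * Called before sending a ping. Allocates the first buffer, or replaces
 * an existing one that is smaller than the desired size.
 * Returns 0 on success and -1 if the allocation fails.
 */
static int rc_prepare_ping(struct rc_data *rcd)
{
	struct ni_status *nbuf;

	if (rcd->wanted_nnis == 0)
		rcd->wanted_nnis = INTERFACES_MIN;
	if (rcd->buf && rcd->wanted_nnis <= rcd->buf_nnis)
		return 0;

	nbuf = calloc((size_t)rcd->wanted_nnis, sizeof(*nbuf));
	if (!nbuf)
		return -1;

	free(rcd->buf);
	rcd->buf = nbuf;
	rcd->buf_nnis = rcd->wanted_nnis;
	return 0;
}
```

In this model the first ping to a router with 20 interfaces only sees 16 entries; the second ping uses a 20-entry buffer and sees them all.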
lnet_create_rc_data_locked() becomes lnet_update_rc_data_locked(), which is modified to replace an existing ping buffer if one is present. It is now also called by lnet_ping_router_locked() when the existing ping buffer is too small. WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25774 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-types.h | 4 - drivers/staging/lustre/lnet/lnet/router.c | 90 +++++++++++++------- 2 files changed, 60 insertions(+), 34 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index ab8c6d66cdbf..d1d17ededd06 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -411,8 +411,6 @@ struct lnet_ping_buffer { /* router checker data, per router */ -#define LNET_MAX_RTR_NIS LNET_INTERFACES_MIN -#define LNET_RTR_PINGINFO_SIZE LNET_PING_INFO_SIZE(LNET_MAX_RTR_NIS) struct lnet_rc_data { /* chain on the_lnet.ln_zombie_rcd or ln_deathrow_rcd */ struct list_head rcd_list; @@ -422,6 +420,8 @@ struct lnet_rc_data { struct lnet_peer_ni *rcd_gateway; /* ping buffer */ struct lnet_ping_buffer *rcd_pingbuffer; + /* desired size of buffer */ + int rcd_nnis; }; struct lnet_peer_ni { diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c index e97957ce9252..86cce27e10d8 100644 --- a/drivers/staging/lustre/lnet/lnet/router.c +++ b/drivers/staging/lustre/lnet/lnet/router.c @@ -678,8 +678,11 @@ lnet_parse_rc_info(struct lnet_rc_data *rcd) /* Determine the number of NIs for which there is data.
*/ nnis = pbuf->pb_info.pi_nnis; - if (pbuf->pb_nnis < nnis) + if (pbuf->pb_nnis < nnis) { + if (rcd->rcd_nnis < nnis) + rcd->rcd_nnis = nnis; nnis = pbuf->pb_nnis; + } list_for_each_entry(rte, &gw->lpni_routes, lr_gwlist) { int down = 0; @@ -911,28 +914,47 @@ lnet_destroy_rc_data(struct lnet_rc_data *rcd) } static struct lnet_rc_data * -lnet_create_rc_data_locked(struct lnet_peer_ni *gateway) +lnet_update_rc_data_locked(struct lnet_peer_ni *gateway) { - struct lnet_rc_data *rcd = NULL; - struct lnet_ping_buffer *pbuf; + struct lnet_handle_md mdh; + struct lnet_rc_data *rcd; + struct lnet_ping_buffer *pbuf = NULL; struct lnet_md md; + int nnis = LNET_INTERFACES_MIN; int rc; int i; + rcd = gateway->lpni_rcd; + if (rcd) { + nnis = rcd->rcd_nnis; + mdh = rcd->rcd_mdh; + LNetInvalidateMDHandle(&rcd->rcd_mdh); + pbuf = rcd->rcd_pingbuffer; + rcd->rcd_pingbuffer = NULL; + } else { + LNetInvalidateMDHandle(&mdh); + } + lnet_net_unlock(gateway->lpni_cpt); - rcd = kzalloc(sizeof(*rcd), GFP_NOFS); - if (!rcd) - goto out; + if (rcd) { + LNetMDUnlink(mdh); + lnet_ping_buffer_decref(pbuf); + } else { + rcd = kzalloc(sizeof(*rcd), GFP_NOFS); + if (!rcd) + goto out; - LNetInvalidateMDHandle(&rcd->rcd_mdh); - INIT_LIST_HEAD(&rcd->rcd_list); + LNetInvalidateMDHandle(&rcd->rcd_mdh); + INIT_LIST_HEAD(&rcd->rcd_list); + rcd->rcd_nnis = nnis; + } - pbuf = lnet_ping_buffer_alloc(LNET_MAX_RTR_NIS, GFP_NOFS); + pbuf = lnet_ping_buffer_alloc(nnis, GFP_NOFS); if (!pbuf) goto out; - for (i = 0; i < LNET_MAX_RTR_NIS; i++) { + for (i = 0; i < nnis; i++) { pbuf->pb_info.pi_ni[i].ns_nid = LNET_NID_ANY; pbuf->pb_info.pi_ni[i].ns_status = LNET_NI_STATUS_INVALID; } @@ -940,7 +962,7 @@ lnet_create_rc_data_locked(struct lnet_peer_ni *gateway) md.start = &pbuf->pb_info; md.user_ptr = rcd; - md.length = LNET_RTR_PINGINFO_SIZE; + md.length = LNET_PING_INFO_SIZE(nnis); md.threshold = LNET_MD_THRESH_INF; md.options = LNET_MD_TRUNCATE; md.eq_handle = the_lnet.ln_rc_eqh; @@ -949,33 +971,37 @@ 
lnet_create_rc_data_locked(struct lnet_peer_ni *gateway) rc = LNetMDBind(md, LNET_UNLINK, &rcd->rcd_mdh); if (rc < 0) { CERROR("Can't bind MD: %d\n", rc); - goto out; + goto out_ping_buffer_decref; } LASSERT(!rc); lnet_net_lock(gateway->lpni_cpt); - /* router table changed or someone has created rcd for this gateway */ - if (!lnet_isrouter(gateway) || gateway->lpni_rcd) { - lnet_net_unlock(gateway->lpni_cpt); - goto out; + /* Check if this is still a router. */ + if (!lnet_isrouter(gateway)) + goto out_unlock; + /* Check if someone else installed router data. */ + if (gateway->lpni_rcd && gateway->lpni_rcd != rcd) + goto out_unlock; + + /* Install and/or update the router data. */ + if (!gateway->lpni_rcd) { + lnet_peer_ni_addref_locked(gateway); + rcd->rcd_gateway = gateway; + gateway->lpni_rcd = rcd; } - - lnet_peer_ni_addref_locked(gateway); - rcd->rcd_gateway = gateway; - gateway->lpni_rcd = rcd; gateway->lpni_ping_notsent = 0; return rcd; - out: - if (rcd) { - if (!LNetMDHandleIsInvalid(rcd->rcd_mdh)) { - rc = LNetMDUnlink(rcd->rcd_mdh); - LASSERT(!rc); - } +out_unlock: + lnet_net_unlock(gateway->lpni_cpt); + rc = LNetMDUnlink(mdh); + LASSERT(!rc); +out_ping_buffer_decref: + lnet_ping_buffer_decref(pbuf); +out: + if (rcd && rcd != gateway->lpni_rcd) lnet_destroy_rc_data(rcd); - } - lnet_net_lock(gateway->lpni_cpt); return gateway->lpni_rcd; } @@ -1018,9 +1044,9 @@ lnet_ping_router_locked(struct lnet_peer_ni *rtr) return; } - rcd = rtr->lpni_rcd ? 
- rtr->lpni_rcd : lnet_create_rc_data_locked(rtr); - + rcd = rtr->lpni_rcd; + if (!rcd || rcd->rcd_nnis > rcd->rcd_pingbuffer->pb_nnis) + rcd = lnet_update_rc_data_locked(rtr); if (!rcd) return; From patchwork Sun Oct 7 23:19:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629801 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 97C69112B for ; Sun, 7 Oct 2018 23:30:13 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 86C0828AD0 for ; Sun, 7 Oct 2018 23:30:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 791CC28CC0; Sun, 7 Oct 2018 23:30:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 1C37A28AD0 for ; Sun, 7 Oct 2018 23:30:12 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B4A92861760; Sun, 7 Oct 2018 16:30:12 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DB61A8616A3 for ; Sun, 7 Oct 2018 16:30:10 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown 
[195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id CEB69AE17; Sun, 7 Oct 2018 23:30:09 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:37 +1100 Message-ID: <153895437774.16383.2810116639916310757.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 05/24] lustre: lnet: add Multi-Rail and Discovery ping feature bits X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber Claim ping feature bits for Multi-Rail and Discovery. Assert in lnet_ping_target_update() that no unknown bits will be sent over the wire.
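The check described here can be sketched in plain C as follows. The bit positions match the BIT(n) definitions in this patch, but the `PING_FEAT_*` names are shortened for the example and `ping_features_valid()` is a hypothetical helper, not a function in the LNet code; the patch itself uses two LASSERT() calls for the same two conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bits, using the same bit positions as the patch. */
#define PING_FEAT_BASE         (1u << 0)
#define PING_FEAT_NI_STATUS    (1u << 1)
#define PING_FEAT_RTE_DISABLED (1u << 2)
#define PING_FEAT_MULTI_RAIL   (1u << 3)
#define PING_FEAT_DISCOVERY    (1u << 4)

/* Every bit that is allowed on the wire. */
#define PING_FEAT_BITS (PING_FEAT_BASE | \
			PING_FEAT_NI_STATUS | \
			PING_FEAT_RTE_DISABLED | \
			PING_FEAT_MULTI_RAIL | \
			PING_FEAT_DISCOVERY)

/*
 * Hypothetical helper: true when at least one known feature bit is set
 * and no bit outside the known mask is set.
 */
static bool ping_features_valid(uint32_t features)
{
	return (features & PING_FEAT_BITS) != 0 &&
	       (features & ~PING_FEAT_BITS) == 0;
}
```

Keeping the mask in a single macro means a new feature bit only has to be added in one place for the check to accept it.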
WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25775 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-types.h | 16 ++++++++++++++++ drivers/staging/lustre/lnet/lnet/api-ni.c | 5 +++++ 2 files changed, 21 insertions(+) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index d1d17ededd06..f4467a3bbfd1 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -386,6 +386,22 @@ struct lnet_ni { #define LNET_PING_FEAT_BASE BIT(0) /* just a ping */ #define LNET_PING_FEAT_NI_STATUS BIT(1) /* return NI status */ #define LNET_PING_FEAT_RTE_DISABLED BIT(2) /* Routing enabled */ +#define LNET_PING_FEAT_MULTI_RAIL BIT(3) /* Multi-Rail aware */ +#define LNET_PING_FEAT_DISCOVERY BIT(4) /* Supports Discovery */ + +/* + * All ping feature bits fit to hit the wire. + * In lnet_assert_wire_constants() this is compared against its open-coded + * value, and in lnet_ping_target_update() it is used to verify that no + * unknown bits have been set. + * New feature bits can be added, just be aware that this does change the + * over-the-wire protocol. 
+ */ +#define LNET_PING_FEAT_BITS (LNET_PING_FEAT_BASE | \ + LNET_PING_FEAT_NI_STATUS | \ + LNET_PING_FEAT_RTE_DISABLED | \ + LNET_PING_FEAT_MULTI_RAIL | \ + LNET_PING_FEAT_DISCOVERY) #define LNET_PING_INFO_SIZE(NNIDS) \ offsetof(struct lnet_ping_info, pi_ni[NNIDS]) diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index ca28ad75fe2b..68af723bc6a1 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -1170,6 +1170,11 @@ lnet_ping_target_update(struct lnet_ping_buffer *pbuf, if (!the_lnet.ln_routing) pbuf->pb_info.pi_features |= LNET_PING_FEAT_RTE_DISABLED; + + /* Ensure only known feature bits have been set. */ + LASSERT(pbuf->pb_info.pi_features & LNET_PING_FEAT_BITS); + LASSERT(!(pbuf->pb_info.pi_features & ~LNET_PING_FEAT_BITS)); + lnet_ping_target_install_locked(pbuf); if (the_lnet.ln_ping_target) { From patchwork Sun Oct 7 23:19:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629803 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 46E54112B for ; Sun, 7 Oct 2018 23:30:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3670B28AD0 for ; Sun, 7 Oct 2018 23:30:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2AD3B28CC0; Sun, 7 Oct 2018 23:30:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 
(256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8D45028AD0 for ; Sun, 7 Oct 2018 23:30:20 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 45F6E21F517; Sun, 7 Oct 2018 16:30:20 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0E25A21EBF4 for ; Sun, 7 Oct 2018 16:30:18 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 2D862ADF7; Sun, 7 Oct 2018 23:30:17 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:37 +1100 Message-ID: <153895437778.16383.4176394927995685300.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 06/24] lustre: lnet: add sanity checks on ping-related constants X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber Add sanity checks for LNet ping related data structures and constants to wirecheck.c, and update the generated code in lnet_assert_wire_constants(). 
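The kind of compile-time layout check that lnet_assert_wire_constants() performs can be sketched in standalone C11 as follows. This is an illustration only: the struct names are hypothetical stand-ins whose field sizes are taken from the BUILD_BUG_ON() lines in this patch, _Static_assert replaces BUILD_BUG_ON, and the packed attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct lnet_ni_status: 8 + 4 + 4 bytes on the wire. */
struct ni_status {
	uint64_t ns_nid;
	uint32_t ns_status;
	uint32_t ns_unused;
} __attribute__((packed));

/* Stand-in for struct lnet_ping_info: 16-byte header, then NI entries. */
struct ping_info {
	uint32_t pi_magic;
	uint32_t pi_features;
	uint32_t pi_pid;	/* 4 bytes, per the sizeof check in the patch */
	uint32_t pi_nnis;
	struct ni_status pi_ni[];	/* flexible array, adds no size */
} __attribute__((packed));

/* Compile-time checks: the build fails if the layout ever drifts. */
_Static_assert(sizeof(struct ni_status) == 16, "ni_status size");
_Static_assert(offsetof(struct ni_status, ns_status) == 8, "ns_status offset");
_Static_assert(offsetof(struct ni_status, ns_unused) == 12, "ns_unused offset");
_Static_assert(sizeof(struct ping_info) == 16, "ping_info header size");
_Static_assert(offsetof(struct ping_info, pi_nnis) == 12, "pi_nnis offset");
_Static_assert(offsetof(struct ping_info, pi_ni) == 16, "pi_ni offset");

/*
 * Hypothetical equivalent of LNET_PING_INFO_SIZE(NNIDS), written with
 * plain arithmetic instead of offsetof() on an indexed member.
 */
static inline size_t ping_info_size(size_t nnids)
{
	return offsetof(struct ping_info, pi_ni) +
	       nnids * sizeof(struct ni_status);
}
```

Under these assumptions a ping buffer with two NI entries occupies 16 + 2 * 16 = 48 bytes. Placing the checks at file scope means a layout change is caught at build time, which is the point of keeping the open-coded constants next to the structure definitions.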
In order for the structures and macros to be visible to wirecheck.c, which is a userspace program, they were moved from kernel-only lnet/lib-types.h to lnet/types.h WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25776 Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-types.h | 30 ---------------- .../lustre/include/uapi/linux/lnet/lnet-types.h | 30 ++++++++++++++++ drivers/staging/lustre/lnet/lnet/api-ni.c | 38 ++++++++++++++++++++ 3 files changed, 68 insertions(+), 30 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index f4467a3bbfd1..f28fa5342914 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -378,36 +378,6 @@ struct lnet_ni { #define LNET_PROTO_PING_MATCHBITS 0x8000000000000000LL -/* - * NB: value of these features equal to LNET_PROTO_PING_VERSION_x - * of old LNet, so there shouldn't be any compatibility issue - */ -#define LNET_PING_FEAT_INVAL (0) /* no feature */ -#define LNET_PING_FEAT_BASE BIT(0) /* just a ping */ -#define LNET_PING_FEAT_NI_STATUS BIT(1) /* return NI status */ -#define LNET_PING_FEAT_RTE_DISABLED BIT(2) /* Routing enabled */ -#define LNET_PING_FEAT_MULTI_RAIL BIT(3) /* Multi-Rail aware */ -#define LNET_PING_FEAT_DISCOVERY BIT(4) /* Supports Discovery */ - -/* - * All ping feature bits fit to hit the wire. - * In lnet_assert_wire_constants() this is compared against its open-coded - * value, and in lnet_ping_target_update() it is used to verify that no - * unknown bits have been set. - * New feature bits can be added, just be aware that this does change the - * over-the-wire protocol. 
- */ -#define LNET_PING_FEAT_BITS (LNET_PING_FEAT_BASE | \ - LNET_PING_FEAT_NI_STATUS | \ - LNET_PING_FEAT_RTE_DISABLED | \ - LNET_PING_FEAT_MULTI_RAIL | \ - LNET_PING_FEAT_DISCOVERY) - -#define LNET_PING_INFO_SIZE(NNIDS) \ - offsetof(struct lnet_ping_info, pi_ni[NNIDS]) -#define LNET_PING_INFO_LONI(PINFO) ((PINFO)->pi_ni[0].ns_nid) -#define LNET_PING_INFO_SEQNO(PINFO) ((PINFO)->pi_ni[0].ns_status) - /* * Descriptor of a ping info buffer: keep a separate indicator of the * size and a reference count. The type is used both as a source and diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h index 6ee60d07ff84..e0e4fd259795 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h @@ -190,6 +190,31 @@ struct lnet_hdr { } msg; } __packed; +/* + * NB: value of these features equal to LNET_PROTO_PING_VERSION_x + * of old LNet, so there shouldn't be any compatibility issue + */ +#define LNET_PING_FEAT_INVAL (0) /* no feature */ +#define LNET_PING_FEAT_BASE (1 << 0) /* just a ping */ +#define LNET_PING_FEAT_NI_STATUS (1 << 1) /* return NI status */ +#define LNET_PING_FEAT_RTE_DISABLED (1 << 2) /* Routing enabled */ +#define LNET_PING_FEAT_MULTI_RAIL (1 << 3) /* Multi-Rail aware */ +#define LNET_PING_FEAT_DISCOVERY (1 << 4) /* Supports Discovery */ + +/* + * All ping feature bits fit to hit the wire. + * In lnet_assert_wire_constants() this is compared against its open-coded + * value, and in lnet_ping_target_update() it is used to verify that no + * unknown bits have been set. + * New feature bits can be added, just be aware that this does change the + * over-the-wire protocol. 
+ */ +#define LNET_PING_FEAT_BITS (LNET_PING_FEAT_BASE | \ + LNET_PING_FEAT_NI_STATUS | \ + LNET_PING_FEAT_RTE_DISABLED | \ + LNET_PING_FEAT_MULTI_RAIL | \ + LNET_PING_FEAT_DISCOVERY) + /* * A HELLO message contains a magic number and protocol version * code in the header's dest_nid, the peer's NID in the src_nid, and @@ -246,6 +271,11 @@ struct lnet_ping_info { struct lnet_ni_status pi_ni[0]; } __packed; +#define LNET_PING_INFO_SIZE(NNIDS) \ + offsetof(struct lnet_ping_info, pi_ni[NNIDS]) +#define LNET_PING_INFO_LONI(PINFO) ((PINFO)->pi_ni[0].ns_nid) +#define LNET_PING_INFO_SEQNO(PINFO) ((PINFO)->pi_ni[0].ns_status) + struct lnet_counters { __u32 msgs_alloc; __u32 msgs_max; diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 68af723bc6a1..d81501f4c282 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -313,6 +313,44 @@ static void lnet_assert_wire_constants(void) BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.hello.incarnation) != 8); BUILD_BUG_ON((int)offsetof(struct lnet_hdr, msg.hello.type) != 40); BUILD_BUG_ON((int)sizeof(((struct lnet_hdr *)0)->msg.hello.type) != 4); + + /* Checks for struct lnet_ni_status and related constants */ + BUILD_BUG_ON(LNET_NI_STATUS_INVALID != 0x00000000); + BUILD_BUG_ON(LNET_NI_STATUS_UP != 0x15aac0de); + BUILD_BUG_ON(LNET_NI_STATUS_DOWN != 0xdeadface); + + /* Checks for struct lnet_ni_status */ + BUILD_BUG_ON((int)sizeof(struct lnet_ni_status) != 16); + BUILD_BUG_ON((int)offsetof(struct lnet_ni_status, ns_nid) != 0); + BUILD_BUG_ON((int)sizeof(((struct lnet_ni_status *)0)->ns_nid) != 8); + BUILD_BUG_ON((int)offsetof(struct lnet_ni_status, ns_status) != 8); + BUILD_BUG_ON((int)sizeof(((struct lnet_ni_status *)0)->ns_status) != 4); + BUILD_BUG_ON((int)offsetof(struct lnet_ni_status, ns_unused) != 12); + BUILD_BUG_ON((int)sizeof(((struct lnet_ni_status *)0)->ns_unused) != 4); + + /* Checks for struct lnet_ping_info and 
related constants */ + BUILD_BUG_ON(LNET_PROTO_PING_MAGIC != 0x70696E67); + BUILD_BUG_ON(LNET_PING_FEAT_INVAL != 0); + BUILD_BUG_ON(LNET_PING_FEAT_BASE != 1); + BUILD_BUG_ON(LNET_PING_FEAT_NI_STATUS != 2); + BUILD_BUG_ON(LNET_PING_FEAT_RTE_DISABLED != 4); + BUILD_BUG_ON(LNET_PING_FEAT_MULTI_RAIL != 8); + BUILD_BUG_ON(LNET_PING_FEAT_DISCOVERY != 16); + BUILD_BUG_ON(LNET_PING_FEAT_BITS != 31); + + /* Checks for struct lnet_ping_info */ + BUILD_BUG_ON((int)sizeof(struct lnet_ping_info) != 16); + BUILD_BUG_ON((int)offsetof(struct lnet_ping_info, pi_magic) != 0); + BUILD_BUG_ON((int)sizeof(((struct lnet_ping_info *)0)->pi_magic) != 4); + BUILD_BUG_ON((int)offsetof(struct lnet_ping_info, pi_features) != 4); + BUILD_BUG_ON((int)sizeof(((struct lnet_ping_info *)0)->pi_features) + != 4); + BUILD_BUG_ON((int)offsetof(struct lnet_ping_info, pi_pid) != 8); + BUILD_BUG_ON((int)sizeof(((struct lnet_ping_info *)0)->pi_pid) != 4); + BUILD_BUG_ON((int)offsetof(struct lnet_ping_info, pi_nnis) != 12); + BUILD_BUG_ON((int)sizeof(((struct lnet_ping_info *)0)->pi_nnis) != 4); + BUILD_BUG_ON((int)offsetof(struct lnet_ping_info, pi_ni) != 16); + BUILD_BUG_ON((int)sizeof(((struct lnet_ping_info *)0)->pi_ni) != 0); } static struct lnet_lnd * From patchwork Sun Oct 7 23:19:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629805 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 49F28112B for ; Sun, 7 Oct 2018 23:30:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3A1C428AD0 for ; Sun, 7 Oct 2018 23:30:30 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2EACB28CC0; Sun, 7 Oct 2018 23:30:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 
(2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D153628AD0 for ; Sun, 7 Oct 2018 23:30:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 98A9121FC49; Sun, 7 Oct 2018 16:30:27 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D6D9321F8C9 for ; Sun, 7 Oct 2018 16:30:25 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id D7F0BAE17; Sun, 7 Oct 2018 23:30:24 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:37 +1100 Message-ID: <153895437782.16383.10279057472731467540.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 07/24] lustre: lnet: cleanup of lnet_peer_ni_addref/decref_locked() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber Address style issues in lnet_peer_ni_addref_locked() and lnet_peer_ni_decref_locked(). In the latter routine, replace a sequence of atomic_dec()/atomic_read() with atomic_dec_and_test(). WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25777 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 2e2b5ed27116..f15f5c9c9a25 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -323,8 +323,7 @@ static inline void lnet_peer_ni_decref_locked(struct lnet_peer_ni *lp) { LASSERT(atomic_read(&lp->lpni_refcount) > 0); - atomic_dec(&lp->lpni_refcount); - if (atomic_read(&lp->lpni_refcount) == 0) + if (atomic_dec_and_test(&lp->lpni_refcount)) lnet_destroy_peer_ni_locked(lp); } From patchwork Sun Oct 7 23:19:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629807 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D4CE0112B for ; Sun, 7 Oct 2018 23:30:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C291828CBF for ; Sun, 7 Oct 2018 23:30:39 +0000 
(UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B697F28CC8; Sun, 7 Oct 2018 23:30:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 440DF28CBF for ; Sun, 7 Oct 2018 23:30:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 06E9F21F665; Sun, 7 Oct 2018 16:30:39 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AE08C21F517 for ; Sun, 7 Oct 2018 16:30:36 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id AE079AD2C; Sun, 7 Oct 2018 23:30:35 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:37 +1100 Message-ID: <153895437785.16383.10650578259435328953.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 08/24] lustre: lnet: rename lnet_add/del_peer_ni_to/from_peer() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber Rename lnet_add_peer_ni_to_peer() to lnet_add_peer_ni(), and lnet_del_peer_ni_from_peer() to lnet_del_peer_ni(). This brings the function names closer to the ioctls they implement: IOC_LIBCFS_ADD_PEER_NI and IOC_LIBCFS_DEL_PEER_NI. These names are also a more accurate description of their effect: adding an lnet_peer_ni to, or deleting one from, LNet. WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25778 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 4 ++-- drivers/staging/lustre/lnet/lnet/api-ni.c | 10 +++++---- drivers/staging/lustre/lnet/lnet/peer.c | 22 +++++++++++++++----- 3 files changed, 23 insertions(+), 13 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index f15f5c9c9a25..69f45a76f1cc 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -682,8 +682,8 @@ struct lnet_peer_net *lnet_peer_get_net_locked(struct lnet_peer *peer, u32 net_id); bool lnet_peer_is_ni_pref_locked(struct lnet_peer_ni *lpni, struct lnet_ni *ni); -int lnet_add_peer_ni_to_peer(lnet_nid_t key_nid, lnet_nid_t nid, bool mr); -int lnet_del_peer_ni_from_peer(lnet_nid_t key_nid, lnet_nid_t nid); +int lnet_add_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid, bool mr); +int lnet_del_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid); int lnet_get_peer_info(__u32 idx, lnet_nid_t *primary_nid, lnet_nid_t *nid, bool *mr, struct lnet_peer_ni_credit_info __user *peer_ni_info, diff --git
a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index d81501f4c282..d64ae2939abc 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -2848,9 +2848,9 @@ LNetCtl(unsigned int cmd, void *arg) return -EINVAL; mutex_lock(&the_lnet.ln_api_mutex); - rc = lnet_add_peer_ni_to_peer(cfg->prcfg_prim_nid, - cfg->prcfg_cfg_nid, - cfg->prcfg_mr); + rc = lnet_add_peer_ni(cfg->prcfg_prim_nid, + cfg->prcfg_cfg_nid, + cfg->prcfg_mr); mutex_unlock(&the_lnet.ln_api_mutex); return rc; } @@ -2862,8 +2862,8 @@ LNetCtl(unsigned int cmd, void *arg) return -EINVAL; mutex_lock(&the_lnet.ln_api_mutex); - rc = lnet_del_peer_ni_from_peer(cfg->prcfg_prim_nid, - cfg->prcfg_cfg_nid); + rc = lnet_del_peer_ni(cfg->prcfg_prim_nid, + cfg->prcfg_cfg_nid); mutex_unlock(&the_lnet.ln_api_mutex); return rc; } diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index ebb84356302f..bbf07008dbb0 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -891,14 +891,16 @@ lnet_peer_ni_add_non_mr(lnet_nid_t nid) } /* + * Implementation of IOC_LIBCFS_ADD_PEER_NI. + * * This API handles the following combinations: - * Create a primary NI if only the prim_nid is provided - * Create or add an lpni to a primary NI. Primary NI must've already - * been created - * Create a non-MR peer. + * Create a primary NI if only the prim_nid is provided + * Create or add an lpni to a primary NI. Primary NI must've already + * been created + * Create a non-MR peer. */ int -lnet_add_peer_ni_to_peer(lnet_nid_t prim_nid, lnet_nid_t nid, bool mr) +lnet_add_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid, bool mr) { /* * Caller trying to setup an MR like peer hierarchy but @@ -929,8 +931,16 @@ lnet_add_peer_ni_to_peer(lnet_nid_t prim_nid, lnet_nid_t nid, bool mr) return 0; } +/* + * Implementation of IOC_LIBCFS_DEL_PEER_NI. 
+ * + * This API handles the following combinations: + * Delete a NI from a peer if both prim_nid and nid are provided. + * Delete a peer if only prim_nid is provided. + * Delete a peer if its primary nid is provided. + */ int -lnet_del_peer_ni_from_peer(lnet_nid_t prim_nid, lnet_nid_t nid) +lnet_del_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid) { lnet_nid_t local_nid; struct lnet_peer *peer; From patchwork Sun Oct 7 23:19:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629809 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0485A14DB for ; Sun, 7 Oct 2018 23:30:49 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E7EE128CBF for ; Sun, 7 Oct 2018 23:30:48 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DB3AF28CC8; Sun, 7 Oct 2018 23:30:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 68EC528CBF for ; Sun, 7 Oct 2018 23:30:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 23A18861797; Sun, 7 Oct 2018 16:30:48 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 701C021F528 for ; Sun, 7 Oct 2018 16:30:46 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 6313BAE17; Sun, 7 Oct 2018 23:30:45 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:37 +1100 Message-ID: <153895437789.16383.3567353433359493775.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 09/24] lustre: lnet: refactor lnet_del_peer_ni() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber Refactor lnet_del_peer_ni(). In particular break out the code that removes an lnet_peer_ni from an lnet_peer and put it into a separate function, lnet_peer_del_nid(). 
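The decision that the refactored lnet_del_peer_ni() makes, as it appears in the diff further down in this patch, can be modelled in isolation roughly as follows. This is a toy sketch and not the kernel code: the peer lookup and locking are left out, NID_ANY is a stand-in for LNET_NID_ANY, and del_peer_ni_action() and enum del_action are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for LNET_NID_ANY (assumed here to be the all-ones NID). */
#define NID_ANY ((uint64_t)~0ULL)

/* Which helper the refactored function would end up calling. */
enum del_action {
	DEL_WHOLE_PEER,	/* corresponds to lnet_peer_del() */
	DEL_SINGLE_NID	/* corresponds to lnet_peer_del_nid() */
};

/*
 * Hypothetical model of the dispatch in lnet_del_peer_ni().
 * prim_nid:     primary NID supplied by the caller
 * peer_primary: primary NID of the peer found by looking up prim_nid
 * nid:          NID to delete, or NID_ANY for "the whole peer"
 * Returns 0 and sets *action, or a negative errno value.
 */
static inline int del_peer_ni_action(uint64_t prim_nid, uint64_t peer_primary,
				     uint64_t nid, enum del_action *action)
{
	if (prim_nid == NID_ANY)
		return -EINVAL;	/* a primary NID is mandatory */

	if (prim_nid != peer_primary)
		return -ENODEV;	/* prim_nid is not this peer's primary */

	/* No NID given, or the primary itself: delete the entire peer. */
	if (nid == NID_ANY || nid == peer_primary)
		*action = DEL_WHOLE_PEER;
	else
		*action = DEL_SINGLE_NID;

	return 0;
}
```

Splitting the single-NID case into its own helper keeps the sanity checks for that case (unknown NID, NID attached to a different peer) out of the top-level function, which then only has to choose between the two paths.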
WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25779 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- drivers/staging/lustre/lnet/lnet/peer.c | 96 +++++++++++++++++++++++-------- 1 file changed, 71 insertions(+), 25 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index bbf07008dbb0..30a2486712e4 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -254,7 +254,7 @@ lnet_peer_ni_del_locked(struct lnet_peer_ni *lpni) * * The last reference may be lost in a place where the * lnet_net_lock locks only a single cpt, and that cpt may not - * be lpni->lpni_cpt. So the zombie list of this peer_table + * be lpni->lpni_cpt. So the zombie list of lnet_peer_table * has its own lock. */ spin_lock(&ptable->pt_zombie_lock); @@ -340,6 +340,61 @@ lnet_peer_del_locked(struct lnet_peer *peer) return rc2; } +static int +lnet_peer_del(struct lnet_peer *peer) +{ + lnet_net_lock(LNET_LOCK_EX); + lnet_peer_del_locked(peer); + lnet_net_unlock(LNET_LOCK_EX); + + return 0; +} + +/* + * Delete a NID from a peer. + * Implements a few sanity checks. + * Call with ln_api_mutex held. 
+ */ +static int +lnet_peer_del_nid(struct lnet_peer *lp, lnet_nid_t nid) +{ + struct lnet_peer *lp2; + struct lnet_peer_ni *lpni; + + lpni = lnet_find_peer_ni_locked(nid); + if (!lpni) { + CERROR("Cannot remove unknown nid %s from peer %s\n", + libcfs_nid2str(nid), + libcfs_nid2str(lp->lp_primary_nid)); + return -ENOENT; + } + lnet_peer_ni_decref_locked(lpni); + lp2 = lpni->lpni_peer_net->lpn_peer; + if (lp2 != lp) { + CERROR("Nid %s is attached to peer %s, not peer %s\n", + libcfs_nid2str(nid), + libcfs_nid2str(lp2->lp_primary_nid), + libcfs_nid2str(lp->lp_primary_nid)); + return -EINVAL; + } + + /* + * This function only allows deletion of the primary NID if it + * is the only NID. + */ + if (nid == lp->lp_primary_nid && lnet_get_num_peer_nis(lp) != 1) { + CERROR("Cannot delete primary NID %s from multi-NID peer\n", + libcfs_nid2str(nid)); + return -EINVAL; + } + + lnet_net_lock(LNET_LOCK_EX); + lnet_peer_ni_del_locked(lpni); + lnet_net_unlock(LNET_LOCK_EX); + + return 0; +} + static void lnet_peer_table_cleanup_locked(struct lnet_net *net, struct lnet_peer_table *ptable) @@ -938,45 +993,36 @@ lnet_add_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid, bool mr) * Delete a NI from a peer if both prim_nid and nid are provided. * Delete a peer if only prim_nid is provided. * Delete a peer if its primary nid is provided. + * + * The caller must hold ln_api_mutex. This prevents the peer from + * being modified/deleted by a different thread. */ int lnet_del_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid) { - lnet_nid_t local_nid; - struct lnet_peer *peer; + struct lnet_peer *lp; struct lnet_peer_ni *lpni; - int rc; if (prim_nid == LNET_NID_ANY) return -EINVAL; - local_nid = (nid != LNET_NID_ANY) ? 
nid : prim_nid; - - lpni = lnet_find_peer_ni_locked(local_nid); + lpni = lnet_find_peer_ni_locked(prim_nid); if (!lpni) - return -EINVAL; + return -ENOENT; lnet_peer_ni_decref_locked(lpni); + lp = lpni->lpni_peer_net->lpn_peer; - peer = lpni->lpni_peer_net->lpn_peer; - LASSERT(peer); - - if (peer->lp_primary_nid == lpni->lpni_nid) { - /* - * deleting the primary ni is equivalent to deleting the - * entire peer - */ - lnet_net_lock(LNET_LOCK_EX); - rc = lnet_peer_del_locked(peer); - lnet_net_unlock(LNET_LOCK_EX); - - return rc; + if (prim_nid != lp->lp_primary_nid) { + CDEBUG(D_NET, "prim_nid %s is not primary for peer %s\n", + libcfs_nid2str(prim_nid), + libcfs_nid2str(lp->lp_primary_nid)); + return -ENODEV; } - lnet_net_lock(LNET_LOCK_EX); - rc = lnet_peer_ni_del_locked(lpni); - lnet_net_unlock(LNET_LOCK_EX); + if (nid == LNET_NID_ANY || nid == lp->lp_primary_nid) + return lnet_peer_del(lp); - return rc; + return lnet_peer_del_nid(lp, nid); } void From patchwork Sun Oct 7 23:19:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629811 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A8CD814DB for ; Sun, 7 Oct 2018 23:30:57 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 95AD728CBF for ; Sun, 7 Oct 2018 23:30:57 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 87AC828CC8; Sun, 7 Oct 2018 23:30:57 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using 
TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id C53F228CBF for ; Sun, 7 Oct 2018 23:30:56 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7335786177E; Sun, 7 Oct 2018 16:30:56 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D7CB221F502 for ; Sun, 7 Oct 2018 16:30:54 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id D932EAD2C; Sun, 7 Oct 2018 23:30:53 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:37 +1100 Message-ID: <153895437792.16383.4508869255214195437.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 10/24] lustre: lnet: refactor lnet_add_peer_ni() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber Refactor lnet_add_peer_ni() and the functions called by it. In particular, lnet_peer_add_nid() adds an lnet_peer_ni to an existing lnet_peer, lnet_peer_add() adds a new lnet_peer. lnet_find_or_create_peer_locked() is removed. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25780 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 1 drivers/staging/lustre/lnet/lnet/lib-move.c | 13 + drivers/staging/lustre/lnet/lnet/peer.c | 230 +++++++------------- 3 files changed, 92 insertions(+), 152 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 69f45a76f1cc..fc748ffa251d 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -668,7 +668,6 @@ u32 lnet_get_dlc_seq_locked(void); struct lnet_peer_ni *lnet_get_next_peer_ni_locked(struct lnet_peer *peer, struct lnet_peer_net *peer_net, struct lnet_peer_ni *prev); -struct lnet_peer *lnet_find_or_create_peer_locked(lnet_nid_t dst_nid, int cpt); struct lnet_peer_ni *lnet_nid2peerni_locked(lnet_nid_t nid, int cpt); struct lnet_peer_ni *lnet_nid2peerni_ex(lnet_nid_t nid, int cpt); struct lnet_peer_ni *lnet_find_peer_ni_locked(lnet_nid_t nid); diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index e8c021622f91..59ae8d0649e5 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -1262,11 +1262,18 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, return -ESHUTDOWN; } - peer = lnet_find_or_create_peer_locked(dst_nid, cpt); - if (IS_ERR(peer)) { + /* + * lnet_nid2peerni_locked() is the path that will find an + * existing peer_ni, or create one and mark it as having been + * created due to network traffic. 
+ */ + lpni = lnet_nid2peerni_locked(dst_nid, cpt); + if (IS_ERR(lpni)) { lnet_net_unlock(cpt); - return PTR_ERR(peer); + return PTR_ERR(lpni); } + peer = lpni->lpni_peer_net->lpn_peer; + lnet_peer_ni_decref_locked(lpni); /* If peer is not healthy then can not send anything to it */ if (!lnet_is_peer_healthy_locked(peer)) { diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index 30a2486712e4..6b7ca5c361b8 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -541,25 +541,6 @@ lnet_find_peer_ni_locked(lnet_nid_t nid) return lpni; } -struct lnet_peer * -lnet_find_or_create_peer_locked(lnet_nid_t dst_nid, int cpt) -{ - struct lnet_peer_ni *lpni; - struct lnet_peer *lp; - - lpni = lnet_find_peer_ni_locked(dst_nid); - if (!lpni) { - lpni = lnet_nid2peerni_locked(dst_nid, cpt); - if (IS_ERR(lpni)) - return ERR_CAST(lpni); - } - - lp = lpni->lpni_peer_net->lpn_peer; - lnet_peer_ni_decref_locked(lpni); - - return lp; -} - struct lnet_peer_ni * lnet_get_peer_ni_idx_locked(int idx, struct lnet_peer_net **lpn, struct lnet_peer **lp) @@ -774,131 +755,95 @@ lnet_peer_setup_hierarchy(struct lnet_peer *lp, struct lnet_peer_ni return -ENOMEM; } +/* + * Create a new peer, with nid as its primary nid. + * + * It is not an error if the peer already exists, provided that the + * given nid is the primary NID. + * + * Call with the lnet_api_mutex held. + */ static int -lnet_add_prim_lpni(lnet_nid_t nid) +lnet_peer_add(lnet_nid_t nid, bool mr) { - int rc; - struct lnet_peer *peer; + struct lnet_peer *lp; struct lnet_peer_ni *lpni; LASSERT(nid != LNET_NID_ANY); /* - * lookup the NID and its peer - * if the peer doesn't exist, create it. - * if this is a non-MR peer then change its state to MR and exit. - * if this is an MR peer and it's a primary NI: NO-OP. - * if this is an MR peer and it's not a primary NI. Operation not - * allowed. 
- * - * The adding and deleting of peer nis is being serialized through - * the api_mutex. So we can look up peers with the mutex locked - * safely. Only when we need to change the ptable, do we need to - * exclusively lock the lnet_net_lock() + * No need for the lnet_net_lock here, because the + * lnet_api_mutex is held. */ lpni = lnet_find_peer_ni_locked(nid); if (!lpni) { - rc = lnet_peer_setup_hierarchy(NULL, NULL, nid); + int rc = lnet_peer_setup_hierarchy(NULL, NULL, nid); if (rc != 0) return rc; lpni = lnet_find_peer_ni_locked(nid); + LASSERT(lpni); } - - LASSERT(lpni); - + lp = lpni->lpni_peer_net->lpn_peer; lnet_peer_ni_decref_locked(lpni); - peer = lpni->lpni_peer_net->lpn_peer; - - /* - * If we found a lpni with the same nid as the NID we're trying to - * create, then we're trying to create an already existing lpni - * that belongs to a different peer - */ - if (peer->lp_primary_nid != nid) + /* A found peer must have this primary NID */ + if (lp->lp_primary_nid != nid) return -EEXIST; /* - * if we found an lpni that is not a multi-rail, which could occur + * If we found an lpni that is not a multi-rail, which could occur * if lpni is already created as a non-mr lpni or we just created * it, then make sure you indicate that this lpni is a primary mr * capable peer. * * TODO: update flags if necessary */ - if (!peer->lp_multi_rail && peer->lp_primary_nid == nid) - peer->lp_multi_rail = true; + if (mr && !lp->lp_multi_rail) { + lp->lp_multi_rail = true; + } else if (!mr && lp->lp_multi_rail) { + /* The mr state is sticky. 
*/ + CDEBUG(D_NET, "Cannot clear multi-flag from peer %s\n", + libcfs_nid2str(nid)); + } - return rc; + return 0; } static int -lnet_add_peer_ni_to_prim_lpni(lnet_nid_t prim_nid, lnet_nid_t nid) +lnet_peer_add_nid(struct lnet_peer *lp, lnet_nid_t nid, bool mr) { - struct lnet_peer *peer, *primary_peer; - struct lnet_peer_ni *lpni = NULL, *klpni = NULL; - - LASSERT(prim_nid != LNET_NID_ANY && nid != LNET_NID_ANY); + struct lnet_peer_ni *lpni; - /* - * key nid must be created by this point. If not then this - * operation is not permitted - */ - klpni = lnet_find_peer_ni_locked(prim_nid); - if (!klpni) - return -ENOENT; + LASSERT(lp); + LASSERT(nid != LNET_NID_ANY); - lnet_peer_ni_decref_locked(klpni); + if (!mr && !lp->lp_multi_rail) { + CERROR("Cannot add nid %s to non-multi-rail peer %s\n", + libcfs_nid2str(nid), + libcfs_nid2str(lp->lp_primary_nid)); + return -EPERM; + } - primary_peer = klpni->lpni_peer_net->lpn_peer; + if (!lp->lp_multi_rail) + lp->lp_multi_rail = true; lpni = lnet_find_peer_ni_locked(nid); - if (lpni) { - lnet_peer_ni_decref_locked(lpni); - - peer = lpni->lpni_peer_net->lpn_peer; - /* - * lpni already exists in the system but it belongs to - * a different peer. We can't re-added it - */ - if (peer->lp_primary_nid != prim_nid && peer->lp_multi_rail) { - CERROR("Cannot add NID %s owned by peer %s to peer %s\n", - libcfs_nid2str(lpni->lpni_nid), - libcfs_nid2str(peer->lp_primary_nid), - libcfs_nid2str(prim_nid)); - return -EEXIST; - } else if (peer->lp_primary_nid == prim_nid) { - /* - * found a peer_ni that is already part of the - * peer. This is a no-op operation. - */ - return 0; - } - - /* - * TODO: else if (peer->lp_primary_nid != prim_nid && - * !peer->lp_multi_rail) - * peer is not an MR peer and it will be moved in the next - * step to klpni, so update its flags accordingly. - * lnet_move_peer_ni() - */ - - /* - * TODO: call lnet_update_peer() from here to update the - * flags. 
This is the case when the lpni you're trying to - * add is already part of the peer. This could've been - * added by the DD previously, so go ahead and do any - * updates to the state if necessary - */ + if (!lpni) + return lnet_peer_setup_hierarchy(lp, NULL, nid); + if (lpni->lpni_peer_net->lpn_peer != lp) { + struct lnet_peer *lp2 = lpni->lpni_peer_net->lpn_peer; + CERROR("Cannot add NID %s owned by peer %s to peer %s\n", + libcfs_nid2str(lpni->lpni_nid), + libcfs_nid2str(lp2->lp_primary_nid), + libcfs_nid2str(lp->lp_primary_nid)); + return -EEXIST; } - /* - * When we get here we either have found an existing lpni, which - * we can switch to the new peer. Or we need to create one and - * add it to the new peer - */ - return lnet_peer_setup_hierarchy(primary_peer, lpni, nid); + CDEBUG(D_NET, "NID %s is already owned by peer %s\n", + libcfs_nid2str(lpni->lpni_nid), + libcfs_nid2str(lp->lp_primary_nid)); + return 0; } /* @@ -929,61 +874,50 @@ lnet_peer_ni_traffic_add(lnet_nid_t nid) return rc; } -static int -lnet_peer_ni_add_non_mr(lnet_nid_t nid) -{ - struct lnet_peer_ni *lpni; - - lpni = lnet_find_peer_ni_locked(nid); - if (lpni) { - CERROR("Cannot add %s as non-mr when it already exists\n", - libcfs_nid2str(nid)); - lnet_peer_ni_decref_locked(lpni); - return -EEXIST; - } - - return lnet_peer_setup_hierarchy(NULL, NULL, nid); -} - /* * Implementation of IOC_LIBCFS_ADD_PEER_NI. * * This API handles the following combinations: - * Create a primary NI if only the prim_nid is provided - * Create or add an lpni to a primary NI. Primary NI must've already - * been created - * Create a non-MR peer. + * Create a peer with its primary NI if only the prim_nid is provided + * Add a NID to a peer identified by the prim_nid. The peer identified + * by the prim_nid must already exist. + * The peer being created may be non-MR. + * + * The caller must hold ln_api_mutex. This prevents the peer from + * being created/modified/deleted by a different thread. 
*/ int lnet_add_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid, bool mr) { + struct lnet_peer *lp = NULL; + struct lnet_peer_ni *lpni; + + /* The prim_nid must always be specified */ + if (prim_nid == LNET_NID_ANY) + return -EINVAL; + /* - * Caller trying to setup an MR like peer hierarchy but - * specifying it to be non-MR. This is not allowed. + * If nid isn't specified, we must create a new peer with + * prim_nid as its primary nid. */ - if (prim_nid != LNET_NID_ANY && - nid != LNET_NID_ANY && !mr) - return -EPERM; - - /* Add the primary NID of a peer */ - if (prim_nid != LNET_NID_ANY && - nid == LNET_NID_ANY && mr) - return lnet_add_prim_lpni(prim_nid); + if (nid == LNET_NID_ANY) + return lnet_peer_add(prim_nid, mr); - /* Add a NID to an existing peer */ - if (prim_nid != LNET_NID_ANY && - nid != LNET_NID_ANY && mr) - return lnet_add_peer_ni_to_prim_lpni(prim_nid, nid); + /* Look up the prim_nid, which must exist. */ + lpni = lnet_find_peer_ni_locked(prim_nid); + if (!lpni) + return -ENOENT; + lnet_peer_ni_decref_locked(lpni); + lp = lpni->lpni_peer_net->lpn_peer; - /* Add a non-MR peer NI */ - if (((prim_nid != LNET_NID_ANY && - nid == LNET_NID_ANY) || - (prim_nid == LNET_NID_ANY && - nid != LNET_NID_ANY)) && !mr) - return lnet_peer_ni_add_non_mr(prim_nid != LNET_NID_ANY ? 
- prim_nid : nid); + if (lp->lp_primary_nid != prim_nid) { + CDEBUG(D_NET, "prim_nid %s is not primary for peer %s\n", + libcfs_nid2str(prim_nid), + libcfs_nid2str(lp->lp_primary_nid)); + return -ENODEV; + } - return 0; + return lnet_peer_add_nid(lp, nid, mr); } /* From patchwork Sun Oct 7 23:19:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629813 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CDD21112B for ; Sun, 7 Oct 2018 23:31:05 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BEE2828CBF for ; Sun, 7 Oct 2018 23:31:05 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B2CBA28CC8; Sun, 7 Oct 2018 23:31:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 290EA28CBF for ; Sun, 7 Oct 2018 23:31:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E79A38617B0; Sun, 7 Oct 2018 16:31:04 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2949621F5E1 for ; Sun, 7 Oct 2018 16:31:03 -0700 (PDT) X-Virus-Scanned: by amavisd-new at 
test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 1BEDFAE87; Sun, 7 Oct 2018 23:31:02 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:37 +1100 Message-ID: <153895437796.16383.5518559009775786439.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 11/24] lustre: lnet: introduce LNET_PEER_MULTI_RAIL flag bit X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP

From: Olaf Weber

Add lp_state as a flag word to lnet_peer, and add lp_lock to protect it. This lock needs to be taken whenever the field is updated, because setting or clearing a bit is a read-modify-write cycle.

The lp_multi_rail field is removed; its function is taken over by the new LNET_PEER_MULTI_RAIL flag bit. The helper lnet_peer_is_multi_rail() tests the bit.

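For illustration only, and not part of the patch: a minimal user-space sketch of the flag-word pattern described above. It assumes a pthread mutex as a stand-in for the kernel spinlock lp_lock; struct peer, PEER_MULTI_RAIL, peer_init(), peer_set_multi_rail() and peer_is_multi_rail() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define PEER_MULTI_RAIL (1u << 0)	/* stand-in for LNET_PEER_MULTI_RAIL */

/* Hypothetical user-space stand-in for struct lnet_peer. */
struct peer {
	pthread_mutex_t lock;	/* plays the role of lp_lock */
	unsigned int state;	/* plays the role of lp_state */
};

void peer_init(struct peer *lp)
{
	assert(lp);
	pthread_mutex_init(&lp->lock, NULL);
	lp->state = 0;
}

/* Setting a bit is a read-modify-write, so the lock is held around it. */
void peer_set_multi_rail(struct peer *lp)
{
	assert(lp);
	pthread_mutex_lock(&lp->lock);
	lp->state |= PEER_MULTI_RAIL;
	pthread_mutex_unlock(&lp->lock);
}

/* Testing one bit is a plain read, mirroring lnet_peer_is_multi_rail(). */
bool peer_is_multi_rail(const struct peer *lp)
{
	return lp->state & PEER_MULTI_RAIL;
}
```

The lock matters once the word carries more than one flag: two unlocked updates to different bits could otherwise overwrite each other's result.
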
WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25781 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 6 +++++ .../staging/lustre/include/linux/lnet/lib-types.h | 11 ++++++++-- drivers/staging/lustre/lnet/lnet/lib-move.c | 9 +++++--- drivers/staging/lustre/lnet/lnet/peer.c | 22 +++++++++++++------- 4 files changed, 34 insertions(+), 14 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index fc748ffa251d..75b47628c70e 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -757,4 +757,10 @@ lnet_peer_set_alive(struct lnet_peer_ni *lp) lnet_notify_locked(lp, 0, 1, lp->lpni_last_alive); } +static inline bool +lnet_peer_is_multi_rail(struct lnet_peer *lp) +{ + return lp->lp_state & LNET_PEER_MULTI_RAIL; +} + #endif diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index f28fa5342914..602978a1c86e 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -467,6 +467,8 @@ struct lnet_peer_ni { atomic_t lpni_refcount; /* CPT this peer attached on */ int lpni_cpt; + /* state flags -- protected by lpni_lock */ + unsigned int lpni_state; /* # refs from lnet_route::lr_gateway */ int lpni_rtr_refcount; /* sequence number used to round robin over peer nis within a net */ @@ -497,10 +499,15 @@ struct lnet_peer { /* primary NID of the peer */ lnet_nid_t lp_primary_nid; - /* peer is Multi-Rail enabled peer */ - bool lp_multi_rail; + /* lock protecting peer state flags */ + spinlock_t lp_lock; + + /* peer state flags */ + unsigned int lp_state; }; +#define LNET_PEER_MULTI_RAIL BIT(0) + 
struct lnet_peer_net { /* chain on peer block */ struct list_head lpn_on_peer_list; diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index 59ae8d0649e5..0d0ad30bb164 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -1281,7 +1281,8 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, return -EHOSTUNREACH; } - if (!peer->lp_multi_rail && lnet_get_num_peer_nis(peer) > 1) { + if (!lnet_peer_is_multi_rail(peer) && + lnet_get_num_peer_nis(peer) > 1) { lnet_net_unlock(cpt); CERROR("peer %s is declared to be non MR capable, yet configured with more than one NID\n", libcfs_nid2str(dst_nid)); @@ -1307,7 +1308,7 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, if (msg->msg_type == LNET_MSG_REPLY || msg->msg_type == LNET_MSG_ACK || - !peer->lp_multi_rail || + !lnet_peer_is_multi_rail(peer) || best_ni) { /* * for replies we want to respond on the same peer_ni we @@ -1354,7 +1355,7 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, * then use the best_gw found to send * the message to */ - if (!peer->lp_multi_rail) + if (!lnet_peer_is_multi_rail(peer)) best_lpni = best_gw; else best_lpni = NULL; @@ -1375,7 +1376,7 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, * if the peer is not MR capable, then we should always send to it * using the first NI in the NET we determined. 
*/ - if (!peer->lp_multi_rail) { + if (!lnet_peer_is_multi_rail(peer)) { if (!best_lpni) { lnet_net_unlock(cpt); CERROR("no route to %s\n", diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index 6b7ca5c361b8..cc2b926b76e4 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -182,6 +182,7 @@ lnet_peer_alloc(lnet_nid_t nid) INIT_LIST_HEAD(&lp->lp_on_lnet_peer_list); INIT_LIST_HEAD(&lp->lp_peer_nets); + spin_lock_init(&lp->lp_lock); lp->lp_primary_nid = nid; /* TODO: update flags */ @@ -798,13 +799,15 @@ lnet_peer_add(lnet_nid_t nid, bool mr) * * TODO: update flags if necessary */ - if (mr && !lp->lp_multi_rail) { - lp->lp_multi_rail = true; - } else if (!mr && lp->lp_multi_rail) { + spin_lock(&lp->lp_lock); + if (mr && !(lp->lp_state & LNET_PEER_MULTI_RAIL)) { + lp->lp_state |= LNET_PEER_MULTI_RAIL; + } else if (!mr && (lp->lp_state & LNET_PEER_MULTI_RAIL)) { /* The mr state is sticky. */ - CDEBUG(D_NET, "Cannot clear multi-flag from peer %s\n", + CDEBUG(D_NET, "Cannot clear multi-rail flag from peer %s\n", libcfs_nid2str(nid)); } + spin_unlock(&lp->lp_lock); return 0; } @@ -817,15 +820,18 @@ lnet_peer_add_nid(struct lnet_peer *lp, lnet_nid_t nid, bool mr) LASSERT(lp); LASSERT(nid != LNET_NID_ANY); - if (!mr && !lp->lp_multi_rail) { + spin_lock(&lp->lp_lock); + if (!mr && !(lp->lp_state & LNET_PEER_MULTI_RAIL)) { + spin_unlock(&lp->lp_lock); CERROR("Cannot add nid %s to non-multi-rail peer %s\n", libcfs_nid2str(nid), libcfs_nid2str(lp->lp_primary_nid)); return -EPERM; } - if (!lp->lp_multi_rail) - lp->lp_multi_rail = true; + if (!(lp->lp_state & LNET_PEER_MULTI_RAIL)) + lp->lp_state |= LNET_PEER_MULTI_RAIL; + spin_unlock(&lp->lp_lock); lpni = lnet_find_peer_ni_locked(nid); if (!lpni) @@ -1183,7 +1189,7 @@ int lnet_get_peer_info(__u32 idx, lnet_nid_t *primary_nid, lnet_nid_t *nid, return -ENOENT; *primary_nid = lp->lp_primary_nid; - *mr = lp->lp_multi_rail; + *mr = 
lnet_peer_is_multi_rail(lp); *nid = lpni->lpni_nid; snprintf(ni_info.cr_aliveness, LNET_MAX_STR_LEN, "NA"); if (lnet_isrouter(lpni) || From patchwork Sun Oct 7 23:19:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629815 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B8AA914DB for ; Sun, 7 Oct 2018 23:31:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A6FB628CBF for ; Sun, 7 Oct 2018 23:31:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9B62628CC8; Sun, 7 Oct 2018 23:31:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 9B8C728CBF for ; Sun, 7 Oct 2018 23:31:13 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4EB0421F502; Sun, 7 Oct 2018 16:31:13 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F41CC86177E for ; Sun, 7 Oct 2018 16:31:10 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 215BAAD2C; Sun, 7 
Oct 2018 23:31:10 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:38 +1100 Message-ID: <153895437800.16383.15417431282816541221.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 12/24] lustre: lnet: preferred NIs for non-Multi-Rail peers X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP

From: Olaf Weber

When a node sends a message to a peer NI, there may be a preferred local NI that should be the source of the message. This is in particular the case for non-Multi-Rail (NMR) peers, as an NMR peer depends in some cases on the source address of a message to correctly identify its origin. (This is as opposed to using a UUID provided by a higher protocol layer.)

Implement this by keeping an array of preferred local NIDs in the lnet_peer_ni structure. The case where only a single NID needs to be stored is optimized so that no memory needs to be allocated.

A flag in the lnet_peer_ni, LNET_PEER_NI_NON_MR_PREF, indicates that the preferred NI was automatically added for an NMR peer. Note that a peer which has not been explicitly configured as Multi-Rail will be treated as non-Multi-Rail until proven otherwise. These automatic preferences will be cleared if the peer is changed to Multi-Rail.

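For illustration only, and not the code in this patch: a minimal user-space sketch of the single-NID optimization described above, where one preferred NID is stored inline in a union and a heap array is used only for two or more. The names nid_t, struct pref_set, pref_contains() and pref_add() are hypothetical, and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t nid_t;	/* stand-in for lnet_nid_t */

/* Hypothetical stand-in for the preferred-NID fields of lnet_peer_ni. */
struct pref_set {
	union {
		nid_t nid;	/* valid when count == 1: no allocation */
		nid_t *nids;	/* valid when count > 1: heap array */
	} u;
	unsigned int count;
};

bool pref_contains(const struct pref_set *ps, nid_t nid)
{
	unsigned int i;

	if (ps->count == 0)
		return false;
	if (ps->count == 1)
		return ps->u.nid == nid;
	for (i = 0; i < ps->count; i++)
		if (ps->u.nids[i] == nid)
			return true;
	return false;
}

/* Add a NID; returns 0 on success, -1 if present or out of memory. */
int pref_add(struct pref_set *ps, nid_t nid)
{
	nid_t *nids;
	unsigned int i;

	assert(ps);
	if (pref_contains(ps, nid))
		return -1;
	if (ps->count == 0) {
		ps->u.nid = nid;	/* a single entry lives inline */
		ps->count = 1;
		return 0;
	}
	nids = malloc(sizeof(*nids) * (ps->count + 1));
	if (!nids)
		return -1;
	if (ps->count == 1) {
		nids[0] = ps->u.nid;	/* promote the inline entry */
	} else {
		for (i = 0; i < ps->count; i++)
			nids[i] = ps->u.nids[i];
		free(ps->u.nids);
	}
	nids[ps->count] = nid;
	ps->u.nids = nids;
	ps->count++;
	return 0;
}
```

In this sketch the count selects which union member is valid, so the step from one entry to two copies the inline value into the new array before the pointer member overwrites it.
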
- lnet_peer_ni_set_non_mr_pref_nid() set NMR preferred NI for peer_ni - lnet_peer_ni_clr_non_mr_pref_nid() clear NMR preferred NI for peer_ni - lnet_peer_clr_non_mr_pref_nids() clear NMR preferred NIs for all peer_ni - lnet_peer_add_pref_nid() add a preferred NID - lnet_peer_del_pref_nid() delete a preferred NID WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25782 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 7 - .../staging/lustre/include/linux/lnet/lib-types.h | 10 + drivers/staging/lustre/lnet/lnet/lib-move.c | 49 +++- drivers/staging/lustre/lnet/lnet/peer.c | 257 +++++++++++++++++++- 4 files changed, 285 insertions(+), 38 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 75b47628c70e..2864bd8a403b 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -668,7 +668,8 @@ u32 lnet_get_dlc_seq_locked(void); struct lnet_peer_ni *lnet_get_next_peer_ni_locked(struct lnet_peer *peer, struct lnet_peer_net *peer_net, struct lnet_peer_ni *prev); -struct lnet_peer_ni *lnet_nid2peerni_locked(lnet_nid_t nid, int cpt); +struct lnet_peer_ni *lnet_nid2peerni_locked(lnet_nid_t nid, lnet_nid_t pref, + int cpt); struct lnet_peer_ni *lnet_nid2peerni_ex(lnet_nid_t nid, int cpt); struct lnet_peer_ni *lnet_find_peer_ni_locked(lnet_nid_t nid); void lnet_peer_net_added(struct lnet_net *net); @@ -679,8 +680,8 @@ int lnet_peer_tables_create(void); void lnet_debug_peer(lnet_nid_t nid); struct lnet_peer_net *lnet_peer_get_net_locked(struct lnet_peer *peer, u32 net_id); -bool lnet_peer_is_ni_pref_locked(struct lnet_peer_ni *lpni, - struct lnet_ni *ni); +bool lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, lnet_nid_t 
nid); +int lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid); int lnet_add_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid, bool mr); int lnet_del_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid); int lnet_get_peer_info(__u32 idx, lnet_nid_t *primary_nid, lnet_nid_t *nid, diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index 602978a1c86e..eff2aed5e5c1 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -481,14 +481,20 @@ struct lnet_peer_ni { unsigned int lpni_ping_feats; /* routers on this peer */ struct list_head lpni_routes; - /* array of preferred local nids */ - lnet_nid_t *lpni_pref_nids; + /* preferred local nids: if only one, use lpni_pref.nid */ + union lpni_pref { + lnet_nid_t nid; + lnet_nid_t *nids; + } lpni_pref; /* number of preferred NIDs in lnpi_pref_nids */ u32 lpni_pref_nnids; /* router checker state */ struct lnet_rc_data *lpni_rcd; }; +/* Preferred path added due to traffic on non-MR peer_ni */ +#define LNET_PEER_NI_NON_MR_PREF BIT(0) + struct lnet_peer { /* chain on global peer list */ struct list_head lp_on_lnet_peer_list; diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index 0d0ad30bb164..99d8b22356bb 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -1267,7 +1267,7 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, * existing peer_ni, or create one and mark it as having been * created due to network traffic. 
*/ - lpni = lnet_nid2peerni_locked(dst_nid, cpt); + lpni = lnet_nid2peerni_locked(dst_nid, LNET_NID_ANY, cpt); if (IS_ERR(lpni)) { lnet_net_unlock(cpt); return PTR_ERR(lpni); @@ -1281,14 +1281,6 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, return -EHOSTUNREACH; } - if (!lnet_peer_is_multi_rail(peer) && - lnet_get_num_peer_nis(peer) > 1) { - lnet_net_unlock(cpt); - CERROR("peer %s is declared to be non MR capable, yet configured with more than one NID\n", - libcfs_nid2str(dst_nid)); - return -EINVAL; - } - /* * STEP 1: first jab at determining best_ni * if src_nid is explicitly specified, then best_ni is already @@ -1373,8 +1365,14 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, } /* - * if the peer is not MR capable, then we should always send to it - * using the first NI in the NET we determined. + * We must use a consistent source address when sending to a + * non-MR peer. However, a non-MR peer can have multiple NIDs + * on multiple networks, and we may even need to talk to this + * peer on multiple networks -- certain types of + * load-balancing configuration do this. + * + * So we need to pick the NI the peer prefers for this + * particular network. 
*/ if (!lnet_peer_is_multi_rail(peer)) { if (!best_lpni) { @@ -1384,10 +1382,26 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, return -EHOSTUNREACH; } - /* best ni could be set because src_nid was provided */ + /* best ni is already set if src_nid was provided */ + if (!best_ni) { + /* Get the target peer_ni */ + peer_net = lnet_peer_get_net_locked( + peer, LNET_NIDNET(best_lpni->lpni_nid)); + list_for_each_entry(lpni, &peer_net->lpn_peer_nis, + lpni_on_peer_net_list) { + if (lpni->lpni_pref_nnids == 0) + continue; + LASSERT(lpni->lpni_pref_nnids == 1); + best_ni = lnet_nid2ni_locked( + lpni->lpni_pref.nid, cpt); + break; + } + } + /* if best_ni is still not set just pick one */ if (!best_ni) { best_ni = lnet_net2ni_locked( best_lpni->lpni_net->net_id, cpt); + /* If there is no best_ni we don't have a route */ if (!best_ni) { lnet_net_unlock(cpt); CERROR("no path to %s from net %s\n", @@ -1395,7 +1409,13 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, libcfs_net2str(best_lpni->lpni_net->net_id)); return -EHOSTUNREACH; } + lpni = list_entry(peer_net->lpn_peer_nis.next, + struct lnet_peer_ni, + lpni_on_peer_net_list); } + /* Set preferred NI if necessary. 
*/ + if (lpni->lpni_pref_nnids == 0) + lnet_peer_ni_set_non_mr_pref_nid(lpni, best_ni->ni_nid); } /* @@ -1593,7 +1613,8 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, */ if (!lnet_is_peer_ni_healthy_locked(lpni)) continue; - ni_is_pref = lnet_peer_is_ni_pref_locked(lpni, best_ni); + ni_is_pref = lnet_peer_is_pref_nid_locked(lpni, + best_ni->ni_nid); /* if this is a preferred peer use it */ if (!preferred && ni_is_pref) { @@ -2380,7 +2401,7 @@ lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid, } lnet_net_lock(cpt); - lpni = lnet_nid2peerni_locked(from_nid, cpt); + lpni = lnet_nid2peerni_locked(from_nid, ni->ni_nid, cpt); if (IS_ERR(lpni)) { lnet_net_unlock(cpt); CERROR("%s, src %s: Dropping %s (error %ld looking up sender)\n", diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index cc2b926b76e4..44a2bf641260 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -617,18 +617,233 @@ lnet_get_next_peer_ni_locked(struct lnet_peer *peer, return lpni; } +/* + * Test whether a ni is a preferred ni for this peer_ni, i.e., whether + * this is a preferred point-to-point path. Call with lnet_net_lock in + * shared mode. + */ bool -lnet_peer_is_ni_pref_locked(struct lnet_peer_ni *lpni, struct lnet_ni *ni) +lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, lnet_nid_t nid) { int i; + if (lpni->lpni_pref_nnids == 0) + return false; + if (lpni->lpni_pref_nnids == 1) + return lpni->lpni_pref.nid == nid; for (i = 0; i < lpni->lpni_pref_nnids; i++) { - if (lpni->lpni_pref_nids[i] == ni->ni_nid) + if (lpni->lpni_pref.nids[i] == nid) return true; } return false; } +/* + * Set a single ni as preferred, provided no preferred ni is already + * defined. Only to be used for non-multi-rail peer_ni. 
+ */ +int +lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid) +{ + int rc = 0; + + spin_lock(&lpni->lpni_lock); + if (nid == LNET_NID_ANY) { + rc = -EINVAL; + } else if (lpni->lpni_pref_nnids > 0) { + rc = -EPERM; + } else if (lpni->lpni_pref_nnids == 0) { + lpni->lpni_pref.nid = nid; + lpni->lpni_pref_nnids = 1; + lpni->lpni_state |= LNET_PEER_NI_NON_MR_PREF; + } + spin_unlock(&lpni->lpni_lock); + + CDEBUG(D_NET, "peer %s nid %s: %d\n", + libcfs_nid2str(lpni->lpni_nid), libcfs_nid2str(nid), rc); + return rc; +} + +/* + * Clear the preferred NID from a non-multi-rail peer_ni, provided + * this preference was set by lnet_peer_ni_set_non_mr_pref_nid(). + */ +int +lnet_peer_ni_clr_non_mr_pref_nid(struct lnet_peer_ni *lpni) +{ + int rc = 0; + + spin_lock(&lpni->lpni_lock); + if (lpni->lpni_state & LNET_PEER_NI_NON_MR_PREF) { + lpni->lpni_pref_nnids = 0; + lpni->lpni_state &= ~LNET_PEER_NI_NON_MR_PREF; + } else if (lpni->lpni_pref_nnids == 0) { + rc = -ENOENT; + } else { + rc = -EPERM; + } + spin_unlock(&lpni->lpni_lock); + + CDEBUG(D_NET, "peer %s: %d\n", + libcfs_nid2str(lpni->lpni_nid), rc); + return rc; +} + +/* + * Clear the preferred NIDs from a non-multi-rail peer. 
+ */ +void +lnet_peer_clr_non_mr_pref_nids(struct lnet_peer *lp) +{ + struct lnet_peer_ni *lpni = NULL; + + while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) + lnet_peer_ni_clr_non_mr_pref_nid(lpni); +} + +int +lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid) +{ + lnet_nid_t *nids = NULL; + lnet_nid_t *oldnids = NULL; + struct lnet_peer *lp = lpni->lpni_peer_net->lpn_peer; + int size; + int i; + int rc = 0; + + if (nid == LNET_NID_ANY) { + rc = -EINVAL; + goto out; + } + + if (lpni->lpni_pref_nnids == 1 && lpni->lpni_pref.nid == nid) { + rc = -EEXIST; + goto out; + } + + /* A non-MR node may have only one preferred NI per peer_ni */ + if (lpni->lpni_pref_nnids > 0) { + if (!(lp->lp_state & LNET_PEER_MULTI_RAIL)) { + rc = -EPERM; + goto out; + } + } + + if (lpni->lpni_pref_nnids != 0) { + size = sizeof(*nids) * (lpni->lpni_pref_nnids + 1); + nids = kzalloc_cpt(size, GFP_KERNEL, lpni->lpni_cpt); + if (!nids) { + rc = -ENOMEM; + goto out; + } + for (i = 0; i < lpni->lpni_pref_nnids; i++) { + if (lpni->lpni_pref.nids[i] == nid) { + kfree(nids); + rc = -EEXIST; + goto out; + } + nids[i] = lpni->lpni_pref.nids[i]; + } + nids[i] = nid; + } + + lnet_net_lock(LNET_LOCK_EX); + spin_lock(&lpni->lpni_lock); + if (lpni->lpni_pref_nnids == 0) { + lpni->lpni_pref.nid = nid; + } else { + oldnids = lpni->lpni_pref.nids; + lpni->lpni_pref.nids = nids; + } + lpni->lpni_pref_nnids++; + lpni->lpni_state &= ~LNET_PEER_NI_NON_MR_PREF; + spin_unlock(&lpni->lpni_lock); + lnet_net_unlock(LNET_LOCK_EX); + + kfree(oldnids); +out: + if (rc == -EEXIST && (lpni->lpni_state & LNET_PEER_NI_NON_MR_PREF)) { + spin_lock(&lpni->lpni_lock); + lpni->lpni_state &= ~LNET_PEER_NI_NON_MR_PREF; + spin_unlock(&lpni->lpni_lock); + } + CDEBUG(D_NET, "peer %s nid %s: %d\n", + libcfs_nid2str(lp->lp_primary_nid), libcfs_nid2str(nid), rc); + return rc; +} + +int +lnet_peer_del_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid) +{ + lnet_nid_t *nids = NULL; + lnet_nid_t *oldnids 
= NULL; + struct lnet_peer *lp = lpni->lpni_peer_net->lpn_peer; + int size; + int i, j; + int rc = 0; + + if (lpni->lpni_pref_nnids == 0) { + rc = -ENOENT; + goto out; + } + + if (lpni->lpni_pref_nnids == 1) { + if (lpni->lpni_pref.nid != nid) { + rc = -ENOENT; + goto out; + } + } else if (lpni->lpni_pref_nnids == 2) { + if (lpni->lpni_pref.nids[0] != nid && + lpni->lpni_pref.nids[1] != nid) { + rc = -ENOENT; + goto out; + } + } else { + size = sizeof(*nids) * lpni->lpni_pref_nnids; + nids = kzalloc_cpt(size, GFP_KERNEL, lpni->lpni_cpt); + if (!nids) { + rc = -ENOMEM; + goto out; + } + for (i = 0, j = 0; i < lpni->lpni_pref_nnids; i++) { + if (lpni->lpni_pref.nids[i] == nid) + continue; + nids[j++] = lpni->lpni_pref.nids[i]; + } + /* Check if we actually removed a nid. */ + if (j == lpni->lpni_pref_nnids) { + kfree(nids); + rc = -ENOENT; + goto out; + } + } + + lnet_net_lock(LNET_LOCK_EX); + spin_lock(&lpni->lpni_lock); + if (lpni->lpni_pref_nnids == 1) { + lpni->lpni_pref.nid = LNET_NID_ANY; + } else if (lpni->lpni_pref_nnids == 2) { + oldnids = lpni->lpni_pref.nids; + if (oldnids[0] == nid) + lpni->lpni_pref.nid = oldnids[1]; + else + lpni->lpni_pref.nid = oldnids[0]; + } else { + oldnids = lpni->lpni_pref.nids; + lpni->lpni_pref.nids = nids; + } + lpni->lpni_pref_nnids--; + lpni->lpni_state &= ~LNET_PEER_NI_NON_MR_PREF; + spin_unlock(&lpni->lpni_lock); + lnet_net_unlock(LNET_LOCK_EX); + + kfree(oldnids); +out: + CDEBUG(D_NET, "peer %s nid %s: %d\n", + libcfs_nid2str(lp->lp_primary_nid), libcfs_nid2str(nid), rc); + return rc; +} + lnet_nid_t lnet_peer_primary_nid_locked(lnet_nid_t nid) { @@ -653,7 +868,7 @@ LNetPrimaryNID(lnet_nid_t nid) int cpt; cpt = lnet_net_lock_current(); - lpni = lnet_nid2peerni_locked(nid, cpt); + lpni = lnet_nid2peerni_locked(nid, LNET_NID_ANY, cpt); if (IS_ERR(lpni)) { rc = PTR_ERR(lpni); goto out_unlock; @@ -802,6 +1017,7 @@ lnet_peer_add(lnet_nid_t nid, bool mr) spin_lock(&lp->lp_lock); if (mr && !(lp->lp_state & 
LNET_PEER_MULTI_RAIL)) { lp->lp_state |= LNET_PEER_MULTI_RAIL; + lnet_peer_clr_non_mr_pref_nids(lp); } else if (!mr && (lp->lp_state & LNET_PEER_MULTI_RAIL)) { /* The mr state is sticky. */ CDEBUG(D_NET, "Cannot clear multi-rail flag from peer %s\n", @@ -829,8 +1045,10 @@ lnet_peer_add_nid(struct lnet_peer *lp, lnet_nid_t nid, bool mr) return -EPERM; } - if (!(lp->lp_state & LNET_PEER_MULTI_RAIL)) + if (!(lp->lp_state & LNET_PEER_MULTI_RAIL)) { lp->lp_state |= LNET_PEER_MULTI_RAIL; + lnet_peer_clr_non_mr_pref_nids(lp); + } spin_unlock(&lp->lp_lock); lpni = lnet_find_peer_ni_locked(nid); @@ -856,28 +1074,27 @@ lnet_peer_add_nid(struct lnet_peer *lp, lnet_nid_t nid, bool mr) * lpni creation initiated due to traffic either sending or receiving. */ static int -lnet_peer_ni_traffic_add(lnet_nid_t nid) +lnet_peer_ni_traffic_add(lnet_nid_t nid, lnet_nid_t pref) { struct lnet_peer_ni *lpni; - int rc = 0; + int rc; if (nid == LNET_NID_ANY) return -EINVAL; /* lnet_net_lock is not needed here because ln_api_lock is held */ lpni = lnet_find_peer_ni_locked(nid); - if (lpni) { - /* - * TODO: lnet_update_primary_nid() but not all of it - * only indicate if we're converting this to MR capable - * Can happen due to DD - */ - lnet_peer_ni_decref_locked(lpni); - } else { + if (!lpni) { rc = lnet_peer_setup_hierarchy(NULL, NULL, nid); + if (rc) + return rc; + lpni = lnet_find_peer_ni_locked(nid); } + if (pref != LNET_NID_ANY) + lnet_peer_ni_set_non_mr_pref_nid(lpni, pref); + lnet_peer_ni_decref_locked(lpni); - return rc; + return 0; } /* @@ -984,6 +1201,8 @@ lnet_destroy_peer_ni_locked(struct lnet_peer_ni *lpni) ptable->pt_zombies--; spin_unlock(&ptable->pt_zombie_lock); + if (lpni->lpni_pref_nnids > 1) + kfree(lpni->lpni_pref.nids); kfree(lpni); } @@ -1006,7 +1225,7 @@ lnet_nid2peerni_ex(lnet_nid_t nid, int cpt) lnet_net_unlock(cpt); - rc = lnet_peer_ni_traffic_add(nid); + rc = lnet_peer_ni_traffic_add(nid, LNET_NID_ANY); if (rc) { lpni = ERR_PTR(rc); goto out_net_relock; @@ -1022,7 
+1241,7 @@ lnet_nid2peerni_ex(lnet_nid_t nid, int cpt) } struct lnet_peer_ni * -lnet_nid2peerni_locked(lnet_nid_t nid, int cpt) +lnet_nid2peerni_locked(lnet_nid_t nid, lnet_nid_t pref, int cpt) { struct lnet_peer_ni *lpni = NULL; int rc; @@ -1061,7 +1280,7 @@ lnet_nid2peerni_locked(lnet_nid_t nid, int cpt) goto out_mutex_unlock; } - rc = lnet_peer_ni_traffic_add(nid); + rc = lnet_peer_ni_traffic_add(nid, pref); if (rc) { lpni = ERR_PTR(rc); goto out_mutex_unlock; @@ -1087,7 +1306,7 @@ lnet_debug_peer(lnet_nid_t nid) cpt = lnet_cpt_of_nid(nid, NULL); lnet_net_lock(cpt); - lp = lnet_nid2peerni_locked(nid, cpt); + lp = lnet_nid2peerni_locked(nid, LNET_NID_ANY, cpt); if (IS_ERR(lp)) { lnet_net_unlock(cpt); CDEBUG(D_WARNING, "No peer %s\n", libcfs_nid2str(nid));

From patchwork Sun Oct 7 23:19:38 2018
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 10629817
From: NeilBrown
To: Oleg Drokin, Doug Oucharek, James Simmons, Andreas Dilger
Date: Mon, 08 Oct 2018 10:19:38 +1100
Message-ID: <153895437804.16383.1008375422641070080.stgit@noble>
In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble>
References: <153895417139.16383.3791701638653772865.stgit@noble>
Subject: [lustre-devel] [PATCH 13/24] lustre: lnet: add LNET_PEER_CONFIGURED flag
Cc: Amir Shehata, Olaf Weber, Lustre Development List

From: Olaf Weber

Add the LNET_PEER_CONFIGURED flag, which indicates that a peer has been configured by DLC. This is used to enforce that only DLC can modify such a peer.

This includes some further refactoring of the code that creates or modifies peers to ensure that the flag is properly passed through, set, and cleared.
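For readers following the series, the rule this flag enforces can be sketched in isolation. What follows is a minimal userspace sketch, not code from the patch: struct demo_peer, demo_check_caller() and demo_mr_mismatch() are hypothetical names, and only the two flag bit values and the shape of the two tests are taken from the diff.

```c
#include <errno.h>

/* Flag bits as defined by the patch (BIT(0) and BIT(1)). */
#define LNET_PEER_MULTI_RAIL	(1U << 0)
#define LNET_PEER_CONFIGURED	(1U << 1)

/* Hypothetical stand-in for struct lnet_peer; only lp_state matters here. */
struct demo_peer {
	unsigned int lp_state;
};

/*
 * A caller that does not pass LNET_PEER_CONFIGURED (that is, a non-DLC
 * caller) may not touch a peer that DLC has configured.
 */
static int demo_check_caller(const struct demo_peer *lp, unsigned int flags)
{
	if (!(flags & LNET_PEER_CONFIGURED) &&
	    (lp->lp_state & LNET_PEER_CONFIGURED))
		return -EPERM;
	return 0;
}

/* Non-zero when the caller's Multi-Rail bit differs from the peer's. */
static int demo_mr_mismatch(const struct demo_peer *lp, unsigned int flags)
{
	return ((lp->lp_state ^ flags) & LNET_PEER_MULTI_RAIL) != 0;
}
```

In the diff itself the same two tests appear inline in lnet_peer_add_nid(), lnet_peer_del_nid() and lnet_add_peer_ni(); passing the state as a flags word rather than a bool mr is what lets both bits travel through one argument.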
WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25783 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 12 + .../staging/lustre/include/linux/lnet/lib-types.h | 1 drivers/staging/lustre/lnet/lnet/peer.c | 426 +++++++++++++------- 3 files changed, 290 insertions(+), 149 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 2864bd8a403b..563417510722 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -764,4 +764,16 @@ lnet_peer_is_multi_rail(struct lnet_peer *lp) return lp->lp_state & LNET_PEER_MULTI_RAIL; } +static inline bool +lnet_peer_ni_is_configured(struct lnet_peer_ni *lpni) +{ + return lpni->lpni_peer_net->lpn_peer->lp_state & LNET_PEER_CONFIGURED; +} + +static inline bool +lnet_peer_ni_is_primary(struct lnet_peer_ni *lpni) +{ + return lpni->lpni_nid == lpni->lpni_peer_net->lpn_peer->lp_primary_nid; +} + #endif diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index eff2aed5e5c1..d1721fd01d93 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -513,6 +513,7 @@ struct lnet_peer { }; #define LNET_PEER_MULTI_RAIL BIT(0) +#define LNET_PEER_CONFIGURED BIT(1) struct lnet_peer_net { /* chain on peer block */ diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index 44a2bf641260..09c1b5516f6b 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -191,10 +191,10 @@ lnet_peer_alloc(lnet_nid_t nid) } static void -lnet_try_destroy_peer_hierarchy_locked(struct lnet_peer_ni 
*lpni) +lnet_peer_detach_peer_ni(struct lnet_peer_ni *lpni) { - struct lnet_peer_net *peer_net; - struct lnet_peer *peer; + struct lnet_peer_net *lpn; + struct lnet_peer *lp; /* TODO: could the below situation happen? accessing an already * destroyed peer? @@ -203,24 +203,28 @@ lnet_try_destroy_peer_hierarchy_locked(struct lnet_peer_ni *lpni) !lpni->lpni_peer_net->lpn_peer) return; - peer_net = lpni->lpni_peer_net; - peer = lpni->lpni_peer_net->lpn_peer; + lpn = lpni->lpni_peer_net; + lp = lpni->lpni_peer_net->lpn_peer; + + CDEBUG(D_NET, "peer %s NID %s\n", + libcfs_nid2str(lp->lp_primary_nid), + libcfs_nid2str(lpni->lpni_nid)); list_del_init(&lpni->lpni_on_peer_net_list); lpni->lpni_peer_net = NULL; - /* if peer_net is empty, then remove it from the peer */ - if (list_empty(&peer_net->lpn_peer_nis)) { - list_del_init(&peer_net->lpn_on_peer_list); - peer_net->lpn_peer = NULL; - kfree(peer_net); + /* if lpn is empty, then remove it from the peer */ + if (list_empty(&lpn->lpn_peer_nis)) { + list_del_init(&lpn->lpn_on_peer_list); + lpn->lpn_peer = NULL; + kfree(lpn); /* If the peer is empty then remove it from the * the_lnet.ln_peers. 
*/ - if (list_empty(&peer->lp_peer_nets)) { - list_del_init(&peer->lp_on_lnet_peer_list); - kfree(peer); + if (list_empty(&lp->lp_peer_nets)) { + list_del_init(&lp->lp_on_lnet_peer_list); + kfree(lp); } } } @@ -263,10 +267,10 @@ lnet_peer_ni_del_locked(struct lnet_peer_ni *lpni) ptable->pt_zombies++; spin_unlock(&ptable->pt_zombie_lock); - /* no need to keep this peer on the hierarchy anymore */ - lnet_try_destroy_peer_hierarchy_locked(lpni); + /* no need to keep this peer_ni on the hierarchy anymore */ + lnet_peer_detach_peer_ni(lpni); - /* decrement reference on peer */ + /* decrement reference on peer_ni */ lnet_peer_ni_decref_locked(lpni); return 0; @@ -329,6 +333,8 @@ lnet_peer_del_locked(struct lnet_peer *peer) struct lnet_peer_ni *lpni = NULL, *lpni2; int rc = 0, rc2 = 0; + CDEBUG(D_NET, "peer %s\n", libcfs_nid2str(peer->lp_primary_nid)); + lpni = lnet_get_next_peer_ni_locked(peer, NULL, lpni); while (lpni) { lpni2 = lnet_get_next_peer_ni_locked(peer, NULL, lpni); @@ -352,31 +358,36 @@ lnet_peer_del(struct lnet_peer *peer) } /* - * Delete a NID from a peer. - * Implements a few sanity checks. - * Call with ln_api_mutex held. + * Delete a NID from a peer. Call with ln_api_mutex held. + * + * Error codes: + * -EPERM: Non-DLC deletion from DLC-configured peer. + * -ENOENT: No lnet_peer_ni corresponding to the nid. + * -ECHILD: The lnet_peer_ni isn't connected to the peer. + * -EBUSY: The lnet_peer_ni is the primary, and not the only peer_ni. 
*/ static int -lnet_peer_del_nid(struct lnet_peer *lp, lnet_nid_t nid) +lnet_peer_del_nid(struct lnet_peer *lp, lnet_nid_t nid, unsigned int flags) { - struct lnet_peer *lp2; struct lnet_peer_ni *lpni; + lnet_nid_t primary_nid = lp->lp_primary_nid; + int rc = 0; + if (!(flags & LNET_PEER_CONFIGURED)) { + if (lp->lp_state & LNET_PEER_CONFIGURED) { + rc = -EPERM; + goto out; + } + } lpni = lnet_find_peer_ni_locked(nid); if (!lpni) { - CERROR("Cannot remove unknown nid %s from peer %s\n", - libcfs_nid2str(nid), - libcfs_nid2str(lp->lp_primary_nid)); - return -ENOENT; + rc = -ENOENT; + goto out; } lnet_peer_ni_decref_locked(lpni); - lp2 = lpni->lpni_peer_net->lpn_peer; - if (lp2 != lp) { - CERROR("Nid %s is attached to peer %s, not peer %s\n", - libcfs_nid2str(nid), - libcfs_nid2str(lp2->lp_primary_nid), - libcfs_nid2str(lp->lp_primary_nid)); - return -EINVAL; + if (lp != lpni->lpni_peer_net->lpn_peer) { + rc = -ECHILD; + goto out; } /* @@ -384,16 +395,19 @@ lnet_peer_del_nid(struct lnet_peer *lp, lnet_nid_t nid) * is the only NID. */ if (nid == lp->lp_primary_nid && lnet_get_num_peer_nis(lp) != 1) { - CERROR("Cannot delete primary NID %s from multi-NID peer\n", - libcfs_nid2str(nid)); - return -EINVAL; + rc = -EBUSY; + goto out; } lnet_net_lock(LNET_LOCK_EX); lnet_peer_ni_del_locked(lpni); lnet_net_unlock(LNET_LOCK_EX); - return 0; +out: + CDEBUG(D_NET, "peer %s NID %s flags %#x: %d\n", + libcfs_nid2str(primary_nid), libcfs_nid2str(nid), flags, rc); + + return rc; } static void @@ -895,46 +909,27 @@ lnet_peer_get_net_locked(struct lnet_peer *peer, u32 net_id) return NULL; } +/* + * Always returns 0, but it is the last function called from functions + * that do return an int, so returning 0 here allows the compiler to + * do a tail call.
+ */ static int -lnet_peer_setup_hierarchy(struct lnet_peer *lp, struct lnet_peer_ni - *lpni, - lnet_nid_t nid) +lnet_peer_attach_peer_ni(struct lnet_peer *lp, + struct lnet_peer_net *lpn, + struct lnet_peer_ni *lpni, + unsigned int flags) { - struct lnet_peer_net *lpn = NULL; struct lnet_peer_table *ptable; - u32 net_id = LNET_NIDNET(nid); - - /* - * Create the peer_ni, peer_net, and peer if they don't exist - * yet. - */ - if (lp) { - lpn = lnet_peer_get_net_locked(lp, net_id); - } else { - lp = lnet_peer_alloc(nid); - if (!lp) - goto out_enomem; - } - - if (!lpn) { - lpn = lnet_peer_net_alloc(net_id); - if (!lpn) - goto out_maybe_free_lp; - } - - if (!lpni) { - lpni = lnet_peer_ni_alloc(nid); - if (!lpni) - goto out_maybe_free_lpn; - } /* Install the new peer_ni */ lnet_net_lock(LNET_LOCK_EX); /* Add peer_ni to global peer table hash, if necessary. */ if (list_empty(&lpni->lpni_hashlist)) { + int hash = lnet_nid2peerhash(lpni->lpni_nid); + ptable = the_lnet.ln_peer_tables[lpni->lpni_cpt]; - list_add_tail(&lpni->lpni_hashlist, - &ptable->pt_hash[lnet_nid2peerhash(nid)]); + list_add_tail(&lpni->lpni_hashlist, &ptable->pt_hash[hash]); ptable->pt_version++; atomic_inc(&ptable->pt_number); atomic_inc(&lpni->lpni_refcount); @@ -942,7 +937,7 @@ lnet_peer_setup_hierarchy(struct lnet_peer *lp, struct lnet_peer_ni /* Detach the peer_ni from an existing peer, if necessary. 
*/ if (lpni->lpni_peer_net && lpni->lpni_peer_net->lpn_peer != lp) - lnet_try_destroy_peer_hierarchy_locked(lpni); + lnet_peer_detach_peer_ni(lpni); /* Add peer_ni to peer_net */ lpni->lpni_peer_net = lpn; @@ -957,33 +952,42 @@ lnet_peer_setup_hierarchy(struct lnet_peer *lp, struct lnet_peer_ni /* Add peer to global peer list */ if (list_empty(&lp->lp_on_lnet_peer_list)) list_add_tail(&lp->lp_on_lnet_peer_list, &the_lnet.ln_peers); + + /* Update peer state */ + spin_lock(&lp->lp_lock); + if (flags & LNET_PEER_CONFIGURED) { + if (!(lp->lp_state & LNET_PEER_CONFIGURED)) + lp->lp_state |= LNET_PEER_CONFIGURED; + } + if (flags & LNET_PEER_MULTI_RAIL) { + if (!(lp->lp_state & LNET_PEER_MULTI_RAIL)) { + lp->lp_state |= LNET_PEER_MULTI_RAIL; + lnet_peer_clr_non_mr_pref_nids(lp); + } + } + spin_unlock(&lp->lp_lock); + lnet_net_unlock(LNET_LOCK_EX); - return 0; + CDEBUG(D_NET, "peer %s NID %s flags %#x\n", + libcfs_nid2str(lp->lp_primary_nid), + libcfs_nid2str(lpni->lpni_nid), flags); -out_maybe_free_lpn: - if (list_empty(&lpn->lpn_on_peer_list)) - kfree(lpn); -out_maybe_free_lp: - if (list_empty(&lp->lp_on_lnet_peer_list)) - kfree(lp); -out_enomem: - return -ENOMEM; + return 0; } /* * Create a new peer, with nid as its primary nid. * - * It is not an error if the peer already exists, provided that the - * given nid is the primary NID. - * * Call with the lnet_api_mutex held. */ static int -lnet_peer_add(lnet_nid_t nid, bool mr) +lnet_peer_add(lnet_nid_t nid, unsigned int flags) { struct lnet_peer *lp; + struct lnet_peer_net *lpn; struct lnet_peer_ni *lpni; + int rc = 0; LASSERT(nid != LNET_NID_ANY); @@ -992,82 +996,153 @@ lnet_peer_add(lnet_nid_t nid, bool mr) * lnet_api_mutex is held. */ lpni = lnet_find_peer_ni_locked(nid); - if (!lpni) { - int rc = lnet_peer_setup_hierarchy(NULL, NULL, nid); - if (rc != 0) - return rc; - lpni = lnet_find_peer_ni_locked(nid); - LASSERT(lpni); + if (lpni) { + /* A peer with this NID already exists. 
*/ + lp = lpni->lpni_peer_net->lpn_peer; + lnet_peer_ni_decref_locked(lpni); + /* + * This is an error if the peer was configured and the + * primary NID differs or an attempt is made to change + * the Multi-Rail flag. Otherwise the assumption is + * that an existing peer is being modified. + */ + if (lp->lp_state & LNET_PEER_CONFIGURED) { + if (lp->lp_primary_nid != nid) + rc = -EEXIST; + else if ((lp->lp_state ^ flags) & LNET_PEER_MULTI_RAIL) + rc = -EPERM; + goto out; + } + /* Delete and recreate as a configured peer. */ + lnet_peer_del(lp); } - lp = lpni->lpni_peer_net->lpn_peer; - lnet_peer_ni_decref_locked(lpni); - /* A found peer must have this primary NID */ - if (lp->lp_primary_nid != nid) - return -EEXIST; + /* Create peer, peer_net, and peer_ni. */ + rc = -ENOMEM; + lp = lnet_peer_alloc(nid); + if (!lp) + goto out; + lpn = lnet_peer_net_alloc(LNET_NIDNET(nid)); + if (!lpn) + goto out_free_lp; + lpni = lnet_peer_ni_alloc(nid); + if (!lpni) + goto out_free_lpn; - /* - * If we found an lpni that is not a multi-rail, which could occur - * if lpni is already created as a non-mr lpni or we just created - * it, then make sure you indicate that this lpni is a primary mr - * capable peer. - * - * TODO: update flags if necessary - */ - spin_lock(&lp->lp_lock); - if (mr && !(lp->lp_state & LNET_PEER_MULTI_RAIL)) { - lp->lp_state |= LNET_PEER_MULTI_RAIL; - lnet_peer_clr_non_mr_pref_nids(lp); - } else if (!mr && (lp->lp_state & LNET_PEER_MULTI_RAIL)) { - /* The mr state is sticky. */ - CDEBUG(D_NET, "Cannot clear multi-rail flag from peer %s\n", - libcfs_nid2str(nid)); - } - spin_unlock(&lp->lp_lock); + return lnet_peer_attach_peer_ni(lp, lpn, lpni, flags); - return 0; +out_free_lpn: + kfree(lpn); +out_free_lp: + kfree(lp); +out: + CDEBUG(D_NET, "peer %s NID flags %#x: %d\n", + libcfs_nid2str(nid), flags, rc); + return rc; } +/* + * Add a NID to a peer. Call with ln_api_mutex held. + * + * Error codes: + * -EPERM: Non-DLC addition to a DLC-configured peer. 
+ * -EEXIST: The NID was configured by DLC for a different peer. + * -ENOMEM: Out of memory. + * -ENOTUNIQ: Adding a second peer NID on a single network on a + * non-multi-rail peer. + */ static int -lnet_peer_add_nid(struct lnet_peer *lp, lnet_nid_t nid, bool mr) +lnet_peer_add_nid(struct lnet_peer *lp, lnet_nid_t nid, unsigned int flags) { + struct lnet_peer_net *lpn; struct lnet_peer_ni *lpni; + int rc = 0; LASSERT(lp); LASSERT(nid != LNET_NID_ANY); - spin_lock(&lp->lp_lock); - if (!mr && !(lp->lp_state & LNET_PEER_MULTI_RAIL)) { - spin_unlock(&lp->lp_lock); - CERROR("Cannot add nid %s to non-multi-rail peer %s\n", - libcfs_nid2str(nid), - libcfs_nid2str(lp->lp_primary_nid)); - return -EPERM; + /* A configured peer can only be updated through configuration. */ + if (!(flags & LNET_PEER_CONFIGURED)) { + if (lp->lp_state & LNET_PEER_CONFIGURED) { + rc = -EPERM; + goto out; + } } - if (!(lp->lp_state & LNET_PEER_MULTI_RAIL)) { - lp->lp_state |= LNET_PEER_MULTI_RAIL; - lnet_peer_clr_non_mr_pref_nids(lp); + /* + * The MULTI_RAIL flag can be set but not cleared, because + * that would leave the peer struct in an invalid state. + */ + if (flags & LNET_PEER_MULTI_RAIL) { + spin_lock(&lp->lp_lock); + if (!(lp->lp_state & LNET_PEER_MULTI_RAIL)) { + lp->lp_state |= LNET_PEER_MULTI_RAIL; + lnet_peer_clr_non_mr_pref_nids(lp); + } + spin_unlock(&lp->lp_lock); + } else if (lp->lp_state & LNET_PEER_MULTI_RAIL) { + rc = -EPERM; + goto out; } - spin_unlock(&lp->lp_lock); lpni = lnet_find_peer_ni_locked(nid); - if (!lpni) - return lnet_peer_setup_hierarchy(lp, NULL, nid); + if (lpni) { + /* + * A peer_ni already exists. This is only a problem if + * it is not connected to this peer and was configured + * by DLC. + */ + lnet_peer_ni_decref_locked(lpni); + if (lpni->lpni_peer_net->lpn_peer == lp) + goto out; + if (lnet_peer_ni_is_configured(lpni)) { + rc = -EEXIST; + goto out; + } + /* If this is the primary NID, destroy the peer. 
*/ + if (lnet_peer_ni_is_primary(lpni)) { + lnet_peer_del(lpni->lpni_peer_net->lpn_peer); + lpni = lnet_peer_ni_alloc(nid); + if (!lpni) { + rc = -ENOMEM; + goto out; + } + } + } else { + lpni = lnet_peer_ni_alloc(nid); + if (!lpni) { + rc = -ENOMEM; + goto out; + } + } - if (lpni->lpni_peer_net->lpn_peer != lp) { - struct lnet_peer *lp2 = lpni->lpni_peer_net->lpn_peer; - CERROR("Cannot add NID %s owned by peer %s to peer %s\n", - libcfs_nid2str(lpni->lpni_nid), - libcfs_nid2str(lp2->lp_primary_nid), - libcfs_nid2str(lp->lp_primary_nid)); - return -EEXIST; + /* + * Get the peer_net. Check that we're not adding a second + * peer_ni on a peer_net of a non-multi-rail peer. + */ + lpn = lnet_peer_get_net_locked(lp, LNET_NIDNET(nid)); + if (!lpn) { + lpn = lnet_peer_net_alloc(LNET_NIDNET(nid)); + if (!lpn) { + rc = -ENOMEM; + goto out_free_lpni; + } + } else if (!(lp->lp_state & LNET_PEER_MULTI_RAIL)) { + rc = -ENOTUNIQ; + goto out_free_lpni; } - CDEBUG(D_NET, "NID %s is already owned by peer %s\n", - libcfs_nid2str(lpni->lpni_nid), - libcfs_nid2str(lp->lp_primary_nid)); - return 0; + return lnet_peer_attach_peer_ni(lp, lpn, lpni, flags); + +out_free_lpni: + /* If the peer_ni was allocated above its peer_net pointer is NULL */ + if (!lpni->lpni_peer_net) + kfree(lpni); +out: + CDEBUG(D_NET, "peer %s NID %s flags %#x: %d\n", + libcfs_nid2str(lp->lp_primary_nid), libcfs_nid2str(nid), + flags, rc); + return rc; } /* @@ -1076,25 +1151,53 @@ lnet_peer_add_nid(struct lnet_peer *lp, lnet_nid_t nid, bool mr) static int lnet_peer_ni_traffic_add(lnet_nid_t nid, lnet_nid_t pref) { + struct lnet_peer *lp; + struct lnet_peer_net *lpn; struct lnet_peer_ni *lpni; - int rc; + unsigned int flags = 0; + int rc = 0; - if (nid == LNET_NID_ANY) - return -EINVAL; + if (nid == LNET_NID_ANY) { + rc = -EINVAL; + goto out; + } /* lnet_net_lock is not needed here because ln_api_lock is held */ lpni = lnet_find_peer_ni_locked(nid); - if (!lpni) { - rc = lnet_peer_setup_hierarchy(NULL, NULL, nid); 
- if (rc) - return rc; - lpni = lnet_find_peer_ni_locked(nid); + if (lpni) { + /* + * We must have raced with another thread. Since we + * know next to nothing about a peer_ni created by + * traffic, we just assume everything is ok and + * return. + */ + lnet_peer_ni_decref_locked(lpni); + goto out; } + + /* Create peer, peer_net, and peer_ni. */ + rc = -ENOMEM; + lp = lnet_peer_alloc(nid); + if (!lp) + goto out; + lpn = lnet_peer_net_alloc(LNET_NIDNET(nid)); + if (!lpn) + goto out_free_lp; + lpni = lnet_peer_ni_alloc(nid); + if (!lpni) + goto out_free_lpn; if (pref != LNET_NID_ANY) lnet_peer_ni_set_non_mr_pref_nid(lpni, pref); - lnet_peer_ni_decref_locked(lpni); - return 0; + return lnet_peer_attach_peer_ni(lp, lpn, lpni, flags); + +out_free_lpn: + kfree(lpn); +out_free_lp: + kfree(lp); +out: + CDEBUG(D_NET, "peer %s: %d\n", libcfs_nid2str(nid), rc); + return rc; } /* @@ -1114,17 +1217,22 @@ lnet_add_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid, bool mr) { struct lnet_peer *lp = NULL; struct lnet_peer_ni *lpni; + unsigned int flags; /* The prim_nid must always be specified */ if (prim_nid == LNET_NID_ANY) return -EINVAL; + flags = LNET_PEER_CONFIGURED; + if (mr) + flags |= LNET_PEER_MULTI_RAIL; + /* * If nid isn't specified, we must create a new peer with * prim_nid as its primary nid. */ if (nid == LNET_NID_ANY) - return lnet_peer_add(prim_nid, mr); + return lnet_peer_add(prim_nid, flags); /* Look up the prim_nid, which must exist. */ lpni = lnet_find_peer_ni_locked(prim_nid); @@ -1133,6 +1241,14 @@ lnet_add_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid, bool mr) lnet_peer_ni_decref_locked(lpni); lp = lpni->lpni_peer_net->lpn_peer; + /* Peer must have been configured. 
*/ + if (!(lp->lp_state & LNET_PEER_CONFIGURED)) { + CDEBUG(D_NET, "peer %s was not configured\n", + libcfs_nid2str(prim_nid)); + return -ENOENT; + } + + /* Primary NID must match */ if (lp->lp_primary_nid != prim_nid) { CDEBUG(D_NET, "prim_nid %s is not primary for peer %s\n", libcfs_nid2str(prim_nid), @@ -1140,7 +1256,14 @@ lnet_add_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid, bool mr) return -ENODEV; } - return lnet_peer_add_nid(lp, nid, mr); + /* Multi-Rail flag must match. */ + if ((lp->lp_state ^ flags) & LNET_PEER_MULTI_RAIL) { + CDEBUG(D_NET, "multi-rail state mismatch for peer %s\n", + libcfs_nid2str(prim_nid)); + return -EPERM; + } + + return lnet_peer_add_nid(lp, nid, flags); } /* @@ -1159,6 +1282,7 @@ lnet_del_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid) { struct lnet_peer *lp; struct lnet_peer_ni *lpni; + unsigned int flags; if (prim_nid == LNET_NID_ANY) return -EINVAL; @@ -1179,7 +1303,11 @@ lnet_del_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid) if (nid == LNET_NID_ANY || nid == lp->lp_primary_nid) return lnet_peer_del(lp); - return lnet_peer_del_nid(lp, nid); + flags = LNET_PEER_CONFIGURED; + if (lp->lp_state & LNET_PEER_MULTI_RAIL) + flags |= LNET_PEER_MULTI_RAIL; + + return lnet_peer_del_nid(lp, nid, flags); } void

From patchwork Sun Oct 7 23:19:38 2018
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 10629819
From: NeilBrown
To: Oleg Drokin, Doug Oucharek, James Simmons, Andreas Dilger
Date: Mon, 08 Oct 2018 10:19:38 +1100
Message-ID: <153895437808.16383.1725584261522697360.stgit@noble>
In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble>
References: <153895417139.16383.3791701638653772865.stgit@noble>
Subject: [lustre-devel] [PATCH 14/24] lustre: lnet: reference counts on lnet_peer/lnet_peer_net
Cc: Amir Shehata, Olaf Weber, Lustre Development List

From: Olaf Weber

Peer discovery will be keeping track of lnet_peer structures, so there will be references to an lnet_peer independent of the references implied by lnet_peer_ni structures. Manage this by adding explicit reference counts to lnet_peer_net and lnet_peer.

Each lnet_peer_net has a hold on the lnet_peer it links to with its lpn_peer pointer. This hold is only removed when that pointer is assigned a new value or the lnet_peer_net is freed. Just removing an lnet_peer_net from the lp_peer_nets list does not release this hold, it just prevents new lookups of the lnet_peer_net via the lnet_peer.

Each lnet_peer_ni has a hold on the lnet_peer_net it links to with its lpni_peer_net pointer. This hold is only removed when that pointer is assigned a new value or the lnet_peer_ni is freed. Just removing an lnet_peer_ni from the lpn_peer_nis list does not release this hold, it just prevents new lookups of the lnet_peer_ni via the lnet_peer_net.

This ensures that given a lnet_peer_ni *lpni, we can rely on lpni->lpni_peer_net->lpn_peer pointing to a valid lnet_peer.

Keep a count of the total number of lnet_peer_ni attached to an lnet_peer in lp_nnis.

Split the global ln_peers list into per-lnet_peer_table lists. The CPT of the peer table in which the lnet_peer is linked is stored in lp_cpt.
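The chain of holds described above can be illustrated with a small userspace sketch. It rests on stated assumptions: plain int counters stand in for the kernel's atomic_t, there is no locking, and every demo_* name is hypothetical rather than taken from the patch. It models only the lnet_peer_net to lnet_peer link; the lnet_peer_ni to lnet_peer_net link follows the same pattern.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for lnet_peer and lnet_peer_net. */
struct demo_peer {
	int refcount;
};

struct demo_peer_net {
	int refcount;
	struct demo_peer *peer;	/* holds one reference on *peer */
};

/* Counters so that the order of destruction can be observed. */
static int demo_peers_freed;
static int demo_nets_freed;

static void demo_peer_decref(struct demo_peer *lp)
{
	if (--lp->refcount == 0) {
		free(lp);
		demo_peers_freed++;
	}
}

static void demo_peer_net_decref(struct demo_peer_net *lpn)
{
	if (--lpn->refcount == 0) {
		/* Freeing the peer_net is what releases its hold on the peer. */
		demo_peer_decref(lpn->peer);
		free(lpn);
		demo_nets_freed++;
	}
}

/* Create a peer_net that takes a hold on lp. Returns NULL if out of memory. */
static struct demo_peer_net *demo_peer_net_new(struct demo_peer *lp)
{
	struct demo_peer_net *lpn = calloc(1, sizeof(*lpn));

	if (!lpn)
		return NULL;
	lpn->refcount = 1;
	lpn->peer = lp;
	lp->refcount++;
	return lpn;
}
```

With this arrangement a peer that is referenced both by discovery and by a peer_net survives until the later of the two releases, which is the property the commit message relies on when it says lpni->lpni_peer_net->lpn_peer always points to a valid lnet_peer.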
WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25784 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 49 +++-- .../staging/lustre/include/linux/lnet/lib-types.h | 50 ++++- drivers/staging/lustre/lnet/lnet/api-ni.c | 1 drivers/staging/lustre/lnet/lnet/lib-move.c | 8 - drivers/staging/lustre/lnet/lnet/peer.c | 210 ++++++++++++++------ 5 files changed, 227 insertions(+), 91 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 563417510722..aad25eb0011b 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -310,6 +310,36 @@ lnet_handle2me(struct lnet_handle_me *handle) return lh_entry(lh, struct lnet_me, me_lh); } +static inline void +lnet_peer_net_addref_locked(struct lnet_peer_net *lpn) +{ + atomic_inc(&lpn->lpn_refcount); +} + +void lnet_destroy_peer_net_locked(struct lnet_peer_net *lpn); + +static inline void +lnet_peer_net_decref_locked(struct lnet_peer_net *lpn) +{ + if (atomic_dec_and_test(&lpn->lpn_refcount)) + lnet_destroy_peer_net_locked(lpn); +} + +static inline void +lnet_peer_addref_locked(struct lnet_peer *lp) +{ + atomic_inc(&lp->lp_refcount); +} + +void lnet_destroy_peer_locked(struct lnet_peer *lp); + +static inline void +lnet_peer_decref_locked(struct lnet_peer *lp) +{ + if (atomic_dec_and_test(&lp->lp_refcount)) + lnet_destroy_peer_locked(lp); +} + static inline void lnet_peer_ni_addref_locked(struct lnet_peer_ni *lp) { @@ -695,21 +725,6 @@ int lnet_get_peer_ni_info(__u32 peer_index, __u64 *nid, __u32 *peer_rtr_credits, __u32 *peer_min_rtr_credtis, __u32 *peer_tx_qnob); -static inline __u32 -lnet_get_num_peer_nis(struct lnet_peer *peer) -{ - struct lnet_peer_net *lpn; - struct lnet_peer_ni 
*lpni; - __u32 count = 0; - - list_for_each_entry(lpn, &peer->lp_peer_nets, lpn_on_peer_list) - list_for_each_entry(lpni, &lpn->lpn_peer_nis, - lpni_on_peer_net_list) - count++; - - return count; -} - static inline bool lnet_is_peer_ni_healthy_locked(struct lnet_peer_ni *lpni) { @@ -728,7 +743,7 @@ lnet_is_peer_net_healthy_locked(struct lnet_peer_net *peer_net) struct lnet_peer_ni *lpni; list_for_each_entry(lpni, &peer_net->lpn_peer_nis, - lpni_on_peer_net_list) { + lpni_peer_nis) { if (lnet_is_peer_ni_healthy_locked(lpni)) return true; } @@ -741,7 +756,7 @@ lnet_is_peer_healthy_locked(struct lnet_peer *peer) { struct lnet_peer_net *peer_net; - list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_on_peer_list) { + list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) { if (lnet_is_peer_net_healthy_locked(peer_net)) return true; } diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index d1721fd01d93..260619e19bde 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -411,7 +411,8 @@ struct lnet_rc_data { }; struct lnet_peer_ni { - struct list_head lpni_on_peer_net_list; + /* chain on lpn_peer_nis */ + struct list_head lpni_peer_nis; /* chain on remote peer list */ struct list_head lpni_on_remote_peer_ni_list; /* chain on peer hash */ @@ -496,8 +497,8 @@ struct lnet_peer_ni { #define LNET_PEER_NI_NON_MR_PREF BIT(0) struct lnet_peer { - /* chain on global peer list */ - struct list_head lp_on_lnet_peer_list; + /* chain on pt_peer_list */ + struct list_head lp_peer_list; /* list of peer nets */ struct list_head lp_peer_nets; @@ -505,6 +506,15 @@ struct lnet_peer { /* primary NID of the peer */ lnet_nid_t lp_primary_nid; + /* CPT of peer_table */ + int lp_cpt; + + /* number of NIDs on this peer */ + int lp_nnis; + + /* reference count */ + atomic_t lp_refcount; + /* lock protecting peer state flags */ 
spinlock_t lp_lock; @@ -516,8 +526,8 @@ struct lnet_peer { #define LNET_PEER_CONFIGURED BIT(1) struct lnet_peer_net { - /* chain on peer block */ - struct list_head lpn_on_peer_list; + /* chain on lp_peer_nets */ + struct list_head lpn_peer_nets; /* list of peer_nis on this network */ struct list_head lpn_peer_nis; @@ -527,21 +537,45 @@ struct lnet_peer_net { /* Net ID */ __u32 lpn_net_id; + + /* reference count */ + atomic_t lpn_refcount; }; /* peer hash size */ #define LNET_PEER_HASH_BITS 9 #define LNET_PEER_HASH_SIZE (1 << LNET_PEER_HASH_BITS) -/* peer hash table */ +/* + * peer hash table - one per CPT + * + * protected by lnet_net_lock/EX for update + * pt_version + * pt_number + * pt_hash[...] + * pt_peer_list + * pt_peers + * pt_peer_nnids + * protected by pt_zombie_lock: + * pt_zombie_list + * pt_zombies + * + * pt_zombie lock nests inside lnet_net_lock + */ struct lnet_peer_table { /* /proc validity stamp */ int pt_version; /* # peers extant */ atomic_t pt_number; + /* peers */ + struct list_head pt_peer_list; + /* # peers */ + int pt_peers; + /* # NIDS on listed peers */ + int pt_peer_nnids; /* # zombies to go to deathrow (and not there yet) */ int pt_zombies; - /* zombie peers */ + /* zombie peers_ni */ struct list_head pt_zombie_list; /* protect list and count */ spinlock_t pt_zombie_lock; @@ -785,8 +819,6 @@ struct lnet { struct lnet_msg_container **ln_msg_containers; struct lnet_counters **ln_counters; struct lnet_peer_table **ln_peer_tables; - /* list of configured or discovered peers */ - struct list_head ln_peers; /* list of peer nis not on a local network */ struct list_head ln_remote_peer_ni_list; /* failure simulation */ diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index d64ae2939abc..c48bcb8722a0 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -625,7 +625,6 @@ lnet_prepare(lnet_pid_t requested_pid) the_lnet.ln_pid = requested_pid; 
INIT_LIST_HEAD(&the_lnet.ln_test_peers); - INIT_LIST_HEAD(&the_lnet.ln_peers); INIT_LIST_HEAD(&the_lnet.ln_remote_peer_ni_list); INIT_LIST_HEAD(&the_lnet.ln_nets); INIT_LIST_HEAD(&the_lnet.ln_routers); diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index 99d8b22356bb..4c1eef907dc7 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -1388,7 +1388,7 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, peer_net = lnet_peer_get_net_locked( peer, LNET_NIDNET(best_lpni->lpni_nid)); list_for_each_entry(lpni, &peer_net->lpn_peer_nis, - lpni_on_peer_net_list) { + lpni_peer_nis) { if (lpni->lpni_pref_nnids == 0) continue; LASSERT(lpni->lpni_pref_nnids == 1); @@ -1411,7 +1411,7 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, } lpni = list_entry(peer_net->lpn_peer_nis.next, struct lnet_peer_ni, - lpni_on_peer_net_list); + lpni_peer_nis); } /* Set preferred NI if necessary. */ if (lpni->lpni_pref_nnids == 0) @@ -1443,7 +1443,7 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, * then the best route is chosen. If all routes are equal then * they are used in round robin. 
*/ - list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_on_peer_list) { + list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) { if (!lnet_is_peer_net_healthy_locked(peer_net)) continue; @@ -1453,7 +1453,7 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, lpni = list_entry(peer_net->lpn_peer_nis.next, struct lnet_peer_ni, - lpni_on_peer_net_list); + lpni_peer_nis); net_gw = lnet_find_route_locked(NULL, lpni->lpni_nid, diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index 09c1b5516f6b..d7a0a2f3bdd9 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -118,7 +118,7 @@ lnet_peer_ni_alloc(lnet_nid_t nid) INIT_LIST_HEAD(&lpni->lpni_rtrq); INIT_LIST_HEAD(&lpni->lpni_routes); INIT_LIST_HEAD(&lpni->lpni_hashlist); - INIT_LIST_HEAD(&lpni->lpni_on_peer_net_list); + INIT_LIST_HEAD(&lpni->lpni_peer_nis); INIT_LIST_HEAD(&lpni->lpni_on_remote_peer_ni_list); spin_lock_init(&lpni->lpni_lock); @@ -150,7 +150,7 @@ lnet_peer_ni_alloc(lnet_nid_t nid) &the_lnet.ln_remote_peer_ni_list); } - /* TODO: update flags */ + CDEBUG(D_NET, "%p nid %s\n", lpni, libcfs_nid2str(lpni->lpni_nid)); return lpni; } @@ -164,13 +164,32 @@ lnet_peer_net_alloc(u32 net_id) if (!lpn) return NULL; - INIT_LIST_HEAD(&lpn->lpn_on_peer_list); + INIT_LIST_HEAD(&lpn->lpn_peer_nets); INIT_LIST_HEAD(&lpn->lpn_peer_nis); lpn->lpn_net_id = net_id; + CDEBUG(D_NET, "%p net %s\n", lpn, libcfs_net2str(lpn->lpn_net_id)); + return lpn; } +void +lnet_destroy_peer_net_locked(struct lnet_peer_net *lpn) +{ + struct lnet_peer *lp; + + CDEBUG(D_NET, "%p net %s\n", lpn, libcfs_net2str(lpn->lpn_net_id)); + + LASSERT(atomic_read(&lpn->lpn_refcount) == 0); + LASSERT(list_empty(&lpn->lpn_peer_nis)); + LASSERT(list_empty(&lpn->lpn_peer_nets)); + lp = lpn->lpn_peer; + lpn->lpn_peer = NULL; + kfree(lpn); + + lnet_peer_decref_locked(lp); +} + static struct lnet_peer * lnet_peer_alloc(lnet_nid_t nid) { @@ -180,53 
+199,73 @@ lnet_peer_alloc(lnet_nid_t nid) if (!lp) return NULL; - INIT_LIST_HEAD(&lp->lp_on_lnet_peer_list); + INIT_LIST_HEAD(&lp->lp_peer_list); INIT_LIST_HEAD(&lp->lp_peer_nets); spin_lock_init(&lp->lp_lock); lp->lp_primary_nid = nid; + lp->lp_cpt = lnet_nid_cpt_hash(nid, LNET_CPT_NUMBER); - /* TODO: update flags */ + CDEBUG(D_NET, "%p nid %s\n", lp, libcfs_nid2str(lp->lp_primary_nid)); return lp; } +void +lnet_destroy_peer_locked(struct lnet_peer *lp) +{ + CDEBUG(D_NET, "%p nid %s\n", lp, libcfs_nid2str(lp->lp_primary_nid)); + + LASSERT(atomic_read(&lp->lp_refcount) == 0); + LASSERT(list_empty(&lp->lp_peer_nets)); + LASSERT(list_empty(&lp->lp_peer_list)); + + kfree(lp); +} + +/* + * Detach a peer_ni from its peer_net. If this was the last peer_ni on + * that peer_net, detach the peer_net from the peer. + * + * Call with lnet_net_lock/EX held + */ static void -lnet_peer_detach_peer_ni(struct lnet_peer_ni *lpni) +lnet_peer_detach_peer_ni_locked(struct lnet_peer_ni *lpni) { + struct lnet_peer_table *ptable; struct lnet_peer_net *lpn; struct lnet_peer *lp; - /* TODO: could the below situation happen? accessing an already - * destroyed peer? + /* + * Belts and suspenders: gracefully handle teardown of a + * partially connected peer_ni. */ - if (!lpni->lpni_peer_net || - !lpni->lpni_peer_net->lpn_peer) - return; - lpn = lpni->lpni_peer_net; - lp = lpni->lpni_peer_net->lpn_peer; - CDEBUG(D_NET, "peer %s NID %s\n", - libcfs_nid2str(lp->lp_primary_nid), - libcfs_nid2str(lpni->lpni_nid)); - - list_del_init(&lpni->lpni_on_peer_net_list); - lpni->lpni_peer_net = NULL; + list_del_init(&lpni->lpni_peer_nis); + /* + * If there are no lpni's left, we detach lpn from + * lp_peer_nets, so it cannot be found anymore. 
+ */ + if (list_empty(&lpn->lpn_peer_nis)) + list_del_init(&lpn->lpn_peer_nets); - /* if lpn is empty, then remove it from the peer */ - if (list_empty(&lpn->lpn_peer_nis)) { - list_del_init(&lpn->lpn_on_peer_list); - lpn->lpn_peer = NULL; - kfree(lpn); + /* Update peer NID count. */ + lp = lpn->lpn_peer; + ptable = the_lnet.ln_peer_tables[lp->lp_cpt]; + lp->lp_nnis--; + ptable->pt_peer_nnids--; - /* If the peer is empty then remove it from the - * the_lnet.ln_peers. - */ - if (list_empty(&lp->lp_peer_nets)) { - list_del_init(&lp->lp_on_lnet_peer_list); - kfree(lp); - } + /* + * If there are no more peer nets, make the peer unfindable + * via the peer_tables. + */ + if (list_empty(&lp->lp_peer_nets)) { + list_del_init(&lp->lp_peer_list); + ptable->pt_peers--; } + CDEBUG(D_NET, "peer %s NID %s\n", + libcfs_nid2str(lp->lp_primary_nid), + libcfs_nid2str(lpni->lpni_nid)); } /* called with lnet_net_lock LNET_LOCK_EX held */ @@ -268,9 +307,9 @@ lnet_peer_ni_del_locked(struct lnet_peer_ni *lpni) spin_unlock(&ptable->pt_zombie_lock); /* no need to keep this peer_ni on the hierarchy anymore */ - lnet_peer_detach_peer_ni(lpni); + lnet_peer_detach_peer_ni_locked(lpni); - /* decrement reference on peer_ni */ + /* remove hashlist reference on peer_ni */ lnet_peer_ni_decref_locked(lpni); return 0; @@ -319,6 +358,8 @@ lnet_peer_tables_create(void) spin_lock_init(&ptable->pt_zombie_lock); INIT_LIST_HEAD(&ptable->pt_zombie_list); + INIT_LIST_HEAD(&ptable->pt_peer_list); + for (j = 0; j < LNET_PEER_HASH_SIZE; j++) INIT_LIST_HEAD(&hash[j]); ptable->pt_hash = hash; /* sign of initialization */ @@ -394,7 +435,7 @@ lnet_peer_del_nid(struct lnet_peer *lp, lnet_nid_t nid, unsigned int flags) * This function only allows deletion of the primary NID if it * is the only NID. 
*/ - if (nid == lp->lp_primary_nid && lnet_get_num_peer_nis(lp) != 1) { + if (nid == lp->lp_primary_nid && lp->lp_nnis != 1) { rc = -EBUSY; goto out; } @@ -560,15 +601,34 @@ struct lnet_peer_ni * lnet_get_peer_ni_idx_locked(int idx, struct lnet_peer_net **lpn, struct lnet_peer **lp) { + struct lnet_peer_table *ptable; struct lnet_peer_ni *lpni; + int lncpt; + int cpt; + + lncpt = cfs_percpt_number(the_lnet.ln_peer_tables); - list_for_each_entry((*lp), &the_lnet.ln_peers, lp_on_lnet_peer_list) { + for (cpt = 0; cpt < lncpt; cpt++) { + ptable = the_lnet.ln_peer_tables[cpt]; + if (ptable->pt_peer_nnids > idx) + break; + idx -= ptable->pt_peer_nnids; + } + if (cpt >= lncpt) + return NULL; + + list_for_each_entry((*lp), &ptable->pt_peer_list, lp_peer_list) { + if ((*lp)->lp_nnis <= idx) { + idx -= (*lp)->lp_nnis; + continue; + } list_for_each_entry((*lpn), &((*lp)->lp_peer_nets), - lpn_on_peer_list) { + lpn_peer_nets) { list_for_each_entry(lpni, &((*lpn)->lpn_peer_nis), - lpni_on_peer_net_list) + lpni_peer_nis) { if (idx-- == 0) return lpni; + } } } @@ -584,18 +644,21 @@ lnet_get_next_peer_ni_locked(struct lnet_peer *peer, struct lnet_peer_net *net = peer_net; if (!prev) { - if (!net) + if (!net) { + if (list_empty(&peer->lp_peer_nets)) + return NULL; + net = list_entry(peer->lp_peer_nets.next, struct lnet_peer_net, - lpn_on_peer_list); + lpn_peer_nets); + } lpni = list_entry(net->lpn_peer_nis.next, struct lnet_peer_ni, - lpni_on_peer_net_list); + lpni_peer_nis); return lpni; } - if (prev->lpni_on_peer_net_list.next == - &prev->lpni_peer_net->lpn_peer_nis) { + if (prev->lpni_peer_nis.next == &prev->lpni_peer_net->lpn_peer_nis) { /* * if you reached the end of the peer ni list and the peer * net is specified then there are no more peer nis in that @@ -608,25 +671,25 @@ lnet_get_next_peer_ni_locked(struct lnet_peer *peer, * we reached the end of this net ni list. 
move to the * next net */ - if (prev->lpni_peer_net->lpn_on_peer_list.next == + if (prev->lpni_peer_net->lpn_peer_nets.next == &peer->lp_peer_nets) /* no more nets and no more NIs. */ return NULL; /* get the next net */ - net = list_entry(prev->lpni_peer_net->lpn_on_peer_list.next, + net = list_entry(prev->lpni_peer_net->lpn_peer_nets.next, struct lnet_peer_net, - lpn_on_peer_list); + lpn_peer_nets); /* get the ni on it */ lpni = list_entry(net->lpn_peer_nis.next, struct lnet_peer_ni, - lpni_on_peer_net_list); + lpni_peer_nis); return lpni; } /* there are more nis left */ - lpni = list_entry(prev->lpni_on_peer_net_list.next, - struct lnet_peer_ni, lpni_on_peer_net_list); + lpni = list_entry(prev->lpni_peer_nis.next, + struct lnet_peer_ni, lpni_peer_nis); return lpni; } @@ -902,7 +965,7 @@ struct lnet_peer_net * lnet_peer_get_net_locked(struct lnet_peer *peer, u32 net_id) { struct lnet_peer_net *peer_net; - list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_on_peer_list) { + list_for_each_entry(peer_net, &peer->lp_peer_nets, lpn_peer_nets) { if (peer_net->lpn_net_id == net_id) return peer_net; } @@ -910,15 +973,20 @@ lnet_peer_get_net_locked(struct lnet_peer *peer, u32 net_id) } /* - * Always returns 0, but it the last function called from functions + * Attach a peer_ni to a peer_net and peer. This function assumes + * peer_ni is not already attached to the peer_net/peer. The peer_ni + * may be attached to a different peer, in which case it will be + * properly detached first. The whole operation is done atomically. + * + * Always returns 0. This is the last function called from functions * that do return an int, so returning 0 here allows the compiler to * do a tail call. 
*/ static int lnet_peer_attach_peer_ni(struct lnet_peer *lp, - struct lnet_peer_net *lpn, - struct lnet_peer_ni *lpni, - unsigned int flags) + struct lnet_peer_net *lpn, + struct lnet_peer_ni *lpni, + unsigned int flags) { struct lnet_peer_table *ptable; @@ -932,26 +1000,38 @@ lnet_peer_attach_peer_ni(struct lnet_peer *lp, list_add_tail(&lpni->lpni_hashlist, &ptable->pt_hash[hash]); ptable->pt_version++; atomic_inc(&ptable->pt_number); + /* This is the 1st refcount on lpni. */ atomic_inc(&lpni->lpni_refcount); } /* Detach the peer_ni from an existing peer, if necessary. */ - if (lpni->lpni_peer_net && lpni->lpni_peer_net->lpn_peer != lp) - lnet_peer_detach_peer_ni(lpni); + if (lpni->lpni_peer_net) { + LASSERT(lpni->lpni_peer_net != lpn); + LASSERT(lpni->lpni_peer_net->lpn_peer != lp); + lnet_peer_detach_peer_ni_locked(lpni); + lnet_peer_net_decref_locked(lpni->lpni_peer_net); + lpni->lpni_peer_net = NULL; + } /* Add peer_ni to peer_net */ lpni->lpni_peer_net = lpn; - list_add_tail(&lpni->lpni_on_peer_net_list, &lpn->lpn_peer_nis); + list_add_tail(&lpni->lpni_peer_nis, &lpn->lpn_peer_nis); + lnet_peer_net_addref_locked(lpn); /* Add peer_net to peer */ if (!lpn->lpn_peer) { lpn->lpn_peer = lp; - list_add_tail(&lpn->lpn_on_peer_list, &lp->lp_peer_nets); + list_add_tail(&lpn->lpn_peer_nets, &lp->lp_peer_nets); + lnet_peer_addref_locked(lp); + } + + /* Add peer to global peer list, if necessary */ + ptable = the_lnet.ln_peer_tables[lp->lp_cpt]; + if (list_empty(&lp->lp_peer_list)) { + list_add_tail(&lp->lp_peer_list, &ptable->pt_peer_list); + ptable->pt_peers++; } - /* Add peer to global peer list */ - if (list_empty(&lp->lp_on_lnet_peer_list)) - list_add_tail(&lp->lp_on_lnet_peer_list, &the_lnet.ln_peers); /* Update peer state */ spin_lock(&lp->lp_lock); @@ -967,6 +1047,8 @@ lnet_peer_attach_peer_ni(struct lnet_peer *lp, } spin_unlock(&lp->lp_lock); + lp->lp_nnis++; + the_lnet.ln_peer_tables[lp->lp_cpt]->pt_peer_nnids++; lnet_net_unlock(LNET_LOCK_EX); CDEBUG(D_NET, 
"peer %s NID %s flags %#x\n", @@ -1314,12 +1396,17 @@ void lnet_destroy_peer_ni_locked(struct lnet_peer_ni *lpni) { struct lnet_peer_table *ptable; + struct lnet_peer_net *lpn; + + CDEBUG(D_NET, "%p nid %s\n", lpni, libcfs_nid2str(lpni->lpni_nid)); LASSERT(atomic_read(&lpni->lpni_refcount) == 0); LASSERT(lpni->lpni_rtr_refcount == 0); LASSERT(list_empty(&lpni->lpni_txq)); LASSERT(lpni->lpni_txqnob == 0); + lpn = lpni->lpni_peer_net; + lpni->lpni_peer_net = NULL; lpni->lpni_net = NULL; /* remove the peer ni from the zombie list */ @@ -1332,6 +1419,8 @@ lnet_destroy_peer_ni_locked(struct lnet_peer_ni *lpni) if (lpni->lpni_pref_nnids > 1) kfree(lpni->lpni_pref.nids); kfree(lpni); + + lnet_peer_net_decref_locked(lpn); } struct lnet_peer_ni * @@ -1518,6 +1607,7 @@ lnet_get_peer_ni_info(__u32 peer_index, __u64 *nid, return found ? 0 : -ENOENT; } +/* ln_api_mutex is held, which keeps the peer list stable */ int lnet_get_peer_info(__u32 idx, lnet_nid_t *primary_nid, lnet_nid_t *nid, bool *mr, struct lnet_peer_ni_credit_info __user *peer_ni_info, From patchwork Sun Oct 7 23:19:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629821 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 761B3112B for ; Sun, 7 Oct 2018 23:31:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6648128CBF for ; Sun, 7 Oct 2018 23:31:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 59AE128CC8; Sun, 7 Oct 2018 23:31:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 047DC28CBF for ; Sun, 7 Oct 2018 23:31:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9D0838617D2; Sun, 7 Oct 2018 16:31:42 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7178821F502 for ; Sun, 7 Oct 2018 16:31:40 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 876A9AE17; Sun, 7 Oct 2018 23:31:39 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:38 +1100 Message-ID: <153895437812.16383.7373974293282162856.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 15/24] lustre: lnet: add msg_type to lnet_event X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber Add a msg_type field to the lnet_event structure. This makes it possible for an event handler to tell whether LNET_EVENT_SEND corresponds to a GET or a PUT message. 
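To illustrate the intended use, here is a minimal sketch of a handler that consults the new field. The enum values and the names sketch_event and sketch_classify() are hypothetical stand-ins so that the fragment compiles on its own; the real constants and struct lnet_event come from the LNet headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the LNet event kinds and message types. */
enum sketch_event_kind { SKETCH_EVENT_SEND, SKETCH_EVENT_REPLY };
enum sketch_msg_type   { SKETCH_MSG_PUT, SKETCH_MSG_GET };

/* What the handler decided the event means. */
enum sketch_action { SKETCH_IGNORE, SKETCH_GET_SENT, SKETCH_PUT_SENT };

/* Reduced event: only the fields this sketch looks at. */
struct sketch_event {
	enum sketch_event_kind type;
	uint32_t msg_type;
};

/* Without msg_type a SEND event looks the same for GET and PUT;
 * with it, the handler can branch on the message that was sent. */
static enum sketch_action sketch_classify(const struct sketch_event *ev)
{
	assert(ev != NULL);

	if (ev->type != SKETCH_EVENT_SEND)
		return SKETCH_IGNORE;

	switch (ev->msg_type) {
	case SKETCH_MSG_GET:
		return SKETCH_GET_SENT;
	case SKETCH_MSG_PUT:
		return SKETCH_PUT_SENT;
	default:
		return SKETCH_IGNORE;
	}
}
```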
WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25785 Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons Reviewed-by: James Simmons --- .../lustre/include/uapi/linux/lnet/lnet-types.h | 5 +++++ drivers/staging/lustre/lnet/lnet/lib-msg.c | 1 + 2 files changed, 6 insertions(+) diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h index e0e4fd259795..1ecf18e4a278 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h @@ -650,6 +650,11 @@ struct lnet_event { * \see LNetPut */ __u64 hdr_data; + /** + * The message type, to ensure a handler for LNET_EVENT_SEND can + * distinguish between LNET_MSG_GET and LNET_MSG_PUT. + */ + __u32 msg_type; /** * Indicates the completion status of the operation. It's 0 for * successful operations, otherwise it's an error code. 
diff --git a/drivers/staging/lustre/lnet/lnet/lib-msg.c b/drivers/staging/lustre/lnet/lnet/lib-msg.c index 1817e54a16a5..db13d01d366f 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-msg.c +++ b/drivers/staging/lustre/lnet/lnet/lib-msg.c @@ -63,6 +63,7 @@ lnet_build_msg_event(struct lnet_msg *msg, enum lnet_event_kind ev_type) LASSERT(!msg->msg_routing); ev->type = ev_type; + ev->msg_type = msg->msg_type; if (ev_type == LNET_EVENT_SEND) { /* event for active message */ From patchwork Sun Oct 7 23:19:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629823 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3697214DB for ; Sun, 7 Oct 2018 23:31:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 207CE28CBF for ; Sun, 7 Oct 2018 23:31:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1286728CC8; Sun, 7 Oct 2018 23:31:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id E110F28CBF for ; Sun, 7 Oct 2018 23:31:49 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AD6C9861806; Sun, 7 Oct 2018 16:31:49 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: 
lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7D0BC21F502 for ; Sun, 7 Oct 2018 16:31:48 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 78E31ADF7; Sun, 7 Oct 2018 23:31:47 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:38 +1100 Message-ID: <153895437816.16383.10343171262123774566.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 16/24] lustre: lnet: add discovery thread X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber Add the discovery thread, which will be used to handle peer discovery. This change adds the thread and the infrastructure that starts and stops it. The thread itself does trivial work. Peer Discovery gets its own event queue (ln_dc_eqh), a queue for peers that are to be discovered (ln_dc_request), a queue for peers waiting for an event (ln_dc_working), a wait queue head so the thread can sleep (ln_dc_waitq), and start/stop state (ln_dc_state). Peer discovery is started from lnet_select_pathway(), for GET and PUT messages not sent to the LNET_RESERVED_PORTAL. This criterion means that discovery will not be triggered by the messages used in discovery, and neither will an LNet ping trigger it. 
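The start/stop state mentioned above can be pictured as a three-state machine. The following is a single-threaded sketch only: the three state values mirror the LNET_DC_STATE_* defines added by this patch, but the function names, the -EALREADY return and the transition rules are assumptions made for illustration, and the real code would use a kernel thread and wait queues.

```c
#include <assert.h>
#include <errno.h>

/* Mirrors the three discovery states; names shortened for the sketch. */
#define DC_STATE_SHUTDOWN 0	/* not started */
#define DC_STATE_RUNNING  1	/* started up OK */
#define DC_STATE_STOPPING 2	/* telling thread to stop */

static int dc_state = DC_STATE_SHUTDOWN;

/* Assumed rule: starting is only valid from SHUTDOWN. */
static int dc_start(void)
{
	if (dc_state != DC_STATE_SHUTDOWN)
		return -EALREADY;
	dc_state = DC_STATE_RUNNING;
	return 0;
}

/* Ask the thread to stop; a no-op unless it is running. */
static void dc_request_stop(void)
{
	if (dc_state == DC_STATE_RUNNING)
		dc_state = DC_STATE_STOPPING;
}

/* One pass of the thread loop. Returns 1 to keep looping, 0 once the
 * thread has acknowledged the stop request and marked itself shut down. */
static int dc_thread_step(void)
{
	if (dc_state == DC_STATE_STOPPING) {
		dc_state = DC_STATE_SHUTDOWN;
		return 0;
	}
	return dc_state == DC_STATE_RUNNING;
}
```

Keeping STOPPING distinct from SHUTDOWN lets the stopping side tell "stop requested" apart from "thread has actually exited", which is what makes a later restart safe in this model.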
WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/25786 Reviewed-by: Olaf Weber Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 6 .../staging/lustre/include/linux/lnet/lib-types.h | 71 ++++ drivers/staging/lustre/lnet/lnet/api-ni.c | 31 ++ drivers/staging/lustre/lnet/lnet/lib-move.c | 45 ++- drivers/staging/lustre/lnet/lnet/peer.c | 325 ++++++++++++++++++++ 5 files changed, 468 insertions(+), 10 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index aad25eb0011b..848d622911a4 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -438,6 +438,7 @@ bool lnet_is_ni_healthy_locked(struct lnet_ni *ni); struct lnet_net *lnet_get_net_locked(u32 net_id); extern unsigned int lnet_numa_range; +extern unsigned int lnet_peer_discovery_disabled; extern int portal_rotor; int lnet_lib_init(void); @@ -704,6 +705,9 @@ struct lnet_peer_ni *lnet_nid2peerni_ex(lnet_nid_t nid, int cpt); struct lnet_peer_ni *lnet_find_peer_ni_locked(lnet_nid_t nid); void lnet_peer_net_added(struct lnet_net *net); lnet_nid_t lnet_peer_primary_nid_locked(lnet_nid_t nid); +int lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt); +int lnet_peer_discovery_start(void); +void lnet_peer_discovery_stop(void); void lnet_peer_tables_cleanup(struct lnet_net *net); void lnet_peer_uninit(void); int lnet_peer_tables_create(void); @@ -791,4 +795,6 @@ lnet_peer_ni_is_primary(struct lnet_peer_ni *lpni) return lpni->lpni_nid == lpni->lpni_peer_net->lpn_peer->lp_primary_nid; } +bool lnet_peer_is_uptodate(struct lnet_peer *lp); + #endif diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index 260619e19bde..6394a3af50b7 100644 --- 
a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -520,10 +520,61 @@ struct lnet_peer { /* peer state flags */ unsigned int lp_state; + + /* link on discovery-related lists */ + struct list_head lp_dc_list; + + /* tasks waiting on discovery of this peer */ + wait_queue_head_t lp_dc_waitq; }; -#define LNET_PEER_MULTI_RAIL BIT(0) -#define LNET_PEER_CONFIGURED BIT(1) +/* + * The status flags in lp_state. Their semantics have been chosen so that + * lp_state can be zero-initialized. + * + * A peer is marked MULTI_RAIL in two cases: it was configured using DLC + * as multi-rail aware, or the LNET_PING_FEAT_MULTI_RAIL bit was set. + * + * A peer is marked NO_DISCOVERY if the LNET_PING_FEAT_DISCOVERY bit was + * NOT set when the peer was pinged by discovery. + */ +#define LNET_PEER_MULTI_RAIL BIT(0) /* Multi-rail aware */ +#define LNET_PEER_NO_DISCOVERY BIT(1) /* Peer disabled discovery */ +/* + * A peer is marked CONFIGURED if it was configured by DLC. + * + * In addition, a peer is marked DISCOVERED if it has fully passed + * through Peer Discovery. + * + * When Peer Discovery is disabled, the discovery thread will mark + * peers REDISCOVER to indicate that they should be re-examined if + * discovery is (re)enabled on the node. + * + * A peer that was created as the result of inbound traffic will not + * be marked at all. + */ +#define LNET_PEER_CONFIGURED BIT(2) /* Configured via DLC */ +#define LNET_PEER_DISCOVERED BIT(3) /* Peer was discovered */ +#define LNET_PEER_REDISCOVER BIT(4) /* Discovery was disabled */ +/* + * A peer is marked DISCOVERING when discovery is in progress. + * The other flags below correspond to stages of discovery.
+ */ +#define LNET_PEER_DISCOVERING BIT(5) /* Discovering */ +#define LNET_PEER_DATA_PRESENT BIT(6) /* Remote peer data present */ +#define LNET_PEER_NIDS_UPTODATE BIT(7) /* Remote peer info uptodate */ +#define LNET_PEER_PING_SENT BIT(8) /* Waiting for REPLY to Ping */ +#define LNET_PEER_PUSH_SENT BIT(9) /* Waiting for ACK of Push */ +#define LNET_PEER_PING_FAILED BIT(10) /* Ping send failure */ +#define LNET_PEER_PUSH_FAILED BIT(11) /* Push send failure */ +/* + * A ping can be forced as a way to fix up state, or as a manual + * intervention by an admin. + * A push can be forced in circumstances that would normally not + * allow for one to happen. + */ +#define LNET_PEER_FORCE_PING BIT(12) /* Forced Ping */ +#define LNET_PEER_FORCE_PUSH BIT(13) /* Forced Push */ struct lnet_peer_net { /* chain on lp_peer_nets */ @@ -775,6 +826,11 @@ struct lnet_msg_container { void **msc_finalizers; }; +/* Peer Discovery states */ +#define LNET_DC_STATE_SHUTDOWN 0 /* not started */ +#define LNET_DC_STATE_RUNNING 1 /* started up OK */ +#define LNET_DC_STATE_STOPPING 2 /* telling thread to stop */ + /* Router Checker states */ enum lnet_rc_state { LNET_RC_STATE_SHUTDOWN, /* not started */ @@ -856,6 +912,17 @@ struct lnet { struct lnet_ping_buffer *ln_ping_target; atomic_t ln_ping_target_seqno; + /* discovery event queue handle */ + struct lnet_handle_eq ln_dc_eqh; + /* discovery requests */ + struct list_head ln_dc_request; + /* discovery working list */ + struct list_head ln_dc_working; + /* discovery thread wait queue */ + wait_queue_head_t ln_dc_waitq; + /* discovery startup/shutdown state */ + int ln_dc_state; + /* router checker startup/shutdown state */ enum lnet_rc_state ln_rc_state; /* router checker's event queue */ diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index c48bcb8722a0..dccfd5bcc459 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -78,6 +78,13 @@ 
module_param_call(lnet_interfaces_max, intf_max_set, param_get_int, MODULE_PARM_DESC(lnet_interfaces_max, "Maximum number of interfaces in a node."); +unsigned int lnet_peer_discovery_disabled; +static int discovery_set(const char *val, const struct kernel_param *kp); +module_param_call(lnet_peer_discovery_disabled, discovery_set, param_get_int, + &lnet_peer_discovery_disabled, 0644); +MODULE_PARM_DESC(lnet_peer_discovery_disabled, + "Set to 1 to disable peer discovery on this node."); + /* * This sequence number keeps track of how many times DLC was used to * update the local NIs. It is incremented when a NI is added or @@ -90,6 +97,23 @@ static atomic_t lnet_dlc_seq_no = ATOMIC_INIT(0); static int lnet_ping(struct lnet_process_id id, signed long timeout, struct lnet_process_id __user *ids, int n_ids); +static int +discovery_set(const char *val, const struct kernel_param *kp) +{ + int rc; + unsigned long value; + + rc = kstrtoul(val, 0, &value); + if (rc) { + CERROR("Invalid module parameter value for 'lnet_peer_discovery_disabled'\n"); + return rc; + } + + *(unsigned int *)kp->arg = !!value; + + return 0; +} + static int intf_max_set(const char *val, const struct kernel_param *kp) { @@ -1921,6 +1945,10 @@ LNetNIInit(lnet_pid_t requested_pid) if (rc) goto err_stop_ping; + rc = lnet_peer_discovery_start(); + if (rc != 0) + goto err_stop_router_checker; + lnet_fault_init(); lnet_router_debugfs_init(); @@ -1928,6 +1956,8 @@ LNetNIInit(lnet_pid_t requested_pid) return 0; +err_stop_router_checker: + lnet_router_checker_stop(); err_stop_ping: lnet_ping_target_fini(); err_acceptor_stop: @@ -1976,6 +2006,7 @@ LNetNIFini(void) lnet_fault_fini(); lnet_router_debugfs_fini(); + lnet_peer_discovery_stop(); lnet_router_checker_stop(); lnet_ping_target_fini(); diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index 4c1eef907dc7..4773180cc7b3 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ 
b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -1208,6 +1208,27 @@ lnet_get_best_ni(struct lnet_net *local_net, struct lnet_ni *cur_ni, return best_ni; } +/* + * Traffic to the LNET_RESERVED_PORTAL may not trigger peer discovery, + * because such traffic is required to perform discovery. We therefore + * exclude all GET and PUT on that portal. We also exclude all ACK and + * REPLY traffic, but that is because the portal is not tracked in the + * message structure for these message types. We could restrict this + * further by also checking for LNET_PROTO_PING_MATCHBITS. + */ +static bool +lnet_msg_discovery(struct lnet_msg *msg) +{ + if (msg->msg_type == LNET_MSG_PUT) { + if (msg->msg_hdr.msg.put.ptl_index != LNET_RESERVED_PORTAL) + return true; + } else if (msg->msg_type == LNET_MSG_GET) { + if (msg->msg_hdr.msg.get.ptl_index != LNET_RESERVED_PORTAL) + return true; + } + return false; +} + static int lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid) @@ -1220,7 +1241,6 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, struct lnet_peer *peer; struct lnet_peer_net *peer_net; struct lnet_net *local_net; - __u32 seq; int cpt, cpt2, rc; bool routing; bool routing2; @@ -1255,13 +1275,6 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, routing2 = false; local_found = false; - seq = lnet_get_dlc_seq_locked(); - - if (the_lnet.ln_state != LNET_STATE_RUNNING) { - lnet_net_unlock(cpt); - return -ESHUTDOWN; - } - /* * lnet_nid2peerni_locked() is the path that will find an * existing peer_ni, or create one and mark it as having been @@ -1272,7 +1285,22 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, lnet_net_unlock(cpt); return PTR_ERR(lpni); } + /* + * Now that we have a peer_ni, check if we want to discover + * the peer. Traffic to the LNET_RESERVED_PORTAL should not + * trigger discovery. 
+ */ peer = lpni->lpni_peer_net->lpn_peer; + if (lnet_msg_discovery(msg) && !lnet_peer_is_uptodate(peer)) { + rc = lnet_discover_peer_locked(lpni, cpt); + if (rc) { + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(cpt); + return rc; + } + /* The peer may have changed. */ + peer = lpni->lpni_peer_net->lpn_peer; + } lnet_peer_ni_decref_locked(lpni); /* If peer is not healthy then can not send anything to it */ @@ -1701,6 +1729,7 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, */ cpt2 = lnet_cpt_of_nid_locked(best_lpni->lpni_nid, best_ni); if (cpt != cpt2) { + __u32 seq = lnet_get_dlc_seq_locked(); lnet_net_unlock(cpt); cpt = cpt2; lnet_net_lock(cpt); diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index d7a0a2f3bdd9..038b58414ce0 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -201,6 +201,8 @@ lnet_peer_alloc(lnet_nid_t nid) INIT_LIST_HEAD(&lp->lp_peer_list); INIT_LIST_HEAD(&lp->lp_peer_nets); + INIT_LIST_HEAD(&lp->lp_dc_list); + init_waitqueue_head(&lp->lp_dc_waitq); spin_lock_init(&lp->lp_lock); lp->lp_primary_nid = nid; lp->lp_cpt = lnet_nid_cpt_hash(nid, LNET_CPT_NUMBER); @@ -1457,6 +1459,10 @@ lnet_nid2peerni_ex(lnet_nid_t nid, int cpt) return lpni; } +/* + * Get a peer_ni for the given nid, create it if necessary. Takes a + * hold on the peer_ni. + */ struct lnet_peer_ni * lnet_nid2peerni_locked(lnet_nid_t nid, lnet_nid_t pref, int cpt) { @@ -1510,9 +1516,326 @@ lnet_nid2peerni_locked(lnet_nid_t nid, lnet_nid_t pref, int cpt) mutex_unlock(&the_lnet.ln_api_mutex); lnet_net_lock(cpt); + /* Lock has been dropped, check again for shutdown. */ + if (the_lnet.ln_state == LNET_STATE_SHUTDOWN) { + if (!IS_ERR(lpni)) + lnet_peer_ni_decref_locked(lpni); + lpni = ERR_PTR(-ESHUTDOWN); + } + return lpni; } +/* + * Peer Discovery + */ + +/* + * Is a peer uptodate from the point of view of discovery? 
+ * + * If it is currently being processed, obviously not. + * A forced Ping or Push is also handled by the discovery thread. + * + * Otherwise look at whether the peer needs rediscovering. + */ +bool +lnet_peer_is_uptodate(struct lnet_peer *lp) +{ + bool rc; + + spin_lock(&lp->lp_lock); + if (lp->lp_state & (LNET_PEER_DISCOVERING | + LNET_PEER_FORCE_PING | + LNET_PEER_FORCE_PUSH)) { + rc = false; + } else if (lp->lp_state & LNET_PEER_REDISCOVER) { + if (lnet_peer_discovery_disabled) + rc = true; + else + rc = false; + } else if (lp->lp_state & LNET_PEER_DISCOVERED) { + if (lp->lp_state & LNET_PEER_NIDS_UPTODATE) + rc = true; + else + rc = false; + } else { + rc = false; + } + spin_unlock(&lp->lp_lock); + + return rc; +} + +/* + * Queue a peer for the attention of the discovery thread. Call with + * lnet_net_lock/EX held. Returns 0 if the peer was queued, and + * -EALREADY if the peer was already queued. + */ +static int lnet_peer_queue_for_discovery(struct lnet_peer *lp) +{ + int rc; + + spin_lock(&lp->lp_lock); + if (!(lp->lp_state & LNET_PEER_DISCOVERING)) + lp->lp_state |= LNET_PEER_DISCOVERING; + spin_unlock(&lp->lp_lock); + if (list_empty(&lp->lp_dc_list)) { + lnet_peer_addref_locked(lp); + list_add_tail(&lp->lp_dc_list, &the_lnet.ln_dc_request); + wake_up(&the_lnet.ln_dc_waitq); + rc = 0; + } else { + rc = -EALREADY; + } + + return rc; +} + +/* + * Discovery of a peer is complete. Wake all waiters on the peer. + * Call with lnet_net_lock/EX held. + */ +static void lnet_peer_discovery_complete(struct lnet_peer *lp) +{ + list_del_init(&lp->lp_dc_list); + wake_up_all(&lp->lp_dc_waitq); + lnet_peer_decref_locked(lp); +} + +/* + * Peer discovery slow path. The ln_api_mutex is held on entry, and + * dropped/retaken within this function. An lnet_peer_ni is passed in + * because discovery could tear down an lnet_peer. 
+ */ +int +lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt) +{ + DEFINE_WAIT(wait); + struct lnet_peer *lp; + int rc = 0; + +again: + lnet_net_unlock(cpt); + lnet_net_lock(LNET_LOCK_EX); + + /* We're willing to be interrupted. */ + for (;;) { + lp = lpni->lpni_peer_net->lpn_peer; + prepare_to_wait(&lp->lp_dc_waitq, &wait, TASK_INTERRUPTIBLE); + if (signal_pending(current)) + break; + if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING) + break; + if (lnet_peer_is_uptodate(lp)) + break; + lnet_peer_queue_for_discovery(lp); + lnet_peer_addref_locked(lp); + lnet_net_unlock(LNET_LOCK_EX); + schedule(); + finish_wait(&lp->lp_dc_waitq, &wait); + lnet_net_lock(LNET_LOCK_EX); + lnet_peer_decref_locked(lp); + /* Do not use lp beyond this point. */ + } + finish_wait(&lp->lp_dc_waitq, &wait); + + lnet_net_unlock(LNET_LOCK_EX); + lnet_net_lock(cpt); + + if (signal_pending(current)) + rc = -EINTR; + else if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING) + rc = -ESHUTDOWN; + else if (!lnet_peer_is_uptodate(lp)) + goto again; + + return rc; +} + +/* + * Event handler for the discovery EQ. + * + * Called with lnet_res_lock(cpt) held. The cpt is the + * lnet_cpt_of_cookie() of the md handle cookie. + */ +static void lnet_discovery_event_handler(struct lnet_event *event) +{ + wake_up(&the_lnet.ln_dc_waitq); +} + +/* + * Wait for work to be queued or some other change that must be + * attended to. Returns non-zero if the discovery thread should shut + * down. 
+ */ +static int lnet_peer_discovery_wait_for_work(void) +{ + int cpt; + int rc = 0; + + DEFINE_WAIT(wait); + + cpt = lnet_net_lock_current(); + for (;;) { + prepare_to_wait(&the_lnet.ln_dc_waitq, &wait, + TASK_IDLE); + if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING) + break; + if (!list_empty(&the_lnet.ln_dc_request)) + break; + lnet_net_unlock(cpt); + schedule(); + finish_wait(&the_lnet.ln_dc_waitq, &wait); + cpt = lnet_net_lock_current(); + } + finish_wait(&the_lnet.ln_dc_waitq, &wait); + + if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING) + rc = -ESHUTDOWN; + + lnet_net_unlock(cpt); + + CDEBUG(D_NET, "woken: %d\n", rc); + + return rc; +} + +/* The discovery thread. */ +static int lnet_peer_discovery(void *arg) +{ + struct lnet_peer *lp; + + CDEBUG(D_NET, "started\n"); + + for (;;) { + if (lnet_peer_discovery_wait_for_work()) + break; + + lnet_net_lock(LNET_LOCK_EX); + if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING) + break; + while (!list_empty(&the_lnet.ln_dc_request)) { + lp = list_first_entry(&the_lnet.ln_dc_request, + struct lnet_peer, lp_dc_list); + list_move(&lp->lp_dc_list, &the_lnet.ln_dc_working); + lnet_net_unlock(LNET_LOCK_EX); + + /* Just tag and release for now. */ + spin_lock(&lp->lp_lock); + if (lnet_peer_discovery_disabled) { + lp->lp_state |= LNET_PEER_REDISCOVER; + lp->lp_state &= ~(LNET_PEER_DISCOVERED | + LNET_PEER_NIDS_UPTODATE | + LNET_PEER_DISCOVERING); + } else { + lp->lp_state |= (LNET_PEER_DISCOVERED | + LNET_PEER_NIDS_UPTODATE); + lp->lp_state &= ~(LNET_PEER_REDISCOVER | + LNET_PEER_DISCOVERING); + } + spin_unlock(&lp->lp_lock); + + lnet_net_lock(LNET_LOCK_EX); + if (!(lp->lp_state & LNET_PEER_DISCOVERING)) + lnet_peer_discovery_complete(lp); + if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING) + break; + } + lnet_net_unlock(LNET_LOCK_EX); + } + + CDEBUG(D_NET, "stopping\n"); + /* + * Clean up before telling lnet_peer_discovery_stop() that + * we're done. 
Use wake_up() below to somewhat reduce the + * size of the thundering herd if there are multiple threads + * waiting on discovery of a single peer. + */ + LNetEQFree(the_lnet.ln_dc_eqh); + LNetInvalidateEQHandle(&the_lnet.ln_dc_eqh); + + lnet_net_lock(LNET_LOCK_EX); + list_for_each_entry(lp, &the_lnet.ln_dc_request, lp_dc_list) { + spin_lock(&lp->lp_lock); + lp->lp_state |= LNET_PEER_REDISCOVER; + lp->lp_state &= ~(LNET_PEER_DISCOVERED | + LNET_PEER_DISCOVERING | + LNET_PEER_NIDS_UPTODATE); + spin_unlock(&lp->lp_lock); + lnet_peer_discovery_complete(lp); + } + list_for_each_entry(lp, &the_lnet.ln_dc_working, lp_dc_list) { + spin_lock(&lp->lp_lock); + lp->lp_state |= LNET_PEER_REDISCOVER; + lp->lp_state &= ~(LNET_PEER_DISCOVERED | + LNET_PEER_DISCOVERING | + LNET_PEER_NIDS_UPTODATE); + spin_unlock(&lp->lp_lock); + lnet_peer_discovery_complete(lp); + } + lnet_net_unlock(LNET_LOCK_EX); + + the_lnet.ln_dc_state = LNET_DC_STATE_SHUTDOWN; + wake_up(&the_lnet.ln_dc_waitq); + + CDEBUG(D_NET, "stopped\n"); + + return 0; +} + +/* ln_api_mutex is held on entry. */ +int lnet_peer_discovery_start(void) +{ + struct task_struct *task; + int rc; + + if (the_lnet.ln_dc_state != LNET_DC_STATE_SHUTDOWN) + return -EALREADY; + + INIT_LIST_HEAD(&the_lnet.ln_dc_request); + INIT_LIST_HEAD(&the_lnet.ln_dc_working); + init_waitqueue_head(&the_lnet.ln_dc_waitq); + + rc = LNetEQAlloc(0, lnet_discovery_event_handler, &the_lnet.ln_dc_eqh); + if (rc != 0) { + CERROR("Can't allocate discovery EQ: %d\n", rc); + return rc; + } + + the_lnet.ln_dc_state = LNET_DC_STATE_RUNNING; + task = kthread_run(lnet_peer_discovery, NULL, "lnet_discovery"); + if (IS_ERR(task)) { + rc = PTR_ERR(task); + CERROR("Can't start peer discovery thread: %d\n", rc); + + LNetEQFree(the_lnet.ln_dc_eqh); + LNetInvalidateEQHandle(&the_lnet.ln_dc_eqh); + + the_lnet.ln_dc_state = LNET_DC_STATE_SHUTDOWN; + } + + return rc; +} + +/* ln_api_mutex is held on entry. 
*/ +void lnet_peer_discovery_stop(void) +{ + if (the_lnet.ln_dc_state == LNET_DC_STATE_SHUTDOWN) + return; + + LASSERT(the_lnet.ln_dc_state == LNET_DC_STATE_RUNNING); + the_lnet.ln_dc_state = LNET_DC_STATE_STOPPING; + wake_up(&the_lnet.ln_dc_waitq); + + wait_event(the_lnet.ln_dc_waitq, + the_lnet.ln_dc_state == LNET_DC_STATE_SHUTDOWN); + + LASSERT(list_empty(&the_lnet.ln_dc_request)); + LASSERT(list_empty(&the_lnet.ln_dc_working)); +} + +/* Debugging */ + void lnet_debug_peer(lnet_nid_t nid) { @@ -1544,6 +1867,8 @@ lnet_debug_peer(lnet_nid_t nid) lnet_net_unlock(cpt); } +/* Gathering information for userspace. */ + int lnet_get_peer_ni_info(__u32 peer_index, __u64 *nid, char aliveness[LNET_MAX_STR_LEN], From patchwork Sun Oct 7 23:19:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629825 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5CE6C14DB for ; Sun, 7 Oct 2018 23:31:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4969C28CBF for ; Sun, 7 Oct 2018 23:31:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3DD7C28CC8; Sun, 7 Oct 2018 23:31:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8BA3528CBF for ; Sun, 7 Oct 2018 23:31:58 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com 
(localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4BFBF861827; Sun, 7 Oct 2018 16:31:58 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1B0CD8617D2 for ; Sun, 7 Oct 2018 16:31:56 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 4716FAD2C; Sun, 7 Oct 2018 23:31:55 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:38 +1100 Message-ID: <153895437820.16383.6069246179655761617.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 17/24] lustre: lnet: add the Push target X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP

From: Olaf Weber

Peer Discovery will send a Push message (same format as an LNet Ping) to Multi-Rail capable peers to give the peer the list of local interfaces. Set up a target buffer for these pushes in the_lnet. The size of this buffer defaults to LNET_INTERFACES_MIN, but it is resized if required.
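
For readers following the sizing logic described above, here is a minimal userspace sketch of the "start at the minimum, grow on demand" pattern. It is only a model under stated assumptions: struct push_buf, struct push_target and the push_target_*() helpers are hypothetical stand-ins invented for illustration, not the LNet structures or functions, and no ME/MD attachment or locking is modelled. The sketch re-reads the desired size on every pass of the loop; that is a choice made here, not a claim about what the patch does.

```c
#include <errno.h>
#include <stdlib.h>

/* Value of LNET_INTERFACES_MIN as stated in patch 01/24 of this series. */
#define INTERFACES_MIN 16

/* Hypothetical stand-in for a ping buffer: only its capacity is modelled. */
struct push_buf {
	int nnis;		/* interface slots this buffer can hold */
};

/* Hypothetical stand-in for the push-target fields kept in the_lnet. */
struct push_target {
	struct push_buf *buf;	/* current buffer, NULL before init */
	int wanted_nnis;	/* desired capacity, may be raised later */
};

/* A resize is needed when the current buffer is smaller than desired. */
static int push_target_resize_needed(const struct push_target *pt)
{
	return pt->buf->nnis < pt->wanted_nnis;
}

/* Replace the buffer with one of the desired size. */
static int push_target_resize(struct push_target *pt)
{
	int nnis;

	do {
		struct push_buf *nbuf;

		/* Sample the desired size on each pass (sketch assumption). */
		nnis = pt->wanted_nnis;
		if (nnis <= 0)
			return -EINVAL;

		nbuf = malloc(sizeof(*nbuf));
		if (!nbuf)
			return -ENOMEM;
		nbuf->nnis = nnis;

		/* Swap in the new buffer and drop the old one. */
		free(pt->buf);
		pt->buf = nbuf;

		/*
		 * In a concurrent setting the desired size could have grown
		 * while the buffer was being built; in this single-threaded
		 * model the loop body runs exactly once.
		 */
	} while (nnis < pt->wanted_nnis);

	return 0;
}

/* Start at the required minimum; callers enlarge later if required. */
static int push_target_init(struct push_target *pt)
{
	pt->buf = NULL;
	pt->wanted_nnis = INTERFACES_MIN;
	return push_target_resize(pt);
}
```

A caller would run push_target_init() once, raise wanted_nnis when a peer advertises more interfaces than fit, and then call push_target_resize() from whichever context notices that push_target_resize_needed() is true.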
WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25788 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 8 + .../staging/lustre/include/linux/lnet/lib-types.h | 25 +++ drivers/staging/lustre/lnet/lnet/api-ni.c | 150 ++++++++++++++++++++ drivers/staging/lustre/lnet/lnet/peer.c | 5 + 4 files changed, 187 insertions(+), 1 deletion(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 848d622911a4..5632e5aadf41 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -686,6 +686,14 @@ static inline int lnet_ping_buffer_numref(struct lnet_ping_buffer *pbuf) return atomic_read(&pbuf->pb_refcnt); } +static inline int lnet_push_target_resize_needed(void) +{ + return the_lnet.ln_push_target->pb_nnis < the_lnet.ln_push_target_nnis; +} + +int lnet_push_target_resize(void); +void lnet_peer_push_event(struct lnet_event *ev); + int lnet_parse_ip2nets(char **networksp, char *ip2nets); int lnet_parse_routes(char *route_str, int *im_a_router); int lnet_parse_networks(struct list_head *nilist, char *networks, diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index 6394a3af50b7..e00c13355d43 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -521,6 +521,18 @@ struct lnet_peer { /* peer state flags */ unsigned int lp_state; + /* buffer for data pushed by peer */ + struct lnet_ping_buffer *lp_data; + + /* number of NIDs for sizing push data */ + int lp_data_nnis; + + /* NI config sequence number of peer */ + __u32 lp_peer_seqno; + + /* Local NI config sequence number peer knows */ + __u32 
lp_node_seqno; + /* link on discovery-related lists */ struct list_head lp_dc_list; @@ -912,6 +924,19 @@ struct lnet { struct lnet_ping_buffer *ln_ping_target; atomic_t ln_ping_target_seqno; + /* + * Push Target + * + * ln_push_nnis contains the desired size of the push target. + * The lnet_net_lock is used to handle update races. The old + * buffer may linger a while after it has been unlinked, in + * which case the event handler cleans up. + */ + struct lnet_handle_eq ln_push_target_eq; + struct lnet_handle_md ln_push_target_md; + struct lnet_ping_buffer *ln_push_target; + int ln_push_target_nnis; + /* discovery event queue handle */ struct lnet_handle_eq ln_dc_eqh; /* discovery requests */ diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index dccfd5bcc459..e6bc54e9de71 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -1268,6 +1268,147 @@ lnet_ping_target_fini(void) lnet_ping_target_destroy(); } +/* Resize the push target. 
*/ +int lnet_push_target_resize(void) +{ + struct lnet_process_id id = { LNET_NID_ANY, LNET_PID_ANY }; + struct lnet_md md = { NULL }; + struct lnet_handle_me meh; + struct lnet_handle_md mdh; + struct lnet_handle_md old_mdh; + struct lnet_ping_buffer *pbuf; + struct lnet_ping_buffer *old_pbuf; + int nnis = the_lnet.ln_push_target_nnis; + int rc; + + if (nnis <= 0) { + rc = -EINVAL; + goto fail_return; + } +again: + pbuf = lnet_ping_buffer_alloc(nnis, GFP_NOFS); + if (!pbuf) { + rc = -ENOMEM; + goto fail_return; + } + + rc = LNetMEAttach(LNET_RESERVED_PORTAL, id, + LNET_PROTO_PING_MATCHBITS, 0, + LNET_UNLINK, LNET_INS_AFTER, + &meh); + if (rc) { + CERROR("Can't create push target ME: %d\n", rc); + goto fail_decref_pbuf; + } + + /* initialize md content */ + md.start = &pbuf->pb_info; + md.length = LNET_PING_INFO_SIZE(nnis); + md.threshold = LNET_MD_THRESH_INF; + md.max_size = 0; + md.options = LNET_MD_OP_PUT | LNET_MD_TRUNCATE | + LNET_MD_MANAGE_REMOTE; + md.user_ptr = pbuf; + md.eq_handle = the_lnet.ln_push_target_eq; + + rc = LNetMDAttach(meh, md, LNET_RETAIN, &mdh); + if (rc) { + CERROR("Can't attach push MD: %d\n", rc); + goto fail_unlink_meh; + } + lnet_ping_buffer_addref(pbuf); + + lnet_net_lock(LNET_LOCK_EX); + old_pbuf = the_lnet.ln_push_target; + old_mdh = the_lnet.ln_push_target_md; + the_lnet.ln_push_target = pbuf; + the_lnet.ln_push_target_md = mdh; + lnet_net_unlock(LNET_LOCK_EX); + + if (old_pbuf) { + LNetMDUnlink(old_mdh); + lnet_ping_buffer_decref(old_pbuf); + } + + if (nnis < the_lnet.ln_push_target_nnis) + goto again; + + CDEBUG(D_NET, "nnis %d success\n", nnis); + + return 0; + +fail_unlink_meh: + LNetMEUnlink(meh); +fail_decref_pbuf: + lnet_ping_buffer_decref(pbuf); +fail_return: + CDEBUG(D_NET, "nnis %d error %d\n", nnis, rc); + return rc; +} + +static void lnet_push_target_event_handler(struct lnet_event *ev) +{ + struct lnet_ping_buffer *pbuf = ev->md.user_ptr; + + if (pbuf->pb_info.pi_magic == __swab32(LNET_PROTO_PING_MAGIC)) + 
lnet_swap_pinginfo(pbuf); + + if (ev->unlinked) + lnet_ping_buffer_decref(pbuf); +} + +/* Initialize the push target. */ +static int lnet_push_target_init(void) +{ + int rc; + + if (the_lnet.ln_push_target) + return -EALREADY; + + rc = LNetEQAlloc(0, lnet_push_target_event_handler, + &the_lnet.ln_push_target_eq); + if (rc) { + CERROR("Can't allocate push target EQ: %d\n", rc); + return rc; + } + + /* Start at the required minimum, we'll enlarge if required. */ + the_lnet.ln_push_target_nnis = LNET_INTERFACES_MIN; + + rc = lnet_push_target_resize(); + + if (rc) { + LNetEQFree(the_lnet.ln_push_target_eq); + LNetInvalidateEQHandle(&the_lnet.ln_push_target_eq); + } + + return rc; +} + +/* Clean up the push target. */ +static void lnet_push_target_fini(void) +{ + if (!the_lnet.ln_push_target) + return; + + /* Unlink and invalidate to prevent new references. */ + LNetMDUnlink(the_lnet.ln_push_target_md); + LNetInvalidateMDHandle(&the_lnet.ln_push_target_md); + + /* Wait for the unlink to complete. 
*/ + while (lnet_ping_buffer_numref(the_lnet.ln_push_target) > 1) { + CDEBUG(D_NET, "Still waiting for ping data MD to unlink\n"); + schedule_timeout_uninterruptible(HZ); + } + + lnet_ping_buffer_decref(the_lnet.ln_push_target); + the_lnet.ln_push_target = NULL; + the_lnet.ln_push_target_nnis = 0; + + LNetEQFree(the_lnet.ln_push_target_eq); + LNetInvalidateEQHandle(&the_lnet.ln_push_target_eq); +} + static int lnet_ni_tq_credits(struct lnet_ni *ni) { @@ -1945,10 +2086,14 @@ LNetNIInit(lnet_pid_t requested_pid) if (rc) goto err_stop_ping; - rc = lnet_peer_discovery_start(); + rc = lnet_push_target_init(); if (rc != 0) goto err_stop_router_checker; + rc = lnet_peer_discovery_start(); + if (rc != 0) + goto err_destroy_push_target; + lnet_fault_init(); lnet_router_debugfs_init(); @@ -1956,6 +2101,8 @@ LNetNIInit(lnet_pid_t requested_pid) return 0; +err_destroy_push_target: + lnet_push_target_fini(); err_stop_router_checker: lnet_router_checker_stop(); err_stop_ping: @@ -2007,6 +2154,7 @@ LNetNIFini(void) lnet_fault_fini(); lnet_router_debugfs_fini(); lnet_peer_discovery_stop(); + lnet_push_target_fini(); lnet_router_checker_stop(); lnet_ping_target_fini(); diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index 038b58414ce0..b78f99c354de 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -1681,6 +1681,8 @@ static int lnet_peer_discovery_wait_for_work(void) TASK_IDLE); if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING) break; + if (lnet_push_target_resize_needed()) + break; if (!list_empty(&the_lnet.ln_dc_request)) break; lnet_net_unlock(cpt); @@ -1711,6 +1713,9 @@ static int lnet_peer_discovery(void *arg) if (lnet_peer_discovery_wait_for_work()) break; + if (lnet_push_target_resize_needed()) + lnet_push_target_resize(); + lnet_net_lock(LNET_LOCK_EX); if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING) break; From patchwork Sun Oct 7 23:19:38 2018 Content-Type: text/plain; 
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629827 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 03727112B for ; Sun, 7 Oct 2018 23:32:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DDEB128CBF for ; Sun, 7 Oct 2018 23:32:10 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D118C28CC8; Sun, 7 Oct 2018 23:32:10 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D70A228CBF for ; Sun, 7 Oct 2018 23:32:07 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 93F7F861809; Sun, 7 Oct 2018 16:32:07 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C0FF68617CD for ; Sun, 7 Oct 2018 16:32:04 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id CF6BBAE17; Sun, 7 Oct 2018 23:32:03 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:38 +1100 Message-ID: 
<153895437824.16383.8664465506271547759.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 18/24] lustre: lnet: implement Peer Discovery X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP

From: Olaf Weber

Implement Peer Discovery.

A peer is queued for discovery by lnet_peer_queue_for_discovery(). This sets LNET_PEER_DISCOVERING, to indicate that discovery is in progress.

The discovery thread lnet_peer_discovery() checks the peer and updates its state as appropriate.

If LNET_PEER_DATA_PRESENT is set, then a valid Push message or Ping reply has been received. The peer is updated in accordance with the data, and LNET_PEER_NIDS_UPTODATE is set.

If LNET_PEER_PING_FAILED is set, then an attempt to send a Ping message failed, and peer state is updated accordingly. The discovery thread can do some cleanup like unlinking an MD that cannot be done from the message event handler.

If LNET_PEER_PUSH_FAILED is set, then an attempt to send a Push message failed, and peer state is updated accordingly. The discovery thread can do some cleanup like unlinking an MD that cannot be done from the message event handler.

If LNET_PEER_PING_REQUIRED is set, we must Ping the peer in order to correctly update our knowledge of it. This is set, for example, if we receive a Push message for a peer, but cannot handle it because the Push target was too small. In such a case we know that the state of the peer is incorrect, but need to do extra work to obtain the required information.

If discovery is not enabled, then the discovery process stops here and the peer is marked with LNET_PEER_UNDISCOVERED. This tells the discovery process that it doesn't need to revisit the peer while discovery remains disabled.

If LNET_PEER_NIDS_UPTODATE is not set, then we have reason to think the lnet_peer is not up to date, and will Ping it.

The peer needs a Push if it is multi-rail and the ping buffer sequence number for this node is newer than the sequence number it has acknowledged receiving by sending an Ack of a Push.

If none of the above is true, then discovery has completed its work on the peer.

Discovery signals that it is done with a peer by clearing the LNET_PEER_DISCOVERING flag, and setting LNET_PEER_DISCOVERED or LNET_PEER_UNDISCOVERED as appropriate. It then dequeues the peer and clears the LNET_PEER_QUEUED flag.

When the local node is discovered via the loopback network, the peer structure that is created will have an lnet_peer_ni for the local loopback interface. Subsequent traffic from this node to itself will use the loopback net.
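
The order in which the discovery thread examines a peer, as described above, can be summarised as a small decision function. This is a sketch for illustration only: the PEER_* bit values, struct peer_model, enum dc_action and dc_next_action() are hypothetical names made up here and do not correspond to the bit assignments or helpers in lib-types.h or peer.c. The push check follows the wording of this message (multi-rail, and the acknowledged sequence number is older than the local one) and ignores any forced-push handling.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the lib-types.h values. */
enum {
	PEER_MULTI_RAIL    = 1u << 0,
	PEER_DATA_PRESENT  = 1u << 1,
	PEER_NIDS_UPTODATE = 1u << 2,
	PEER_PING_FAILED   = 1u << 3,
	PEER_PUSH_FAILED   = 1u << 4,
	PEER_PING_REQUIRED = 1u << 5,
};

/* What the discovery thread would do next for a given peer. */
enum dc_action {
	DC_MERGE_DATA,		/* apply received Push or Ping reply data */
	DC_HANDLE_PING_FAILURE,	/* clean up after a failed Ping send */
	DC_HANDLE_PUSH_FAILURE,	/* clean up after a failed Push send */
	DC_SEND_PING,		/* Ping the peer to refresh its data */
	DC_MARK_UNDISCOVERED,	/* discovery disabled: stop here */
	DC_SEND_PUSH,		/* Push our interface list to the peer */
	DC_DONE,		/* nothing left to do for this peer */
};

/* Minimal model of the peer fields the decision depends on. */
struct peer_model {
	unsigned int state;		/* PEER_* flags */
	unsigned int acked_seqno;	/* local config seqno acked by peer */
};

/* Walk the checks in the order given in the commit message. */
static enum dc_action dc_next_action(const struct peer_model *lp,
				     bool discovery_enabled,
				     unsigned int local_seqno)
{
	if (lp->state & PEER_DATA_PRESENT)
		return DC_MERGE_DATA;
	if (lp->state & PEER_PING_FAILED)
		return DC_HANDLE_PING_FAILURE;
	if (lp->state & PEER_PUSH_FAILED)
		return DC_HANDLE_PUSH_FAILURE;
	if (lp->state & PEER_PING_REQUIRED)
		return DC_SEND_PING;
	if (!discovery_enabled)
		return DC_MARK_UNDISCOVERED;
	if (!(lp->state & PEER_NIDS_UPTODATE))
		return DC_SEND_PING;
	if ((lp->state & PEER_MULTI_RAIL) && lp->acked_seqno < local_seqno)
		return DC_SEND_PUSH;
	return DC_DONE;
}
```

Expressing the checks as one ordered chain makes the priority explicit: pending data and send failures are dealt with even when discovery is disabled, while new Pings for stale data and Pushes are only started when it is enabled.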
WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25789 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 20 .../staging/lustre/include/linux/lnet/lib-types.h | 39 + drivers/staging/lustre/lnet/lnet/api-ni.c | 59 + drivers/staging/lustre/lnet/lnet/lib-move.c | 18 drivers/staging/lustre/lnet/lnet/peer.c | 1499 +++++++++++++++++++- 5 files changed, 1543 insertions(+), 92 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 5632e5aadf41..f82a699371f2 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -76,6 +76,9 @@ extern struct lnet the_lnet; /* THE network */ #define LNET_ACCEPTOR_MIN_RESERVED_PORT 512 #define LNET_ACCEPTOR_MAX_RESERVED_PORT 1023 +/* Discovery timeout - same as default peer_timeout */ +#define DISCOVERY_TIMEOUT 180 + static inline int lnet_is_route_alive(struct lnet_route *route) { /* gateway is down */ @@ -713,9 +716,10 @@ struct lnet_peer_ni *lnet_nid2peerni_ex(lnet_nid_t nid, int cpt); struct lnet_peer_ni *lnet_find_peer_ni_locked(lnet_nid_t nid); void lnet_peer_net_added(struct lnet_net *net); lnet_nid_t lnet_peer_primary_nid_locked(lnet_nid_t nid); -int lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt); +int lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt, bool block); int lnet_peer_discovery_start(void); void lnet_peer_discovery_stop(void); +void lnet_push_update_to_peers(int force); void lnet_peer_tables_cleanup(struct lnet_net *net); void lnet_peer_uninit(void); int lnet_peer_tables_create(void); @@ -805,4 +809,18 @@ lnet_peer_ni_is_primary(struct lnet_peer_ni *lpni) bool lnet_peer_is_uptodate(struct lnet_peer *lp); +static inline bool 
+lnet_peer_needs_push(struct lnet_peer *lp) +{ + if (!(lp->lp_state & LNET_PEER_MULTI_RAIL)) + return false; + if (lp->lp_state & LNET_PEER_FORCE_PUSH) + return true; + if (lp->lp_state & LNET_PEER_NO_DISCOVERY) + return false; + if (lp->lp_node_seqno < atomic_read(&the_lnet.ln_ping_target_seqno)) + return true; + return false; +} + #endif diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index e00c13355d43..07baa86e61ab 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -67,6 +67,13 @@ struct lnet_msg { lnet_nid_t msg_from; __u32 msg_type; + /* + * hold parameters in case message is with held due + * to discovery + */ + lnet_nid_t msg_src_nid_param; + lnet_nid_t msg_rtr_nid_param; + /* committed for sending */ unsigned int msg_tx_committed:1; /* CPT # this message committed for sending */ @@ -395,6 +402,8 @@ struct lnet_ping_buffer { #define LNET_PING_BUFFER_LONI(PBUF) ((PBUF)->pb_info.pi_ni[0].ns_nid) #define LNET_PING_BUFFER_SEQNO(PBUF) ((PBUF)->pb_info.pi_ni[0].ns_status) +#define LNET_PING_INFO_TO_BUFFER(PINFO) \ + container_of((PINFO), struct lnet_ping_buffer, pb_info) /* router checker data, per router */ struct lnet_rc_data { @@ -503,6 +512,9 @@ struct lnet_peer { /* list of peer nets */ struct list_head lp_peer_nets; + /* list of messages pending discovery*/ + struct list_head lp_dc_pendq; + /* primary NID of the peer */ lnet_nid_t lp_primary_nid; @@ -524,15 +536,36 @@ struct lnet_peer { /* buffer for data pushed by peer */ struct lnet_ping_buffer *lp_data; + /* MD handle for ping in progress */ + struct lnet_handle_md lp_ping_mdh; + + /* MD handle for push in progress */ + struct lnet_handle_md lp_push_mdh; + /* number of NIDs for sizing push data */ int lp_data_nnis; /* NI config sequence number of peer */ __u32 lp_peer_seqno; - /* Local NI config sequence number peer knows */ + /* Local NI config 
sequence number acked by peer */ __u32 lp_node_seqno; + /* Local NI config sequence number sent to peer */ + __u32 lp_node_seqno_sent; + + /* Ping error encountered during discovery. */ + int lp_ping_error; + + /* Push error encountered during discovery. */ + int lp_push_error; + + /* Error encountered during discovery. */ + int lp_dc_error; + + /* time it was put on the ln_dc_working queue */ + time64_t lp_last_queued; + /* link on discovery-related lists */ struct list_head lp_dc_list; @@ -691,6 +724,8 @@ struct lnet_remotenet { #define LNET_CREDIT_OK 0 /** lnet message is waiting for credit */ #define LNET_CREDIT_WAIT 1 +/** lnet message is waiting for discovery */ +#define LNET_DC_WAIT 2 struct lnet_rtrbufpool { struct list_head rbp_bufs; /* my free buffer pool */ @@ -943,6 +978,8 @@ struct lnet { struct list_head ln_dc_request; /* discovery working list */ struct list_head ln_dc_working; + /* discovery expired list */ + struct list_head ln_dc_expired; /* discovery thread wait queue */ wait_queue_head_t ln_dc_waitq; /* discovery startup/shutdown state */ diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index e6bc54e9de71..955d1711eda4 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -41,7 +41,14 @@ #define D_LNI D_CONSOLE -struct lnet the_lnet; /* THE state of the network */ +/* + * initialize ln_api_mutex statically, since it needs to be used in + * discovery_set callback. That module parameter callback can be called + * before module init completes. The mutex needs to be ready for use then. 
+ */ +struct lnet the_lnet = { + .ln_api_mutex = __MUTEX_INITIALIZER(the_lnet.ln_api_mutex), +}; /* THE state of the network */ EXPORT_SYMBOL(the_lnet); static char *ip2nets = ""; @@ -101,7 +108,9 @@ static int discovery_set(const char *val, const struct kernel_param *kp) { int rc; + unsigned int *discovery = (unsigned int *)kp->arg; unsigned long value; + struct lnet_ping_buffer *pbuf; rc = kstrtoul(val, 0, &value); if (rc) { @@ -109,7 +118,38 @@ discovery_set(const char *val, const struct kernel_param *kp) return rc; } - *(unsigned int *)kp->arg = !!value; + value = !!value; + + /* + * The purpose of locking the api_mutex here is to ensure that + * the correct value ends up stored properly. + */ + mutex_lock(&the_lnet.ln_api_mutex); + + if (value == *discovery) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + *discovery = value; + + if (the_lnet.ln_state == LNET_STATE_SHUTDOWN) { + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + + /* tell peers that discovery setting has changed */ + lnet_net_lock(LNET_LOCK_EX); + pbuf = the_lnet.ln_ping_target; + if (value) + pbuf->pb_info.pi_features &= ~LNET_PING_FEAT_DISCOVERY; + else + pbuf->pb_info.pi_features |= LNET_PING_FEAT_DISCOVERY; + lnet_net_unlock(LNET_LOCK_EX); + + lnet_push_update_to_peers(1); + + mutex_unlock(&the_lnet.ln_api_mutex); return 0; } @@ -171,7 +211,6 @@ lnet_init_locks(void) init_waitqueue_head(&the_lnet.ln_eq_waitq); init_waitqueue_head(&the_lnet.ln_rc_waitq); mutex_init(&the_lnet.ln_lnd_mutex); - mutex_init(&the_lnet.ln_api_mutex); } static int @@ -654,6 +693,10 @@ lnet_prepare(lnet_pid_t requested_pid) INIT_LIST_HEAD(&the_lnet.ln_routers); INIT_LIST_HEAD(&the_lnet.ln_drop_rules); INIT_LIST_HEAD(&the_lnet.ln_delay_rules); + INIT_LIST_HEAD(&the_lnet.ln_dc_request); + INIT_LIST_HEAD(&the_lnet.ln_dc_working); + INIT_LIST_HEAD(&the_lnet.ln_dc_expired); + init_waitqueue_head(&the_lnet.ln_dc_waitq); rc = lnet_create_remote_nets_table(); if (rc) @@ -998,7 +1041,8 @@ 
lnet_ping_target_create(int nnis) pbuf->pb_info.pi_nnis = nnis; pbuf->pb_info.pi_pid = the_lnet.ln_pid; pbuf->pb_info.pi_magic = LNET_PROTO_PING_MAGIC; - pbuf->pb_info.pi_features = LNET_PING_FEAT_NI_STATUS; + pbuf->pb_info.pi_features = + LNET_PING_FEAT_NI_STATUS | LNET_PING_FEAT_MULTI_RAIL; return pbuf; } @@ -1231,6 +1275,8 @@ lnet_ping_target_update(struct lnet_ping_buffer *pbuf, if (!the_lnet.ln_routing) pbuf->pb_info.pi_features |= LNET_PING_FEAT_RTE_DISABLED; + if (!lnet_peer_discovery_disabled) + pbuf->pb_info.pi_features |= LNET_PING_FEAT_DISCOVERY; /* Ensure only known feature bits have been set. */ LASSERT(pbuf->pb_info.pi_features & LNET_PING_FEAT_BITS); @@ -1252,6 +1298,8 @@ lnet_ping_target_update(struct lnet_ping_buffer *pbuf, lnet_ping_md_unlink(old_pbuf, &old_ping_md); lnet_ping_buffer_decref(old_pbuf); } + + lnet_push_update_to_peers(0); } static void @@ -1353,6 +1401,7 @@ static void lnet_push_target_event_handler(struct lnet_event *ev) if (pbuf->pb_info.pi_magic == __swab32(LNET_PROTO_PING_MAGIC)) lnet_swap_pinginfo(pbuf); + lnet_peer_push_event(ev); if (ev->unlinked) lnet_ping_buffer_decref(pbuf); } @@ -1910,8 +1959,6 @@ int lnet_lib_init(void) lnet_assert_wire_constants(); - memset(&the_lnet, 0, sizeof(the_lnet)); - /* refer to global cfs_cpt_tab for now */ the_lnet.ln_cpt_table = cfs_cpt_tab; the_lnet.ln_cpt_number = cfs_cpt_number(cfs_cpt_tab); diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index 4773180cc7b3..2ff329bf91ba 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -444,6 +444,8 @@ lnet_prep_send(struct lnet_msg *msg, int type, struct lnet_process_id target, memset(&msg->msg_hdr, 0, sizeof(msg->msg_hdr)); msg->msg_hdr.type = cpu_to_le32(type); + /* dest_nid will be overwritten by lnet_select_pathway() */ + msg->msg_hdr.dest_nid = cpu_to_le64(target.nid); msg->msg_hdr.dest_pid = cpu_to_le32(target.pid); /* src_nid will be 
set later */ msg->msg_hdr.src_pid = cpu_to_le32(the_lnet.ln_pid); @@ -1292,7 +1294,7 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, */ peer = lpni->lpni_peer_net->lpn_peer; if (lnet_msg_discovery(msg) && !lnet_peer_is_uptodate(peer)) { - rc = lnet_discover_peer_locked(lpni, cpt); + rc = lnet_discover_peer_locked(lpni, cpt, false); if (rc) { lnet_peer_ni_decref_locked(lpni); lnet_net_unlock(cpt); @@ -1300,6 +1302,18 @@ lnet_select_pathway(lnet_nid_t src_nid, lnet_nid_t dst_nid, } /* The peer may have changed. */ peer = lpni->lpni_peer_net->lpn_peer; + /* queue message and return */ + msg->msg_src_nid_param = src_nid; + msg->msg_rtr_nid_param = rtr_nid; + msg->msg_sending = 0; + list_add_tail(&msg->msg_list, &peer->lp_dc_pendq); + lnet_peer_ni_decref_locked(lpni); + lnet_net_unlock(cpt); + + CDEBUG(D_NET, "%s pending discovery\n", + libcfs_nid2str(peer->lp_primary_nid)); + + return LNET_DC_WAIT; } lnet_peer_ni_decref_locked(lpni); @@ -1840,7 +1854,7 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid) if (rc == LNET_CREDIT_OK) lnet_ni_send(msg->msg_txni, msg); - /* rc == LNET_CREDIT_OK or LNET_CREDIT_WAIT */ + /* rc == LNET_CREDIT_OK or LNET_CREDIT_WAIT or LNET_DC_WAIT */ return 0; } diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index b78f99c354de..1ef4a44e752e 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -38,6 +38,11 @@ #include #include +/* Value indicating that recovery needs to re-check a peer immediately. 
*/ +#define LNET_REDISCOVER_PEER (1) + +static int lnet_peer_queue_for_discovery(struct lnet_peer *lp); + static void lnet_peer_remove_from_remote_list(struct lnet_peer_ni *lpni) { @@ -202,6 +207,7 @@ lnet_peer_alloc(lnet_nid_t nid) INIT_LIST_HEAD(&lp->lp_peer_list); INIT_LIST_HEAD(&lp->lp_peer_nets); INIT_LIST_HEAD(&lp->lp_dc_list); + INIT_LIST_HEAD(&lp->lp_dc_pendq); init_waitqueue_head(&lp->lp_dc_waitq); spin_lock_init(&lp->lp_lock); lp->lp_primary_nid = nid; @@ -220,6 +226,10 @@ lnet_destroy_peer_locked(struct lnet_peer *lp) LASSERT(atomic_read(&lp->lp_refcount) == 0); LASSERT(list_empty(&lp->lp_peer_nets)); LASSERT(list_empty(&lp->lp_peer_list)); + LASSERT(list_empty(&lp->lp_dc_list)); + + if (lp->lp_data) + lnet_ping_buffer_decref(lp->lp_data); kfree(lp); } @@ -260,10 +270,19 @@ lnet_peer_detach_peer_ni_locked(struct lnet_peer_ni *lpni) /* * If there are no more peer nets, make the peer unfindable * via the peer_tables. + * + * Otherwise, if the peer is DISCOVERED, tell discovery to + * take another look at it. This is a no-op if discovery for + * this peer did the detaching. */ if (list_empty(&lp->lp_peer_nets)) { list_del_init(&lp->lp_peer_list); ptable->pt_peers--; + } else if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING) { + /* Discovery isn't running, nothing to do here. 
*/ + } else if (lp->lp_state & LNET_PEER_DISCOVERED) { + lnet_peer_queue_for_discovery(lp); + wake_up(&the_lnet.ln_dc_waitq); } CDEBUG(D_NET, "peer %s NID %s\n", libcfs_nid2str(lp->lp_primary_nid), @@ -599,6 +618,25 @@ lnet_find_peer_ni_locked(lnet_nid_t nid) return lpni; } +struct lnet_peer * +lnet_find_peer(lnet_nid_t nid) +{ + struct lnet_peer_ni *lpni; + struct lnet_peer *lp = NULL; + int cpt; + + cpt = lnet_net_lock_current(); + lpni = lnet_find_peer_ni_locked(nid); + if (lpni) { + lp = lpni->lpni_peer_net->lpn_peer; + lnet_peer_addref_locked(lp); + lnet_peer_ni_decref_locked(lpni); + } + lnet_net_unlock(cpt); + + return lp; +} + struct lnet_peer_ni * lnet_get_peer_ni_idx_locked(int idx, struct lnet_peer_net **lpn, struct lnet_peer **lp) @@ -696,6 +734,37 @@ lnet_get_next_peer_ni_locked(struct lnet_peer *peer, return lpni; } +/* + * Start pushes to peers that need to be updated for a configuration + * change on this node. + */ +void +lnet_push_update_to_peers(int force) +{ + struct lnet_peer_table *ptable; + struct lnet_peer *lp; + int lncpt; + int cpt; + + lnet_net_lock(LNET_LOCK_EX); + lncpt = cfs_percpt_number(the_lnet.ln_peer_tables); + for (cpt = 0; cpt < lncpt; cpt++) { + ptable = the_lnet.ln_peer_tables[cpt]; + list_for_each_entry(lp, &ptable->pt_peer_list, lp_peer_list) { + if (force) { + spin_lock(&lp->lp_lock); + if (lp->lp_state & LNET_PEER_MULTI_RAIL) + lp->lp_state |= LNET_PEER_FORCE_PUSH; + spin_unlock(&lp->lp_lock); + } + if (lnet_peer_needs_push(lp)) + lnet_peer_queue_for_discovery(lp); + } + } + lnet_net_unlock(LNET_LOCK_EX); + wake_up(&the_lnet.ln_dc_waitq); +} + /* * Test whether a ni is a preferred ni for this peer_ni, e.g, whether * this is a preferred point-to-point path. 
Call with lnet_net_lock in @@ -941,6 +1010,7 @@ lnet_peer_primary_nid_locked(lnet_nid_t nid) lnet_nid_t LNetPrimaryNID(lnet_nid_t nid) { + struct lnet_peer *lp; struct lnet_peer_ni *lpni; lnet_nid_t primary_nid = nid; int rc = 0; @@ -952,7 +1022,15 @@ LNetPrimaryNID(lnet_nid_t nid) rc = PTR_ERR(lpni); goto out_unlock; } - primary_nid = lpni->lpni_peer_net->lpn_peer->lp_primary_nid; + lp = lpni->lpni_peer_net->lpn_peer; + while (!lnet_peer_is_uptodate(lp)) { + rc = lnet_discover_peer_locked(lpni, cpt, true); + if (rc) + goto out_decref; + lp = lpni->lpni_peer_net->lpn_peer; + } + primary_nid = lp->lp_primary_nid; +out_decref: lnet_peer_ni_decref_locked(lpni); out_unlock: lnet_net_unlock(cpt); @@ -1229,6 +1307,30 @@ lnet_peer_add_nid(struct lnet_peer *lp, lnet_nid_t nid, unsigned int flags) return rc; } +/* + * Update the primary NID of a peer, if possible. + * + * Call with the lnet_api_mutex held. + */ +static int +lnet_peer_set_primary_nid(struct lnet_peer *lp, lnet_nid_t nid, + unsigned int flags) +{ + lnet_nid_t old = lp->lp_primary_nid; + int rc = 0; + + if (lp->lp_primary_nid == nid) + goto out; + rc = lnet_peer_add_nid(lp, nid, flags); + if (rc) + goto out; + lp->lp_primary_nid = nid; +out: + CDEBUG(D_NET, "peer %s NID %s: %d\n", + libcfs_nid2str(old), libcfs_nid2str(nid), rc); + return rc; +} + /* * lpni creation initiated due to traffic either sending or receiving. 
*/ @@ -1548,11 +1650,15 @@ lnet_peer_is_uptodate(struct lnet_peer *lp) LNET_PEER_FORCE_PING | LNET_PEER_FORCE_PUSH)) { rc = false; + } else if (lp->lp_state & LNET_PEER_NO_DISCOVERY) { + rc = true; } else if (lp->lp_state & LNET_PEER_REDISCOVER) { if (lnet_peer_discovery_disabled) rc = true; else rc = false; + } else if (lnet_peer_needs_push(lp)) { + rc = false; } else if (lp->lp_state & LNET_PEER_DISCOVERED) { if (lp->lp_state & LNET_PEER_NIDS_UPTODATE) rc = true; @@ -1588,6 +1694,9 @@ static int lnet_peer_queue_for_discovery(struct lnet_peer *lp) rc = -EALREADY; } + CDEBUG(D_NET, "Queue peer %s: %d\n", + libcfs_nid2str(lp->lp_primary_nid), rc); + return rc; } @@ -1597,9 +1706,252 @@ static int lnet_peer_queue_for_discovery(struct lnet_peer *lp) */ static void lnet_peer_discovery_complete(struct lnet_peer *lp) { + struct lnet_msg *msg = NULL; + int rc = 0; + struct list_head pending_msgs; + + INIT_LIST_HEAD(&pending_msgs); + + CDEBUG(D_NET, "Discovery complete. Dequeue peer %s\n", + libcfs_nid2str(lp->lp_primary_nid)); + list_del_init(&lp->lp_dc_list); + list_splice_init(&lp->lp_dc_pendq, &pending_msgs); wake_up_all(&lp->lp_dc_waitq); + + lnet_net_unlock(LNET_LOCK_EX); + + /* iterate through all pending messages and send them again */ + list_for_each_entry(msg, &pending_msgs, msg_list) { + if (lp->lp_dc_error) { + lnet_finalize(msg, lp->lp_dc_error); + continue; + } + + CDEBUG(D_NET, "sending pending message %s to target %s\n", + lnet_msgtyp2str(msg->msg_type), + libcfs_id2str(msg->msg_target)); + rc = lnet_send(msg->msg_src_nid_param, msg, + msg->msg_rtr_nid_param); + if (rc < 0) { + CNETERR("Error sending %s to %s: %d\n", + lnet_msgtyp2str(msg->msg_type), + libcfs_id2str(msg->msg_target), rc); + lnet_finalize(msg, rc); + } + } + lnet_net_lock(LNET_LOCK_EX); + lnet_peer_decref_locked(lp); +} + +/* + * Handle inbound push. + * Like any event handler, called with lnet_res_lock/CPT held. 
+ */ +void lnet_peer_push_event(struct lnet_event *ev) +{ + struct lnet_ping_buffer *pbuf = ev->md.user_ptr; + struct lnet_peer *lp; + + /* lnet_find_peer() adds a refcount */ + lp = lnet_find_peer(ev->source.nid); + if (!lp) { + CERROR("Push Put from unknown %s (source %s)\n", + libcfs_nid2str(ev->initiator.nid), + libcfs_nid2str(ev->source.nid)); + return; + } + + /* Ensure peer state remains consistent while we modify it. */ + spin_lock(&lp->lp_lock); + + /* + * If some kind of error happened the contents of the message + * cannot be used. Clear the NIDS_UPTODATE and set the + * FORCE_PING flag to trigger a ping. + */ + if (ev->status) { + lp->lp_state &= ~LNET_PEER_NIDS_UPTODATE; + lp->lp_state |= LNET_PEER_FORCE_PING; + CDEBUG(D_NET, "Push Put error %d from %s (source %s)\n", + ev->status, + libcfs_nid2str(lp->lp_primary_nid), + libcfs_nid2str(ev->source.nid)); + goto out; + } + + /* + * A push with invalid or corrupted info. Clear the UPTODATE + * flag to trigger a ping. + */ + if (lnet_ping_info_validate(&pbuf->pb_info)) { + lp->lp_state &= ~LNET_PEER_NIDS_UPTODATE; + lp->lp_state |= LNET_PEER_FORCE_PING; + CDEBUG(D_NET, "Corrupted Push from %s\n", + libcfs_nid2str(lp->lp_primary_nid)); + goto out; + } + + /* + * Make sure we'll allocate the correct size ping buffer when + * pinging the peer. + */ + if (lp->lp_data_nnis < pbuf->pb_info.pi_nnis) + lp->lp_data_nnis = pbuf->pb_info.pi_nnis; + + /* + * A non-Multi-Rail peer is not supposed to be capable of + * sending a push. + */ + if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL)) { + CERROR("Push from non-Multi-Rail peer %s dropped\n", + libcfs_nid2str(lp->lp_primary_nid)); + goto out; + } + + /* + * Check the MULTIRAIL flag. Complain if the peer was DLC + * configured without it. 
+ */ + if (!(lp->lp_state & LNET_PEER_MULTI_RAIL)) { + if (lp->lp_state & LNET_PEER_CONFIGURED) { + CERROR("Push says %s is Multi-Rail, DLC says not\n", + libcfs_nid2str(lp->lp_primary_nid)); + } else { + lp->lp_state |= LNET_PEER_MULTI_RAIL; + lnet_peer_clr_non_mr_pref_nids(lp); + } + } + + /* + * The peer may have discovery disabled at its end. Set + * NO_DISCOVERY as appropriate. + */ + if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY)) { + CDEBUG(D_NET, "Peer %s has discovery disabled\n", + libcfs_nid2str(lp->lp_primary_nid)); + lp->lp_state |= LNET_PEER_NO_DISCOVERY; + } else if (lp->lp_state & LNET_PEER_NO_DISCOVERY) { + CDEBUG(D_NET, "Peer %s has discovery enabled\n", + libcfs_nid2str(lp->lp_primary_nid)); + lp->lp_state &= ~LNET_PEER_NO_DISCOVERY; + } + + /* + * Check for truncation of the Put message. Clear the + * NIDS_UPTODATE flag and set FORCE_PING to trigger a ping, + * and tell discovery to allocate a bigger buffer. + */ + if (pbuf->pb_nnis < pbuf->pb_info.pi_nnis) { + if (the_lnet.ln_push_target_nnis < pbuf->pb_info.pi_nnis) + the_lnet.ln_push_target_nnis = pbuf->pb_info.pi_nnis; + lp->lp_state &= ~LNET_PEER_NIDS_UPTODATE; + lp->lp_state |= LNET_PEER_FORCE_PING; + CDEBUG(D_NET, "Truncated Push from %s (%d nids)\n", + libcfs_nid2str(lp->lp_primary_nid), + pbuf->pb_info.pi_nnis); + goto out; + } + + /* + * Check whether the Put data is stale. Stale data can just be + * dropped. + */ + if (pbuf->pb_info.pi_nnis > 1 && + lp->lp_primary_nid == pbuf->pb_info.pi_ni[1].ns_nid && + LNET_PING_BUFFER_SEQNO(pbuf) < lp->lp_peer_seqno) { + CDEBUG(D_NET, "Stale Push from %s: got %u have %u\n", + libcfs_nid2str(lp->lp_primary_nid), + LNET_PING_BUFFER_SEQNO(pbuf), + lp->lp_peer_seqno); + goto out; + } + + /* + * Check whether the Put data is new, in which case we clear + * the UPTODATE flag and prepare to process it. + * + * If the Put data is current, and the peer is UPTODATE then + * we assume everything is all right and drop the data as + * stale.
+ */ + if (LNET_PING_BUFFER_SEQNO(pbuf) > lp->lp_peer_seqno) { + lp->lp_peer_seqno = LNET_PING_BUFFER_SEQNO(pbuf); + lp->lp_state &= ~LNET_PEER_NIDS_UPTODATE; + } else if (lp->lp_state & LNET_PEER_NIDS_UPTODATE) { + CDEBUG(D_NET, "Stale Push from %s: got %u have %u\n", + libcfs_nid2str(lp->lp_primary_nid), + LNET_PING_BUFFER_SEQNO(pbuf), + lp->lp_peer_seqno); + goto out; + } + + /* + * If there is data present that hasn't been processed yet, + * we'll replace it if the Put contained newer data and it + * fits. We're racing with a Ping or earlier Push in this + * case. + */ + if (lp->lp_state & LNET_PEER_DATA_PRESENT) { + if (LNET_PING_BUFFER_SEQNO(pbuf) > + LNET_PING_BUFFER_SEQNO(lp->lp_data) && + pbuf->pb_info.pi_nnis <= lp->lp_data->pb_nnis) { + memcpy(&lp->lp_data->pb_info, &pbuf->pb_info, + LNET_PING_INFO_SIZE(pbuf->pb_info.pi_nnis)); + CDEBUG(D_NET, "Ping/Push race from %s: %u vs %u\n", + libcfs_nid2str(lp->lp_primary_nid), + LNET_PING_BUFFER_SEQNO(pbuf), + LNET_PING_BUFFER_SEQNO(lp->lp_data)); + } + goto out; + } + + /* + * Allocate a buffer to copy the data. On a failure we drop + * the Push and set FORCE_PING to force the discovery + * thread to fix the problem by pinging the peer. + */ + lp->lp_data = lnet_ping_buffer_alloc(lp->lp_data_nnis, GFP_ATOMIC); + if (!lp->lp_data) { + lp->lp_state |= LNET_PEER_FORCE_PING; + CDEBUG(D_NET, "Cannot allocate Push buffer for %s %u\n", + libcfs_nid2str(lp->lp_primary_nid), + LNET_PING_BUFFER_SEQNO(pbuf)); + goto out; + } + + /* Success */ + memcpy(&lp->lp_data->pb_info, &pbuf->pb_info, + LNET_PING_INFO_SIZE(pbuf->pb_info.pi_nnis)); + lp->lp_state |= LNET_PEER_DATA_PRESENT; + CDEBUG(D_NET, "Received Push %s %u\n", + libcfs_nid2str(lp->lp_primary_nid), + LNET_PING_BUFFER_SEQNO(pbuf)); + +out: + /* + * Queue the peer for discovery, and wake the discovery thread + * if the peer was already queued, because its status changed. 
+ */ + spin_unlock(&lp->lp_lock); + lnet_net_lock(LNET_LOCK_EX); + if (lnet_peer_queue_for_discovery(lp)) + wake_up(&the_lnet.ln_dc_waitq); + /* Drop refcount from lookup */ lnet_peer_decref_locked(lp); + lnet_net_unlock(LNET_LOCK_EX); +} + +/* + * Clear the discovery error state, unless we're already discovering + * this peer, in which case the error is current. + */ +static void lnet_peer_clear_discovery_error(struct lnet_peer *lp) +{ + spin_lock(&lp->lp_lock); + if (!(lp->lp_state & LNET_PEER_DISCOVERING)) + lp->lp_dc_error = 0; + spin_unlock(&lp->lp_lock); } /* @@ -1608,7 +1960,7 @@ static void lnet_peer_discovery_complete(struct lnet_peer *lp) * because discovery could tear down an lnet_peer. */ int -lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt) +lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt, bool block) { DEFINE_WAIT(wait); struct lnet_peer *lp; @@ -1617,25 +1969,40 @@ lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt) again: lnet_net_unlock(cpt); lnet_net_lock(LNET_LOCK_EX); + lp = lpni->lpni_peer_net->lpn_peer; + lnet_peer_clear_discovery_error(lp); - /* We're willing to be interrupted. */ + /* + * We're willing to be interrupted. The lpni can become a + * zombie if we race with DLC, so we must check for that. + */ for (;;) { - lp = lpni->lpni_peer_net->lpn_peer; prepare_to_wait(&lp->lp_dc_waitq, &wait, TASK_INTERRUPTIBLE); if (signal_pending(current)) break; if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING) break; + if (lp->lp_dc_error) + break; if (lnet_peer_is_uptodate(lp)) break; lnet_peer_queue_for_discovery(lp); lnet_peer_addref_locked(lp); + /* + * if caller requested a non-blocking operation then + * return immediately. Once discovery is complete then the + * peer ref will be decremented and any pending messages + * that were stopped due to discovery will be transmitted. 
+ */ + if (!block) + break; lnet_net_unlock(LNET_LOCK_EX); schedule(); finish_wait(&lp->lp_dc_waitq, &wait); lnet_net_lock(LNET_LOCK_EX); lnet_peer_decref_locked(lp); - /* Do not use lp beyond this point. */ + /* Peer may have changed */ + lp = lpni->lpni_peer_net->lpn_peer; } finish_wait(&lp->lp_dc_waitq, &wait); @@ -1646,71 +2013,969 @@ lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt) rc = -EINTR; else if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING) rc = -ESHUTDOWN; + else if (lp->lp_dc_error) + rc = lp->lp_dc_error; + else if (!block) + CDEBUG(D_NET, "non-blocking discovery\n"); else if (!lnet_peer_is_uptodate(lp)) goto again; + CDEBUG(D_NET, "peer %s NID %s: %d. %s\n", + (lp ? libcfs_nid2str(lp->lp_primary_nid) : "(none)"), + libcfs_nid2str(lpni->lpni_nid), rc, + (!block) ? "pending discovery" : "discovery complete"); + return rc; } -/* - * Event handler for the discovery EQ. - * - * Called with lnet_res_lock(cpt) held. The cpt is the - * lnet_cpt_of_cookie() of the md handle cookie. - */ -static void lnet_discovery_event_handler(struct lnet_event *event) +/* Handle an incoming ack for a push. */ +static void +lnet_discovery_event_ack(struct lnet_peer *lp, struct lnet_event *ev) { - wake_up(&the_lnet.ln_dc_waitq); + struct lnet_ping_buffer *pbuf; + + pbuf = LNET_PING_INFO_TO_BUFFER(ev->md.start); + spin_lock(&lp->lp_lock); + lp->lp_state &= ~LNET_PEER_PUSH_SENT; + lp->lp_push_error = ev->status; + if (ev->status) + lp->lp_state |= LNET_PEER_PUSH_FAILED; + else + lp->lp_node_seqno = LNET_PING_BUFFER_SEQNO(pbuf); + spin_unlock(&lp->lp_lock); + + CDEBUG(D_NET, "peer %s ev->status %d\n", + libcfs_nid2str(lp->lp_primary_nid), ev->status); } -/* - * Wait for work to be queued or some other change that must be - * attended to. Returns non-zero if the discovery thread should shut - * down. - */ -static int lnet_peer_discovery_wait_for_work(void) +/* Handle a Reply message. This is the reply to a Ping message. 
*/ +static void +lnet_discovery_event_reply(struct lnet_peer *lp, struct lnet_event *ev) { - int cpt; - int rc = 0; + struct lnet_ping_buffer *pbuf; + int rc; - DEFINE_WAIT(wait); + spin_lock(&lp->lp_lock); - cpt = lnet_net_lock_current(); - for (;;) { - prepare_to_wait(&the_lnet.ln_dc_waitq, &wait, - TASK_IDLE); - if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING) - break; - if (lnet_push_target_resize_needed()) - break; - if (!list_empty(&the_lnet.ln_dc_request)) - break; - lnet_net_unlock(cpt); - schedule(); - finish_wait(&the_lnet.ln_dc_waitq, &wait); - cpt = lnet_net_lock_current(); + /* + * If some kind of error happened the contents of message + * cannot be used. Set PING_FAILED to trigger a retry. + */ + if (ev->status) { + lp->lp_state |= LNET_PEER_PING_FAILED; + lp->lp_ping_error = ev->status; + CDEBUG(D_NET, "Ping Reply error %d from %s (source %s)\n", + ev->status, + libcfs_nid2str(lp->lp_primary_nid), + libcfs_nid2str(ev->source.nid)); + goto out; } - finish_wait(&the_lnet.ln_dc_waitq, &wait); - - if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING) - rc = -ESHUTDOWN; - lnet_net_unlock(cpt); + pbuf = LNET_PING_INFO_TO_BUFFER(ev->md.start); + if (pbuf->pb_info.pi_magic == __swab32(LNET_PROTO_PING_MAGIC)) + lnet_swap_pinginfo(pbuf); - CDEBUG(D_NET, "woken: %d\n", rc); + /* + * A reply with invalid or corrupted info. Set PING_FAILED to + * trigger a retry. + */ + rc = lnet_ping_info_validate(&pbuf->pb_info); + if (rc) { + lp->lp_state |= LNET_PEER_PING_FAILED; + lp->lp_ping_error = 0; + CDEBUG(D_NET, "Corrupted Ping Reply from %s: %d\n", + libcfs_nid2str(lp->lp_primary_nid), rc); + goto out; + } - return rc; -} + /* + * Update the MULTI_RAIL flag based on the reply. If the peer + * was configured with DLC then the setting should match what + * DLC put in. 
+ */ + if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL) { + if (lp->lp_state & LNET_PEER_MULTI_RAIL) { + /* Everything's fine */ + } else if (lp->lp_state & LNET_PEER_CONFIGURED) { + CWARN("Reply says %s is Multi-Rail, DLC says not\n", + libcfs_nid2str(lp->lp_primary_nid)); + } else { + lp->lp_state |= LNET_PEER_MULTI_RAIL; + lnet_peer_clr_non_mr_pref_nids(lp); + } + } else if (lp->lp_state & LNET_PEER_MULTI_RAIL) { + if (lp->lp_state & LNET_PEER_CONFIGURED) { + CWARN("DLC says %s is Multi-Rail, Reply says not\n", + libcfs_nid2str(lp->lp_primary_nid)); + } else { + CERROR("Multi-Rail state vanished from %s\n", + libcfs_nid2str(lp->lp_primary_nid)); + lp->lp_state &= ~LNET_PEER_MULTI_RAIL; + } + } -/* The discovery thread. */ -static int lnet_peer_discovery(void *arg) -{ - struct lnet_peer *lp; + /* + * Make sure we'll allocate the correct size ping buffer when + * pinging the peer. + */ + if (lp->lp_data_nnis < pbuf->pb_info.pi_nnis) + lp->lp_data_nnis = pbuf->pb_info.pi_nnis; - CDEBUG(D_NET, "started\n"); + /* + * The peer may have discovery disabled at its end. Set + * NO_DISCOVERY as appropriate. + */ + if (!(pbuf->pb_info.pi_features & LNET_PING_FEAT_DISCOVERY)) { + CDEBUG(D_NET, "Peer %s has discovery disabled\n", + libcfs_nid2str(lp->lp_primary_nid)); + lp->lp_state |= LNET_PEER_NO_DISCOVERY; + } else if (lp->lp_state & LNET_PEER_NO_DISCOVERY) { + CDEBUG(D_NET, "Peer %s has discovery enabled\n", + libcfs_nid2str(lp->lp_primary_nid)); + lp->lp_state &= ~LNET_PEER_NO_DISCOVERY; + } - for (;;) { - if (lnet_peer_discovery_wait_for_work()) + /* + * Check for truncation of the Reply. Clear PING_SENT and set + * PING_FAILED to trigger a retry. 
+ */ + if (pbuf->pb_nnis < pbuf->pb_info.pi_nnis) { + if (the_lnet.ln_push_target_nnis < pbuf->pb_info.pi_nnis) + the_lnet.ln_push_target_nnis = pbuf->pb_info.pi_nnis; + lp->lp_state |= LNET_PEER_PING_FAILED; + lp->lp_ping_error = 0; + CDEBUG(D_NET, "Truncated Reply from %s (%d nids)\n", + libcfs_nid2str(lp->lp_primary_nid), + pbuf->pb_info.pi_nnis); + goto out; + } + + /* + * Check the sequence numbers in the reply. These are only + * available if the reply came from a Multi-Rail peer. + */ + if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL && + pbuf->pb_info.pi_nnis > 1 && + lp->lp_primary_nid == pbuf->pb_info.pi_ni[1].ns_nid) { + if (LNET_PING_BUFFER_SEQNO(pbuf) < lp->lp_peer_seqno) { + CDEBUG(D_NET, "Stale Reply from %s: got %u have %u\n", + libcfs_nid2str(lp->lp_primary_nid), + LNET_PING_BUFFER_SEQNO(pbuf), + lp->lp_peer_seqno); + goto out; + } + + if (LNET_PING_BUFFER_SEQNO(pbuf) > lp->lp_peer_seqno) + lp->lp_peer_seqno = LNET_PING_BUFFER_SEQNO(pbuf); + } + + /* We're happy with the state of the data in the buffer. */ + CDEBUG(D_NET, "peer %s data present %u\n", + libcfs_nid2str(lp->lp_primary_nid), lp->lp_peer_seqno); + if (lp->lp_state & LNET_PEER_DATA_PRESENT) + lnet_ping_buffer_decref(lp->lp_data); + else + lp->lp_state |= LNET_PEER_DATA_PRESENT; + lnet_ping_buffer_addref(pbuf); + lp->lp_data = pbuf; +out: + lp->lp_state &= ~LNET_PEER_PING_SENT; + spin_unlock(&lp->lp_lock); +} + +/* + * Send event handling. Only matters for error cases, where we clean + * up state on the peer and peer_ni that would otherwise be updated in + * the REPLY event handler for a successful Ping, and the ACK event + * handler for a successful Push. 
+ */ +static int +lnet_discovery_event_send(struct lnet_peer *lp, struct lnet_event *ev) +{ + int rc = 0; + + if (!ev->status) + goto out; + + spin_lock(&lp->lp_lock); + if (ev->msg_type == LNET_MSG_GET) { + lp->lp_state &= ~LNET_PEER_PING_SENT; + lp->lp_state |= LNET_PEER_PING_FAILED; + lp->lp_ping_error = ev->status; + } else { /* ev->msg_type == LNET_MSG_PUT */ + lp->lp_state &= ~LNET_PEER_PUSH_SENT; + lp->lp_state |= LNET_PEER_PUSH_FAILED; + lp->lp_push_error = ev->status; + } + spin_unlock(&lp->lp_lock); + rc = LNET_REDISCOVER_PEER; +out: + CDEBUG(D_NET, "%s Send to %s: %d\n", + (ev->msg_type == LNET_MSG_GET ? "Ping" : "Push"), + libcfs_nid2str(ev->target.nid), rc); + return rc; +} + +/* + * Unlink event handling. This event is only seen if a call to + * LNetMDUnlink() caused the event to be unlinked. If this call was + * made after the event was set up in LNetGet() or LNetPut() then we + * assume the Ping or Push timed out. + */ +static void +lnet_discovery_event_unlink(struct lnet_peer *lp, struct lnet_event *ev) +{ + spin_lock(&lp->lp_lock); + /* We've passed through LNetGet() */ + if (lp->lp_state & LNET_PEER_PING_SENT) { + lp->lp_state &= ~LNET_PEER_PING_SENT; + lp->lp_state |= LNET_PEER_PING_FAILED; + lp->lp_ping_error = -ETIMEDOUT; + CDEBUG(D_NET, "Ping Unlink for message to peer %s\n", + libcfs_nid2str(lp->lp_primary_nid)); + } + /* We've passed through LNetPut() */ + if (lp->lp_state & LNET_PEER_PUSH_SENT) { + lp->lp_state &= ~LNET_PEER_PUSH_SENT; + lp->lp_state |= LNET_PEER_PUSH_FAILED; + lp->lp_push_error = -ETIMEDOUT; + CDEBUG(D_NET, "Push Unlink for message to peer %s\n", + libcfs_nid2str(lp->lp_primary_nid)); + } + spin_unlock(&lp->lp_lock); +} + +/* + * Event handler for the discovery EQ. + * + * Called with lnet_res_lock(cpt) held. The cpt is the + * lnet_cpt_of_cookie() of the md handle cookie. 
+ */ +static void lnet_discovery_event_handler(struct lnet_event *event) +{ + struct lnet_peer *lp = event->md.user_ptr; + struct lnet_ping_buffer *pbuf; + int rc; + + /* discovery needs to take another look */ + rc = LNET_REDISCOVER_PEER; + + CDEBUG(D_NET, "Received event: %d\n", event->type); + + switch (event->type) { + case LNET_EVENT_ACK: + lnet_discovery_event_ack(lp, event); + break; + case LNET_EVENT_REPLY: + lnet_discovery_event_reply(lp, event); + break; + case LNET_EVENT_SEND: + /* Only send failure triggers a retry. */ + rc = lnet_discovery_event_send(lp, event); + break; + case LNET_EVENT_UNLINK: + /* LNetMDUnlink() was called */ + lnet_discovery_event_unlink(lp, event); + break; + default: + /* Invalid events. */ + LBUG(); + } + lnet_net_lock(LNET_LOCK_EX); + if (event->unlinked) { + pbuf = LNET_PING_INFO_TO_BUFFER(event->md.start); + lnet_ping_buffer_decref(pbuf); + lnet_peer_decref_locked(lp); + } + if (rc == LNET_REDISCOVER_PEER) { + list_move_tail(&lp->lp_dc_list, &the_lnet.ln_dc_request); + wake_up(&the_lnet.ln_dc_waitq); + } + lnet_net_unlock(LNET_LOCK_EX); +} + +/* + * Build a peer from incoming data. + * + * The NIDs in the incoming data are supposed to be structured as follows: + * - loopback + * - primary NID + * - other NIDs in same net + * - NIDs in second net + * - NIDs in third net + * - ... + * This is due to the way the list of NIDs in the data is created. + * + * Note that this function will mark the peer uptodate unless an + * ENOMEM is encountered. All other errors are due to a conflict + * between the DLC configuration and what discovery sees. We treat DLC + * as binding, and therefore set the NIDS_UPTODATE flag to prevent the + * peer from becoming stuck in discovery.
+ */ +static int lnet_peer_merge_data(struct lnet_peer *lp, + struct lnet_ping_buffer *pbuf) +{ + struct lnet_peer_ni *lpni; + lnet_nid_t *curnis = NULL; + lnet_nid_t *addnis = NULL; + lnet_nid_t *delnis = NULL; + unsigned int flags; + int ncurnis; + int naddnis; + int ndelnis; + int nnis = 0; + int i; + int j; + int rc; + + flags = LNET_PEER_DISCOVERED; + if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL) + flags |= LNET_PEER_MULTI_RAIL; + + nnis = max_t(int, lp->lp_nnis, pbuf->pb_info.pi_nnis); + curnis = kmalloc_array(nnis, sizeof(lnet_nid_t), GFP_NOFS); + addnis = kmalloc_array(nnis, sizeof(lnet_nid_t), GFP_NOFS); + delnis = kmalloc_array(nnis, sizeof(lnet_nid_t), GFP_NOFS); + if (!curnis || !addnis || !delnis) { + rc = -ENOMEM; + goto out; + } + ncurnis = 0; + naddnis = 0; + ndelnis = 0; + + /* Construct the list of NIDs present in peer. */ + lpni = NULL; + while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) + curnis[ncurnis++] = lpni->lpni_nid; + + /* + * Check for NIDs in pbuf not present in curnis[]. + * The loop starts at 1 to skip the loopback NID. + */ + for (i = 1; i < pbuf->pb_info.pi_nnis; i++) { + for (j = 0; j < ncurnis; j++) + if (pbuf->pb_info.pi_ni[i].ns_nid == curnis[j]) + break; + if (j == ncurnis) + addnis[naddnis++] = pbuf->pb_info.pi_ni[i].ns_nid; + } + /* + * Check for NIDs in curnis[] not present in pbuf. + * The nested loop starts at 1 to skip the loopback NID. + * + * But never add the loopback NID to delnis[]: if it is + * present in curnis[] then this peer is for this node. 
+ */ + for (i = 0; i < ncurnis; i++) { + if (LNET_NETTYP(LNET_NIDNET(curnis[i])) == LOLND) + continue; + for (j = 1; j < pbuf->pb_info.pi_nnis; j++) + if (curnis[i] == pbuf->pb_info.pi_ni[j].ns_nid) + break; + if (j == pbuf->pb_info.pi_nnis) + delnis[ndelnis++] = curnis[i]; + } + + for (i = 0; i < naddnis; i++) { + rc = lnet_peer_add_nid(lp, addnis[i], flags); + if (rc) { + CERROR("Error adding NID %s to peer %s: %d\n", + libcfs_nid2str(addnis[i]), + libcfs_nid2str(lp->lp_primary_nid), rc); + if (rc == -ENOMEM) + goto out; + } + } + for (i = 0; i < ndelnis; i++) { + rc = lnet_peer_del_nid(lp, delnis[i], flags); + if (rc) { + CERROR("Error deleting NID %s from peer %s: %d\n", + libcfs_nid2str(delnis[i]), + libcfs_nid2str(lp->lp_primary_nid), rc); + if (rc == -ENOMEM) + goto out; + } + } + /* + * Errors other than -ENOMEM are due to peers having been + * configured with DLC. Ignore these because DLC overrides + * Discovery. + */ + rc = 0; +out: + kfree(curnis); + kfree(addnis); + kfree(delnis); + lnet_ping_buffer_decref(pbuf); + CDEBUG(D_NET, "peer %s: %d\n", libcfs_nid2str(lp->lp_primary_nid), rc); + + if (rc) { + spin_lock(&lp->lp_lock); + lp->lp_state &= ~LNET_PEER_NIDS_UPTODATE; + lp->lp_state |= LNET_PEER_FORCE_PING; + spin_unlock(&lp->lp_lock); + } + return rc; +} + +/* + * The data in pbuf says lp is its primary peer, but the data was + * received by a different peer. Try to update lp with the data. + */ +static int +lnet_peer_set_primary_data(struct lnet_peer *lp, struct lnet_ping_buffer *pbuf) +{ + struct lnet_handle_md mdh; + + /* Queue lp for discovery, and force it on the request queue. */ + lnet_net_lock(LNET_LOCK_EX); + if (lnet_peer_queue_for_discovery(lp)) + list_move(&lp->lp_dc_list, &the_lnet.ln_dc_request); + lnet_net_unlock(LNET_LOCK_EX); + + LNetInvalidateMDHandle(&mdh); + + /* + * Decide whether we can move the peer to the DATA_PRESENT state. 
+ * + * We replace stale data for a multi-rail peer, repair PING_FAILED + * status, and preempt FORCE_PING. + * + * If after that we have DATA_PRESENT, we merge it into this peer. + */ + spin_lock(&lp->lp_lock); + if (lp->lp_state & LNET_PEER_MULTI_RAIL) { + if (lp->lp_peer_seqno < LNET_PING_BUFFER_SEQNO(pbuf)) { + lp->lp_peer_seqno = LNET_PING_BUFFER_SEQNO(pbuf); + } else if (lp->lp_state & LNET_PEER_DATA_PRESENT) { + lp->lp_state &= ~LNET_PEER_DATA_PRESENT; + lnet_ping_buffer_decref(pbuf); + pbuf = lp->lp_data; + lp->lp_data = NULL; + } + } + if (lp->lp_state & LNET_PEER_DATA_PRESENT) { + lnet_ping_buffer_decref(lp->lp_data); + lp->lp_data = NULL; + lp->lp_state &= ~LNET_PEER_DATA_PRESENT; + } + if (lp->lp_state & LNET_PEER_PING_FAILED) { + mdh = lp->lp_ping_mdh; + LNetInvalidateMDHandle(&lp->lp_ping_mdh); + lp->lp_state &= ~LNET_PEER_PING_FAILED; + lp->lp_ping_error = 0; + } + if (lp->lp_state & LNET_PEER_FORCE_PING) + lp->lp_state &= ~LNET_PEER_FORCE_PING; + lp->lp_state |= LNET_PEER_NIDS_UPTODATE; + spin_unlock(&lp->lp_lock); + + if (!LNetMDHandleIsInvalid(mdh)) + LNetMDUnlink(mdh); + + if (pbuf) + return lnet_peer_merge_data(lp, pbuf); + + CDEBUG(D_NET, "peer %s\n", libcfs_nid2str(lp->lp_primary_nid)); + return 0; +} + +/* + * Update a peer using the data received. + */ +static int lnet_peer_data_present(struct lnet_peer *lp) +__must_hold(&lp->lp_lock) +{ + struct lnet_ping_buffer *pbuf; + struct lnet_peer_ni *lpni; + lnet_nid_t nid = LNET_NID_ANY; + unsigned int flags; + int rc = 0; + + pbuf = lp->lp_data; + lp->lp_data = NULL; + lp->lp_state &= ~LNET_PEER_DATA_PRESENT; + lp->lp_state |= LNET_PEER_NIDS_UPTODATE; + spin_unlock(&lp->lp_lock); + + /* + * Modifications of peer structures are done while holding the + * ln_api_mutex. A global lock is required because we may be + * modifying multiple peer structures, and a mutex greatly + * simplifies memory management. 
+ * + * The actual changes to the data structures must also protect + * against concurrent lookups, for which the lnet_net_lock in + * LNET_LOCK_EX mode is used. + */ + mutex_lock(&the_lnet.ln_api_mutex); + if (the_lnet.ln_state == LNET_STATE_SHUTDOWN) { + rc = -ESHUTDOWN; + goto out; + } + + /* + * If this peer is not on the peer list then it is being torn + * down, and our reference count may be all that is keeping it + * alive. Don't do any work on it. + */ + if (list_empty(&lp->lp_peer_list)) + goto out; + + flags = LNET_PEER_DISCOVERED; + if (pbuf->pb_info.pi_features & LNET_PING_FEAT_MULTI_RAIL) + flags |= LNET_PEER_MULTI_RAIL; + + /* + * Check whether the primary NID in the message matches the + * primary NID of the peer. If it does, update the peer. If + * it does not, check whether there is already a peer with + * that primary NID. If no such peer exists, try to update + * the primary NID of the current peer (allowed if it was + * created due to message traffic) and complete the update. + * If the peer did exist, hand off the data to it. + * + * The peer for the loopback interface is a special case: this + * is the peer for the local node, and we want to set its + * primary NID to the correct value here.
+ */ + if (pbuf->pb_info.pi_nnis > 1) + nid = pbuf->pb_info.pi_ni[1].ns_nid; + if (LNET_NETTYP(LNET_NIDNET(lp->lp_primary_nid)) == LOLND) { + rc = lnet_peer_set_primary_nid(lp, nid, flags); + if (!rc) + rc = lnet_peer_merge_data(lp, pbuf); + } else if (lp->lp_primary_nid == nid) { + rc = lnet_peer_merge_data(lp, pbuf); + } else { + lpni = lnet_find_peer_ni_locked(nid); + if (!lpni) { + rc = lnet_peer_set_primary_nid(lp, nid, flags); + if (rc) { + CERROR("Primary NID error %s versus %s: %d\n", + libcfs_nid2str(lp->lp_primary_nid), + libcfs_nid2str(nid), rc); + } else { + rc = lnet_peer_merge_data(lp, pbuf); + } + } else { + rc = lnet_peer_set_primary_data( + lpni->lpni_peer_net->lpn_peer, pbuf); + lnet_peer_ni_decref_locked(lpni); + } + } +out: + CDEBUG(D_NET, "peer %s: %d\n", libcfs_nid2str(lp->lp_primary_nid), rc); + mutex_unlock(&the_lnet.ln_api_mutex); + + spin_lock(&lp->lp_lock); + /* Tell discovery to re-check the peer immediately. */ + if (!rc) + rc = LNET_REDISCOVER_PEER; + return rc; +} + +/* + * A ping failed. Clear the PING_FAILED state and set the + * FORCE_PING state, to ensure a retry even if discovery is + * disabled. This avoids being left with incorrect state. + */ +static int lnet_peer_ping_failed(struct lnet_peer *lp) +__must_hold(&lp->lp_lock) +{ + struct lnet_handle_md mdh; + int rc; + + mdh = lp->lp_ping_mdh; + LNetInvalidateMDHandle(&lp->lp_ping_mdh); + lp->lp_state &= ~LNET_PEER_PING_FAILED; + lp->lp_state |= LNET_PEER_FORCE_PING; + rc = lp->lp_ping_error; + lp->lp_ping_error = 0; + spin_unlock(&lp->lp_lock); + + if (!LNetMDHandleIsInvalid(mdh)) + LNetMDUnlink(mdh); + + CDEBUG(D_NET, "peer %s:%d\n", + libcfs_nid2str(lp->lp_primary_nid), rc); + + spin_lock(&lp->lp_lock); + return rc ? rc : LNET_REDISCOVER_PEER; +} + +/* + * Select NID to send a Ping or Push to. + */ +static lnet_nid_t lnet_peer_select_nid(struct lnet_peer *lp) +{ + struct lnet_peer_ni *lpni; + + /* Look for a direct-connected NID for this peer. 
*/ + lpni = NULL; + while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) { + if (!lnet_is_peer_ni_healthy_locked(lpni)) + continue; + if (!lnet_get_net_locked(lpni->lpni_peer_net->lpn_net_id)) + continue; + break; + } + if (lpni) + return lpni->lpni_nid; + + /* Look for a routed-connected NID for this peer. */ + lpni = NULL; + while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) { + if (!lnet_is_peer_ni_healthy_locked(lpni)) + continue; + if (!lnet_find_rnet_locked(lpni->lpni_peer_net->lpn_net_id)) + continue; + break; + } + if (lpni) + return lpni->lpni_nid; + + return LNET_NID_ANY; +} + +/* Active side of ping. */ +static int lnet_peer_send_ping(struct lnet_peer *lp) +__must_hold(&lp->lp_lock) +{ + struct lnet_md md = { NULL }; + struct lnet_process_id id; + struct lnet_ping_buffer *pbuf; + int nnis; + int rc; + int cpt; + + lp->lp_state |= LNET_PEER_PING_SENT; + lp->lp_state &= ~LNET_PEER_FORCE_PING; + spin_unlock(&lp->lp_lock); + + nnis = max_t(int, lp->lp_data_nnis, LNET_INTERFACES_MIN); + pbuf = lnet_ping_buffer_alloc(nnis, GFP_NOFS); + if (!pbuf) { + rc = -ENOMEM; + goto fail_error; + } + + /* initialize md content */ + md.start = &pbuf->pb_info; + md.length = LNET_PING_INFO_SIZE(nnis); + md.threshold = 2; /* GET/REPLY */ + md.max_size = 0; + md.options = LNET_MD_TRUNCATE; + md.user_ptr = lp; + md.eq_handle = the_lnet.ln_dc_eqh; + + rc = LNetMDBind(md, LNET_UNLINK, &lp->lp_ping_mdh); + if (rc != 0) { + lnet_ping_buffer_decref(pbuf); + CERROR("Can't bind MD: %d\n", rc); + goto fail_error; + } + cpt = lnet_net_lock_current(); + /* Refcount for MD. 
*/ + lnet_peer_addref_locked(lp); + id.pid = LNET_PID_LUSTRE; + id.nid = lnet_peer_select_nid(lp); + lnet_net_unlock(cpt); + + if (id.nid == LNET_NID_ANY) { + rc = -EHOSTUNREACH; + goto fail_unlink_md; + } + + rc = LNetGet(LNET_NID_ANY, lp->lp_ping_mdh, id, + LNET_RESERVED_PORTAL, + LNET_PROTO_PING_MATCHBITS, 0); + + if (rc) + goto fail_unlink_md; + + CDEBUG(D_NET, "peer %s\n", libcfs_nid2str(lp->lp_primary_nid)); + + spin_lock(&lp->lp_lock); + return 0; + +fail_unlink_md: + LNetMDUnlink(lp->lp_ping_mdh); + LNetInvalidateMDHandle(&lp->lp_ping_mdh); +fail_error: + CDEBUG(D_NET, "peer %s: %d\n", libcfs_nid2str(lp->lp_primary_nid), rc); + /* + * The errors that get us here are considered hard errors and + * cause Discovery to terminate. So we clear PING_SENT, but do + * not set either PING_FAILED or FORCE_PING. In fact we need + * to clear PING_FAILED, because the unlink event handler will + * have set it if we called LNetMDUnlink() above. + */ + spin_lock(&lp->lp_lock); + lp->lp_state &= ~(LNET_PEER_PING_SENT | LNET_PEER_PING_FAILED); + return rc; +} + +/* + * This function exists because you cannot call LNetMDUnlink() from an + * event handler. + */ +static int lnet_peer_push_failed(struct lnet_peer *lp) +__must_hold(&lp->lp_lock) +{ + struct lnet_handle_md mdh; + int rc; + + mdh = lp->lp_push_mdh; + LNetInvalidateMDHandle(&lp->lp_push_mdh); + lp->lp_state &= ~LNET_PEER_PUSH_FAILED; + rc = lp->lp_push_error; + lp->lp_push_error = 0; + spin_unlock(&lp->lp_lock); + + if (!LNetMDHandleIsInvalid(mdh)) + LNetMDUnlink(mdh); + + CDEBUG(D_NET, "peer %s\n", libcfs_nid2str(lp->lp_primary_nid)); + spin_lock(&lp->lp_lock); + return rc ? rc : LNET_REDISCOVER_PEER; +} + +/* Active side of push. */ +static int lnet_peer_send_push(struct lnet_peer *lp) +__must_hold(&lp->lp_lock) +{ + struct lnet_ping_buffer *pbuf; + struct lnet_process_id id; + struct lnet_md md; + int cpt; + int rc; + + /* Don't push to a non-multi-rail peer. 
*/ + if (!(lp->lp_state & LNET_PEER_MULTI_RAIL)) { + lp->lp_state &= ~LNET_PEER_FORCE_PUSH; + return 0; + } + + lp->lp_state |= LNET_PEER_PUSH_SENT; + lp->lp_state &= ~LNET_PEER_FORCE_PUSH; + spin_unlock(&lp->lp_lock); + + cpt = lnet_net_lock_current(); + pbuf = the_lnet.ln_ping_target; + lnet_ping_buffer_addref(pbuf); + lnet_net_unlock(cpt); + + /* Push source MD */ + md.start = &pbuf->pb_info; + md.length = LNET_PING_INFO_SIZE(pbuf->pb_nnis); + md.threshold = 2; /* Put/Ack */ + md.max_size = 0; + md.options = 0; + md.eq_handle = the_lnet.ln_dc_eqh; + md.user_ptr = lp; + + rc = LNetMDBind(md, LNET_UNLINK, &lp->lp_push_mdh); + if (rc) { + lnet_ping_buffer_decref(pbuf); + CERROR("Can't bind push source MD: %d\n", rc); + goto fail_error; + } + cpt = lnet_net_lock_current(); + /* Refcount for MD. */ + lnet_peer_addref_locked(lp); + id.pid = LNET_PID_LUSTRE; + id.nid = lnet_peer_select_nid(lp); + lnet_net_unlock(cpt); + + if (id.nid == LNET_NID_ANY) { + rc = -EHOSTUNREACH; + goto fail_unlink; + } + + rc = LNetPut(LNET_NID_ANY, lp->lp_push_mdh, + LNET_ACK_REQ, id, LNET_RESERVED_PORTAL, + LNET_PROTO_PING_MATCHBITS, 0, 0); + + if (rc) + goto fail_unlink; + + CDEBUG(D_NET, "peer %s\n", libcfs_nid2str(lp->lp_primary_nid)); + + spin_lock(&lp->lp_lock); + return 0; + +fail_unlink: + LNetMDUnlink(lp->lp_push_mdh); + LNetInvalidateMDHandle(&lp->lp_push_mdh); +fail_error: + CDEBUG(D_NET, "peer %s: %d\n", libcfs_nid2str(lp->lp_primary_nid), rc); + /* + * The errors that get us here are considered hard errors and + * cause Discovery to terminate. So we clear PUSH_SENT, but do + * not set PUSH_FAILED. In fact we need to clear PUSH_FAILED, + * because the unlink event handler will have set it if we + * called LNetMDUnlink() above. + */ + spin_lock(&lp->lp_lock); + lp->lp_state &= ~(LNET_PEER_PUSH_SENT | LNET_PEER_PUSH_FAILED); + return rc; +} + +/* + * An unrecoverable error was encountered during discovery. + * Set error status in peer and abort discovery. 
+ */ +static void lnet_peer_discovery_error(struct lnet_peer *lp, int error) +{ + CDEBUG(D_NET, "Discovery error %s: %d\n", + libcfs_nid2str(lp->lp_primary_nid), error); + + spin_lock(&lp->lp_lock); + lp->lp_dc_error = error; + lp->lp_state &= ~LNET_PEER_DISCOVERING; + lp->lp_state |= LNET_PEER_REDISCOVER; + spin_unlock(&lp->lp_lock); +} + +/* + * Mark the peer as discovered. + */ +static int lnet_peer_discovered(struct lnet_peer *lp) +__must_hold(&lp->lp_lock) +{ + lp->lp_state |= LNET_PEER_DISCOVERED; + lp->lp_state &= ~(LNET_PEER_DISCOVERING | + LNET_PEER_REDISCOVER); + + CDEBUG(D_NET, "peer %s\n", libcfs_nid2str(lp->lp_primary_nid)); + + return 0; +} + +/* + * Mark the peer as to be rediscovered. + */ +static int lnet_peer_rediscover(struct lnet_peer *lp) +__must_hold(&lp->lp_lock) +{ + lp->lp_state |= LNET_PEER_REDISCOVER; + lp->lp_state &= ~LNET_PEER_DISCOVERING; + + CDEBUG(D_NET, "peer %s\n", libcfs_nid2str(lp->lp_primary_nid)); + + return 0; +} + +/* + * Returns the first peer on the ln_dc_working queue if its timeout + * has expired. Takes the current time as an argument so as to not + * obsessively re-check the clock. The oldest discovery request will + * be at the head of the queue. + */ +static struct lnet_peer *lnet_peer_dc_timed_out(time64_t now) +{ + struct lnet_peer *lp; + + if (list_empty(&the_lnet.ln_dc_working)) + return NULL; + lp = list_first_entry(&the_lnet.ln_dc_working, + struct lnet_peer, lp_dc_list); + if (now < lp->lp_last_queued + DISCOVERY_TIMEOUT) + return NULL; + return lp; +} + +/* + * Discovering this peer is taking too long. Cancel any Ping or Push + * that discovery is waiting on by unlinking the relevant MDs. The + * lnet_discovery_event_handler() will proceed from here and complete + * the cleanup. 
+ */ +static void lnet_peer_discovery_timeout(struct lnet_peer *lp) +{ + struct lnet_handle_md ping_mdh; + struct lnet_handle_md push_mdh; + + LNetInvalidateMDHandle(&ping_mdh); + LNetInvalidateMDHandle(&push_mdh); + + spin_lock(&lp->lp_lock); + if (lp->lp_state & LNET_PEER_PING_SENT) { + ping_mdh = lp->lp_ping_mdh; + LNetInvalidateMDHandle(&lp->lp_ping_mdh); + } + if (lp->lp_state & LNET_PEER_PUSH_SENT) { + push_mdh = lp->lp_push_mdh; + LNetInvalidateMDHandle(&lp->lp_push_mdh); + } + spin_unlock(&lp->lp_lock); + + if (!LNetMDHandleIsInvalid(ping_mdh)) + LNetMDUnlink(ping_mdh); + if (!LNetMDHandleIsInvalid(push_mdh)) + LNetMDUnlink(push_mdh); +} + +/* + * Wait for work to be queued or some other change that must be + * attended to. Returns non-zero if the discovery thread should shut + * down. + */ +static int lnet_peer_discovery_wait_for_work(void) +{ + int cpt; + int rc = 0; + + DEFINE_WAIT(wait); + + cpt = lnet_net_lock_current(); + for (;;) { + prepare_to_wait(&the_lnet.ln_dc_waitq, &wait, + TASK_IDLE); + if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING) + break; + if (lnet_push_target_resize_needed()) + break; + if (!list_empty(&the_lnet.ln_dc_request)) + break; + if (lnet_peer_dc_timed_out(ktime_get_real_seconds())) + break; + lnet_net_unlock(cpt); + + /* + * wakeup max every second to check if there are peers that + * have been stuck on the working queue for greater than + * the peer timeout. + */ + schedule_timeout(HZ); + finish_wait(&the_lnet.ln_dc_waitq, &wait); + cpt = lnet_net_lock_current(); + } + finish_wait(&the_lnet.ln_dc_waitq, &wait); + + if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING) + rc = -ESHUTDOWN; + + lnet_net_unlock(cpt); + + CDEBUG(D_NET, "woken: %d\n", rc); + + return rc; +} + +/* The discovery thread. 
*/ +static int lnet_peer_discovery(void *arg) +{ + struct lnet_peer *lp; + time64_t now; + int rc; + + CDEBUG(D_NET, "started\n"); + + for (;;) { + if (lnet_peer_discovery_wait_for_work()) break; if (lnet_push_target_resize_needed()) @@ -1719,33 +2984,97 @@ static int lnet_peer_discovery(void *arg) lnet_net_lock(LNET_LOCK_EX); if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING) break; + + /* + * Process all incoming discovery work requests. When + * discovery must wait on a peer to change state, it + * is added to the tail of the ln_dc_working queue. A + * timestamp keeps track of when the peer was added, + * so we can time out discovery requests that take too + * long. + */ while (!list_empty(&the_lnet.ln_dc_request)) { lp = list_first_entry(&the_lnet.ln_dc_request, struct lnet_peer, lp_dc_list); list_move(&lp->lp_dc_list, &the_lnet.ln_dc_working); + /* + * set the time the peer was put on the dc_working + * queue. It shouldn't remain on the queue + * forever, in case the GET message (for ping) + * doesn't get a REPLY or the PUT message (for + * push) doesn't get an ACK. + * + * TODO: LNet Health will deal with this scenario + * in a generic way. + */ + lp->lp_last_queued = ktime_get_real_seconds(); lnet_net_unlock(LNET_LOCK_EX); - /* Just tag and release for now. */ + /* + * Select an action depending on the state of + * the peer and whether discovery is disabled. + * The check whether discovery is disabled is + * done after the code that handles processing + * for arrived data, cleanup for failures, and + * forcing a Ping or Push. 
+ */ spin_lock(&lp->lp_lock); - if (lnet_peer_discovery_disabled) { - lp->lp_state |= LNET_PEER_REDISCOVER; - lp->lp_state &= ~(LNET_PEER_DISCOVERED | - LNET_PEER_NIDS_UPTODATE | - LNET_PEER_DISCOVERING); - } else { - lp->lp_state |= (LNET_PEER_DISCOVERED | - LNET_PEER_NIDS_UPTODATE); - lp->lp_state &= ~(LNET_PEER_REDISCOVER | - LNET_PEER_DISCOVERING); - } + CDEBUG(D_NET, "peer %s state %#x\n", + libcfs_nid2str(lp->lp_primary_nid), + lp->lp_state); + if (lp->lp_state & LNET_PEER_DATA_PRESENT) + rc = lnet_peer_data_present(lp); + else if (lp->lp_state & LNET_PEER_PING_FAILED) + rc = lnet_peer_ping_failed(lp); + else if (lp->lp_state & LNET_PEER_PUSH_FAILED) + rc = lnet_peer_push_failed(lp); + else if (lp->lp_state & LNET_PEER_FORCE_PING) + rc = lnet_peer_send_ping(lp); + else if (lp->lp_state & LNET_PEER_FORCE_PUSH) + rc = lnet_peer_send_push(lp); + else if (lnet_peer_discovery_disabled) + rc = lnet_peer_rediscover(lp); + else if (!(lp->lp_state & LNET_PEER_NIDS_UPTODATE)) + rc = lnet_peer_send_ping(lp); + else if (lnet_peer_needs_push(lp)) + rc = lnet_peer_send_push(lp); + else + rc = lnet_peer_discovered(lp); + CDEBUG(D_NET, "peer %s state %#x rc %d\n", + libcfs_nid2str(lp->lp_primary_nid), + lp->lp_state, rc); spin_unlock(&lp->lp_lock); lnet_net_lock(LNET_LOCK_EX); + if (rc == LNET_REDISCOVER_PEER) { + list_move(&lp->lp_dc_list, + &the_lnet.ln_dc_request); + } else if (rc) { + lnet_peer_discovery_error(lp, rc); + } if (!(lp->lp_state & LNET_PEER_DISCOVERING)) lnet_peer_discovery_complete(lp); if (the_lnet.ln_dc_state == LNET_DC_STATE_STOPPING) break; } + + /* + * Now that the ln_dc_request queue has been emptied + * check the ln_dc_working queue for peers that are + * taking too long. Move all that are found to the + * ln_dc_expired queue and time out any pending + * Ping or Push. We have to drop the lnet_net_lock + * in the loop because lnet_peer_discovery_timeout() + * calls LNetMDUnlink(). 
+ */ + now = ktime_get_real_seconds(); + while ((lp = lnet_peer_dc_timed_out(now)) != NULL) { + list_move(&lp->lp_dc_list, &the_lnet.ln_dc_expired); + lnet_net_unlock(LNET_LOCK_EX); + lnet_peer_discovery_timeout(lp); + lnet_net_lock(LNET_LOCK_EX); + } + lnet_net_unlock(LNET_LOCK_EX); } @@ -1759,23 +3088,28 @@ static int lnet_peer_discovery(void *arg) LNetEQFree(the_lnet.ln_dc_eqh); LNetInvalidateEQHandle(&the_lnet.ln_dc_eqh); + /* Queue cleanup 1: stop all pending pings and pushes. */ lnet_net_lock(LNET_LOCK_EX); - list_for_each_entry(lp, &the_lnet.ln_dc_request, lp_dc_list) { - spin_lock(&lp->lp_lock); - lp->lp_state |= LNET_PEER_REDISCOVER; - lp->lp_state &= ~(LNET_PEER_DISCOVERED | - LNET_PEER_DISCOVERING | - LNET_PEER_NIDS_UPTODATE); - spin_unlock(&lp->lp_lock); - lnet_peer_discovery_complete(lp); + while (!list_empty(&the_lnet.ln_dc_working)) { + lp = list_first_entry(&the_lnet.ln_dc_working, + struct lnet_peer, lp_dc_list); + list_move(&lp->lp_dc_list, &the_lnet.ln_dc_expired); + lnet_net_unlock(LNET_LOCK_EX); + lnet_peer_discovery_timeout(lp); + lnet_net_lock(LNET_LOCK_EX); } - list_for_each_entry(lp, &the_lnet.ln_dc_working, lp_dc_list) { - spin_lock(&lp->lp_lock); - lp->lp_state |= LNET_PEER_REDISCOVER; - lp->lp_state &= ~(LNET_PEER_DISCOVERED | - LNET_PEER_DISCOVERING | - LNET_PEER_NIDS_UPTODATE); - spin_unlock(&lp->lp_lock); + lnet_net_unlock(LNET_LOCK_EX); + + /* Queue cleanup 2: wait for the expired queue to clear. */ + while (!list_empty(&the_lnet.ln_dc_expired)) + schedule_timeout_uninterruptible(HZ); + + /* Queue cleanup 3: clear the request queue. 
*/ + lnet_net_lock(LNET_LOCK_EX); + while (!list_empty(&the_lnet.ln_dc_request)) { + lp = list_first_entry(&the_lnet.ln_dc_request, + struct lnet_peer, lp_dc_list); + lnet_peer_discovery_error(lp, -ESHUTDOWN); lnet_peer_discovery_complete(lp); } lnet_net_unlock(LNET_LOCK_EX); @@ -1797,10 +3131,6 @@ int lnet_peer_discovery_start(void) if (the_lnet.ln_dc_state != LNET_DC_STATE_SHUTDOWN) return -EALREADY; - INIT_LIST_HEAD(&the_lnet.ln_dc_request); - INIT_LIST_HEAD(&the_lnet.ln_dc_working); - init_waitqueue_head(&the_lnet.ln_dc_waitq); - rc = LNetEQAlloc(0, lnet_discovery_event_handler, &the_lnet.ln_dc_eqh); if (rc != 0) { CERROR("Can't allocate discovery EQ: %d\n", rc); @@ -1819,6 +3149,8 @@ int lnet_peer_discovery_start(void) the_lnet.ln_dc_state = LNET_DC_STATE_SHUTDOWN; } + CDEBUG(D_NET, "discovery start: %d\n", rc); + return rc; } @@ -1837,6 +3169,9 @@ void lnet_peer_discovery_stop(void) LASSERT(list_empty(&the_lnet.ln_dc_request)); LASSERT(list_empty(&the_lnet.ln_dc_working)); + LASSERT(list_empty(&the_lnet.ln_dc_expired)); + + CDEBUG(D_NET, "discovery stopped\n"); } /* Debugging */ From patchwork Sun Oct 7 23:19:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629829 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 03854112B for ; Sun, 7 Oct 2018 23:32:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E633D28CBF for ; Sun, 7 Oct 2018 23:32:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DA8E628CC8; Sun, 7 Oct 2018 23:32:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, 
RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id F370328CBF for ; Sun, 7 Oct 2018 23:32:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B46D3861831; Sun, 7 Oct 2018 16:32:15 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7ED8F8617FB for ; Sun, 7 Oct 2018 16:32:14 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id AE269AD2C; Sun, 7 Oct 2018 23:32:13 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:38 +1100 Message-ID: <153895437828.16383.14376327105039832285.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 19/24] lustre: lnet: add "lnetctl peer list" X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber Add IOC_LIBCFS_GET_PEER_LIST to obtain a list of the primary NIDs of all peers known to the system. The list is written into a userspace buffer by the kernel. 
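For illustration only, the caller side of a size-probing interface like this one can be sketched in plain C. Nothing below is the real ioctl or the lnetctl code: demo_id, demo_peers, demo_get_peer_list() and demo_fetch_peers() are hypothetical stand-ins, and the sketch assumes only the contract stated in this commit message (the required size is always reported back, and -E2BIG is returned when the caller's buffer is too small).

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a (NID, PID) pair; not struct lnet_process_id. */
struct demo_id {
	unsigned long long nid;
	unsigned int pid;
};

/* Made-up peer table used in place of the kernel's peer lists. */
static const struct demo_id demo_peers[] = {
	{ 0x1001ULL, 1 }, { 0x1002ULL, 1 }, { 0x1003ULL, 1 },
};

/*
 * Mock of the kernel side (assumption, not the real handler): always
 * report the peer count and the size needed, and copy the entries only
 * when the caller's buffer is large enough. Otherwise return -E2BIG.
 */
static int demo_get_peer_list(unsigned int *countp, unsigned int *sizep,
			      struct demo_id *ids)
{
	unsigned int count = sizeof(demo_peers) / sizeof(demo_peers[0]);
	unsigned int size = count * sizeof(*ids);
	int rc = -E2BIG;

	if (size <= *sizep) {
		memcpy(ids, demo_peers, size);
		rc = 0;
	}
	*countp = count;
	*sizep = size;
	return rc;
}

/*
 * Caller side: probe with an empty buffer, then retry with the size
 * that was reported. The loop also covers a list that grows between
 * calls. Returns a malloc'ed array (caller frees) or NULL on failure.
 */
static struct demo_id *demo_fetch_peers(unsigned int *countp)
{
	struct demo_id *ids = NULL;
	struct demo_id *tmp;
	unsigned int size = 0;
	unsigned int want;
	int rc;

	for (;;) {
		want = size;
		rc = demo_get_peer_list(countp, &want, ids);
		if (rc != -E2BIG)
			break;
		tmp = realloc(ids, want);
		if (!tmp) {
			free(ids);
			return NULL;
		}
		ids = tmp;
		size = want;
	}
	if (rc) {
		free(ids);
		return NULL;
	}
	return ids;
}
```

Looping rather than issuing exactly two calls is a design choice of this sketch: with a real peer table, the list may grow between the probe and the fetch, in which case the second call would again report -E2BIG with a larger size.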
The typical usage is to make a first call to determine the required buffer size, then a second call to obtain the list. Extend the "lnetctl peer" set of commands with a "list" subcommand that uses this interface. Modify the IOC_LIBCFS_GET_PEER_NI ioctl (which is new in the Multi-Rail code) to use a NID to indicate the peer to look up, and then pass out the data for all NIDs of that peer. Re-implement "lnetctl peer show" to obtain the list of NIDs using IOC_LIBCFS_GET_PEER_LIST followed by one or more IOC_LIBCFS_GET_PEER_NI calls to get information for each peer. Make sure to copy the structure from kernel space to user space even if the ioctl handler returns an error. This is needed because if the buffer passed in by the user space is not big enough to copy the data, we want to pass the requested size to user space in the structure passed in. The return code in this case is -E2BIG. WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25790 Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 9 - .../staging/lustre/include/linux/lnet/lib-types.h | 3 .../lustre/include/uapi/linux/lnet/libcfs_ioctl.h | 3 drivers/staging/lustre/lnet/lnet/api-ni.c | 30 ++- drivers/staging/lustre/lnet/lnet/peer.c | 222 +++++++++++++------- 5 files changed, 169 insertions(+), 98 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index f82a699371f2..58e3a9c4e39f 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -462,6 +462,8 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg); struct lnet_ni *lnet_get_next_ni_locked(struct lnet_net *mynet, struct lnet_ni *prev); struct lnet_ni *lnet_get_ni_idx_locked(int idx); +int lnet_get_peer_list(__u32 
*countp, __u32 *sizep, + struct lnet_process_id __user *ids); void lnet_router_debugfs_init(void); void lnet_router_debugfs_fini(void); @@ -730,10 +732,9 @@ bool lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, lnet_nid_t nid); int lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid); int lnet_add_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid, bool mr); int lnet_del_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid); -int lnet_get_peer_info(__u32 idx, lnet_nid_t *primary_nid, lnet_nid_t *nid, - bool *mr, - struct lnet_peer_ni_credit_info __user *peer_ni_info, - struct lnet_ioctl_element_stats __user *peer_ni_stats); +int lnet_get_peer_info(lnet_nid_t *primary_nid, lnet_nid_t *nid, + __u32 *nnis, bool *mr, __u32 *sizep, + void __user *bulk); int lnet_get_peer_ni_info(__u32 peer_index, __u64 *nid, char alivness[LNET_MAX_STR_LEN], __u32 *cpt_iter, __u32 *refcount, diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index 07baa86e61ab..8543a67420d7 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -651,7 +651,6 @@ struct lnet_peer_net { * pt_hash[...] 
* pt_peer_list * pt_peers - * pt_peer_nnids * protected by pt_zombie_lock: * pt_zombie_list * pt_zombies @@ -667,8 +666,6 @@ struct lnet_peer_table { struct list_head pt_peer_list; /* # peers */ int pt_peers; - /* # NIDS on listed peers */ - int pt_peer_nnids; /* # zombies to go to deathrow (and not there yet) */ int pt_zombies; /* zombie peers_ni */ diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h b/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h index 2a9beed23985..2607620e8ef8 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h @@ -144,6 +144,7 @@ struct libcfs_debug_ioctl_data { #define IOC_LIBCFS_GET_LOCAL_NI _IOWR(IOC_LIBCFS_TYPE, 97, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_SET_NUMA_RANGE _IOWR(IOC_LIBCFS_TYPE, 98, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_NUMA_RANGE _IOWR(IOC_LIBCFS_TYPE, 99, IOCTL_CONFIG_SIZE) -#define IOC_LIBCFS_MAX_NR 99 +#define IOC_LIBCFS_GET_PEER_LIST _IOWR(IOC_LIBCFS_TYPE, 100, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_MAX_NR 100 #endif /* __LIBCFS_IOCTL_H__ */ diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 955d1711eda4..f624abe7db80 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -3117,21 +3117,31 @@ LNetCtl(unsigned int cmd, void *arg) case IOC_LIBCFS_GET_PEER_NI: { struct lnet_ioctl_peer_cfg *cfg = arg; - struct lnet_peer_ni_credit_info __user *lpni_cri; - struct lnet_ioctl_element_stats __user *lpni_stats; - size_t usr_size = sizeof(*lpni_cri) + sizeof(*lpni_stats); - if ((cfg->prcfg_hdr.ioc_len != sizeof(*cfg)) || - (cfg->prcfg_size != usr_size)) + if (cfg->prcfg_hdr.ioc_len < sizeof(*cfg)) return -EINVAL; - lpni_cri = cfg->prcfg_bulk; - lpni_stats = cfg->prcfg_bulk + sizeof(*lpni_cri); + mutex_lock(&the_lnet.ln_api_mutex); + rc = lnet_get_peer_info(&cfg->prcfg_prim_nid, + 
&cfg->prcfg_cfg_nid, + &cfg->prcfg_count, + &cfg->prcfg_mr, + &cfg->prcfg_size, + (void __user *)cfg->prcfg_bulk); + mutex_unlock(&the_lnet.ln_api_mutex); + return rc; + } + + case IOC_LIBCFS_GET_PEER_LIST: { + struct lnet_ioctl_peer_cfg *cfg = arg; + + if (cfg->prcfg_hdr.ioc_len < sizeof(*cfg)) + return -EINVAL; mutex_lock(&the_lnet.ln_api_mutex); - rc = lnet_get_peer_info(cfg->prcfg_count, &cfg->prcfg_prim_nid, - &cfg->prcfg_cfg_nid, &cfg->prcfg_mr, - lpni_cri, lpni_stats); + rc = lnet_get_peer_list(&cfg->prcfg_count, &cfg->prcfg_size, + (struct lnet_process_id __user *) + cfg->prcfg_bulk); mutex_unlock(&the_lnet.ln_api_mutex); return rc; } diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index 1ef4a44e752e..8dff3b767577 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -263,9 +263,7 @@ lnet_peer_detach_peer_ni_locked(struct lnet_peer_ni *lpni) /* Update peer NID count. */ lp = lpn->lpn_peer; - ptable = the_lnet.ln_peer_tables[lp->lp_cpt]; lp->lp_nnis--; - ptable->pt_peer_nnids--; /* * If there are no more peer nets, make the peer unfindable @@ -277,6 +275,7 @@ lnet_peer_detach_peer_ni_locked(struct lnet_peer_ni *lpni) */ if (list_empty(&lp->lp_peer_nets)) { list_del_init(&lp->lp_peer_list); + ptable = the_lnet.ln_peer_tables[lp->lp_cpt]; ptable->pt_peers--; } else if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING) { /* Discovery isn't running, nothing to do here. 
*/ @@ -637,44 +636,6 @@ lnet_find_peer(lnet_nid_t nid) return lp; } -struct lnet_peer_ni * -lnet_get_peer_ni_idx_locked(int idx, struct lnet_peer_net **lpn, - struct lnet_peer **lp) -{ - struct lnet_peer_table *ptable; - struct lnet_peer_ni *lpni; - int lncpt; - int cpt; - - lncpt = cfs_percpt_number(the_lnet.ln_peer_tables); - - for (cpt = 0; cpt < lncpt; cpt++) { - ptable = the_lnet.ln_peer_tables[cpt]; - if (ptable->pt_peer_nnids > idx) - break; - idx -= ptable->pt_peer_nnids; - } - if (cpt >= lncpt) - return NULL; - - list_for_each_entry((*lp), &ptable->pt_peer_list, lp_peer_list) { - if ((*lp)->lp_nnis <= idx) { - idx -= (*lp)->lp_nnis; - continue; - } - list_for_each_entry((*lpn), &((*lp)->lp_peer_nets), - lpn_peer_nets) { - list_for_each_entry(lpni, &((*lpn)->lpn_peer_nis), - lpni_peer_nis) { - if (idx-- == 0) - return lpni; - } - } - } - - return NULL; -} - struct lnet_peer_ni * lnet_get_next_peer_ni_locked(struct lnet_peer *peer, struct lnet_peer_net *peer_net, @@ -734,6 +695,69 @@ lnet_get_next_peer_ni_locked(struct lnet_peer *peer, return lpni; } +/* Call with the ln_api_mutex held */ +int +lnet_get_peer_list(__u32 *countp, __u32 *sizep, + struct lnet_process_id __user *ids) +{ + struct lnet_process_id id; + struct lnet_peer_table *ptable; + struct lnet_peer *lp; + __u32 count = 0; + __u32 size = 0; + int lncpt; + int cpt; + __u32 i; + int rc; + + rc = -ESHUTDOWN; + if (the_lnet.ln_state == LNET_STATE_SHUTDOWN) + goto done; + + lncpt = cfs_percpt_number(the_lnet.ln_peer_tables); + + /* + * Count the number of peers, and return E2BIG if the buffer + * is too small. We'll also return the desired size. + */ + rc = -E2BIG; + for (cpt = 0; cpt < lncpt; cpt++) { + ptable = the_lnet.ln_peer_tables[cpt]; + count += ptable->pt_peers; + } + size = count * sizeof(*ids); + if (size > *sizep) + goto done; + + /* + * Walk the peer lists and copy out the primary nids. + * This is safe because the peer lists are only modified + * while the ln_api_mutex is held. 
So we don't need to + * hold the lnet_net_lock as well, and can therefore + * directly call copy_to_user(). + */ + rc = -EFAULT; + memset(&id, 0, sizeof(id)); + id.pid = LNET_PID_LUSTRE; + i = 0; + for (cpt = 0; cpt < lncpt; cpt++) { + ptable = the_lnet.ln_peer_tables[cpt]; + list_for_each_entry(lp, &ptable->pt_peer_list, lp_peer_list) { + if (i >= count) + goto done; + id.nid = lp->lp_primary_nid; + if (copy_to_user(&ids[i], &id, sizeof(id))) + goto done; + i++; + } + } + rc = 0; +done: + *countp = count; + *sizep = size; + return rc; +} + /* * Start pushes to peers that need to be updated for a configuration * change on this node. @@ -1128,7 +1152,6 @@ lnet_peer_attach_peer_ni(struct lnet_peer *lp, spin_unlock(&lp->lp_lock); lp->lp_nnis++; - the_lnet.ln_peer_tables[lp->lp_cpt]->pt_peer_nnids++; lnet_net_unlock(LNET_LOCK_EX); CDEBUG(D_NET, "peer %s NID %s flags %#x\n", @@ -3273,55 +3296,94 @@ lnet_get_peer_ni_info(__u32 peer_index, __u64 *nid, } /* ln_api_mutex is held, which keeps the peer list stable */ -int lnet_get_peer_info(__u32 idx, lnet_nid_t *primary_nid, lnet_nid_t *nid, - bool *mr, - struct lnet_peer_ni_credit_info __user *peer_ni_info, - struct lnet_ioctl_element_stats __user *peer_ni_stats) +int lnet_get_peer_info(lnet_nid_t *primary_nid, lnet_nid_t *nidp, + __u32 *nnis, bool *mr, __u32 *sizep, + void __user *bulk) { - struct lnet_ioctl_element_stats ni_stats; - struct lnet_peer_ni_credit_info ni_info; - struct lnet_peer_ni *lpni = NULL; - struct lnet_peer_net *lpn = NULL; - struct lnet_peer *lp = NULL; + struct lnet_ioctl_element_stats *lpni_stats; + struct lnet_peer_ni_credit_info *lpni_info; + struct lnet_peer_ni *lpni; + struct lnet_peer *lp; + lnet_nid_t nid; + __u32 size; int rc; - lpni = lnet_get_peer_ni_idx_locked(idx, &lpn, &lp); + lp = lnet_find_peer(*primary_nid); - if (!lpni) - return -ENOENT; + if (!lp) { + rc = -ENOENT; + goto out; + } + + size = sizeof(nid) + sizeof(*lpni_info) + sizeof(*lpni_stats); + size *= lp->lp_nnis; + if (size > 
*sizep) { + *sizep = size; + rc = -E2BIG; + goto out_lp_decref; + } *primary_nid = lp->lp_primary_nid; *mr = lnet_peer_is_multi_rail(lp); - *nid = lpni->lpni_nid; - snprintf(ni_info.cr_aliveness, LNET_MAX_STR_LEN, "NA"); - if (lnet_isrouter(lpni) || - lnet_peer_aliveness_enabled(lpni)) - snprintf(ni_info.cr_aliveness, LNET_MAX_STR_LEN, - lpni->lpni_alive ? "up" : "down"); - - ni_info.cr_refcount = atomic_read(&lpni->lpni_refcount); - ni_info.cr_ni_peer_tx_credits = lpni->lpni_net ? - lpni->lpni_net->net_tunables.lct_peer_tx_credits : 0; - ni_info.cr_peer_tx_credits = lpni->lpni_txcredits; - ni_info.cr_peer_rtr_credits = lpni->lpni_rtrcredits; - ni_info.cr_peer_min_rtr_credits = lpni->lpni_minrtrcredits; - ni_info.cr_peer_min_tx_credits = lpni->lpni_mintxcredits; - ni_info.cr_peer_tx_qnob = lpni->lpni_txqnob; - - ni_stats.iel_send_count = atomic_read(&lpni->lpni_stats.send_count); - ni_stats.iel_recv_count = atomic_read(&lpni->lpni_stats.recv_count); - ni_stats.iel_drop_count = atomic_read(&lpni->lpni_stats.drop_count); - - /* If copy_to_user fails */ - rc = -EFAULT; - if (copy_to_user(peer_ni_info, &ni_info, sizeof(ni_info))) - goto copy_failed; + *nidp = lp->lp_primary_nid; + *nnis = lp->lp_nnis; + *sizep = size; - if (copy_to_user(peer_ni_stats, &ni_stats, sizeof(ni_stats))) - goto copy_failed; + /* Allocate helper buffers. 
*/ + rc = -ENOMEM; + lpni_info = kzalloc(sizeof(*lpni_info), GFP_KERNEL); + if (!lpni_info) + goto out_lp_decref; + lpni_stats = kzalloc(sizeof(*lpni_stats), GFP_KERNEL); + if (!lpni_stats) + goto out_free_info; + lpni = NULL; + rc = -EFAULT; + while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) { + nid = lpni->lpni_nid; + if (copy_to_user(bulk, &nid, sizeof(nid))) + goto out_free_stats; + bulk += sizeof(nid); + + memset(lpni_info, 0, sizeof(*lpni_info)); + snprintf(lpni_info->cr_aliveness, LNET_MAX_STR_LEN, "NA"); + if (lnet_isrouter(lpni) || + lnet_peer_aliveness_enabled(lpni)) + snprintf(lpni_info->cr_aliveness, LNET_MAX_STR_LEN, + lpni->lpni_alive ? "up" : "down"); + + lpni_info->cr_refcount = atomic_read(&lpni->lpni_refcount); + lpni_info->cr_ni_peer_tx_credits = lpni->lpni_net ? + lpni->lpni_net->net_tunables.lct_peer_tx_credits : 0; + lpni_info->cr_peer_tx_credits = lpni->lpni_txcredits; + lpni_info->cr_peer_rtr_credits = lpni->lpni_rtrcredits; + lpni_info->cr_peer_min_rtr_credits = lpni->lpni_minrtrcredits; + lpni_info->cr_peer_min_tx_credits = lpni->lpni_mintxcredits; + lpni_info->cr_peer_tx_qnob = lpni->lpni_txqnob; + if (copy_to_user(bulk, lpni_info, sizeof(*lpni_info))) + goto out_free_stats; + bulk += sizeof(*lpni_info); + + memset(lpni_stats, 0, sizeof(*lpni_stats)); + lpni_stats->iel_send_count = + atomic_read(&lpni->lpni_stats.send_count); + lpni_stats->iel_recv_count = + atomic_read(&lpni->lpni_stats.recv_count); + lpni_stats->iel_drop_count = + atomic_read(&lpni->lpni_stats.drop_count); + if (copy_to_user(bulk, lpni_stats, sizeof(*lpni_stats))) + goto out_free_stats; + bulk += sizeof(*lpni_stats); + } rc = 0; -copy_failed: +out_free_stats: + kfree(lpni_stats); +out_free_info: + kfree(lpni_info); +out_lp_decref: + lnet_peer_decref_locked(lp); +out: return rc; } From patchwork Sun Oct 7 23:19:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown 
X-Patchwork-Id: 10629831 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E5E8E14DB for ; Sun, 7 Oct 2018 23:32:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D526128CBF for ; Sun, 7 Oct 2018 23:32:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C985728CC8; Sun, 7 Oct 2018 23:32:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 2F9D328CBF for ; Sun, 7 Oct 2018 23:32:25 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AF149861831; Sun, 7 Oct 2018 16:32:24 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 80498861827 for ; Sun, 7 Oct 2018 16:32:23 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 7C858AE17; Sun, 7 Oct 2018 23:32:22 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:38 +1100 Message-ID: <153895437832.16383.822062330627071137.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: 
<153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 20/24] lustre: lnet: add "lnetctl ping" command X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Sonia Sharma , Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber Adds function jt_ping() in lnetctl.c and lustre_lnet_ping_nid() in liblnetconfig.c file. The output of "lnetctl ping" is similar to "lnetctl peer show". Function jt_ping() in lnetctl.c calls lustre_lnet_ping_nid() to implement "lnetctl ping". Adds a function infra_ping_nid() to be later reused for the ping similar lnetctl commands. Uses a new ioctl call, IOC_LIBCFS_PING_PEER for "lnetctl ping". With "lnetctl ping", multiple nids can be pinged. Uses a new struct(lnet_ioctl_ping_data in lib-dlc.h) to pass the data from kernel to user space for ping. 
Also changes lnet_ping() function and its input parameters in drivers/staging/lustre/lnet/lnet/api-ni.c WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Sonia Sharma Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25791 Reviewed-by: Amir Shehata Tested-by: Amir Shehata Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 5 +- .../lustre/include/uapi/linux/lnet/libcfs_ioctl.h | 2 - .../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c | 2 - .../lustre/lnet/klnds/socklnd/socklnd_modparams.c | 2 - drivers/staging/lustre/lnet/lnet/api-ni.c | 55 +++++++++++++++----- drivers/staging/lustre/lnet/lnet/peer.c | 2 - 6 files changed, 47 insertions(+), 21 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 58e3a9c4e39f..adb4d0551ef5 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -76,8 +76,8 @@ extern struct lnet the_lnet; /* THE network */ #define LNET_ACCEPTOR_MIN_RESERVED_PORT 512 #define LNET_ACCEPTOR_MAX_RESERVED_PORT 1023 -/* Discovery timeout - same as default peer_timeout */ -#define DISCOVERY_TIMEOUT 180 +/* default timeout */ +#define DEFAULT_PEER_TIMEOUT 180 static inline int lnet_is_route_alive(struct lnet_route *route) { @@ -716,6 +716,7 @@ struct lnet_peer_ni *lnet_nid2peerni_locked(lnet_nid_t nid, lnet_nid_t pref, int cpt); struct lnet_peer_ni *lnet_nid2peerni_ex(lnet_nid_t nid, int cpt); struct lnet_peer_ni *lnet_find_peer_ni_locked(lnet_nid_t nid); +struct lnet_peer *lnet_find_peer(lnet_nid_t nid); void lnet_peer_net_added(struct lnet_net *net); lnet_nid_t lnet_peer_primary_nid_locked(lnet_nid_t nid); int lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt, bool block); diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h 
b/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h index 2607620e8ef8..3d89202bd396 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h @@ -102,7 +102,7 @@ struct libcfs_debug_ioctl_data { #define IOC_LIBCFS_CONFIGURE _IOWR('e', 59, IOCTL_LIBCFS_TYPE) #define IOC_LIBCFS_TESTPROTOCOMPAT _IOWR('e', 60, IOCTL_LIBCFS_TYPE) #define IOC_LIBCFS_PING _IOWR('e', 61, IOCTL_LIBCFS_TYPE) -/* IOC_LIBCFS_DEBUG_PEER _IOWR('e', 62, IOCTL_LIBCFS_TYPE) */ +#define IOC_LIBCFS_PING_PEER _IOWR('e', 62, IOCTL_LIBCFS_TYPE) #define IOC_LIBCFS_LNETST _IOWR('e', 63, IOCTL_LIBCFS_TYPE) #define IOC_LIBCFS_LNET_FAULT _IOWR('e', 64, IOCTL_LIBCFS_TYPE) /* lnd ioctls */ diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c index 0f2ad9110dc9..13b19f3eabf0 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c @@ -83,7 +83,7 @@ static int peer_buffer_credits; module_param(peer_buffer_credits, int, 0444); MODULE_PARM_DESC(peer_buffer_credits, "# per-peer router buffer credits"); -static int peer_timeout = 180; +static int peer_timeout = DEFAULT_PEER_TIMEOUT; module_param(peer_timeout, int, 0444); MODULE_PARM_DESC(peer_timeout, "Seconds without aliveness news to declare peer dead (<=0 to disable)"); diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_modparams.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_modparams.c index 5663a4ca94d4..da5910049fc1 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_modparams.c +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_modparams.c @@ -35,7 +35,7 @@ static int peer_buffer_credits; module_param(peer_buffer_credits, int, 0444); MODULE_PARM_DESC(peer_buffer_credits, "# per-peer router buffer credits"); -static int peer_timeout = 180; +static int 
peer_timeout = DEFAULT_PEER_TIMEOUT; module_param(peer_timeout, int, 0444); MODULE_PARM_DESC(peer_timeout, "Seconds without aliveness news to declare peer dead (<=0 to disable)"); diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index f624abe7db80..37f47bd1511f 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -3181,24 +3181,50 @@ LNetCtl(unsigned int cmd, void *arg) id.nid = data->ioc_nid; id.pid = data->ioc_u32[0]; - /* Don't block longer than 2 minutes */ - if (data->ioc_u32[1] > 120 * MSEC_PER_SEC) - return -EINVAL; - - /* If timestamp is negative then disable timeout */ - if ((s32)data->ioc_u32[1] < 0) - timeout = MAX_SCHEDULE_TIMEOUT; + /* If timeout is negative then set default of 3 minutes */ + if (((s32)data->ioc_u32[1] <= 0) || + data->ioc_u32[1] > (DEFAULT_PEER_TIMEOUT * MSEC_PER_SEC)) + timeout = DEFAULT_PEER_TIMEOUT * HZ; else timeout = msecs_to_jiffies(data->ioc_u32[1]); rc = lnet_ping(id, timeout, data->ioc_pbuf1, data->ioc_plen1 / sizeof(struct lnet_process_id)); + if (rc < 0) return rc; + data->ioc_count = rc; return 0; } + case IOC_LIBCFS_PING_PEER: { + struct lnet_ioctl_ping_data *ping = arg; + struct lnet_peer *lp; + signed long timeout; + + /* If timeout is negative then set default of 3 minutes */ + if (((s32)ping->op_param) <= 0 || + ping->op_param > (DEFAULT_PEER_TIMEOUT * MSEC_PER_SEC)) + timeout = DEFAULT_PEER_TIMEOUT * HZ; + else + timeout = msecs_to_jiffies(ping->op_param); + + rc = lnet_ping(ping->ping_id, timeout, + ping->ping_buf, + ping->ping_count); + if (rc < 0) + return rc; + + lp = lnet_find_peer(ping->ping_id.nid); + if (lp) { + ping->ping_id.nid = lp->lp_primary_nid; + ping->mr_info = lnet_peer_is_multi_rail(lp); + } + ping->ping_count = rc; + return 0; + } + default: ni = lnet_net2ni_addref(data->ioc_net); if (!ni) @@ -3301,7 +3327,7 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, /* initialize md 
content */ md.start = &pbuf->pb_info; md.length = LNET_PING_INFO_SIZE(n_ids); - md.threshold = 2; /*GET/REPLY*/ + md.threshold = 2; /* GET/REPLY */ md.max_size = 0; md.options = LNET_MD_TRUNCATE; md.user_ptr = NULL; @@ -3319,7 +3345,6 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, if (rc) { /* Don't CERROR; this could be deliberate! */ - rc2 = LNetMDUnlink(mdh); LASSERT(!rc2); @@ -3363,7 +3388,6 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, replied = 1; rc = event.mlength; } - } while (rc2 <= 0 || !event.unlinked); if (!replied) { @@ -3377,10 +3401,9 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, nob = rc; LASSERT(nob >= 0 && nob <= LNET_PING_INFO_SIZE(n_ids)); - rc = -EPROTO; /* if I can't parse... */ + rc = -EPROTO; /* if I can't parse... */ if (nob < 8) { - /* can't check magic/version */ CERROR("%s: ping info too short %d\n", libcfs_id2str(id), nob); goto fail_free_eq; @@ -3401,7 +3424,8 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, } if (nob < LNET_PING_INFO_SIZE(0)) { - CERROR("%s: Short reply %d(%d min)\n", libcfs_id2str(id), + CERROR("%s: Short reply %d(%d min)\n", + libcfs_id2str(id), nob, (int)LNET_PING_INFO_SIZE(0)); goto fail_free_eq; } @@ -3410,12 +3434,13 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, n_ids = pbuf->pb_info.pi_nnis; if (nob < LNET_PING_INFO_SIZE(n_ids)) { - CERROR("%s: Short reply %d(%d expected)\n", libcfs_id2str(id), + CERROR("%s: Short reply %d(%d expected)\n", + libcfs_id2str(id), nob, (int)LNET_PING_INFO_SIZE(n_ids)); goto fail_free_eq; } - rc = -EFAULT; /* If I SEGV... */ + rc = -EFAULT; /* if I segv in copy_to_user()... 
*/ memset(&tmpid, 0, sizeof(tmpid)); for (i = 0; i < n_ids; i++) { diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index 8dff3b767577..95f72ae39a89 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -2905,7 +2905,7 @@ static struct lnet_peer *lnet_peer_dc_timed_out(time64_t now) return NULL; lp = list_first_entry(&the_lnet.ln_dc_working, struct lnet_peer, lp_dc_list); - if (now < lp->lp_last_queued + DISCOVERY_TIMEOUT) + if (now < lp->lp_last_queued + DEFAULT_PEER_TIMEOUT) return NULL; return lp; } From patchwork Sun Oct 7 23:19:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629833 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 63C0514DB for ; Sun, 7 Oct 2018 23:32:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5385228CBF for ; Sun, 7 Oct 2018 23:32:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4792728CC8; Sun, 7 Oct 2018 23:32:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D2A5828CBF for ; Sun, 7 Oct 2018 23:32:31 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9D476861870; Sun, 7 Oct 2018 
16:32:31 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 076FE21F5E1 for ; Sun, 7 Oct 2018 16:32:30 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 254CFAD2C; Sun, 7 Oct 2018 23:32:29 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:38 +1100 Message-ID: <153895437836.16383.17797958097457714365.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 21/24] lustre: lnet: add "lnetctl discover" X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Sonia Sharma , Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Sonia Sharma Add a "discover" subcommand to lnetctl. jt_discover() in lnetctl.c calls lustre_lnet_discover_nid() to implement "lnetctl discover". The output is similar to that of the "lnetctl ping" command. This patch also does some cleanup in liblnetconfig.c: the common code for the parameters under global settings is pulled into the functions ioctl_set_value() and ioctl_show_global_values(). 
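On the kernel side, the lnet_discover() function added in the diff below bounds the caller-supplied ID count twice: it rejects counts that are non-positive or above lnet_interfaces_max, then trims the count to the number of NIs the peer actually has. A minimal sketch of that arithmetic, assuming a hypothetical helper discover_reply_count() and leaving out the LNET_NID_ANY check, the locking, and the copy_to_user() call:

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper mirroring the count handling in lnet_discover().
 *   n_ids     - number of ID slots offered by the caller
 *   max_intf  - current value of the lnet_interfaces_max tunable
 *   peer_nnis - number of NIs the discovered peer reports (lp_nnis)
 * Returns the number of IDs to copy out, or -EINVAL for a bad request.
 */
static int discover_reply_count(int n_ids, int max_intf, int peer_nnis)
{
	/* Preconditions of this sketch, not checks made by the patch. */
	assert(max_intf > 0 && peer_nnis >= 0);

	/* Reject empty or oversized requests before allocating anything. */
	if (n_ids <= 0 || n_ids > max_intf)
		return -EINVAL;

	/* Never report more IDs than the peer has. */
	return peer_nnis < n_ids ? peer_nnis : n_ids;
}
```

With the default limit of 200 from patch 01/24, a request for 16 IDs against a peer with 4 NIs yields 4, while a request for 201 IDs is rejected with -EINVAL.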
WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Sonia Sharma Signed-off-by: Amir Shehata Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25793 Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../lustre/include/uapi/linux/lnet/libcfs_ioctl.h | 2 drivers/staging/lustre/lnet/lnet/api-ni.c | 100 ++++++++++++++++++++ 2 files changed, 101 insertions(+), 1 deletion(-) diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h b/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h index 3d89202bd396..60bc9713923e 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h @@ -113,7 +113,7 @@ struct libcfs_debug_ioctl_data { #define IOC_LIBCFS_DEL_PEER _IOWR('e', 74, IOCTL_LIBCFS_TYPE) #define IOC_LIBCFS_ADD_PEER _IOWR('e', 75, IOCTL_LIBCFS_TYPE) #define IOC_LIBCFS_GET_PEER _IOWR('e', 76, IOCTL_LIBCFS_TYPE) -/* ioctl 77 is free for use */ +#define IOC_LIBCFS_DISCOVER _IOWR('e', 77, IOCTL_LIBCFS_TYPE) #define IOC_LIBCFS_ADD_INTERFACE _IOWR('e', 78, IOCTL_LIBCFS_TYPE) #define IOC_LIBCFS_DEL_INTERFACE _IOWR('e', 79, IOCTL_LIBCFS_TYPE) #define IOC_LIBCFS_GET_INTERFACE _IOWR('e', 80, IOCTL_LIBCFS_TYPE) diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 37f47bd1511f..0511c6acb9b1 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -104,6 +104,9 @@ static atomic_t lnet_dlc_seq_no = ATOMIC_INIT(0); static int lnet_ping(struct lnet_process_id id, signed long timeout, struct lnet_process_id __user *ids, int n_ids); +static int lnet_discover(struct lnet_process_id id, __u32 force, + struct lnet_process_id __user *ids, int n_ids); + static int discovery_set(const char *val, const struct kernel_param *kp) { @@ -3225,6 +3228,25 @@ LNetCtl(unsigned int cmd, void *arg) return 0; } + case IOC_LIBCFS_DISCOVER: { + struct 
lnet_ioctl_ping_data *discover = arg; + struct lnet_peer *lp; + + rc = lnet_discover(discover->ping_id, discover->op_param, + discover->ping_buf, + discover->ping_count); + if (rc < 0) + return rc; + lp = lnet_find_peer(discover->ping_id.nid); + if (lp) { + discover->ping_id.nid = lp->lp_primary_nid; + discover->mr_info = lnet_peer_is_multi_rail(lp); + } + + discover->ping_count = rc; + return 0; + } + default: ni = lnet_net2ni_addref(data->ioc_net); if (!ni) @@ -3461,3 +3483,81 @@ static int lnet_ping(struct lnet_process_id id, signed long timeout, lnet_ping_buffer_decref(pbuf); return rc; } + +static int +lnet_discover(struct lnet_process_id id, __u32 force, + struct lnet_process_id __user *ids, + int n_ids) +{ + struct lnet_peer_ni *lpni; + struct lnet_peer_ni *p; + struct lnet_peer *lp; + struct lnet_process_id *buf; + int cpt; + int i; + int rc; + int max_intf = lnet_interfaces_max; + + if (n_ids <= 0 || + id.nid == LNET_NID_ANY || + n_ids > max_intf) + return -EINVAL; + + if (id.pid == LNET_PID_ANY) + id.pid = LNET_PID_LUSTRE; + + buf = kcalloc(n_ids, sizeof(*buf), GFP_KERNEL); + if (!buf) + return -ENOMEM; + + cpt = lnet_net_lock_current(); + lpni = lnet_nid2peerni_locked(id.nid, LNET_NID_ANY, cpt); + if (IS_ERR(lpni)) { + rc = PTR_ERR(lpni); + goto out; + } + + /* + * Clearing the NIDS_UPTODATE flag ensures the peer will + * be discovered, provided discovery has not been disabled. + */ + lp = lpni->lpni_peer_net->lpn_peer; + spin_lock(&lp->lp_lock); + lp->lp_state &= ~LNET_PEER_NIDS_UPTODATE; + /* If the force flag is set, force a PING and PUSH as well. */ + if (force) + lp->lp_state |= LNET_PEER_FORCE_PING | LNET_PEER_FORCE_PUSH; + spin_unlock(&lp->lp_lock); + rc = lnet_discover_peer_locked(lpni, cpt, true); + if (rc) + goto out_decref; + + /* Peer may have changed. 
*/ + lp = lpni->lpni_peer_net->lpn_peer; + if (lp->lp_nnis < n_ids) + n_ids = lp->lp_nnis; + + i = 0; + p = NULL; + while ((p = lnet_get_next_peer_ni_locked(lp, NULL, p)) != NULL) { + buf[i].pid = id.pid; + buf[i].nid = p->lpni_nid; + if (++i >= n_ids) + break; + } + + lnet_net_unlock(cpt); + + rc = -EFAULT; + if (copy_to_user(ids, buf, n_ids * sizeof(*buf))) + goto out_relock; + rc = n_ids; +out_relock: + lnet_net_lock(cpt); +out_decref: + lnet_peer_ni_decref_locked(lpni); +out: + lnet_net_unlock(cpt); + kfree(buf); + return rc; +} From patchwork Sun Oct 7 23:19:38 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 10629835 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D815A112B for ; Sun, 7 Oct 2018 23:32:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C6F1F28CBF for ; Sun, 7 Oct 2018 23:32:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BADF628CC8; Sun, 7 Oct 2018 23:32:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D412928CBF for ; Sun, 7 Oct 2018 23:32:39 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5CD46861869; Sun, 7 Oct 2018 16:32:39 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org 
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from mx1.suse.de (mx2.suse.de [195.135.220.15]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C17F9861827 for ; Sun, 7 Oct 2018 16:32:37 -0700 (PDT) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id DD00EAE17; Sun, 7 Oct 2018 23:32:36 +0000 (UTC) From: NeilBrown To: Oleg Drokin , Doug Oucharek , James Simmons , Andreas Dilger Date: Mon, 08 Oct 2018 10:19:38 +1100 Message-ID: <153895437840.16383.11395842984054958152.stgit@noble> In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble> References: <153895417139.16383.3791701638653772865.stgit@noble> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 22/24] lustre: lnet: add enhanced statistics X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Olaf Weber , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata Add statistics to track the different types of LNet messages that are sent, received, or dropped. WC-bug-id: https://jira.whamcloud.com/browse/LU-9480 Signed-off-by: Amir Shehata Signed-off-by: Olaf Weber Reviewed-on: https://review.whamcloud.com/25795 Signed-off-by: NeilBrown Reviewed-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 12 ++ .../staging/lustre/include/linux/lnet/lib-types.h | 20 +++ .../lustre/include/uapi/linux/lnet/libcfs_ioctl.h | 3 - drivers/staging/lustre/lnet/lnet/api-ni.c | 45 +++++++- drivers/staging/lustre/lnet/lnet/lib-move.c | 116 +++++++++++++++++++- drivers/staging/lustre/lnet/lnet/lib-msg.c | 16 ++- drivers/staging/lustre/lnet/lnet/net_fault.c | 3 - drivers/staging/lustre/lnet/lnet/peer.c | 26 +++- 8 files changed, 217 insertions(+), 24 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index adb4d0551ef5..91980f60a50d 100644 --- 
*stats,
+		     enum lnet_stats_type stats_type);
+
+void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
+			      struct lnet_element_stats *stats);
+
 #endif
diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h
index 8543a67420d7..19f7b11a1e44 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-types.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h
@@ -279,10 +279,24 @@ enum lnet_ni_state {
 	LNET_NI_STATE_DELETING
 };
 
+enum lnet_stats_type {
+	LNET_STATS_TYPE_SEND = 0,
+	LNET_STATS_TYPE_RECV,
+	LNET_STATS_TYPE_DROP
+};
+
+struct lnet_comm_count {
+	atomic_t co_get_count;
+	atomic_t co_put_count;
+	atomic_t co_reply_count;
+	atomic_t co_ack_count;
+	atomic_t co_hello_count;
+};
+
 struct lnet_element_stats {
-	atomic_t send_count;
-	atomic_t recv_count;
-	atomic_t drop_count;
+	struct lnet_comm_count el_send_stats;
+	struct lnet_comm_count el_recv_stats;
+	struct lnet_comm_count el_drop_stats;
 };
 
 struct lnet_net {
diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h b/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h
index 60bc9713923e..4590f65c333f 100644
--- a/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h
+++ b/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h
@@ -145,6 +145,7 @@ struct libcfs_debug_ioctl_data {
 #define IOC_LIBCFS_SET_NUMA_RANGE	_IOWR(IOC_LIBCFS_TYPE, 98, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_GET_NUMA_RANGE	_IOWR(IOC_LIBCFS_TYPE, 99, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_GET_PEER_LIST	_IOWR(IOC_LIBCFS_TYPE, 100, IOCTL_CONFIG_SIZE)
-#define IOC_LIBCFS_MAX_NR		100
+#define IOC_LIBCFS_GET_LOCAL_NI_MSG_STATS _IOWR(IOC_LIBCFS_TYPE, 101, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_MAX_NR		101
 
 #endif /* __LIBCFS_IOCTL_H__ */
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 0511c6acb9b1..0852118bf803 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -2263,8 +2263,12 @@ lnet_fill_ni_info(struct lnet_ni *ni, struct lnet_ioctl_config_ni *cfg_ni,
 	memcpy(&tun->lt_cmn, &ni->ni_net->net_tunables, sizeof(tun->lt_cmn));
 
 	if (stats) {
-		stats->iel_send_count = atomic_read(&ni->ni_stats.send_count);
-		stats->iel_recv_count = atomic_read(&ni->ni_stats.recv_count);
+		stats->iel_send_count = lnet_sum_stats(&ni->ni_stats,
+						       LNET_STATS_TYPE_SEND);
+		stats->iel_recv_count = lnet_sum_stats(&ni->ni_stats,
+						       LNET_STATS_TYPE_RECV);
+		stats->iel_drop_count = lnet_sum_stats(&ni->ni_stats,
+						       LNET_STATS_TYPE_DROP);
 	}
 
 	/*
@@ -2491,6 +2495,29 @@ lnet_get_ni_config(struct lnet_ioctl_config_ni *cfg_ni,
 	return rc;
 }
 
+int lnet_get_ni_stats(struct lnet_ioctl_element_msg_stats *msg_stats)
+{
+	struct lnet_ni *ni;
+	int cpt;
+	int rc = -ENOENT;
+
+	if (!msg_stats)
+		return -EINVAL;
+
+	cpt = lnet_net_lock_current();
+
+	ni = lnet_get_ni_idx_locked(msg_stats->im_idx);
+
+	if (ni) {
+		lnet_usr_translate_stats(msg_stats, &ni->ni_stats);
+		rc = 0;
+	}
+
+	lnet_net_unlock(cpt);
+
+	return rc;
+}
+
 static int lnet_add_net_common(struct lnet_net *net,
 			       struct lnet_ioctl_config_lnd_tunables *tun)
 {
@@ -2956,6 +2983,7 @@ LNetCtl(unsigned int cmd, void *arg)
 		__u32 tun_size;
 
 		cfg_ni = arg;
+
 		/* get the tunables if they are available */
 		if (cfg_ni->lic_cfg_hdr.ioc_len <
 		    sizeof(*cfg_ni) + sizeof(*stats) + sizeof(*tun))
@@ -2975,6 +3003,19 @@ LNetCtl(unsigned int cmd, void *arg)
 		return rc;
 	}
 
+	case IOC_LIBCFS_GET_LOCAL_NI_MSG_STATS: {
+		struct lnet_ioctl_element_msg_stats *msg_stats = arg;
+
+		if (msg_stats->im_hdr.ioc_len != sizeof(*msg_stats))
+			return -EINVAL;
+
+		mutex_lock(&the_lnet.ln_api_mutex);
+		rc = lnet_get_ni_stats(msg_stats);
+		mutex_unlock(&the_lnet.ln_api_mutex);
+
+		return rc;
+	}
+
 	case IOC_LIBCFS_GET_NET: {
 		size_t total = sizeof(*config) +
 			       sizeof(struct lnet_ioctl_net_config);
diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c
index 2ff329bf91ba..5694d85c713c 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-move.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-move.c
@@ -45,6 +45,104 @@ static int local_nid_dist_zero = 1;
 module_param(local_nid_dist_zero, int, 0444);
 MODULE_PARM_DESC(local_nid_dist_zero, "Reserved");
 
+static inline struct lnet_comm_count *
+get_stats_counts(struct lnet_element_stats *stats,
+		 enum lnet_stats_type stats_type)
+{
+	switch (stats_type) {
+	case LNET_STATS_TYPE_SEND:
+		return &stats->el_send_stats;
+	case LNET_STATS_TYPE_RECV:
+		return &stats->el_recv_stats;
+	case LNET_STATS_TYPE_DROP:
+		return &stats->el_drop_stats;
+	default:
+		CERROR("Unknown stats type\n");
+	}
+
+	return NULL;
+}
+
+void lnet_incr_stats(struct lnet_element_stats *stats,
+		     enum lnet_msg_type msg_type,
+		     enum lnet_stats_type stats_type)
+{
+	struct lnet_comm_count *counts = get_stats_counts(stats, stats_type);
+
+	if (!counts)
+		return;
+
+	switch (msg_type) {
+	case LNET_MSG_ACK:
+		atomic_inc(&counts->co_ack_count);
+		break;
+	case LNET_MSG_PUT:
+		atomic_inc(&counts->co_put_count);
+		break;
+	case LNET_MSG_GET:
+		atomic_inc(&counts->co_get_count);
+		break;
+	case LNET_MSG_REPLY:
+		atomic_inc(&counts->co_reply_count);
+		break;
+	case LNET_MSG_HELLO:
+		atomic_inc(&counts->co_hello_count);
+		break;
+	default:
+		CERROR("There is a BUG in the code. Unknown message type\n");
+		break;
+	}
+}
+
+__u32 lnet_sum_stats(struct lnet_element_stats *stats,
+		     enum lnet_stats_type stats_type)
+{
+	struct lnet_comm_count *counts = get_stats_counts(stats, stats_type);
+
+	if (!counts)
+		return 0;
+
+	return (atomic_read(&counts->co_ack_count) +
+		atomic_read(&counts->co_put_count) +
+		atomic_read(&counts->co_get_count) +
+		atomic_read(&counts->co_reply_count) +
+		atomic_read(&counts->co_hello_count));
+}
+
+static inline void assign_stats(struct lnet_ioctl_comm_count *msg_stats,
+				struct lnet_comm_count *counts)
+{
+	msg_stats->ico_get_count = atomic_read(&counts->co_get_count);
+	msg_stats->ico_put_count = atomic_read(&counts->co_put_count);
+	msg_stats->ico_reply_count = atomic_read(&counts->co_reply_count);
+	msg_stats->ico_ack_count = atomic_read(&counts->co_ack_count);
+	msg_stats->ico_hello_count = atomic_read(&counts->co_hello_count);
+}
+
+void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
+			      struct lnet_element_stats *stats)
+{
+	struct lnet_comm_count *counts;
+
+	LASSERT(msg_stats);
+	LASSERT(stats);
+
+	counts = get_stats_counts(stats, LNET_STATS_TYPE_SEND);
+	if (!counts)
+		return;
+	assign_stats(&msg_stats->im_send_stats, counts);
+
+	counts = get_stats_counts(stats, LNET_STATS_TYPE_RECV);
+	if (!counts)
+		return;
+	assign_stats(&msg_stats->im_recv_stats, counts);
+
+	counts = get_stats_counts(stats, LNET_STATS_TYPE_DROP);
+	if (!counts)
+		return;
+	assign_stats(&msg_stats->im_drop_stats, counts);
+}
+
 int
 lnet_fail_nid(lnet_nid_t nid, unsigned int threshold)
 {
@@ -632,9 +730,13 @@ lnet_post_send_locked(struct lnet_msg *msg, int do_send)
 		the_lnet.ln_counters[cpt]->drop_length += msg->msg_len;
 		lnet_net_unlock(cpt);
 		if (msg->msg_txpeer)
-			atomic_inc(&msg->msg_txpeer->lpni_stats.drop_count);
+			lnet_incr_stats(&msg->msg_txpeer->lpni_stats,
+					msg->msg_type,
+					LNET_STATS_TYPE_DROP);
 		if (msg->msg_txni)
-			atomic_inc(&msg->msg_txni->ni_stats.drop_count);
+			lnet_incr_stats(&msg->msg_txni->ni_stats,
+					msg->msg_type,
+					LNET_STATS_TYPE_DROP);
 
 		CNETERR("Dropping message for %s: peer not alive\n",
 			libcfs_id2str(msg->msg_target));
@@ -1859,9 +1961,11 @@ lnet_send(lnet_nid_t src_nid, struct lnet_msg *msg, lnet_nid_t rtr_nid)
 }
 
 void
-lnet_drop_message(struct lnet_ni *ni, int cpt, void *private, unsigned int nob)
+lnet_drop_message(struct lnet_ni *ni, int cpt, void *private, unsigned int nob,
+		  __u32 msg_type)
 {
 	lnet_net_lock(cpt);
+	lnet_incr_stats(&ni->ni_stats, msg_type, LNET_STATS_TYPE_DROP);
 	the_lnet.ln_counters[cpt]->drop_count++;
 	the_lnet.ln_counters[cpt]->drop_length += nob;
 	lnet_net_unlock(cpt);
@@ -2510,7 +2614,7 @@ lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid,
 	lnet_finalize(msg, rc);
 
 drop:
-	lnet_drop_message(ni, cpt, private, payload_length);
+	lnet_drop_message(ni, cpt, private, payload_length, type);
 	return 0;
 }
 EXPORT_SYMBOL(lnet_parse);
@@ -2546,7 +2650,8 @@ lnet_drop_delayed_msg_list(struct list_head *head, char *reason)
 		 * until that's done
 		 */
 		lnet_drop_message(msg->msg_rxni, msg->msg_rx_cpt,
-				  msg->msg_private, msg->msg_len);
+				  msg->msg_private, msg->msg_len,
+				  msg->msg_type);
 		/*
 		 * NB: message will not generate event because w/o attached MD,
 		 * but we still should give error code so lnet_msg_decommit()
@@ -2786,6 +2891,7 @@ lnet_create_reply_msg(struct lnet_ni *ni, struct lnet_msg *getmsg)
 	cpt = lnet_cpt_of_nid(peer_id.nid, ni);
 
 	lnet_net_lock(cpt);
+	lnet_incr_stats(&ni->ni_stats, LNET_MSG_GET, LNET_STATS_TYPE_DROP);
 	the_lnet.ln_counters[cpt]->drop_count++;
 	the_lnet.ln_counters[cpt]->drop_length += getmd->md_length;
 	lnet_net_unlock(cpt);
diff --git a/drivers/staging/lustre/lnet/lnet/lib-msg.c b/drivers/staging/lustre/lnet/lnet/lib-msg.c
index db13d01d366f..7f58cfe25bc2 100644
--- a/drivers/staging/lustre/lnet/lnet/lib-msg.c
+++ b/drivers/staging/lustre/lnet/lnet/lib-msg.c
@@ -219,9 +219,13 @@ lnet_msg_decommit_tx(struct lnet_msg *msg, int status)
 
 incr_stats:
 	if (msg->msg_txpeer)
-		atomic_inc(&msg->msg_txpeer->lpni_stats.send_count);
+		lnet_incr_stats(&msg->msg_txpeer->lpni_stats,
+				msg->msg_type,
+				LNET_STATS_TYPE_SEND);
 	if (msg->msg_txni)
-		atomic_inc(&msg->msg_txni->ni_stats.send_count);
+		lnet_incr_stats(&msg->msg_txni->ni_stats,
+				msg->msg_type,
+				LNET_STATS_TYPE_SEND);
 out:
 	lnet_return_tx_credits_locked(msg);
 	msg->msg_tx_committed = 0;
@@ -280,9 +284,13 @@ lnet_msg_decommit_rx(struct lnet_msg *msg, int status)
 
 incr_stats:
 	if (msg->msg_rxpeer)
-		atomic_inc(&msg->msg_rxpeer->lpni_stats.recv_count);
+		lnet_incr_stats(&msg->msg_rxpeer->lpni_stats,
+				msg->msg_type,
+				LNET_STATS_TYPE_RECV);
 	if (msg->msg_rxni)
-		atomic_inc(&msg->msg_rxni->ni_stats.recv_count);
+		lnet_incr_stats(&msg->msg_rxni->ni_stats,
+				msg->msg_type,
+				LNET_STATS_TYPE_RECV);
 	if (ev->type == LNET_EVENT_PUT || ev->type == LNET_EVENT_REPLY)
 		counters->recv_length += msg->msg_wanted;
 
diff --git a/drivers/staging/lustre/lnet/lnet/net_fault.c b/drivers/staging/lustre/lnet/lnet/net_fault.c
index 3841bac1aa0a..e2c746855da9 100644
--- a/drivers/staging/lustre/lnet/lnet/net_fault.c
+++ b/drivers/staging/lustre/lnet/lnet/net_fault.c
@@ -632,7 +632,8 @@ delayed_msg_process(struct list_head *msg_list, bool drop)
 			}
 		}
 
-		lnet_drop_message(ni, cpt, msg->msg_private, msg->msg_len);
+		lnet_drop_message(ni, cpt, msg->msg_private, msg->msg_len,
+				  msg->msg_type);
 		lnet_finalize(msg, rc);
 	}
 }
diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
index 95f72ae39a89..03c1c34517e4 100644
--- a/drivers/staging/lustre/lnet/lnet/peer.c
+++ b/drivers/staging/lustre/lnet/lnet/peer.c
@@ -3301,6 +3301,7 @@ int lnet_get_peer_info(lnet_nid_t *primary_nid, lnet_nid_t *nidp,
 		       void __user *bulk)
 {
 	struct lnet_ioctl_element_stats *lpni_stats;
+	struct lnet_ioctl_element_msg_stats *lpni_msg_stats;
 	struct lnet_peer_ni_credit_info *lpni_info;
 	struct lnet_peer_ni *lpni;
 	struct lnet_peer *lp;
@@ -3315,7 +3316,8 @@ int lnet_get_peer_info(lnet_nid_t *primary_nid, lnet_nid_t *nidp,
 		goto out;
 	}
 
-	size = sizeof(nid) + sizeof(*lpni_info) + sizeof(*lpni_stats);
+	size = sizeof(nid) + sizeof(*lpni_info) + sizeof(*lpni_stats)
+		+ sizeof(*lpni_msg_stats);
 	size *= lp->lp_nnis;
 	if (size > *sizep) {
 		*sizep = size;
@@ -3337,13 +3339,17 @@ int lnet_get_peer_info(lnet_nid_t *primary_nid, lnet_nid_t *nidp,
 	lpni_stats = kzalloc(sizeof(*lpni_stats), GFP_KERNEL);
 	if (!lpni_stats)
 		goto out_free_info;
+	lpni_msg_stats = kzalloc(sizeof(*lpni_msg_stats), GFP_KERNEL);
+	if (!lpni_msg_stats)
+		goto out_free_stats;
+
 
 	lpni = NULL;
 	rc = -EFAULT;
 	while ((lpni = lnet_get_next_peer_ni_locked(lp, NULL, lpni)) != NULL) {
 		nid = lpni->lpni_nid;
 		if (copy_to_user(bulk, &nid, sizeof(nid)))
-			goto out_free_stats;
+			goto out_free_msg_stats;
 		bulk += sizeof(nid);
 
 		memset(lpni_info, 0, sizeof(*lpni_info));
@@ -3362,22 +3368,28 @@ int lnet_get_peer_info(lnet_nid_t *primary_nid, lnet_nid_t *nidp,
 		lpni_info->cr_peer_min_tx_credits = lpni->lpni_mintxcredits;
 		lpni_info->cr_peer_tx_qnob = lpni->lpni_txqnob;
 		if (copy_to_user(bulk, lpni_info, sizeof(*lpni_info)))
-			goto out_free_stats;
+			goto out_free_msg_stats;
 		bulk += sizeof(*lpni_info);
 
 		memset(lpni_stats, 0, sizeof(*lpni_stats));
 		lpni_stats->iel_send_count =
-			atomic_read(&lpni->lpni_stats.send_count);
+			lnet_sum_stats(&lpni->lpni_stats, LNET_STATS_TYPE_SEND);
 		lpni_stats->iel_recv_count =
-			atomic_read(&lpni->lpni_stats.recv_count);
+			lnet_sum_stats(&lpni->lpni_stats, LNET_STATS_TYPE_RECV);
 		lpni_stats->iel_drop_count =
-			atomic_read(&lpni->lpni_stats.drop_count);
+			lnet_sum_stats(&lpni->lpni_stats, LNET_STATS_TYPE_DROP);
 		if (copy_to_user(bulk, lpni_stats, sizeof(*lpni_stats)))
-			goto out_free_stats;
+			goto out_free_msg_stats;
 		bulk += sizeof(*lpni_stats);
+		lnet_usr_translate_stats(lpni_msg_stats, &lpni->lpni_stats);
+		if (copy_to_user(bulk, lpni_msg_stats, sizeof(*lpni_msg_stats)))
+			goto out_free_msg_stats;
+		bulk += sizeof(*lpni_msg_stats);
 	}
 	rc = 0;
 
+out_free_msg_stats:
+	kfree(lpni_msg_stats);
 out_free_stats:
 	kfree(lpni_stats);
 out_free_info:

From patchwork Sun Oct 7 23:19:38 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 10629837
From: NeilBrown
To: Oleg Drokin, Doug Oucharek, James Simmons, Andreas Dilger
Date: Mon, 08 Oct 2018 10:19:38 +1100
Message-ID: <153895437844.16383.527689917065770648.stgit@noble>
In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble>
References: <153895417139.16383.3791701638653772865.stgit@noble>
User-Agent: StGit/0.17.1-dirty
Subject: [lustre-devel] [PATCH 23/24] lustre: lnet: show peer state
Cc: Amir Shehata, Olaf Weber, Lustre Development List

From: Amir Shehata

It is important to show the peer state when debugging. This patch
exports the peer state from the kernel to user space; the state is
shown when the detail level requested in the peer show command is >= 3.

WC-bug-id: https://jira.whamcloud.com/browse/LU-9480
Signed-off-by: Amir Shehata
Signed-off-by: Olaf Weber
Reviewed-on: https://review.whamcloud.com/26130
Reviewed-by: Olaf Weber
Reviewed-by: Dmitry Eremin
Signed-off-by: NeilBrown
Reviewed-by: James Simmons
---
 .../staging/lustre/include/linux/lnet/lib-lnet.h |    4 +---
 drivers/staging/lustre/lnet/lnet/api-ni.c          |    6 +-----
 drivers/staging/lustre/lnet/lnet/peer.c            |   21 ++++++++++----------
 3 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
index 91980f60a50d..fcfd844e0162 100644
--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
@@ -733,9 +733,7 @@ bool lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, lnet_nid_t nid);
 int lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
 int lnet_add_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid, bool mr);
 int lnet_del_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid);
-int lnet_get_peer_info(lnet_nid_t *primary_nid, lnet_nid_t *nid,
-		       __u32 *nnis, bool *mr, __u32 *sizep,
-		       void __user *bulk);
+int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk);
 int lnet_get_peer_ni_info(__u32 peer_index, __u64 *nid,
 			  char alivness[LNET_MAX_STR_LEN],
 			  __u32 *cpt_iter, __u32 *refcount,
diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c
index 0852118bf803..e2c86b8279e5 100644
--- a/drivers/staging/lustre/lnet/lnet/api-ni.c
+++ b/drivers/staging/lustre/lnet/lnet/api-ni.c
@@ -3166,11 +3166,7 @@ LNetCtl(unsigned int cmd, void *arg)
 			return -EINVAL;
 
 		mutex_lock(&the_lnet.ln_api_mutex);
-		rc = lnet_get_peer_info(&cfg->prcfg_prim_nid,
-					&cfg->prcfg_cfg_nid,
-					&cfg->prcfg_count,
-					&cfg->prcfg_mr,
-					&cfg->prcfg_size,
+		rc = lnet_get_peer_info(cfg,
 					(void __user *)cfg->prcfg_bulk);
 		mutex_unlock(&the_lnet.ln_api_mutex);
 		return rc;
diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
index 03c1c34517e4..5f61fca09f44 100644
--- a/drivers/staging/lustre/lnet/lnet/peer.c
+++ b/drivers/staging/lustre/lnet/lnet/peer.c
@@ -3296,9 +3296,7 @@ lnet_get_peer_ni_info(__u32 peer_index, __u64 *nid,
 }
 
 /* ln_api_mutex is held, which keeps the peer list stable */
-int lnet_get_peer_info(lnet_nid_t *primary_nid, lnet_nid_t *nidp,
-		       __u32 *nnis, bool *mr, __u32 *sizep,
-		       void __user *bulk)
+int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 {
 	struct lnet_ioctl_element_stats *lpni_stats;
 	struct lnet_ioctl_element_msg_stats *lpni_msg_stats;
@@ -3309,7 +3307,7 @@ int lnet_get_peer_info(lnet_nid_t *primary_nid, lnet_nid_t *nidp,
 	__u32 size;
 	int rc;
 
-	lp = lnet_find_peer(*primary_nid);
+	lp = lnet_find_peer(cfg->prcfg_prim_nid);
 
 	if (!lp) {
 		rc = -ENOENT;
@@ -3319,17 +3317,18 @@ int lnet_get_peer_info(lnet_nid_t *primary_nid, lnet_nid_t *nidp,
 	size = sizeof(nid) + sizeof(*lpni_info) + sizeof(*lpni_stats)
 		+ sizeof(*lpni_msg_stats);
 	size *= lp->lp_nnis;
-	if (size > *sizep) {
-		*sizep = size;
+	if (size > cfg->prcfg_size) {
+		cfg->prcfg_size = size;
 		rc = -E2BIG;
 		goto out_lp_decref;
 	}
 
-	*primary_nid = lp->lp_primary_nid;
-	*mr = lnet_peer_is_multi_rail(lp);
-	*nidp = lp->lp_primary_nid;
-	*nnis = lp->lp_nnis;
-	*sizep = size;
+	cfg->prcfg_prim_nid = lp->lp_primary_nid;
+	cfg->prcfg_mr = lnet_peer_is_multi_rail(lp);
+	cfg->prcfg_cfg_nid = lp->lp_primary_nid;
+	cfg->prcfg_count = lp->lp_nnis;
+	cfg->prcfg_size = size;
+	cfg->prcfg_state = lp->lp_state;
 
 	/* Allocate helper buffers. */
 	rc = -ENOMEM;

From patchwork Sun Oct 7 23:19:38 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 10629839
From: NeilBrown
To: Oleg Drokin, Doug Oucharek, James Simmons, Andreas Dilger
Date: Mon, 08 Oct 2018 10:19:38 +1100
Message-ID: <153895437848.16383.5882317080014923551.stgit@noble>
In-Reply-To: <153895417139.16383.3791701638653772865.stgit@noble>
References: <153895417139.16383.3791701638653772865.stgit@noble>
User-Agent: StGit/0.17.1-dirty
Subject: [lustre-devel] [PATCH 24/24] lustre: lnet: balance references in lnet_discover_peer_locked()
Cc: Oleg Drokin, "John L. Hammond", Lustre Development List

From: John L. Hammond

In lnet_discover_peer_locked() avoid a leaked reference to the peer in
the non-blocking discovery case.

WC-bug-id: https://jira.whamcloud.com/browse/LU-9913
Signed-off-by: John L. Hammond
Reviewed-on: https://review.whamcloud.com/28695
Reviewed-by: Olaf Weber
Reviewed-by: Quentin Bouget
Reviewed-by: Oleg Drokin
Signed-off-by: NeilBrown
Reviewed-by: James Simmons
---
 drivers/staging/lustre/lnet/lnet/peer.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c
index 5f61fca09f44..db36b5cf31e1 100644
--- a/drivers/staging/lustre/lnet/lnet/peer.c
+++ b/drivers/staging/lustre/lnet/lnet/peer.c
@@ -2010,7 +2010,6 @@ lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt, bool block)
 		if (lnet_peer_is_uptodate(lp))
 			break;
 		lnet_peer_queue_for_discovery(lp);
-		lnet_peer_addref_locked(lp);
 		/*
 		 * if caller requested a non-blocking operation then
 		 * return immediately. Once discovery is complete then the
@@ -2019,6 +2018,8 @@ lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt, bool block)
 		 */
 		if (!block)
 			break;
+
+		lnet_peer_addref_locked(lp);
 		lnet_net_unlock(LNET_LOCK_EX);
 		schedule();
 		finish_wait(&lp->lp_dc_waitq, &wait);
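
Reader's note on the reference balance in PATCH 24/24: the following is a minimal userspace sketch in C, not kernel code. The names struct toy_peer, toy_peer_addref(), toy_peer_decref(), toy_discover_old(), toy_discover_new() and toy_leaked_refs() are hypothetical and exist only for illustration. It also assumes that the blocking path drops its reference after waking up, which is not visible in the hunks shown. Under those assumptions it shows why taking the reference before the !block exit leaves one reference behind, and why moving the addref after that exit keeps the count balanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct lnet_peer; only a refcount is modelled. */
struct toy_peer {
	int refcount;
};

static void toy_peer_addref(struct toy_peer *lp)
{
	lp->refcount++;
}

static void toy_peer_decref(struct toy_peer *lp)
{
	assert(lp->refcount > 0);
	lp->refcount--;
}

/* Ordering before the patch: reference taken ahead of the !block exit. */
static void toy_discover_old(struct toy_peer *lp, bool block)
{
	toy_peer_addref(lp);
	if (!block)
		return;		/* early exit keeps the extra reference */
	/* ... the real code sleeps here until discovery completes ... */
	toy_peer_decref(lp);	/* assumed: the waiter drops its reference */
}

/* Ordering after the patch: reference taken only when we will wait. */
static void toy_discover_new(struct toy_peer *lp, bool block)
{
	if (!block)
		return;
	toy_peer_addref(lp);
	/* ... the real code sleeps here until discovery completes ... */
	toy_peer_decref(lp);	/* assumed: the waiter drops its reference */
}

/* Returns how many references a single call leaves behind. */
static int toy_leaked_refs(void (*discover)(struct toy_peer *, bool),
			   bool block)
{
	struct toy_peer lp = { .refcount = 1 };

	discover(&lp, block);
	return lp.refcount - 1;
}
```

In this model, toy_leaked_refs(toy_discover_old, false) reports one leftover reference, while toy_discover_new leaves none for either value of block.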