From patchwork Sun Nov 28 23:27:36 2021
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12643247
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Sun, 28 Nov 2021 18:27:36 -0500
Message-Id: <1638142074-5945-2-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org>
References: <1638142074-5945-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 01/19] lnet: fix delay rule crash
List-Id: "For discussing Lustre software development."
Cc: Lustre Development List

The following crash was captured in testing:

LNetError: 25912:0:(net_fault.c:520:delay_rule_decref()) ASSERTION( list_empty(&rule->dl_sched_link) ) failed:
LNetError: 25912:0:(net_fault.c:520:delay_rule_decref()) LBUG
Pid: 25912, comm: lnet_dd 5.7.0-rc7+ #1 SMP PREEMPT Fri Aug 20 16:17:11 EDT 2021
Call Trace:
  libcfs_call_trace+0x62/0x80 [libcfs]
  lbug_with_loc+0x41/0xa0 [libcfs]
  delay_rule_decref+0x6e/0xe0 [lnet]
  lnet_delay_rule_check+0x65/0x110 [lnet]
  lnet_delay_rule_daemon+0x76/0x120 [lnet]

The fix is to revert the list changes in lnet_delay_rule_check().

Fixes: da4bdd3701 ("lustre: use list_first_entry() in lnet/lnet subdirectory.")
Signed-off-by: James Simmons
---
 net/lnet/lnet/net_fault.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/lnet/lnet/net_fault.c b/net/lnet/lnet/net_fault.c
index 06366df..02fc1ae 100644
--- a/net/lnet/lnet/net_fault.c
+++ b/net/lnet/lnet/net_fault.c
@@ -744,15 +744,15 @@ struct delay_daemon_data {
 			break;

 		spin_lock_bh(&delay_dd.dd_lock);
-		rule = list_first_entry_or_null(&delay_dd.dd_sched_rules,
-						struct lnet_delay_rule,
-						dl_sched_link);
-		if (!rule)
-			list_del_init(&rule->dl_sched_link);
-		spin_unlock_bh(&delay_dd.dd_lock);
-
-		if (!rule)
+		if (list_empty(&delay_dd.dd_sched_rules)) {
+			spin_unlock_bh(&delay_dd.dd_lock);
 			break;
+		}
+
+		rule = list_entry(delay_dd.dd_sched_rules.next,
+				  struct lnet_delay_rule, dl_sched_link);
+		list_del_init(&rule->dl_sched_link);
+		spin_unlock_bh(&delay_dd.dd_lock);

 		delayed_msg_check(rule, false, &msgs);
 		delay_rule_decref(rule);	/* -1 for delay_dd.dd_sched_rules */

From patchwork Sun Nov 28 23:27:37 2021
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12643271
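[Editorial aside on the delay-rule fix above: the reverted hunk tested `if (!rule)` before `list_del_init()`, so a found rule was never unlinked and the later `ASSERTION(list_empty(&rule->dl_sched_link))` fired. The sketch below is a minimal userspace re-implementation of the kernel's circular list — not `<linux/list.h>` or the real LNet code — showing the correct "check empty, then detach first entry" pattern that the patch restores.]

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly-linked list. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);	/* entry now points at itself: "empty" */
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct rule {			/* stands in for struct lnet_delay_rule */
	struct list_head sched_link;
	int id;
};

/* Correct pattern: check emptiness first, then unlink the first entry
 * (all under the lock in the real code).  The buggy variant checked
 * `if (!rule)` before list_del_init(), so entries stayed linked and a
 * later assertion that sched_link was empty could only fail. */
static struct rule *pop_first_rule(struct list_head *sched)
{
	struct rule *r;

	if (list_empty(sched))
		return NULL;
	r = container_of(sched->next, struct rule, sched_link);
	list_del_init(&r->sched_link);
	return r;
}
```

After `pop_first_rule()` returns, the entry's own `sched_link` points at itself, so the `list_empty(&rule->dl_sched_link)` check in the real `delay_rule_decref()` succeeds.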
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Cc: Lustre Development List
Date: Sun, 28 Nov 2021 18:27:37 -0500
Message-Id: <1638142074-5945-3-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org>
References: <1638142074-5945-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 02/19] lnet: change tp_nid to 16byte in lnet_test_peer.

From: Mr NeilBrown

This updates 'struct lnet_test_peer' to store a large address nid.
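[Editorial aside: the patch keeps the legacy `lnet_nid_t` entry points (`lnet_fail_nid()`, `fail_peer()`) and widens the argument at the boundary with `lnet_nid4_to_nid()`, comparing with `nid_same()` / `LNET_NID_IS_ANY()` thereafter. The sketch below mimics that shim pattern with simplified stand-in types — the field layout and bit encoding here are invented for illustration and do not match the real LNet structures.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t lnet_nid_t;			/* legacy packed 64-bit NID */
#define LNET_NID_ANY ((lnet_nid_t)(-1))

struct lnet_nid {				/* 16-byte large-address form */
	uint8_t  nid_size;
	uint8_t  nid_type;
	uint16_t nid_num;
	uint32_t nid_addr[4];
};

/* Widen a legacy NID into the 16-byte form (invented encoding). */
static void lnet_nid4_to_nid(lnet_nid_t nid4, struct lnet_nid *nid)
{
	memset(nid, 0, sizeof(*nid));
	nid->nid_addr[0] = (uint32_t)nid4;	/* address bits */
	nid->nid_num  = (uint16_t)(nid4 >> 32);	/* network number */
	nid->nid_type = (uint8_t)(nid4 >> 48);	/* LND type */
}

static int nid_same(const struct lnet_nid *a, const struct lnet_nid *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

static int LNET_NID_IS_ANY(const struct lnet_nid *nid)
{
	struct lnet_nid any;

	lnet_nid4_to_nid(LNET_NID_ANY, &any);
	return nid_same(nid, &any);
}

/* Like the widened struct lnet_test_peer from the patch. */
struct lnet_test_peer {
	struct lnet_nid tp_nid;		/* matching nid, now 16 bytes */
	unsigned int tp_threshold;	/* # failures to simulate */
};

/* Match logic as in fail_peer(): a wildcard entry fails every peer. */
static int test_peer_matches(const struct lnet_test_peer *tp,
			     const struct lnet_nid *nid)
{
	return LNET_NID_IS_ANY(&tp->tp_nid) || nid_same(&tp->tp_nid, nid);
}
```

The design point the patch illustrates: external ABI stays `lnet_nid_t`, and each converted function gains a `nid4` parameter plus a local `struct lnet_nid nid` it fills once on entry, so internal comparisons can all move to the 16-byte form.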
WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 7e89b556ea7dc4b4c ("LU-10391 lnet: change tp_nid to 16byte in lnet_test_peer.")
Signed-off-by: Mr NeilBrown
Reviewed-on: https://review.whamcloud.com/43595
Reviewed-by: James Simmons
Reviewed-by: Chris Horn
Reviewed-by: Amir Shehata
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 include/linux/lnet/lib-types.h |  2 +-
 net/lnet/lnet/lib-move.c       | 18 +++++++++++-------
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 380a7b9..1901ad2 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -233,7 +233,7 @@ struct lnet_libmd {
 struct lnet_test_peer {
 	/* info about peers we are trying to fail */
 	struct list_head tp_list;	/* ln_test_peers */
-	lnet_nid_t tp_nid;		/* matching nid */
+	struct lnet_nid tp_nid;		/* matching nid */
 	unsigned int tp_threshold;	/* # failures to simulate */
 };

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 170d684..b9f5973 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -190,13 +190,15 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 }

 int
-lnet_fail_nid(lnet_nid_t nid, unsigned int threshold)
+lnet_fail_nid(lnet_nid_t nid4, unsigned int threshold)
 {
 	struct lnet_test_peer *tp;
 	struct list_head *el;
 	struct list_head *next;
+	struct lnet_nid nid;
 	LIST_HEAD(cull);

+	lnet_nid4_to_nid(nid4, &nid);
 	/* NB: use lnet_net_lock(0) to serialize operations on test peers */
 	if (threshold) {
 		/* Adding a new entry */
@@ -218,9 +220,9 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	list_for_each_safe(el, next, &the_lnet.ln_test_peers) {
 		tp = list_entry(el, struct lnet_test_peer, tp_list);

-		if (!tp->tp_threshold ||	/* needs culling anyway */
-		    nid == LNET_NID_ANY ||	/* removing all entries */
-		    tp->tp_nid == nid) {	/* matched this one */
+		if (!tp->tp_threshold ||	/* needs culling anyway */
+		    LNET_NID_IS_ANY(&nid) ||	/* removing all entries */
+		    nid_same(&tp->tp_nid, &nid)) {	/* matched this one */
 			list_move(&tp->tp_list, &cull);
 		}
 	}
@@ -237,14 +239,16 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 }

 static int
-fail_peer(lnet_nid_t nid, int outgoing)
+fail_peer(lnet_nid_t nid4, int outgoing)
 {
 	struct lnet_test_peer *tp;
 	struct list_head *el;
 	struct list_head *next;
+	struct lnet_nid nid;
 	LIST_HEAD(cull);
 	int fail = 0;

+	lnet_nid4_to_nid(nid4, &nid);
 	/* NB: use lnet_net_lock(0) to serialize operations on test peers */
 	lnet_net_lock(0);

@@ -264,8 +268,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 			continue;
 		}

-		if (tp->tp_nid == LNET_NID_ANY ||	/* fail every peer */
-		    nid == tp->tp_nid) {		/* fail this peer */
+		if (LNET_NID_IS_ANY(&tp->tp_nid) ||	/* fail every peer */
+		    nid_same(&nid, &tp->tp_nid)) {	/* fail this peer */
 			fail = 1;

 			if (tp->tp_threshold != LNET_MD_THRESH_INF) {

From patchwork Sun Nov 28 23:27:38 2021
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12643243
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Cc: Lustre Development List
Date: Sun, 28 Nov 2021 18:27:38 -0500
Message-Id: <1638142074-5945-4-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org>
References: <1638142074-5945-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 03/19] lnet: extend preferred nids in struct lnet_peer_ni

From: Mr NeilBrown

union lpni_pref in struct lnet_peer_ni now has 'struct lnet_nid' rather
than lnet_nid_t.

Also, lnet_peer_ni_set_non_mr_pref_nid() allows the pref nid to be NULL
and is a no-op in that case.
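[Editorial aside: the `union lpni_pref` trick stores a single preferred NID inline and reuses the same storage as a list head once a second NID arrives. The sketch below shows that promote-on-second-entry pattern with deliberately simplified stand-ins (a NID is a plain u64, a singly linked list replaces the kernel list); it is not the LNet code, which lives in `lnet_peer_add_pref_nid()`.]

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned long long nid_t;	/* simplified stand-in for a NID */

struct nid_entry {			/* like struct lnet_nid_list */
	struct nid_entry *next;
	nid_t nid;
};

struct peer_ni {
	unsigned int pref_nnids;	/* like lpni_pref_nnids */
	union {				/* like union lpni_pref */
		nid_t nid;		/* used when pref_nnids == 1 */
		struct nid_entry *nids;	/* used when pref_nnids > 1 */
	} pref;
};

/* Add a preferred NID, promoting the inline value to a list when the
 * second one arrives.  Returns -1 for a duplicate (like -EEXIST). */
static int add_pref_nid(struct peer_ni *p, nid_t nid)
{
	struct nid_entry *e;

	if (p->pref_nnids == 0) {
		p->pref.nid = nid;	/* first NID: store inline */
		p->pref_nnids = 1;
		return 0;
	}
	if (p->pref_nnids == 1) {
		if (p->pref.nid == nid)
			return -1;
		/* Second NID: move the inline value onto a list. */
		e = malloc(sizeof(*e));
		assert(e);		/* error handling elided in sketch */
		e->nid = p->pref.nid;
		e->next = NULL;
		p->pref.nids = e;	/* union switches meaning here */
	}
	for (e = p->pref.nids; e; e = e->next)
		if (e->nid == nid)
			return -1;
	e = malloc(sizeof(*e));
	assert(e);
	e->nid = nid;
	e->next = p->pref.nids;
	p->pref.nids = e;
	p->pref_nnids++;
	return 0;
}

/* Lookup honouring both representations, like lnet_peer_is_pref_nid_locked(). */
static int is_pref_nid(const struct peer_ni *p, nid_t nid)
{
	const struct nid_entry *e;

	if (p->pref_nnids == 0)
		return 0;
	if (p->pref_nnids == 1)
		return p->pref.nid == nid;
	for (e = p->pref.nids; e; e = e->next)
		if (e->nid == nid)
			return 1;
	return 0;
}
```

The payoff of the union is that the common case (exactly one preferred NID) needs no allocation at all, at the cost of every reader branching on `pref_nnids` — exactly the branching visible throughout the peer.c hunks below.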
WC-bug-id: https://jira.whamcloud.com/browse/LU-10391
Lustre-commit: 47cc77462343533b4 ("LU-10391 lnet: extend prefered nids in struct lnet_peer_ni")
Signed-off-by: Mr NeilBrown
Reviewed-on: https://review.whamcloud.com/43596
Reviewed-by: James Simmons
Reviewed-by: Chris Horn
Reviewed-by: Amir Shehata
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 include/linux/lnet/lib-lnet.h    | 25 +++++++----
 include/linux/lnet/lib-types.h   |  4 +-
 include/uapi/linux/lnet/nidstr.h |  3 +-
 net/lnet/lnet/api-ni.c           | 16 +++----
 net/lnet/lnet/lib-move.c         | 58 ++++++++++++------------
 net/lnet/lnet/nidstrings.c       |  9 ++--
 net/lnet/lnet/peer.c             | 95 +++++++++++++++++++++++-----------------
 net/lnet/lnet/udsp.c             | 38 ++++++++--------
 8 files changed, 138 insertions(+), 110 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 104c98d..fb2f42fcb 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -486,6 +486,7 @@ unsigned int lnet_nid_cpt_hash(struct lnet_nid *nid,
 int lnet_cpt_of_nid(lnet_nid_t nid, struct lnet_ni *ni);
 int lnet_nid2cpt(struct lnet_nid *nid, struct lnet_ni *ni);
 struct lnet_ni *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
+struct lnet_ni *lnet_nid_to_ni_locked(struct lnet_nid *nid, int cpt);
 struct lnet_ni *lnet_nid2ni_addref(lnet_nid_t nid);
 struct lnet_ni *lnet_net2ni_locked(u32 net, int cpt);
 struct lnet_ni *lnet_net2ni_addref(u32 net);
@@ -538,9 +539,11 @@ int lnet_get_peer_list(u32 *countp, u32 *sizep,
 extern void lnet_peer_ni_add_to_recoveryq_locked(struct lnet_peer_ni *lpni,
 						 struct list_head *queue,
 						 time64_t now);
-extern int lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
+extern int lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni,
+				  struct lnet_nid *nid);
 extern void lnet_peer_clr_pref_nids(struct lnet_peer_ni *lpni);
-extern int lnet_peer_del_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
+extern int lnet_peer_del_pref_nid(struct lnet_peer_ni *lpni,
+				  struct lnet_nid *nid);
 void lnet_peer_ni_set_selection_priority(struct lnet_peer_ni *lpni,
 					 u32 priority);
 extern void lnet_ni_add_to_recoveryq_locked(struct lnet_ni *ni,
@@ -565,7 +568,7 @@ void lnet_rtr_transfer_to_peer(struct lnet_peer *src,
 int lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason);
 struct lnet_net *lnet_get_net_locked(u32 net_id);
 void lnet_net_clr_pref_rtrs(struct lnet_net *net);
-int lnet_net_add_pref_rtr(struct lnet_net *net, lnet_nid_t gw_nid);
+int lnet_net_add_pref_rtr(struct lnet_net *net, struct lnet_nid *gw_nid);
 int lnet_islocalnid4(lnet_nid_t nid);
 int lnet_islocalnid(struct lnet_nid *nid);
@@ -838,6 +841,9 @@ struct lnet_peer_ni *lnet_get_next_peer_ni_locked(struct lnet_peer *peer,
 						  struct lnet_peer_ni *prev);
 struct lnet_peer_ni *lnet_nid2peerni_locked(lnet_nid_t nid, lnet_nid_t pref,
 					    int cpt);
+struct lnet_peer_ni *lnet_peerni_by_nid_locked(struct lnet_nid *nid,
+					       struct lnet_nid *pref,
+					       int cpt);
 struct lnet_peer_ni *lnet_nid2peerni_ex(struct lnet_nid *nid, int cpt);
 struct lnet_peer_ni *lnet_peer_get_ni_locked(struct lnet_peer *lp,
 					     lnet_nid_t nid);
@@ -859,13 +865,16 @@ struct lnet_peer_ni *lnet_peer_ni_get_locked(struct lnet_peer *lp,
 void lnet_debug_peer(lnet_nid_t nid);
 struct lnet_peer_net *lnet_peer_get_net_locked(struct lnet_peer *peer,
 					       u32 net_id);
-bool lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, lnet_nid_t nid);
-int lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
+bool lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni,
+				  struct lnet_nid *nid);
+int lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, struct lnet_nid *nid);
 void lnet_peer_clr_pref_nids(struct lnet_peer_ni *lpni);
-bool lnet_peer_is_pref_rtr_locked(struct lnet_peer_ni *lpni, lnet_nid_t gw_nid);
+bool lnet_peer_is_pref_rtr_locked(struct lnet_peer_ni *lpni,
+				  struct lnet_nid *gw_nid);
 void lnet_peer_clr_pref_rtrs(struct lnet_peer_ni *lpni);
-int lnet_peer_add_pref_rtr(struct lnet_peer_ni *lpni, lnet_nid_t nid);
-int lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
+int lnet_peer_add_pref_rtr(struct lnet_peer_ni *lpni, struct lnet_nid *nid);
+int lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni,
+				     struct lnet_nid *nid);
 int lnet_add_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid, bool mr, bool temp);
 int lnet_del_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid);
 int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk);
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 1901ad2..bde7249 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -568,7 +568,7 @@ struct lnet_ping_buffer {

 struct lnet_nid_list {
 	struct list_head nl_list;
-	lnet_nid_t nl_nid;
+	struct lnet_nid nl_nid;
 };

 struct lnet_peer_ni {
@@ -635,7 +635,7 @@ struct lnet_peer_ni {
 	time64_t lpni_last_alive;
 	/* preferred local nids: if only one, use lpni_pref.nid */
 	union lpni_pref {
-		lnet_nid_t nid;
+		struct lnet_nid nid;
 		struct list_head nids;
 	} lpni_pref;
 	/* list of router nids preferred for this peer NI */
diff --git a/include/uapi/linux/lnet/nidstr.h b/include/uapi/linux/lnet/nidstr.h
index bfc9644..482cfb2 100644
--- a/include/uapi/linux/lnet/nidstr.h
+++ b/include/uapi/linux/lnet/nidstr.h
@@ -108,7 +108,8 @@ static inline char *libcfs_nidstr(const struct lnet_nid *nid)
 int cfs_parse_nidlist(char *str, int len, struct list_head *list);
 int cfs_print_nidlist(char *buffer, int count, struct list_head *list);
 int cfs_match_nid(lnet_nid_t nid, struct list_head *list);
-int cfs_match_nid_net(lnet_nid_t nid, __u32 net, struct list_head *net_num_list,
+int cfs_match_nid_net(struct lnet_nid *nid, __u32 net,
+		      struct list_head *net_num_list,
 		      struct list_head *addr);
 int cfs_match_net(__u32 net_id, __u32 net_type,
 		  struct list_head *net_num_list);
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 0f4feda..340cc84e 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -1388,7 +1388,7 @@
 struct lnet_net *
 int
 lnet_net_add_pref_rtr(struct lnet_net *net,
-		      lnet_nid_t gw_nid)
+		      struct lnet_nid *gw_nid)
 __must_hold(&the_lnet.ln_api_mutex)
 {
 	struct lnet_nid_list *ne;
@@ -1399,7 +1399,7 @@ struct lnet_net *
 	 * lock.
 	 */
 	list_for_each_entry(ne, &net->net_rtr_pref_nids, nl_list) {
-		if (ne->nl_nid == gw_nid)
+		if (nid_same(&ne->nl_nid, gw_nid))
 			return -EEXIST;
 	}

@@ -1407,7 +1407,7 @@ struct lnet_net *
 	if (!ne)
 		return -ENOMEM;

-	ne->nl_nid = gw_nid;
+	ne->nl_nid = *gw_nid;

 	/* Lock the cpt to protect against addition and checks in the
 	 * selection algorithm
@@ -1420,11 +1420,11 @@ struct lnet_net *
 }

 bool
-lnet_net_is_pref_rtr_locked(struct lnet_net *net, lnet_nid_t rtr_nid)
+lnet_net_is_pref_rtr_locked(struct lnet_net *net, struct lnet_nid *rtr_nid)
 {
 	struct lnet_nid_list *ne;

-	CDEBUG(D_NET, "%s: rtr pref emtpy: %d\n",
+	CDEBUG(D_NET, "%s: rtr pref empty: %d\n",
 	       libcfs_net2str(net->net_id),
 	       list_empty(&net->net_rtr_pref_nids));

@@ -1433,9 +1433,9 @@ struct lnet_net *

 	list_for_each_entry(ne, &net->net_rtr_pref_nids, nl_list) {
 		CDEBUG(D_NET, "Comparing pref %s with gw %s\n",
-		       libcfs_nid2str(ne->nl_nid),
-		       libcfs_nid2str(rtr_nid));
-		if (rtr_nid == ne->nl_nid)
+		       libcfs_nidstr(&ne->nl_nid),
+		       libcfs_nidstr(rtr_nid));
+		if (nid_same(rtr_nid, &ne->nl_nid))
 			return true;
 	}

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index b9f5973..2f7c37d 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1137,9 +1137,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	 * preferred, then let's use it
 	 */
 	if (best_ni) {
-		/* FIXME need to handle large-addr nid */
 		lpni_is_preferred = lnet_peer_is_pref_nid_locked(lpni,
-					lnet_nid_to_nid4(&best_ni->ni_nid));
+					&best_ni->ni_nid);
 		CDEBUG(D_NET, "%s lpni_is_preferred = %d\n",
 		       libcfs_nidstr(&best_ni->ni_nid),
 		       lpni_is_preferred);
@@ -1318,7 +1317,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	struct lnet_route *route;
 	int rc;
 	bool best_rte_is_preferred = false;
-	lnet_nid_t gw_pnid;
+	struct lnet_nid *gw_pnid;

 	CDEBUG(D_NET, "Looking up a route to %s, from %s\n",
 	       libcfs_net2str(rnet->lrn_net), libcfs_net2str(src_net));
@@ -1328,7 +1327,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	list_for_each_entry(route, &rnet->lrn_routes, lr_list) {
 		if (!lnet_is_route_alive(route))
 			continue;
-		gw_pnid = lnet_nid_to_nid4(&route->lr_gateway->lp_primary_nid);
+		gw_pnid = &route->lr_gateway->lp_primary_nid;

 		/* no protection on below fields, but it's harmless */
 		if (last_route && (last_route->lr_seq - route->lr_seq < 0))
@@ -1352,7 +1351,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		if (!lpni) {
 			CDEBUG(D_NET,
 			       "Gateway %s does not have a peer NI on net %s\n",
-			       libcfs_nid2str(gw_pnid),
+			       libcfs_nidstr(gw_pnid),
 			       libcfs_net2str(src_net));
 			continue;
 		}
@@ -1368,7 +1367,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 			best_gw_ni = lpni;
 			best_rte_is_preferred = true;
 			CDEBUG(D_NET, "preferred gw = %s\n",
-			       libcfs_nid2str(gw_pnid));
+			       libcfs_nidstr(gw_pnid));
 			continue;
 		} else if ((!rc) && best_rte_is_preferred)
 			/* The best route we found so far is in the preferred
@@ -1397,7 +1396,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		if (!lpni) {
 			CDEBUG(D_NET,
 			       "Gateway %s does not have a peer NI on net %s\n",
-			       libcfs_nid2str(gw_pnid),
+			       libcfs_nidstr(gw_pnid),
 			       libcfs_net2str(src_net));
 			continue;
 		}
@@ -1789,8 +1788,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		CDEBUG(D_NET, "Setting preferred local NID %s on NMR peer %s\n",
 		       libcfs_nidstr(&lni->ni_nid),
 		       libcfs_nidstr(&lpni->lpni_nid));
-		lnet_peer_ni_set_non_mr_pref_nid(lpni,
-				lnet_nid_to_nid4(&lni->ni_nid));
+		lnet_peer_ni_set_non_mr_pref_nid(lpni, &lni->ni_nid);
 	}
 }

@@ -2314,7 +2312,8 @@ struct lnet_ni *
 		if (lpni_entry->lpni_pref_nnids == 0)
 			continue;
 		LASSERT(lpni_entry->lpni_pref_nnids == 1);
-		best_ni = lnet_nid2ni_locked(lpni_entry->lpni_pref.nid, cpt);
+		best_ni = lnet_nid_to_ni_locked(&lpni_entry->lpni_pref.nid,
+						cpt);
 		break;
 	}

@@ -4208,7 +4207,7 @@ void lnet_monitor_thr_stop(void)
 }

 int
-lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid,
+lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr, lnet_nid_t from_nid4,
 	   void *private, int rdma_req)
 {
 	struct lnet_peer_ni *lpni;
@@ -4217,6 +4216,7 @@ void lnet_monitor_thr_stop(void)
 	lnet_pid_t dest_pid;
 	lnet_nid_t dest_nid;
 	lnet_nid_t src_nid;
+	struct lnet_nid from_nid;
 	bool push = false;
 	int for_me;
 	u32 type;
@@ -4225,6 +4225,8 @@ void lnet_monitor_thr_stop(void)

 	LASSERT(!in_interrupt());

+	lnet_nid4_to_nid(from_nid4, &from_nid);
+
 	type = le32_to_cpu(hdr->type);
 	src_nid = le64_to_cpu(hdr->src_nid);
 	dest_nid = le64_to_cpu(hdr->dest_nid);
@@ -4233,7 +4235,7 @@ void lnet_monitor_thr_stop(void)
 	/* FIXME handle large-addr nids */
 	for_me = (lnet_nid_to_nid4(&ni->ni_nid) == dest_nid);

-	cpt = lnet_cpt_of_nid(from_nid, ni);
+	cpt = lnet_cpt_of_nid(from_nid4, ni);

 	CDEBUG(D_NET, "TRACE: %s(%s) <- %s : %s\n",
 	       libcfs_nid2str(dest_nid),
@@ -4246,7 +4248,7 @@ void lnet_monitor_thr_stop(void)
 	case LNET_MSG_GET:
 		if (payload_length > 0) {
 			CERROR("%s, src %s: bad %s payload %d (0 expected)\n",
-			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(from_nid4),
 			       libcfs_nid2str(src_nid),
 			       lnet_msgtyp2str(type), payload_length);
 			return -EPROTO;
@@ -4258,7 +4260,7 @@ void lnet_monitor_thr_stop(void)
 		if (payload_length > (u32)(for_me ? LNET_MAX_PAYLOAD : LNET_MTU)) {
 			CERROR("%s, src %s: bad %s payload %d (%d max expected)\n",
-			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(from_nid4),
 			       libcfs_nid2str(src_nid),
 			       lnet_msgtyp2str(type),
 			       payload_length,
@@ -4269,7 +4271,7 @@ void lnet_monitor_thr_stop(void)

 	default:
 		CERROR("%s, src %s: Bad message type 0x%x\n",
-		       libcfs_nid2str(from_nid),
+		       libcfs_nid2str(from_nid4),
 		       libcfs_nid2str(src_nid), type);
 		return -EPROTO;
 	}
@@ -4296,7 +4298,7 @@ void lnet_monitor_thr_stop(void)
 		if (LNET_NIDNET(dest_nid) == LNET_NID_NET(&ni->ni_nid)) {
 			/* should have gone direct */
 			CERROR("%s, src %s: Bad dest nid %s (should have been sent direct)\n",
-			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(from_nid4),
 			       libcfs_nid2str(src_nid),
 			       libcfs_nid2str(dest_nid));
 			return -EPROTO;
@@ -4308,7 +4310,7 @@ void lnet_monitor_thr_stop(void)
 			 * this node's NID on its own network
 			 */
 			CERROR("%s, src %s: Bad dest nid %s (it's my nid but on a different network)\n",
-			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(from_nid4),
 			       libcfs_nid2str(src_nid),
 			       libcfs_nid2str(dest_nid));
 			return -EPROTO;
@@ -4316,7 +4318,7 @@ void lnet_monitor_thr_stop(void)

 		if (rdma_req && type == LNET_MSG_GET) {
 			CERROR("%s, src %s: Bad optimized GET for %s (final destination must be me)\n",
-			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(from_nid4),
 			       libcfs_nid2str(src_nid),
 			       libcfs_nid2str(dest_nid));
 			return -EPROTO;
@@ -4324,7 +4326,7 @@ void lnet_monitor_thr_stop(void)

 		if (!the_lnet.ln_routing) {
 			CERROR("%s, src %s: Dropping message for %s (routing not enabled)\n",
-			       libcfs_nid2str(from_nid),
+			       libcfs_nid2str(from_nid4),
 			       libcfs_nid2str(src_nid),
 			       libcfs_nid2str(dest_nid));
 			goto drop;
@@ -4338,7 +4340,7 @@ void lnet_monitor_thr_stop(void)
 	if (!list_empty(&the_lnet.ln_test_peers) &&	/* normally we don't */
 	    fail_peer(src_nid, 0)) {			/* shall we now? */
 		CERROR("%s, src %s: Dropping %s to simulate failure\n",
-		       libcfs_nid2str(from_nid), libcfs_nid2str(src_nid),
+		       libcfs_nid2str(from_nid4), libcfs_nid2str(src_nid),
 		       lnet_msgtyp2str(type));
 		goto drop;
 	}
@@ -4347,7 +4349,7 @@ void lnet_monitor_thr_stop(void)
 	if (!list_empty(&the_lnet.ln_drop_rules) &&
 	    lnet_drop_rule_match(hdr, lnet_nid_to_nid4(&ni->ni_nid), NULL)) {
 		CDEBUG(D_NET, "%s, src %s, dst %s: Dropping %s to simulate silent message loss\n",
-		       libcfs_nid2str(from_nid), libcfs_nid2str(src_nid),
+		       libcfs_nid2str(from_nid4), libcfs_nid2str(src_nid),
 		       libcfs_nid2str(dest_nid), lnet_msgtyp2str(type));
 		goto drop;
 	}
@@ -4355,7 +4357,7 @@ void lnet_monitor_thr_stop(void)
 	msg = kmem_cache_zalloc(lnet_msg_cachep, GFP_NOFS);
 	if (!msg) {
 		CERROR("%s, src %s: Dropping %s (out of memory)\n",
-		       libcfs_nid2str(from_nid), libcfs_nid2str(src_nid),
+		       libcfs_nid2str(from_nid4), libcfs_nid2str(src_nid),
 		       lnet_msgtyp2str(type));
 		goto drop;
 	}
@@ -4372,7 +4374,7 @@ void lnet_monitor_thr_stop(void)
 	msg->msg_offset = 0;
 	msg->msg_hdr = *hdr;
 	/* for building message event */
-	msg->msg_from = from_nid;
+	msg->msg_from = from_nid4;
 	if (!for_me) {
 		msg->msg_target.pid = dest_pid;
 		msg->msg_target.nid = dest_nid;
@@ -4388,14 +4390,12 @@ void lnet_monitor_thr_stop(void)
 	}

 	lnet_net_lock(cpt);
-	/* FIXME support large-addr nid */
-	lpni = lnet_nid2peerni_locked(from_nid, lnet_nid_to_nid4(&ni->ni_nid),
-				      cpt);
+	lpni = lnet_peerni_by_nid_locked(&from_nid, &ni->ni_nid, cpt);
 	if (IS_ERR(lpni)) {
 		lnet_net_unlock(cpt);
 		rc = PTR_ERR(lpni);
 		CERROR("%s, src %s: Dropping %s (error %d looking up sender)\n",
-		       libcfs_nid2str(from_nid), libcfs_nid2str(src_nid),
+		       libcfs_nid2str(from_nid4), libcfs_nid2str(src_nid),
 		       lnet_msgtyp2str(type), rc);
 		kfree(msg);
 		if (rc == -ESHUTDOWN)
@@ -4410,7 +4410,7 @@ void lnet_monitor_thr_stop(void)
 	 */
 	if (((lnet_drop_asym_route && for_me) ||
 	     !lpni->lpni_peer_net->lpn_peer->lp_alive) &&
-	    LNET_NIDNET(src_nid) != LNET_NIDNET(from_nid)) {
+	    LNET_NIDNET(src_nid) != LNET_NIDNET(from_nid4)) {
 		u32 src_net_id = LNET_NIDNET(src_nid);
 		struct lnet_peer *gw = lpni->lpni_peer_net->lpn_peer;
 		struct lnet_route *route;
@@ -4445,7 +4445,7 @@ void lnet_monitor_thr_stop(void)
 		 * => asymmetric routing detected but forbidden
 		 */
 		CERROR("%s, src %s: Dropping asymmetrical route %s\n",
-		       libcfs_nid2str(from_nid),
+		       libcfs_nid2str(from_nid4),
 		       libcfs_nid2str(src_nid), lnet_msgtyp2str(type));
 		kfree(msg);
 		goto drop;
diff --git a/net/lnet/lnet/nidstrings.c b/net/lnet/lnet/nidstrings.c
index d91815d..dfd6744 100644
--- a/net/lnet/lnet/nidstrings.c
+++ b/net/lnet/lnet/nidstrings.c
@@ -803,7 +803,7 @@ int cfs_print_nidlist(char *buffer, int count, struct list_head *nidlist)
 }

 int
-cfs_match_nid_net(lnet_nid_t nid, u32 net_type,
+cfs_match_nid_net(struct lnet_nid *nid, u32 net_type,
 		  struct list_head *net_num_list,
 		  struct list_head *addr)
 {
@@ -813,15 +813,16 @@ int cfs_print_nidlist(char *buffer, int count, struct list_head *nidlist)
 	if (!addr || !net_num_list)
 		return 0;

-	nf = type2net_info(LNET_NETTYP(LNET_NIDNET(nid)));
+	nf = type2net_info(LNET_NETTYP(LNET_NID_NET(nid)));
 	if (!nf || !net_num_list || !addr)
 		return 0;

-	address = LNET_NIDADDR(nid);
+	/* FIXME handle long-addr nid */
+	address = LNET_NIDADDR(lnet_nid_to_nid4(nid));

 	/* if either the address or net number don't match then no match */
 	if (!nf->nf_match_addr(address, addr) ||
-	    !cfs_match_net(LNET_NIDNET(nid), net_type, net_num_list))
+	    !cfs_match_net(LNET_NID_NET(nid), net_type, net_num_list))
 		return 0;

 	return 1;
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 4b6f339..1853388 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -990,7 +990,7 @@ struct lnet_peer_ni *
  */
 bool
 lnet_peer_is_pref_rtr_locked(struct lnet_peer_ni *lpni,
-			     lnet_nid_t gw_nid)
+			     struct lnet_nid *gw_nid)
 {
 	struct lnet_nid_list *ne;

@@ -1006,9 +1006,9 @@ struct lnet_peer_ni *
 	 */
 	list_for_each_entry(ne, &lpni->lpni_rtr_pref_nids, nl_list) {
 		CDEBUG(D_NET, "Comparing pref %s with gw %s\n",
-		       libcfs_nid2str(ne->nl_nid),
-		       libcfs_nid2str(gw_nid));
-		if (ne->nl_nid == gw_nid)
+		       libcfs_nidstr(&ne->nl_nid),
+		       libcfs_nidstr(gw_nid));
+		if (nid_same(&ne->nl_nid, gw_nid))
 			return true;
 	}

@@ -1037,7 +1037,7 @@ struct lnet_peer_ni *
 int
 lnet_peer_add_pref_rtr(struct lnet_peer_ni *lpni,
-		       lnet_nid_t gw_nid)
+		       struct lnet_nid *gw_nid)
 {
 	int cpt = lpni->lpni_cpt;
 	struct lnet_nid_list *ne = NULL;
@@ -1050,7 +1050,7 @@ struct lnet_peer_ni *
 	__must_hold(&the_lnet.ln_api_mutex);

 	list_for_each_entry(ne, &lpni->lpni_rtr_pref_nids, nl_list) {
-		if (ne->nl_nid == gw_nid)
+		if (nid_same(&ne->nl_nid, gw_nid))
 			return -EEXIST;
 	}

@@ -1058,7 +1058,7 @@ struct lnet_peer_ni *
 	if (!ne)
 		return -ENOMEM;

-	ne->nl_nid = gw_nid;
+	ne->nl_nid = *gw_nid;

 	/* Lock the cpt to protect against addition and checks in the
 	 * selection algorithm
@@ -1076,16 +1076,16 @@ struct lnet_peer_ni *
  * shared mmode.
 */
 bool
-lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, lnet_nid_t nid)
+lnet_peer_is_pref_nid_locked(struct lnet_peer_ni *lpni, struct lnet_nid *nid)
 {
 	struct lnet_nid_list *ne;

 	if (lpni->lpni_pref_nnids == 0)
 		return false;
 	if (lpni->lpni_pref_nnids == 1)
-		return lpni->lpni_pref.nid == nid;
+		return nid_same(&lpni->lpni_pref.nid, nid);
 	list_for_each_entry(ne, &lpni->lpni_pref.nids, nl_list) {
-		if (ne->nl_nid == nid)
+		if (nid_same(&ne->nl_nid, nid))
 			return true;
 	}
 	return false;
@@ -1096,24 +1096,27 @@ struct lnet_peer_ni *
 * defined. Only to be used for non-multi-rail peer_ni.
 */
 int
-lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid)
+lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni,
+				 struct lnet_nid *nid)
 {
 	int rc = 0;

+	if (!nid)
+		return -EINVAL;
 	spin_lock(&lpni->lpni_lock);
-	if (nid == LNET_NID_ANY) {
+	if (LNET_NID_IS_ANY(nid)) {
 		rc = -EINVAL;
 	} else if (lpni->lpni_pref_nnids > 0) {
 		rc = -EPERM;
 	} else if (lpni->lpni_pref_nnids == 0) {
-		lpni->lpni_pref.nid = nid;
+		lpni->lpni_pref.nid = *nid;
 		lpni->lpni_pref_nnids = 1;
 		lpni->lpni_state |= LNET_PEER_NI_NON_MR_PREF;
 	}
 	spin_unlock(&lpni->lpni_lock);

 	CDEBUG(D_NET, "peer %s nid %s: %d\n",
-	       libcfs_nidstr(&lpni->lpni_nid), libcfs_nid2str(nid), rc);
+	       libcfs_nidstr(&lpni->lpni_nid), libcfs_nidstr(nid), rc);
 	return rc;
 }

@@ -1161,20 +1164,21 @@ struct lnet_peer_ni *
 }

 int
-lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid)
+lnet_peer_add_pref_nid(struct lnet_peer_ni *lpni, struct lnet_nid *nid)
 {
 	struct lnet_peer *lp = lpni->lpni_peer_net->lpn_peer;
 	struct lnet_nid_list *ne1 = NULL;
 	struct lnet_nid_list *ne2 = NULL;
-	lnet_nid_t tmp_nid = LNET_NID_ANY;
+	struct lnet_nid *tmp_nid = NULL;
 	int rc = 0;

-	if (nid == LNET_NID_ANY) {
+	if (LNET_NID_IS_ANY(nid)) {
 		rc = -EINVAL;
 		goto out;
 	}

-	if (lpni->lpni_pref_nnids == 1 && lpni->lpni_pref.nid == nid) {
+	if (lpni->lpni_pref_nnids == 1 &&
+	    nid_same(&lpni->lpni_pref.nid, nid)) {
 		rc = -EEXIST;
 		goto out;
 	}
@@ -1191,12 +1195,12 @@ struct lnet_peer_ni *
 		size_t alloc_size = sizeof(*ne1);

 		if (lpni->lpni_pref_nnids == 1) {
-			tmp_nid = lpni->lpni_pref.nid;
+			tmp_nid = &lpni->lpni_pref.nid;
 			INIT_LIST_HEAD(&lpni->lpni_pref.nids);
 		}

 		list_for_each_entry(ne1, &lpni->lpni_pref.nids, nl_list) {
-			if (ne1->nl_nid == nid) {
+			if (nid_same(&ne1->nl_nid, nid)) {
 				rc = -EEXIST;
 				goto out;
 			}
@@ -1217,15 +1221,15 @@ struct lnet_peer_ni *
 				goto out;
 			}
 			INIT_LIST_HEAD(&ne2->nl_list);
-			ne2->nl_nid = tmp_nid;
+			ne2->nl_nid = *tmp_nid;
 		}
-		ne1->nl_nid = nid;
+		ne1->nl_nid = *nid;
 	}

 	lnet_net_lock(LNET_LOCK_EX);
 	spin_lock(&lpni->lpni_lock);
 	if (lpni->lpni_pref_nnids == 0) {
-		lpni->lpni_pref.nid = nid;
+		lpni->lpni_pref.nid = *nid;
 	} else {
 		if (ne2)
 			list_add_tail(&ne2->nl_list, &lpni->lpni_pref.nids);
@@ -1243,12 +1247,12 @@ struct lnet_peer_ni *
 		spin_unlock(&lpni->lpni_lock);
 	}
 	CDEBUG(D_NET, "peer %s nid %s: %d\n",
-	       libcfs_nidstr(&lp->lp_primary_nid), libcfs_nid2str(nid), rc);
+	       libcfs_nidstr(&lp->lp_primary_nid), libcfs_nidstr(nid), rc);
 	return rc;
 }

 int
-lnet_peer_del_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid)
+lnet_peer_del_pref_nid(struct lnet_peer_ni *lpni, struct lnet_nid *nid)
 {
 	struct lnet_peer *lp = lpni->lpni_peer_net->lpn_peer;
 	struct lnet_nid_list *ne = NULL;
@@ -1260,13 +1264,13 @@ struct lnet_peer_ni *
 	}

 	if (lpni->lpni_pref_nnids == 1) {
-		if (lpni->lpni_pref.nid != nid) {
+		if (!nid_same(&lpni->lpni_pref.nid, nid)) {
 			rc = -ENOENT;
 			goto out;
 		}
 	} else {
 		list_for_each_entry(ne, &lpni->lpni_pref.nids, nl_list) {
-			if (ne->nl_nid == nid)
+			if (nid_same(&ne->nl_nid, nid))
 				goto remove_nid_entry;
 		}
 		rc = -ENOENT;
@@ -1278,7 +1282,7 @@ struct lnet_peer_ni *
 	lnet_net_lock(LNET_LOCK_EX);
 	spin_lock(&lpni->lpni_lock);
 	if (lpni->lpni_pref_nnids == 1) {
-		lpni->lpni_pref.nid = LNET_NID_ANY;
+		lpni->lpni_pref.nid = LNET_ANY_NID;
 	} else {
 		list_del_init(&ne->nl_list);
 		if (lpni->lpni_pref_nnids == 2) {
@@ -1301,7 +1305,7 @@ struct lnet_peer_ni *
 	kfree(ne);
 out:
 	CDEBUG(D_NET, "peer %s nid %s: %d\n",
-	       libcfs_nidstr(&lp->lp_primary_nid), libcfs_nid2str(nid), rc);
+	       libcfs_nidstr(&lp->lp_primary_nid), libcfs_nidstr(nid), rc);
 	return rc;
 }

@@ -1316,7 +1320,7 @@ struct lnet_peer_ni *
 	lnet_net_lock(LNET_LOCK_EX);
 	if (lpni->lpni_pref_nnids == 1)
-		lpni->lpni_pref.nid = LNET_NID_ANY;
+		lpni->lpni_pref.nid = LNET_ANY_NID;
 	else if (lpni->lpni_pref_nnids > 1)
 		list_splice_init(&lpni->lpni_pref.nids, &zombies);
 	lpni->lpni_pref_nnids = 0;
@@ -1849,7 +1853,7 @@ struct lnet_peer_net *
 * lpni creation initiated due to traffic either sending or receiving.
 */
 static int
-lnet_peer_ni_traffic_add(struct lnet_nid *nid, lnet_nid_t pref)
+lnet_peer_ni_traffic_add(struct lnet_nid *nid, struct lnet_nid *pref)
 {
 	struct lnet_peer *lp;
 	struct lnet_peer_net *lpn;
@@ -1886,8 +1890,7 @@ struct lnet_peer_net *
 	lpni = lnet_peer_ni_alloc(nid);
 	if (!lpni)
 		goto out_free_lpn;
-	if (pref != LNET_NID_ANY)
-		lnet_peer_ni_set_non_mr_pref_nid(lpni, pref);
+	lnet_peer_ni_set_non_mr_pref_nid(lpni, pref);

 	return lnet_peer_attach_peer_ni(lp, lpn, lpni, flags);

@@ -2084,7 +2087,7 @@ struct lnet_peer_ni *
 	lnet_net_unlock(cpt);

-	rc = lnet_peer_ni_traffic_add(nid, LNET_NID_ANY);
+	rc = lnet_peer_ni_traffic_add(nid, NULL);
 	if (rc) {
 		lpni = ERR_PTR(rc);
 		goto out_net_relock;
@@ -2104,21 +2107,20 @@ struct lnet_peer_ni *
 * hold on the peer_ni.
 */
 struct lnet_peer_ni *
-lnet_nid2peerni_locked(lnet_nid_t nid4, lnet_nid_t pref, int cpt)
+lnet_peerni_by_nid_locked(struct lnet_nid *nid,
+			  struct lnet_nid *pref, int cpt)
 {
 	struct lnet_peer_ni *lpni = NULL;
-	struct lnet_nid nid;
 	int rc;

 	if (the_lnet.ln_state != LNET_STATE_RUNNING)
 		return ERR_PTR(-ESHUTDOWN);

-	lnet_nid4_to_nid(nid4, &nid);
 	/*
 	 * find if a peer_ni already exists.
 	 * If so then just return that.
 	 */
-	lpni = lnet_find_peer_ni_locked(nid4);
+	lpni = lnet_peer_ni_find_locked(nid);
 	if (lpni)
 		return lpni;

@@ -2145,13 +2147,13 @@ struct lnet_peer_ni *
 		goto out_mutex_unlock;
 	}

-	rc = lnet_peer_ni_traffic_add(&nid, pref);
+	rc = lnet_peer_ni_traffic_add(nid, pref);
 	if (rc) {
 		lpni = ERR_PTR(rc);
 		goto out_mutex_unlock;
 	}

-	lpni = lnet_find_peer_ni_locked(nid4);
+	lpni = lnet_peer_ni_find_locked(nid);
 	LASSERT(lpni);

 out_mutex_unlock:
@@ -2168,6 +2170,19 @@ struct lnet_peer_ni *
 	return lpni;
 }

+struct lnet_peer_ni *
+lnet_nid2peerni_locked(lnet_nid_t nid4, lnet_nid_t pref4, int cpt)
+{
+	struct lnet_nid nid, pref;
+
+	lnet_nid4_to_nid(nid4, &nid);
+	lnet_nid4_to_nid(pref4, &pref);
+	if (pref4 == LNET_NID_ANY)
+		return lnet_peerni_by_nid_locked(&nid, NULL, cpt);
+	else
+		return lnet_peerni_by_nid_locked(&nid, &pref, cpt);
+}
+
 bool
 lnet_peer_gw_discovery(struct lnet_peer *lp)
 {
diff --git a/net/lnet/lnet/udsp.c b/net/lnet/lnet/udsp.c
index 977a6a6..7fa4f88 100644
--- a/net/lnet/lnet/udsp.c
+++ b/net/lnet/lnet/udsp.c
@@ -213,7 +213,7 @@ enum udsp_apply {
 	struct lnet_ud_nid_descr *ni_match = udi->udi_match;
 	u32 priority = (udi->udi_revert) ?
-1 : udi->udi_priority; - rc = cfs_match_nid_net(lnet_nid_to_nid4(&ni->ni_nid), + rc = cfs_match_nid_net(&ni->ni_nid, ni_match->ud_net_id.udn_net_type, &ni_match->ud_net_id.udn_net_num_range, &ni_match->ud_addr_range); @@ -239,7 +239,7 @@ enum udsp_apply { struct lnet_route *route; struct lnet_peer_ni *lpni; bool cleared = false; - lnet_nid_t gw_nid, gw_prim_nid; + struct lnet_nid *gw_nid, *gw_prim_nid; int rc = 0; int i; @@ -248,14 +248,14 @@ enum udsp_apply { list_for_each_entry(rnet, rn_list, lrn_list) { list_for_each_entry(route, &rnet->lrn_routes, lr_list) { /* look if gw nid on the same net matches */ - gw_prim_nid = lnet_nid_to_nid4(&route->lr_gateway->lp_primary_nid); + gw_prim_nid = &route->lr_gateway->lp_primary_nid; lpni = NULL; while ((lpni = lnet_get_next_peer_ni_locked(route->lr_gateway, NULL, lpni)) != NULL) { if (!lnet_get_net_locked(lpni->lpni_peer_net->lpn_net_id)) continue; - gw_nid = lnet_nid_to_nid4(&lpni->lpni_nid); + gw_nid = &lpni->lpni_nid; rc = cfs_match_nid_net(gw_nid, rte_action->ud_net_id.udn_net_type, &rte_action->ud_net_id.udn_net_num_range, @@ -285,13 +285,13 @@ enum udsp_apply { /* match. 
Add to pref NIDs */ CDEBUG(D_NET, "udsp net->gw: %s->%s\n", libcfs_net2str(net->net_id), - libcfs_nid2str(gw_prim_nid)); + libcfs_nidstr(gw_prim_nid)); rc = lnet_net_add_pref_rtr(net, gw_prim_nid); lnet_net_lock(LNET_LOCK_EX); /* success if EEXIST return */ if (rc && rc != -EEXIST) { CERROR("Failed to add %s to %s pref rtr list\n", - libcfs_nid2str(gw_prim_nid), + libcfs_nidstr(gw_prim_nid), libcfs_net2str(net->net_id)); return rc; } @@ -417,7 +417,7 @@ enum udsp_apply { struct list_head *rn_list; struct lnet_route *route; bool cleared = false; - lnet_nid_t gw_nid; + struct lnet_nid *gw_nid; int rc = 0; int i; @@ -425,7 +425,7 @@ enum udsp_apply { rn_list = &the_lnet.ln_remote_nets_hash[i]; list_for_each_entry(rnet, rn_list, lrn_list) { list_for_each_entry(route, &rnet->lrn_routes, lr_list) { - gw_nid = lnet_nid_to_nid4(&route->lr_gateway->lp_primary_nid); + gw_nid = &route->lr_gateway->lp_primary_nid; rc = cfs_match_nid_net(gw_nid, rte_action->ud_net_id.udn_net_type, &rte_action->ud_net_id.udn_net_num_range, @@ -447,7 +447,7 @@ enum udsp_apply { } CDEBUG(D_NET, "add gw nid %s as preferred for peer %s\n", - libcfs_nid2str(gw_nid), + libcfs_nidstr(gw_nid), libcfs_nidstr(&lpni->lpni_nid)); /* match. Add to pref NIDs */ rc = lnet_peer_add_pref_rtr(lpni, gw_nid); @@ -455,7 +455,7 @@ enum udsp_apply { /* success if EEXIST return */ if (rc && rc != -EEXIST) { CERROR("Failed to add %s to %s pref rtr list\n", - libcfs_nid2str(gw_nid), + libcfs_nidstr(gw_nid), libcfs_nidstr(&lpni->lpni_nid)); return rc; } @@ -481,7 +481,7 @@ enum udsp_apply { ni_action->ud_net_id.udn_net_type) continue; list_for_each_entry(ni, &net->net_ni_list, ni_netlist) { - rc = cfs_match_nid_net(lnet_nid_to_nid4(&ni->ni_nid), + rc = cfs_match_nid_net(&ni->ni_nid, ni_action->ud_net_id.udn_net_type, &ni_action->ud_net_id.udn_net_num_range, &ni_action->ud_addr_range); @@ -503,8 +503,7 @@ enum udsp_apply { libcfs_nidstr(&ni->ni_nid), libcfs_nidstr(&lpni->lpni_nid)); /* match. 
Add to pref NIDs */ - rc = lnet_peer_add_pref_nid(lpni, - lnet_nid_to_nid4(&ni->ni_nid)); + rc = lnet_peer_add_pref_nid(lpni, &ni->ni_nid); lnet_net_lock(LNET_LOCK_EX); /* success if EEXIST return */ if (rc && rc != -EEXIST) { @@ -530,7 +529,7 @@ enum udsp_apply { bool local = udi->udi_local; enum lnet_udsp_action_type type = udi->udi_type; - rc = cfs_match_nid_net(lnet_nid_to_nid4(&lpni->lpni_nid), + rc = cfs_match_nid_net(&lpni->lpni_nid, lp_match->ud_net_id.udn_net_type, &lp_match->ud_net_id.udn_net_num_range, &lp_match->ud_addr_range); @@ -996,7 +995,8 @@ struct lnet_udsp * info->cud_net_priority = ni->ni_net->net_sel_priority; list_for_each_entry(ne, &net->net_rtr_pref_nids, nl_list) { if (i < LNET_MAX_SHOW_NUM_NID) - info->cud_pref_rtr_nid[i] = ne->nl_nid; + info->cud_pref_rtr_nid[i] = + lnet_nid_to_nid4(&ne->nl_nid); else break; i++; @@ -1020,13 +1020,14 @@ struct lnet_udsp * libcfs_nidstr(&lpni->lpni_nid), lpni->lpni_pref_nnids); if (lpni->lpni_pref_nnids == 1) { - info->cud_pref_nid[0] = lpni->lpni_pref.nid; + info->cud_pref_nid[0] = lnet_nid_to_nid4(&lpni->lpni_pref.nid); } else if (lpni->lpni_pref_nnids > 1) { struct list_head *list = &lpni->lpni_pref.nids; list_for_each_entry(ne, list, nl_list) { if (i < LNET_MAX_SHOW_NUM_NID) - info->cud_pref_nid[i] = ne->nl_nid; + info->cud_pref_nid[i] = + lnet_nid_to_nid4(&ne->nl_nid); else break; i++; @@ -1036,7 +1037,8 @@ struct lnet_udsp * i = 0; list_for_each_entry(ne, &lpni->lpni_rtr_pref_nids, nl_list) { if (i < LNET_MAX_SHOW_NUM_NID) - info->cud_pref_rtr_nid[i] = ne->nl_nid; + info->cud_pref_rtr_nid[i] = + lnet_nid_to_nid4(&ne->nl_nid); else break; i++; From patchwork Sun Nov 28 23:27:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643235 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com 
(pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3C49EC433F5 for ; Sun, 28 Nov 2021 23:28:04 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BA057200F69; Sun, 28 Nov 2021 15:28:02 -0800 (PST) Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 33142200EB1 for ; Sun, 28 Nov 2021 15:28:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9061D244; Sun, 28 Nov 2021 18:27:56 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 84479C1AE4; Sun, 28 Nov 2021 18:27:56 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:39 -0500 Message-Id: <1638142074-5945-5-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> References: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 04/19] lnet: switch to large lnet_processid for matching X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown Change lnet_libhandle.me_match_id and lnet_match_info.mi_id to struct lnet_processid, so they support large nids. This requires changing LNetMEAttach(), lnet_mt_match_head(), lnet_mt_of_attach(), lnet_ptl_match_type(), lnet_match2mt() to accept a pointer to lnet_processid rather than an lnet_process_id. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10391 Lustre-commit: db49fbf00d24edc83 ("LU-10391 lnet: switch to large lnet_processid for matching") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/43597 Reviewed-by: James Simmons Reviewed-by: Chris Horn Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/niobuf.c | 21 ++++++++++++-------- include/linux/lnet/api.h | 2 +- include/linux/lnet/lib-lnet.h | 4 ++-- include/linux/lnet/lib-types.h | 4 ++-- net/lnet/lnet/api-ni.c | 12 +++++------ net/lnet/lnet/lib-me.c | 4 ++-- net/lnet/lnet/lib-move.c | 10 +++++----- net/lnet/lnet/lib-ptl.c | 45 ++++++++++++++++++++++-------------------- net/lnet/selftest/rpc.c | 10 +++++++--- 9 files changed, 62 insertions(+), 50 deletions(-) diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c index 614bb63..c5bbf5a 100644 --- a/fs/lustre/ptlrpc/niobuf.c +++ b/fs/lustre/ptlrpc/niobuf.c @@ -120,7 +120,7 @@ static void __mdunlink_iterate_helper(struct lnet_handle_md *bd_mds, static int ptlrpc_register_bulk(struct ptlrpc_request *req) { struct ptlrpc_bulk_desc *desc = req->rq_bulk; - struct lnet_process_id peer; + struct lnet_processid peer; int rc = 0; int posted_md; int total_md; @@ -150,7 +150,9 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req) desc->bd_failure = 0; - peer = desc->bd_import->imp_connection->c_peer; + peer.pid = desc->bd_import->imp_connection->c_peer.pid; + lnet_nid4_to_nid(desc->bd_import->imp_connection->c_peer.nid, + &peer.nid); LASSERT(desc->bd_cbid.cbid_fn == client_bulk_callback); LASSERT(desc->bd_cbid.cbid_arg == desc); @@ -186,7 +188,7 @@ static int ptlrpc_register_bulk(struct ptlrpc_request *req) OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_BULK_ATTACH)) { rc = -ENOMEM; } else { - me = LNetMEAttach(desc->bd_portal, peer, mbits, 0, + me = LNetMEAttach(desc->bd_portal, &peer, mbits, 0, LNET_UNLINK, LNET_INS_AFTER); rc = PTR_ERR_OR_ZERO(me); } @@ -225,7 +227,7 @@ static int 
ptlrpc_register_bulk(struct ptlrpc_request *req) /* Holler if peer manages to touch buffers before he knows the mbits */ if (desc->bd_refs != total_md) CWARN("%s: Peer %s touched %d buffers while I registered\n", - desc->bd_import->imp_obd->obd_name, libcfs_id2str(peer), + desc->bd_import->imp_obd->obd_name, libcfs_idstr(&peer), total_md - desc->bd_refs); spin_unlock(&desc->bd_lock); @@ -492,6 +494,7 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) unsigned int mpflag = 0; bool rep_mbits = false; struct lnet_handle_md bulk_cookie; + struct lnet_processid peer; struct ptlrpc_connection *connection; struct lnet_me *reply_me; struct lnet_md reply_md; @@ -627,12 +630,14 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) request->rq_repmsg = NULL; } + peer.pid = connection->c_peer.pid; + lnet_nid4_to_nid(connection->c_peer.nid, &peer.nid); if (request->rq_bulk && OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_BULK_REPLY_ATTACH)) { reply_me = ERR_PTR(-ENOMEM); } else { reply_me = LNetMEAttach(request->rq_reply_portal, - connection->c_peer, + &peer, rep_mbits ? request->rq_mbits : request->rq_xid, 0, LNET_UNLINK, LNET_INS_AFTER); @@ -761,8 +766,8 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply) int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd) { struct ptlrpc_service *service = rqbd->rqbd_svcpt->scp_service; - static struct lnet_process_id match_id = { - .nid = LNET_NID_ANY, + static struct lnet_processid match_id = { + .nid = LNET_ANY_NID, .pid = LNET_PID_ANY }; int rc; @@ -780,7 +785,7 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd) * threads can find it by grabbing a local lock */ me = LNetMEAttach(service->srv_req_portal, - match_id, 0, ~0, LNET_UNLINK, + &match_id, 0, ~0, LNET_UNLINK, rqbd->rqbd_svcpt->scp_cpt >= 0 ? 
LNET_INS_LOCAL : LNET_INS_AFTER); if (IS_ERR(me)) { diff --git a/include/linux/lnet/api.h b/include/linux/lnet/api.h index d32c7c1..040bf18 100644 --- a/include/linux/lnet/api.h +++ b/include/linux/lnet/api.h @@ -96,7 +96,7 @@ */ struct lnet_me * LNetMEAttach(unsigned int portal, - struct lnet_process_id match_id_in, + struct lnet_processid *match_id_in, u64 match_bits_in, u64 ignore_bits_in, enum lnet_unlink unlink_in, diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index fb2f42fcb..02eae6b 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -629,9 +629,9 @@ int lnet_send_ping(lnet_nid_t dest_nid, struct lnet_handle_md *mdh, int nnis, /* match-table functions */ struct list_head *lnet_mt_match_head(struct lnet_match_table *mtable, - struct lnet_process_id id, u64 mbits); + struct lnet_processid *id, u64 mbits); struct lnet_match_table *lnet_mt_of_attach(unsigned int index, - struct lnet_process_id id, + struct lnet_processid *id, u64 mbits, u64 ignore_bits, enum lnet_ins_pos pos); int lnet_mt_match_md(struct lnet_match_table *mtable, diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index bde7249..628d133 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -187,7 +187,7 @@ struct lnet_libhandle { struct lnet_me { struct list_head me_list; int me_cpt; - struct lnet_process_id me_match_id; + struct lnet_processid me_match_id; unsigned int me_portal; unsigned int me_pos; /* hash offset in mt_hash */ u64 me_match_bits; @@ -994,7 +994,7 @@ enum lnet_match_flags { /* parameter for matching operations (GET, PUT) */ struct lnet_match_info { u64 mi_mbits; - struct lnet_process_id mi_id; + struct lnet_processid mi_id; unsigned int mi_cpt; unsigned int mi_opc; unsigned int mi_portal; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 340cc84e..9d9d0e6 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -1840,8 +1840,8 @@ struct 
lnet_ping_buffer * struct lnet_handle_md *ping_mdh, int ni_count, bool set_eq) { - struct lnet_process_id id = { - .nid = LNET_NID_ANY, + struct lnet_processid id = { + .nid = LNET_ANY_NID, .pid = LNET_PID_ANY }; struct lnet_me *me; @@ -1859,7 +1859,7 @@ struct lnet_ping_buffer * } /* Ping target ME/MD */ - me = LNetMEAttach(LNET_RESERVED_PORTAL, id, + me = LNetMEAttach(LNET_RESERVED_PORTAL, &id, LNET_PROTO_PING_MATCHBITS, 0, LNET_UNLINK, LNET_INS_AFTER); if (IS_ERR(me)) { @@ -2056,15 +2056,15 @@ int lnet_push_target_resize(void) int lnet_push_target_post(struct lnet_ping_buffer *pbuf, struct lnet_handle_md *mdhp) { - struct lnet_process_id id = { - .nid = LNET_NID_ANY, + struct lnet_processid id = { + .nid = LNET_ANY_NID, .pid = LNET_PID_ANY }; struct lnet_md md = { NULL }; struct lnet_me *me; int rc; - me = LNetMEAttach(LNET_RESERVED_PORTAL, id, + me = LNetMEAttach(LNET_RESERVED_PORTAL, &id, LNET_PROTO_PING_MATCHBITS, 0, LNET_UNLINK, LNET_INS_AFTER); if (IS_ERR(me)) { diff --git a/net/lnet/lnet/lib-me.c b/net/lnet/lnet/lib-me.c index 66a79e2..7868165 100644 --- a/net/lnet/lnet/lib-me.c +++ b/net/lnet/lnet/lib-me.c @@ -68,7 +68,7 @@ */ struct lnet_me * LNetMEAttach(unsigned int portal, - struct lnet_process_id match_id, + struct lnet_processid *match_id, u64 match_bits, u64 ignore_bits, enum lnet_unlink unlink, enum lnet_ins_pos pos) { @@ -93,7 +93,7 @@ struct lnet_me * lnet_res_lock(mtable->mt_cpt); me->me_portal = portal; - me->me_match_id = match_id; + me->me_match_id = *match_id; me->me_match_bits = match_bits; me->me_ignore_bits = ignore_bits; me->me_unlink = unlink; diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index 2f7c37d..088a754 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -3900,7 +3900,7 @@ void lnet_monitor_thr_stop(void) le32_to_cpus(&hdr->msg.put.offset); /* Primary peer NID. 
*/ - info.mi_id.nid = msg->msg_initiator; + lnet_nid4_to_nid(msg->msg_initiator, &info.mi_id.nid); info.mi_id.pid = hdr->src_pid; info.mi_opc = LNET_MD_OP_PUT; info.mi_portal = hdr->msg.put.ptl_index; @@ -3939,7 +3939,7 @@ void lnet_monitor_thr_stop(void) case LNET_MATCHMD_DROP: CNETERR("Dropping PUT from %s portal %d match %llu offset %d length %d: %d\n", - libcfs_id2str(info.mi_id), info.mi_portal, + libcfs_idstr(&info.mi_id), info.mi_portal, info.mi_mbits, info.mi_roffset, info.mi_rlength, rc); return -ENOENT; /* -ve: OK but no match */ @@ -3964,7 +3964,7 @@ void lnet_monitor_thr_stop(void) source_id.nid = hdr->src_nid; source_id.pid = hdr->src_pid; /* Primary peer NID */ - info.mi_id.nid = msg->msg_initiator; + lnet_nid4_to_nid(msg->msg_initiator, &info.mi_id.nid); info.mi_id.pid = hdr->src_pid; info.mi_opc = LNET_MD_OP_GET; info.mi_portal = hdr->msg.get.ptl_index; @@ -3976,7 +3976,7 @@ void lnet_monitor_thr_stop(void) rc = lnet_ptl_match_md(&info, msg); if (rc == LNET_MATCHMD_DROP) { CNETERR("Dropping GET from %s portal %d match %llu offset %d length %d\n", - libcfs_id2str(info.mi_id), info.mi_portal, + libcfs_idstr(&info.mi_id), info.mi_portal, info.mi_mbits, info.mi_roffset, info.mi_rlength); return -ENOENT; /* -ve: OK but no match */ } @@ -4008,7 +4008,7 @@ void lnet_monitor_thr_stop(void) /* didn't get as far as lnet_ni_send() */ CERROR("%s: Unable to send REPLY for GET from %s: %d\n", libcfs_nidstr(&ni->ni_nid), - libcfs_id2str(info.mi_id), rc); + libcfs_idstr(&info.mi_id), rc); lnet_finalize(msg, rc); } diff --git a/net/lnet/lnet/lib-ptl.c b/net/lnet/lnet/lib-ptl.c index 095b190..d367c00 100644 --- a/net/lnet/lnet/lib-ptl.c +++ b/net/lnet/lnet/lib-ptl.c @@ -39,15 +39,15 @@ MODULE_PARM_DESC(portal_rotor, "redirect PUTs to different cpu-partitions"); static int -lnet_ptl_match_type(unsigned int index, struct lnet_process_id match_id, +lnet_ptl_match_type(unsigned int index, struct lnet_processid *match_id, u64 mbits, u64 ignore_bits) { struct lnet_portal 
*ptl = the_lnet.ln_portals[index]; int unique; - unique = !ignore_bits && - match_id.nid != LNET_NID_ANY && - match_id.pid != LNET_PID_ANY; + unique = (!ignore_bits && + !LNET_NID_IS_ANY(&match_id->nid) && + match_id->pid != LNET_PID_ANY); LASSERT(!lnet_ptl_is_unique(ptl) || !lnet_ptl_is_wildcard(ptl)); @@ -151,8 +151,8 @@ return LNET_MATCHMD_NONE; /* mismatched ME nid/pid? */ - if (me->me_match_id.nid != LNET_NID_ANY && - me->me_match_id.nid != info->mi_id.nid) + if (!LNET_NID_IS_ANY(&me->me_match_id.nid) && + !nid_same(&me->me_match_id.nid, &info->mi_id.nid)) return LNET_MATCHMD_NONE; if (me->me_match_id.pid != LNET_PID_ANY && @@ -182,7 +182,7 @@ } else if (!(md->md_options & LNET_MD_TRUNCATE)) { /* this packet _really_ is too big */ CERROR("Matching packet from %s, match %llu length %d too big: %d left, %d allowed\n", - libcfs_id2str(info->mi_id), info->mi_mbits, + libcfs_idstr(&info->mi_id), info->mi_mbits, info->mi_rlength, md->md_length - offset, mlength); return LNET_MATCHMD_DROP; @@ -191,7 +191,7 @@ /* Commit to this ME/MD */ CDEBUG(D_NET, "Incoming %s index %x from %s of length %d/%d into md %#llx [%d] + %d\n", (info->mi_opc == LNET_MD_OP_PUT) ? "put" : "get", - info->mi_portal, libcfs_id2str(info->mi_id), mlength, + info->mi_portal, libcfs_idstr(&info->mi_id), mlength, info->mi_rlength, md->md_lh.lh_cookie, md->md_niov, offset); lnet_msg_attach_md(msg, md, offset, mlength); @@ -212,18 +212,18 @@ } static struct lnet_match_table * -lnet_match2mt(struct lnet_portal *ptl, struct lnet_process_id id, u64 mbits) +lnet_match2mt(struct lnet_portal *ptl, struct lnet_processid *id, u64 mbits) { if (LNET_CPT_NUMBER == 1) return ptl->ptl_mtables[0]; /* the only one */ /* if it's a unique portal, return match-table hashed by NID */ return lnet_ptl_is_unique(ptl) ? 
- ptl->ptl_mtables[lnet_cpt_of_nid(id.nid, NULL)] : NULL; + ptl->ptl_mtables[lnet_nid2cpt(&id->nid, NULL)] : NULL; } struct lnet_match_table * -lnet_mt_of_attach(unsigned int index, struct lnet_process_id id, +lnet_mt_of_attach(unsigned int index, struct lnet_processid *id, u64 mbits, u64 ignore_bits, enum lnet_ins_pos pos) { struct lnet_portal *ptl; @@ -274,7 +274,7 @@ struct lnet_match_table * LASSERT(lnet_ptl_is_wildcard(ptl) || lnet_ptl_is_unique(ptl)); - mtable = lnet_match2mt(ptl, info->mi_id, info->mi_mbits); + mtable = lnet_match2mt(ptl, &info->mi_id, info->mi_mbits); if (mtable) return mtable; @@ -357,13 +357,13 @@ struct lnet_match_table * struct list_head * lnet_mt_match_head(struct lnet_match_table *mtable, - struct lnet_process_id id, u64 mbits) + struct lnet_processid *id, u64 mbits) { struct lnet_portal *ptl = the_lnet.ln_portals[mtable->mt_portal]; unsigned long hash = mbits; if (!lnet_ptl_is_wildcard(ptl)) { - hash += id.nid + id.pid; + hash += nidhash(&id->nid) + id->pid; LASSERT(lnet_ptl_is_unique(ptl)); hash = hash_long(hash, LNET_MT_HASH_BITS); @@ -385,7 +385,8 @@ struct list_head * if (!list_empty(&mtable->mt_mhash[LNET_MT_HASH_IGNORE])) head = &mtable->mt_mhash[LNET_MT_HASH_IGNORE]; else - head = lnet_mt_match_head(mtable, info->mi_id, info->mi_mbits); + head = lnet_mt_match_head(mtable, &info->mi_id, + info->mi_mbits); again: /* NB: only wildcard portal needs to return LNET_MATCHMD_EXHAUSTED */ if (lnet_ptl_is_wildcard(the_lnet.ln_portals[mtable->mt_portal])) @@ -418,7 +419,8 @@ struct list_head * } if (!exhausted && head == &mtable->mt_mhash[LNET_MT_HASH_IGNORE]) { - head = lnet_mt_match_head(mtable, info->mi_id, info->mi_mbits); + head = lnet_mt_match_head(mtable, &info->mi_id, + info->mi_mbits); goto again; /* re-check MEs w/o ignore-bits */ } @@ -570,8 +572,9 @@ struct list_head * struct lnet_portal *ptl; int rc; - CDEBUG(D_NET, "Request from %s of length %d into portal %d MB=%#llx\n", - libcfs_id2str(info->mi_id), info->mi_rlength, 
info->mi_portal, + CDEBUG(D_NET, + "Request from %s of length %d into portal %d MB=%#llx\n", + libcfs_idstr(&info->mi_id), info->mi_rlength, info->mi_portal, info->mi_mbits); if (info->mi_portal >= the_lnet.ln_nportals) { @@ -629,7 +632,7 @@ struct list_head * CDEBUG(D_NET, "Delaying %s from %s ptl %d MB %#llx off %d len %d\n", info->mi_opc == LNET_MD_OP_PUT ? "PUT" : "GET", - libcfs_id2str(info->mi_id), info->mi_portal, + libcfs_idstr(&info->mi_id), info->mi_portal, info->mi_mbits, info->mi_roffset, info->mi_rlength); } goto out0; @@ -687,7 +690,7 @@ struct list_head * hdr = &msg->msg_hdr; /* Multi-Rail: Primary peer NID */ - info.mi_id.nid = msg->msg_initiator; + lnet_nid4_to_nid(msg->msg_initiator, &info.mi_id.nid); info.mi_id.pid = hdr->src_pid; info.mi_opc = LNET_MD_OP_PUT; info.mi_portal = hdr->msg.put.ptl_index; @@ -719,7 +722,7 @@ struct list_head * list_add_tail(&msg->msg_list, matches); CDEBUG(D_NET, "Resuming delayed PUT from %s portal %d match %llu offset %d length %d.\n", - libcfs_id2str(info.mi_id), + libcfs_idstr(&info.mi_id), info.mi_portal, info.mi_mbits, info.mi_roffset, info.mi_rlength); } else { diff --git a/net/lnet/selftest/rpc.c b/net/lnet/selftest/rpc.c index 7141da4..bd95e88 100644 --- a/net/lnet/selftest/rpc.c +++ b/net/lnet/selftest/rpc.c @@ -354,14 +354,18 @@ struct srpc_bulk * static int srpc_post_passive_rdma(int portal, int local, u64 matchbits, void *buf, - int len, int options, struct lnet_process_id peer, + int len, int options, struct lnet_process_id peer4, struct lnet_handle_md *mdh, struct srpc_event *ev) { int rc; struct lnet_md md; struct lnet_me *me; + struct lnet_processid peer; - me = LNetMEAttach(portal, peer, matchbits, 0, LNET_UNLINK, + peer.pid = peer4.pid; + lnet_nid4_to_nid(peer4.nid, &peer.nid); + + me = LNetMEAttach(portal, &peer, matchbits, 0, LNET_UNLINK, local ? 
LNET_INS_LOCAL : LNET_INS_AFTER); if (IS_ERR(me)) { rc = PTR_ERR(me); @@ -387,7 +391,7 @@ struct srpc_bulk * CDEBUG(D_NET, "Posted passive RDMA: peer %s, portal %d, matchbits %#llx\n", - libcfs_id2str(peer), portal, matchbits); + libcfs_id2str(peer4), portal, matchbits); return 0; } From patchwork Sun Nov 28 23:27:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643251 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E7E38C433FE for ; Sun, 28 Nov 2021 23:28:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 81F0620573A; Sun, 28 Nov 2021 15:28:17 -0800 (PST) Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B4833200E02 for ; Sun, 28 Nov 2021 15:28:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 95CF1246; Sun, 28 Nov 2021 18:27:56 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 88D63C1AEC; Sun, 28 Nov 2021 18:27:56 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:40 -0500 Message-Id: <1638142074-5945-6-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> References: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 05/19] lnet: libcfs: add timeout to cfs_race() to fix race X-BeenThere: 
lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev There is no guarantee that the branches in cfs_race() are executed in strict order, so it is possible that the second branch (with cfs_race_state=1) is executed before the first branch, and then another thread executing the first branch gets stuck. This construction is used for testing only, and as a workaround it is enough to time out. WC-bug-id: https://jira.whamcloud.com/browse/LU-13358 Lustre-commit: 2d2d381f35ee00431 ("LU-13358 libcfs: add timeout to cfs_race() to fix race") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/43161 Reviewed-by: James Simmons Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/libcfs/libcfs_fail.h | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/include/linux/libcfs/libcfs_fail.h b/include/linux/libcfs/libcfs_fail.h index 45166c5..731401b 100644 --- a/include/linux/libcfs/libcfs_fail.h +++ b/include/linux/libcfs/libcfs_fail.h @@ -213,8 +213,14 @@ static inline void cfs_race_wait(u32 id) cfs_race_state = 0; CERROR("cfs_race id %x sleeping\n", id); - rc = wait_event_interruptible(cfs_race_waitq, - cfs_race_state != 0); + /* + * XXX: don't wait forever as there is no guarantee + * that this branch is executed first. 
for testing + * purposes this construction works good enough + */ + rc = wait_event_interruptible_timeout(cfs_race_waitq, + cfs_race_state != 0, + 5 * HZ); CERROR("cfs_fail_race id %x awake: rc=%d\n", id, rc); } } From patchwork Sun Nov 28 23:27:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643255 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 16171C433EF for ; Sun, 28 Nov 2021 23:28:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2F422205743; Sun, 28 Nov 2021 15:28:21 -0800 (PST) Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7D046200E3B for ; Sun, 28 Nov 2021 15:28:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 95AB9245; Sun, 28 Nov 2021 18:27:56 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8F167C1AC4; Sun, 28 Nov 2021 18:27:56 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:41 -0500 Message-Id: <1638142074-5945-7-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> References: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 06/19] lustre: llite: tighten condition for fault not drop mmap_sem X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: 
"For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Bobi Jam As __lock_page_or_retry() indicates, filemap_fault() will return VM_FAULT_RETRY without releasing mmap_sem iff flags contains FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT. So ll_fault0() should pass in FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT in ll_filemap_fault() so that when it returns VM_FAULT_RETRY, we can pass on trying normal fault under DLM lock as mmap_sem is still being held. While in Linux 5.1 (6b4c9f4469819) FAULT_FLAG_RETRY_NOWAIT is enough to not drop mmap_sem when failed to lock the page. Fixes: 22cceab961 ("lustre: llite: Avoid eternel retry loops with MAP_POPULATE") WC-bug-id: https://jira.whamcloud.com/browse/LU-14713 Lustre-commit: 81aec05103558f57a ("LU-14713 llite: tighten condition for fault not drop mmap_sem") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/44715 Reviewed-by: Patrick Farrell Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_mmap.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c index 8047786..0009c5f 100644 --- a/fs/lustre/llite/llite_mmap.c +++ b/fs/lustre/llite/llite_mmap.c @@ -285,18 +285,25 @@ static vm_fault_t __ll_fault(struct vm_area_struct *vma, struct vm_fault *vmf) if (ll_sbi_has_fast_read(ll_i2sbi(inode))) { /* do fast fault */ + bool allow_retry = vmf->flags & FAULT_FLAG_ALLOW_RETRY; bool has_retry = vmf->flags & FAULT_FLAG_RETRY_NOWAIT; /* To avoid loops, instruct downstream to not drop mmap_sem */ - vmf->flags |= FAULT_FLAG_RETRY_NOWAIT; + /** + * only need FAULT_FLAG_ALLOW_RETRY prior to Linux 5.1 + * (6b4c9f4469819), where FAULT_FLAG_RETRY_NOWAIT is enough + * to not drop mmap_sem when 
failed to lock the page. + */ + vmf->flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; ll_cl_add(inode, env, NULL, LCC_MMAP); fault_ret = filemap_fault(vmf); ll_cl_remove(inode, env); if (has_retry) vmf->flags &= ~FAULT_FLAG_RETRY_NOWAIT; + if (!allow_retry) + vmf->flags &= ~FAULT_FLAG_ALLOW_RETRY; - /* - * - If there is no error, then the page was found in cache and + /* - If there is no error, then the page was found in cache and * uptodate; * - If VM_FAULT_RETRY is set, the page existed but failed to * lock. We will try slow path to avoid loops. From patchwork Sun Nov 28 23:27:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643253 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6C6F4C433F5 for ; Sun, 28 Nov 2021 23:28:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C0E5B21C951; Sun, 28 Nov 2021 15:28:20 -0800 (PST) Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id ED979200E02 for ; Sun, 28 Nov 2021 15:28:00 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 995B5247; Sun, 28 Nov 2021 18:27:56 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 93FCCC1AF6; Sun, 28 Nov 2021 18:27:56 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:42 -0500 Message-Id: <1638142074-5945-8-git-send-email-jsimmons@infradead.org> X-Mailer: 
git-send-email 1.8.3.1 In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> References: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 07/19] lnet: o2iblnd: map_on_demand not needed for frag interop X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn The map_on_demand tunable is not used for setting max frags so don't require that it be set in order to negotiate max frags. HPE-bug-id: LUS-10488 WC-bug-id: https://jira.whamcloud.com/browse/LU-15094 Lustre-commit: 4e61a4aacdbc237606 ("LU-15094 o2iblnd: map_on_demand not needed for frag interop") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/45215 Reviewed-by: Serguei Smirnov Reviewed-by: Andriy Skulysh Reviewed-by: Alexey Lyashkov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 16 +++------------- 1 file changed, 3 insertions(+), 13 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 380374e..a053e7d 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -2658,22 +2658,12 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, reason = "Unknown"; break; - case IBLND_REJECT_RDMA_FRAGS: { - struct lnet_ioctl_config_o2iblnd_tunables *tunables; - + case IBLND_REJECT_RDMA_FRAGS: if (!cp) { reason = "can't negotiate max frags"; goto out; } - tunables = &peer_ni->ibp_ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib; - /* - * This check only makes sense if the kernel supports global - * memory registration. 
Otherwise, map_on_demand will never == 0 - */ - if (!tunables->lnd_map_on_demand) { - reason = "map_on_demand must be enabled"; - goto out; - } + if (conn->ibc_max_frags <= frag_num) { reason = "unsupported max frags"; goto out; @@ -2682,7 +2672,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, peer_ni->ibp_max_frags = frag_num; reason = "rdma fragments"; break; - } + case IBLND_REJECT_MSG_QUEUE_SIZE: if (!cp) { reason = "can't negotiate queue depth"; From patchwork Sun Nov 28 23:27:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643233 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id ABD85C433EF for ; Sun, 28 Nov 2021 23:28:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E0319200EC8; Sun, 28 Nov 2021 15:28:03 -0800 (PST) Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3094E200EDC for ; Sun, 28 Nov 2021 15:28:01 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 9B12324A; Sun, 28 Nov 2021 18:27:56 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 98B8EC1AC9; Sun, 28 Nov 2021 18:27:56 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:43 -0500 Message-Id: <1638142074-5945-9-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> 
References: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 08/19] lnet: o2iblnd: Fix logic for unaligned transfer X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn It's possible for there to be an offset for the first page of a transfer. However, there are two bugs with this code in o2iblnd. The first is that this use-case will require LNET_MAX_IOV + 1 local RDMA fragments, but we do not specify the correct corresponding values for the max page list to ib_alloc_fast_reg_page_list(), ib_alloc_fast_reg_mr(), etc. The second issue is that the logic in kiblnd_setup_rd_kiov() attempts to obtain one more scatterlist entry than is actually needed. This causes the transfer to fail with -EFAULT. HPE-bug-id: LUS-10407 WC-bug-id: https://jira.whamcloud.com/browse/LU-15092 Lustre-commit: 23a2c92f203ff2f39 ("LU-15092 o2iblnd: Fix logic for unaligned transfer") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/45216 Reviewed-by: James Simmons Reviewed-by: Andriy Skulysh Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 2 +- net/lnet/klnds/o2iblnd/o2iblnd.h | 6 ++++-- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 15 +++++++++------ 3 files changed, 14 insertions(+), 9 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index 36d26b2..9cdc12a 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -1392,7 +1392,7 @@ static int kiblnd_alloc_freg_pool(struct kib_fmr_poolset *fps, frd->frd_mr = ib_alloc_mr(fpo->fpo_hdev->ibh_pd, fastreg_gaps ? 
IB_MR_TYPE_SG_GAPS : IB_MR_TYPE_MEM_REG, - LNET_MAX_IOV); + IBLND_MAX_RDMA_FRAGS); if (IS_ERR(frd->frd_mr)) { rc = PTR_ERR(frd->frd_mr); CERROR("Failed to allocate ib_alloc_mr: %d\n", rc); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index 5066f7b..21f8981 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -112,8 +112,10 @@ struct kib_tunables { #define IBLND_OOB_CAPABLE(v) ((v) != IBLND_MSG_VERSION_1) #define IBLND_OOB_MSGS(v) (IBLND_OOB_CAPABLE(v) ? 2 : 0) -#define IBLND_MSG_SIZE (4 << 10) /* max size of queued messages (inc hdr) */ -#define IBLND_MAX_RDMA_FRAGS LNET_MAX_IOV /* max # of fragments supported */ +/* max size of queued messages (inc hdr) */ +#define IBLND_MSG_SIZE (4 << 10) +/* max # of fragments supported. + 1 for unaligned case */ +#define IBLND_MAX_RDMA_FRAGS (LNET_MAX_IOV + 1) /************************/ /* derived constants... */ diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index a053e7d..db13f41 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -662,6 +662,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, struct scatterlist *sg; int fragnob; int max_nkiov; + int sg_count = 0; CDEBUG(D_NET, "niov %d offset %d nob %d\n", nkiov, offset, nob); @@ -682,6 +683,12 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, do { LASSERT(nkiov > 0); + if (!sg) { + CERROR("lacking enough sg entries to map tx\n"); + return -EFAULT; + } + sg_count++; + fragnob = min((int)(kiov->bv_len - offset), nob); /* We're allowed to start at a non-aligned page offset in @@ -700,10 +707,6 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, sg_set_page(sg, kiov->bv_page, fragnob, kiov->bv_offset + offset); sg = sg_next(sg); - if (!sg) { - CERROR("lacking enough sg entries to map tx\n"); - return -EFAULT; - } offset = 0; kiov++; @@ -711,7 +714,7 @@ static int 
kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, nob -= fragnob; } while (nob > 0); - return kiblnd_map_tx(ni, tx, rd, sg - tx->tx_frags); + return kiblnd_map_tx(ni, tx, rd, sg_count); } static int @@ -1008,7 +1011,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, int nob = offsetof(struct kib_msg, ibm_u) + body_nob; LASSERT(tx->tx_nwrq >= 0); - LASSERT(tx->tx_nwrq < IBLND_MAX_RDMA_FRAGS + 1); + LASSERT(tx->tx_nwrq <= IBLND_MAX_RDMA_FRAGS); LASSERT(nob <= IBLND_MSG_SIZE); kiblnd_init_msg(tx->tx_msg, type, body_nob); From patchwork Sun Nov 28 23:27:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643239 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 20F83C433EF for ; Sun, 28 Nov 2021 23:28:12 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8586820134F; Sun, 28 Nov 2021 15:28:07 -0800 (PST) Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 851AF200F3F for ; Sun, 28 Nov 2021 15:28:01 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A0F6324C; Sun, 28 Nov 2021 18:27:56 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9D0D8C1ACE; Sun, 28 Nov 2021 18:27:56 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:44 -0500 Message-Id: <1638142074-5945-10-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 
In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> References: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 09/19] lnet: Reset ni_ping_count only on receive X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn The lnet_ni:ni_ping_count is currently reset on every (healthy) tx. We should only reset it when receiving a message over the NI. Taking net_lock 0 on every tx results in a performance loss for certain workloads. Fixes: 885dab4e09 ("lnet: Recover local NI w/exponential backoff interval") HPE-bug-id: LUS-10427 WC-bug-id: https://jira.whamcloud.com/browse/LU-15102 Lustre-commit: 9cc0a5ff5fc8f45aa ("LU-15102 lnet: Reset ni_ping_count only on receive") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/45235 Reviewed-by: Serguei Smirnov Reviewed-by: Andriy Skulysh Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-msg.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 3c8b7c3..12768b2 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -888,8 +888,6 @@ * faster recovery. */ lnet_inc_healthv(&ni->ni_healthv, lnet_health_sensitivity); - lnet_net_lock(0); - ni->ni_ping_count = 0; /* It's possible msg_txpeer is NULL in the LOLND * case. Only increment the peer's health if we're * receiving a message from it. It's the only sure way to @@ -898,7 +896,9 @@ * as indication that the router is fully healthy. 
*/ if (lpni && msg->msg_rx_committed) { + lnet_net_lock(0); lpni->lpni_ping_count = 0; + ni->ni_ping_count = 0; /* If we're receiving a message from the router or * I'm a router, then set that lpni's health to * maximum so we can commence communication @@ -925,8 +925,8 @@ &the_lnet.ln_mt_peerNIRecovq, ktime_get_seconds()); } + lnet_net_unlock(0); } - lnet_net_unlock(0); /* we can finalize this message */ return -1; From patchwork Sun Nov 28 23:27:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643245 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C4FA2C433F5 for ; Sun, 28 Nov 2021 23:28:23 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 479F121C8FE; Sun, 28 Nov 2021 15:28:14 -0800 (PST) Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BC790200F3F for ; Sun, 28 Nov 2021 15:28:01 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A5454255; Sun, 28 Nov 2021 18:27:56 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A16B2C1AC4; Sun, 28 Nov 2021 18:27:56 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:45 -0500 Message-Id: <1638142074-5945-11-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> References: 
<1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 10/19] lustre: ptlrpc: fix timeout after spurious wakeup X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Zhuravlev so that the final timeout doesn't exceed the requested one WC-bug-id: https://jira.whamcloud.com/browse/LU-15086 Lustre-commit: b8383035406a4b7be ("LU-15086 ptlrpc: fix timeout after spurious wakeup") Signed-off-by: Alex Zhuravlev Reviewed-on: https://review.whamcloud.com/45308 Reviewed-by: Andriy Skulysh Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/ptlrpcd.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/lustre/ptlrpc/ptlrpcd.c b/fs/lustre/ptlrpc/ptlrpcd.c index 9cd9d39..23fb52d 100644 --- a/fs/lustre/ptlrpc/ptlrpcd.c +++ b/fs/lustre/ptlrpc/ptlrpcd.c @@ -438,7 +438,7 @@ static int ptlrpcd(void *arg) DEFINE_WAIT_FUNC(wait, woken_wake_function); time64_t timeout; - timeout = ptlrpc_set_next_timeout(set); + timeout = ptlrpc_set_next_timeout(set) * HZ; lu_context_enter(&env.le_ctx); lu_context_enter(env.le_ses); @@ -447,12 +447,15 @@ static int ptlrpcd(void *arg) while (!ptlrpcd_check(&env, pc)) { int ret; - if (timeout == 0) + if (timeout == 0) { ret = wait_woken(&wait, TASK_IDLE, MAX_SCHEDULE_TIMEOUT); - else + } else { ret = wait_woken(&wait, TASK_IDLE, - HZ * timeout); + timeout); + if (ret > 0) + timeout = ret; + } if (ret != 0) continue; /* Timed out */ From patchwork Sun Nov 28 23:27:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643245 Return-Path: X-Spam-Checker-Version: SpamAssassin
3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E1298C433EF for ; Sun, 28 Nov 2021 23:28:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 331F421C905; Sun, 28 Nov 2021 15:28:27 -0800 (PST) Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 012F9200F3F for ; Sun, 28 Nov 2021 15:28:01 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id A7BC525D; Sun, 28 Nov 2021 18:27:56 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A4C30C1ADA; Sun, 28 Nov 2021 18:27:56 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:46 -0500 Message-Id: <1638142074-5945-12-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> References: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 11/19] lnet: Fail peer add for existing gw peer X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn If there's an existing peer entry for a peer that is being added via CLI, and that existing peer was not created via the CLI, then DLC will attempt to delete the existing peer before creating a new one. 
The exit status of the peer deletion was not being checked. This results in the ability to add duplicate peers for gateways, because gateways cannot be deleted via lnetctl unless the routes for that gateway have been removed first. WC-bug-id: https://jira.whamcloud.com/browse/LU-15138 Lustre-commit: 79a4b69adb1e365b1 ("LU-15138 lnet: Fail peer add for existing gw peer") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/45337 Reviewed-by: Alexander Boyko Reviewed-by: Andriy Skulysh Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 1853388..a9f33c0 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -499,12 +499,14 @@ static void lnet_peer_cancel_discovery(struct lnet_peer *lp) static int lnet_peer_del(struct lnet_peer *peer) { + int rc; + lnet_peer_cancel_discovery(peer); lnet_net_lock(LNET_LOCK_EX); - lnet_peer_del_locked(peer); + rc = lnet_peer_del_locked(peer); lnet_net_unlock(LNET_LOCK_EX); - return 0; + return rc; } /* @@ -1648,7 +1650,9 @@ struct lnet_peer_net * } } /* Delete and recreate as a configured peer. */ - lnet_peer_del(lp); + rc = lnet_peer_del(lp); + if (rc) + goto out; } /* Create peer, peer_net, and peer_ni. 
*/ @@ -3238,6 +3242,7 @@ static int lnet_peer_deletion(struct lnet_peer *lp) struct list_head rlist; struct lnet_route *route, *tmp; int sensitivity = lp->lp_health_sensitivity; + int rc; INIT_LIST_HEAD(&rlist); @@ -3271,7 +3276,10 @@ static int lnet_peer_deletion(struct lnet_peer *lp) lnet_net_unlock(LNET_LOCK_EX); /* lnet_peer_del() deletes all the peer NIs owned by this peer */ - lnet_peer_del(lp); + rc = lnet_peer_del(lp); + if (rc) + CNETERR("Internal error: Unable to delete peer %s rc %d\n", + libcfs_nidstr(&lp->lp_primary_nid), rc); list_for_each_entry_safe(route, tmp, &rlist, lr_list) { From patchwork Sun Nov 28 23:27:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643259 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DC120C433F5 for ; Sun, 28 Nov 2021 23:28:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AA10C20111A; Sun, 28 Nov 2021 15:28:24 -0800 (PST) Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3AD19200F3F for ; Sun, 28 Nov 2021 15:28:02 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id AA438260; Sun, 28 Nov 2021 18:27:56 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A8F67C1AC9; Sun, 28 Nov 2021 18:27:56 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:47 -0500 Message-Id: 
<1638142074-5945-13-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> References: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 12/19] lustre: ptlrpc: remove bogus LASSERT X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger In the error case, it isn't possible for rc to be both -ENOMEM and 0 at the same time, so remove the incorrect LASSERT(rc == 0) to avoid crashing the system on an allocation failure. Improve error messages to conform to code style. Fixes: c8c95f49fd73 ("lnet: me: discard struct lnet_handle_me") WC-bug-id: https://jira.whamcloud.com/browse/LU-12678 Lustre-commit: 49769c1eea52e067d ("LU-12678 ptlrpc: remove bogus LASSERT") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/45421 Reviewed-by: James Simmons Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/niobuf.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/lustre/ptlrpc/niobuf.c b/fs/lustre/ptlrpc/niobuf.c index c5bbf5a..da04d4e 100644 --- a/fs/lustre/ptlrpc/niobuf.c +++ b/fs/lustre/ptlrpc/niobuf.c @@ -774,7 +774,7 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd) struct lnet_md md; struct lnet_me *me; - CDEBUG(D_NET, "LNetMEAttach: portal %d\n", + CDEBUG(D_NET, "%s: registering portal %d\n", service->srv_name, service->srv_req_portal); if (OBD_FAIL_CHECK(OBD_FAIL_PTLRPC_RQBD)) @@ -789,8 +789,9 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd) rqbd->rqbd_svcpt->scp_cpt >= 0 ? 
LNET_INS_LOCAL : LNET_INS_AFTER); if (IS_ERR(me)) { - CERROR("LNetMEAttach failed: %ld\n", PTR_ERR(me)); - return -ENOMEM; + CERROR("%s: LNetMEAttach failed: rc = %ld\n", + service->srv_name, PTR_ERR(me)); + return PTR_ERR(me); } LASSERT(rqbd->rqbd_refcount == 0); @@ -810,9 +811,9 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd) return 0; } - CERROR("ptlrpc: LNetMDAttach failed: rc = %d\n", rc); + CERROR("%s: LNetMDAttach failed: rc = %d\n", service->srv_name, rc); LASSERT(rc == -ENOMEM); rqbd->rqbd_refcount = 0; - return -ENOMEM; + return rc; } From patchwork Sun Nov 28 23:27:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643263 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 8D110C433F5 for ; Sun, 28 Nov 2021 23:28:45 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1BA6B200DEA; Sun, 28 Nov 2021 15:28:28 -0800 (PST) Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 73384200F3F for ; Sun, 28 Nov 2021 15:28:02 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B0CC8264; Sun, 28 Nov 2021 18:27:56 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id AD60FC1ACE; Sun, 28 Nov 2021 18:27:56 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:48 -0500 Message-Id: <1638142074-5945-14-git-send-email-jsimmons@infradead.org> X-Mailer: 
git-send-email 1.8.3.1 In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> References: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 13/19] lustre: quota: optimize capability check for root squash X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sebastien Buisson On client side, checking for owner/group quota can be directly bypassed if this is for root and there is no root squash. Fixes: cd633cfc96 ("lustre: quota: nodemap squashed root cannot bypass quota") WC-bug-id: https://jira.whamcloud.com/browse/LU-15141 Lustre-commit: f5fd5a363cc48e38c ("LU-15141 quota: optimize capability check for root squash") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/45322 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_cache.c | 23 ++++++++--------------- 1 file changed, 8 insertions(+), 15 deletions(-) diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 1211438..7b7b49f 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -2385,7 +2385,12 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io, } /* check if the file's owner/group is over quota */ - if (!io->ci_noquota) { + /* do not check for root without root squash, because in this case + * we should bypass quota + */ + if ((!oio->oi_cap_sys_resource || + cli->cl_root_squash) && + !io->ci_noquota) { struct cl_object *obj; struct cl_attr *attr; unsigned int qid[MAXQUOTAS]; @@ -2400,20 +2405,8 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io, qid[USRQUOTA] = attr->cat_uid; qid[GRPQUOTA] = 
attr->cat_gid; qid[PRJQUOTA] = attr->cat_projid; - /* - * if EDQUOT returned for root, we double check - * if root squash enabled or not updated from server side. - * without root squash, we should bypass quota for root. - */ - if (rc == 0 && osc_quota_chkdq(cli, qid) == -EDQUOT) { - if (oio->oi_cap_sys_resource && - !cli->cl_root_squash) { - io->ci_noquota = 1; - rc = 0; - } else { - rc = -EDQUOT; - } - } + if (rc == 0 && osc_quota_chkdq(cli, qid) == -EDQUOT) + rc = -EDQUOT; if (rc) return rc; } From patchwork Sun Nov 28 23:27:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643265 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DEC30C433EF for ; Sun, 28 Nov 2021 23:28:49 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7BE252010C1; Sun, 28 Nov 2021 15:28:31 -0800 (PST) Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AA9CF200F3F for ; Sun, 28 Nov 2021 15:28:02 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id B56DF265; Sun, 28 Nov 2021 18:27:56 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id B0E25C1AC4; Sun, 28 Nov 2021 18:27:56 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:49 -0500 Message-Id: <1638142074-5945-15-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1638142074-5945-1-git-send-email-jsimmons@infradead.org> References: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 14/19] lustre: llite: skip request slot for lmv_revalidate_slaves() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Andriy Skulysh , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andriy Skulysh Some syscalls need lmv_revalidate_slaves(). It requires a second lock enqueue, which can be blocked by a lack of RPC slots. Don't acquire an RPC slot for the second lock enqueue. HPE-bug-id: LUS-8416 WC-bug-id: https://jira.whamcloud.com/browse/LU-15121 Lustre-commit: 7e781c605c4189ea1 ("LU-15121 llite: skip request slot for lmv_revalidate_slaves()") Signed-off-by: Andriy Skulysh Reviewed-on: https://review.whamcloud.com/45275 Reviewed-by: Vitaly Fertman Reviewed-by: Alexander Zarochentsev Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 7 +++++-- fs/lustre/include/obd.h | 1 + fs/lustre/ldlm/ldlm_request.c | 18 +++++++++++------- fs/lustre/llite/statahead.c | 1 + fs/lustre/lmv/lmv_intent.c | 2 ++ fs/lustre/mdc/mdc_dev.c | 3 ++- fs/lustre/mdc/mdc_locks.c | 5 +++-- fs/lustre/osc/osc_request.c | 2 +- 8 files changed, 26 insertions(+), 13 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 1fc199b..a2fe9676 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -1018,7 +1018,9 @@ struct ldlm_enqueue_info { /* whether enqueue slave stripes */ unsigned int ei_enq_slave:1; /* whether acquire rpc slot */ - unsigned int ei_enq_slot:1; + unsigned int ei_req_slot:1; + /** whether acquire mod rpc
slot */ + unsigned int ei_mod_slot:1; }; extern struct obd_ops ldlm_obd_ops; @@ -1343,7 +1345,8 @@ int ldlm_prep_elc_req(struct obd_export *exp, int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, struct ldlm_enqueue_info *einfo, u8 with_policy, u64 *flags, void *lvb, u32 lvb_len, - const struct lustre_handle *lockh, int rc); + const struct lustre_handle *lockh, int rc, + bool request_slot); int ldlm_cli_convert_req(struct ldlm_lock *lock, u32 *flags, u64 new_bits); int ldlm_cli_convert(struct ldlm_lock *lock, enum ldlm_cancel_flags cancel_flags); diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index 27acd33..58a5803 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -722,6 +722,7 @@ enum md_cli_flags { CLI_API32 = BIT(3), CLI_MIGRATE = BIT(4), CLI_DIRTY_DATA = BIT(5), + CLI_NO_SLOT = BIT(6), }; enum md_op_code { diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 746c45b..44e1ec2 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -359,7 +359,9 @@ static bool ldlm_request_slot_needed(struct ldlm_enqueue_info *einfo) /* exclude EXTENT locks and DOM-only IBITS locks because they * are asynchronous and don't wait on server being blocked. 
*/ - return einfo->ei_type == LDLM_FLOCK || einfo->ei_type == LDLM_IBITS; + return einfo->ei_req_slot && + (einfo->ei_type == LDLM_FLOCK || + einfo->ei_type == LDLM_IBITS); } /** @@ -371,7 +373,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, struct ldlm_enqueue_info *einfo, u8 with_policy, u64 *ldlm_flags, void *lvb, u32 lvb_len, const struct lustre_handle *lockh, - int rc) + int rc, bool request_slot) { struct ldlm_namespace *ns = exp->exp_obd->obd_namespace; const struct lu_env *env = NULL; @@ -380,7 +382,7 @@ int ldlm_cli_enqueue_fini(struct obd_export *exp, struct ptlrpc_request *req, struct ldlm_reply *reply; int cleanup_phase = 1; - if (ldlm_request_slot_needed(einfo)) + if (request_slot) obd_put_request_slot(&req->rq_import->imp_obd->u.cli); ptlrpc_put_mod_rpc_slot(req); @@ -726,6 +728,7 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, int is_replay = *flags & LDLM_FL_REPLAY; int req_passed_in = 1; int rc, err; + bool need_req_slot; struct ptlrpc_request *req; ns = exp->exp_obd->obd_namespace; @@ -829,13 +832,14 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, * that threads that are waiting for a modify RPC slot are not polluting * our rpcs in flight counter. */ - if (einfo->ei_enq_slot) + if (einfo->ei_mod_slot) ptlrpc_get_mod_rpc_slot(req); - if (ldlm_request_slot_needed(einfo)) { + need_req_slot = ldlm_request_slot_needed(einfo); + if (need_req_slot) { rc = obd_get_request_slot(&req->rq_import->imp_obd->u.cli); if (rc) { - if (einfo->ei_enq_slot) + if (einfo->ei_mod_slot) ptlrpc_put_mod_rpc_slot(req); failed_lock_cleanup(ns, lock, einfo->ei_mode); LDLM_LOCK_RELEASE(lock); @@ -855,7 +859,7 @@ int ldlm_cli_enqueue(struct obd_export *exp, struct ptlrpc_request **reqp, rc = ptlrpc_queue_wait(req); err = ldlm_cli_enqueue_fini(exp, req, einfo, policy ? 
1 : 0, flags, - lvb, lvb_len, lockh, rc); + lvb, lvb_len, lockh, rc, need_req_slot); /* * If ldlm_cli_enqueue_fini did not find the lock, we need to free diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 4806e99..39ffb9d 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -380,6 +380,7 @@ static int ll_statahead_interpret(struct ptlrpc_request *req, einfo->ei_cb_cp = ldlm_completion_ast; einfo->ei_cb_gl = NULL; einfo->ei_cbdata = NULL; + einfo->ei_req_slot = 1; return minfo; } diff --git a/fs/lustre/lmv/lmv_intent.c b/fs/lustre/lmv/lmv_intent.c index 93da2b3..906ca16 100644 --- a/fs/lustre/lmv/lmv_intent.c +++ b/fs/lustre/lmv/lmv_intent.c @@ -106,6 +106,7 @@ static int lmv_intent_remote(struct obd_export *exp, struct lookup_intent *it, } op_data->op_bias = MDS_CROSS_REF; + op_data->op_cli_flags = CLI_NO_SLOT; CDEBUG(D_INODE, "REMOTE_INTENT with fid=" DFID " -> mds #%u\n", PFID(&body->mbo_fid1), tgt->ltd_index); @@ -203,6 +204,7 @@ int lmv_revalidate_slaves(struct obd_export *exp, * it's remote object. */ op_data->op_bias = MDS_CROSS_REF; + op_data->op_cli_flags = CLI_NO_SLOT; tgt = lmv_tgt(lmv, lsm->lsm_md_oinfo[i].lmo_mds); if (!tgt) { diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index b2f60ea..0b1d257 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -66,6 +66,7 @@ static void mdc_lock_build_einfo(const struct lu_env *env, einfo->ei_cb_cp = ldlm_completion_ast; einfo->ei_cb_gl = mdc_ldlm_glimpse_ast; einfo->ei_cbdata = osc; /* value to be put into ->l_ast_data */ + einfo->ei_req_slot = 1; } static void mdc_lock_lvb_update(const struct lu_env *env, @@ -664,7 +665,7 @@ int mdc_enqueue_interpret(const struct lu_env *env, struct ptlrpc_request *req, /* Complete obtaining the lock procedure. */ rc = ldlm_cli_enqueue_fini(aa->oa_exp, req, &einfo, 1, aa->oa_flags, aa->oa_lvb, aa->oa_lvb ? 
- sizeof(*aa->oa_lvb) : 0, lockh, rc); + sizeof(*aa->oa_lvb) : 0, lockh, rc, true); /* Complete mdc stuff. */ rc = mdc_enqueue_fini(aa->oa_exp, req, aa->oa_upcall, aa->oa_cookie, lockh, mode, aa->oa_flags, rc); diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index 4135c3a..66f0039 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -984,7 +984,8 @@ int mdc_enqueue_base(struct obd_export *exp, struct ldlm_enqueue_info *einfo, req->rq_sent = ktime_get_real_seconds() + resends; } - einfo->ei_enq_slot = !mdc_skip_mod_rpc_slot(it); + einfo->ei_req_slot = !(op_data->op_cli_flags & CLI_NO_SLOT); + einfo->ei_mod_slot = !mdc_skip_mod_rpc_slot(it); /* With Data-on-MDT the glimpse callback is needed too. * It is set here in advance but not in mdc_finish_enqueue() @@ -1371,7 +1372,7 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env, rc = -ETIMEDOUT; rc = ldlm_cli_enqueue_fini(exp, req, einfo, 1, &flags, NULL, 0, - lockh, rc); + lockh, rc, true); if (rc < 0) { CERROR("%s: ldlm_cli_enqueue_fini() failed: rc = %d\n", exp->exp_obd->obd_name, rc); diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c index cf79808..e065eab 100644 --- a/fs/lustre/osc/osc_request.c +++ b/fs/lustre/osc/osc_request.c @@ -2824,7 +2824,7 @@ int osc_enqueue_interpret(const struct lu_env *env, struct ptlrpc_request *req, /* Complete obtaining the lock procedure. */ rc = ldlm_cli_enqueue_fini(aa->oa_exp, req, &einfo, 1, aa->oa_flags, - lvb, lvb_len, lockh, rc); + lvb, lvb_len, lockh, rc, false); /* Complete osc stuff. 
*/ rc = osc_enqueue_fini(req, aa->oa_upcall, aa->oa_cookie, lockh, mode, aa->oa_flags, aa->oa_speculative, rc); From patchwork Sun Nov 28 23:27:50 2021 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643267 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:50 -0500 Message-Id: <1638142074-5945-16-git-send-email-jsimmons@infradead.org> In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 15/19] lnet: set eth routes needed for multi rail
Cc: Serguei Smirnov , Lustre Development List From: Serguei Smirnov

When ksocklnd is initialized or new ethernet interfaces are added via lnetctl, set the routing rules using a common shell script, ksocklnd-config. This ensures control over the source interface when sending traffic. For example, for eth0 with ip 192.168.122.142/24, the output of "ip route show table eth0" should be

192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.142

This step can be omitted by specifying "options ksocklnd skip_mr_route_setup=1" in the conf file, or by using the --skip-mr-route-setup switch when adding an NI with lnetctl. Note that the module parameter takes priority over the lnetctl switch: if skip-mr-route-setup is not specified when adding an NI with lnetctl, the route still won't get created if the conf file has skip_mr_route_setup=1. The route also won't be created if any route already exists for the given interface, assuming advanced users who manage routing on their own will want to continue doing so.
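The route setup described above can be sketched as a dry-run shell fragment. This is not the actual ksocklnd-config script; the interface name and addresses are illustrative, and `run` only echoes the commands so the sketch can be tried unprivileged:

```shell
#!/bin/sh
# Dry-run sketch of the per-interface routing rules a ksocklnd-config
# style helper would install (interface name and addresses illustrative).
IF=eth0
ADDR=192.168.122.142
NET=192.168.122.0/24

run() { echo "$@"; }    # replace the echo with "$@" to actually apply rules

# A dedicated routing table per interface pins replies to the NIC they
# belong to, which multi-rail needs for ethernet interfaces.
run ip route add "$NET" dev "$IF" proto kernel scope link src "$ADDR" table "$IF"
run ip rule add from "$ADDR" table "$IF"
```

A real helper would, as the commit message notes, first check `ip route show table "$IF"` and skip the setup if the table is already populated.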
WC-bug-id: https://jira.whamcloud.com/browse/LU-14662 Lustre-commit: c9bfe57bd2495671f ("LU-14662 lnet: set eth routes needed for multi rail") Signed-off-by: Serguei Smirnov Reviewed-on: https://review.whamcloud.com/44065 Reviewed-by: Amir Shehata Reviewed-by: Cyril Bordage Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/socklnd/socklnd_modparams.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/lnet/klnds/socklnd/socklnd_modparams.c b/net/lnet/klnds/socklnd/socklnd_modparams.c index c00ea49..5eb58ca 100644 --- a/net/lnet/klnds/socklnd/socklnd_modparams.c +++ b/net/lnet/klnds/socklnd/socklnd_modparams.c @@ -147,6 +147,11 @@ module_param(conns_per_peer, uint, 0644); MODULE_PARM_DESC(conns_per_peer, "number of connections per peer"); +/* By default skip_mr_route_setup is 0 (do not skip) */ +static unsigned int skip_mr_route_setup; +module_param(skip_mr_route_setup, uint, 0444); +MODULE_PARM_DESC(skip_mr_route_setup, "skip automatic setup of linux routes for MR"); + #if SOCKNAL_VERSION_DEBUG static int protocol = 3; module_param(protocol, int, 0644); From patchwork Sun Nov 28 23:27:51 2021 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643237 Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix)
with ESMTP id 36E71200EC0 for ; Sun, 28 Nov 2021 15:28:03 -0800 (PST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:51 -0500 Message-Id: <1638142074-5945-17-git-send-email-jsimmons@infradead.org> In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 16/19] lustre: llite: Do not count tiny write twice Cc: Lustre Development List From: Patrick Farrell

We accidentally count bytes written with tiny write twice in stats; remove the extra count. This also has the positive side effect of improving tiny write performance by about 4%, by removing an extra call to the stats code (the main cost is ktime_get()).
Before, 8 byte dd: 13.9 MiB/s After: 14.3 MiB/s WC-bug-id: https://jira.whamcloud.com/browse/LU-15197 Lustre-commit: 5208135f432a320e9 ("LU-15197 llite: Do not count tiny write twice") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/45476 Reviewed-by: Andreas Dilger Reviewed-by: Aurelien Degremont Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 6755671..d3374232 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -2032,8 +2032,6 @@ static ssize_t ll_do_tiny_write(struct kiocb *iocb, struct iov_iter *iter) if (result > 0) { ll_heat_add(inode, CIT_WRITE, result); - ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_WRITE_BYTES, - result); set_bit(LLIF_DATA_MODIFIED, &ll_i2info(inode)->lli_flags); } From patchwork Sun Nov 28 23:27:52 2021 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643249 Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id C59EB26F; Sun, 28 Nov 2021 18:27:56
-0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:52 -0500 Message-Id: <1638142074-5945-18-git-send-email-jsimmons@infradead.org> In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 17/19] lustre: llite: mend the trunc_sem_up_write() Cc: Lustre Development List From: Bobi Jam

The original lli_trunc_sem replacement change (commit ae9e437745) fixed a deadlock scenario:

t1 (page fault)          t2 (dio read)            t3 (truncate)
|- vm_mmap_pgoff()       |- vvp_io_read_start()   |- vvp_io_setattr_start()
|- down_write(mmap_sem)  |- down_read(trunc_sem)
|- do_map()              |- ll_direct_IO_impl()
|- vvp_io_fault_start()  |- ll_get_user_pages()   |- down_write(trunc_sem)
                         |- down_read(mmap_sem)
|- down_read(trunc_sem)

t1 waits for the read semaphore of trunc_sem, which is held up by t3, since t3 is waiting for the write semaphore while t2 holds its read semaphore; t2 in turn is waiting for mmap_sem, which has been taken by t1, and a deadlock ensues.

Commit ae9e437745 changed the down_read(trunc_sem) in the page fault path to trunc_sem_down_read_nowait(), which ignores any waiting down_write(trunc_sem) and just takes the read semaphore if no writer has taken the semaphore, thereby breaking the deadlock.
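The "nowait" semantics can be modeled in userspace. The following is an illustrative C11 sketch, not Lustre's actual ll_trunc_sem code; all names are invented for the example:

```c
#include <stdatomic.h>

/* Toy model of the trunc_sem idea: readers count upward from 0 and a
 * writer parks the counter at -1. The "nowait" reader barges in even
 * if a writer is queued, which is what lets the page-fault path (t1)
 * break the three-way deadlock described above.
 */
struct trunc_sem {
	atomic_int readers;	/* >= 0: reader count, -1: writer active */
};

/* Succeeds unless a writer currently *holds* the semaphore; a merely
 * waiting writer does not block us (the "nowait" behaviour).
 */
static int trunc_sem_down_read_nowait(struct trunc_sem *sem)
{
	int v = atomic_load(&sem->readers);

	while (v >= 0) {
		if (atomic_compare_exchange_weak(&sem->readers, &v, v + 1))
			return 1;	/* got the read side */
	}
	return 0;	/* writer holds it: caller must really wait */
}

static void trunc_sem_up_read(struct trunc_sem *sem)
{
	atomic_fetch_sub(&sem->readers, 1);
}
```

A plain reader would additionally refuse to proceed while a writer is queued; the nowait variant deliberately drops that fairness to avoid the cyclic wait.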
But there is a subtlety in using wake_up_var(): wake_up_var()->__wake_up_bit()->waitqueue_active() locklessly tests for waiters on the queue, and if it is called without an explicit smp_mb() it is possible for the waitqueue_active() check to be hoisted before the condition store, such that the waker observes an empty wait list while the waiter never observes the condition, and the waiter is never woken up afterwards.

Fixes: ae9e437745 ("lustre: llite: replace lli_trunc_sem") WC-bug-id: https://jira.whamcloud.com/browse/LU-14713 Lustre-commit: 39745c8b5493159bb ("LU-14713 llite: mend the trunc_sem_up_write()") Signed-off-by: Bobi Jam Reviewed-on: https://review.whamcloud.com/43844 Reviewed-by: Andreas Dilger Reviewed-by: Neil Brown Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 7768c99..ce7431f 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -365,6 +365,8 @@ static inline void trunc_sem_down_write(struct ll_trunc_sem *sem) static inline void trunc_sem_up_write(struct ll_trunc_sem *sem) { atomic_set(&sem->ll_trunc_readers, 0); + /* match the smp_mb() in wait_var_event()->prepare_to_wait() */ + smp_mb(); wake_up_var(&sem->ll_trunc_readers); } From patchwork Sun Nov 28 23:27:53 2021 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643269
From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:53 -0500 Message-Id: <1638142074-5945-19-git-send-email-jsimmons@infradead.org> In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 18/19] lnet: Netlink improvements Cc: Lustre Development List

With the expanded use of Netlink, several issues have been encountered. This patch fixes many of them:

1) Fix idx handling in the lnet_genl_parse_list() function: it needs to always be incremented. Apply some renaming suggestions from Neil for enum lnet_nl_scalar_attrs. Add a new LN_SCALAR_ATTR_INT_VALUE to allow pushing integers as well as strings from userspace.

2) Create struct genl_filter_list, which will be used to build a list of items to pass back to userland. This will be a common setup.
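The idx fix in item 1 follows a common recursion pattern: a nested-list walker must report its updated position back to the caller instead of returning 0. Here is an illustrative userspace sketch; the struct and function are invented for the example, not the actual lnet_genl_parse_list() code:

```c
/* Toy model of the index fix: walk a nested list, always advancing
 * idx, and hand the final position back to the caller, which adopts
 * it with idx = rc. Returning 0 and reusing a stale idx after the
 * recursive call was the bug being fixed.
 */
struct node {
	int nchildren;
	struct node *children;
};

static int walk(const struct node *n, int idx)
{
	for (int i = 0; i < n->nchildren; i++) {
		idx++;				/* always increment */
		int rc = walk(&n->children[i], idx);

		if (rc < 0)
			return rc;		/* propagate errors */
		idx = rc;			/* adopt child's progress */
	}
	return idx;	/* final position, not 0 */
}
```

With this shape, negative returns still signal errors, while any non-negative return is the caller's new index, mirroring how the patch makes lnet_genl_send_scalar_list() translate a positive return into 0.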
WC-bug-id: https://jira.whamcloud.com/browse/LU-9680 Lustre-commit: 82835a1952dcb37e8 ("LU-9680 net: Netlink improvements") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/44358 Reviewed-by: Ben Evans Reviewed-by: Petros Koutoupis Reviewed-by: Oleg Drokin --- include/linux/lnet/lib-types.h | 8 +++++++- include/uapi/linux/lnet/lnet-nl.h | 29 +++++++++++++++++++++++++---- net/lnet/lnet/api-ni.c | 9 +++++---- 3 files changed, 37 insertions(+), 9 deletions(-) diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h index 628d133..7631044 100644 --- a/include/linux/lnet/lib-types.h +++ b/include/linux/lnet/lib-types.h @@ -1320,7 +1320,13 @@ struct lnet { struct list_head ln_udsp_list; }; -static const struct nla_policy scalar_attr_policy[LN_SCALAR_CNT + 1] = { +struct genl_filter_list { + struct list_head lp_list; + void *lp_cursor; + bool lp_first; +}; + +static const struct nla_policy scalar_attr_policy[LN_SCALAR_MAX + 1] = { [LN_SCALAR_ATTR_LIST] = { .type = NLA_NESTED }, [LN_SCALAR_ATTR_LIST_SIZE] = { .type = NLA_U16 }, [LN_SCALAR_ATTR_INDEX] = { .type = NLA_U16 }, diff --git a/include/uapi/linux/lnet/lnet-nl.h b/include/uapi/linux/lnet/lnet-nl.h index f5bb67c..83f6e27 100644 --- a/include/uapi/linux/lnet/lnet-nl.h +++ b/include/uapi/linux/lnet/lnet-nl.h @@ -38,23 +38,44 @@ enum lnet_nl_key_format { LNKF_SEQUENCE = 4, }; +/** + * enum lnet_nl_scalar_attrs - scalar LNet netlink attributes used + * to compose messages for sending or + * receiving. + * + * @LN_SCALAR_ATTR_UNSPEC: unspecified attribute to catch errors + * @LN_SCALAR_ATTR_PAD: padding for 64-bit attributes, ignore + * + * @LN_SCALAR_ATTR_LIST: List of scalar attributes (NLA_NESTED) + * @LN_SCALAR_ATTR_LIST_SIZE: Number of items in scalar list (NLA_U16) + * @LN_SCALAR_ATTR_INDEX: True Netlink attr value (NLA_U16) + * @LN_SCALAR_ATTR_NLA_TYPE: Data format for value part of the pair + * (NLA_U16) + * @LN_SCALAR_ATTR_VALUE: String value of key part of the pair. 
+ * (NLA_NUL_STRING) + * @LN_SCALAR_ATTR_INT_VALUE: Numeric value of key part of the pair. + * (NLA_S64) + * @LN_SCALAR_ATTR_KEY_FORMAT: LNKF_* format of the key value pair. + */ enum lnet_nl_scalar_attrs { LN_SCALAR_ATTR_UNSPEC = 0, - LN_SCALAR_ATTR_LIST, + LN_SCALAR_ATTR_PAD = LN_SCALAR_ATTR_UNSPEC, + LN_SCALAR_ATTR_LIST, LN_SCALAR_ATTR_LIST_SIZE, LN_SCALAR_ATTR_INDEX, LN_SCALAR_ATTR_NLA_TYPE, LN_SCALAR_ATTR_VALUE, + LN_SCALAR_ATTR_INT_VALUE, LN_SCALAR_ATTR_KEY_FORMAT, - __LN_SCALAR_ATTR_LAST, + __LN_SCALAR_ATTR_MAX_PLUS_ONE, }; -#define LN_SCALAR_CNT (__LN_SCALAR_ATTR_LAST - 1) +#define LN_SCALAR_MAX (__LN_SCALAR_ATTR_MAX_PLUS_ONE - 1) struct ln_key_props { - char *lkp_values; + char *lkp_value; __u16 lkp_key_format; __u16 lkp_data_type; }; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 9d9d0e6..3ed3f0b 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -2670,9 +2670,9 @@ static int lnet_genl_parse_list(struct sk_buff *msg, list->lkl_maxattr); nla_put_u16(msg, LN_SCALAR_ATTR_INDEX, count); - if (props[count].lkp_values) + if (props[count].lkp_value) nla_put_string(msg, LN_SCALAR_ATTR_VALUE, - props[count].lkp_values); + props[count].lkp_value); if (props[count].lkp_key_format) nla_put_u16(msg, LN_SCALAR_ATTR_KEY_FORMAT, props[count].lkp_key_format); @@ -2684,13 +2684,14 @@ static int lnet_genl_parse_list(struct sk_buff *msg, rc = lnet_genl_parse_list(msg, data, ++idx); if (rc < 0) return rc; + idx = rc; } nla_nest_end(msg, key); } nla_nest_end(msg, node); - return 0; + return idx; } int lnet_genl_send_scalar_list(struct sk_buff *msg, u32 portid, u32 seq, @@ -2717,7 +2718,7 @@ int lnet_genl_send_scalar_list(struct sk_buff *msg, u32 portid, u32 seq, canceled: if (rc < 0) genlmsg_cancel(msg, hdr); - return rc; + return rc > 0 ? 
0 : rc; } EXPORT_SYMBOL(lnet_genl_send_scalar_list); From patchwork Sun Nov 28 23:27:54 2021 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12643241 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 28 Nov 2021 18:27:54 -0500 Message-Id: <1638142074-5945-20-git-send-email-jsimmons@infradead.org> In-Reply-To: <1638142074-5945-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 19/19] lnet: libcfs: separate daemon_list from cfs_trace_data
Cc: Lustre Development List From: Mr NeilBrown

cfs_trace_data provides a fifo for trace messages. To minimize locking, there is a separate fifo for each CPU, and even for different interrupt levels per-cpu. When a page is removed from the fifo to be written to a file, the page is added to a "daemon_list". Trace messages on the daemon_list have already been logged to a file, but can be easily dumped to the console when a bug occurs.

The daemon_list is always accessed from a single thread at a time, so the per-CPU facilities of cfs_trace_data are not needed. However, daemon_list is currently managed per-cpu as part of cfs_trace_data. This patch moves the daemon_list of pages out to a separate structure - a simple linked list, protected by cfs_tracefile_sem. Rather than using a 'cfs_trace_page' to hold linkage information and content size, we use page->lru for linkage and page->private for the size of the content in each page. This is a step towards replacing cfs_trace_data with the Linux ring_buffer, which provides similar functionality with even less locking.

In the current code, if the daemon which writes trace data to a file cannot keep up with load, excess pages are moved to the daemon_list temporarily before being discarded. With the patch, these pages are simply discarded immediately. If the daemon thread cannot keep up, that is a configuration problem, and temporarily preserving a few pages is unlikely to really help.
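The single-list design described above can be modeled with an ordinary FIFO of buffers linked through an embedded node. This is an illustrative userspace sketch standing in for the kernel's page->lru linkage and page->private size field, not the actual tracefile.c code:

```c
#include <stdlib.h>

/* Toy model of the daemon_pages change: pages whose records were
 * already written to the trace file live on one plain FIFO, linked
 * through an embedded pointer (the kernel reuses page->lru) with the
 * used byte count stored alongside (the kernel reuses page->private).
 * All names are illustrative.
 */
struct dpage {
	struct dpage *next;
	size_t used;		/* bytes of trace data in the page */
};

static struct dpage *daemon_pages;	/* FIFO head */
static struct dpage **daemon_tail = &daemon_pages;
static long daemon_pages_count;

/* Called by the writer daemon after a page has been logged to disk. */
static void daemon_pages_add(struct dpage *pg, size_t used)
{
	pg->used = used;
	pg->next = NULL;
	*daemon_tail = pg;
	daemon_tail = &pg->next;
	daemon_pages_count++;
}

/* Drain in arrival order on dump or flush, like the patch's
 * list_first_entry_or_null(&daemon_pages, struct page, lru) loops.
 */
static struct dpage *daemon_pages_pop(void)
{
	struct dpage *pg = daemon_pages;

	if (pg) {
		daemon_pages = pg->next;
		if (!daemon_pages)
			daemon_tail = &daemon_pages;
		daemon_pages_count--;
	}
	return pg;
}
```

In the kernel patch the same two operations are guarded by cfs_tracefile_sem, which is sufficient precisely because only one thread touches the list at a time.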
WC-bug-id: https://jira.whamcloud.com/browse/LU-14428 Lustre-commit: 848738a85d82bb71c ("LU-14428 libcfs: separate daemon_list from cfs_trace_data") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/41493 Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/libcfs/tracefile.c | 213 +++++++++++++++++++++++--------------------- net/lnet/libcfs/tracefile.h | 17 +--- 2 files changed, 112 insertions(+), 118 deletions(-) diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c index e0ef234..b27732a 100644 --- a/net/lnet/libcfs/tracefile.c +++ b/net/lnet/libcfs/tracefile.c @@ -58,6 +58,13 @@ enum cfs_trace_buf_type { union cfs_trace_data_union (*cfs_trace_data[CFS_TCD_TYPE_CNT])[NR_CPUS] __cacheline_aligned; +/* Pages containing records already processed by daemon. + * Link via ->lru, use size in ->private + */ +static LIST_HEAD(daemon_pages); +static long daemon_pages_count; +static long daemon_pages_max; + char cfs_tracefile[TRACEFILE_NAME_SIZE]; long long cfs_tracefile_size = CFS_TRACEFILE_SIZE; @@ -68,12 +75,6 @@ enum cfs_trace_buf_type { struct page_collection { struct list_head pc_pages; - /* - * if this flag is set, collect_pages() will spill both - * ->tcd_daemon_pages and ->tcd_pages to the ->pc_pages. Otherwise, - * only ->tcd_pages are spilled. 
- */ - int pc_want_daemon_pages; }; /* @@ -103,9 +104,6 @@ struct cfs_trace_page { unsigned short type; }; -static void put_pages_on_tcd_daemon_list(struct page_collection *pc, - struct cfs_trace_cpu_data *tcd); - /* trace file lock routines */ /* * The walking argument indicates the locking comes from all tcd types @@ -296,10 +294,10 @@ static void cfs_tcd_shrink(struct cfs_trace_cpu_data *tcd) if (!pgcount--) break; - list_move_tail(&tage->linkage, &pc.pc_pages); + list_del(&tage->linkage); + cfs_tage_free(tage); tcd->tcd_cur_pages--; } - put_pages_on_tcd_daemon_list(&pc, tcd); } /* return a page that has 'len' bytes left at the end */ @@ -678,11 +676,6 @@ int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata, cfs_tcd_for_each(tcd, i, j) { list_splice_init(&tcd->tcd_pages, &pc->pc_pages); tcd->tcd_cur_pages = 0; - - if (pc->pc_want_daemon_pages) { - list_splice_init(&tcd->tcd_daemon_pages, &pc->pc_pages); - tcd->tcd_cur_daemon_pages = 0; - } } } @@ -695,11 +688,6 @@ static void collect_pages_on_all_cpus(struct page_collection *pc) cfs_tcd_for_each_type_lock(tcd, i, cpu) { list_splice_init(&tcd->tcd_pages, &pc->pc_pages); tcd->tcd_cur_pages = 0; - if (pc->pc_want_daemon_pages) { - list_splice_init(&tcd->tcd_daemon_pages, - &pc->pc_pages); - tcd->tcd_cur_daemon_pages = 0; - } } } } @@ -746,64 +734,17 @@ static void put_pages_back(struct page_collection *pc) put_pages_back_on_all_cpus(pc); } -/* Add pages to a per-cpu debug daemon ringbuffer. This buffer makes sure that - * we have a good amount of data at all times for dumping during an LBUG, even - * if we have been steadily writing (and otherwise discarding) pages via the - * debug daemon. 
- */
-static void put_pages_on_tcd_daemon_list(struct page_collection *pc,
-					 struct cfs_trace_cpu_data *tcd)
-{
-	struct cfs_trace_page *tage;
-	struct cfs_trace_page *tmp;
-
-	list_for_each_entry_safe(tage, tmp, &pc->pc_pages, linkage) {
-		__LASSERT_TAGE_INVARIANT(tage);
-
-		if (tage->cpu != tcd->tcd_cpu || tage->type != tcd->tcd_type)
-			continue;
-
-		cfs_tage_to_tail(tage, &tcd->tcd_daemon_pages);
-		tcd->tcd_cur_daemon_pages++;
-
-		if (tcd->tcd_cur_daemon_pages > tcd->tcd_max_pages) {
-			struct cfs_trace_page *victim;
-
-			__LASSERT(!list_empty(&tcd->tcd_daemon_pages));
-			victim = cfs_tage_from_list(tcd->tcd_daemon_pages.next);
-
-			__LASSERT_TAGE_INVARIANT(victim);
-
-			list_del(&victim->linkage);
-			cfs_tage_free(victim);
-			tcd->tcd_cur_daemon_pages--;
-		}
-	}
-}
-
-static void put_pages_on_daemon_list(struct page_collection *pc)
-{
-	struct cfs_trace_cpu_data *tcd;
-	int i, cpu;
-
-	for_each_possible_cpu(cpu) {
-		cfs_tcd_for_each_type_lock(tcd, i, cpu)
-			put_pages_on_tcd_daemon_list(pc, tcd);
-	}
-}
-
 #ifdef CONFIG_LNET_DUMP_ON_PANIC
 void cfs_trace_debug_print(void)
 {
 	struct page_collection pc;
 	struct cfs_trace_page *tage;
 	struct cfs_trace_page *tmp;
+	struct page *page;
 
-	pc.pc_want_daemon_pages = 1;
 	collect_pages(&pc);
 	list_for_each_entry_safe(tage, tmp, &pc.pc_pages, linkage) {
 		char *p, *file, *fn;
-		struct page *page;
 
 		__LASSERT_TAGE_INVARIANT(tage);
 
@@ -830,6 +771,34 @@ void cfs_trace_debug_print(void)
 		list_del(&tage->linkage);
 		cfs_tage_free(tage);
 	}
+	down_write(&cfs_tracefile_sem);
+	while ((page = list_first_entry_or_null(&daemon_pages,
+						struct page, lru)) != NULL) {
+		char *p, *file, *fn;
+
+		p = page_address(page);
+		while (p < ((char *)page_address(page) + page->private)) {
+			struct ptldebug_header *hdr;
+			int len;
+
+			hdr = (void *)p;
+			p += sizeof(*hdr);
+			file = p;
+			p += strlen(file) + 1;
+			fn = p;
+			p += strlen(fn) + 1;
+			len = hdr->ph_len - (int)(p - (char *)hdr);
+
+			cfs_print_to_console(hdr, D_EMERG, file, fn,
+					     "%.*s", len, p);
+
+			p += len;
+		}
+		list_del_init(&page->lru);
+		daemon_pages_count -= 1;
+		put_page(page);
+	}
+	up_write(&cfs_tracefile_sem);
 }
 #endif /* CONFIG_LNET_DUMP_ON_PANIC */
 
@@ -840,6 +809,7 @@ int cfs_tracefile_dump_all_pages(char *filename)
 	struct cfs_trace_page *tage;
 	struct cfs_trace_page *tmp;
 	char *buf;
+	struct page *page;
 	int rc;
 
 	down_write(&cfs_tracefile_sem);
@@ -854,7 +824,6 @@ int cfs_tracefile_dump_all_pages(char *filename)
 		goto out;
 	}
 
-	pc.pc_want_daemon_pages = 1;
 	collect_pages(&pc);
 	if (list_empty(&pc.pc_pages)) {
 		rc = 0;
@@ -881,8 +850,20 @@ int cfs_tracefile_dump_all_pages(char *filename)
 		list_del(&tage->linkage);
 		cfs_tage_free(tage);
 	}
-
-	rc = vfs_fsync(filp, 1);
+	while ((page = list_first_entry_or_null(&daemon_pages,
+						struct page, lru)) != NULL) {
+		buf = page_address(page);
+		rc = kernel_write(filp, buf, page->private, &filp->f_pos);
+		if (rc != (int)page->private) {
+			pr_warn("Lustre: wanted to write %u but wrote %d\n",
+				(int)page->private, rc);
+			break;
+		}
+		list_del(&page->lru);
+		daemon_pages_count -= 1;
+		put_page(page);
+	}
+	rc = vfs_fsync_range(filp, 0, LLONG_MAX, 1);
 	if (rc)
 		pr_err("LustreError: sync returns: rc = %d\n", rc);
 close:
@@ -896,8 +877,8 @@ void cfs_trace_flush_pages(void)
 {
 	struct page_collection pc;
 	struct cfs_trace_page *tage;
+	struct page *page;
 
-	pc.pc_want_daemon_pages = 1;
 	collect_pages(&pc);
 	while (!list_empty(&pc.pc_pages)) {
 		tage = list_first_entry(&pc.pc_pages,
@@ -907,6 +888,15 @@ void cfs_trace_flush_pages(void)
 		list_del(&tage->linkage);
 		cfs_tage_free(tage);
 	}
+
+	down_write(&cfs_tracefile_sem);
+	while ((page = list_first_entry_or_null(&daemon_pages,
+						struct page, lru)) != NULL) {
+		list_del(&page->lru);
+		daemon_pages_count -= 1;
+		put_page(page);
+	}
+	up_write(&cfs_tracefile_sem);
 }
 
 int cfs_trace_copyout_string(char __user *usr_buffer, int usr_buffer_nob,
@@ -1039,6 +1029,7 @@ int cfs_trace_set_debug_mb(int mb)
 	cfs_tcd_for_each(tcd, i, j)
 		tcd->tcd_max_pages = (pages * tcd->tcd_pages_factor) / 100;
+	daemon_pages_max = pages;
 
 	up_write(&cfs_tracefile_sem);
 
 	return mb;
@@ -1071,9 +1062,10 @@ static int tracefiled(void *arg)
 	int last_loop = 0;
 	int rc;
 
-	pc.pc_want_daemon_pages = 0;
-
 	while (!last_loop) {
+		LIST_HEAD(for_daemon_pages);
+		int for_daemon_pages_count = 0;
+
 		schedule_timeout_interruptible(HZ);
 		if (kthread_should_stop())
 			last_loop = 1;
@@ -1095,38 +1087,55 @@ static int tracefiled(void *arg)
 			}
 		}
 		up_read(&cfs_tracefile_sem);
-		if (!filp) {
-			put_pages_on_daemon_list(&pc);
-			__LASSERT(list_empty(&pc.pc_pages));
-			continue;
-		}
 
 		list_for_each_entry_safe(tage, tmp, &pc.pc_pages, linkage) {
-			struct dentry *de = file_dentry(filp);
-			static loff_t f_pos;
-
 			__LASSERT_TAGE_INVARIANT(tage);
 
-			if (f_pos >= (off_t)cfs_tracefile_size)
-				f_pos = 0;
-			else if (f_pos > i_size_read(de->d_inode))
-				f_pos = i_size_read(de->d_inode);
-
-			buf = kmap(tage->page);
-			rc = kernel_write(filp, buf, tage->used, &f_pos);
-			kunmap(tage->page);
-
-			if (rc != (int)tage->used) {
-				pr_warn("Lustre: wanted to write %u but wrote %d\n",
-					tage->used, rc);
-				put_pages_back(&pc);
-				__LASSERT(list_empty(&pc.pc_pages));
-				break;
+			if (filp) {
+				struct dentry *de = file_dentry(filp);
+				static loff_t f_pos;
+
+				if (f_pos >= (off_t)cfs_tracefile_size)
+					f_pos = 0;
+				else if (f_pos > i_size_read(de->d_inode))
+					f_pos = i_size_read(de->d_inode);
+
+				buf = kmap(tage->page);
+				rc = kernel_write(filp, buf, tage->used,
+						  &f_pos);
+				kunmap(tage->page);
+				if (rc != (int)tage->used) {
+					pr_warn("Lustre: wanted to write %u but wrote %d\n",
+						tage->used, rc);
+					put_pages_back(&pc);
+					__LASSERT(list_empty(&pc.pc_pages));
+					break;
+				}
 			}
+			list_del_init(&tage->linkage);
+			list_add_tail(&tage->page->lru, &for_daemon_pages);
+			for_daemon_pages_count += 1;
+
+			tage->page->private = (int)tage->used;
+			kfree(tage);
+			atomic_dec(&cfs_tage_allocated);
 		}
 
-		filp_close(filp, NULL);
-		put_pages_on_daemon_list(&pc);
+		if (filp)
+			filp_close(filp, NULL);
+
+		down_write(&cfs_tracefile_sem);
+		list_splice_tail(&for_daemon_pages, &daemon_pages);
+		daemon_pages_count += for_daemon_pages_count;
+		while (daemon_pages_count > daemon_pages_max) {
+			struct page *p = list_first_entry(&daemon_pages,
+							  struct page, lru);
+			list_del(&p->lru);
+			put_page(p);
+			daemon_pages_count -= 1;
+		}
+		up_write(&cfs_tracefile_sem);
+
 		if (!list_empty(&pc.pc_pages)) {
 			int i;
 
@@ -1233,14 +1242,13 @@ int cfs_tracefile_init(int max_pages)
 		tcd->tcd_cpu = j;
 		INIT_LIST_HEAD(&tcd->tcd_pages);
 		INIT_LIST_HEAD(&tcd->tcd_stock_pages);
-		INIT_LIST_HEAD(&tcd->tcd_daemon_pages);
 		tcd->tcd_cur_pages = 0;
 		tcd->tcd_cur_stock_pages = 0;
-		tcd->tcd_cur_daemon_pages = 0;
 		tcd->tcd_max_pages = (max_pages * factor) / 100;
 		LASSERT(tcd->tcd_max_pages > 0);
 		tcd->tcd_shutting_down = 0;
 	}
+	daemon_pages_max = max_pages;
 
 	return 0;
 
@@ -1299,5 +1307,6 @@ static void cfs_trace_cleanup(void)
 void cfs_tracefile_exit(void)
 {
 	cfs_trace_stop_thread();
+	cfs_trace_flush_pages();
 	cfs_trace_cleanup();
 }
diff --git a/net/lnet/libcfs/tracefile.h b/net/lnet/libcfs/tracefile.h
index af21e4a..6293a9c 100644
--- a/net/lnet/libcfs/tracefile.h
+++ b/net/lnet/libcfs/tracefile.h
@@ -92,22 +92,7 @@ int cfs_trace_copyout_string(char __user *usr_buffer, int usr_buffer_nob,
 	unsigned long tcd_cur_pages;
 
 	/*
-	 * pages with trace records already processed by
-	 * tracefiled. These pages are kept in memory, so that some
-	 * portion of log can be written in the event of LBUG. This
-	 * list is maintained in LRU order.
-	 *
-	 * Pages are moved to ->tcd_daemon_pages by tracefiled()
-	 * (put_pages_on_daemon_list()). LRU pages from this list are
-	 * discarded when list grows too large.
-	 */
-	struct list_head tcd_daemon_pages;
-	/* number of pages on ->tcd_daemon_pages */
-	unsigned long tcd_cur_daemon_pages;
-
-	/*
-	 * Maximal number of pages allowed on ->tcd_pages and
-	 * ->tcd_daemon_pages each.
+	 * Maximal number of pages allowed on ->tcd_pages
 	 * Always TCD_MAX_PAGES * tcd_pages_factor / 100 in current
 	 * implementation.
 	 */