From patchwork Mon Aug 2 19:50:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414657 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3782EC4338F for ; Mon, 2 Aug 2021 19:51:00 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A3F2560724 for ; Mon, 2 Aug 2021 19:50:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A3F2560724 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EBAC7352BBD; Mon, 2 Aug 2021 12:50:58 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E0FAA35286A for ; Mon, 2 Aug 2021 12:50:54 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 4DA491007A8D; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 458E1C2F4C; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:21 
-0400 Message-Id: <1627933851-7603-2-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 01/25] lustre: llite: avoid stale data reading X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong remove_mapping() can fail to remove a page from the page cache when the page refcount != 2, so clear the uptodate flag in vvp_page_delete() to avoid reading stale data later. WC-bug-id: https://jira.whamcloud.com/browse/LU-14541 Lustre-commit: f2a16793fa4316fc9cc ("LU-14541 llite: avoid stale data reading") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/43476 Reviewed-by: Patrick Farrell Reviewed-by: Andreas Dilger Reviewed-by: Li Dongyang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/vvp_page.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c index 86353df..2ecd414 100644 --- a/fs/lustre/llite/vvp_page.c +++ b/fs/lustre/llite/vvp_page.c @@ -172,6 +172,12 @@ static void vvp_page_delete(const struct lu_env *env, ClearPagePrivate(vmpage); vmpage->private = 0; + + /** + * Vmpage might not be released due page refcount != 2, + * clear Page uptodate here to avoid stale data. + */ + ClearPageUptodate(vmpage); /* * Reference from vmpage to cl_page is removed, but the reference back * is still here. It is removed later in vvp_page_fini(). 
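Editorial note on patch 01/25: the reasoning behind the ClearPageUptodate() call can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel or Lustre code: struct demo_page and the demo_* functions are hypothetical names invented for illustration, and the "refcount == 2" test merely mimics the condition named in the commit message.

```c
/* Userspace toy model; all demo_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct demo_page {
	bool uptodate;	/* cached contents are known to be valid */
	int refcount;	/* references currently held on the page */
	char data[16];	/* cached contents */
};

/* Models the removal attempt: it only succeeds when refcount == 2. */
bool demo_try_remove(const struct demo_page *pg)
{
	assert(pg != NULL);
	return pg->refcount == 2;
}

/* Models the delete path after the patch: always drop the uptodate flag. */
void demo_page_delete(struct demo_page *pg)
{
	assert(pg != NULL);
	pg->uptodate = false;
}

/* A reader trusts the cache only while the page is marked uptodate. */
const char *demo_page_read(struct demo_page *pg, const char *backing)
{
	assert(pg != NULL && backing != NULL);
	if (!pg->uptodate) {
		/* Refetch from the (hypothetical) backing store. */
		strncpy(pg->data, backing, sizeof(pg->data) - 1);
		pg->data[sizeof(pg->data) - 1] = '\0';
		pg->uptodate = true;
	}
	return pg->data;
}
```

In this model a page with an extra reference survives the removal attempt, so without clearing the flag in the delete step a later read would keep returning the old contents.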
From patchwork Mon Aug 2 19:50:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414703 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86FCFC4338F for ; Mon, 2 Aug 2021 19:54:20 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4B88260F36 for ; Mon, 2 Aug 2021 19:54:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 4B88260F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B3DBD34FCF1; Mon, 2 Aug 2021 12:54:18 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4A36635286A for ; Mon, 2 Aug 2021 12:50:55 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 516861007A8E; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 48919C2F50; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:22 
-0400 Message-Id: <1627933851-7603-3-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 02/25] lustre: llite: No locked parallel DIO X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell If we are doing locked DIO, the OSC & LDLM locks are released at the end of cl_io_loop, i.e., before we wait for parallel DIO at the llite layer. This is problematic because the locks are released before the i/o done under them is complete; this can lead to data inconsistencies (and at least one LBUG, see LU-14805). The easiest solution for now is to do parallel DIO only when working lockless (which is the default; DIO only switches to locked to manage conflicts with buffered i/o). This problem & fix apply to AIO as well as parallel DIO. WC-bug-id: https://jira.whamcloud.com/browse/LU-14805 Lustre-commit: 0f8db7e06abbc341 ("LU-14805 llite: No locked parallel DIO") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/44131 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/rw26.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c index ba9c070..0d72c3e 100644 --- a/fs/lustre/llite/rw26.c +++ b/fs/lustre/llite/rw26.c @@ -410,10 +410,19 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter) else vio->u.readwrite.vui_read += tot_bytes; - /* If async dio submission is not allowed, we must wait here. 
*/ - if (is_sync_kiocb(iocb) && !io->ci_parallel_dio) { + /* We cannot do async submission - for AIO or regular DIO - unless + * lockless because it causes us to release the lock early. + * + * There are also several circumstances in which we must disable + * parallel DIO, so we check if it is enabled. + * + * The check for "is_sync_kiocb" excludes AIO, which does not need to + * be disabled in these situations. + */ + if (io->ci_dio_lock || (is_sync_kiocb(iocb) && !io->ci_parallel_dio)) { ssize_t rc2; + /* Wait here rather than doing async submission */ rc2 = cl_sync_io_wait_recycle(env, &aio->cda_sync, 0, 0); if (result == 0 && rc2) result = rc2; From patchwork Mon Aug 2 19:50:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414705 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D7BBC4338F for ; Mon, 2 Aug 2021 19:54:22 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0267960FC2 for ; Mon, 2 Aug 2021 19:54:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 0267960FC2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by 
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2F975351FD1; Mon, 2 Aug 2021 12:54:19 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8496635286A for ; Mon, 2 Aug 2021 12:50:55 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 53C4F1007A8F; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4B3B5C2F53; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:23 -0400 Message-Id: <1627933851-7603-4-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 03/25] lnet: discard lnet_current_net_count X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown The variable lnet_current_net_count is never used, so remove it. The function lnet_get_net_count() is only used to update that variable, so remove it too. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: a39f07804153f4f4 ("LU-6142 lnet: discard lnet_current_net_count") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/44089 Reviewed-by: James Simmons Reviewed-by: Serguei Smirnov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 1 - net/lnet/lnet/api-ni.c | 22 ---------------------- 2 files changed, 23 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index f56ecab..3677a12 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -804,7 +804,6 @@ bool lnet_net_unique(u32 net_id, struct list_head *nilist, bool lnet_ni_unique_net(struct list_head *nilist, char *iface); void lnet_incr_dlc_seq(void); u32 lnet_get_dlc_seq_locked(void); -int lnet_get_net_count(void); struct lnet_peer_net *lnet_get_next_peer_net_locked(struct lnet_peer *lp, u32 prev_lpn_id); diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index dc9020d..ec28139 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -196,8 +196,6 @@ static void lnet_set_lnd_timeout(void) (lnet_retry_count + 1); } -unsigned int lnet_current_net_count; - /* * This sequence number keeps track of how many times DLC was used to * update the local NIs. 
It is incremented when a NI is added or @@ -1671,23 +1669,6 @@ struct lnet_ping_buffer * return count; } -int -lnet_get_net_count(void) -{ - struct lnet_net *net; - int count = 0; - - lnet_net_lock(0); - - list_for_each_entry(net, &the_lnet.ln_nets, net_list) { - count++; - } - - lnet_net_unlock(0); - - return count; -} - void lnet_swap_pinginfo(struct lnet_ping_buffer *pbuf) { @@ -2516,9 +2497,6 @@ static void lnet_push_target_fini(void) lnet_net_unlock(LNET_LOCK_EX); } - /* update net count */ - lnet_current_net_count = lnet_get_net_count(); - return ni_count; failed1: From patchwork Mon Aug 2 19:50:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414707 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50F35C4338F for ; Mon, 2 Aug 2021 19:54:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1152161037 for ; Mon, 2 Aug 2021 19:54:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 1152161037 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7EB4E351B45; Mon, 2 Aug 
2021 12:54:22 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id BBD5D35286A for ; Mon, 2 Aug 2021 12:50:55 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 5588C1007AA4; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4F51AC2F55; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:24 -0400 Message-Id: <1627933851-7603-5-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 04/25] lnet: convert kiblnd/ksocknal_thread_start to vararg X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown Rather than requiring the caller to format a thread name into a temp buffer, change these thread_start functions to accept a format and args, and to hand them directly to kthread_run(). This is done with a macro rather than a function as the functions are trivial and varargs is slightly easier with macros. 
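Editorial note: the macro pattern described in this commit message can be sketched in userspace as follows. This is an illustration under stated assumptions rather than the patch itself: demo_spawn() is a hypothetical stand-in for kthread_run() that only formats the name and runs the function inline, demo_nthreads stands in for the per-LND thread counters, and the code relies on GNU C extensions (statement expressions and named variadic macro arguments), as the kernel does.

```c
/* Userspace sketch; all demo_* names are hypothetical. Requires GNU C. */
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static int demo_nthreads;	/* stand-in for kib_nthreads / ksnd_nthreads */
static char demo_last_name[32];	/* name given to the last started "thread" */

/* Stand-in for kthread_run(): formats the name, then runs fn inline. */
static int demo_spawn(int (*fn)(void *), void *data, const char *namefmt, ...)
{
	va_list ap;
	int n;

	assert(fn != NULL);
	va_start(ap, namefmt);
	n = vsnprintf(demo_last_name, sizeof(demo_last_name), namefmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= sizeof(demo_last_name))
		return -1;	/* formatting failed or the name was truncated */
	return fn(data);
}

/* Format and arguments are forwarded as-is; callers need no temp buffer. */
#define demo_thread_start(fn, data, namefmt, arg...)			\
	({								\
		int rc_ = demo_spawn(fn, data, namefmt, ##arg);		\
		if (!rc_)						\
			demo_nthreads++;				\
		rc_;							\
	})

static int demo_worker(void *arg)
{
	(void)arg;
	return 0;
}

/* Call site in the style of the converted connd start-up code. */
int demo_start_connd(int i)
{
	return demo_thread_start(demo_worker, NULL, "socknal_cd%02d", i);
}

/* Helper for checking which name the last call produced. */
int demo_last_name_is(const char *expected)
{
	return strcmp(demo_last_name, expected) == 0;
}
```

The statement expression lets the macro both update the counter and yield a return code, which is why a plain function-like wrapper is not needed at the call sites.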
WC-bug-id: https://jira.whamcloud.com/browse/LU-6142 Lustre-commit: 9976d2c35d40a170 ("LU-6142 lnet: convert kiblnd/ksocknal_thread_start to vararg") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/44122 Reviewed-by: James Simmons Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 10 ++++------ net/lnet/klnds/o2iblnd/o2iblnd.h | 10 +++++++++- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 12 ------------ net/lnet/klnds/socklnd/socklnd.c | 16 ++++++---------- net/lnet/klnds/socklnd/socklnd.h | 10 +++++++++- net/lnet/klnds/socklnd/socklnd_cb.c | 17 ++--------------- 6 files changed, 30 insertions(+), 45 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index b519a31..3141953 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -2712,13 +2712,11 @@ static int kiblnd_start_schedulers(struct kib_sched_info *sched) } for (i = 0; i < nthrs; i++) { - long id; - char name[20]; + long id = KIB_THREAD_ID(sched->ibs_cpt, sched->ibs_nthreads + i); - id = KIB_THREAD_ID(sched->ibs_cpt, sched->ibs_nthreads + i); - snprintf(name, sizeof(name), "kiblnd_sd_%02ld_%02ld", - KIB_THREAD_CPT(id), KIB_THREAD_TID(id)); - rc = kiblnd_thread_start(kiblnd_scheduler, (void *)id, name); + rc = kiblnd_thread_start(kiblnd_scheduler, (void *)id, + "kiblnd_sd_%02ld_%02ld", + KIB_THREAD_CPT(id), KIB_THREAD_TID(id)); if (!rc) continue; diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h index 8d1d7eb..3691bfe 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.h +++ b/net/lnet/klnds/o2iblnd/o2iblnd.h @@ -907,7 +907,15 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx, int kiblnd_connd(void *arg); int kiblnd_scheduler(void *arg); -int kiblnd_thread_start(int (*fn)(void *arg), void *arg, char *name); +#define kiblnd_thread_start(fn, data, namefmt, arg...) 
\ + ({ \ + struct task_struct *__task = kthread_run(fn, data, \ + namefmt, ##arg);\ + if (!IS_ERR(__task)) \ + atomic_inc(&kiblnd_data.kib_nthreads); \ + PTR_ERR_OR_ZERO(__task); \ + }) + int kiblnd_failover_thread(void *arg); int kiblnd_alloc_pages(struct kib_pages **pp, int cpt, int npages); diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 32ccac2..193e75b 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -1830,18 +1830,6 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, return rc; } -int -kiblnd_thread_start(int (*fn)(void *arg), void *arg, char *name) -{ - struct task_struct *task = kthread_run(fn, arg, "%s", name); - - if (IS_ERR(task)) - return PTR_ERR(task); - - atomic_inc(&kiblnd_data.kib_nthreads); - return 0; -} - static void kiblnd_thread_fini(void) { diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c index e15f1c0..cbbbb0c 100644 --- a/net/lnet/klnds/socklnd/socklnd.c +++ b/net/lnet/klnds/socklnd/socklnd.c @@ -2066,15 +2066,13 @@ static int ksocknal_device_event(struct notifier_block *unused, } for (i = 0; i < *ksocknal_tunables.ksnd_nconnds; i++) { - char name[16]; - spin_lock_bh(&ksocknal_data.ksnd_connd_lock); ksocknal_data.ksnd_connd_starting++; spin_unlock_bh(&ksocknal_data.ksnd_connd_lock); - snprintf(name, sizeof(name), "socknal_cd%02d", i); rc = ksocknal_thread_start(ksocknal_connd, - (void *)((uintptr_t)i), name); + (void *)((uintptr_t)i), + "socknal_cd%02d", i); if (rc) { spin_lock_bh(&ksocknal_data.ksnd_connd_lock); ksocknal_data.ksnd_connd_starting--; @@ -2241,14 +2239,12 @@ static int ksocknal_device_event(struct notifier_block *unused, for (i = 0; i < nthrs; i++) { long id; - char name[20]; id = KSOCK_THREAD_ID(sched->kss_cpt, sched->kss_nthreads + i); - snprintf(name, sizeof(name), "socknal_sd%02d_%02d", - sched->kss_cpt, (int)KSOCK_THREAD_SID(id)); - - rc = ksocknal_thread_start(ksocknal_scheduler, - 
(void *)id, name); + rc = ksocknal_thread_start(ksocknal_scheduler, (void *)id, + "socknal_sd%02d_%02d", + sched->kss_cpt, + (int)KSOCK_THREAD_SID(id)); if (!rc) continue; diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h index 357769a..45103a3 100644 --- a/net/lnet/klnds/socklnd/socklnd.h +++ b/net/lnet/klnds/socklnd/socklnd.h @@ -650,7 +650,15 @@ int ksocknal_launch_packet(struct lnet_ni *ni, struct ksock_tx *tx, void ksocknal_queue_tx_locked(struct ksock_tx *tx, struct ksock_conn *conn); void ksocknal_txlist_done(struct lnet_ni *ni, struct list_head *txlist, int error); void ksocknal_query(struct lnet_ni *ni, lnet_nid_t nid, time64_t *when); -int ksocknal_thread_start(int (*fn)(void *arg), void *arg, char *name); +#define ksocknal_thread_start(fn, data, namefmt, arg...) \ + ({ \ + struct task_struct *__task = kthread_run(fn, data, \ + namefmt, ##arg);\ + if (!IS_ERR(__task)) \ + atomic_inc(&ksocknal_data.ksnd_nthreads); \ + PTR_ERR_OR_ZERO(__task); \ + }) + void ksocknal_thread_fini(void); void ksocknal_launch_all_connections_locked(struct ksock_peer_ni *peer_ni); struct ksock_conn_cb * diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c index bfb98f5..efec479 100644 --- a/net/lnet/klnds/socklnd/socklnd_cb.c +++ b/net/lnet/klnds/socklnd/socklnd_cb.c @@ -966,18 +966,6 @@ struct ksock_conn_cb * return -EIO; } -int -ksocknal_thread_start(int (*fn)(void *arg), void *arg, char *name) -{ - struct task_struct *task = kthread_run(fn, arg, "%s", name); - - if (IS_ERR(task)) - return PTR_ERR(task); - - atomic_inc(&ksocknal_data.ksnd_nthreads); - return 0; -} - void ksocknal_thread_fini(void) { @@ -1951,7 +1939,6 @@ void ksocknal_write_callback(struct ksock_conn *conn) static int ksocknal_connd_check_start(time64_t sec, long *timeout) { - char name[16]; int rc; int total = ksocknal_data.ksnd_connd_starting + ksocknal_data.ksnd_connd_running; @@ -1991,8 +1978,8 @@ void ksocknal_write_callback(struct 
ksock_conn *conn) spin_unlock_bh(&ksocknal_data.ksnd_connd_lock); /* NB: total is the next id */ - snprintf(name, sizeof(name), "socknal_cd%02d", total); - rc = ksocknal_thread_start(ksocknal_connd, NULL, name); + rc = ksocknal_thread_start(ksocknal_connd, NULL, + "socknal_cd%02d", total); spin_lock_bh(&ksocknal_data.ksnd_connd_lock); if (!rc) From patchwork Mon Aug 2 19:50:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414711 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 558B9C4338F for ; Mon, 2 Aug 2021 19:54:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A4B7260F36 for ; Mon, 2 Aug 2021 19:54:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A4B7260F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CA473352F18; Mon, 2 Aug 2021 12:54:25 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 13D5335286A for ; Mon, 2 Aug 2021 12:50:56 -0700 (PDT) Received: from 
star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 5A74F1007AA6; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 528E0C2F56; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:25 -0400 Message-Id: <1627933851-7603-6-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 05/25] lnet: print device status in net show command X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Cyril Bordage , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Cyril Bordage A device can be in a fatal state if the cable was disconnected or the port was brought down on the switch side. In these cases, the LND (o2iblnd for now) will flag the device as being in a fatal state. That device will not be used any further. However, its health will not be decremented. This causes some confusion when examining the state of the node. It is better to print the device status in the output of the lnetctl net show command. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-14114 Lustre-commit: f75ff33d9fbefd69 ("LU-14114 lnet: print device status in net show command") Signed-off-by: Cyril Bordage Reviewed-on: https://review.whamcloud.com/44169 Reviewed-by: Amir Shehata Reviewed-by: Chris Horn Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lnet/lnet-dlc.h | 1 + net/lnet/lnet/api-ni.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h index c1c063f..ef60224 100644 --- a/include/uapi/linux/lnet/lnet-dlc.h +++ b/include/uapi/linux/lnet/lnet-dlc.h @@ -190,6 +190,7 @@ struct lnet_ioctl_local_ni_hstats { __u32 hlni_local_no_route; __u32 hlni_local_timeout; __u32 hlni_local_error; + __s32 hlni_fatal_error; __s32 hlni_health_value; __u32 hlni_ping_count; __u64 hlni_next_ping; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index ec28139..4513d8d 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3692,6 +3692,8 @@ u32 lnet_get_dlc_seq_locked(void) atomic_read(&ni->ni_hstats.hlt_local_timeout); stats->hlni_local_error = atomic_read(&ni->ni_hstats.hlt_local_error); + stats->hlni_fatal_error = + atomic_read(&ni->ni_fatal_error_on); stats->hlni_health_value = atomic_read(&ni->ni_healthv); stats->hlni_ping_count = ni->ni_ping_count; From patchwork Mon Aug 2 19:50:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414709 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by 
smtp.lore.kernel.org (Postfix) with ESMTP id E7275C4320A for ; Mon, 2 Aug 2021 19:54:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id ADA0B60FC2 for ; Mon, 2 Aug 2021 19:54:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org ADA0B60FC2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0A7A5352DF4; Mon, 2 Aug 2021 12:54:23 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4B6D635286A for ; Mon, 2 Aug 2021 12:50:56 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 5C93D1007AA7; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 557FCC2F57; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:26 -0400 Message-Id: <1627933851-7603-7-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 06/25] lustre: lmv: getattr_name("..") under striped directory X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao For getattr_name(".."), it should return the FID of the master object for striped directories. This includes changes on both client and server: * lmv_getattr_name() should use the master object FID if it's looking up "..". * mdt_raw_lookup() should check whether the parent object is a sub-stripe; if so, it needs to look up again to get the master object FID. For an old client without the above change this needs to be checked twice. This is needed by NFS export, because ll_get_parent() finds the parent by getattr_name(".."). Reenable check_fhandle_syscall and update sanityn test_102. WC-bug-id: https://jira.whamcloud.com/browse/LU-14826 Lustre-commit: cbc62b0b829afdce ("LU-14826 mdt: getattr_name("..") under striped directory") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/44168 Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Signed-off-by: James Simmons --- fs/lustre/lmv/lmv_obd.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 2f84028..1d9b830 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1945,7 +1945,11 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data, int rc; retry: - tgt = lmv_locate_tgt(lmv, op_data); + if (op_data->op_namelen == 2 && + op_data->op_name[0] == '.' 
&& op_data->op_name[1] == '.') + tgt = lmv_fid2tgt(lmv, &op_data->op_fid1); + else + tgt = lmv_locate_tgt(lmv, op_data); if (IS_ERR(tgt)) return PTR_ERR(tgt); From patchwork Mon Aug 2 19:50:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414661 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBBD2C4338F for ; Mon, 2 Aug 2021 19:51:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8F75E60F36 for ; Mon, 2 Aug 2021 19:51:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 8F75E60F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 31C02352D70; Mon, 2 Aug 2021 12:51:04 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8300E35286A for ; Mon, 2 Aug 2021 12:50:56 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 5F8BE1007AB9; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from 
userid 2004) id 59E16C2F46; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:27 -0400 Message-Id: <1627933851-7603-8-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 07/25] lustre: llite: revert 'simplify callback handling for async getattr' X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger This reverts commit 248f68f27de7d18c58a44114a46259141ca53115. This is causing process hangs and timeouts during file removal. Fixes: 248f68f27d ("lustre: llite: simplify callback handling for async getattr") WC-bug-id: https://jira.whamcloud.com/browse/LU-14868 Lustre-commit: e90794af4bfac3a5 ("LU-14868 llite: revert 'simplify callback handling for async getattr'") Reviewed-on: https://review.whamcloud.com/44371 Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 32 ++-- fs/lustre/include/obd_class.h | 4 +- fs/lustre/llite/llite_internal.h | 7 +- fs/lustre/llite/statahead.c | 319 ++++++++++++++++++++++++++------------- fs/lustre/lmv/lmv_obd.c | 6 +- fs/lustre/mdc/mdc_internal.h | 3 +- fs/lustre/mdc/mdc_locks.c | 31 ++-- 7 files changed, 252 insertions(+), 150 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index eeb6262..f619342 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -818,24 +818,18 @@ struct md_callback { void *data, int flag); }; -enum md_opcode { - MD_OP_NONE = 0, - MD_OP_GETATTR = 1, - MD_OP_MAX, -}; - -struct 
md_op_item { - enum md_opcode mop_opc; - struct md_op_data mop_data; - struct lookup_intent mop_it; - struct lustre_handle mop_lockh; - struct ldlm_enqueue_info mop_einfo; - int (*mop_cb)(struct req_capsule *pill, - struct md_op_item *item, - int rc); - void *mop_cbdata; - struct inode *mop_dir; - u64 mop_lock_flags; +struct md_enqueue_info; +/* metadata stat-ahead */ + +struct md_enqueue_info { + struct md_op_data mi_data; + struct lookup_intent mi_it; + struct lustre_handle mi_lockh; + struct inode *mi_dir; + struct ldlm_enqueue_info mi_einfo; + int (*mi_cb)(struct ptlrpc_request *req, + struct md_enqueue_info *minfo, int rc); + void *mi_cbdata; }; struct obd_ops { @@ -1067,7 +1061,7 @@ struct md_ops { struct lu_fid *fid); int (*intent_getattr_async)(struct obd_export *exp, - struct md_op_item *item); + struct md_enqueue_info *minfo); int (*revalidate_lock)(struct obd_export *, struct lookup_intent *, struct lu_fid *, u64 *bits); diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index ad9b2fc..f2a3d2b 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -1594,7 +1594,7 @@ static inline int md_init_ea_size(struct obd_export *exp, u32 easize, } static inline int md_intent_getattr_async(struct obd_export *exp, - struct md_op_item *item) + struct md_enqueue_info *minfo) { int rc; @@ -1605,7 +1605,7 @@ static inline int md_intent_getattr_async(struct obd_export *exp, lprocfs_counter_incr(exp->exp_obd->obd_md_stats, LPROC_MD_INTENT_GETATTR_ASYNC); - return MDP(exp->exp_obd, intent_getattr_async)(exp, item); + return MDP(exp->exp_obd, intent_getattr_async)(exp, minfo); } static inline int md_revalidate_lock(struct obd_export *exp, diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 6cae741..2247806 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -1480,12 +1480,17 @@ struct ll_statahead_info { * is not a hidden one */ unsigned int 
sai_skip_hidden;/* skipped hidden dentry count */ - unsigned int sai_ls_all:1; /* "ls -al", do stat-ahead for + unsigned int sai_ls_all:1, /* "ls -al", do stat-ahead for * hidden entries */ + sai_in_readpage:1;/* statahead in readdir() */ wait_queue_head_t sai_waitq; /* stat-ahead wait queue */ struct task_struct *sai_task; /* stat-ahead thread */ struct task_struct *sai_agl_task; /* AGL thread */ + struct list_head sai_interim_entries; /* entries which got async + * stat reply, but not + * instantiated + */ struct list_head sai_entries; /* completed entries */ struct list_head sai_agls; /* AGLs to be sent */ struct list_head sai_cache[LL_SA_CACHE_SIZE]; diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index becd0e1..8930f61 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -32,6 +32,7 @@ #include #include +#include #include #include #include @@ -55,12 +56,13 @@ enum se_stat { /* * sa_entry is not refcounted: statahead thread allocates it and do async stat, - * and in async stat callback ll_statahead_interpret() will prepare the inode - * and set lock data in the ptlrpcd context. Then the scanner process will be - * woken up if this entry is the waiting one, can access and free it. + * and in async stat callback ll_statahead_interpret() will add it into + * sai_interim_entries, later statahead thread will call sa_handle_callback() to + * instantiate entry and move it into sai_entries, and then only scanner process + * can access and free it. 
*/ struct sa_entry { - /* link into sai_entries */ + /* link into sai_interim_entries or sai_entries */ struct list_head se_list; /* link into sai hash table locally */ struct list_head se_hash; @@ -72,6 +74,10 @@ struct sa_entry { enum se_stat se_state; /* entry size, contains name */ int se_size; + /* pointer to async getattr enqueue info */ + struct md_enqueue_info *se_minfo; + /* pointer to the async getattr request */ + struct ptlrpc_request *se_req; /* pointer to the target inode */ struct inode *se_inode; /* entry name */ @@ -131,6 +137,12 @@ static inline int sa_sent_full(struct ll_statahead_info *sai) return atomic_read(&sai->sai_cache_count) >= sai->sai_max; } +/* got async stat replies */ +static inline int sa_has_callback(struct ll_statahead_info *sai) +{ + return !list_empty(&sai->sai_interim_entries); +} + static inline int agl_list_empty(struct ll_statahead_info *sai) { return list_empty(&sai->sai_agls); @@ -316,55 +328,55 @@ static void sa_free(struct ll_statahead_info *sai, struct sa_entry *entry) } /* finish async stat RPC arguments */ -static void sa_fini_data(struct md_op_item *item) +static void sa_fini_data(struct md_enqueue_info *minfo) { - ll_unlock_md_op_lsm(&item->mop_data); - iput(item->mop_dir); - kfree(item); + ll_unlock_md_op_lsm(&minfo->mi_data); + iput(minfo->mi_dir); + kfree(minfo); } -static int ll_statahead_interpret(struct req_capsule *pill, - struct md_op_item *item, int rc); +static int ll_statahead_interpret(struct ptlrpc_request *req, + struct md_enqueue_info *minfo, int rc); /* * prepare arguments for async stat RPC. 
*/ -static struct md_op_item * +static struct md_enqueue_info * sa_prep_data(struct inode *dir, struct inode *child, struct sa_entry *entry) { - struct md_op_item *item; + struct md_enqueue_info *minfo; struct ldlm_enqueue_info *einfo; - struct md_op_data *op_data; + struct md_op_data *op_data; - item = kzalloc(sizeof(*item), GFP_NOFS); - if (!item) + minfo = kzalloc(sizeof(*minfo), GFP_NOFS); + if (!minfo) return ERR_PTR(-ENOMEM); - op_data = ll_prep_md_op_data(&item->mop_data, dir, child, + op_data = ll_prep_md_op_data(&minfo->mi_data, dir, child, entry->se_qstr.name, entry->se_qstr.len, 0, LUSTRE_OPC_ANY, NULL); if (IS_ERR(op_data)) { - kfree(item); - return ERR_CAST(item); + kfree(minfo); + return (struct md_enqueue_info *)op_data; } if (!child) op_data->op_fid2 = entry->se_fid; - item->mop_it.it_op = IT_GETATTR; - item->mop_dir = igrab(dir); - item->mop_cb = ll_statahead_interpret; - item->mop_cbdata = entry; - - einfo = &item->mop_einfo; - einfo->ei_type = LDLM_IBITS; - einfo->ei_mode = it_to_lock_mode(&item->mop_it); - einfo->ei_cb_bl = ll_md_blocking_ast; - einfo->ei_cb_cp = ldlm_completion_ast; - einfo->ei_cb_gl = NULL; + minfo->mi_it.it_op = IT_GETATTR; + minfo->mi_dir = igrab(dir); + minfo->mi_cb = ll_statahead_interpret; + minfo->mi_cbdata = entry; + + einfo = &minfo->mi_einfo; + einfo->ei_type = LDLM_IBITS; + einfo->ei_mode = it_to_lock_mode(&minfo->mi_it); + einfo->ei_cb_bl = ll_md_blocking_ast; + einfo->ei_cb_cp = ldlm_completion_ast; + einfo->ei_cb_gl = NULL; einfo->ei_cbdata = NULL; - return item; + return minfo; } /* @@ -375,8 +387,22 @@ static int ll_statahead_interpret(struct req_capsule *pill, sa_make_ready(struct ll_statahead_info *sai, struct sa_entry *entry, int ret) { struct ll_inode_info *lli = ll_i2info(sai->sai_dentry->d_inode); + struct md_enqueue_info *minfo = entry->se_minfo; + struct ptlrpc_request *req = entry->se_req; bool wakeup; + /* release resources used in RPC */ + if (minfo) { + entry->se_minfo = NULL; + 
ll_intent_release(&minfo->mi_it); + sa_fini_data(minfo); + } + + if (req) { + entry->se_req = NULL; + ptlrpc_req_finished(req); + } + spin_lock(&lli->lli_sa_lock); wakeup = __sa_make_ready(sai, entry, ret); spin_unlock(&lli->lli_sa_lock); @@ -433,6 +459,7 @@ static struct ll_statahead_info *ll_sai_alloc(struct dentry *dentry) sai->sai_index = 1; init_waitqueue_head(&sai->sai_waitq); + INIT_LIST_HEAD(&sai->sai_interim_entries); INIT_LIST_HEAD(&sai->sai_entries); INIT_LIST_HEAD(&sai->sai_agls); @@ -495,6 +522,7 @@ static void ll_sai_put(struct ll_statahead_info *sai) LASSERT(sai->sai_task == NULL); LASSERT(sai->sai_agl_task == NULL); LASSERT(sai->sai_sent == sai->sai_replied); + LASSERT(!sa_has_callback(sai)); list_for_each_entry_safe(entry, next, &sai->sai_entries, se_list) @@ -585,63 +613,26 @@ static void ll_agl_trigger(struct inode *inode, struct ll_statahead_info *sai) } /* - * Callback for async stat RPC, this is called in ptlrpcd context. It prepares - * the inode and set lock data directly in the ptlrpcd context. It will wake up - * the directory listing process if the dentry is the waiting one. + * prepare inode for sa entry, add it into agl list, now sa_entry is ready + * to be used by scanner process. 
*/ -static int ll_statahead_interpret(struct req_capsule *pill, - struct md_op_item *item, int rc) +static void sa_instantiate(struct ll_statahead_info *sai, + struct sa_entry *entry) { - struct lookup_intent *it = &item->mop_it; - struct inode *dir = item->mop_dir; - struct ll_inode_info *lli = ll_i2info(dir); - struct ll_statahead_info *sai = lli->lli_sai; - struct sa_entry *entry = (struct sa_entry *)item->mop_cbdata; - struct mdt_body *body; + struct inode *dir = sai->sai_dentry->d_inode; struct inode *child; - u64 handle = 0; - - if (it_disposition(it, DISP_LOOKUP_NEG)) - rc = -ENOENT; - - /* - * because statahead thread will wait for all inflight RPC to finish, - * sai should be always valid, no need to refcount - */ - LASSERT(sai); - LASSERT(entry); - - CDEBUG(D_READA, "sa_entry %.*s rc %d\n", - entry->se_qstr.len, entry->se_qstr.name, rc); - - if (rc != 0) { - ll_intent_release(it); - sa_fini_data(item); - } else { - /* - * release ibits lock ASAP to avoid deadlock when statahead - * thread enqueues lock on parent in readdir and another - * process enqueues lock on child with parent lock held, eg. - * unlink. 
- */ - handle = it->it_lock_handle; - ll_intent_drop_lock(it); - ll_unlock_md_op_lsm(&item->mop_data); - } - - if (rc != 0) { - spin_lock(&lli->lli_sa_lock); - if (__sa_make_ready(sai, entry, rc)) - wake_up(&sai->sai_waitq); - - sai->sai_replied++; - spin_unlock(&lli->lli_sa_lock); + struct md_enqueue_info *minfo; + struct lookup_intent *it; + struct ptlrpc_request *req; + struct mdt_body *body; + int rc = 0; - return rc; - } + LASSERT(entry->se_handle != 0); - entry->se_handle = handle; - body = req_capsule_server_get(pill, &RMF_MDT_BODY); + minfo = entry->se_minfo; + it = &minfo->mi_it; + req = entry->se_req; + body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY); if (!body) { rc = -EFAULT; goto out; @@ -649,7 +640,7 @@ static int ll_statahead_interpret(struct req_capsule *pill, child = entry->se_inode; /* revalidate; unlinked and re-created with the same name */ - if (unlikely(!lu_fid_eq(&item->mop_data.op_fid2, &body->mbo_fid1))) { + if (unlikely(!lu_fid_eq(&minfo->mi_data.op_fid2, &body->mbo_fid1))) { if (child) { entry->se_inode = NULL; iput(child); @@ -666,7 +657,7 @@ static int ll_statahead_interpret(struct req_capsule *pill, goto out; } - rc = ll_prep_inode(&child, pill, dir->i_sb, it); + rc = ll_prep_inode(&child, &req->rq_pill, dir->i_sb, it); if (rc) goto out; @@ -679,18 +670,107 @@ static int ll_statahead_interpret(struct req_capsule *pill, if (agl_should_run(sai, child)) ll_agl_add(sai, child, entry->se_index); + out: /* - * First it will drop ldlm ibits lock refcount by calling + * sa_make_ready() will drop ldlm ibits lock refcount by calling * ll_intent_drop_lock() in spite of failures. Do not worry about * calling ll_intent_drop_lock() more than once. 
*/ - ll_intent_release(&item->mop_it); - sa_fini_data(item); sa_make_ready(sai, entry, rc); +} + +/* once there are async stat replies, instantiate sa_entry from replies */ +static void sa_handle_callback(struct ll_statahead_info *sai) +{ + struct ll_inode_info *lli; + + lli = ll_i2info(sai->sai_dentry->d_inode); spin_lock(&lli->lli_sa_lock); + while (sa_has_callback(sai)) { + struct sa_entry *entry; + + entry = list_first_entry(&sai->sai_interim_entries, + struct sa_entry, se_list); + list_del_init(&entry->se_list); + spin_unlock(&lli->lli_sa_lock); + + sa_instantiate(sai, entry); + spin_lock(&lli->lli_sa_lock); + } + spin_unlock(&lli->lli_sa_lock); +} + +/* + * callback for async stat RPC, because this is called in ptlrpcd context, we + * only put sa_entry in sai_interim_entries, and wake up statahead thread to + * really prepare inode and instantiate sa_entry later. + */ +static int ll_statahead_interpret(struct ptlrpc_request *req, + struct md_enqueue_info *minfo, int rc) +{ + struct lookup_intent *it = &minfo->mi_it; + struct inode *dir = minfo->mi_dir; + struct ll_inode_info *lli = ll_i2info(dir); + struct ll_statahead_info *sai = lli->lli_sai; + struct sa_entry *entry = (struct sa_entry *)minfo->mi_cbdata; + u64 handle = 0; + + if (it_disposition(it, DISP_LOOKUP_NEG)) + rc = -ENOENT; + + /* + * because statahead thread will wait for all inflight RPC to finish, + * sai should be always valid, no need to refcount + */ + LASSERT(sai); + LASSERT(entry); + + CDEBUG(D_READA, "sa_entry %.*s rc %d\n", + entry->se_qstr.len, entry->se_qstr.name, rc); + + if (rc) { + ll_intent_release(it); + sa_fini_data(minfo); + } else { + /* + * release ibits lock ASAP to avoid deadlock when statahead + * thread enqueues lock on parent in readdir and another + * process enqueues lock on child with parent lock held, eg. + * unlink. 
+ */ + handle = it->it_lock_handle; + ll_intent_drop_lock(it); + ll_unlock_md_op_lsm(&minfo->mi_data); + } + + spin_lock(&lli->lli_sa_lock); + if (rc) { + if (__sa_make_ready(sai, entry, rc)) + wake_up(&sai->sai_waitq); + } else { + int first = 0; + + entry->se_minfo = minfo; + entry->se_req = ptlrpc_request_addref(req); + /* + * Release the async ibits lock ASAP to avoid deadlock + * when statahead thread tries to enqueue lock on parent + * for readpage and other tries to enqueue lock on child + * with parent's lock held, for example: unlink. + */ + entry->se_handle = handle; + if (!sa_has_callback(sai)) + first = 1; + + list_add_tail(&entry->se_list, &sai->sai_interim_entries); + + if (first && sai->sai_task) + wake_up_process(sai->sai_task); + } sai->sai_replied++; + spin_unlock(&lli->lli_sa_lock); return rc; @@ -699,16 +779,16 @@ static int ll_statahead_interpret(struct req_capsule *pill, /* async stat for file not found in dcache */ static int sa_lookup(struct inode *dir, struct sa_entry *entry) { - struct md_op_item *item; + struct md_enqueue_info *minfo; int rc; - item = sa_prep_data(dir, NULL, entry); - if (IS_ERR(item)) - return PTR_ERR(item); + minfo = sa_prep_data(dir, NULL, entry); + if (IS_ERR(minfo)) + return PTR_ERR(minfo); - rc = md_intent_getattr_async(ll_i2mdexp(dir), item); + rc = md_intent_getattr_async(ll_i2mdexp(dir), minfo); if (rc) - sa_fini_data(item); + sa_fini_data(minfo); return rc; } @@ -728,7 +808,7 @@ static int sa_revalidate(struct inode *dir, struct sa_entry *entry, .it_op = IT_GETATTR, .it_lock_handle = 0 }; - struct md_op_item *item; + struct md_enqueue_info *minfo; int rc; if (unlikely(!inode)) @@ -737,9 +817,9 @@ static int sa_revalidate(struct inode *dir, struct sa_entry *entry, if (d_mountpoint(dentry)) return 1; - item = sa_prep_data(dir, inode, entry); - if (IS_ERR(item)) - return PTR_ERR(item); + minfo = sa_prep_data(dir, inode, entry); + if (IS_ERR(minfo)) + return PTR_ERR(minfo); entry->se_inode = igrab(inode); rc = 
md_revalidate_lock(ll_i2mdexp(dir), &it, ll_inode2fid(inode), @@ -747,15 +827,15 @@ static int sa_revalidate(struct inode *dir, struct sa_entry *entry, if (rc == 1) { entry->se_handle = it.it_lock_handle; ll_intent_release(&it); - sa_fini_data(item); + sa_fini_data(minfo); return 1; } - rc = md_intent_getattr_async(ll_i2mdexp(dir), item); + rc = md_intent_getattr_async(ll_i2mdexp(dir), minfo); if (rc) { entry->se_inode = NULL; iput(inode); - sa_fini_data(item); + sa_fini_data(minfo); } return rc; @@ -815,6 +895,9 @@ static int ll_agl_thread(void *arg) while (({set_current_state(TASK_IDLE); !kthread_should_stop(); })) { spin_lock(&plli->lli_agl_lock); + /* The statahead thread maybe help to process AGL entries, + * so check whether list empty again. + */ clli = list_first_entry_or_null(&sai->sai_agls, struct ll_inode_info, lli_agl_list); @@ -852,10 +935,9 @@ static void ll_stop_agl(struct ll_statahead_info *sai) kthread_stop(agl_task); spin_lock(&plli->lli_agl_lock); - clli = list_first_entry_or_null(&sai->sai_agls, - struct ll_inode_info, - lli_agl_list); - if (clli) { + while ((clli = list_first_entry_or_null(&sai->sai_agls, + struct ll_inode_info, + lli_agl_list)) != NULL) { list_del_init(&clli->lli_agl_list); spin_unlock(&plli->lli_agl_lock); clli->lli_agl_index = 0; @@ -928,8 +1010,10 @@ static int ll_statahead_thread(void *arg) break; } + sai->sai_in_readpage = 1; page = ll_get_dir_page(dir, op_data, pos); ll_unlock_md_op_lsm(op_data); + sai->sai_in_readpage = 0; if (IS_ERR(page)) { rc = PTR_ERR(page); CDEBUG(D_READA, @@ -993,9 +1077,14 @@ static int ll_statahead_thread(void *arg) while (({set_current_state(TASK_IDLE); sai->sai_task; })) { + if (sa_has_callback(sai)) { + __set_current_state(TASK_RUNNING); + sa_handle_callback(sai); + } + spin_lock(&lli->lli_agl_lock); while (sa_sent_full(sai) && - !list_empty(&sai->sai_agls)) { + !agl_list_empty(sai)) { struct ll_inode_info *clli; __set_current_state(TASK_RUNNING); @@ -1047,11 +1136,16 @@ static int 
ll_statahead_thread(void *arg) /* * statahead is finished, but statahead entries need to be cached, wait - * for file release closedir() call to stop me. + * for file release to stop me. */ while (({set_current_state(TASK_IDLE); sai->sai_task; })) { - schedule(); + if (sa_has_callback(sai)) { + __set_current_state(TASK_RUNNING); + sa_handle_callback(sai); + } else { + schedule(); + } } __set_current_state(TASK_RUNNING); out: @@ -1061,9 +1155,13 @@ static int ll_statahead_thread(void *arg) * wait for inflight statahead RPCs to finish, and then we can free sai * safely because statahead RPC will access sai data */ - while (sai->sai_sent != sai->sai_replied) + while (sai->sai_sent != sai->sai_replied) { /* in case we're not woken up, timeout wait */ msleep(125); + } + + /* release resources held by statahead RPCs */ + sa_handle_callback(sai); CDEBUG(D_READA, "statahead thread stopped: sai %p, parent %pd\n", sai, parent); @@ -1325,6 +1423,10 @@ static int revalidate_statahead_dentry(struct inode *dir, goto out_unplug; } + /* if statahead is busy in readdir, help it do post-work */ + if (!sa_ready(entry) && sai->sai_in_readpage) + sa_handle_callback(sai); + if (!sa_ready(entry)) { spin_lock(&lli->lli_sa_lock); sai->sai_index_wait = entry->se_index; @@ -1497,7 +1599,6 @@ static int start_statahead_thread(struct inode *dir, struct dentry *dentry, sai->sai_task = task; wake_up_process(task); - /* * We don't stat-ahead for the first dirent since we are already in * lookup. 
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 1d9b830..71bf7811 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -3438,9 +3438,9 @@ static int lmv_clear_open_replay_data(struct obd_export *exp, } static int lmv_intent_getattr_async(struct obd_export *exp, - struct md_op_item *item) + struct md_enqueue_info *minfo) { - struct md_op_data *op_data = &item->mop_data; + struct md_op_data *op_data = &minfo->mi_data; struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; struct lmv_tgt_desc *ptgt = NULL; @@ -3464,7 +3464,7 @@ static int lmv_intent_getattr_async(struct obd_export *exp, if (ctgt != ptgt) return -EREMOTE; - return md_intent_getattr_async(ptgt->ltd_exp, item); + return md_intent_getattr_async(ptgt->ltd_exp, minfo); } static int lmv_revalidate_lock(struct obd_export *exp, struct lookup_intent *it, diff --git a/fs/lustre/mdc/mdc_internal.h b/fs/lustre/mdc/mdc_internal.h index 2416607..fab40bd 100644 --- a/fs/lustre/mdc/mdc_internal.h +++ b/fs/lustre/mdc/mdc_internal.h @@ -130,7 +130,8 @@ int mdc_cancel_unused(struct obd_export *exp, const struct lu_fid *fid, int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it, struct lu_fid *fid, u64 *bits); -int mdc_intent_getattr_async(struct obd_export *exp, struct md_op_item *item); +int mdc_intent_getattr_async(struct obd_export *exp, + struct md_enqueue_info *minfo); enum ldlm_mode mdc_lock_match(struct obd_export *exp, u64 flags, const struct lu_fid *fid, enum ldlm_type type, diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c index a0fcab0..4135c3a 100644 --- a/fs/lustre/mdc/mdc_locks.c +++ b/fs/lustre/mdc/mdc_locks.c @@ -49,7 +49,7 @@ struct mdc_getattr_args { struct obd_export *ga_exp; - struct md_op_item *ga_item; + struct md_enqueue_info *ga_minfo; }; int it_open_error(int phase, struct lookup_intent *it) @@ -1360,10 +1360,10 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env, { struct 
mdc_getattr_args *ga = args; struct obd_export *exp = ga->ga_exp; - struct md_op_item *item = ga->ga_item; - struct ldlm_enqueue_info *einfo = &item->mop_einfo; - struct lookup_intent *it = &item->mop_it; - struct lustre_handle *lockh = &item->mop_lockh; + struct md_enqueue_info *minfo = ga->ga_minfo; + struct ldlm_enqueue_info *einfo = &minfo->mi_einfo; + struct lookup_intent *it = &minfo->mi_it; + struct lustre_handle *lockh = &minfo->mi_lockh; struct ldlm_reply *lockrep; u64 flags = LDLM_FL_HAS_INTENT; @@ -1388,17 +1388,18 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env, if (rc) goto out; - rc = mdc_finish_intent_lock(exp, req, &item->mop_data, it, lockh); + rc = mdc_finish_intent_lock(exp, req, &minfo->mi_data, it, lockh); + out: - item->mop_cb(&req->rq_pill, item, rc); + minfo->mi_cb(req, minfo, rc); return 0; } int mdc_intent_getattr_async(struct obd_export *exp, - struct md_op_item *item) + struct md_enqueue_info *minfo) { - struct md_op_data *op_data = &item->mop_data; - struct lookup_intent *it = &item->mop_it; + struct md_op_data *op_data = &minfo->mi_data; + struct lookup_intent *it = &minfo->mi_it; struct ptlrpc_request *req; struct mdc_getattr_args *ga; struct ldlm_res_id res_id; @@ -1427,11 +1428,11 @@ int mdc_intent_getattr_async(struct obd_export *exp, * to avoid possible races. It is safe to have glimpse handler * for non-DOM locks and costs nothing. 
*/ - if (!item->mop_einfo.ei_cb_gl) - item->mop_einfo.ei_cb_gl = mdc_ldlm_glimpse_ast; + if (!minfo->mi_einfo.ei_cb_gl) + minfo->mi_einfo.ei_cb_gl = mdc_ldlm_glimpse_ast; - rc = ldlm_cli_enqueue(exp, &req, &item->mop_einfo, &res_id, &policy, - &flags, NULL, 0, LVB_T_NONE, &item->mop_lockh, 1); + rc = ldlm_cli_enqueue(exp, &req, &minfo->mi_einfo, &res_id, &policy, + &flags, NULL, 0, LVB_T_NONE, &minfo->mi_lockh, 1); if (rc < 0) { ptlrpc_req_finished(req); return rc; @@ -1439,7 +1440,7 @@ int mdc_intent_getattr_async(struct obd_export *exp, ga = ptlrpc_req_async_args(ga, req); ga->ga_exp = exp; - ga->ga_item = item; + ga->ga_minfo = minfo; req->rq_interpret_reply = mdc_intent_getattr_async_interpret; ptlrpcd_add_req(req); From patchwork Mon Aug 2 19:50:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414691 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2E237C4320E for ; Mon, 2 Aug 2021 19:51:54 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D021D60FC2 for ; Mon, 2 Aug 2021 19:51:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org D021D60FC2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass 
smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 33F4F352ED2; Mon, 2 Aug 2021 12:51:38 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CE5A535286A for ; Mon, 2 Aug 2021 12:50:56 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 60D341007B47; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5CE14C2F4C; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:28 -0400 Message-Id: <1627933851-7603-9-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 08/25] lnet: Protect lpni deref in lnet_health_check X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn The discovery thread can modify the peer NI/peer net/peer relationship, so we need to be careful when dereferencing the peer NI pointer in lnet_health_check(). The discovery thread operates under the net lock, so move the peer NI dereference under the net lock, which is taken for incrementing the health stats. Move some of the other code that is only relevant for messages with a health status != LNET_MSG_STATUS_OK under the appropriate condition.
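As an illustration of the locking rule described above, here is a minimal userspace sketch, not the LNet code itself. It assumes a pthread mutex as a stand-in for the net lock, and every demo_* name is hypothetical. The reader walks the peer NI -> peer net -> peer chain only while holding the same lock the writer takes, and checks each link for NULL because the writer may have detached it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the LNet peer hierarchy; not the real structs. */
struct demo_peer { int nnis; };
struct demo_peer_net { struct demo_peer *peer; };
struct demo_peer_ni { struct demo_peer_net *peer_net; };

/* Models the net lock that the discovery thread also takes (assumption). */
static pthread_mutex_t demo_net_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer side: re-parent a peer NI, as a discovery thread might. */
static void demo_reparent(struct demo_peer_ni *ni, struct demo_peer_net *net)
{
	assert(ni != NULL);

	pthread_mutex_lock(&demo_net_lock);
	ni->peer_net = net;
	pthread_mutex_unlock(&demo_net_lock);
}

/* Reader side: walk the pointer chain only while holding the lock. */
static bool demo_peer_is_single_rail(struct demo_peer_ni *ni)
{
	bool single = false;

	pthread_mutex_lock(&demo_net_lock);
	if (ni && ni->peer_net && ni->peer_net->peer &&
	    ni->peer_net->peer->nnis <= 1)
		single = true;
	pthread_mutex_unlock(&demo_net_lock);

	return single;
}
```

The point of the sketch is only the ordering: take the lock first, then dereference. Reading ni->peer_net before taking the lock would reintroduce the race against the writer.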
HPE-bug-id: LUS-9962 WC-bug-id: https://jira.whamcloud.com/browse/LU-14655 Lustre-commit: d87af24452a2e883 ("LU-14655 lnet: Protect lpni deref in lnet_health_check") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/43503 Reviewed-by: Alexander Boyko Reviewed-by: Serguei Smirnov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-msg.c | 71 ++++++++++++++++++++++++++----------------------- 1 file changed, 38 insertions(+), 33 deletions(-) diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index 580ddf6..e471848 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -821,38 +821,6 @@ attempt_remote_resend = false; } - /* Don't further decrement the health value if a recovery message - * failed. - */ - if (msg->msg_recovery) { - handle_local_health = false; - handle_remote_health = false; - } else { - handle_local_health = true; - handle_remote_health = true; - } - - /* For local failures, health/recovery/resends are not needed if I only - * have a single (non-lolnd) interface. NB: pb_nnis includes the lolnd - * interface, so a single-rail node would have pb_nnis == 2. - */ - if (the_lnet.ln_ping_target->pb_nnis <= 2) { - handle_local_health = false; - attempt_local_resend = false; - } - - /* For remote failures, health/recovery/resends are not needed if the - * peer only has a single interface. Special case for routers where we - * rely on health feature to manage route aliveness. NB: unlike pb_nnis - * above, lp_nnis does _not_ include the lolnd, so a single-rail node - * would have lp_nnis == 1. - */ - if (lpni && lpni->lpni_peer_net->lpn_peer->lp_nnis <= 1) { - attempt_remote_resend = false; - if (!lnet_isrouter(lpni)) - handle_remote_health = false; - } - if (!lo) LASSERT(ni && lpni); else @@ -865,11 +833,48 @@ lnet_health_error2str(hstatus)); /* stats are only incremented for errors so avoid wasting time - * incrementing statistics if there is no error. 
+ * incrementing statistics if there is no error. Similarly, whether to + * update health values or perform resends is only applicable for + * messages with a health status != OK. */ if (hstatus != LNET_MSG_STATUS_OK) { + /* Don't further decrement the health value if a recovery + * message failed. + */ + if (msg->msg_recovery) { + handle_local_health = false; + handle_remote_health = false; + } else { + handle_local_health = true; + handle_remote_health = true; + } + + /* For local failures, health/recovery/resends are not needed if + * I only have a single (non-lolnd) interface. NB: pb_nnis + * includes the lolnd interface, so a single-rail node would + * have pb_nnis == 2. + */ + if (the_lnet.ln_ping_target->pb_nnis <= 2) { + handle_local_health = false; + attempt_local_resend = false; + } + lnet_net_lock(0); lnet_incr_hstats(ni, lpni, hstatus); + /* For remote failures, health/recovery/resends are not needed + * if the peer only has a single interface. Special case for + * routers where we rely on health feature to manage route + * aliveness. NB: unlike pb_nnis above, lp_nnis does _not_ + * include the lolnd, so a single-rail node would have + * lp_nnis == 1. 
+ */ + if (lpni && lpni->lpni_peer_net && + lpni->lpni_peer_net->lpn_peer && + lpni->lpni_peer_net->lpn_peer->lp_nnis <= 1) { + attempt_remote_resend = false; + if (!lnet_isrouter(lpni)) + handle_remote_health = false; + } lnet_net_unlock(0); } From patchwork Mon Aug 2 19:50:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414659 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EDA56C432BE for ; Mon, 2 Aug 2021 19:51:02 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 09E2660724 for ; Mon, 2 Aug 2021 19:51:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 09E2660724 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5C60C352C1E; Mon, 2 Aug 2021 12:51:01 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2F54135286A for ; Mon, 2 Aug 2021 12:50:57 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 
61C871008040; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 60014C2F50; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:29 -0400 Message-Id: <1627933851-7603-10-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 09/25] lustre: uapi: remove MDS_SETATTR_PORTAL and service X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger Remove the MDS_SETATTR_PORTAL and the service threads listening on this portal; they have been unused since Lustre 2.1 and are no longer needed. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-13326 Lustre-commit: 7a2ef25f1f259c0a ("LU-13326 mds: remove MDS_SETATTR_PORTAL and service") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/37798 Reviewed-by: James Simmons Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_idl.h | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 2047b92..65948d8 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -93,15 +93,11 @@ #define CONNMGR_REQUEST_PORTAL 1 #define CONNMGR_REPLY_PORTAL 2 -/* #define OSC_REQUEST_PORTAL 3 */ #define OSC_REPLY_PORTAL 4 -/* #define OSC_BULK_PORTAL 5 */ #define OST_IO_PORTAL 6 #define OST_CREATE_PORTAL 7 #define OST_BULK_PORTAL 8 -/* #define MDC_REQUEST_PORTAL 9 */ #define MDC_REPLY_PORTAL 10 -/* #define MDC_BULK_PORTAL 11 */ #define MDS_REQUEST_PORTAL 12 #define MDS_IO_PORTAL 13 #define MDS_BULK_PORTAL 14 @@ -109,10 +105,7 @@ #define LDLM_CB_REPLY_PORTAL 16 #define LDLM_CANCEL_REQUEST_PORTAL 17 #define LDLM_CANCEL_REPLY_PORTAL 18 -/* #define PTLBD_REQUEST_PORTAL 19 */ -/* #define PTLBD_REPLY_PORTAL 20 */ -/* #define PTLBD_BULK_PORTAL 21 */ -#define MDS_SETATTR_PORTAL 22 +/* #define MDS_SETATTR_PORTAL 22 obsolete after 2.13 */ #define MDS_READPAGE_PORTAL 23 #define OUT_PORTAL 24 #define MGC_REPLY_PORTAL 25 From patchwork Mon Aug 2 19:50:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414665 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, 
MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 45123C4338F for ; Mon, 2 Aug 2021 19:51:11 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E4EC560F36 for ; Mon, 2 Aug 2021 19:51:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org E4EC560F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 12050352DB2; Mon, 2 Aug 2021 12:51:08 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 665F435286A for ; Mon, 2 Aug 2021 12:50:57 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 65628100804A; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 62FB6C2F53; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:30 -0400 Message-Id: <1627933851-7603-11-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 10/25] lustre: llite: Modify AIO/DIO reference counting X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing 
Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Patrick Farrell , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell For DIO pages, it's enough to have a reference on the cl_object associated with the AIO. This saves taking a reference on the cl_object for each page, which saves about 5% of the time when doing DIO/AIO. This is possible because the lifecycle of the aio struct is always greater than that of the associated pages. This patch reduces i/o time in ms/GiB by: Write: 6 ms/GiB Read: 1 ms/GiB Totals: Write: 198 ms/GiB Read: 197 ms/GiB mpirun -np 1 $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect With previous patches in series: write 5030 MiB/s read 5174 MiB/s Plus this patch: write 5183 MiB/s read 5200 MiB/s WC-bug-id: https://jira.whamcloud.com/browse/LU-13799 Lustre-commit: b3de247b76b4101 ("LU-13799 llite: Modify AIO/DIO reference counting") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/39442 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 5 +++-- fs/lustre/llite/file.c | 5 +++-- fs/lustre/obdclass/cl_io.c | 12 ++++++++---- fs/lustre/obdclass/cl_page.c | 6 ++++-- 4 files changed, 18 insertions(+), 10 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index 61a14f4..0f785e5 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -2593,8 +2593,8 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor, int ioret); int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor, long timeout, int ioret); -struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb); -void cl_aio_free(struct cl_dio_aio *aio); +struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct 
cl_object *obj); +void cl_aio_free(const struct lu_env *env, struct cl_dio_aio *aio); static inline void cl_sync_io_init(struct cl_sync_io *anchor, int nr) { @@ -2624,6 +2624,7 @@ struct cl_sync_io { struct cl_dio_aio { struct cl_sync_io cda_sync; struct cl_page_list cda_pages; + struct cl_object *cda_obj; struct kiocb *cda_iocb; ssize_t cda_bytes; unsigned int cda_no_aio_complete:1; diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index b822ca5..1bf237b 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1656,7 +1656,8 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, if (!ll_sbi_has_parallel_dio(sbi)) is_parallel_dio = false; - ci_aio = cl_aio_alloc(args->u.normal.via_iocb); + ci_aio = cl_aio_alloc(args->u.normal.via_iocb, + ll_i2info(inode)->lli_clob); if (!ci_aio) { rc = -ENOMEM; goto out; @@ -1814,7 +1815,7 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot, cl_sync_io_note(env, &io->ci_aio->cda_sync, rc == -EIOCBQUEUED ? 
0 : rc); if (!is_aio) { - cl_aio_free(io->ci_aio); + cl_aio_free(env, io->ci_aio); io->ci_aio = NULL; } } diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c index 63ce39c..b5e7744b 100644 --- a/fs/lustre/obdclass/cl_io.c +++ b/fs/lustre/obdclass/cl_io.c @@ -1131,7 +1131,7 @@ static void cl_aio_end(const struct lu_env *env, struct cl_sync_io *anchor) ret ?: aio->cda_bytes, 0); } -struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb) +struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj) { struct cl_dio_aio *aio; @@ -1147,15 +1147,19 @@ struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb) cl_page_list_init(&aio->cda_pages); aio->cda_iocb = iocb; aio->cda_no_aio_complete = 0; + cl_object_get(obj); + aio->cda_obj = obj; } return aio; } EXPORT_SYMBOL(cl_aio_alloc); -void cl_aio_free(struct cl_dio_aio *aio) +void cl_aio_free(const struct lu_env *env, struct cl_dio_aio *aio) { - if (aio) + if (aio) { + cl_object_put(env, aio->cda_obj); kmem_cache_free(cl_dio_aio_kmem, aio); + } } EXPORT_SYMBOL(cl_aio_free); @@ -1196,7 +1200,7 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor, * If anchor->csi_aio is set, we are responsible for freeing * memory here rather than when cl_sync_io_wait() completes. 
*/ - cl_aio_free(aio); + cl_aio_free(env, aio); } } EXPORT_SYMBOL(cl_sync_io_note); diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c index 1c9e91d..41bd767 100644 --- a/fs/lustre/obdclass/cl_page.c +++ b/fs/lustre/obdclass/cl_page.c @@ -147,7 +147,8 @@ static void cl_page_free(const struct lu_env *env, struct cl_page *cl_page, cl_page->cp_layer_count = 0; lu_object_ref_del_at(&obj->co_lu, &cl_page->cp_obj_ref, "cl_page", cl_page); - cl_object_put(env, obj); + if (cl_page->cp_type != CPT_TRANSIENT) + cl_object_put(env, obj); lu_ref_fini(&cl_page->cp_reference); __cl_page_free(cl_page, bufsize); } @@ -227,7 +228,8 @@ struct cl_page *cl_page_alloc(const struct lu_env *env, struct cl_object *o, BUILD_BUG_ON((1 << CP_TYPE_BITS) < CPT_NR); /* cp_type */ refcount_set(&cl_page->cp_ref, 1); cl_page->cp_obj = o; - cl_object_get(o); + if (type != CPT_TRANSIENT) + cl_object_get(o); lu_object_ref_add_at(&o->co_lu, &cl_page->cp_obj_ref, "cl_page", cl_page); cl_page->cp_vmpage = vmpage; From patchwork Mon Aug 2 19:50:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414713 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E277BC4320A for ; Mon, 2 Aug 2021 19:54:35 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A65FF60F36 
for ; Mon, 2 Aug 2021 19:54:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A65FF60F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 672BE352F40; Mon, 2 Aug 2021 12:54:26 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B1D7235286A for ; Mon, 2 Aug 2021 12:50:57 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 68BAA100804B; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6604CC2F55; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:31 -0400 Message-Id: <1627933851-7603-12-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 11/25] lustre: llite: Remove transient page counting X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Patrick Farrell , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Transient page counting is not used for anything, as already noted in the code comment on vob_transient_pages, but costs something like 4% of the time in DIO page submission. Remove it. 
mpirun -np 1 $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect This patch reduces i/o time in ms/GiB by: Write: 6 ms/GiB Read: 11 ms/GiB Totals: Write: 174 ms/GiB Read: 167 ms/GiB With previous patches in series: write 5703 MiB/s read 5756 MiB/s Plus this patch: write 5900 MiB/s read 6136 MiB/s WC-bug-id: https://jira.whamcloud.com/browse/LU-13799 Lustre-commit: 587e5aa8342980f7 ("LU-13799 llite: Remove transient page counting") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/39441 Reviewed-by: Wang Shilong Reviewed-by: Alexey Lyashkov Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/llite/vvp_internal.h | 7 ------- fs/lustre/llite/vvp_object.c | 4 +--- fs/lustre/llite/vvp_page.c | 5 ----- 3 files changed, 1 insertion(+), 15 deletions(-) diff --git a/fs/lustre/llite/vvp_internal.h b/fs/lustre/llite/vvp_internal.h index f2599be..b5e1df2 100644 --- a/fs/lustre/llite/vvp_internal.h +++ b/fs/lustre/llite/vvp_internal.h @@ -189,13 +189,6 @@ struct vvp_object { struct inode *vob_inode; /** - * Number of transient pages. This is no longer protected by i_sem, - * and needs to be atomic. This is not actually used for anything, - * and can probably be removed. - */ - atomic_t vob_transient_pages; - - /** * Number of outstanding mmaps on this file. * * \see ll_vm_open(), ll_vm_close(). 
diff --git a/fs/lustre/llite/vvp_object.c b/fs/lustre/llite/vvp_object.c index 096d996..294df88 100644 --- a/fs/lustre/llite/vvp_object.c +++ b/fs/lustre/llite/vvp_object.c @@ -63,8 +63,7 @@ static int vvp_object_print(const struct lu_env *env, void *cookie, struct inode *inode = obj->vob_inode; struct ll_inode_info *lli; - (*p)(env, cookie, "(%d %d) inode: %p ", - atomic_read(&obj->vob_transient_pages), + (*p)(env, cookie, "(%d) inode: %p ", atomic_read(&obj->vob_mmap_cnt), inode); if (inode) { lli = ll_i2info(inode); @@ -228,7 +227,6 @@ static int __vvp_object_init(const struct lu_env *env, const struct cl_object_conf *conf) { vob->vob_inode = conf->coc_inode; - atomic_set(&vob->vob_transient_pages, 0); cl_object_page_init(&vob->vob_cl, sizeof(struct vvp_page)); return 0; } diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c index 2ecd414..9e14898 100644 --- a/fs/lustre/llite/vvp_page.c +++ b/fs/lustre/llite/vvp_page.c @@ -437,10 +437,8 @@ static void vvp_transient_page_fini(const struct lu_env *env, struct pagevec *pvec) { struct vvp_page *vpg = cl2vvp_page(slice); - struct vvp_object *clobj = cl2vvp(slice->cpl_obj); vvp_page_fini_common(vpg, pvec); - atomic_dec(&clobj->vob_transient_pages); } static const struct cl_page_operations vvp_transient_page_ops = { @@ -469,11 +467,8 @@ int vvp_page_init(const struct lu_env *env, struct cl_object *obj, cl_page_slice_add(page, &vpg->vpg_cl, obj, &vvp_page_ops); } else { - struct vvp_object *clobj = cl2vvp(obj); - cl_page_slice_add(page, &vpg->vpg_cl, obj, &vvp_transient_page_ops); - atomic_inc(&clobj->vob_transient_pages); } return 0; } From patchwork Mon Aug 2 19:50:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414663 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 
tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F415BC4338F for ; Mon, 2 Aug 2021 19:51:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A72D660F36 for ; Mon, 2 Aug 2021 19:51:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A72D660F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B5814352D9E; Mon, 2 Aug 2021 12:51:06 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 131FE352B78 for ; Mon, 2 Aug 2021 12:50:58 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 6A297100804D; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 690CAC2F56; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:32 -0400 Message-Id: <1627933851-7603-13-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 12/25] lustre: lov: Improve DIO submit X-BeenThere: lustre-devel@lists.lustre.org 
X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Patrick Farrell , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Skip some unnecessary looping in page submission for the DIO case. This gives about a 2% improvement for AIO/DIO page submission. This patch reduces i/o time in ms/GiB by: Write: 2 ms/GiB Read: 2 ms/GiB Totals: Write: 172 ms/GiB Read: 165 ms/GiB mpirun -np 1 $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect With previous patches in series: write 7726 MiB/s read 5899 MiB/s Plus this patch: write 5954 MiB/s read 6217 MiB/s WC-bug-id: https://jira.whamcloud.com/browse/LU-13799 Lustre-commit: d31647c017a390c9 ("LU-13799 lov: Improve DIO submit") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/39446 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_io.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c index 9012ad6..2885943 100644 --- a/fs/lustre/lov/lov_io.c +++ b/fs/lustre/lov/lov_io.c @@ -1255,11 +1255,15 @@ static int lov_io_submit(const struct lu_env *env, struct lov_io *lio = cl2lov_io(env, ios); struct lov_io_sub *sub; struct cl_page_list *plist = &lov_env_info(env)->lti_plist; - struct cl_page *page; + struct cl_page *page = cl_page_list_first(qin); struct cl_page *tmp; + bool dio = false; int index; int rc = 0; + if (page->cp_type == CPT_TRANSIENT) + dio = true; + cl_page_list_init(plist); while (qin->pl_nr > 0) { struct cl_2queue *cl2q = &lov_env_info(env)->lti_cl2q; @@ -1281,12 +1285,17 @@ static int lov_io_submit(const struct lu_env *env, cl_page_list_move(&cl2q->c2_qin, qin, page); index = page->cp_lov_index; - 
cl_page_list_for_each_safe(page, tmp, qin) { - /* this page is not on this stripe */ - if (index != page->cp_lov_index) - continue; - - cl_page_list_move(&cl2q->c2_qin, qin, page); + /* DIO is already split by stripe */ + if (!dio) { + cl_page_list_for_each_safe(page, tmp, qin) { + /* this page is not on this stripe */ + if (index != page->cp_lov_index) + continue; + + cl_page_list_move(&cl2q->c2_qin, qin, page); + } + } else { + cl_page_list_splice(qin, &cl2q->c2_qin); } sub = lov_sub_get(env, lio, index); From patchwork Mon Aug 2 19:50:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414695 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B52FC432BE for ; Mon, 2 Aug 2021 19:52:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 25B0A61050 for ; Mon, 2 Aug 2021 19:52:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 25B0A61050 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 24F76352DDB; Mon, 2 Aug 2021 12:51:45 -0700 (PDT) Received: from smtp4.ccs.ornl.gov 
(smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4D764352B86 for ; Mon, 2 Aug 2021 12:50:58 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 6DB64100804E; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6C530C2F46; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:33 -0400 Message-Id: <1627933851-7603-14-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 13/25] lustre: llite: Adjust dio refcounting X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Patrick Farrell , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell We get a page reference in cl_page_find, then immediately add another for cl_page_list_add and remove the first reference. This is pretty silly, since the life cycle is the same on these. This improves DIO/AIO page submission by around 2%. 
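The reference handoff described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the Lustre code: toy_page, toy_list, toy_page_find(), toy_list_add() and the two toy_submit_*() helpers are hypothetical stand-ins for cl_page, cl_page_list, cl_page_find() and cl_page_list_add(), and only the reference count is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct cl_page: only the reference count matters here. */
struct toy_page {
	int refcount;
};

/* Toy stand-in for struct cl_page_list: it only counts queued pages. */
struct toy_list {
	size_t nr;
};

/* Hypothetical analogue of cl_page_find(): hands the page back with one
 * reference held on behalf of the caller.
 */
void toy_page_find(struct toy_page *page)
{
	page->refcount = 1;
}

/* Hypothetical analogue of cl_page_list_add() with a get_ref flag. When
 * get_ref is false, the list inherits the caller's reference instead of
 * taking its own.
 */
void toy_list_add(struct toy_list *list, struct toy_page *page, bool get_ref)
{
	list->nr++;
	if (get_ref)
		page->refcount++;
}

/* Old DIO pattern: find (+1), list add (+1), put (-1).
 * Three refcount operations to end up with a single reference.
 */
int toy_submit_old(struct toy_list *list, struct toy_page *page)
{
	toy_page_find(page);
	toy_list_add(list, page, true);
	page->refcount--;	/* drop the reference taken by the find */
	return page->refcount;
}

/* New DIO pattern: find (+1), then the list takes over that reference.
 * One refcount operation, same final state.
 */
int toy_submit_new(struct toy_list *list, struct toy_page *page)
{
	toy_page_find(page);
	toy_list_add(list, page, false);
	return page->refcount;
}
```

Both helpers leave the page with exactly one reference, owned by the list; the second simply avoids the extra get/put pair, which is presumably where the saving on the submission path comes from.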
This patch reduces i/o time in ms/GiB by: Write: 2 ms/GiB Read: 2 ms/GiB Totals: Write: 170 ms/GiB Read: 162 ms/GiB mpirun -np 1 $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect With previous patches in series: write 5955 MiB/s read 6218 MiB/s Plus this patch: write 6028 MiB/s read 6305 MiB/s WC-bug-id: https://jira.whamcloud.com/browse/LU-13799 Lustre-commit: 1e4d10af3909452b ("LU-13799 llite: Adjust dio refcounting") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/39447 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/cl_object.h | 18 ++++++++++-------- fs/lustre/llite/llite_lib.c | 2 +- fs/lustre/llite/rw.c | 4 ++-- fs/lustre/llite/rw26.c | 9 +++++---- fs/lustre/llite/vvp_io.c | 4 ++-- fs/lustre/llite/vvp_page.c | 11 +++++++---- fs/lustre/obdclass/cl_io.c | 8 +++++--- fs/lustre/obdecho/echo_client.c | 4 ++-- 8 files changed, 34 insertions(+), 26 deletions(-) diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h index 0f785e5..d068454 100644 --- a/fs/lustre/include/cl_object.h +++ b/fs/lustre/include/cl_object.h @@ -2548,14 +2548,16 @@ static inline struct cl_page *cl_page_list_first(struct cl_page_list *plist) list_for_each_entry_safe((page), (temp), &(list)->pl_pages, cp_batch) void cl_page_list_init(struct cl_page_list *plist); -void cl_page_list_add(struct cl_page_list *plist, struct cl_page *page); +void cl_page_list_add(struct cl_page_list *plist, struct cl_page *page, + bool get_ref); void cl_page_list_move(struct cl_page_list *dst, struct cl_page_list *src, struct cl_page *page); void cl_page_list_move_head(struct cl_page_list *dst, struct cl_page_list *src, struct cl_page *page); -void cl_page_list_splice(struct cl_page_list *list, struct cl_page_list *head); -void cl_page_list_del(const struct lu_env *env, struct cl_page_list *plist, - struct cl_page *page); +void cl_page_list_splice(struct cl_page_list *list, + struct 
cl_page_list *head); +void cl_page_list_del(const struct lu_env *env, + struct cl_page_list *plist, struct cl_page *page); void cl_page_list_disown(const struct lu_env *env, struct cl_io *io, struct cl_page_list *plist); void cl_page_list_discard(const struct lu_env *env, @@ -2563,10 +2565,10 @@ void cl_page_list_discard(const struct lu_env *env, void cl_page_list_fini(const struct lu_env *env, struct cl_page_list *plist); void cl_2queue_init(struct cl_2queue *queue); -void cl_2queue_disown(const struct lu_env *env, - struct cl_io *io, struct cl_2queue *queue); -void cl_2queue_discard(const struct lu_env *env, - struct cl_io *io, struct cl_2queue *queue); +void cl_2queue_disown(const struct lu_env *env, struct cl_io *io, + struct cl_2queue *queue); +void cl_2queue_discard(const struct lu_env *env, struct cl_io *io, + struct cl_2queue *queue); void cl_2queue_fini(const struct lu_env *env, struct cl_2queue *queue); void cl_2queue_init_page(struct cl_2queue *queue, struct cl_page *page); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 5610523..7b8f1b5 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1884,7 +1884,7 @@ int ll_io_zero_page(struct inode *inode, pgoff_t index, pgoff_t offset, anchor = &vvp_env_info(env)->vti_anchor; cl_sync_io_init(anchor, 1); clpage->cp_sync_io = anchor; - cl_page_list_add(&queue->c2_qin, clpage); + cl_page_list_add(&queue->c2_qin, clpage, true); rc = cl_io_submit_rw(env, io, CRT_WRITE, queue); if (rc) goto queuefini1; diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index 4de77f6..48984aa 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -249,7 +249,7 @@ static int ll_read_ahead_page(const struct lu_env *env, struct cl_io *io, vpg->vpg_defer_uptodate = 1; vpg->vpg_ra_used = 0; } - cl_page_list_add(queue, page); + cl_page_list_add(queue, page, true); } else { /* skip completed pages */ cl_page_unassume(env, io, page); @@ -1657,7 +1657,7 @@ int 
ll_io_read_page(const struct lu_env *env, struct cl_io *io, cl_sync_io_init(anchor, 1); page->cp_sync_io = anchor; - cl_page_list_add(&queue->c2_qin, page); + cl_page_list_add(&queue->c2_qin, page, true); } io_start_index = cl_index(io->ci_obj, io->u.ci_rw.crw_pos); diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c index 0d72c3e..e5d80cb 100644 --- a/fs/lustre/llite/rw26.c +++ b/fs/lustre/llite/rw26.c @@ -264,7 +264,10 @@ struct ll_dio_pages { */ page->cp_inode = inode; } - cl_page_list_add(&queue->c2_qin, page); + /* We keep the refcount from cl_page_find, so we don't need + * another one here + */ + cl_page_list_add(&queue->c2_qin, page, false); /* * Set page clip to tell transfer formation engine * that page has to be sent even if it is beyond KMS. @@ -273,8 +276,6 @@ struct ll_dio_pages { cl_page_clip(env, page, 0, size); ++io_pages; - /* drop the reference count for cl_page_find */ - cl_page_put(env, page); offset += page_size; size -= page_size; } @@ -731,7 +732,7 @@ static int ll_write_end(struct file *file, struct address_space *mapping, lcc->lcc_page = NULL; /* page will be queued */ /* Add it into write queue */ - cl_page_list_add(plist, page); + cl_page_list_add(plist, page, true); if (plist->pl_nr == 1) /* first page */ vio->u.readwrite.vui_from = from; else diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index 0e54f46..a117800 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -1444,7 +1444,7 @@ static int vvp_io_fault_start(const struct lu_env *env, cl_page_assume(env, io, page); cl_page_list_init(plist); - cl_page_list_add(plist, page); + cl_page_list_add(plist, page, true); /* size fixup */ if (last_index == vvp_index(vpg)) @@ -1466,7 +1466,7 @@ static int vvp_io_fault_start(const struct lu_env *env, if (result >= 0) { io->ci_noquota = 1; cl_page_own(env, io, page); - cl_page_list_add(plist, page); + cl_page_list_add(plist, page, true); lu_ref_add(&page->cp_reference, "cl_io", io); result = 
cl_io_commit_async(env, io, diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c index 9e14898..60a28d6 100644 --- a/fs/lustre/llite/vvp_page.c +++ b/fs/lustre/llite/vvp_page.c @@ -459,16 +459,19 @@ int vvp_page_init(const struct lu_env *env, struct cl_object *obj, vpg->vpg_page = vmpage; get_page(vmpage); - if (page->cp_type == CPT_CACHEABLE) { + if (page->cp_type == CPT_TRANSIENT) { + /* DIO pages are referenced by userspace, we don't need to take + * a reference on them. (contrast with get_page() call above) + */ + cl_page_slice_add(page, &vpg->vpg_cl, obj, + &vvp_transient_page_ops); + } else { /* in cache, decref in vvp_page_delete */ refcount_inc(&page->cp_ref); SetPagePrivate(vmpage); vmpage->private = (unsigned long)page; cl_page_slice_add(page, &vpg->vpg_cl, obj, &vvp_page_ops); - } else { - cl_page_slice_add(page, &vpg->vpg_cl, obj, - &vvp_transient_page_ops); } return 0; } diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c index b5e7744b..9a0373f 100644 --- a/fs/lustre/obdclass/cl_io.c +++ b/fs/lustre/obdclass/cl_io.c @@ -825,7 +825,8 @@ void cl_page_list_init(struct cl_page_list *plist) /** * Adds a page to a page list. */ -void cl_page_list_add(struct cl_page_list *plist, struct cl_page *page) +void cl_page_list_add(struct cl_page_list *plist, struct cl_page *page, + bool get_ref) { /* it would be better to check that page is owned by "current" io, but * it is not passed here. @@ -836,7 +837,8 @@ void cl_page_list_add(struct cl_page_list *plist, struct cl_page *page) list_add_tail(&page->cp_batch, &plist->pl_pages); ++plist->pl_nr; lu_ref_add_at(&page->cp_reference, &page->cp_queue_ref, "queue", plist); - cl_page_get(page); + if (get_ref) + cl_page_get(page); } EXPORT_SYMBOL(cl_page_list_add); @@ -1019,7 +1021,7 @@ void cl_2queue_init_page(struct cl_2queue *queue, struct cl_page *page) /* * Add a page to the incoming page list of 2-queue. 
*/ - cl_page_list_add(&queue->c2_qin, page); + cl_page_list_add(&queue->c2_qin, page, true); } EXPORT_SYMBOL(cl_2queue_init_page); diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c index c3a12ce..4cc046a 100644 --- a/fs/lustre/obdecho/echo_client.c +++ b/fs/lustre/obdecho/echo_client.c @@ -1021,7 +1021,7 @@ static void echo_commit_callback(const struct lu_env *env, struct cl_io *io, struct page *vmpage = pvec->pages[i]; struct cl_page *page = (struct cl_page *)vmpage->private; - cl_page_list_add(&queue->c2_qout, page); + cl_page_list_add(&queue->c2_qout, page, true); } } @@ -1085,7 +1085,7 @@ static int cl_echo_object_brw(struct echo_object *eco, int rw, u64 offset, /* * Add a page to the incoming page list of 2-queue. */ - cl_page_list_add(&queue->c2_qin, clp); + cl_page_list_add(&queue->c2_qin, clp, true); /* drop the reference count for cl_page_find, so that the page * will be freed in cl_2queue_fini. From patchwork Mon Aug 2 19:50:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414693 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5B059C4338F for ; Mon, 2 Aug 2021 19:52:01 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 13ED160F36 for ; Mon, 2 Aug 2021 19:52:01 +0000 (UTC) DMARC-Filter: OpenDMARC 
Filter v1.4.1 mail.kernel.org 13ED160F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A5F64352F15; Mon, 2 Aug 2021 12:51:43 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9896D352BAD for ; Mon, 2 Aug 2021 12:50:58 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 71A55100804F; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6F549C2F4C; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:34 -0400 Message-Id: <1627933851-7603-15-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 14/25] lustre: clio: Skip prep for transients X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Patrick Farrell , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell The work done by cpo_prep() (etc) is unnecessary for transient pages. This gives only a minimal performance boost and is better seen as a step towards removing the cl_page abstraction for transient pages. But, it does consistently give around 1% better performance. 
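The size of this gain can be cross-checked from the IOR throughput figures quoted in this message (write 6028 -> 6071 MiB/s, read 6305 -> 6355 MiB/s). The following is only a rough sketch of that arithmetic, not part of the patch; the helper names ms_per_gib(), gain_percent() and report() are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Convert a throughput in MiB/s into the time needed per GiB, in ms.
 * 1 GiB = 1024 MiB, so ms/GiB = 1024 * 1000 / rate.
 * (Hypothetical helper, not a Lustre function.)
 */
static double ms_per_gib(double mib_per_sec)
{
	assert(mib_per_sec > 0.0);	/* a zero rate has no finite cost */
	return 1024.0 * 1000.0 / mib_per_sec;
}

/* Relative throughput gain, in percent, going from 'before' to 'after'. */
static double gain_percent(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}

/* Print the per-GiB cost before and after, plus the relative gain. */
static void report(const char *label, double before, double after)
{
	printf("%s: %.1f -> %.1f ms/GiB (%.2f%% faster)\n", label,
	       ms_per_gib(before), ms_per_gib(after),
	       gain_percent(before, after));
}
```

Calling report("write", 6028, 6071) and report("read", 6305, 6355) gives roughly 169.9 -> 168.7 and 162.4 -> 161.1 ms/GiB, so a little over 1 ms/GiB in each direction, which corresponds to a throughput gain of about 0.7% to 0.8%.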
This patch reduces i/o time in ms/GiB by: Write: 1 ms/GiB Read: 1 ms/GiB Totals: Write: 169 ms/GiB Read: 161 ms/GiB mpirun -np 1 $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect With previous patches in series: write 6028 MiB/s read 6305 MiB/s Plus this patch: write 6071 MiB/s read 6355 MiB/s WC-bug-id: https://jira.whamcloud.com/browse/LU-13799 Lustre-commit: b8553978789ad3dd ("LU-13799 clio: Skip prep for transients") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/39448 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/cl_page.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c index 41bd767..4bfa1c5 100644 --- a/fs/lustre/obdclass/cl_page.c +++ b/fs/lustre/obdclass/cl_page.c @@ -850,12 +850,15 @@ int cl_page_prep(const struct lu_env *env, struct cl_io *io, if (crt >= CRT_NR) return -EINVAL; - cl_page_slice_for_each(cl_page, slice, i) { - if (slice->cpl_ops->cpo_own) - result = (*slice->cpl_ops->io[crt].cpo_prep)(env, slice, - io); - if (result != 0) - break; + if (cl_page->cp_type != CPT_TRANSIENT) { + cl_page_slice_for_each(cl_page, slice, i) { + if (slice->cpl_ops->cpo_own) + result = (*slice->cpl_ops->io[crt].cpo_prep)(env, + slice, + io); + if (result != 0) + break; + } } if (result >= 0) { From patchwork Mon Aug 2 19:50:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414679 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: 
from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95CA8C4338F for ; Mon, 2 Aug 2021 19:51:32 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5137260F36 for ; Mon, 2 Aug 2021 19:51:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 5137260F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id F12F0352E3D; Mon, 2 Aug 2021 12:51:22 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D3C9C352B7B for ; Mon, 2 Aug 2021 12:50:58 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 73C471008050; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7246FC2F50; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:35 -0400 Message-Id: <1627933851-7603-16-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 15/25] lustre: osc: Improve osc_queue_sync_pages X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Patrick Farrell , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell This patch was split and partially done in: https://review.whamcloud.com/38214 So the text below refers to the combination of this patch and that one. This patch now just improves a looped atomic add by replacing it with a single one. The rest of the grant calculation change is in https://review.whamcloud.com/38214 (I am retaining the text below to show the performance improvement) ---------- osc_queue_sync_pages now has a grant calculation component, which has a pretty painful impact on the new faster DIO performance. Specifically, the per-page ktime_get() and the per-page atomic_add cost close to 10% of total CPU time in the DIO path. We can make this per batch of pages rather than for each page, which reduces this cost from 10% of CPU to almost nothing. This improves write performance by about 10% (but has no effect on reads, since they don't use grant). This patch reduces i/o time in ms/GiB by: Write: 10 ms/GiB Read: 0 ms/GiB Totals: Write: 158 ms/GiB Read: 161 ms/GiB mpirun -np 1 $IOR -w -t 1G -b 64G -o $FILE --posix.odirect Before patch: write 6071 After patch: write 6470 (Read is similar.) This also fixes a mistake in d23d4cb67c / LU-13419 where it removed the shrink interval update entirely from the direct i/o path.
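The batching idea described above (one atomic update per batch of pages instead of one per page) can be illustrated outside the kernel. This is a minimal userspace sketch using C11 atomics rather than the kernel's atomic_long_t API; dirty_pages, struct page_entry, account_per_page() and account_batched() are hypothetical stand-ins, not Lustre symbols.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a global dirty-page counter. */
static atomic_long dirty_pages;

/* Hypothetical stand-in for the per-page state touched in the loop. */
struct page_entry {
	int consumed;
};

/* Before: one atomic read-modify-write on the shared counter per page. */
static void account_per_page(struct page_entry *pages, size_t count)
{
	assert(pages != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		pages[i].consumed = 1;
		atomic_fetch_add(&dirty_pages, 1);
	}
}

/* After: per-page work stays in the loop, the counter is bumped once. */
static void account_batched(struct page_entry *pages, size_t count)
{
	assert(pages != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		pages[i].consumed = 1;
	atomic_fetch_add(&dirty_pages, (long)count);
}
```

Both variants leave the counter at the same final value. The batched form performs a single contended read-modify-write per call, at the cost that other threads see the counter move in one step rather than page by page.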
Fixes: d23d4cb67c ("lustre: osc: Move shrink update to per-write") WC-bug-id: https://jira.whamcloud.com/browse/LU-13419 Lustre-commit: 87c4535f7a5d239a ("LU-13799 osc: Improve osc_queue_sync_pages") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/39482 Reviewed-by: Andreas Dilger Reviewed-by: Wang Shilong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_cache.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index 50f6477..69cf9ba 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -2715,8 +2715,8 @@ int osc_queue_sync_pages(const struct lu_env *env, struct cl_io *io, list_for_each_entry(oap, list, oap_pending_item) { osc_consume_write_grant(cli, &oap->oap_brw_page); - atomic_long_inc(&obd_dirty_pages); } + atomic_long_add(page_count, &obd_dirty_pages); osc_unreserve_grant_nolock(cli, grants, 0); ext->oe_grants = grants; } else { @@ -2730,6 +2730,7 @@ int osc_queue_sync_pages(const struct lu_env *env, struct cl_io *io, "not enough grant available, switching to sync for this i/o\n"); } spin_unlock(&cli->cl_loi_list_lock); + osc_update_next_shrink(cli); } ext->oe_is_rdma_only = !!(brw_flags & OBD_BRW_RDMA_ONLY); From patchwork Mon Aug 2 19:50:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414669 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 07B81C432BE for ; Mon, 2 Aug 2021 
19:51:17 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id ADAD560F36 for ; Mon, 2 Aug 2021 19:51:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org ADAD560F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 95312352D88; Mon, 2 Aug 2021 12:51:11 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1939A35286A for ; Mon, 2 Aug 2021 12:50:59 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 781041008051; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 75488C2F53; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:36 -0400 Message-Id: <1627933851-7603-17-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> MIME-Version: 1.0 Subject: [lustre-devel] [PATCH 16/25] lustre: llite: avoid project quota overflow X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Wang Shilong , Lustre Development List Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Wang Shilong Currently, the project ID is stored as a u32, so its maximum possible value is 4294967295. However, the VFS reserves that maximum value for special use; see the following function: static inline bool qid_has_mapping(struct user_namespace *ns, struct kqid qid) { return from_kqid(ns, qid) != (qid_t) -1; } So qid_has_mapping() can return false for ID 4294967295. Trying the same boundary values with chown: $ chown 4294967295:4294967295 c.sh chown: invalid user: ‘4294967295:4294967295’ $ chown 4294967294:4294967294 c.sh Fix this by checking the project ID against the maximum valid value on the client kernel side, and add a test case for it. WC-bug-id: https://jira.whamcloud.com/browse/LU-14740 Lustre-commit: 3ffa5d680f0092ae ("LU-14740 llite: avoid project quota overflow") Signed-off-by: Wang Shilong Reviewed-on: https://review.whamcloud.com/43939 Reviewed-by: Hongchao Zhang Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 1bf237b..a4e432e 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -3323,8 +3323,17 @@ int ll_ioctl_check_project(struct inode *inode, u32 xflags, * namespace. Enforce that restriction only if we are trying to change * the quota ID state. Everything else is allowed in user namespaces. */ - if (current_user_ns() == &init_user_ns) + if (current_user_ns() == &init_user_ns) { + /* + * Caller is allowed to change the project ID. if it is being + * changed, make sure that the new value is valid.
+ */ + if (ll_i2info(inode)->lli_projid != projid && + !projid_valid(make_kprojid(&init_user_ns, projid))) + return -EINVAL; + return 0; + } if (ll_i2info(inode)->lli_projid != projid) return -EINVAL; From patchwork Mon Aug 2 19:50:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414697 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CFE5FC4320A for ; Mon, 2 Aug 2021 19:52:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 970EB60FC2 for ; Mon, 2 Aug 2021 19:52:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 970EB60FC2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 13E0D352F56; Mon, 2 Aug 2021 12:51:48 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5ABAB352BD9 for ; Mon, 2 Aug 2021 12:50:59 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 79AFC1008052; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) 
Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 78566C2F55; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:37 -0400 Message-Id: <1627933851-7603-18-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 17/25] lnet: check memdup_user_nul using IS_ERR X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Cyril Bordage , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Cyril Bordage Crash in proc_lnet_portal_rotor. memdup_user_nul returns an ERR_PTR on error, not a NULL pointer. IS_ERR and PTR_ERR functions have to be used to check and return the correct error code. The fix has been applied in other locations having the wrong check. Fixes: 986fbf5bf19 ("lnet: libcfs: discard cfs_trace_copyin_string()") WC-bug-id: https://jira.whamcloud.com/browse/LU-14788 Lustre-commit: 449d046e55a42cc4 ("LU-14788 lnet: check memdup_user_nul using IS_ERR") Signed-off-by: Cyril Bordage Reviewed-on: https://review.whamcloud.com/44091 Reviewed-by: John L. 
Hammond Reviewed-by: Andreas Dilger Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/libcfs/module.c | 4 ++-- net/lnet/libcfs/tracefile.c | 8 ++++---- net/lnet/lnet/router_proc.c | 4 ++-- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/net/lnet/libcfs/module.c b/net/lnet/libcfs/module.c index 8059569..a249bdd 100644 --- a/net/lnet/libcfs/module.c +++ b/net/lnet/libcfs/module.c @@ -317,8 +317,8 @@ static int proc_dobitmasks(struct ctl_table *table, int write, } } else { tmpstr = memdup_user_nul(buffer, nob); - if (!tmpstr) - return -ENOMEM; + if (IS_ERR(tmpstr)) + return PTR_ERR(tmpstr); rc = libcfs_debug_str2mask(mask, strim(tmpstr), is_subsys); /* Always print LBUG/LASSERT to console, so keep this mask */ diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c index 6321840..e0ef234 100644 --- a/net/lnet/libcfs/tracefile.c +++ b/net/lnet/libcfs/tracefile.c @@ -942,8 +942,8 @@ int cfs_trace_dump_debug_buffer_usrstr(void __user *usr_str, int usr_str_nob) int rc; str = memdup_user_nul(usr_str, usr_str_nob); - if (!str) - return -ENOMEM; + if (IS_ERR(str)) + return PTR_ERR(str); path = strim(str); if (path[0] != '/') @@ -1001,8 +1001,8 @@ int cfs_trace_daemon_command_usrstr(void __user *usr_str, int usr_str_nob) int rc; str = memdup_user_nul(usr_str, usr_str_nob); - if (!str) - return -ENOMEM; + if (IS_ERR(str)) + return PTR_ERR(str); rc = cfs_trace_daemon_command(str); kfree(str); diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c index dd52a08..0de6681 100644 --- a/net/lnet/lnet/router_proc.c +++ b/net/lnet/lnet/router_proc.c @@ -816,8 +816,8 @@ static int proc_lnet_portal_rotor(struct ctl_table *table, int write, } buf = memdup_user_nul(buffer, nob); - if (!buf) - return -ENOMEM; + if (IS_ERR(buf)) + return PTR_ERR(buf); tmp = strim(buf); From patchwork Mon Aug 2 19:50:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414673 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0BA09C432BE for ; Mon, 2 Aug 2021 19:51:23 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 97C7E60F36 for ; Mon, 2 Aug 2021 19:51:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 97C7E60F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1138C352D71; Mon, 2 Aug 2021 12:51:16 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9645D352BFC for ; Mon, 2 Aug 2021 12:50:59 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 7E68D1008053; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7BA73C2F46; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:38 -0400 Message-Id: <1627933851-7603-19-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: 
<1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 18/25] lustre: osc: Remove lockless truncate X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Lockless truncate does not work and cannot be made to work. Fundamentally, it has no means of ensuring consistency across clients because it can't force them all to drop cached data without locking. It's been off for years - let's just get rid of it. WC-bug-id: https://jira.whamcloud.com/browse/LU-14838 Lustre-commit: 6335dba83995765 ("LU-14838 osc: Remove lockless truncate") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/44204 Reviewed-by: Wang Shilong Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_osc.h | 2 -- fs/lustre/llite/llite_lib.c | 3 +-- fs/lustre/mdc/lproc_mdc.c | 2 -- fs/lustre/osc/lproc_osc.c | 35 ---------------------------------- fs/lustre/osc/osc_io.c | 10 ---------- fs/lustre/osc/osc_lock.c | 6 +----- fs/lustre/ptlrpc/wiretest.c | 2 -- include/uapi/linux/lustre/lustre_idl.h | 1 - 8 files changed, 2 insertions(+), 59 deletions(-) diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index 13e9363..3a2d8bc 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -116,12 +116,10 @@ struct osc_device { struct osc_stats { u64 os_lockless_writes; /* by bytes */ u64 os_lockless_reads; /* by bytes */ - u64 os_lockless_truncates; /* by times */ } od_stats; /* configuration item(s) */ time64_t od_contention_time; - int od_lockless_truncate; }; /* 
\defgroup osc osc diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 7b8f1b5..63d0f02 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -284,7 +284,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) data->ocd_connect_flags = OBD_CONNECT_IBITS | OBD_CONNECT_NODEVOH | OBD_CONNECT_ATTRFID | OBD_CONNECT_GRANT | OBD_CONNECT_VERSION | OBD_CONNECT_BRW_SIZE | - OBD_CONNECT_SRVLOCK | OBD_CONNECT_TRUNCLOCK| + OBD_CONNECT_SRVLOCK | OBD_CONNECT_CANCELSET | OBD_CONNECT_FID | OBD_CONNECT_AT | OBD_CONNECT_LOV_V3 | OBD_CONNECT_VBR | OBD_CONNECT_FULL20 | @@ -510,7 +510,6 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) OBD_CONNECT_REQPORTAL | OBD_CONNECT_BRW_SIZE | OBD_CONNECT_CANCELSET | OBD_CONNECT_FID | OBD_CONNECT_SRVLOCK | - OBD_CONNECT_TRUNCLOCK | OBD_CONNECT_AT | OBD_CONNECT_OSS_CAPA | OBD_CONNECT_VBR | OBD_CONNECT_FULL20 | OBD_CONNECT_64BITHASH | OBD_CONNECT_MAXBYTES | diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c index 02636ef..b3ace37 100644 --- a/fs/lustre/mdc/lproc_mdc.c +++ b/fs/lustre/mdc/lproc_mdc.c @@ -561,8 +561,6 @@ static int mdc_stats_seq_show(struct seq_file *seq, void *v) stats->os_lockless_writes); seq_printf(seq, "lockless_read_bytes\t\t%llu\n", stats->os_lockless_reads); - seq_printf(seq, "lockless_truncate\t\t%llu\n", - stats->os_lockless_truncates); return 0; } diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index 3991b2c..bfc5df1 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -539,38 +539,6 @@ static ssize_t contention_seconds_store(struct kobject *kobj, } LUSTRE_RW_ATTR(contention_seconds); -static ssize_t lockless_truncate_show(struct kobject *kobj, - struct attribute *attr, - char *buf) -{ - struct obd_device *obd = container_of(kobj, struct obd_device, - obd_kset.kobj); - struct osc_device *od = obd2osc_dev(obd); - - return sprintf(buf, "%u\n", 
od->od_lockless_truncate); -} - -static ssize_t lockless_truncate_store(struct kobject *kobj, - struct attribute *attr, - const char *buffer, - size_t count) -{ - struct obd_device *obd = container_of(kobj, struct obd_device, - obd_kset.kobj); - struct osc_device *od = obd2osc_dev(obd); - bool val; - int rc; - - rc = kstrtobool(buffer, &val); - if (rc) - return rc; - - od->od_lockless_truncate = val; - - return count; -} -LUSTRE_RW_ATTR(lockless_truncate); - static ssize_t destroys_in_flight_show(struct kobject *kobj, struct attribute *attr, char *buf) @@ -890,8 +858,6 @@ static int osc_stats_seq_show(struct seq_file *seq, void *v) stats->os_lockless_writes); seq_printf(seq, "lockless_read_bytes\t\t%llu\n", stats->os_lockless_reads); - seq_printf(seq, "lockless_truncate\t\t%llu\n", - stats->os_lockless_truncates); return 0; } @@ -928,7 +894,6 @@ void lproc_osc_attach_seqstat(struct obd_device *obd) &lustre_attr_cur_dirty_grant_bytes.attr, &lustre_attr_destroys_in_flight.attr, &lustre_attr_grant_shrink_interval.attr, - &lustre_attr_lockless_truncate.attr, &lustre_attr_max_dirty_mb.attr, &lustre_attr_max_pages_per_rpc.attr, &lustre_attr_max_rpcs_in_flight.attr, diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c index f69f201..047ae00 100644 --- a/fs/lustre/osc/osc_io.c +++ b/fs/lustre/osc/osc_io.c @@ -703,16 +703,6 @@ void osc_io_setattr_end(const struct lu_env *env, result = cbargs->opc_rc; io->ci_result = cbargs->opc_rc; } - if (result == 0) { - if (oio->oi_lockless) { - /* lockless truncate */ - struct osc_device *osc = lu2osc_dev(obj->co_lu.lo_dev); - - LASSERT(cl_io_is_trunc(io) || cl_io_is_fallocate(io)); - /* XXX: Need a lock. 
*/ - osc->od_stats.os_lockless_truncates++; - } - } if (cl_io_is_trunc(io)) { u64 size = io->u.ci_setattr.sa_attr.lvb_size; diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index 422f3e5..6d6d271 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -800,7 +800,6 @@ void osc_lock_to_lockless(const struct lu_env *env, struct cl_io *io = oio->oi_cl.cis_io; struct cl_object *obj = slice->cls_obj; struct osc_object *oob = cl2osc(obj); - const struct osc_device *osd = lu2osc_dev(obj->co_lu.lo_dev); struct obd_connect_data *ocd; LASSERT(ols->ols_state == OLS_NEW || @@ -821,10 +820,7 @@ void osc_lock_to_lockless(const struct lu_env *env, OBD_CONNECT_SRVLOCK); if (io->ci_lockreq == CILR_NEVER || /* lockless IO */ - (ols->ols_locklessable && osc_object_is_contended(oob)) || - /* lockless truncate */ - (cl_io_is_trunc(io) && osd->od_lockless_truncate && - (ocd->ocd_connect_flags & OBD_CONNECT_TRUNCLOCK))) { + (ols->ols_locklessable && osc_object_is_contended(oob))) { ols->ols_locklessable = 1; slice->cls_ops = ols->ols_lockless_ops; } diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index cd1456c..4301bd4 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1108,8 +1108,6 @@ void lustre_assert_wire_constants(void) OBD_CONNECT_XATTR); LASSERTF(OBD_CONNECT_LARGE_ACL == 0x200ULL, "found 0x%.16llxULL\n", OBD_CONNECT_LARGE_ACL); - LASSERTF(OBD_CONNECT_TRUNCLOCK == 0x400ULL, "found 0x%.16llxULL\n", - OBD_CONNECT_TRUNCLOCK); LASSERTF(OBD_CONNECT_TRANSNO == 0x800ULL, "found 0x%.16llxULL\n", OBD_CONNECT_TRANSNO); LASSERTF(OBD_CONNECT_IBITS == 0x1000ULL, "found 0x%.16llxULL\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 65948d8..77a64f2 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -719,7 +719,6 @@ struct ptlrpc_body_v2 { #define OBD_CONNECT_ACL 0x80ULL /*access control lists */ #define 
OBD_CONNECT_XATTR 0x100ULL /*client use extended attr */ #define OBD_CONNECT_LARGE_ACL 0x200ULL /* more than 32 ACL entries */ -#define OBD_CONNECT_TRUNCLOCK 0x400ULL /*locks on server for punch */ #define OBD_CONNECT_TRANSNO 0x800ULL /*replay sends init transno */ #define OBD_CONNECT_IBITS 0x1000ULL /* not checked in 2.11+ */ #define OBD_CONNECT_JOIN 0x2000ULL /*files can be concatenated. From patchwork Mon Aug 2 19:50:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414683 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C65AC4338F for ; Mon, 2 Aug 2021 19:51:38 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 209F860F36 for ; Mon, 2 Aug 2021 19:51:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 209F860F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1854E352DA7; Mon, 2 Aug 2021 12:51:27 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E3BFE352C04 for ; Mon, 2 Aug 2021 
12:50:59 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 801A71008054; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 7EAE6C2F4C; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:39 -0400 Message-Id: <1627933851-7603-20-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 19/25] lustre: osc: Remove client contention support X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell Lockless buffered i/o and contention detection don't work: lockless buffered i/o is unfixable, and contention detection is broken enough that it will have to be rewritten. Let's remove both. This patch starts the removal by pulling the client-side support.
WC-bug-id: https://jira.whamcloud.com/browse/LU-14838 Lustre-commit: 5ad00e36eca11a14 ("LU-14838 osc: Remove client contention support") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/44205 Reviewed-by: Andreas Dilger Reviewed-by: Bobi Jam Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_osc.h | 1 - fs/lustre/mdc/lproc_mdc.c | 41 ----------------------------------------- fs/lustre/mdc/mdc_dev.c | 15 +-------------- fs/lustre/osc/lproc_osc.c | 33 --------------------------------- fs/lustre/osc/osc_lock.c | 19 ++----------------- fs/lustre/osc/osc_object.c | 22 ---------------------- 6 files changed, 3 insertions(+), 128 deletions(-) diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h index 3a2d8bc..8a62eb2 100644 --- a/fs/lustre/include/lustre_osc.h +++ b/fs/lustre/include/lustre_osc.h @@ -658,7 +658,6 @@ int osc_attr_update(const struct lu_env *env, struct cl_object *obj, int osc_object_glimpse(const struct lu_env *env, const struct cl_object *obj, struct ost_lvb *lvb); int osc_object_invalidate(const struct lu_env *env, struct osc_object *osc); -int osc_object_is_contended(struct osc_object *obj); int osc_object_find_cbdata(const struct lu_env *env, struct cl_object *obj, ldlm_iterator_t iter, void *data); int osc_object_prune(const struct lu_env *env, struct cl_object *obj); diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c index b3ace37..d13a6b7 100644 --- a/fs/lustre/mdc/lproc_mdc.c +++ b/fs/lustre/mdc/lproc_mdc.c @@ -268,45 +268,6 @@ static int mdc_cached_mb_seq_show(struct seq_file *m, void *v) } LDEBUGFS_SEQ_FOPS(mdc_cached_mb); -static int mdc_contention_seconds_seq_show(struct seq_file *m, void *v) -{ - struct obd_device *obd = m->private; - struct osc_device *od = obd2osc_dev(obd); - - seq_printf(m, "%lld\n", od->od_contention_time); - return 0; -} - -static ssize_t mdc_contention_seconds_seq_write(struct file *file, - const char __user *buffer, - size_t 
count, loff_t *off) -{ - struct seq_file *sfl = file->private_data; - struct obd_device *obd = sfl->private; - struct osc_device *od = obd2osc_dev(obd); - int rc; - char kernbuf[128]; - s64 val; - - if (count >= sizeof(kernbuf)) - return -EINVAL; - - if (copy_from_user(kernbuf, buffer, count)) - return -EFAULT; - kernbuf[count] = 0; - - rc = kstrtos64(kernbuf, 10, &val); - if (rc) - return rc; - if (val < 0 || val > INT_MAX) - return -ERANGE; - - od->od_contention_time = val; - - return count; -} -LDEBUGFS_SEQ_FOPS(mdc_contention_seconds); - static int mdc_unstable_stats_seq_show(struct seq_file *m, void *v) { struct obd_device *obd = m->private; @@ -628,8 +589,6 @@ static ssize_t mdc_dom_min_repsize_seq_write(struct file *file, .fops = &mdc_checksum_type_fops }, { .name = "timeouts", .fops = &mdc_timeouts_fops }, - { .name = "contention_seconds", - .fops = &mdc_contention_seconds_fops }, { .name = "import", .fops = &mdc_import_fops }, { .name = "state", diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c index 1c28f80..ce4148d 100644 --- a/fs/lustre/mdc/mdc_dev.c +++ b/fs/lustre/mdc/mdc_dev.c @@ -536,18 +536,7 @@ static int mdc_lock_upcall(void *cookie, struct lustre_handle *lockh, mdc_lock_granted(env, oscl, lockh); /* Error handling, some errors are tolerable. */ - if (oscl->ols_locklessable && rc == -EUSERS) { - /* This is a tolerable error, turn this lock into - * lockless lock. - */ - osc_object_set_contended(cl2osc(slice->cls_obj)); - LASSERT(slice->cls_ops != oscl->ols_lockless_ops); - - /* Change this lock to ldlmlock-less lock.
*/ - osc_lock_to_lockless(env, oscl, 1); - oscl->ols_state = OLS_GRANTED; - rc = 0; - } else if (oscl->ols_glimpse && rc == -ENAVAIL) { + if (oscl->ols_glimpse && rc == -ENAVAIL) { LASSERT(oscl->ols_flags & LDLM_FL_LVB_READY); mdc_lock_lvb_update(env, cl2osc(slice->cls_obj), NULL, &oscl->ols_lvb); @@ -972,8 +961,6 @@ int mdc_lock_init(const struct lu_env *env, struct cl_object *obj, if (!(enqflags & CEF_MUST)) osc_lock_to_lockless(env, ols, (enqflags & CEF_NEVER)); - if (ols->ols_locklessable && !(enqflags & CEF_DISCARD_DATA)) - ols->ols_flags |= LDLM_FL_DENY_ON_CONTENTION; if (io->ci_type == CIT_WRITE || cl_io_is_mkwrite(io)) osc_lock_set_writer(env, io, obj, ols); diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c index bfc5df1..f9878e0 100644 --- a/fs/lustre/osc/lproc_osc.c +++ b/fs/lustre/osc/lproc_osc.c @@ -507,38 +507,6 @@ static ssize_t checksum_dump_store(struct kobject *kobj, } LUSTRE_RW_ATTR(checksum_dump); -static ssize_t contention_seconds_show(struct kobject *kobj, - struct attribute *attr, - char *buf) -{ - struct obd_device *obd = container_of(kobj, struct obd_device, - obd_kset.kobj); - struct osc_device *od = obd2osc_dev(obd); - - return sprintf(buf, "%lld\n", od->od_contention_time); -} - -static ssize_t contention_seconds_store(struct kobject *kobj, - struct attribute *attr, - const char *buffer, - size_t count) -{ - struct obd_device *obd = container_of(kobj, struct obd_device, - obd_kset.kobj); - struct osc_device *od = obd2osc_dev(obd); - unsigned int val; - int rc; - - rc = kstrtouint(buffer, 10, &val); - if (rc) - return rc; - - od->od_contention_time = val; - - return count; -} -LUSTRE_RW_ATTR(contention_seconds); - static ssize_t destroys_in_flight_show(struct kobject *kobj, struct attribute *attr, char *buf) @@ -887,7 +855,6 @@ void lproc_osc_attach_seqstat(struct obd_device *obd) &lustre_attr_active.attr, &lustre_attr_checksums.attr, &lustre_attr_checksum_dump.attr, - &lustre_attr_contention_seconds.attr, 
&lustre_attr_cur_dirty_bytes.attr, &lustre_attr_cur_grant_bytes.attr, &lustre_attr_cur_lost_grant_bytes.attr, diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index 6d6d271..f6faed7 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -287,18 +287,7 @@ static int osc_lock_upcall(void *cookie, struct lustre_handle *lockh, osc_lock_granted(env, oscl, lockh); /* Error handling, some errors are tolerable. */ - if (oscl->ols_locklessable && rc == -EUSERS) { - /* This is a tolerable error, turn this lock into - * lockless lock. - */ - osc_object_set_contended(cl2osc(slice->cls_obj)); - LASSERT(slice->cls_ops != oscl->ols_lockless_ops); - - /* Change this lock to ldlmlock-less lock. */ - osc_lock_to_lockless(env, oscl, 1); - oscl->ols_state = OLS_GRANTED; - rc = 0; - } else if (oscl->ols_glimpse && rc == -ENAVAIL) { + if (oscl->ols_glimpse && rc == -ENAVAIL) { LASSERT(oscl->ols_flags & LDLM_FL_LVB_READY); osc_lock_lvb_update(env, cl2osc(slice->cls_obj), NULL, &oscl->ols_lvb); @@ -818,9 +807,7 @@ void osc_lock_to_lockless(const struct lu_env *env, (io->ci_lockreq == CILR_MAYBE) && (ocd->ocd_connect_flags & OBD_CONNECT_SRVLOCK); - if (io->ci_lockreq == CILR_NEVER || - /* lockless IO */ - (ols->ols_locklessable && osc_object_is_contended(oob))) { + if (io->ci_lockreq == CILR_NEVER) { ols->ols_locklessable = 1; slice->cls_ops = ols->ols_lockless_ops; } @@ -1242,8 +1229,6 @@ int osc_lock_init(const struct lu_env *env, if (!(enqflags & CEF_MUST)) /* try to convert this lock to a lockless lock */ osc_lock_to_lockless(env, oscl, (enqflags & CEF_NEVER)); - if (oscl->ols_locklessable && !(enqflags & CEF_DISCARD_DATA)) - oscl->ols_flags |= LDLM_FL_DENY_ON_CONTENTION; if (io->ci_type == CIT_WRITE || cl_io_is_mkwrite(io)) osc_lock_set_writer(env, io, obj, oscl); diff --git a/fs/lustre/osc/osc_object.c b/fs/lustre/osc/osc_object.c index 0dd926a..517ce5c 100644 --- a/fs/lustre/osc/osc_object.c +++ b/fs/lustre/osc/osc_object.c @@ -332,28 +332,6 @@ 
static int osc_object_fiemap(const struct lu_env *env, struct cl_object *obj, return rc; } -int osc_object_is_contended(struct osc_object *obj) -{ - struct osc_device *dev = lu2osc_dev(obj->oo_cl.co_lu.lo_dev); - time64_t osc_contention_time = dev->od_contention_time; - ktime_t retry_time; - - if (OBD_FAIL_CHECK(OBD_FAIL_OSC_OBJECT_CONTENTION)) - return 1; - - if (!obj->oo_contended) - return 0; - - retry_time = ktime_add_ns(obj->oo_contention_time, - osc_contention_time * NSEC_PER_SEC); - if (ktime_after(ktime_get(), retry_time)) { - osc_object_clear_contended(obj); - return 0; - } - return 1; -} -EXPORT_SYMBOL(osc_object_is_contended); - /** * Implementation of struct cl_object_operations::coo_req_attr_set() for osc * layer. osc is responsible for struct obdo::o_id and struct obdo::o_seq From patchwork Mon Aug 2 19:50:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414701 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8EDCEC4338F for ; Mon, 2 Aug 2021 19:52:16 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5553860F36 for ; Mon, 2 Aug 2021 19:52:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 5553860F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org 
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3A7F6352F91; Mon, 2 Aug 2021 12:51:52 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3FA04352C09 for ; Mon, 2 Aug 2021 12:51:00 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 83E971008055; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 81980C2F50; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:40 -0400 Message-Id: <1627933851-7603-21-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 20/25] lustre: osc: osc: Do not flush on lockless cancel X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Patrick Farrell

The cancellation of an OSC lock without an LDLM lock (a 'lockless' OSC lock) should not flush pages. Only direct i/o is allowed to use a lockless OSC lock, and direct i/o does not create flushable pages. DIO pages are not flushable because: A) they are all synced ASAP, and B) the OSC extents created for them are not added to the extent tree which is used to track these pages. Instead, the flush has the effect of trying to flush pages from ongoing buffered i/o.
This can lead to crashes like the following:

    osc_cache_writeback_range()) ASSERTION(hp == 0 && discard == 0) failed

This assert essentially says the lock cancellation (hp == 1) found an active i/o (an extent in the OES_ACTIVE state). This is not allowed because the flushing code assumes an LDLM lock is being cancelled, which will only start once there is no active i/o. Because the OSC lock being cancelled is not associated with an LDLM lock, this is not true, and nothing prevents active i/o under a different lock, leading to this assert.

The solution is simply to not flush pages when cancelling a no-LDLM-lock OSC lock.

Additional note: New lockless OSC locks cannot be created if they are blocked by a regular OSC lock, but a new regular lock can be created if there is a lockless lock present. Thus, the sequence is something like this:

    Direct i/o creates lockless OSC lock
    Buffered i/o creates OSC and LDLM lock on the same range
    Direct i/o finishes, starts cancelling its OSC lock
    Buffered i/o is still ongoing, with extents in OES_ACTIVE

This results in the above crash during the OSC lock cancellation.

Note it would be possible to resolve this issue by not allowing lockless OSC locks to match regular OSC locks, but this is not necessary, since there's no reason for lockless locks to flush pages on cancellation.
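The rule described above can be modelled in a few lines of C. This is only an illustrative sketch and not the Lustre code: struct toy_osc_lock, struct toy_stats and both functions are hypothetical stand-ins, and the counters replace the real side effects of osc_lock_flush() and osc_lock_wake_waiters().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct osc_lock. */
struct toy_osc_lock {
	bool has_dlmlock;	/* models ols_dlmlock != NULL */
};

/* Hypothetical counters standing in for the real side effects. */
struct toy_stats {
	int flushes;	/* models calls to osc_lock_flush() */
	int wakeups;	/* models calls to osc_lock_wake_waiters() */
};

/* Only a lock backed by an LDLM lock may have flushable pages under it. */
static bool toy_should_flush_on_cancel(const struct toy_osc_lock *lock)
{
	return lock->has_dlmlock;
}

/* Models the patched lockless cancel: no flush, only wake the waiters. */
static void toy_lockless_cancel(const struct toy_osc_lock *lock,
				struct toy_stats *st)
{
	assert(!lock->has_dlmlock);	/* mirrors LASSERT(!ols->ols_dlmlock) */
	if (toy_should_flush_on_cancel(lock))
		st->flushes++;		/* never reached for a lockless lock */
	st->wakeups++;
}
```

Calling toy_lockless_cancel() on a lock with has_dlmlock == false leaves the flush counter untouched, so buffered i/o that is still active under a different lock is never disturbed by the cancel.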
WC-bug-id: https://jira.whamcloud.com/browse/LU-14814 Lustre-commit: 6717c573ed90da91 ("LU-14814 osc: osc: Do not flush on lockless cancel") Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/44152 Reviewed-by: Li Dongyang Reviewed-by: Wang Shilong Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/osc/osc_lock.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c index f6faed7..eb3cb58 100644 --- a/fs/lustre/osc/osc_lock.c +++ b/fs/lustre/osc/osc_lock.c @@ -1134,16 +1134,8 @@ static void osc_lock_lockless_cancel(const struct lu_env *env, { struct osc_lock *ols = cl2osc_lock(slice); struct osc_object *osc = cl2osc(slice->cls_obj); - struct cl_lock_descr *descr = &slice->cls_lock->cll_descr; - int result; LASSERT(!ols->ols_dlmlock); - result = osc_lock_flush(osc, descr->cld_start, descr->cld_end, - descr->cld_mode, false); - if (result) - CERROR("Pages for lockless lock %p were not purged(%d)\n", - ols, result); - osc_lock_wake_waiters(env, osc, ols); } From patchwork Mon Aug 2 19:50:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414667 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51DF7C4338F for ; Mon, 2 Aug 2021 19:51:15 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
mail.kernel.org (Postfix) with ESMTPS id 0435C60724 for ; Mon, 2 Aug 2021 19:51:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 0435C60724 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7CB93352DD7; Mon, 2 Aug 2021 12:51:10 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8B5F5352BAF for ; Mon, 2 Aug 2021 12:51:00 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 877A41008056; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 848BEC2F53; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:41 -0400 Message-Id: <1627933851-7603-22-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 21/25] lustre: pcc: add LCM_FL_PCC_RDONLY layout flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin

The upcoming new feature PCC-RO is combined with FLR and extends the on-disk data structure 'enum lov_comp_md_flags' for layout components. It adds a new layout flag: LCM_FL_PCC_RDONLY.
    enum lov_comp_md_flags {
        LCM_FL_NONE          = 0x0,
        LCM_FL_RDONLY        = 0x1,
        LCM_FL_WRITE_PENDING = 0x2,
        LCM_FL_SYNC_PENDING  = 0x3,
        LCM_FL_PCC_RDONLY    = 0x8,
        LCM_FL_FLR_MASK      = 0xB,
    };

The LCM_FL_PCC_RDONLY flag, which is dedicated to PCC-RO, is different from LCM_FL_RDONLY. A PCC-RO cached file can be in one of the following states:

- LCM_FL_PCC_RDONLY | LCM_FL_RDONLY: all FLR components are synced and up to date. The replicated file is in the read-only state, and one client then attaches the file to the PCC backend in PCC-RO mode.
- LCM_FL_PCC_RDONLY | LCM_FL_WRITE_PENDING: the file was once modified and the data content of the layout components is not synced. The MDT has already picked a primary replica and marked the other components as STALE. At this time, a client can still PCC-RO attach the file. On this client, the primary component and the PCC copy are both up to date.

Because LCM_FL_PCC_RDONLY is a new flag, an old client may not understand this FLR layout flag, which may result in inconsistent data access. This patch adds the new flag for the purpose of compatibility and interoperability.
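To make the flag combinations concrete, here is a small C sketch built on the enum values quoted in this message. The two helper functions and their names are hypothetical, and the flags word is assumed to be 16 bits wide purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the enum quoted in the commit message. */
enum lov_comp_md_flags {
	LCM_FL_NONE		= 0x0,
	LCM_FL_RDONLY		= 0x1,
	LCM_FL_WRITE_PENDING	= 0x2,
	LCM_FL_SYNC_PENDING	= 0x3,
	LCM_FL_PCC_RDONLY	= 0x8,
	LCM_FL_FLR_MASK		= 0xB,
};

/* The mask has to cover the PCC-RO bit as well as the two state bits. */
static_assert((LCM_FL_FLR_MASK & LCM_FL_PCC_RDONLY) == LCM_FL_PCC_RDONLY,
	      "FLR mask must include the PCC-RO bit");

/* Hypothetical helper: is the PCC-RO bit set in the flags word? */
static bool toy_is_pcc_rdonly(uint16_t lcm_flags)
{
	return (lcm_flags & LCM_FL_PCC_RDONLY) != 0;
}

/* Hypothetical helper: FLR state with the PCC-RO bit stripped off. */
static uint16_t toy_flr_state(uint16_t lcm_flags)
{
	return lcm_flags & LCM_FL_FLR_MASK & ~LCM_FL_PCC_RDONLY;
}
```

A client that still masks with the previous value 0x3 would silently drop bit 0x8, which is one way to picture the interoperability concern raised above.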
WC-bug-id: https://jira.whamcloud.com/browse/LU-13602 Lustre-commit: adc1bbbf20e0a8a5 ("LU-13602 pcc: add LCM_FL_PCC_RDONLY layout flag") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/40813 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/wiretest.c | 2 ++ include/uapi/linux/lustre/lustre_user.h | 13 +++++++------ 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 4301bd4..c3a8a35 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -1727,6 +1727,8 @@ void lustre_assert_wire_constants(void) (long long)LCM_FL_WRITE_PENDING); LASSERTF(LCM_FL_SYNC_PENDING == 3, "found %lld\n", (long long)LCM_FL_SYNC_PENDING); + LASSERTF(LCM_FL_PCC_RDONLY == 8, "found %lld\n", + (long long)LCM_FL_PCC_RDONLY); /* Checks for struct lmv_mds_md_v1 */ LASSERTF((int)sizeof(struct lmv_mds_md_v1) == 56, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index da15ca8..748c044 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -622,12 +622,13 @@ static inline __u16 mirror_id_of(__u32 id) * on-disk data for lcm_flags. Valid if lcm_magic is LOV_MAGIC_COMP_V1. 
*/ enum lov_comp_md_flags { - /* the least 2 bits are used by FLR to record file state */ - LCM_FL_NONE = 0, - LCM_FL_RDONLY = 1, - LCM_FL_WRITE_PENDING = 2, - LCM_FL_SYNC_PENDING = 3, - LCM_FL_FLR_MASK = 0x3, + /* the least 4 bits are used by FLR to record file state */ + LCM_FL_NONE = 0x0, + LCM_FL_RDONLY = 0x1, + LCM_FL_WRITE_PENDING = 0x2, + LCM_FL_SYNC_PENDING = 0x3, + LCM_FL_PCC_RDONLY = 0x8, + LCM_FL_FLR_MASK = 0xB, }; struct lov_comp_md_v1 { From patchwork Mon Aug 2 19:50:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414671 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C8C6DC4338F for ; Mon, 2 Aug 2021 19:51:20 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7FC6B60724 for ; Mon, 2 Aug 2021 19:51:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 7FC6B60724 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4CCBD352DFC; Mon, 2 Aug 2021 12:51:14 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by
pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1E957352C1D for ; Mon, 2 Aug 2021 12:51:01 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 8CB8A1008064; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 8A765C2F46; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:43 -0400 Message-Id: <1627933851-7603-24-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 23/25] lustre: mdc: set default LMV on ROOT X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Hongchao Zhang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao

To balance MDT usage, set default LMV on ROOT if it's not set. The default stripe offset is "-1", and the default stripe count is "1". Directories created by "mkdir" under ROOT will then be spread across all MDTs according to usage.
WC-bug-id: https://jira.whamcloud.com/browse/LU-13417 Lustre-commit: 3e04b0fd6c3dd363 ("LU-13417 mdd: set default LMV on ROOT") Signed-off-by: Lai Siyao Signed-off-by: Andreas Dilger Signed-off-by: Hongchao Zhang Reviewed-on: https://review.whamcloud.com/38553 Signed-off-by: James Simmons --- fs/lustre/mdc/mdc_request.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 1fb9c46..8b94f6c 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -557,6 +557,13 @@ static int mdc_get_lustre_md(struct obd_export *exp, struct req_capsule *pill, goto out; } + if (md_exp->exp_obd->obd_type->typ_lu == &mdc_device_type) { + CERROR("%s: no LMV, upgrading from old version?\n", + md_exp->exp_obd->obd_name); + rc = 0; + goto out_acl; + } + if (md->body->mbo_valid & OBD_MD_MEA) { lmv_size = md->body->mbo_eadatasize; if (!lmv_size) { @@ -618,6 +625,7 @@ static int mdc_get_lustre_md(struct obd_export *exp, struct req_capsule *pill, } rc = 0; +out_acl: /* for ACL, it's possible that FLACL is set but aclsize is zero. * only when aclsize != 0 there's an actual segment for ACL * in reply buffer. 
From patchwork Mon Aug 2 19:50:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414677 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A1BDEC432BE for ; Mon, 2 Aug 2021 19:51:28 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6388660FC2 for ; Mon, 2 Aug 2021 19:51:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 6388660FC2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B1B48352E27; Mon, 2 Aug 2021 12:51:20 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 928F3352C1D for ; Mon, 2 Aug 2021 12:51:01 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 91712100F35C; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 90631C2F50; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:45 
-0400 Message-Id: <1627933851-7603-26-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 25/25] lustre: llite: enable filesystem-wide default LMV X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao

This change includes three parts:

1. Save the directory depth to ROOT after lookup on the client side.
2. Once a space-balanced default LMV is set on ROOT, and max-inherit/max-inherit-rr is unlimited or not less than the directory depth, new directories are created in QOS or roundrobin mode.
3. Set the ROOT default LMV max-inherit to unlimited and max-inherit-rr to 3, and increase the ratio for creating a subdirectory on the local MDT with the directory depth to ROOT. New directories are thus created by space usage, and the deeper a directory is located, the more likely it is to be created on the local MDT; the top 3 layers are created in roundrobin mode if the system is balanced.
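The depth checks in parts 2 and 3 can be sketched as standalone C, loosely following the shape of ll_qos_mkdir_prep() and lmv_locate_tgt_qos() in the diff below. Everything prefixed toy_ or TOY_ is hypothetical; in particular the sentinel values are placeholders and are not claimed to match the real LMV_INHERIT_* constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinels; the real LMV_INHERIT_* values are not assumed. */
#define TOY_INHERIT_NONE	0u
#define TOY_INHERIT_UNLIMITED	255u

#define TOY_QOS_MKDIR	(1u << 0)	/* stands in for MF_QOS_MKDIR */
#define TOY_RR_MKDIR	(1u << 1)	/* stands in for MF_RR_MKDIR */

static_assert(TOY_INHERIT_NONE != TOY_INHERIT_UNLIMITED,
	      "sentinels must differ");

/* An inherit limit applies if it is unlimited or not less than the depth. */
static bool toy_inherit_covers(unsigned int max_inherit, unsigned int depth)
{
	if (max_inherit == TOY_INHERIT_NONE)
		return false;
	return max_inherit == TOY_INHERIT_UNLIMITED || max_inherit >= depth;
}

/* Decide which mkdir modes the ROOT default LMV requests at this depth. */
static unsigned int toy_mkdir_flags(unsigned int max_inherit,
				    unsigned int max_inherit_rr,
				    unsigned int depth)
{
	unsigned int flags = 0;

	if (toy_inherit_covers(max_inherit, depth)) {
		flags |= TOY_QOS_MKDIR;
		if (toy_inherit_covers(max_inherit_rr, depth))
			flags |= TOY_RR_MKDIR;
	}
	return flags;
}

/* Divisor applied to the stay-on-local-MDT threshold; grows with depth. */
static unsigned int toy_depth_divisor(unsigned int depth)
{
	return 1 + depth / 4;
}
```

With max-inherit unlimited and max-inherit-rr set to 3, a parent at depth 3 still gets both flags while a parent at depth 4 gets only the QOS flag. Because the divisor uses integer division, the local-MDT threshold only starts to shrink once the depth reaches 4.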
WC-bug-id: https://jira.whamcloud.com/browse/LU-14792 Lustre-commit: b9c4dc3c33fe87ec ("LU-14792 llite: enable filesystem-wide default LMV") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/44090 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 5 +++ fs/lustre/llite/dir.c | 2 + fs/lustre/llite/file.c | 5 ++- fs/lustre/llite/llite_internal.h | 5 ++- fs/lustre/llite/llite_lib.c | 17 ++++++++ fs/lustre/llite/namei.c | 74 ++++++++++++++++++++++++++++++--- fs/lustre/llite/statahead.c | 5 ++- fs/lustre/lmv/lmv_obd.c | 32 ++++++++------ fs/lustre/lmv/lproc_lmv.c | 26 +++++++++++- include/uapi/linux/lustre/lustre_user.h | 2 + 10 files changed, 149 insertions(+), 24 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index f619342..7c5e699 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -706,6 +706,8 @@ enum md_op_flags { MF_MDC_CANCEL_FID4 = BIT(3), MF_GET_MDT_IDX = BIT(4), MF_GETATTR_BY_FID = BIT(5), + MF_QOS_MKDIR = BIT(6), + MF_RR_MKDIR = BIT(7), }; enum md_cli_flags { @@ -795,6 +797,9 @@ struct md_op_data { u32 op_projid; + /* mkdir */ + unsigned short op_dir_depth; + u16 op_mirror_id; /* diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 9666534..57f7c3c 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -442,6 +442,8 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump, if (IS_ERR(op_data)) return PTR_ERR(op_data); + op_data->op_dir_depth = ll_i2info(parent)->lli_depth; + if (ll_sbi_has_encrypt(sbi) && (IS_ENCRYPTED(parent) || unlikely(fscrypt_dummy_context_enabled(parent)))) { diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index a4e432e..aa5c662 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -676,8 +676,11 @@ static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize, * of kernel will deal with that later. 
*/ ll_set_lock_data(sbi->ll_md_exp, inode, itp, &bits); - if (bits & MDS_INODELOCK_LOOKUP) + if (bits & MDS_INODELOCK_LOOKUP) { d_lustre_revalidate(de); + ll_update_dir_depth(parent->d_inode, d_inode(de)); + } + /* if DoM bit returned along with LAYOUT bit then there * can be read-on-open data returned. */ diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 2247806..95e4f45 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -178,13 +178,15 @@ struct ll_inode_info { * -- I am the owner of dir statahead. */ pid_t lli_opendir_pid; + /* directory depth to ROOT */ + unsigned short lli_depth; /* stat will try to access statahead entries or start * statahead if this flag is set, and this flag will be * set upon dir open, and cleared when dir is closed, * statahead hit ratio is too low, or start statahead * thread failed. */ - unsigned int lli_sa_enabled:1; + unsigned short lli_sa_enabled:1; /* generation for statahead */ unsigned int lli_sa_generation; /* rw lock protects lli_lsm_md */ @@ -1215,6 +1217,7 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs, u32 flags); int ll_update_inode(struct inode *inode, struct lustre_md *md); void ll_update_inode_flags(struct inode *inode, unsigned int ext_flags); +void ll_update_dir_depth(struct inode *dir, struct inode *inode); int ll_read_inode2(struct inode *inode, void *opaque); void ll_truncate_inode_pages_final(struct inode *inode); void ll_delete_inode(struct inode *inode); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 63d0f02..f540caf 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2483,6 +2483,23 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md) return 0; } +/* update directory depth to ROOT, called after LOOKUP lock is fetched. 
*/ +void ll_update_dir_depth(struct inode *dir, struct inode *inode) +{ + struct ll_inode_info *lli; + + if (!S_ISDIR(inode->i_mode)) + return; + + if (inode == dir) + return; + + lli = ll_i2info(inode); + lli->lli_depth = ll_i2info(dir)->lli_depth + 1; + CDEBUG(D_INODE, DFID" depth %hu\n", PFID(&lli->lli_fid), + lli->lli_depth); +} + void ll_truncate_inode_pages_final(struct inode *inode) { struct address_space *mapping = &inode->i_data; diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 5cc01f0..54b4e0a 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -741,8 +741,10 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request, if (!it_disposition(it, DISP_LOOKUP_NEG)) { /* We have the "lookup" lock, so unhide dentry */ - if (bits & MDS_INODELOCK_LOOKUP) + if (bits & MDS_INODELOCK_LOOKUP) { d_lustre_revalidate(*de); + ll_update_dir_depth(parent, d_inode(*de)); + } if (encrypt) { rc = fscrypt_get_encryption_info(inode); @@ -1415,10 +1417,6 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry, return rc; } - ll_set_lock_data(ll_i2sbi(dir)->ll_md_exp, inode, it, &bits); - if (bits & MDS_INODELOCK_LOOKUP) - d_lustre_revalidate(dentry); - d_instantiate(dentry, inode); if (encrypt) { @@ -1427,8 +1425,17 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry, return rc; } - if (!(ll_i2sbi(inode)->ll_flags & LL_SBI_FILE_SECCTX)) + if (!(ll_i2sbi(inode)->ll_flags & LL_SBI_FILE_SECCTX)) { rc = ll_inode_init_security(dentry, inode, dir); + if (rc) + return rc; + } + + ll_set_lock_data(ll_i2sbi(dir)->ll_md_exp, inode, it, &bits); + if (bits & MDS_INODELOCK_LOOKUP) { + d_lustre_revalidate(dentry); + ll_update_dir_depth(dir, inode); + } return rc; } @@ -1451,6 +1458,58 @@ void ll_update_times(struct ptlrpc_request *request, struct inode *inode) inode->i_ctime.tv_sec = body->mbo_ctime; } +/* once default LMV (space balanced) is set on ROOT, it should take effect if + * default LMV is not set on parent 
directory. + */ +static void ll_qos_mkdir_prep(struct md_op_data *op_data, struct inode *dir) +{ + struct inode *root = dir->i_sb->s_root->d_inode; + struct ll_inode_info *rlli = ll_i2info(root); + struct ll_inode_info *lli = ll_i2info(dir); + struct lmv_stripe_md *lsm; + + op_data->op_dir_depth = lli->lli_depth; + + /* parent directory is striped */ + if (unlikely(lli->lli_lsm_md)) + return; + + /* default LMV set on parent directory */ + if (unlikely(lli->lli_default_lsm_md)) + return; + + /* parent is ROOT */ + if (unlikely(dir == root)) + return; + + /* default LMV not set on ROOT */ + if (!rlli->lli_default_lsm_md) + return; + + down_read(&rlli->lli_lsm_sem); + lsm = rlli->lli_default_lsm_md; + if (!lsm) + goto unlock; + + /* not space balanced */ + if (lsm->lsm_md_master_mdt_index != LMV_OFFSET_DEFAULT) + goto unlock; + + if (lsm->lsm_md_max_inherit != LMV_INHERIT_NONE && + (lsm->lsm_md_max_inherit == LMV_INHERIT_UNLIMITED || + lsm->lsm_md_max_inherit >= lli->lli_depth)) { + op_data->op_flags |= MF_QOS_MKDIR; + if (lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE && + (lsm->lsm_md_max_inherit_rr == LMV_INHERIT_RR_UNLIMITED || + lsm->lsm_md_max_inherit_rr >= lli->lli_depth)) + op_data->op_flags |= MF_RR_MKDIR; + CDEBUG(D_INODE, DFID" requests qos mkdir %#x\n", + PFID(&lli->lli_fid), op_data->op_flags); + } +unlock: + up_read(&rlli->lli_lsm_sem); +} + static int ll_new_node(struct inode *dir, struct dentry *dentry, const char *tgt, umode_t mode, int rdev, u32 opc) @@ -1475,6 +1534,9 @@ static int ll_new_node(struct inode *dir, struct dentry *dentry, goto err_exit; } + if (S_ISDIR(mode)) + ll_qos_mkdir_prep(op_data, dir); + if (sbi->ll_flags & LL_SBI_FILE_SECCTX) { err = ll_dentry_init_security(dentry, mode, &dentry->d_name, &op_data->op_file_secctx_name, diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 8930f61..e00fe58 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -1488,8 +1488,11 @@ static int 
revalidate_statahead_dentry(struct inode *dir, } if ((bits & MDS_INODELOCK_LOOKUP) && - d_lustre_invalid(*dentryp)) + d_lustre_invalid(*dentryp)) { d_lustre_revalidate(*dentryp); + ll_update_dir_depth(dir, (*dentryp)->d_inode); + } + ll_intent_release(&it); } } diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 71bf7811..fb64b6c 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1427,7 +1427,8 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data, return md_close(tgt->ltd_exp, op_data, mod, request); } -static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) +static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt, + unsigned short dir_depth) { struct lu_tgt_desc *tgt, *cur = NULL; u64 total_avail = 0; @@ -1470,10 +1471,10 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) /* if current MDT has above-average space, within range of the QOS * threshold, stay on the same MDT to avoid creating needless remote - * MDT directories. + * MDT directories. It's more likely for low level directories. */ rand = total_avail * (256 - lmv->lmv_qos.lq_threshold_rr) / - (total_usable * 256); + (total_usable * 256 * (1 + dir_depth / 4)); if (cur && cur->ltd_qos.ltq_avail >= rand) { tgt = cur; rc = 0; @@ -1727,12 +1728,14 @@ static inline bool lmv_op_default_qos_mkdir(const struct md_op_data *op_data) { const struct lmv_stripe_md *lsm = op_data->op_default_mea1; - return lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT; + return (op_data->op_flags & MF_QOS_MKDIR) || + (lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT); } -/* mkdir by QoS in two cases: - * 1. 'lfs mkdir -i -1' - * 2. parent default LMV master_mdt_index is -1 +/* mkdir by QoS in three cases: + * 1. ROOT default LMV is space balanced. + * 2. 'lfs mkdir -i -1' + * 3. 
parent default LMV master_mdt_index is -1 * * NB, mkdir by QoS only if parent is not striped, this is to avoid remote * directories under striped directory. @@ -1754,11 +1757,12 @@ static inline bool lmv_op_qos_mkdir(const struct md_op_data *op_data) return false; } -/* if default LMV is set, and its index is LMV_OFFSET_DEFAULT, and - * 1. max_inherit_rr is set and is not LMV_INHERIT_RR_NONE +/* if parent default LMV is space balanced, and + * 1. max_inherit_rr is set * 2. or parent is ROOT - * mkdir roundrobin. - * NB, this also needs to check server is balanced, which is checked by caller. + * mkdir roundrobin. Or if parent doesn't have default LMV, while ROOT default + * LMV requests roundrobin mkdir, do the same. + * NB, this needs to check server is balanced, which is done by caller. */ static inline bool lmv_op_default_rr_mkdir(const struct md_op_data *op_data) { @@ -1767,7 +1771,8 @@ static inline bool lmv_op_default_rr_mkdir(const struct md_op_data *op_data) if (!lmv_op_default_qos_mkdir(op_data)) return false; - return lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE || + return (op_data->op_flags & MF_RR_MKDIR) || + (lsm && lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE) || fid_is_root(&op_data->op_fid1); } @@ -1842,7 +1847,8 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data, } else if (lmv_op_qos_mkdir(op_data)) { struct lmv_tgt_desc *tmp = tgt; - tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds); + tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds, + op_data->op_dir_depth); if (tgt == ERR_PTR(-EAGAIN)) { if (ltd_qos_is_balanced(&lmv->lmv_mdt_descs) && !lmv_op_default_rr_mkdir(op_data) && diff --git a/fs/lustre/lmv/lproc_lmv.c b/fs/lustre/lmv/lproc_lmv.c index 767b40e..b9efae9 100644 --- a/fs/lustre/lmv/lproc_lmv.c +++ b/fs/lustre/lmv/lproc_lmv.c @@ -121,10 +121,21 @@ static ssize_t qos_prio_free_store(struct kobject *kobj, struct obd_device *obd = container_of(kobj, struct obd_device, obd_kset.kobj); struct lmv_obd *lmv = &obd->u.lmv; 
+ char buf[6], *tmp; unsigned int val; int rc; - rc = kstrtouint(buffer, 0, &val); + /* "100%\n\0" should be largest string */ + if (count >= sizeof(buf)) + return -ERANGE; + + strncpy(buf, buffer, sizeof(buf)); + buf[sizeof(buf) - 1] = '\0'; + tmp = strchr(buf, '%'); + if (tmp) + *tmp = '\0'; + + rc = kstrtouint(buf, 0, &val); if (rc) return rc; @@ -158,10 +169,21 @@ static ssize_t qos_threshold_rr_store(struct kobject *kobj, struct obd_device *obd = container_of(kobj, struct obd_device, obd_kset.kobj); struct lmv_obd *lmv = &obd->u.lmv; + char buf[6], *tmp; unsigned int val; int rc; - rc = kstrtouint(buffer, 0, &val); + /* "100%\n\0" should be largest string */ + if (count >= sizeof(buf)) + return -ERANGE; + + strncpy(buf, buffer, sizeof(buf)); + buf[sizeof(buf) - 1] = '\0'; + tmp = strchr(buf, '%'); + if (tmp) + *tmp = '\0'; + + rc = kstrtouint(buf, 0, &val); if (rc) return rc; diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index da15ca8..b317bbf 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -847,6 +847,8 @@ enum { LMV_INHERIT_RR_DEFAULT = 0, /* not inherit any more */ LMV_INHERIT_RR_END = 1, + /* default inherit_rr of ROOT */ + LMV_INHERIT_RR_ROOT = 3, /* max inherit depth */ LMV_INHERIT_RR_MAX = 250, /* [251, 254] are reserved */ From patchwork Mon Aug 2 19:50:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414675 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by 
smtp.lore.kernel.org (Postfix) with ESMTP id 9964CC4338F for ; Mon, 2 Aug 2021 19:51:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 577D560F36 for ; Mon, 2 Aug 2021 19:51:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 577D560F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A138D352DD4; Mon, 2 Aug 2021 12:51:18 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6B23E352BF8 for ; Mon, 2 Aug 2021 12:51:02 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 9ABAE100F35F; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 99B58C2F46; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:48 -0400 Message-Id: <1627933851-7603-29-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 24/26] lustre: llite: enable filesystem-wide default LMV X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao This change has three parts: 1. Save each directory's depth relative to ROOT on the client side after lookup. 2. Once a space-balanced default LMV is set on ROOT, and max-inherit/max-inherit-rr is unlimited or not less than the directory depth, new directories are created in QoS or round-robin mode. 3. Set the ROOT default LMV max-inherit to unlimited and max-inherit-rr to 3, and scale the chance of creating a subdirectory on the local MDT with the directory's depth from ROOT. New directories are thus placed by space usage, and the deeper a directory is located, the more likely it is to be created on the local MDT; the top 3 levels are created in round-robin mode if the system is balanced. WC-bug-id: https://jira.whamcloud.com/browse/LU-14792 Lustre-commit: b9c4dc3c33fe87ec ("LU-14792 llite: enable filesystem-wide default LMV") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/44090 Reviewed-by: Andreas Dilger Reviewed-by: Hongchao Zhang Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 5 +++ fs/lustre/llite/dir.c | 2 + fs/lustre/llite/file.c | 5 ++- fs/lustre/llite/llite_internal.h | 5 ++- fs/lustre/llite/llite_lib.c | 17 ++++++++ fs/lustre/llite/namei.c | 74 ++++++++++++++++++++++++++++++--- fs/lustre/llite/statahead.c | 5 ++- fs/lustre/lmv/lmv_obd.c | 32 ++++++++------ fs/lustre/lmv/lproc_lmv.c | 26 +++++++++++- include/uapi/linux/lustre/lustre_user.h | 2 + 10 files changed, 149 insertions(+), 24 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index f619342..7c5e699 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -706,6 +706,8 @@ enum md_op_flags { MF_MDC_CANCEL_FID4 = BIT(3), MF_GET_MDT_IDX = BIT(4), MF_GETATTR_BY_FID = BIT(5), + MF_QOS_MKDIR = BIT(6), + MF_RR_MKDIR = BIT(7), }; enum md_cli_flags { 
@@ -795,6 +797,9 @@ struct md_op_data { u32 op_projid; + /* mkdir */ + unsigned short op_dir_depth; + u16 op_mirror_id; /* diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 9666534..57f7c3c 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -442,6 +442,8 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump, if (IS_ERR(op_data)) return PTR_ERR(op_data); + op_data->op_dir_depth = ll_i2info(parent)->lli_depth; + if (ll_sbi_has_encrypt(sbi) && (IS_ENCRYPTED(parent) || unlikely(fscrypt_dummy_context_enabled(parent)))) { diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index a4e432e..aa5c662 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -676,8 +676,11 @@ static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize, * of kernel will deal with that later. */ ll_set_lock_data(sbi->ll_md_exp, inode, itp, &bits); - if (bits & MDS_INODELOCK_LOOKUP) + if (bits & MDS_INODELOCK_LOOKUP) { d_lustre_revalidate(de); + ll_update_dir_depth(parent->d_inode, d_inode(de)); + } + /* if DoM bit returned along with LAYOUT bit then there * can be read-on-open data returned. */ diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 2247806..95e4f45 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -178,13 +178,15 @@ struct ll_inode_info { * -- I am the owner of dir statahead. */ pid_t lli_opendir_pid; + /* directory depth to ROOT */ + unsigned short lli_depth; /* stat will try to access statahead entries or start * statahead if this flag is set, and this flag will be * set upon dir open, and cleared when dir is closed, * statahead hit ratio is too low, or start statahead * thread failed. 
*/ - unsigned int lli_sa_enabled:1; + unsigned short lli_sa_enabled:1; /* generation for statahead */ unsigned int lli_sa_generation; /* rw lock protects lli_lsm_md */ @@ -1215,6 +1217,7 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs, u32 flags); int ll_update_inode(struct inode *inode, struct lustre_md *md); void ll_update_inode_flags(struct inode *inode, unsigned int ext_flags); +void ll_update_dir_depth(struct inode *dir, struct inode *inode); int ll_read_inode2(struct inode *inode, void *opaque); void ll_truncate_inode_pages_final(struct inode *inode); void ll_delete_inode(struct inode *inode); diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 63d0f02..f540caf 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -2483,6 +2483,23 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md) return 0; } +/* update directory depth to ROOT, called after LOOKUP lock is fetched. */ +void ll_update_dir_depth(struct inode *dir, struct inode *inode) +{ + struct ll_inode_info *lli; + + if (!S_ISDIR(inode->i_mode)) + return; + + if (inode == dir) + return; + + lli = ll_i2info(inode); + lli->lli_depth = ll_i2info(dir)->lli_depth + 1; + CDEBUG(D_INODE, DFID" depth %hu\n", PFID(&lli->lli_fid), + lli->lli_depth); +} + void ll_truncate_inode_pages_final(struct inode *inode) { struct address_space *mapping = &inode->i_data; diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 5cc01f0..54b4e0a 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -741,8 +741,10 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request, if (!it_disposition(it, DISP_LOOKUP_NEG)) { /* We have the "lookup" lock, so unhide dentry */ - if (bits & MDS_INODELOCK_LOOKUP) + if (bits & MDS_INODELOCK_LOOKUP) { d_lustre_revalidate(*de); + ll_update_dir_depth(parent, d_inode(*de)); + } if (encrypt) { rc = fscrypt_get_encryption_info(inode); @@ -1415,10 +1417,6 @@ static int 
ll_create_it(struct inode *dir, struct dentry *dentry, return rc; } - ll_set_lock_data(ll_i2sbi(dir)->ll_md_exp, inode, it, &bits); - if (bits & MDS_INODELOCK_LOOKUP) - d_lustre_revalidate(dentry); - d_instantiate(dentry, inode); if (encrypt) { @@ -1427,8 +1425,17 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry, return rc; } - if (!(ll_i2sbi(inode)->ll_flags & LL_SBI_FILE_SECCTX)) + if (!(ll_i2sbi(inode)->ll_flags & LL_SBI_FILE_SECCTX)) { rc = ll_inode_init_security(dentry, inode, dir); + if (rc) + return rc; + } + + ll_set_lock_data(ll_i2sbi(dir)->ll_md_exp, inode, it, &bits); + if (bits & MDS_INODELOCK_LOOKUP) { + d_lustre_revalidate(dentry); + ll_update_dir_depth(dir, inode); + } return rc; } @@ -1451,6 +1458,58 @@ void ll_update_times(struct ptlrpc_request *request, struct inode *inode) inode->i_ctime.tv_sec = body->mbo_ctime; } +/* once default LMV (space balanced) is set on ROOT, it should take effect if + * default LMV is not set on parent directory. + */ +static void ll_qos_mkdir_prep(struct md_op_data *op_data, struct inode *dir) +{ + struct inode *root = dir->i_sb->s_root->d_inode; + struct ll_inode_info *rlli = ll_i2info(root); + struct ll_inode_info *lli = ll_i2info(dir); + struct lmv_stripe_md *lsm; + + op_data->op_dir_depth = lli->lli_depth; + + /* parent directory is striped */ + if (unlikely(lli->lli_lsm_md)) + return; + + /* default LMV set on parent directory */ + if (unlikely(lli->lli_default_lsm_md)) + return; + + /* parent is ROOT */ + if (unlikely(dir == root)) + return; + + /* default LMV not set on ROOT */ + if (!rlli->lli_default_lsm_md) + return; + + down_read(&rlli->lli_lsm_sem); + lsm = rlli->lli_default_lsm_md; + if (!lsm) + goto unlock; + + /* not space balanced */ + if (lsm->lsm_md_master_mdt_index != LMV_OFFSET_DEFAULT) + goto unlock; + + if (lsm->lsm_md_max_inherit != LMV_INHERIT_NONE && + (lsm->lsm_md_max_inherit == LMV_INHERIT_UNLIMITED || + lsm->lsm_md_max_inherit >= lli->lli_depth)) { + op_data->op_flags |= 
MF_QOS_MKDIR; + if (lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE && + (lsm->lsm_md_max_inherit_rr == LMV_INHERIT_RR_UNLIMITED || + lsm->lsm_md_max_inherit_rr >= lli->lli_depth)) + op_data->op_flags |= MF_RR_MKDIR; + CDEBUG(D_INODE, DFID" requests qos mkdir %#x\n", + PFID(&lli->lli_fid), op_data->op_flags); + } +unlock: + up_read(&rlli->lli_lsm_sem); +} + static int ll_new_node(struct inode *dir, struct dentry *dentry, const char *tgt, umode_t mode, int rdev, u32 opc) @@ -1475,6 +1534,9 @@ static int ll_new_node(struct inode *dir, struct dentry *dentry, goto err_exit; } + if (S_ISDIR(mode)) + ll_qos_mkdir_prep(op_data, dir); + if (sbi->ll_flags & LL_SBI_FILE_SECCTX) { err = ll_dentry_init_security(dentry, mode, &dentry->d_name, &op_data->op_file_secctx_name, diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c index 8930f61..e00fe58 100644 --- a/fs/lustre/llite/statahead.c +++ b/fs/lustre/llite/statahead.c @@ -1488,8 +1488,11 @@ static int revalidate_statahead_dentry(struct inode *dir, } if ((bits & MDS_INODELOCK_LOOKUP) && - d_lustre_invalid(*dentryp)) + d_lustre_invalid(*dentryp)) { d_lustre_revalidate(*dentryp); + ll_update_dir_depth(dir, (*dentryp)->d_inode); + } + ll_intent_release(&it); } } diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 71bf7811..fb64b6c 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1427,7 +1427,8 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data, return md_close(tgt->ltd_exp, op_data, mod, request); } -static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) +static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt, + unsigned short dir_depth) { struct lu_tgt_desc *tgt, *cur = NULL; u64 total_avail = 0; @@ -1470,10 +1471,10 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt) /* if current MDT has above-average space, within range of the QOS * threshold, stay on the same MDT to avoid 
creating needless remote - * MDT directories. + * MDT directories. It's more likely for low level directories. */ rand = total_avail * (256 - lmv->lmv_qos.lq_threshold_rr) / - (total_usable * 256); + (total_usable * 256 * (1 + dir_depth / 4)); if (cur && cur->ltd_qos.ltq_avail >= rand) { tgt = cur; rc = 0; @@ -1727,12 +1728,14 @@ static inline bool lmv_op_default_qos_mkdir(const struct md_op_data *op_data) { const struct lmv_stripe_md *lsm = op_data->op_default_mea1; - return lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT; + return (op_data->op_flags & MF_QOS_MKDIR) || + (lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT); } -/* mkdir by QoS in two cases: - * 1. 'lfs mkdir -i -1' - * 2. parent default LMV master_mdt_index is -1 +/* mkdir by QoS in three cases: + * 1. ROOT default LMV is space balanced. + * 2. 'lfs mkdir -i -1' + * 3. parent default LMV master_mdt_index is -1 * * NB, mkdir by QoS only if parent is not striped, this is to avoid remote * directories under striped directory. @@ -1754,11 +1757,12 @@ static inline bool lmv_op_qos_mkdir(const struct md_op_data *op_data) return false; } -/* if default LMV is set, and its index is LMV_OFFSET_DEFAULT, and - * 1. max_inherit_rr is set and is not LMV_INHERIT_RR_NONE +/* if parent default LMV is space balanced, and + * 1. max_inherit_rr is set * 2. or parent is ROOT - * mkdir roundrobin. - * NB, this also needs to check server is balanced, which is checked by caller. + * mkdir roundrobin. Or if parent doesn't have default LMV, while ROOT default + * LMV requests roundrobin mkdir, do the same. + * NB, this needs to check server is balanced, which is done by caller. 
*/ static inline bool lmv_op_default_rr_mkdir(const struct md_op_data *op_data) { @@ -1767,7 +1771,8 @@ static inline bool lmv_op_default_rr_mkdir(const struct md_op_data *op_data) if (!lmv_op_default_qos_mkdir(op_data)) return false; - return lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE || + return (op_data->op_flags & MF_RR_MKDIR) || + (lsm && lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE) || fid_is_root(&op_data->op_fid1); } @@ -1842,7 +1847,8 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data, } else if (lmv_op_qos_mkdir(op_data)) { struct lmv_tgt_desc *tmp = tgt; - tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds); + tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds, + op_data->op_dir_depth); if (tgt == ERR_PTR(-EAGAIN)) { if (ltd_qos_is_balanced(&lmv->lmv_mdt_descs) && !lmv_op_default_rr_mkdir(op_data) && diff --git a/fs/lustre/lmv/lproc_lmv.c b/fs/lustre/lmv/lproc_lmv.c index 767b40e..b9efae9 100644 --- a/fs/lustre/lmv/lproc_lmv.c +++ b/fs/lustre/lmv/lproc_lmv.c @@ -121,10 +121,21 @@ static ssize_t qos_prio_free_store(struct kobject *kobj, struct obd_device *obd = container_of(kobj, struct obd_device, obd_kset.kobj); struct lmv_obd *lmv = &obd->u.lmv; + char buf[6], *tmp; unsigned int val; int rc; - rc = kstrtouint(buffer, 0, &val); + /* "100%\n\0" should be largest string */ + if (count >= sizeof(buf)) + return -ERANGE; + + strncpy(buf, buffer, sizeof(buf)); + buf[sizeof(buf) - 1] = '\0'; + tmp = strchr(buf, '%'); + if (tmp) + *tmp = '\0'; + + rc = kstrtouint(buf, 0, &val); if (rc) return rc; @@ -158,10 +169,21 @@ static ssize_t qos_threshold_rr_store(struct kobject *kobj, struct obd_device *obd = container_of(kobj, struct obd_device, obd_kset.kobj); struct lmv_obd *lmv = &obd->u.lmv; + char buf[6], *tmp; unsigned int val; int rc; - rc = kstrtouint(buffer, 0, &val); + /* "100%\n\0" should be largest string */ + if (count >= sizeof(buf)) + return -ERANGE; + + strncpy(buf, buffer, sizeof(buf)); + buf[sizeof(buf) - 1] = '\0'; + tmp = 
strchr(buf, '%'); + if (tmp) + *tmp = '\0'; + + rc = kstrtouint(buf, 0, &val); if (rc) return rc; diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 748c044..1688a53 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -848,6 +848,8 @@ enum { LMV_INHERIT_RR_DEFAULT = 0, /* not inherit any more */ LMV_INHERIT_RR_END = 1, + /* default inherit_rr of ROOT */ + LMV_INHERIT_RR_ROOT = 3, /* max inherit depth */ LMV_INHERIT_RR_MAX = 250, /* [251, 254] are reserved */ From patchwork Mon Aug 2 19:50:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414717 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 256F6C432BE for ; Mon, 2 Aug 2021 19:54:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E3AF460F36 for ; Mon, 2 Aug 2021 19:54:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org E3AF460F36 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C8199352D85; Mon, 2 Aug 2021 12:54:29 
-0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B4846352CA2 for ; Mon, 2 Aug 2021 12:51:02 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 9DE11100F360; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 9CA71C2F4C; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:49 -0400 Message-Id: <1627933851-7603-30-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 25/25] lnet: add "stats reset" to lnetctl X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Cyril Bordage , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Cyril Bordage This new command resets the stats shown by "lnetctl stats show". This is useful when debugging connectivity issues: changes in the stats are easier to spot from a clean state than on top of historical values. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-13299 Lustre-commit: db0b09018e771146 ("LU-13299 lnet: add "stats reset" to lnetctl") Signed-off-by: Cyril Bordage Reviewed-on: https://review.whamcloud.com/44150 Reviewed-by: Andreas Dilger Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lnet/libcfs_ioctl.h | 3 ++- net/lnet/lnet/api-ni.c | 8 ++++++++ 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h index 2c900ef..7b1c880 100644 --- a/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/include/uapi/linux/lnet/libcfs_ioctl.h @@ -155,6 +155,7 @@ struct libcfs_ioctl_data { #define IOC_LIBCFS_GET_UDSP_SIZE _IOWR(IOC_LIBCFS_TYPE, 107, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_UDSP _IOWR(IOC_LIBCFS_TYPE, 108, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_CONST_UDSP_INFO _IOWR(IOC_LIBCFS_TYPE, 109, IOCTL_CONFIG_SIZE) -#define IOC_LIBCFS_MAX_NR 109 +#define IOC_LIBCFS_RESET_LNET_STATS _IOWR(IOC_LIBCFS_TYPE, 110, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_MAX_NR 110 #endif /* __LIBCFS_IOCTL_H__ */ diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 4513d8d..c7df936 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3886,6 +3886,14 @@ u32 lnet_get_dlc_seq_locked(void) return rc; } + case IOC_LIBCFS_RESET_LNET_STATS: + { + mutex_lock(&the_lnet.ln_api_mutex); + lnet_counters_reset(); + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + case IOC_LIBCFS_CONFIG_RTR: config = arg; From patchwork Mon Aug 2 19:50:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 12414699 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.7 required=3.0 tests=BAYES_00, 
HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B2D9FC4338F for ; Mon, 2 Aug 2021 19:52:10 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7277260FC2 for ; Mon, 2 Aug 2021 19:52:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 7277260FC2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 0B611352F66; Mon, 2 Aug 2021 12:51:49 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 35FF4352D06 for ; Mon, 2 Aug 2021 12:51:03 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id A38FC100F364; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id A2AD8C2F53; Mon, 2 Aug 2021 15:50:53 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Mon, 2 Aug 2021 15:50:51 -0400 Message-Id: <1627933851-7603-32-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> References: <1627933851-7603-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 26/26] lnet: add "stats reset" to lnetctl X-BeenThere: lustre-devel@lists.lustre.org 
X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Cyril Bordage , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Cyril Bordage This new command resets the stats shown by "lnetctl stats show". This is useful when debugging connectivity issues: changes in the stats are easier to spot from a clean state than on top of historical values. WC-bug-id: https://jira.whamcloud.com/browse/LU-13299 Lustre-commit: db0b09018e771146 ("LU-13299 lnet: add "stats reset" to lnetctl") Signed-off-by: Cyril Bordage Reviewed-on: https://review.whamcloud.com/44150 Reviewed-by: Andreas Dilger Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lnet/libcfs_ioctl.h | 3 ++- net/lnet/lnet/api-ni.c | 8 ++++++++ 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h index 2c900ef..7b1c880 100644 --- a/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/include/uapi/linux/lnet/libcfs_ioctl.h @@ -155,6 +155,7 @@ struct libcfs_ioctl_data { #define IOC_LIBCFS_GET_UDSP_SIZE _IOWR(IOC_LIBCFS_TYPE, 107, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_UDSP _IOWR(IOC_LIBCFS_TYPE, 108, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_CONST_UDSP_INFO _IOWR(IOC_LIBCFS_TYPE, 109, IOCTL_CONFIG_SIZE) -#define IOC_LIBCFS_MAX_NR 109 +#define IOC_LIBCFS_RESET_LNET_STATS _IOWR(IOC_LIBCFS_TYPE, 110, IOCTL_CONFIG_SIZE) +#define IOC_LIBCFS_MAX_NR 110 #endif /* __LIBCFS_IOCTL_H__ */ diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 4513d8d..c7df936 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -3886,6 +3886,14 @@ u32 lnet_get_dlc_seq_locked(void) return rc; } + case IOC_LIBCFS_RESET_LNET_STATS: + { + mutex_lock(&the_lnet.ln_api_mutex); + 
lnet_counters_reset(); + mutex_unlock(&the_lnet.ln_api_mutex); + return 0; + } + case IOC_LIBCFS_CONFIG_RTR: config = arg;
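
The IOC_LIBCFS_RESET_LNET_STATS handler in the hunk above takes the API mutex, resets the counters, and releases the mutex. A minimal userspace sketch of that pattern follows, for illustration only. It assumes pthreads in place of the kernel mutex, and every name in it (demo_counters, demo_count_send, demo_counters_reset, demo_counters_get) is hypothetical; it is not the LNet implementation, whose counter layout is not shown in this patch.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical counters; the real LNet counters are not part of this patch. */
struct demo_counters {
	unsigned long msgs_sent;
	unsigned long msgs_recv;
	unsigned long drops;
};

static struct demo_counters demo_stats;
static pthread_mutex_t demo_api_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Bump one counter while holding the lock. */
void demo_count_send(void)
{
	pthread_mutex_lock(&demo_api_mutex);
	demo_stats.msgs_sent++;
	pthread_mutex_unlock(&demo_api_mutex);
}

/* Zero every counter under the same lock, so no update lands mid-reset. */
void demo_counters_reset(void)
{
	pthread_mutex_lock(&demo_api_mutex);
	memset(&demo_stats, 0, sizeof(demo_stats));
	pthread_mutex_unlock(&demo_api_mutex);
}

/* Copy out a consistent snapshot of all counters. */
struct demo_counters demo_counters_get(void)
{
	struct demo_counters snap;

	pthread_mutex_lock(&demo_api_mutex);
	snap = demo_stats;
	pthread_mutex_unlock(&demo_api_mutex);
	return snap;
}
```

After a reset, any nonzero value in a snapshot reflects only activity since the reset, which is the property the commit message relies on for debugging.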