From patchwork Tue Dec 22 07:47:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ian Kent X-Patchwork-Id: 11986083 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47953C433DB for ; Tue, 22 Dec 2020 07:48:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1193D23127 for ; Tue, 22 Dec 2020 07:48:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726024AbgLVHsf convert rfc822-to-8bit (ORCPT ); Tue, 22 Dec 2020 02:48:35 -0500 Received: from us-smtp-delivery-44.mimecast.com ([205.139.111.44]:27165 "EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725785AbgLVHse (ORCPT ); Tue, 22 Dec 2020 02:48:34 -0500 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-16-kBZaKX7dOcye1JJY4L2Kzg-1; Tue, 22 Dec 2020 02:47:34 -0500 X-MC-Unique: kBZaKX7dOcye1JJY4L2Kzg-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7237C800D62; Tue, 22 Dec 2020 07:47:33 +0000 (UTC) Received: from mickey.themaw.net (ovpn-116-49.sin2.redhat.com [10.67.116.49]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1031710013C1; Tue, 22 Dec 2020 07:47:27 +0000 (UTC) Subject: [PATCH 1/6] kernfs: move 
revalidate to be near lookup From: Ian Kent To: Fox Chen Cc: Tejun Heo , Greg Kroah-Hartman , Rick Lindsley , Al Viro , David Howells , Miklos Szeredi , linux-fsdevel Date: Tue, 22 Dec 2020 15:47:24 +0800 Message-ID: <160862324477.291330.6410675850055496982.stgit@mickey.themaw.net> In-Reply-To: <160862320263.291330.9467216031366035418.stgit@mickey.themaw.net> References: <160862320263.291330.9467216031366035418.stgit@mickey.themaw.net> User-Agent: StGit/0.21 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=raven@themaw.net X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: themaw.net Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org While the dentry operation kernfs_dop_revalidate() is grouped with the dentry type functions, it also has a strong affinity to the inode operation ->lookup(). In order to take advantage of the VFS negative dentry caching, which can be used to reduce path lookup overhead on non-existent paths, it will need to call kernfs_find_ns(). So, to avoid a forward declaration, move it to be near kernfs_iop_lookup(). There's no functional change from this patch. 
Signed-off-by: Ian Kent --- fs/kernfs/dir.c | 86 ++++++++++++++++++++++++++++--------------------------- 1 file changed, 43 insertions(+), 43 deletions(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index 7a53eed69fef..c52190acda8a 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -548,49 +548,6 @@ void kernfs_put(struct kernfs_node *kn) } EXPORT_SYMBOL_GPL(kernfs_put); -static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags) -{ - struct kernfs_node *kn; - - if (flags & LOOKUP_RCU) - return -ECHILD; - - /* Always perform fresh lookup for negatives */ - if (d_really_is_negative(dentry)) - goto out_bad_unlocked; - - kn = kernfs_dentry_node(dentry); - mutex_lock(&kernfs_mutex); - - /* The kernfs node has been deactivated */ - if (!kernfs_active(kn)) - goto out_bad; - - /* The kernfs node has been moved? */ - if (kernfs_dentry_node(dentry->d_parent) != kn->parent) - goto out_bad; - - /* The kernfs node has been renamed */ - if (strcmp(dentry->d_name.name, kn->name) != 0) - goto out_bad; - - /* The kernfs node has been moved to a different namespace */ - if (kn->parent && kernfs_ns_enabled(kn->parent) && - kernfs_info(dentry->d_sb)->ns != kn->ns) - goto out_bad; - - mutex_unlock(&kernfs_mutex); - return 1; -out_bad: - mutex_unlock(&kernfs_mutex); -out_bad_unlocked: - return 0; -} - -const struct dentry_operations kernfs_dops = { - .d_revalidate = kernfs_dop_revalidate, -}; - /** * kernfs_node_from_dentry - determine kernfs_node associated with a dentry * @dentry: the dentry in question @@ -1073,6 +1030,49 @@ struct kernfs_node *kernfs_create_empty_dir(struct kernfs_node *parent, return ERR_PTR(rc); } +static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags) +{ + struct kernfs_node *kn; + + if (flags & LOOKUP_RCU) + return -ECHILD; + + /* Always perform fresh lookup for negatives */ + if (d_really_is_negative(dentry)) + goto out_bad_unlocked; + + kn = kernfs_dentry_node(dentry); + mutex_lock(&kernfs_mutex); + + /* The kernfs 
node has been deactivated */ + if (!kernfs_active_read(kn)) + goto out_bad; + + /* The kernfs node has been moved? */ + if (kernfs_dentry_node(dentry->d_parent) != kn->parent) + goto out_bad; + + /* The kernfs node has been renamed */ + if (strcmp(dentry->d_name.name, kn->name) != 0) + goto out_bad; + + /* The kernfs node has been moved to a different namespace */ + if (kn->parent && kernfs_ns_enabled(kn->parent) && + kernfs_info(dentry->d_sb)->ns != kn->ns) + goto out_bad; + + mutex_unlock(&kernfs_mutex); + return 1; +out_bad: + mutex_unlock(&kernfs_mutex); +out_bad_unlocked: + return 0; +} + +const struct dentry_operations kernfs_dops = { + .d_revalidate = kernfs_dop_revalidate, +}; + static struct dentry *kernfs_iop_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) From patchwork Tue Dec 22 07:47:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ian Kent X-Patchwork-Id: 11986085 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9905EC433E0 for ; Tue, 22 Dec 2020 07:48:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 56C99225AB for ; Tue, 22 Dec 2020 07:48:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726030AbgLVHst convert rfc822-to-8bit (ORCPT ); Tue, 22 Dec 2020 02:48:49 -0500 Received: from us-smtp-delivery-44.mimecast.com ([205.139.111.44]:35155 "EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S1725785AbgLVHst (ORCPT ); Tue, 22 Dec 2020 02:48:49 -0500 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-354-NnEHhwp-MdWNip3acu6hiw-1; Tue, 22 Dec 2020 02:47:49 -0500 X-MC-Unique: NnEHhwp-MdWNip3acu6hiw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 17830800D62; Tue, 22 Dec 2020 07:47:48 +0000 (UTC) Received: from mickey.themaw.net (ovpn-116-49.sin2.redhat.com [10.67.116.49]) by smtp.corp.redhat.com (Postfix) with ESMTP id 94D57299AD; Tue, 22 Dec 2020 07:47:42 +0000 (UTC) Subject: [PATCH 2/6] kernfs: use VFS negative dentry caching From: Ian Kent To: Fox Chen Cc: Tejun Heo , Greg Kroah-Hartman , Rick Lindsley , Al Viro , David Howells , Miklos Szeredi , linux-fsdevel Date: Tue, 22 Dec 2020 15:47:39 +0800 Message-ID: <160862325932.291330.15146665974057046065.stgit@mickey.themaw.net> In-Reply-To: <160862320263.291330.9467216031366035418.stgit@mickey.themaw.net> References: <160862320263.291330.9467216031366035418.stgit@mickey.themaw.net> User-Agent: StGit/0.21 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: themaw.net Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org If there are many lookups for non-existent paths these negative lookups can lead to a lot of overhead during path walks. The VFS allows dentries to be created as negative and hashed, and caches them so they can be used to reduce the fairly high overhead alloc/free cycle that occurs during these lookups. 
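For readers unfamiliar with negative caching, the idea described above can be sketched in plain userspace C. This is only an illustration of the general technique, not kernfs or VFS code: the names backing_lookup(), cache_insert(), lookup() and the fixed-size table are all hypothetical, and the sketch does not model invalidation of stale negative entries, which is what ->d_revalidate() handles in the patch below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 8
#define NAME_LEN 32

/* Hypothetical backing store: the only names that "exist". */
static const char *const backing_names[] = { "power", "uevent", "subsystem" };

/* Counts how often the (pretend expensive) backing store is searched. */
static unsigned long slow_lookups;

struct cache_entry {
	bool used;
	bool negative;		/* true: name is known not to exist */
	char name[NAME_LEN];
};

static struct cache_entry cache[CACHE_SLOTS];

/* Slow path: search the backing store. */
static bool backing_lookup(const char *name)
{
	size_t i;

	slow_lookups++;
	for (i = 0; i < sizeof(backing_names) / sizeof(backing_names[0]); i++)
		if (strcmp(backing_names[i], name) == 0)
			return true;
	return false;
}

static struct cache_entry *cache_find(const char *name)
{
	size_t i;

	for (i = 0; i < CACHE_SLOTS; i++)
		if (cache[i].used && strcmp(cache[i].name, name) == 0)
			return &cache[i];
	return NULL;
}

/* Remember the result; silently skip caching if there is no room. */
static void cache_insert(const char *name, bool negative)
{
	size_t i;

	if (strlen(name) >= NAME_LEN)
		return;
	for (i = 0; i < CACHE_SLOTS; i++) {
		if (!cache[i].used) {
			cache[i].used = true;
			cache[i].negative = negative;
			strcpy(cache[i].name, name);
			return;
		}
	}
}

/* Fast path first: a cached negative entry answers "no such name"
 * without touching the backing store again.
 */
static bool lookup(const char *name)
{
	struct cache_entry *e;
	bool exists;

	assert(name != NULL);
	e = cache_find(name);
	if (e)
		return !e->negative;
	exists = backing_lookup(name);
	cache_insert(name, !exists);
	return exists;
}
```

In this sketch a second lookup("missing") is answered from the cache, so slow_lookups stays at 1 after the first miss. A real negative cache also needs a way to discard entries once the name is created.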
Signed-off-by: Ian Kent --- fs/kernfs/dir.c | 45 +++++++++++++++++++++++++++++++-------------- 1 file changed, 31 insertions(+), 14 deletions(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index c52190acda8a..34b15b95a1c2 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -1032,17 +1032,37 @@ struct kernfs_node *kernfs_create_empty_dir(struct kernfs_node *parent, static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags) { + struct kernfs_node *parent; struct kernfs_node *kn; if (flags & LOOKUP_RCU) return -ECHILD; - /* Always perform fresh lookup for negatives */ - if (d_really_is_negative(dentry)) - goto out_bad_unlocked; + mutex_lock(&kernfs_mutex); kn = kernfs_dentry_node(dentry); - mutex_lock(&kernfs_mutex); + + /* Negative hashed dentry? */ + if (!kn) { + /* If the kernfs node can be found this is a stale negative + * hashed dentry so it must be discarded and the lookup redone. + */ + parent = kernfs_dentry_node(dentry->d_parent); + if (parent) { + const void *ns = NULL; + + if (kernfs_ns_enabled(parent)) + ns = kernfs_info(dentry->d_parent->d_sb)->ns; + kn = kernfs_find_ns(parent, dentry->d_name.name, ns); + if (kn) + goto out_bad; + } + + /* The kernfs node doesn't exist, leave the dentry negative + * and return success. 
+ */ + goto out; + } /* The kernfs node has been deactivated */ if (!kernfs_active_read(kn)) @@ -1060,12 +1080,11 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags) if (kn->parent && kernfs_ns_enabled(kn->parent) && kernfs_info(dentry->d_sb)->ns != kn->ns) goto out_bad; - +out: mutex_unlock(&kernfs_mutex); return 1; out_bad: mutex_unlock(&kernfs_mutex); -out_bad_unlocked: return 0; } @@ -1080,7 +1099,7 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir, struct dentry *ret; struct kernfs_node *parent = dir->i_private; struct kernfs_node *kn; - struct inode *inode; + struct inode *inode = NULL; const void *ns = NULL; mutex_lock(&kernfs_mutex); @@ -1090,11 +1109,9 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir, kn = kernfs_find_ns(parent, dentry->d_name.name, ns); - /* no such entry */ - if (!kn || !kernfs_active(kn)) { - ret = NULL; - goto out_unlock; - } + /* no such entry, retain as negative hashed dentry */ + if (!kn || !kernfs_active(kn)) + goto out_negative; /* attach dentry and inode */ inode = kernfs_get_inode(dir->i_sb, kn); @@ -1102,10 +1119,10 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir, ret = ERR_PTR(-ENOMEM); goto out_unlock; } - +out_negative: /* instantiate and hash dentry */ ret = d_splice_alias(inode, dentry); - out_unlock: +out_unlock: mutex_unlock(&kernfs_mutex); return ret; } From patchwork Tue Dec 22 07:47:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ian Kent X-Patchwork-Id: 11986087 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by 
smtp.lore.kernel.org (Postfix) with ESMTP id ADC62C433E0 for ; Tue, 22 Dec 2020 07:49:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7AD4822D57 for ; Tue, 22 Dec 2020 07:49:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726082AbgLVHtC convert rfc822-to-8bit (ORCPT ); Tue, 22 Dec 2020 02:49:02 -0500 Received: from us-smtp-delivery-44.mimecast.com ([205.139.111.44]:53476 "EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725850AbgLVHtC (ORCPT ); Tue, 22 Dec 2020 02:49:02 -0500 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-18-EyZ81KT6MgSOJxpq8uvo6g-1; Tue, 22 Dec 2020 02:48:04 -0500 X-MC-Unique: EyZ81KT6MgSOJxpq8uvo6g-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 16E9D15720; Tue, 22 Dec 2020 07:48:03 +0000 (UTC) Received: from mickey.themaw.net (ovpn-116-49.sin2.redhat.com [10.67.116.49]) by smtp.corp.redhat.com (Postfix) with ESMTP id 19D4F17F73; Tue, 22 Dec 2020 07:47:56 +0000 (UTC) Subject: [PATCH 3/6] kernfs: use revision to identify directory node changes From: Ian Kent To: Fox Chen Cc: Tejun Heo , Greg Kroah-Hartman , Rick Lindsley , Al Viro , David Howells , Miklos Szeredi , linux-fsdevel Date: Tue, 22 Dec 2020 15:47:54 +0800 Message-ID: <160862327395.291330.3759464665965297953.stgit@mickey.themaw.net> In-Reply-To: <160862320263.291330.9467216031366035418.stgit@mickey.themaw.net> References: <160862320263.291330.9467216031366035418.stgit@mickey.themaw.net> User-Agent: StGit/0.21 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Authentication-Results: relay.mimecast.com; auth=pass 
smtp.auth=CUSA124A263 smtp.mailfrom=raven@themaw.net X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: themaw.net Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org If a kernfs directory node hasn't changed there's no need to search for an added (or removed) child dentry. Add a revision counter to kernfs directory nodes so it can be used to detect if a directory node has changed. Signed-off-by: Ian Kent --- fs/kernfs/dir.c | 15 ++++++++++++--- fs/kernfs/kernfs-internal.h | 24 ++++++++++++++++++++++++ include/linux/kernfs.h | 5 +++++ 3 files changed, 41 insertions(+), 3 deletions(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index 34b15b95a1c2..aced0bb41083 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -372,6 +372,7 @@ static int kernfs_link_sibling(struct kernfs_node *kn) /* successfully added, account subdir number */ if (kernfs_type(kn) == KERNFS_DIR) kn->parent->dir.subdirs++; + kernfs_inc_rev(kn->parent); return 0; } @@ -394,6 +395,7 @@ static bool kernfs_unlink_sibling(struct kernfs_node *kn) if (kernfs_type(kn) == KERNFS_DIR) kn->parent->dir.subdirs--; + kernfs_inc_rev(kn->parent); rb_erase(&kn->rb, &kn->parent->dir.children); RB_CLEAR_NODE(&kn->rb); @@ -1044,13 +1046,18 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags) /* Negative hashed dentry? */ if (!kn) { - /* If the kernfs node can be found this is a stale negative - * hashed dentry so it must be discarded and the lookup redone. - */ parent = kernfs_dentry_node(dentry->d_parent); if (parent) { const void *ns = NULL; + /* Directory node changed, no, then don't search? */ + if (kernfs_dir_changed(parent, dentry)) + goto out_bad; + + /* If the kernfs node can be found this is a stale + * negative hashed dentry so it must be discarded and + * the lookup redone. 
+ */ if (kernfs_ns_enabled(parent)) ns = kernfs_info(dentry->d_parent->d_sb)->ns; kn = kernfs_find_ns(parent, dentry->d_name.name, ns); @@ -1104,6 +1111,8 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir, mutex_lock(&kernfs_mutex); + kernfs_set_rev(dentry, parent); + if (kernfs_ns_enabled(parent)) ns = kernfs_info(dir->i_sb)->ns; diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h index 7ee97ef59184..0d48a367231d 100644 --- a/fs/kernfs/kernfs-internal.h +++ b/fs/kernfs/kernfs-internal.h @@ -81,6 +81,30 @@ static inline struct kernfs_node *kernfs_dentry_node(struct dentry *dentry) return d_inode(dentry)->i_private; } +static inline void kernfs_set_rev(struct dentry *dentry, + struct kernfs_node *kn) +{ + dentry->d_time = kn->dir.rev; +} + +static inline void kernfs_inc_rev(struct kernfs_node *kn) +{ + if (kernfs_type(kn) == KERNFS_DIR) { + if (!++kn->dir.rev) + kn->dir.rev++; + } +} + +static inline bool kernfs_dir_changed(struct kernfs_node *kn, + struct dentry *dentry) +{ + if (kernfs_type(kn) == KERNFS_DIR) { + if (kn->dir.rev != dentry->d_time) + return true; + } + return false; +} + extern const struct super_operations kernfs_sops; extern struct kmem_cache *kernfs_node_cache, *kernfs_iattrs_cache; diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 9e8ca8743c26..7947acb1163d 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -98,6 +98,11 @@ struct kernfs_elem_dir { * better directly in kernfs_node but is here to save space. */ struct kernfs_root *root; + /* + * Monotonic revision counter, used to identify if a directory + * node has changed during revalidation. 
+ */ + unsigned long rev; }; struct kernfs_elem_symlink { From patchwork Tue Dec 22 07:48:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ian Kent X-Patchwork-Id: 11986089 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B0C7C433E0 for ; Tue, 22 Dec 2020 07:49:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5FD2E225AB for ; Tue, 22 Dec 2020 07:49:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725878AbgLVHtn convert rfc822-to-8bit (ORCPT ); Tue, 22 Dec 2020 02:49:43 -0500 Received: from us-smtp-delivery-44.mimecast.com ([207.211.30.44]:26607 "EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725300AbgLVHtn (ORCPT ); Tue, 22 Dec 2020 02:49:43 -0500 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-223-lku-1dzGPQmrfoQ-Yf5w0g-1; Tue, 22 Dec 2020 02:48:20 -0500 X-MC-Unique: lku-1dzGPQmrfoQ-Yf5w0g-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id D919759; Tue, 22 Dec 2020 07:48:18 +0000 (UTC) Received: from mickey.themaw.net (ovpn-116-49.sin2.redhat.com [10.67.116.49]) by smtp.corp.redhat.com (Postfix) with ESMTP id 498D010013C1; Tue, 22 Dec 2020 07:48:11 
+0000 (UTC) Subject: [PATCH 4/6] kernfs: switch kernfs to use an rwsem From: Ian Kent To: Fox Chen Cc: Tejun Heo , Greg Kroah-Hartman , Rick Lindsley , Al Viro , David Howells , Miklos Szeredi , linux-fsdevel Date: Tue, 22 Dec 2020 15:48:08 +0800 Message-ID: <160862328895.291330.6343298157363590479.stgit@mickey.themaw.net> In-Reply-To: <160862320263.291330.9467216031366035418.stgit@mickey.themaw.net> References: <160862320263.291330.9467216031366035418.stgit@mickey.themaw.net> User-Agent: StGit/0.21 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=raven@themaw.net X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: themaw.net Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org The kernfs global lock restricts the ability to perform kernfs node lookup operations in parallel. Change the kernfs mutex to an rwsem so that, when the opportunity arises, node searches can be done in parallel with path walk lookups. 
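The difference between a mutex and a reader/writer lock can be sketched with a toy, non-blocking model in userspace C11. This is not the kernel's rw_semaphore and is not how kernfs implements its locking: struct toy_rwlock and the toy_* helpers are hypothetical names, and the model only offers "try" operations so that the sharing rules are easy to see.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* state > 0: number of readers, state == 0: free, state == -1: one writer */
struct toy_rwlock {
	atomic_int state;
};

/* Succeeds as long as no writer holds the lock; readers can share it. */
static bool toy_try_read(struct toy_rwlock *l)
{
	int s = atomic_load(&l->state);

	while (s >= 0) {
		/* On failure the CAS reloads s and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
			return true;
	}
	return false;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
	int prev = atomic_fetch_sub(&l->state, 1);

	assert(prev > 0);
	(void)prev;
}

/* Succeeds only when there are no readers and no writer. */
static bool toy_try_write(struct toy_rwlock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

static void toy_write_unlock(struct toy_rwlock *l)
{
	int prev = atomic_exchange(&l->state, 0);

	assert(prev == -1);
	(void)prev;
}
```

With a mutex the second of two concurrent lookups would have to wait. In this model two toy_try_read() calls both succeed, while toy_try_write() fails until every reader has dropped the lock, which mirrors the read-side lookups and write-side add/remove/rename paths in the patch.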
Signed-off-by: Ian Kent --- fs/kernfs/dir.c | 117 ++++++++++++++++++++++++------------------- fs/kernfs/file.c | 4 + fs/kernfs/inode.c | 16 +++--- fs/kernfs/kernfs-internal.h | 5 +- fs/kernfs/mount.c | 12 ++-- fs/kernfs/symlink.c | 4 + 6 files changed, 85 insertions(+), 73 deletions(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index aced0bb41083..fdeae2c6e7ba 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -17,7 +17,7 @@ #include "kernfs-internal.h" -DEFINE_MUTEX(kernfs_mutex); +DECLARE_RWSEM(kernfs_rwsem); static DEFINE_SPINLOCK(kernfs_rename_lock); /* kn->parent and ->name */ static char kernfs_pr_cont_buf[PATH_MAX]; /* protected by rename_lock */ static DEFINE_SPINLOCK(kernfs_idr_lock); /* root->ino_idr */ @@ -26,10 +26,21 @@ static DEFINE_SPINLOCK(kernfs_idr_lock); /* root->ino_idr */ static bool kernfs_active(struct kernfs_node *kn) { - lockdep_assert_held(&kernfs_mutex); return atomic_read(&kn->active) >= 0; } +static bool kernfs_active_write(struct kernfs_node *kn) +{ + lockdep_assert_held_write(&kernfs_rwsem); + return kernfs_active(kn); +} + +static bool kernfs_active_read(struct kernfs_node *kn) +{ + lockdep_assert_held_read(&kernfs_rwsem); + return kernfs_active(kn); +} + static bool kernfs_lockdep(struct kernfs_node *kn) { #ifdef CONFIG_DEBUG_LOCK_ALLOC @@ -340,7 +351,7 @@ static int kernfs_sd_compare(const struct kernfs_node *left, * @kn->parent->dir.children. * * Locking: - * mutex_lock(kernfs_mutex) + * kernfs_rwsem write lock * * RETURNS: * 0 on susccess -EEXIST on failure. @@ -386,7 +397,7 @@ static int kernfs_link_sibling(struct kernfs_node *kn) * removed, %false if @kn wasn't on the rbtree. * * Locking: - * mutex_lock(kernfs_mutex) + * kernfs_rwsem write lock */ static bool kernfs_unlink_sibling(struct kernfs_node *kn) { @@ -457,14 +468,14 @@ void kernfs_put_active(struct kernfs_node *kn) * return after draining is complete. 
*/ static void kernfs_drain(struct kernfs_node *kn) - __releases(&kernfs_mutex) __acquires(&kernfs_mutex) + __releases(&kernfs_rwsem) __acquires(&kernfs_rwsem) { struct kernfs_root *root = kernfs_root(kn); - lockdep_assert_held(&kernfs_mutex); + lockdep_assert_held_write(&kernfs_rwsem); WARN_ON_ONCE(kernfs_active(kn)); - mutex_unlock(&kernfs_mutex); + up_write(&kernfs_rwsem); if (kernfs_lockdep(kn)) { rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_); @@ -483,7 +494,7 @@ static void kernfs_drain(struct kernfs_node *kn) kernfs_drain_open_files(kn); - mutex_lock(&kernfs_mutex); + down_write(&kernfs_rwsem); } /** @@ -722,7 +733,7 @@ int kernfs_add_one(struct kernfs_node *kn) bool has_ns; int ret; - mutex_lock(&kernfs_mutex); + down_write(&kernfs_rwsem); ret = -EINVAL; has_ns = kernfs_ns_enabled(parent); @@ -737,7 +748,7 @@ int kernfs_add_one(struct kernfs_node *kn) if (parent->flags & KERNFS_EMPTY_DIR) goto out_unlock; - if ((parent->flags & KERNFS_ACTIVATED) && !kernfs_active(parent)) + if ((parent->flags & KERNFS_ACTIVATED) && !kernfs_active_write(parent)) goto out_unlock; kn->hash = kernfs_name_hash(kn->name, kn->ns); @@ -753,7 +764,7 @@ int kernfs_add_one(struct kernfs_node *kn) ps_iattr->ia_mtime = ps_iattr->ia_ctime; } - mutex_unlock(&kernfs_mutex); + up_write(&kernfs_rwsem); /* * Activate the new node unless CREATE_DEACTIVATED is requested. 
@@ -767,7 +778,7 @@ int kernfs_add_one(struct kernfs_node *kn) return 0; out_unlock: - mutex_unlock(&kernfs_mutex); + up_write(&kernfs_rwsem); return ret; } @@ -788,7 +799,7 @@ static struct kernfs_node *kernfs_find_ns(struct kernfs_node *parent, bool has_ns = kernfs_ns_enabled(parent); unsigned int hash; - lockdep_assert_held(&kernfs_mutex); + lockdep_assert_held(&kernfs_rwsem); if (has_ns != (bool)ns) { WARN(1, KERN_WARNING "kernfs: ns %s in '%s' for '%s'\n", @@ -820,7 +831,7 @@ static struct kernfs_node *kernfs_walk_ns(struct kernfs_node *parent, size_t len; char *p, *name; - lockdep_assert_held(&kernfs_mutex); + lockdep_assert_held_read(&kernfs_rwsem); /* grab kernfs_rename_lock to piggy back on kernfs_pr_cont_buf */ spin_lock_irq(&kernfs_rename_lock); @@ -860,10 +871,10 @@ struct kernfs_node *kernfs_find_and_get_ns(struct kernfs_node *parent, { struct kernfs_node *kn; - mutex_lock(&kernfs_mutex); + down_read(&kernfs_rwsem); kn = kernfs_find_ns(parent, name, ns); kernfs_get(kn); - mutex_unlock(&kernfs_mutex); + up_read(&kernfs_rwsem); return kn; } @@ -884,10 +895,10 @@ struct kernfs_node *kernfs_walk_and_get_ns(struct kernfs_node *parent, { struct kernfs_node *kn; - mutex_lock(&kernfs_mutex); + down_read(&kernfs_rwsem); kn = kernfs_walk_ns(parent, path, ns); kernfs_get(kn); - mutex_unlock(&kernfs_mutex); + up_read(&kernfs_rwsem); return kn; } @@ -1040,7 +1051,7 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags) if (flags & LOOKUP_RCU) return -ECHILD; - mutex_lock(&kernfs_mutex); + down_read(&kernfs_rwsem); kn = kernfs_dentry_node(dentry); @@ -1088,10 +1099,10 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags) kernfs_info(dentry->d_sb)->ns != kn->ns) goto out_bad; out: - mutex_unlock(&kernfs_mutex); + up_read(&kernfs_rwsem); return 1; out_bad: - mutex_unlock(&kernfs_mutex); + up_read(&kernfs_rwsem); return 0; } @@ -1109,7 +1120,7 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir, struct inode 
*inode = NULL; const void *ns = NULL; - mutex_lock(&kernfs_mutex); + down_read(&kernfs_rwsem); kernfs_set_rev(dentry, parent); @@ -1132,7 +1143,7 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir, /* instantiate and hash dentry */ ret = d_splice_alias(inode, dentry); out_unlock: - mutex_unlock(&kernfs_mutex); + up_read(&kernfs_rwsem); return ret; } @@ -1251,7 +1262,7 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos, { struct rb_node *rbn; - lockdep_assert_held(&kernfs_mutex); + lockdep_assert_held_write(&kernfs_rwsem); /* if first iteration, visit leftmost descendant which may be root */ if (!pos) @@ -1287,7 +1298,7 @@ void kernfs_activate(struct kernfs_node *kn) { struct kernfs_node *pos; - mutex_lock(&kernfs_mutex); + down_write(&kernfs_rwsem); pos = NULL; while ((pos = kernfs_next_descendant_post(pos, kn))) { @@ -1301,14 +1312,14 @@ void kernfs_activate(struct kernfs_node *kn) pos->flags |= KERNFS_ACTIVATED; } - mutex_unlock(&kernfs_mutex); + up_write(&kernfs_rwsem); } static void __kernfs_remove(struct kernfs_node *kn) { struct kernfs_node *pos; - lockdep_assert_held(&kernfs_mutex); + lockdep_assert_held_write(&kernfs_rwsem); /* * Short-circuit if non-root @kn has already finished removal. @@ -1323,7 +1334,7 @@ static void __kernfs_remove(struct kernfs_node *kn) /* prevent any new usage under @kn by deactivating all nodes */ pos = NULL; while ((pos = kernfs_next_descendant_post(pos, kn))) - if (kernfs_active(pos)) + if (kernfs_active_write(pos)) atomic_add(KN_DEACTIVATED_BIAS, &pos->active); /* deactivate and unlink the subtree node-by-node */ @@ -1331,7 +1342,7 @@ static void __kernfs_remove(struct kernfs_node *kn) pos = kernfs_leftmost_descendant(kn); /* - * kernfs_drain() drops kernfs_mutex temporarily and @pos's + * kernfs_drain() drops kernfs_rwsem temporarily and @pos's * base ref could have been put by someone else by the time * the function returns. Make sure it doesn't go away * underneath us. 
@@ -1378,9 +1389,9 @@ static void __kernfs_remove(struct kernfs_node *kn) */ void kernfs_remove(struct kernfs_node *kn) { - mutex_lock(&kernfs_mutex); + down_write(&kernfs_rwsem); __kernfs_remove(kn); - mutex_unlock(&kernfs_mutex); + up_write(&kernfs_rwsem); } /** @@ -1467,17 +1478,17 @@ bool kernfs_remove_self(struct kernfs_node *kn) { bool ret; - mutex_lock(&kernfs_mutex); + down_write(&kernfs_rwsem); kernfs_break_active_protection(kn); /* * SUICIDAL is used to arbitrate among competing invocations. Only * the first one will actually perform removal. When the removal * is complete, SUICIDED is set and the active ref is restored - * while holding kernfs_mutex. The ones which lost arbitration - * waits for SUICDED && drained which can happen only after the - * enclosing kernfs operation which executed the winning instance - * of kernfs_remove_self() finished. + * while holding kernfs_rwsem for write. The ones which lost + * arbitration waits for SUICIDED && drained which can happen only + * after the enclosing kernfs operation which executed the winning + * instance of kernfs_remove_self() finished. */ if (!(kn->flags & KERNFS_SUICIDAL)) { kn->flags |= KERNFS_SUICIDAL; @@ -1495,9 +1506,9 @@ bool kernfs_remove_self(struct kernfs_node *kn) atomic_read(&kn->active) == KN_DEACTIVATED_BIAS) break; - mutex_unlock(&kernfs_mutex); + up_write(&kernfs_rwsem); schedule(); - mutex_lock(&kernfs_mutex); + down_write(&kernfs_rwsem); } finish_wait(waitq, &wait); WARN_ON_ONCE(!RB_EMPTY_NODE(&kn->rb)); @@ -1505,12 +1516,12 @@ bool kernfs_remove_self(struct kernfs_node *kn) } /* - * This must be done while holding kernfs_mutex; otherwise, waiting - * for SUICIDED && deactivated could finish prematurely. + * This must be done while holding kernfs_rwsem for write; otherwise, + * waiting for SUICIDED && deactivated could finish prematurely. 
 	 */
 	kernfs_unbreak_active_protection(kn);
 
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 	return ret;
 }
 
@@ -1534,13 +1545,13 @@ int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name,
 		return -ENOENT;
 	}
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 
 	kn = kernfs_find_ns(parent, name, ns);
 	if (kn)
 		__kernfs_remove(kn);
 
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 
 	if (kn)
 		return 0;
@@ -1566,10 +1577,10 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 	if (!kn->parent)
 		return -EINVAL;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 
 	error = -ENOENT;
-	if (!kernfs_active(kn) || !kernfs_active(new_parent) ||
+	if (!kernfs_active_write(kn) || !kernfs_active_write(new_parent) ||
 	    (new_parent->flags & KERNFS_EMPTY_DIR))
 		goto out;
 
@@ -1620,7 +1631,7 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 
 	error = 0;
  out:
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 	return error;
 }
 
@@ -1640,7 +1651,7 @@ static struct kernfs_node *kernfs_dir_pos(const void *ns,
 	struct kernfs_node *parent, loff_t hash, struct kernfs_node *pos)
 {
 	if (pos) {
-		int valid = kernfs_active(pos) &&
+		int valid = kernfs_active_read(pos) &&
 			pos->parent == parent && hash == pos->hash;
 		kernfs_put(pos);
 		if (!valid)
@@ -1660,7 +1671,7 @@ static struct kernfs_node *kernfs_dir_pos(const void *ns,
 		}
 	}
 	/* Skip over entries which are dying/dead or in the wrong namespace */
-	while (pos && (!kernfs_active(pos) || pos->ns != ns)) {
+	while (pos && (!kernfs_active_read(pos) || pos->ns != ns)) {
 		struct rb_node *node = rb_next(&pos->rb);
 		if (!node)
 			pos = NULL;
@@ -1681,7 +1692,7 @@ static struct kernfs_node *kernfs_dir_next_pos(const void *ns,
 				pos = NULL;
 			else
 				pos = rb_to_kn(node);
-		} while (pos && (!kernfs_active(pos) || pos->ns != ns));
+		} while (pos && (!kernfs_active_read(pos) || pos->ns != ns));
 	}
 	return pos;
 }
@@ -1695,7 +1706,7 @@ static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
 
 	if (!dir_emit_dots(file, ctx))
 		return 0;
-	mutex_lock(&kernfs_mutex);
+	down_read(&kernfs_rwsem);
 
 	if (kernfs_ns_enabled(parent))
 		ns = kernfs_info(dentry->d_sb)->ns;
@@ -1712,12 +1723,12 @@ static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
 		file->private_data = pos;
 		kernfs_get(pos);
 
-		mutex_unlock(&kernfs_mutex);
+		up_read(&kernfs_rwsem);
 		if (!dir_emit(ctx, name, len, ino, type))
 			return 0;
-		mutex_lock(&kernfs_mutex);
+		down_read(&kernfs_rwsem);
 	}
-	mutex_unlock(&kernfs_mutex);
+	up_read(&kernfs_rwsem);
 	file->private_data = NULL;
 	ctx->pos = INT_MAX;
 	return 0;
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index f277d023ebcd..7e391784dbe4 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -879,7 +879,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
 	spin_unlock_irq(&kernfs_notify_lock);
 
 	/* kick fsnotify */
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 
 	list_for_each_entry(info, &kernfs_root(kn)->supers, node) {
 		struct kernfs_node *parent;
@@ -917,7 +917,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
 		iput(inode);
 	}
 
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 	kernfs_put(kn);
 	goto repeat;
 }
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index fc2469a20fed..ddaf18198935 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -106,9 +106,9 @@ int kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr)
 {
 	int ret;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	ret = __kernfs_setattr(kn, iattr);
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 	return ret;
 }
 
@@ -121,7 +121,7 @@ int kernfs_iop_setattr(struct dentry *dentry, struct iattr *iattr)
 	if (!kn)
 		return -EINVAL;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	error = setattr_prepare(dentry, iattr);
 	if (error)
 		goto out;
@@ -134,7 +134,7 @@ int kernfs_iop_setattr(struct dentry *dentry, struct iattr *iattr)
 	setattr_copy(inode, iattr);
 
 out:
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 	return error;
 }
 
@@ -189,9 +189,9 @@ int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
 	struct inode *inode = d_inode(path->dentry);
 	struct kernfs_node *kn = inode->i_private;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	kernfs_refresh_inode(kn, inode);
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 
 	generic_fillattr(inode, stat);
 	return 0;
@@ -281,9 +281,9 @@ int kernfs_iop_permission(struct inode *inode, int mask)
 
 	kn = inode->i_private;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	kernfs_refresh_inode(kn, inode);
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 
 	return generic_permission(inode, mask);
 }
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index 0d48a367231d..a7b0e2074260 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include
@@ -69,7 +70,7 @@ struct kernfs_super_info {
 	 */
 	const void *ns;
 
-	/* anchored at kernfs_root->supers, protected by kernfs_mutex */
+	/* anchored at kernfs_root->supers, protected by kernfs_rwsem */
 	struct list_head node;
 };
 #define kernfs_info(SB) ((struct kernfs_super_info *)(SB->s_fs_info))
@@ -123,7 +124,7 @@ int __kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr);
 /*
  * dir.c
  */
-extern struct mutex kernfs_mutex;
+extern struct rw_semaphore kernfs_rwsem;
 extern const struct dentry_operations kernfs_dops;
 extern const struct file_operations kernfs_dir_fops;
 extern const struct inode_operations kernfs_dir_iops;
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 9dc7e7a64e10..baa4155ba2ed 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -255,9 +255,9 @@ static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *k
 	sb->s_shrink.seeks = 0;
 
 	/* get root inode, initialize and unlock it */
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	inode = kernfs_get_inode(sb, info->root->kn);
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 	if (!inode) {
 		pr_debug("kernfs: could not get root inode\n");
 		return -ENOMEM;
@@ -344,9 +344,9 @@ int kernfs_get_tree(struct fs_context *fc)
 		}
 		sb->s_flags |= SB_ACTIVE;
 
-		mutex_lock(&kernfs_mutex);
+		down_write(&kernfs_rwsem);
 		list_add(&info->node, &info->root->supers);
-		mutex_unlock(&kernfs_mutex);
+		up_write(&kernfs_rwsem);
 	}
 
 	fc->root = dget(sb->s_root);
@@ -372,9 +372,9 @@ void kernfs_kill_sb(struct super_block *sb)
 {
 	struct kernfs_super_info *info = kernfs_info(sb);
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	list_del(&info->node);
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 
 	/*
 	 * Remove the superblock from fs_supers/s_instances
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 5432883d819f..7246b470de3c 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -116,9 +116,9 @@ static int kernfs_getlink(struct inode *inode, char *path)
 	struct kernfs_node *target = kn->symlink.target_kn;
 	int error;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	error = kernfs_get_target_path(parent, target, path);
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 
 	return error;
 }

From patchwork Tue Dec 22 07:48:24 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ian Kent
X-Patchwork-Id: 11986091
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 64BDAC433E0 for ; Tue, 22 Dec 2020 07:49:56 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by
	mail.kernel.org (Postfix) with ESMTP id 2344D225AB for ; Tue, 22 Dec 2020 07:49:56 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726010AbgLVHtz convert rfc822-to-8bit (ORCPT ); Tue, 22 Dec 2020 02:49:55 -0500
Received: from us-smtp-delivery-44.mimecast.com ([207.211.30.44]:20556 "EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725300AbgLVHtz (ORCPT ); Tue, 22 Dec 2020 02:49:55 -0500
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-449-G6FqNAIFO-WUMl6zQO5Gcw-1; Tue, 22 Dec 2020 02:48:34 -0500
X-MC-Unique: G6FqNAIFO-WUMl6zQO5Gcw-1
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 64CB11005513; Tue, 22 Dec 2020 07:48:33 +0000 (UTC)
Received: from mickey.themaw.net (ovpn-116-49.sin2.redhat.com [10.67.116.49]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0DCC55D6D1; Tue, 22 Dec 2020 07:48:27 +0000 (UTC)
Subject: [PATCH 5/6] kernfs: stay in rcu-walk mode if possible
From: Ian Kent
To: Fox Chen
Cc: Tejun Heo , Greg Kroah-Hartman , Rick Lindsley , Al Viro , David Howells , Miklos Szeredi , linux-fsdevel
Date: Tue, 22 Dec 2020 15:48:24 +0800
Message-ID: <160862330474.291330.11664503360150456908.stgit@mickey.themaw.net>
In-Reply-To: <160862320263.291330.9467216031366035418.stgit@mickey.themaw.net>
References: <160862320263.291330.9467216031366035418.stgit@mickey.themaw.net>
User-Agent: StGit/0.21
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15
Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=raven@themaw.net
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: themaw.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

During path
walks in sysfs (kernfs), needing to take a reference to a mount doesn't
happen often since the walk won't be crossing mount point boundaries.

Staying in rcu-walk mode where possible wouldn't normally give much
improvement either.

But when there are many concurrent path walks and d_lock contention is
high, dget() will often need to resort to taking a spin lock to get the
reference. That can happen each time the reference is passed from
component to component, so in the high contention case taking the
reference itself adds to the contention.

Therefore, staying in rcu-walk mode when possible will reduce contention.

Signed-off-by: Ian Kent
---
 fs/kernfs/dir.c |   48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index fdeae2c6e7ba..50c5c8c886af 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1048,8 +1048,54 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
 	struct kernfs_node *parent;
 	struct kernfs_node *kn;
 
-	if (flags & LOOKUP_RCU)
+	if (flags & LOOKUP_RCU) {
+		parent = kernfs_dentry_node(dentry->d_parent);
+
+		/* Directory node changed, no, then don't search? */
+		if (!kernfs_dir_changed(parent, dentry))
+			return 1;
+
+		kn = kernfs_dentry_node(dentry);
+		if (!kn) {
+			/* Negative hashed dentry, tell the VFS to switch to
+			 * ref-walk mode and call us again so that node
+			 * existence can be checked.
+			 */
+			if (!d_unhashed(dentry))
+				return -ECHILD;
+
+			/* Negative unhashed dentry, this shouldn't happen
+			 * because this case occurs in ref-walk mode after
+			 * dentry allocation which is followed by a call
+			 * to ->lookup(). But if it does happen the dentry
+			 * is surely invalid.
+			 */
+			return 0;
+		}
+
+		/* Since the dentry is positive (we got the kernfs node) a
+		 * kernfs node reference was held at the time. Now if the
+		 * dentry reference count is still greater than 0 it's still
+		 * positive so take a reference to the node to perform an
+		 * active check.
+		 */
+		if (d_count(dentry) <= 0 || !atomic_inc_not_zero(&kn->count))
+			return -ECHILD;
+
+		/* The kernfs node reference count was greater than 0, if
+		 * it's active continue in rcu-walk mode.
+		 */
+		if (kernfs_active_read(kn)) {
+			kernfs_put(kn);
+			return 1;
+		}
+
+		/* Otherwise, just tell the VFS to switch to ref-walk mode
+		 * and call us again so the kernfs node can be validated.
+		 */
+		kernfs_put(kn);
 		return -ECHILD;
+	}
 
 	down_read(&kernfs_rwsem);
 

From patchwork Tue Dec 22 07:48:39 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ian Kent
X-Patchwork-Id: 11986093
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82137C433DB for ; Tue, 22 Dec 2020 07:50:27 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4434B225AB for ; Tue, 22 Dec 2020 07:50:27 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726129AbgLVHuL convert rfc822-to-8bit (ORCPT ); Tue, 22 Dec 2020 02:50:11 -0500
Received: from us-smtp-delivery-44.mimecast.com ([207.211.30.44]:46286 "EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725300AbgLVHuL (ORCPT ); Tue, 22 Dec 2020 02:50:11 -0500
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
	us-mta-77-ltgZ50APOYGdCz0bgvCKZA-1; Tue, 22 Dec 2020 02:48:49 -0500
X-MC-Unique: ltgZ50APOYGdCz0bgvCKZA-1
Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1418A15727; Tue, 22 Dec 2020 07:48:48 +0000 (UTC)
Received: from mickey.themaw.net (ovpn-116-49.sin2.redhat.com [10.67.116.49]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8B12B1B466; Tue, 22 Dec 2020 07:48:41 +0000 (UTC)
Subject: [PATCH 6/6] kernfs: add a spinlock to kernfs iattrs for inode updates
From: Ian Kent
To: Fox Chen
Cc: Tejun Heo , Greg Kroah-Hartman , Rick Lindsley , Al Viro , David Howells , Miklos Szeredi , linux-fsdevel
Date: Tue, 22 Dec 2020 15:48:39 +0800
Message-ID: <160862331927.291330.16497188823501358991.stgit@mickey.themaw.net>
In-Reply-To: <160862320263.291330.9467216031366035418.stgit@mickey.themaw.net>
References: <160862320263.291330.9467216031366035418.stgit@mickey.themaw.net>
User-Agent: StGit/0.21
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16
Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=raven@themaw.net
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: themaw.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

The inode operations .permission() and .getattr() take the kernfs rwsem
write lock, but all that's needed is to keep the rb tree stable while
the inode attributes are updated and to protect the update itself
against concurrent changes.

.permission() is called frequently during path walks and can cause
quite a bit of contention between kernfs node operations and path
walks when the number of concurrent walks is high.

To change kernfs_iop_getattr() and kernfs_iop_permission() to take the
rwsem read lock instead of the write lock, an additional lock is needed
to protect against multiple processes concurrently updating the inode
attributes and link count in kernfs_refresh_inode().

So add a spin lock to the kernfs_iattrs structure to protect these
inode attribute updates and use it in kernfs_refresh_inode().

Signed-off-by: Ian Kent
Reported-by: kernel test robot
---
 fs/kernfs/inode.c           |   11 +++++++----
 fs/kernfs/kernfs-internal.h |    1 +
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index ddaf18198935..f583dde70174 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -55,6 +55,7 @@ static struct kernfs_iattrs *__kernfs_iattrs(struct kernfs_node *kn, int alloc)
 	simple_xattrs_init(&kn->iattr->xattrs);
 	atomic_set(&kn->iattr->nr_user_xattrs, 0);
 	atomic_set(&kn->iattr->user_xattr_size, 0);
+	spin_lock_init(&kn->iattr->inode_lock);
 out_unlock:
 	ret = kn->iattr;
 	mutex_unlock(&iattr_mutex);
@@ -171,6 +172,7 @@ static void kernfs_refresh_inode(struct kernfs_node *kn, struct inode *inode)
 {
 	struct kernfs_iattrs *attrs = kn->iattr;
 
+	spin_lock(&attrs->inode_lock);
 	inode->i_mode = kn->mode;
 	if (attrs)
 		/*
@@ -181,6 +183,7 @@ static void kernfs_refresh_inode(struct kernfs_node *kn, struct inode *inode)
 
 	if (kernfs_type(kn) == KERNFS_DIR)
 		set_nlink(inode, kn->dir.subdirs + 2);
+	spin_unlock(&attrs->inode_lock);
 }
 
 int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
@@ -189,9 +192,9 @@ int kernfs_iop_getattr(const struct path *path, struct kstat *stat,
 	struct inode *inode = d_inode(path->dentry);
 	struct kernfs_node *kn = inode->i_private;
 
-	down_write(&kernfs_rwsem);
+	down_read(&kernfs_rwsem);
 	kernfs_refresh_inode(kn, inode);
-	up_write(&kernfs_rwsem);
+	up_read(&kernfs_rwsem);
 
 	generic_fillattr(inode, stat);
 	return 0;
@@ -281,9 +284,9 @@ int kernfs_iop_permission(struct inode *inode, int mask)
 
 	kn = inode->i_private;
 
-	down_write(&kernfs_rwsem);
+	down_read(&kernfs_rwsem);
 	kernfs_refresh_inode(kn, inode);
-	up_write(&kernfs_rwsem);
+	up_read(&kernfs_rwsem);
 
 	return generic_permission(inode, mask);
 }
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index a7b0e2074260..184e4424b389 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -29,6 +29,7 @@ struct kernfs_iattrs {
 	struct simple_xattrs xattrs;
 	atomic_t nr_user_xattrs;
 	atomic_t user_xattr_size;
+	spinlock_t inode_lock;
 };
 
 /* +1 to avoid triggering overflow warning when negating it */