[11/19] fs: add new shutdown_sb and free_sb methods

Message ID	20230913111013.77623-12-hch@lst.de (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-nfs-owner@vger.kernel.org> From: Christoph Hellwig <hch@lst.de> To: Christian Brauner <brauner@kernel.org>, Al Viro <viro@zeniv.linux.org.uk> Cc: Heiko Carstens <hca@linux.ibm.com>, Vasily Gorbik <gor@linux.ibm.com>, Alexander Gordeev <agordeev@linux.ibm.com>, Fenghua Yu <fenghua.yu@intel.com>, Reinette Chatre <reinette.chatre@intel.com>, Miquel Raynal <miquel.raynal@bootlin.com>, Richard Weinberger <richard@nod.at>, Vignesh Raghavendra <vigneshr@ti.com>, Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>, Tejun Heo <tj@kernel.org>, Trond Myklebust <trond.myklebust@hammerspace.com>, Anna Schumaker <anna@kernel.org>, Kees Cook <keescook@chromium.org>, Damien Le Moal <dlemoal@kernel.org>, Naohiro Aota <naohiro.aota@wdc.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org, linux-hardening@vger.kernel.org, cgroups@vger.kernel.org Subject: [PATCH 11/19] fs: add new shutdown_sb and free_sb methods Date: Wed, 13 Sep 2023 08:10:05 -0300 Message-Id: <20230913111013.77623-12-hch@lst.de> In-Reply-To: <20230913111013.77623-1-hch@lst.de> References: <20230913111013.77623-1-hch@lst.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[01/19] fs: reflow deactivate_locked_super \| expand [01/19] fs: reflow deactivate_locked_super [02/19] fs: make ->kill_sb optional [03/19] fs: release anon dev_t in deactivate_locked_super [04/19] NFS: remove the s_dev field from struct nfs_server [05/19] fs: assign an anon dev_t in common code [06/19] qibfs: use simple_release_fs [07/19] hypfs: use d_genocide to kill fs entries [08/19] pstore: shrink the pstore_sb_lock critical section in pstore_kill_sb [09/19] zonefs: remove duplicate cleanup in zonefs_fill_super [10/19] USB: gadget/legacy: remove sb_mutex [11/19] fs: add new shutdown_sb and free_sb methods [12/19] fs: convert kill_litter_super to litter_shutdown_sb [13/19] fs: convert kill_block_super to block_free_sb [14/19] jffs2: convert to ->shutdown_sb and ->free_sb [15/19] kernfs: split ->kill_sb [16/19] x86/resctrl: release rdtgroup_mutex and the CPU hotplug lock in rdt_shutdown_sb [17/19] NFS: move nfs_kill_super to fs_context.c [18/19] fs: simple ->shutdown_sb and ->free_sb conversions [19/19] fs: remove ->kill_sb

Message ID

20230913111013.77623-12-hch@lst.de (mailing list archive)

State

New, archived

Headers

From: Christoph Hellwig <hch@lst.de>
To: Christian Brauner <brauner@kernel.org>,
        Al Viro <viro@zeniv.linux.org.uk>
Cc: Heiko Carstens <hca@linux.ibm.com>,
        Vasily Gorbik <gor@linux.ibm.com>,
        Alexander Gordeev <agordeev@linux.ibm.com>,
        Fenghua Yu <fenghua.yu@intel.com>,
        Reinette Chatre <reinette.chatre@intel.com>,
        Miquel Raynal <miquel.raynal@bootlin.com>,
        Richard Weinberger <richard@nod.at>,
        Vignesh Raghavendra <vigneshr@ti.com>,
        Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>,
        Tejun Heo <tj@kernel.org>,
        Trond Myklebust <trond.myklebust@hammerspace.com>,
        Anna Schumaker <anna@kernel.org>,
        Kees Cook <keescook@chromium.org>,
        Damien Le Moal <dlemoal@kernel.org>,
        Naohiro Aota <naohiro.aota@wdc.com>,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
        linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org,
        linux-nfs@vger.kernel.org, linux-hardening@vger.kernel.org,
        cgroups@vger.kernel.org
Subject: [PATCH 11/19] fs: add new shutdown_sb and free_sb methods
Date: Wed, 13 Sep 2023 08:10:05 -0300
Message-Id: <20230913111013.77623-12-hch@lst.de>
In-Reply-To: <20230913111013.77623-1-hch@lst.de>
References: <20230913111013.77623-1-hch@lst.de>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

[01/19] fs: reflow deactivate_locked_super | expand

Commit Message

Christoph Hellwig Sept. 13, 2023, 11:10 a.m. UTC

Currently super_blocks are shut down using the ->kill_sb method, which
must call generic_shutdown_super, but allows the file system to
add extra work before or after the call to generic_shutdown_super.

File systems tend to get rather confused by this, so add an alternative
shutdown sequence where generic_shutdown_super is called by the core
code, and there are extra ->shutdown_sb and ->free_sb hooks before and
after it.  To remove the amount of boilerplate code ->shutdown_sb is only
called if the super has finished initialization and ->d_root is set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/filesystems/locking.rst |  4 ++++
 Documentation/filesystems/vfs.rst     | 12 ++++++++++++
 fs/super.c                            |  9 +++++++--
 include/linux/fs.h                    |  2 ++
 4 files changed, 25 insertions(+), 2 deletions(-)

Comments

Al Viro Sept. 14, 2023, 2:07 a.m. UTC | #1

On Wed, Sep 13, 2023 at 08:10:05AM -0300, Christoph Hellwig wrote:
> Currently super_blocks are shut down using the ->kill_sb method, which
> must call generic_shutdown_super, but allows the file system to
> add extra work before or after the call to generic_shutdown_super.
> 
> File systems tend to get rather confused by this, so add an alternative
> shutdown sequence where generic_shutdown_super is called by the core
> code, and there are extra ->shutdown_sb and ->free_sb hooks before and
> after it.  To remove the amount of boilerplate code ->shutdown_sb is only
> called if the super has finished initialization and ->d_root is set.

The last sentence doesn't match the patchset.  That aside, there
is an issue with method names.

->shutdown_sb() is... odd.  ->begin_shutdown_sb(), perhaps?  For the
majority of filesystems it's NULL, after all...

Worse, ->free_sb() is seriously misguiding - the name implies that
we are, well, freeing a superblock passed to it.  Which is not what is
happening here - superblock itself is freed only when all passive
references go away.  It's asking for trouble down the road.

We already have more than enough confusion in the area.  Note, BTW,
that there's a delicate issue around RCU accesses and freeing stuff -
->d_compare() can bloody well be called when superblock is getting
shut down.  For anything that might be needed by it (or by other
RCU'd methods) we must arrange for RCU-delayed destruction.
E.g. in case of fatfs we have sbi freeing done via call_rcu() (from
fat_put_super(), called by generic_shutdown_super()).

<checks the current tree>

Oh, bugger...  AFAICS, exfat has a problem - exfat_free_sbi() is called
directly from exfat_kill_sb(), without any concern for this:
static int exfat_utf8_d_cmp(const struct dentry *dentry, unsigned int len,
                const char *str, const struct qstr *name)
{
        struct super_block *sb = dentry->d_sb;
        unsigned int alen = exfat_striptail_len(name->len, name->name,
                                EXFAT_SB(sb)->options.keep_last_dots);

That kfree() needs to be RCU-delayed...  While we are at it, there's
this:
static int exfat_d_hash(const struct dentry *dentry, struct qstr *qstr)
{
        struct super_block *sb = dentry->d_sb;
        struct nls_table *t = EXFAT_SB(sb)->nls_io;
and we need this
        unload_nls(sbi->nls_io);
in exfat_put_super() RCU-delayed as well.  And I suspect that
        exfat_free_upcase_table(sbi);
right after it needs the same treatment.

AFFS: similar problem, wants ->s_fs_info freeing RCU-delayed.

hfsplus: similar, including non-delayed unlock_nls() calls.

ntfs3:
        /*
         * Try slow way with current upcase table
         */
        sbi = dentry->d_sb->s_fs_info;
        uni1 = __getname();
        if (!uni1)
                return -ENOMEM;
__getname().  "Give me a page and you might block, while you are
at it".  Done from ->d_compare().  Called under dentry->d_lock
and rcu_read_lock().  OK, any further investigation of that
one is... probably not worth bothering with at that point.

Other in-tree instances appear to be correct.  I'll push fixes for
those (well, ntfs3 aside) out tomorrow.

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 7be2900806c853..c33e2f03ed1f69 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -220,7 +220,9 @@  prototypes::
 
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);
+	void (*shutdown_sb) (struct super_block *);
 	void (*kill_sb) (struct super_block *);
+	void (*free_sb) (struct super_block *);
 
 locking rules:
 
@@ -228,7 +230,9 @@  locking rules:
 ops		may block
 =======		=========
 mount		yes
+shutdown_sb	yes
 kill_sb		yes
+free_sb		yes
 =======		=========
 
 ->mount() returns ERR_PTR or the root dentry; its superblock should be locked
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 99acc2e9867391..1a7c6926c31f34 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -119,7 +119,9 @@  members are defined:
 		const struct fs_parameter_spec *parameters;
 		struct dentry *(*mount) (struct file_system_type *, int,
 			const char *, void *);
+		void (*shutdown_sb) (struct super_block *);
 		void (*kill_sb) (struct super_block *);
+		void (*free_sb) (struct super_block *);
 		struct module *owner;
 		struct file_system_type * next;
 		struct hlist_head fs_supers;
@@ -155,10 +157,20 @@  members are defined:
 	the method to call when a new instance of this filesystem should
 	be mounted
 
+``shutdown_sb``
+	Cleanup after a super_block has reached a zero active count, and before
+	the VFS level cleanup happens.  Typical picks all fs-specific objects
+	(if any) that need destruction out of superblock and releases them.
+	Note: dentries and inodes are normally taken care of and do not need
+	specific handling unless they are pinned by kernel users.
+
 ``kill_sb``
 	the method to call when an instance of this filesystem should be
 	shut down
 
+``free_sb``
+	Free file system specific resources like sb->s_fs_info that are
+	still needed while inodes are freed during umount.
 
 ``owner``
 	for internal VFS use: you should initialize this to THIS_MODULE
diff --git a/fs/super.c b/fs/super.c
index 5c685b4944c2d6..8e173eccc8c113 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -480,10 +480,15 @@  void deactivate_locked_super(struct super_block *s)
 
 	unregister_shrinker(&s->s_shrink);
 
-	if (fs->kill_sb)
+	if (fs->kill_sb) {
 		fs->kill_sb(s);
-	else
+	} else {
+		if (fs->shutdown_sb)
+			fs->shutdown_sb(s);
 		generic_shutdown_super(s);
+		if (fs->free_sb)
+			fs->free_sb(s);
+	}
 
 	kill_super_notify(s);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 31b6b235b36efa..12fff7df3cc46b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2341,6 +2341,8 @@  struct file_system_type {
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);
 	void (*kill_sb) (struct super_block *);
+	void (*shutdown_sb)(struct super_block *sb);
+	void (*free_sb)(struct super_block *sb);
 	struct module *owner;
 	struct file_system_type * next;
 	struct hlist_head fs_supers;

[11/19] fs: add new shutdown_sb and free_sb methods

Commit Message

Comments

Patch