diff mbox series

[11/19] fs: add new shutdown_sb and free_sb methods

Message ID 20230913111013.77623-12-hch@lst.de (mailing list archive)
State New, archived
Headers show
Series [01/19] fs: reflow deactivate_locked_super | expand

Commit Message

Christoph Hellwig Sept. 13, 2023, 11:10 a.m. UTC
Currently super_blocks are shut down using the ->kill_sb method, which
must call generic_shutdown_super, but allows the file system to
add extra work before or after the call to generic_shutdown_super.

File systems tend to get rather confused by this, so add an alternative
shutdown sequence where generic_shutdown_super is called by the core
code, and there are extra ->shutdown_sb and ->free_sb hooks before and
after it.  To remove the amount of boilerplate code ->shutdown_sb is only
called if the super has finished initialization and ->d_root is set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/filesystems/locking.rst |  4 ++++
 Documentation/filesystems/vfs.rst     | 12 ++++++++++++
 fs/super.c                            |  9 +++++++--
 include/linux/fs.h                    |  2 ++
 4 files changed, 25 insertions(+), 2 deletions(-)

Comments

Al Viro Sept. 14, 2023, 2:07 a.m. UTC | #1
On Wed, Sep 13, 2023 at 08:10:05AM -0300, Christoph Hellwig wrote:
> Currently super_blocks are shut down using the ->kill_sb method, which
> must call generic_shutdown_super, but allows the file system to
> add extra work before or after the call to generic_shutdown_super.
> 
> File systems tend to get rather confused by this, so add an alternative
> shutdown sequence where generic_shutdown_super is called by the core
> code, and there are extra ->shutdown_sb and ->free_sb hooks before and
> after it.  To remove the amount of boilerplate code ->shutdown_sb is only
> called if the super has finished initialization and ->d_root is set.

The last sentence doesn't match the patchset.  That aside, there
is an issue with method names.

->shutdown_sb() is... odd.  ->begin_shutdown_sb(), perhaps?  For the
majority of filesystems it's NULL, after all...

Worse, ->free_sb() is seriously misguiding - the name implies that
we are, well, freeing a superblock passed to it.  Which is not what is
happening here - superblock itself is freed only when all passive
references go away.  It's asking for trouble down the road.

We already have more than enough confusion in the area.  Note, BTW,
that there's a delicate issue around RCU accesses and freeing stuff -
->d_compare() can bloody well be called when superblock is getting
shut down.  For anything that might be needed by it (or by other
RCU'd methods) we must arrange for RCU-delayed destruction.
E.g. in case of fatfs we have sbi freeing done via call_rcu() (from
fat_put_super(), called by generic_shutdown_super()).

<checks the current tree>

Oh, bugger...  AFAICS, exfat has a problem - exfat_free_sbi() is called
directly from exfat_kill_sb(), without any concern for this:
static int exfat_utf8_d_cmp(const struct dentry *dentry, unsigned int len,
                const char *str, const struct qstr *name)
{
        struct super_block *sb = dentry->d_sb;
        unsigned int alen = exfat_striptail_len(name->len, name->name,
                                EXFAT_SB(sb)->options.keep_last_dots);

That kfree() needs to be RCU-delayed...  While we are at it, there's
this:
static int exfat_d_hash(const struct dentry *dentry, struct qstr *qstr)
{
        struct super_block *sb = dentry->d_sb;
        struct nls_table *t = EXFAT_SB(sb)->nls_io;
and we need this
        unload_nls(sbi->nls_io);
in exfat_put_super() RCU-delayed as well.  And I suspect that
        exfat_free_upcase_table(sbi);
right after it needs the same treatment.

AFFS: similar problem, wants ->s_fs_info freeing RCU-delayed.

hfsplus: similar, including non-delayed unlock_nls() calls.

ntfs3:
        /*
         * Try slow way with current upcase table
         */
        sbi = dentry->d_sb->s_fs_info;
        uni1 = __getname();
        if (!uni1)
                return -ENOMEM;
__getname().  "Give me a page and you might block, while you are
at it".  Done from ->d_compare().  Called under dentry->d_lock
and rcu_read_lock().  OK, any further investigation of that
one is... probably not worth bothering with at that point.

Other in-tree instances appear to be correct.  I'll push fixes for
those (well, ntfs3 aside) out tomorrow.
diff mbox series

Patch

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 7be2900806c853..c33e2f03ed1f69 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -220,7 +220,9 @@  prototypes::
 
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);
+	void (*shutdown_sb) (struct super_block *);
 	void (*kill_sb) (struct super_block *);
+	void (*free_sb) (struct super_block *);
 
 locking rules:
 
@@ -228,7 +230,9 @@  locking rules:
 ops		may block
 =======		=========
 mount		yes
+shutdown_sb	yes
 kill_sb		yes
+free_sb		yes
 =======		=========
 
 ->mount() returns ERR_PTR or the root dentry; its superblock should be locked
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 99acc2e9867391..1a7c6926c31f34 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -119,7 +119,9 @@  members are defined:
 		const struct fs_parameter_spec *parameters;
 		struct dentry *(*mount) (struct file_system_type *, int,
 			const char *, void *);
+		void (*shutdown_sb) (struct super_block *);
 		void (*kill_sb) (struct super_block *);
+		void (*free_sb) (struct super_block *);
 		struct module *owner;
 		struct file_system_type * next;
 		struct hlist_head fs_supers;
@@ -155,10 +157,20 @@  members are defined:
 	the method to call when a new instance of this filesystem should
 	be mounted
 
+``shutdown_sb``
+	Cleanup after a super_block has reached a zero active count, and before
+	the VFS level cleanup happens.  Typical picks all fs-specific objects
+	(if any) that need destruction out of superblock and releases them.
+	Note: dentries and inodes are normally taken care of and do not need
+	specific handling unless they are pinned by kernel users.
+
 ``kill_sb``
 	the method to call when an instance of this filesystem should be
 	shut down
 
+``free_sb``
+	Free file system specific resources like sb->s_fs_info that are
+	still needed while inodes are freed during umount.
 
 ``owner``
 	for internal VFS use: you should initialize this to THIS_MODULE
diff --git a/fs/super.c b/fs/super.c
index 5c685b4944c2d6..8e173eccc8c113 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -480,10 +480,15 @@  void deactivate_locked_super(struct super_block *s)
 
 	unregister_shrinker(&s->s_shrink);
 
-	if (fs->kill_sb)
+	if (fs->kill_sb) {
 		fs->kill_sb(s);
-	else
+	} else {
+		if (fs->shutdown_sb)
+			fs->shutdown_sb(s);
 		generic_shutdown_super(s);
+		if (fs->free_sb)
+			fs->free_sb(s);
+	}
 
 	kill_super_notify(s);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 31b6b235b36efa..12fff7df3cc46b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2341,6 +2341,8 @@  struct file_system_type {
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);
 	void (*kill_sb) (struct super_block *);
+	void (*shutdown_sb)(struct super_block *sb);
+	void (*free_sb)(struct super_block *sb);
 	struct module *owner;
 	struct file_system_type * next;
 	struct hlist_head fs_supers;