diff mbox series

efivarfs: fix NULL dereference on resume

Message ID 3e998bf87638a442cbc6864cdcd3d8d9e08ce3e3.camel@HansenPartnership.com (mailing list archive)
State New
Headers show
Series efivarfs: fix NULL dereference on resume | expand

Commit Message

James Bottomley March 18, 2025, 3:06 a.m. UTC
LSMs often inspect the path.mnt of files in the security hooks which
causes a NULL deref in efivarfs_pm_notify because the path is
constructed with a NULL path.mnt.  Fix by obtaining from
vfs_kern_mount() instead and being very careful to ensure that
deactivate_super() (potentially caused by a racing userspace umount) is
not called directly from the notifier because it would deadlock when
efivarfs_kill_sb tried to unregister the notifier chain.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 fs/efivarfs/super.c | 50 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 2 deletions(-)

Comments

Al Viro March 18, 2025, 3:37 a.m. UTC | #1
On Mon, Mar 17, 2025 at 11:06:01PM -0400, James Bottomley wrote:

> +	/* ensure single superblock is alive and pin it */
> +	if (!atomic_inc_not_zero(&s->s_active))
> +		return NOTIFY_DONE;
> +
>  	pr_info("efivarfs: resyncing variable state\n");
>  
> -	/* O_NOATIME is required to prevent oops on NULL mnt */
> +	path.dentry = sfi->sb->s_root;
> +
> +	/*
> +	 * do not add SB_KERNMOUNT which a single superblock could
> +	 * expose to userspace and which also causes MNT_INTERNAL, see
> +	 * below
> +	 */
> +	path.mnt = vfs_kern_mount(&efivarfs_type, 0,
> +				  efivarfs_type.name, NULL);

Umm...  That's probably safe, but not as a long-term solution -
it's too intimately dependent upon fs/super.c internals.
The reasons why you can't run into ->s_umount deadlock here
are non-trivial...
Ard Biesheuvel March 18, 2025, 7:04 a.m. UTC | #2
On Tue, 18 Mar 2025 at 04:37, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Mar 17, 2025 at 11:06:01PM -0400, James Bottomley wrote:
>
> > +     /* ensure single superblock is alive and pin it */
> > +     if (!atomic_inc_not_zero(&s->s_active))
> > +             return NOTIFY_DONE;
> > +
> >       pr_info("efivarfs: resyncing variable state\n");
> >
> > -     /* O_NOATIME is required to prevent oops on NULL mnt */
> > +     path.dentry = sfi->sb->s_root;
> > +
> > +     /*
> > +      * do not add SB_KERNMOUNT which a single superblock could
> > +      * expose to userspace and which also causes MNT_INTERNAL, see
> > +      * below
> > +      */
> > +     path.mnt = vfs_kern_mount(&efivarfs_type, 0,
> > +                               efivarfs_type.name, NULL);
>
> Umm...  That's probably safe, but not as a long-term solution -
> it's too intimately dependent upon fs/super.c internals.
> The reasons why you can't run into ->s_umount deadlock here
> are non-trivial...

Thanks - I'll incorporate this observation in the commit log, if you don't mind.

To me, it seems rather counter-intuitive that we need a second mount
in order to be able to implement this. Synchronizing the efivarfs
contents with the backing store after an event that may have modified
the latter is only needed when it is mounted to begin with, and as a
VFS non-expert, I struggle to understand why it is a) ok and b)
preferred to create a new mount to pass to kernel_file_open(). Could
we add a paragraph to the commit log that explains this?

But if the VFS experts agree that this is a reasonable band-aid for
the time being, I will take the changes themselves as they are. I
intend to send this out asap as a fix for the v6.14 release.
Al Viro March 18, 2025, 7:49 a.m. UTC | #3
On Tue, Mar 18, 2025 at 08:04:59AM +0100, Ard Biesheuvel wrote:

> the latter is only needed when it is mounted to begin with, and as a
> VFS non-expert, I struggle to understand why it is a) ok and b)
> preferred to create a new mount to pass to kernel_file_open(). Could
> we add a paragraph to the commit log that explains this?

I'm not at all convinced that iterate_dir() is the right thing to use
there, but *IF* we go that way, yes, we need a reference to struct
mount.  We are not going to introduce a very special kind of struct
file, along with the arseloads of checking for that crap all over the
place - not for the sake of one weird case in one weird filesystem.

file->f_path is a valid struct path, which means that ->mnt->mnt_sb == ->dentry->d_sb
and refcount of ->mnt is positive as long as struct file exists.

Keeping a persistent internal struct mount is, of course, possible,
but it will make the damn thing impossible to rmmod, etc. - it will
remain in place until the reboot.

It might be possible to put together something like "grab a reference
to superblock and allocate a temporary struct mount refering to it"
(which is what that vfs_kern_mount() boils down to).  But I would
very much prefer to have it go over the list of children of ->s_root
manually, instead of playing silly buggers with iterate_dir().

And yes, it would require exclusion with dissolving dentry tree on
umount, for obvious reasons.  Which might be done with ->s_active
or simply by unregistering that notifier chain as the very first step
in ->kill_sb() there.
James Bottomley March 18, 2025, 12:15 p.m. UTC | #4
On Tue, 2025-03-18 at 07:49 +0000, Al Viro wrote:
> On Tue, Mar 18, 2025 at 08:04:59AM +0100, Ard Biesheuvel wrote:
> 
> > the latter is only needed when it is mounted to begin with, and as
> > a VFS non-expert, I struggle to understand why it is a) ok and b)
> > preferred to create a new mount to pass to kernel_file_open().
> > Could we add a paragraph to the commit log that explains this?
> 
> I'm not at all convinced that iterate_dir() is the right thing to use
> there, but *IF* we go that way, yes, we need a reference to struct
> mount.  We are not going to introduce a very special kind of struct
> file, along with the arseloads of checking for that crap all over the
> place - not for the sake of one weird case in one weird filesystem.
> 
> file->f_path is a valid struct path, which means that ->mnt->mnt_sb
> == ->dentry->d_sb and refcount of ->mnt is positive as long as struct
> file exists.
> 
> Keeping a persistent internal struct mount is, of course, possible,
> but it will make the damn thing impossible to rmmod, etc. - it will
> remain in place until the reboot.

Right, that's the configfs problem and we do want the module to be
removable (and not occupying memory) when it's not mounted in
userspace.

> It might be possible to put together something like "grab a reference
> to superblock and allocate a temporary struct mount refering to it"
> (which is what that vfs_kern_mount() boils down to).  But I would
> very much prefer to have it go over the list of children of ->s_root
> manually, instead of playing silly buggers with iterate_dir().

I did think of that ... how hard, after all can it be just to traverse
all the single level children (we never do subdirectories in efivarfs).
However, doing that seemed to involve replicating the whole of
libfs.c:dcache_readdir() and scan_positives() plus the cursor
allocation it uses is currently marked as an internal interface.  If
there's a simpler way of doing it, I'm all ears, but the code in
libfs.c is subtle and complex and should probably stay there.

So I think the only route to this would be to extract most of the guts
of dcache_readdir() into a helper function that takes a callback where
it currently does dir_emit() and expose that for filesystem use.  Is
that a route you'd like me to investigate?

> And yes, it would require exclusion with dissolving dentry tree on
> umount, for obvious reasons.  Which might be done with ->s_active
> or simply by unregistering that notifier chain as the very first step
> in ->kill_sb() there.

I can move it above kill_litter_super(), sure.  However, the check on
s_active at the top of efivarfs_pm_notify() should ensure nothing
dissolves the tree until we're finished.

Regards,

James
Christian Brauner March 18, 2025, 2:13 p.m. UTC | #5
> (which is what that vfs_kern_mount() boils down to).  But I would
> very much prefer to have it go over the list of children of ->s_root
> manually, instead of playing silly buggers with iterate_dir().

This is what I suggested earlier in the thread to rewrite it to not rely
on files. That's probably the full-on fix we want later.
James Bottomley March 18, 2025, 2:52 p.m. UTC | #6
On Tue, 2025-03-18 at 15:13 +0100, Christian Brauner wrote:
> > (which is what that vfs_kern_mount() boils down to).  But I would
> > very much prefer to have it go over the list of children of -
> > >s_root manually, instead of playing silly buggers with
> > iterate_dir().
> 
> This is what I suggested earlier in the thread to rewrite it to not
> rely on files. That's probably the full-on fix we want later.

So as I said in the other reply, I think I can do that as a libfs.c
helper.  However, it definitely won't be -fixes material and since this
is a current bug, is it OK if the patch goes up as is and I'll do the
rewrite for the merge window branch (and probably the next merge window
not this one, given where we are)?  If I haven't got something by
LSF/MM you can yell at me in person ...

Regards,

James
diff mbox series

Patch

diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c
index 6eae8cf655c1..81b3c6b7e100 100644
--- a/fs/efivarfs/super.c
+++ b/fs/efivarfs/super.c
@@ -474,12 +474,25 @@  static int efivarfs_check_missing(efi_char16_t *name16, efi_guid_t vendor,
 	return err;
 }
 
+static void efivarfs_deactivate_super_work(struct work_struct *work)
+{
+	struct super_block *s = container_of(work, struct super_block,
+					     destroy_work);
+	/*
+	 * note: here s->destroy_work is free for reuse (which
+	 * will happen in deactivate_super)
+	 */
+	deactivate_super(s);
+}
+
+static struct file_system_type efivarfs_type;
+
 static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action,
 			      void *ptr)
 {
 	struct efivarfs_fs_info *sfi = container_of(nb, struct efivarfs_fs_info,
 						    pm_nb);
-	struct path path = { .mnt = NULL, .dentry = sfi->sb->s_root, };
+	struct path path;
 	struct efivarfs_ctx ectx = {
 		.ctx = {
 			.actor	= efivarfs_actor,
@@ -487,6 +500,7 @@  static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action,
 		.sb = sfi->sb,
 	};
 	struct file *file;
+	struct super_block *s = sfi->sb;
 	static bool rescan_done = true;
 
 	if (action == PM_HIBERNATION_PREPARE) {
@@ -499,11 +513,43 @@  static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action,
 	if (rescan_done)
 		return NOTIFY_DONE;
 
+	/* ensure single superblock is alive and pin it */
+	if (!atomic_inc_not_zero(&s->s_active))
+		return NOTIFY_DONE;
+
 	pr_info("efivarfs: resyncing variable state\n");
 
-	/* O_NOATIME is required to prevent oops on NULL mnt */
+	path.dentry = sfi->sb->s_root;
+
+	/*
+	 * do not add SB_KERNMOUNT which a single superblock could
+	 * expose to userspace and which also causes MNT_INTERNAL, see
+	 * below
+	 */
+	path.mnt = vfs_kern_mount(&efivarfs_type, 0,
+				  efivarfs_type.name, NULL);
+	if (IS_ERR(path.mnt)) {
+		pr_err("efivarfs: internal mount failed\n");
+		/*
+		 * We may be the last pinner of the superblock but
+		 * calling efivarfs_kill_sb from within the notifier
+		 * here would deadlock trying to unregister it
+		 */
+		INIT_WORK(&s->destroy_work, efivarfs_deactivate_super_work);
+		schedule_work(&s->destroy_work);
+		return PTR_ERR(path.mnt);
+	}
+
+	/* path.mnt now has pin on superblock, so this must be above one */
+	atomic_dec(&s->s_active);
+
 	file = kernel_file_open(&path, O_RDONLY | O_DIRECTORY | O_NOATIME,
 				current_cred());
+	/*
+	 * safe even if last put because no MNT_INTERNAL means this
+	 * will do delayed deactivate_super and not deadlock
+	 */
+	mntput(path.mnt);
 	if (IS_ERR(file))
 		return NOTIFY_DONE;