Message ID | 3e998bf87638a442cbc6864cdcd3d8d9e08ce3e3.camel@HansenPartnership.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | efivarfs: fix NULL dereference on resume | expand |
On Mon, Mar 17, 2025 at 11:06:01PM -0400, James Bottomley wrote: > + /* ensure single superblock is alive and pin it */ > + if (!atomic_inc_not_zero(&s->s_active)) > + return NOTIFY_DONE; > + > pr_info("efivarfs: resyncing variable state\n"); > > - /* O_NOATIME is required to prevent oops on NULL mnt */ > + path.dentry = sfi->sb->s_root; > + > + /* > + * do not add SB_KERNMOUNT which a single superblock could > + * expose to userspace and which also causes MNT_INTERNAL, see > + * below > + */ > + path.mnt = vfs_kern_mount(&efivarfs_type, 0, > + efivarfs_type.name, NULL); Umm... That's probably safe, but not as a long-term solution - it's too intimately dependent upon fs/super.c internals. The reasons why you can't run into ->s_umount deadlock here are non-trivial...
On Tue, 18 Mar 2025 at 04:37, Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Mon, Mar 17, 2025 at 11:06:01PM -0400, James Bottomley wrote: > > > + /* ensure single superblock is alive and pin it */ > > + if (!atomic_inc_not_zero(&s->s_active)) > > + return NOTIFY_DONE; > > + > > pr_info("efivarfs: resyncing variable state\n"); > > > > - /* O_NOATIME is required to prevent oops on NULL mnt */ > > + path.dentry = sfi->sb->s_root; > > + > > + /* > > + * do not add SB_KERNMOUNT which a single superblock could > > + * expose to userspace and which also causes MNT_INTERNAL, see > > + * below > > + */ > > + path.mnt = vfs_kern_mount(&efivarfs_type, 0, > > + efivarfs_type.name, NULL); > > Umm... That's probably safe, but not as a long-term solution - > it's too intimately dependent upon fs/super.c internals. > The reasons why you can't run into ->s_umount deadlock here > are non-trivial... Thanks - I'll incorporate this observation in the commit log, if you don't mind. To me, it seems rather counter-intuitive that we need a second mount in order to be able to implement this. Synchronizing the efivarfs contents with the backing store after an event that may have modified the latter is only needed when it is mounted to begin with, and as a VFS non-expert, I struggle to understand why it is a) ok and b) preferred to create a new mount to pass to kernel_file_open(). Could we add a paragraph to the commit log that explains this? But if the VFS experts agree that this is a reasonable band-aid for the time being, I will take the changes themselves as they are. I intend to send this out asap as a fix for the v6.14 release.
On Tue, Mar 18, 2025 at 08:04:59AM +0100, Ard Biesheuvel wrote: > the latter is only needed when it is mounted to begin with, and as a > VFS non-expert, I struggle to understand why it is a) ok and b) > preferred to create a new mount to pass to kernel_file_open(). Could > we add a paragraph to the commit log that explains this? I'm not at all convinced that iterate_dir() is the right thing to use there, but *IF* we go that way, yes, we need a reference to struct mount. We are not going to introduce a very special kind of struct file, along with the arseloads of checking for that crap all over the place - not for the sake of one weird case in one weird filesystem. file->f_path is a valid struct path, which means that ->mnt->mnt_sb == ->dentry->d_sb and refcount of ->mnt is positive as long as struct file exists. Keeping a persistent internal struct mount is, of course, possible, but it will make the damn thing impossible to rmmod, etc. - it will remain in place until the reboot. It might be possible to put together something like "grab a reference to superblock and allocate a temporary struct mount refering to it" (which is what that vfs_kern_mount() boils down to). But I would very much prefer to have it go over the list of children of ->s_root manually, instead of playing silly buggers with iterate_dir(). And yes, it would require exclusion with dissolving dentry tree on umount, for obvious reasons. Which might be done with ->s_active or simply by unregistering that notifier chain as the very first step in ->kill_sb() there.
On Tue, 2025-03-18 at 07:49 +0000, Al Viro wrote: > On Tue, Mar 18, 2025 at 08:04:59AM +0100, Ard Biesheuvel wrote: > > > the latter is only needed when it is mounted to begin with, and as > > a VFS non-expert, I struggle to understand why it is a) ok and b) > > preferred to create a new mount to pass to kernel_file_open(). > > Could we add a paragraph to the commit log that explains this? > > I'm not at all convinced that iterate_dir() is the right thing to use > there, but *IF* we go that way, yes, we need a reference to struct > mount. We are not going to introduce a very special kind of struct > file, along with the arseloads of checking for that crap all over the > place - not for the sake of one weird case in one weird filesystem. > > file->f_path is a valid struct path, which means that ->mnt->mnt_sb > == ->dentry->d_sb and refcount of ->mnt is positive as long as struct > file exists. > > Keeping a persistent internal struct mount is, of course, possible, > but it will make the damn thing impossible to rmmod, etc. - it will > remain in place until the reboot. Right, that's the configfs problem and we do want the module to be removable (and not occupying memory) when it's not mounted in userspace. > It might be possible to put together something like "grab a reference > to superblock and allocate a temporary struct mount refering to it" > (which is what that vfs_kern_mount() boils down to). But I would > very much prefer to have it go over the list of children of ->s_root > manually, instead of playing silly buggers with iterate_dir(). I did think of that ... how hard, after all can it be just to traverse all the single level children (we never do subdirectories in efivarfs). However, doing that seemed to involve replicating the whole of libfs.c:dcache_readdir() and scan_positives() plus the cursor allocation it uses is currently marked as an internal interface. If there's a simpler way of doing it, I'm all ears, but the code in libfs.c is subtle and complex and should probably stay there. So I think the only route to this would be to extract most of the guts of dcache_readdir() into a helper function that takes a callback where it currently does dir_emit() and expose that for filesystem use. Is that a route you'd like me to investigate? > And yes, it would require exclusion with dissolving dentry tree on > umount, for obvious reasons. Which might be done with ->s_active > or simply by unregistering that notifier chain as the very first step > in ->kill_sb() there. I can move it above kill_litter_super(), sure. However, the check on s_active at the top of efivarfs_pm_notify() should ensure nothing dissolves the tree until we're finished. Regards, James
> (which is what that vfs_kern_mount() boils down to). But I would > very much prefer to have it go over the list of children of ->s_root > manually, instead of playing silly buggers with iterate_dir(). This is what I suggested earlier in the thread to rewrite it to not rely on files. That's probably the full-on fix we want later.
On Tue, 2025-03-18 at 15:13 +0100, Christian Brauner wrote: > > (which is what that vfs_kern_mount() boils down to). But I would > > very much prefer to have it go over the list of children of - > > >s_root manually, instead of playing silly buggers with > > iterate_dir(). > > This is what I suggested earlier in the thread to rewrite it to not > rely on files. That's probably the full-on fix we want later. So as I said in the other reply, I think I can do that as a libfs.c helper. However, it definitely won't be -fixes material and since this is a current bug, is it OK if the patch goes up as is and I'll do the rewrite for the merge window branch (and probably the next merge window not this one, given where we are)? If I haven't got something by LSF/MM you can yell at me in person ... Regards, James
diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c index 6eae8cf655c1..81b3c6b7e100 100644 --- a/fs/efivarfs/super.c +++ b/fs/efivarfs/super.c @@ -474,12 +474,25 @@ static int efivarfs_check_missing(efi_char16_t *name16, efi_guid_t vendor, return err; } +static void efivarfs_deactivate_super_work(struct work_struct *work) +{ + struct super_block *s = container_of(work, struct super_block, + destroy_work); + /* + * note: here s->destroy_work is free for reuse (which + * will happen in deactivate_super) + */ + deactivate_super(s); +} + +static struct file_system_type efivarfs_type; + static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, void *ptr) { struct efivarfs_fs_info *sfi = container_of(nb, struct efivarfs_fs_info, pm_nb); - struct path path = { .mnt = NULL, .dentry = sfi->sb->s_root, }; + struct path path; struct efivarfs_ctx ectx = { .ctx = { .actor = efivarfs_actor, @@ -487,6 +500,7 @@ static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, .sb = sfi->sb, }; struct file *file; + struct super_block *s = sfi->sb; static bool rescan_done = true; if (action == PM_HIBERNATION_PREPARE) { @@ -499,11 +513,43 @@ static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, if (rescan_done) return NOTIFY_DONE; + /* ensure single superblock is alive and pin it */ + if (!atomic_inc_not_zero(&s->s_active)) + return NOTIFY_DONE; + pr_info("efivarfs: resyncing variable state\n"); - /* O_NOATIME is required to prevent oops on NULL mnt */ + path.dentry = sfi->sb->s_root; + + /* + * do not add SB_KERNMOUNT which a single superblock could + * expose to userspace and which also causes MNT_INTERNAL, see + * below + */ + path.mnt = vfs_kern_mount(&efivarfs_type, 0, + efivarfs_type.name, NULL); + if (IS_ERR(path.mnt)) { + pr_err("efivarfs: internal mount failed\n"); + /* + * We may be the last pinner of the superblock but + * calling efivarfs_kill_sb from within the notifier + * here would deadlock trying to unregister it + */ + INIT_WORK(&s->destroy_work, efivarfs_deactivate_super_work); + schedule_work(&s->destroy_work); + return PTR_ERR(path.mnt); + } + + /* path.mnt now has pin on superblock, so this must be above one */ + atomic_dec(&s->s_active); + file = kernel_file_open(&path, O_RDONLY | O_DIRECTORY | O_NOATIME, current_cred()); + /* + * safe even if last put because no MNT_INTERNAL means this + * will do delayed deactivate_super and not deadlock + */ + mntput(path.mnt); if (IS_ERR(file)) return NOTIFY_DONE;
LSMs often inspect the path.mnt of files in the security hooks which causes a NULL deref in efivarfs_pm_notify because the path is constructed with a NULL path.mnt. Fix by obtaining from vfs_kern_mount() instead and being very careful to ensure that deactivate_super() (potentially caused by a racing userspace umount) is not called directly from the notifier because it would deadlock when efivarfs_kill_sb tried to unregister the notifier chain. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> --- fs/efivarfs/super.c | 50 +++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 48 insertions(+), 2 deletions(-)