Message ID | 7bdcc2c5d8022d2f1a7ec23c0351f7816d4464c8.camel@HansenPartnership.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFC,1/1] fix NULL mnt [was Re: apparmor NULL pointer dereference on resume [efivarfs]] | expand |
On Fri, Mar 14, 2025 at 10:59:14AM -0400, James Bottomley wrote: > On Tue, 2025-03-11 at 09:01 -0400, James Bottomley wrote: > > On Tue, 2025-03-11 at 09:45 +0100, Christian Brauner wrote: > [...] > > > But since efivars does only ever have a single global superblock, > > > one possibility is to an internal superblock that always exits and > > > is resurfaced whenever userspace mounts efivarfs. That's > > > essentially the devtmpfs model. > > > > > > Then you can stash: > > > > > > static struct vfsmount *efivarfs_mnt; > > > > > > globally and use that in efivarfs_pm_notify() to fill in struct > > > path. > > > > I didn't see devtmpfs when looking for examples, since it's hiding > > outside of the fs/ directory. However, it does seem to be a bit > > legacy nasty as an example to copy. However, I get the basics: we'd > > instantiate the mnt and superblock on init (stashing mnt in the sfi > > so the notifier gets it). Then we can do the variable population on > > reconfigure, just in case an EFI system doesn't want to mount > > efivarfs to save memory. > > > > I can code that up if I can get an answer to the uid/gid parameter > > question above. > > I coded up the naive implementation and it definitely works, but it > suffers from the problem that everything that pins in the module init > routine (like configfs) does in that once inserted the module can never > be removed. Plus, for efivarfs, we would allocate all resources on > module insertion not on first mount. The final problem we'd have is > that the uid/gid parameters for variable creation would be taken from > the kernel internal mount, so if they got specified on a user mount, > they'd be ignored (because the variable inodes are already created). > > To answer some of your other questions: > > > (1) Is it guaranteed that efivarfs_pm_notify() is only called once a > > superblock exists? > > Yes, as you realized. > > > (2) Is it guaranteed that efivarfs_pm_notify() is only called when > > and while a mount for the superblock exists? > > No, but the behaviour is correct because the notifier needs to update > the variable list and we create the variable list in > efivarfs_fill_super. Now you can argue this is suboptimal because if > userspace didn't ever mount, we'd simply destroy it all again on last > put of the superblock so it's wasted effort, but its function is > correct. > > > Another question is whether the superblock can be freed while > > efivarfs_pm_notify() is running? I think that can't happen because > > blocking_notifier_chain_unregister(&efivar_ops_nh, &sfi->nb) will > > block in efivarfs_kill_sb() until all outstanding calls to > > efivarfs_pm_notify() are finished? > > That's right: a blocking notifier is called under the notifier list > rwsem. It's taken read for calls but write for register/unregister, so > efivarfs_kill_sb would block in the unregister until the call chain was > executed. > > Taking into account the module removal issue, the simplest way I found > to fix the issue was to call vfs_kern_mount() from the notifier to get Yeah, Al had already mentioned that. I initially had the same idea but since I didn't know enough about the notifier block stuff I wasn't sure whether there's some odd deadlock that could be caused by this. > a struct vfsmount before opening the path. We ensure it's gone by > calling mntput immediately after open, but, by that time, the open file > is pinning the vfsmnt if the open was successful. > > If this looks OK to everyone I'll code it up as a fix which can be cc'd > to stable. > > Regards, > > James > > --- > > diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c > index 6eae8cf655c1..e2e6575b5abf 100644 > --- a/fs/efivarfs/super.c > +++ b/fs/efivarfs/super.c > @@ -474,12 +474,14 @@ static int efivarfs_check_missing(efi_char16_t *name16, efi_guid_t vendor, > return err; > } > > +static struct file_system_type efivarfs_type; > + > static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, > void *ptr) > { > struct efivarfs_fs_info *sfi = container_of(nb, struct efivarfs_fs_info, > pm_nb); > - struct path path = { .mnt = NULL, .dentry = sfi->sb->s_root, }; > + struct path path; > struct efivarfs_ctx ectx = { > .ctx = { > .actor = efivarfs_actor, > @@ -501,9 +503,17 @@ static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, > > pr_info("efivarfs: resyncing variable state\n"); > > - /* O_NOATIME is required to prevent oops on NULL mnt */ > + path.dentry = sfi->sb->s_root; > + path.mnt = vfs_kern_mount(&efivarfs_type, SB_KERNMOUNT, > + efivarfs_type.name, NULL); > + if (IS_ERR(path.mnt)) { > + pr_err("efivarfs: internal mount failed\n"); > + return PTR_ERR(path.mnt); > + } I see some issues with this. A umount by another task could already hit: sb->kill_sb == efivarfs_kill_super() which means the superblock is already marked as dying. By calling vfs_kern_mount() unconditionally you end up calling vfs_get_tree() and then get_tree_single() again. That would mean efivarfs_pm_notify() now waits for the old superblock to be dead. But the old superblock waits in efivarfs_kill_sb() for efivarfs_pm_notify() to finish before actually killing the old superblock. So this would deadlock. So you need to make sure that the superbock a) isn't dead and b) doesn't go away behind your back: diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c index 6eae8cf655c1..6a4f95c27697 100644 --- a/fs/efivarfs/super.c +++ b/fs/efivarfs/super.c @@ -474,6 +474,8 @@ static int efivarfs_check_missing(efi_char16_t *name16, efi_guid_t vendor, return err; } +static struct file_system_type efivarfs_type; + static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, void *ptr) { @@ -499,6 +501,31 @@ static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, if (rescan_done) return NOTIFY_DONE; + /* + * Ensure that efivarfs is still alive and cannot go away behind + * our back. + */ + if (!atomic_inc_not_zero(&sfi->sb->s_active)) + return NOTIFY_DONE; + + path.mnt = vfs_kern_mount(&efivarfs_type, SB_KERNMOUNT, + efivarfs_type.name, NULL); Since efivars uses a single global superblock and we know that sfi->sb is still alive (After all we've just pinned it above.) vfs_kern_mount() will reuse the same superblock. There's two cases to consider: (1) vfs_kern_mount() was successful. In this case path->mnt will hold an active superblock reference that will be released asynchronously via __fput(). That is safe and correct. (2) vfs_kern_mount() fails. That's an issue because you need to call deactivate_super() which will have a similar deadlock problem. If efivarfs_pm_notify() now holds the last reference to the superblock then deactivate_super() super will put that last reference and call efivarfs_kill_super() which in turn will wait for efivarfs_pm_notify() to finish. => deadlock So in the error case you need to offload the call to deactivate_super() to a workqueue. diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c index 6eae8cf655c1..288c1dd8622b 100644 --- a/fs/efivarfs/super.c +++ b/fs/efivarfs/super.c @@ -474,6 +474,8 @@ static int efivarfs_check_missing(efi_char16_t *name16, efi_guid_t vendor, return err; } +static struct file_system_type efivarfs_type; + static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, void *ptr) { @@ -499,6 +501,39 @@ static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, if (rescan_done) return NOTIFY_DONE; + /* + * Ensure that efivarfs is still alive and cannot go away behind + * our back. + */ + if (!atomic_inc_not_zero(&sfi->sb->s_active)) + return NOTIFY_DONE; + + path.mnt = vfs_kern_mount(&efivarfs_type, SB_KERNMOUNT, + efivarfs_type.name, NULL); + /* + * Since efivars uses a single global superblock and we know + * that sfi->sb is still alive (After all we've just pinned it + * above.) vfs_kern_mount() will reuse the same superblock. + * + * If vfs_kern_mount() was successful path->mnt will hold an + * active superblock reference that will be released + * asynchronously via __fput(). + * + * If vfs_kern_mount() fails we might be the ones to hold the + * last reference now so we need to call deactivate_super(). But + * we need to ensure that this is done asynchronously so + * efivarfs_kill_super() doesn't deadlock by waiting on + * efivarfs_pm_notify() to finish. + */ + if (IS_ERR(path.mnt)) { + + /* TODO: offload to workqueue so that we don't deadlock. */ + deactivate_super(sfi->sb); + pr_err("efivarfs: internal mount failed\n"); + return PTR_ERR(path.mnt); + } + atomic_dec(&sfi->sb->s_active); + pr_info("efivarfs: resyncing variable state\n"); /* O_NOATIME is required to prevent oops on NULL mnt */
diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c index 6eae8cf655c1..e2e6575b5abf 100644 --- a/fs/efivarfs/super.c +++ b/fs/efivarfs/super.c @@ -474,12 +474,14 @@ static int efivarfs_check_missing(efi_char16_t *name16, efi_guid_t vendor, return err; } +static struct file_system_type efivarfs_type; + static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, void *ptr) { struct efivarfs_fs_info *sfi = container_of(nb, struct efivarfs_fs_info, pm_nb); - struct path path = { .mnt = NULL, .dentry = sfi->sb->s_root, }; + struct path path; struct efivarfs_ctx ectx = { .ctx = { .actor = efivarfs_actor, @@ -501,9 +503,17 @@ static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, pr_info("efivarfs: resyncing variable state\n"); - /* O_NOATIME is required to prevent oops on NULL mnt */ + path.dentry = sfi->sb->s_root; + path.mnt = vfs_kern_mount(&efivarfs_type, SB_KERNMOUNT, + efivarfs_type.name, NULL); + if (IS_ERR(path.mnt)) { + pr_err("efivarfs: internal mount failed\n"); + return PTR_ERR(path.mnt); + } + file = kernel_file_open(&path, O_RDONLY | O_DIRECTORY | O_NOATIME, current_cred()); + mntput(path.mnt); if (IS_ERR(file)) return NOTIFY_DONE;