Message ID | 218e806e61cd5ae2fd38f9d546f953f86c763b58.1542149969.git.rgb@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | audit: avoid umount hangs on missing mount | expand |
On Fri, Nov 16, 2018 at 6:34 PM Richard Guy Briggs <rgb@redhat.com> wrote: > > Don't fetch fcaps when umount2 is called with MNT_FORCE to avoid a > process hang while it waits for the missing resource to (possibly never) > re-appear. The patch would be pretty good if the dependence on MNT_FORCE wasn't added. As it is, it's buggy in more ways than one: - It does the opposite of the above (i.e. skips fcaps *unless* MNT_FORCE is set) - sets LOOKUP_NO_REVAL from caller of path lookup, which is invalid (LOOKUP_NO_REVAL is used only internally by path lookup) - the fact that *_path_mountpoint_at() shouldn't touch the mount root is independent of MNT_FORCE I still don't quite understand what audit is trying to do here, but apparently it's okay to skip getxattr in the MNT_FORCE case. So why is it not okay to skip it in the non-MNT_FORCE case? Thanks, Miklos
On 2018-11-19 13:47, Miklos Szeredi wrote: > On Fri, Nov 16, 2018 at 6:34 PM Richard Guy Briggs <rgb@redhat.com> wrote: > > > > Don't fetch fcaps when umount2 is called with MNT_FORCE to avoid a > > process hang while it waits for the missing resource to (possibly never) > > re-appear. > > The patch would be pretty good if the dependence on MNT_FORCE wasn't > added. As it is, it's buggy in more ways than one: > > - It does the opposite of the above (i.e. skips fcaps *unless* > MNT_FORCE is set) I agree it looks wrong now that I look at it. It turns out my test case didn't trigger it properly since "umount -l" doesn't set MNT_FORCE while it needs "-f" to do so. This is unacceptable since "-l" needs to work in this situation too. > - sets LOOKUP_NO_REVAL from caller of path lookup, which is invalid > (LOOKUP_NO_REVAL is used only internally by path lookup) Fair enough. 949a852e46dd viro 2016-03-14 ("namei: teach lookup_slow() to skip revalidate") needs a comment update. Maybe my patch was interacting with this one and changing the behaviour I expected. > - the fact that *_path_mountpoint_at() shouldn't touch the mount root > is independent of MNT_FORCE We don't entirely agree here since I'm still aiming for a best effort to collect this information for the PATH record, but that may be misleading at best. > I still don't quite understand what audit is trying to do here, but > apparently it's okay to skip getxattr in the MNT_FORCE case. So why > is it not okay to skip it in the non-MNT_FORCE case? The simple answer is that the audit PATH record format expects the four cap_f* fields to be there and a best effort is being attempted to fill in that information in an expected way with meaningful values. Perhaps better to accept that it is unreasonable to expect any fcaps on any umount operation and simply ignore those fields in the PATH record for umount syscall events. This is really a problem the audit folks have backed ourselves into. This was introduced by 851f7ff56d9c ("This patch will print cap_permitted and cap_inheritable data in the PATH...") The fcaps are only really needed for the case of an event that changes fcaps. In that case, the fcaps should have been added as a seperate audit record to accompany this event as necessary, rather than included in the PATH record that is shared by multiple event types, most of which do not change the fcaps. There has been significant and ongoing effort to normalize all audit record types so that they contain predictable fields in a predictable order without any fields that swing in and out since this makes userspace audit record parsers faster and more reliable. My preferred solution would be to in fact remove these four cap_f* fields from the PATH record and put them in a new record that was only included when the event is relevant and the values are non-zero. This isn't an option with current upstream kernel audit devel policy. Thanks Miklos for taking the time to provide feedback on this patch. > Thanks, > Miklos - RGB -- Richard Guy Briggs <rgb@redhat.com> Sr. S/W Engineer, Kernel Security, Base Operating Systems Remote, Ottawa, Red Hat Canada IRC: rgb, SunRaycer Voice: +1.647.777.2635, Internal: (81) 32635
On Mon, Nov 19, 2018 at 11:59 PM Richard Guy Briggs <rgb@redhat.com> wrote: > The simple answer is that the audit PATH record format expects the four > cap_f* fields to be there and a best effort is being attempted to fill > in that information in an expected way with meaningful values. Perhaps > better to accept that it is unreasonable to expect any fcaps on any > umount operation and simply ignore those fields in the PATH record for > umount syscall events. When there's a mount there are in fact two objects belonging to the exact same path, each having completely independent metadata: the mount point and the root of the mount. For example: stat /mnt umount /mnt stat /mnt The first stat will show the root of the mount, the second one will show the mount point. Which one is the relevant for audit? Not saying audit should be doing getxattr on any of them, just trying to see more clearly. Thanks, Miklos
On 2018-11-20 09:17, Miklos Szeredi wrote: > On Mon, Nov 19, 2018 at 11:59 PM Richard Guy Briggs <rgb@redhat.com> wrote: > > > The simple answer is that the audit PATH record format expects the four > > cap_f* fields to be there and a best effort is being attempted to fill > > in that information in an expected way with meaningful values. Perhaps > > better to accept that it is unreasonable to expect any fcaps on any > > umount operation and simply ignore those fields in the PATH record for > > umount syscall events. > > When there's a mount there are in fact two objects belonging to the > exact same path, each having completely independent metadata: the > mount point and the root of the mount. For example: > > stat /mnt > umount /mnt > stat /mnt > > The first stat will show the root of the mount, the second one will > show the mount point. > Which one is the relevant for audit? It would be the root of the mount, the one that is visible to processes in that mount namespace. Obviously, once that mount has been unmounted, it would be the mount point (no longer in use as such at that point) that is of interest. On mounting, I'm guessing both would be of interest if the fcaps changed for that process-visible path in that mount namespace, so this provides an additional operation that would need recording aside from the case of a simple attribute change. > Not saying audit should be doing getxattr on any of them, just trying > to see more clearly. > > Thanks, > Miklos - RGB -- Richard Guy Briggs <rgb@redhat.com> Sr. S/W Engineer, Kernel Security, Base Operating Systems Remote, Ottawa, Red Hat Canada IRC: rgb, SunRaycer Voice: +1.647.777.2635, Internal: (81) 32635
On Tuesday, November 20, 2018 10:48:20 AM EST Richard Guy Briggs wrote: > On 2018-11-20 09:17, Miklos Szeredi wrote: > > On Mon, Nov 19, 2018 at 11:59 PM Richard Guy Briggs <rgb@redhat.com> wrote: > > > The simple answer is that the audit PATH record format expects the four > > > cap_f* fields to be there and a best effort is being attempted to fill > > > in that information in an expected way with meaningful values. Perhaps > > > better to accept that it is unreasonable to expect any fcaps on any > > > umount operation and simply ignore those fields in the PATH record for > > > umount syscall events. > > > > When there's a mount there are in fact two objects belonging to the > > exact same path, each having completely independent metadata: the > > mount point and the root of the mount. For example: > > > > stat /mnt > > umount /mnt > > stat /mnt > > > > The first stat will show the root of the mount, the second one will > > show the mount point. > > Which one is the relevant for audit? > > It would be the root of the mount, the one that is visible to processes > in that mount namespace. > > Obviously, once that mount has been unmounted, it would be the mount > point (no longer in use as such at that point) that is of interest. > > On mounting, I'm guessing both would be of interest if the fcaps changed > for that process-visible path in that mount namespace, so this provides > an additional operation that would need recording aside from the case > of a simple attribute change. fcaps are on files. Mountpoints are directories. Would fcaps changes be possible? -Steve > > Not saying audit should be doing getxattr on any of them, just trying > > to see more clearly. > > > > Thanks, > > Miklos > > - RGB > > -- > Richard Guy Briggs <rgb@redhat.com> > Sr. S/W Engineer, Kernel Security, Base Operating Systems > Remote, Ottawa, Red Hat Canada > IRC: rgb, SunRaycer > Voice: +1.647.777.2635, Internal: (81) 32635
diff --git a/fs/namei.c b/fs/namei.c index 0cab6494978c..5ac8410b704c 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2720,7 +2720,7 @@ int user_path_at_empty(int dfd, const char __user *name, unsigned flags, if (unlikely(error == -ESTALE)) error = path_mountpoint(&nd, flags | LOOKUP_REVAL, path); if (likely(!error)) - audit_inode(name, path->dentry, 0); + audit_inode(name, path->dentry, flags & LOOKUP_NO_REVAL); restore_nameidata(); putname(name); return error; diff --git a/fs/namespace.c b/fs/namespace.c index 99186556f8d3..5bae5bbd4e1f 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1636,6 +1636,9 @@ int ksys_umount(char __user *name, int flags) if (!(flags & UMOUNT_NOFOLLOW)) lookup_flags |= LOOKUP_FOLLOW; + if (!(flags & MNT_FORCE)) + lookup_flags |= LOOKUP_NO_REVAL; + retval = user_path_mountpoint_at(AT_FDCWD, name, lookup_flags, &path); if (retval) goto out; diff --git a/include/linux/audit.h b/include/linux/audit.h index 9334fbef7bae..503f1710c9d0 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -25,6 +25,7 @@ #include <linux/sched.h> #include <linux/ptrace.h> +#include <linux/namei.h> /* LOOKUP_* */ #include <uapi/linux/audit.h> #define AUDIT_INO_UNSET ((unsigned long)-1) @@ -229,6 +230,7 @@ extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1, #define AUDIT_INODE_PARENT 1 /* dentry represents the parent */ #define AUDIT_INODE_HIDDEN 2 /* audit record should be hidden */ +#define AUDIT_INODE_NOREVAL 4 /* audit record incomplete */ extern void __audit_inode(struct filename *name, const struct dentry *dentry, unsigned int flags); extern void __audit_file(const struct file *); @@ -289,11 +291,13 @@ static inline void audit_getname(struct filename *name) } static inline void audit_inode(struct filename *name, const struct dentry *dentry, - unsigned int parent) { + unsigned int lflags) { if (unlikely(!audit_dummy_context())) { unsigned int flags = 0; - if (parent) + if (lflags & LOOKUP_PARENT) flags |= AUDIT_INODE_PARENT; + if (lflags & LOOKUP_NO_REVAL) + flags |= AUDIT_INODE_NOREVAL; __audit_inode(name, dentry, flags); } } diff --git a/kernel/audit.c b/kernel/audit.c index 2a8058764aa6..45ca6d15ce89 100644 --- a/kernel/audit.c +++ b/kernel/audit.c @@ -2097,7 +2097,7 @@ static inline int audit_copy_fcaps(struct audit_names *name, /* Copy inode data into an audit_names. */ void audit_copy_inode(struct audit_names *name, const struct dentry *dentry, - struct inode *inode) + struct inode *inode, unsigned int flags) { name->ino = inode->i_ino; name->dev = inode->i_sb->s_dev; @@ -2106,7 +2106,8 @@ void audit_copy_inode(struct audit_names *name, const struct dentry *dentry, name->gid = inode->i_gid; name->rdev = inode->i_rdev; security_inode_getsecid(inode, &name->osid); - audit_copy_fcaps(name, dentry); + if (!(flags & AUDIT_INODE_NOREVAL)) + audit_copy_fcaps(name, dentry); } /** diff --git a/kernel/audit.h b/kernel/audit.h index 214e14948370..7db09151c2d0 100644 --- a/kernel/audit.h +++ b/kernel/audit.h @@ -212,7 +212,7 @@ struct audit_context { extern void audit_copy_inode(struct audit_names *name, const struct dentry *dentry, - struct inode *inode); + struct inode *inode, unsigned int flags); extern void audit_log_cap(struct audit_buffer *ab, char *prefix, kernel_cap_t *cap); extern void audit_log_name(struct audit_context *context, diff --git a/kernel/auditsc.c b/kernel/auditsc.c index b2d1f043f17f..d39a7fbaf944 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -1846,7 +1846,7 @@ void __audit_inode(struct filename *name, const struct dentry *dentry, n->type = AUDIT_TYPE_NORMAL; } handle_path(dentry); - audit_copy_inode(n, dentry, inode); + audit_copy_inode(n, dentry, inode, flags & AUDIT_INODE_NOREVAL); } void __audit_file(const struct file *file) @@ -1947,7 +1947,7 @@ void __audit_inode_child(struct inode *parent, n = audit_alloc_name(context, AUDIT_TYPE_PARENT); if (!n) return; - audit_copy_inode(n, NULL, parent); + audit_copy_inode(n, NULL, parent, 0); } if (!found_child) { @@ -1966,7 +1966,7 @@ void __audit_inode_child(struct inode *parent, } if (inode) - audit_copy_inode(found_child, dentry, inode); + audit_copy_inode(found_child, dentry, inode, 0); else found_child->ino = AUDIT_INO_UNSET; }
Don't fetch fcaps when umount2 is called with MNT_FORCE to avoid a process hang while it waits for the missing resource to (possibly never) re-appear. Note the comment above user_path_mountpoint_at(): * A umount is a special case for path walking. We're not actually interested * in the inode in this situation, and ESTALE errors can be a problem. We * simply want track down the dentry and vfsmount attached at the mountpoint * and avoid revalidating the last component. This can happen on ceph, cifs, 9p, lustre, fuse (gluster) or NFS. Please see the github issue tracker https://github.com/linux-audit/audit-kernel/issues/100 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> --- fs/namei.c | 2 +- fs/namespace.c | 3 +++ include/linux/audit.h | 8 ++++++-- kernel/audit.c | 5 +++-- kernel/audit.h | 2 +- kernel/auditsc.c | 6 +++--- 6 files changed, 17 insertions(+), 9 deletions(-)