Message ID | 20180731211045.5671-3-mfasheh@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vfs: map unique ino/dev pairs for user space | expand |
On Tue, Jul 31, 2018 at 02:10:43PM -0700, Mark Fasheh wrote: > ->getattr from inside the kernel won't always have a vfsmount, check for > this. Really? Where would that happen?
Hi Al, On Tue, Jul 31, 2018 at 11:16:05PM +0100, Al Viro wrote: > On Tue, Jul 31, 2018 at 02:10:43PM -0700, Mark Fasheh wrote: > > ->getattr from inside the kernel won't always have a vfsmount, check for > > this. > > Really? Where would that happen? It happens in my first patch. FWIW, I'm not tied to that particular bit of code, or even this (latest) approach. I would actually very much appreciate your input into how we might fix the problem we have where we are sometimes not exporting a correct ino/dev pair to user space. I have a good break-down of the problem and possible solutions here: https://www.spinics.net/lists/linux-fsdevel/msg128003.html Thanks, --Mark
On Tue, Jul 31, 2018 at 10:51:57PM +0000, Mark Fasheh wrote: > Hi Al, > > On Tue, Jul 31, 2018 at 11:16:05PM +0100, Al Viro wrote: > > On Tue, Jul 31, 2018 at 02:10:43PM -0700, Mark Fasheh wrote: > > > ->getattr from inside the kernel won't always have a vfsmount, check for > > > this. > > > > Really? Where would that happen? > > It happens in my first patch. FWIW, I'm not tied to that particular bit of > code, or even this (latest) approach. I would actually very much appreciate > your input into how we might fix the problem we have where we are sometimes > not exporting a correct ino/dev pair to user space. Which callers would those be? I don't see anything in your series that wouldn't have vfsmount at hand... > I have a good break-down of the problem and possible solutions here: > > https://www.spinics.net/lists/linux-fsdevel/msg128003.html btrfs pulling odd crap with device numbers is a problem, but this is far from the most obvious unpleasantness caused by that - e.g. userland code expecting that unequal st_dev for directory and subdirectory means that we have a mountpoint there. Or assuming it can compare that to st_dev of mountpoints (or, indeed, the values in /proc/self/mountinfo) to find which fs it's on... /proc/*/maps is unfortunate, but it does contain the pathnames, doesn't it? What _real_ problems are there - the ones where we don't have a saner solution? Note that /proc/locks is (a) FPOS interface and (b) unsuitable for your approach anyway.
Hi Al, On Thu, Aug 02, 2018 at 01:43:48AM +0100, Al Viro wrote: > On Tue, Jul 31, 2018 at 10:51:57PM +0000, Mark Fasheh wrote: > > Hi Al, > > > > On Tue, Jul 31, 2018 at 11:16:05PM +0100, Al Viro wrote: > > > On Tue, Jul 31, 2018 at 02:10:43PM -0700, Mark Fasheh wrote: > > > > ->getattr from inside the kernel won't always have a vfsmount, check for > > > > this. > > > > > > Really? Where would that happen? > > > > It happens in my first patch. FWIW, I'm not tied to that particular bit of > > code, or even this (latest) approach. I would actually very much appreciate > > your input into how we might fix the problem we have where we are sometimes > > not exporting a correct ino/dev pair to user space. > > Which callers would those be? I don't see anything in your series that > wouldn't have vfsmount at hand... You are correct - there's no callers in my patch series that don't have a vfsmount available. I can remove patch #2 and pass the vfsmount through. At some point I was trying to plug this callback into audit_copy_inode() which AFAICT doesn't have a vfsmount to pass in when we're coming from vfs_rename (vfs_rename() -> fsnotify_move()). I will eventually have to go back and handle this for audit though so it seems worth mentioning here. > > I have a good break-down of the problem and possible solutions here: > > > > https://www.spinics.net/lists/linux-fsdevel/msg128003.html > > btrfs pulling odd crap with device numbers is a problem, but this is far from > the most obvious unpleasantness caused by that - e.g. userland code expecting > that unequal st_dev for directory and subdirectory means that we have > a mountpoint there. Or assuming it can compare that to st_dev of mountpoints > (or, indeed, the values in /proc/self/mountinfo) to find which fs it's on... Indeed, I agree that all of these are pretty gross and things I'd like to see fixed - I have to deal with subvolume boundaries in some of my own software. I know Jeff is working on some code to give subvolumes their own vfsmount but that wouldn't help with our device reporting unless we did something like move s_dev from the super_block to the vfsmount (also an idea that's been thrown around). > /proc/*/maps is unfortunate, but it does contain the pathnames, doesn't it? > What _real_ problems are there - the ones where we don't have a saner solution? > Note that /proc/locks is (a) FPOS interface and (b) unsuitable for your > approach anyway. Yeah I see now that /proc/locks isn't going to work with something calling into ->getattr(). Primarily my concerns are /proc/*/maps and audit (which is recording ino/dev pairs). Even though /proc/*/maps prints a path it is still a problem as there is no way for userspace to verify that the inode it stats using that path is the correct one. I don't mean to overstate the case but the /proc/*/maps issue is real to us (SUSE) in that it's produced bug reports. For example, lsof produces confusing output when we run with a btrfs subvolume as root. As far as problems that can't be solved by this approach - there are tracepoints in events/lock.h and events/writeback.h which ought to be reporting the correct ino/dev pairs. There's of course /proc/locks too. I acknowledge that whether those are severe enough to warrant a different approach might be up for debate. Thanks, --Mark
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index b65aee481d13..7a3d90a7cc92 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -795,7 +795,9 @@ int nfs_getattr(const struct path *path, struct kstat *stat, } /* - * We may force a getattr if the user cares about atime. + * We may force a getattr if the user cares about atime. A + * NULL vfsmount means we are called from inside the kernel, + * where no atime is required. * * Note that we only have to check the vfsmount flags here: * - NFS always sets S_NOATIME by so checking it would give a @@ -803,7 +805,7 @@ int nfs_getattr(const struct path *path, struct kstat *stat, * - NFS never sets SB_NOATIME or SB_NODIRATIME so there is * no point in checking those. */ - if ((path->mnt->mnt_flags & MNT_NOATIME) || + if (!path->mnt || (path->mnt->mnt_flags & MNT_NOATIME) || ((path->mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))) request_mask &= ~STATX_ATIME;
->getattr from inside the kernel won't always have a vfsmount, check for this. Signed-off-by: Mark Fasheh <mfasheh@suse.de> --- fs/nfs/inode.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)