Message ID | 20160915230748.GS30497@dastard (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Sep 15, 2016 at 4:07 PM, Dave Chinner <david@fromorbit.com> wrote: > On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote: >> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig <hch@lst.de> wrote: >> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote: >> >> The DAX property, page cache bypass, of a VMA is only detectable via the >> >> vma_is_dax() helper to check the S_DAX inode flag. However, this is >> >> only available internal to the kernel and is a property that userspace >> >> applications would like to interrogate. >> > >> > They have absolutely no business knowing such an implementation detail. >> >> Hasn't that train already left the station with FS_XFLAG_DAX? > > No, that's an admin flag, not a runtime hint for applications. Just > because that flag is set on an inode, it does not mean that DAX is > actually in use - it will be ignored if the backing dev is not dax > capable. Ok, but then VM_DAX does not suffer from that problem. I'm trying to understand why VM_DAX has no business being in the smaps "VmFlags" line, but something ambiguous to userspace like VM_MIXEDMAP does? > >> The other problem with hiding the DAX property is that it turns out to >> not be a transparent acceleration feature. See xfs/086 xfs/088 >> xfs/089 xfs/091 which fail with DAX and, as far as I understand, it is >> due to the fact that DAX disallows delayed allocation behavior. > > Which is not a bug, nor is it something that app developers should > be surprised by. > > i.e. Subtle differences in error reporting behaviour occur in > filesystems /all the time/. Run the test on a non-dax filesystem > with an extent size hint. It fails /exactly the same way as DAX/. > Run it with direct IO - fails the same way as DAX. Run it > with synchronous writes - it fails the same way as DAX. > > IOWs, if an app can't handle the way DAX reports errors, then they > are /broken/. Delayed allocation requires checking the return value > of fsync() or close() to capture the allocation error - many more > apps get that wrong than the ones that expect the immediate errors > from write()... > > Anyway: to domeonstrate that the nothign is actually broken, and > you might sometimes need to fix tests and send patches to > fstests@vger.kernel.org, this makes xfs/086 pass for me on DAX: > > --- a/tests/xfs/086 > +++ b/tests/xfs/086 > @@ -96,7 +96,8 @@ _scratch_mount > > echo "+ modify files" > for x in `seq 1 64`; do > - $XFS_IO_PROG -f -c "pwrite -S 0x62 0 ${blksz}" "${TESTFILE}.${x}" >> $seqres.full > + $XFS_IO_PROG -f -c "pwrite -S 0x62 0 ${blksz}" "${TESTFILE}.${x}" \ > + >> $seqres.full 2>&1 > done > umount "${SCRATCH_MNT}" Thanks for that! Wasn't immediately obvious to me, and didn't get that response when I asked on the list a while back. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 15, 2016 at 4:07 PM, Dave Chinner <david@fromorbit.com> wrote: > On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote: >> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig <hch@lst.de> wrote: >> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote: >> >> The DAX property, page cache bypass, of a VMA is only detectable via the >> >> vma_is_dax() helper to check the S_DAX inode flag. However, this is >> >> only available internal to the kernel and is a property that userspace >> >> applications would like to interrogate. >> > >> > They have absolutely no business knowing such an implementation detail. >> >> Hasn't that train already left the station with FS_XFLAG_DAX? > > No, that's an admin flag, not a runtime hint for applications. Just > because that flag is set on an inode, it does not mean that DAX is > actually in use - it will be ignored if the backing dev is not dax > capable. > What's the point of an admin flag if an admin can't do cat /proc/<pid of interest>/smaps, or some other mechanism, to validate that the setting the admin cares about is in effect? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 15, 2016 at 05:16:42PM -0700, Dan Williams wrote: > On Thu, Sep 15, 2016 at 4:07 PM, Dave Chinner <david@fromorbit.com> wrote: > > On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote: > >> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig <hch@lst.de> wrote: > >> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote: > >> >> The DAX property, page cache bypass, of a VMA is only detectable via the > >> >> vma_is_dax() helper to check the S_DAX inode flag. However, this is > >> >> only available internal to the kernel and is a property that userspace > >> >> applications would like to interrogate. > >> > > >> > They have absolutely no business knowing such an implementation detail. > >> > >> Hasn't that train already left the station with FS_XFLAG_DAX? > > > > No, that's an admin flag, not a runtime hint for applications. Just > > because that flag is set on an inode, it does not mean that DAX is > > actually in use - it will be ignored if the backing dev is not dax > > capable. > > What's the point of an admin flag if an admin can't do cat /proc/<pid > of interest>/smaps, or some other mechanism, to validate that the > setting the admin cares about is in effect? Sorry, I don't follow - why would you be looking at mapping file regions in /proc to determine if some file somewhere in a filesystem has a specific flag set on it or not? FS_XFLAG_DAX is an inode attribute flag, not something you can query or administrate through mmap: I.e. # xfs_io -c "lsattr" -c "chattr +x" -c lsattr -c "chattr -x" -c "lsattr" foo --------------- foo --------------x foo --------------- foo # What happens when that flag is set on an inode is determined by a whole bunch of other things that are completely separate to the management of the inode flag itself. Cheers, Dave.
On Thu, Sep 15, 2016 at 6:24 PM, Dave Chinner <david@fromorbit.com> wrote: > On Thu, Sep 15, 2016 at 05:16:42PM -0700, Dan Williams wrote: >> On Thu, Sep 15, 2016 at 4:07 PM, Dave Chinner <david@fromorbit.com> wrote: >> > On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote: >> >> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig <hch@lst.de> wrote: >> >> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote: >> >> >> The DAX property, page cache bypass, of a VMA is only detectable via the >> >> >> vma_is_dax() helper to check the S_DAX inode flag. However, this is >> >> >> only available internal to the kernel and is a property that userspace >> >> >> applications would like to interrogate. >> >> > >> >> > They have absolutely no business knowing such an implementation detail. >> >> >> >> Hasn't that train already left the station with FS_XFLAG_DAX? >> > >> > No, that's an admin flag, not a runtime hint for applications. Just >> > because that flag is set on an inode, it does not mean that DAX is >> > actually in use - it will be ignored if the backing dev is not dax >> > capable. >> >> What's the point of an admin flag if an admin can't do cat /proc/<pid >> of interest>/smaps, or some other mechanism, to validate that the >> setting the admin cares about is in effect? > > Sorry, I don't follow - why would you be looking at mapping file > regions in /proc to determine if some file somewhere in a filesystem > has a specific flag set on it or not? > > FS_XFLAG_DAX is an inode attribute flag, not something you can > query or administrate through mmap: > > I.e. > # xfs_io -c "lsattr" -c "chattr +x" -c lsattr -c "chattr -x" -c "lsattr" foo > --------------- foo > --------------x foo > --------------- foo > # > > What happens when that flag is set on an inode is determined by a > whole bunch of other things that are completely separate to the > management of the inode flag itself. Right, I understand that, but how does an admin audit those "bunch of other things" that actually gate whether DAX ends up being used in practice? There's currently no way for userspace to observe that a file with FS_XFLAG_DAX actually results in a change in mmap behavior. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 15, 2016 at 7:04 PM, Dan Williams <dan.j.williams@intel.com> wrote: > On Thu, Sep 15, 2016 at 6:24 PM, Dave Chinner <david@fromorbit.com> wrote: >> On Thu, Sep 15, 2016 at 05:16:42PM -0700, Dan Williams wrote: >>> On Thu, Sep 15, 2016 at 4:07 PM, Dave Chinner <david@fromorbit.com> wrote: >>> > On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote: >>> >> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig <hch@lst.de> wrote: >>> >> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote: >>> >> >> The DAX property, page cache bypass, of a VMA is only detectable via the >>> >> >> vma_is_dax() helper to check the S_DAX inode flag. However, this is >>> >> >> only available internal to the kernel and is a property that userspace >>> >> >> applications would like to interrogate. >>> >> > >>> >> > They have absolutely no business knowing such an implementation detail. >>> >> >>> >> Hasn't that train already left the station with FS_XFLAG_DAX? >>> > >>> > No, that's an admin flag, not a runtime hint for applications. Just >>> > because that flag is set on an inode, it does not mean that DAX is >>> > actually in use - it will be ignored if the backing dev is not dax >>> > capable. >>> >>> What's the point of an admin flag if an admin can't do cat /proc/<pid >>> of interest>/smaps, or some other mechanism, to validate that the >>> setting the admin cares about is in effect? >> >> Sorry, I don't follow - why would you be looking at mapping file >> regions in /proc to determine if some file somewhere in a filesystem >> has a specific flag set on it or not? >> >> FS_XFLAG_DAX is an inode attribute flag, not something you can >> query or administrate through mmap: >> >> I.e. >> # xfs_io -c "lsattr" -c "chattr +x" -c lsattr -c "chattr -x" -c "lsattr" foo >> --------------- foo >> --------------x foo >> --------------- foo >> # >> >> What happens when that flag is set on an inode is determined by a >> whole bunch of other things that are completely separate to the >> management of the inode flag itself. > > Right, I understand that, but how does an admin audit those "bunch of > other things" that actually gate whether DAX ends up being used in > practice? There's currently no way for userspace to observe that a > file with FS_XFLAG_DAX actually results in a change in mmap behavior. Let me put it another way, if we inadvertently break DAX causing it to be disabled in scenarios when it should be enabled. What is the interface for the admin to check "I have the DAX inode flag set, but the file this application expects to be mapped DAX is mapped with page cache"? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 15, 2016 at 07:04:27PM -0700, Dan Williams wrote: > On Thu, Sep 15, 2016 at 6:24 PM, Dave Chinner <david@fromorbit.com> wrote: > > On Thu, Sep 15, 2016 at 05:16:42PM -0700, Dan Williams wrote: > >> On Thu, Sep 15, 2016 at 4:07 PM, Dave Chinner <david@fromorbit.com> wrote: > >> > On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote: > >> >> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig <hch@lst.de> wrote: > >> >> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote: > >> >> >> The DAX property, page cache bypass, of a VMA is only detectable via the > >> >> >> vma_is_dax() helper to check the S_DAX inode flag. However, this is > >> >> >> only available internal to the kernel and is a property that userspace > >> >> >> applications would like to interrogate. > >> >> > > >> >> > They have absolutely no business knowing such an implementation detail. > >> >> > >> >> Hasn't that train already left the station with FS_XFLAG_DAX? > >> > > >> > No, that's an admin flag, not a runtime hint for applications. Just > >> > because that flag is set on an inode, it does not mean that DAX is > >> > actually in use - it will be ignored if the backing dev is not dax > >> > capable. > >> > >> What's the point of an admin flag if an admin can't do cat /proc/<pid > >> of interest>/smaps, or some other mechanism, to validate that the > >> setting the admin cares about is in effect? > > > > Sorry, I don't follow - why would you be looking at mapping file > > regions in /proc to determine if some file somewhere in a filesystem > > has a specific flag set on it or not? > > > > FS_XFLAG_DAX is an inode attribute flag, not something you can > > query or administrate through mmap: > > > > I.e. > > # xfs_io -c "lsattr" -c "chattr +x" -c lsattr -c "chattr -x" -c "lsattr" foo > > --------------- foo > > --------------x foo > > --------------- foo > > # > > > > What happens when that flag is set on an inode is determined by a > > whole bunch of other things that are completely separate to the > > management of the inode flag itself. > > Right, I understand that, but how does an admin audit those "bunch of > other things" Filesystem mounts checks all the various stuff that determines whether DAX can be used. It logs to the console that it is "Dax capable". Any file that then has FS_XFLAG_DAX set will result in DAX being used. There is no other possibility when these two things are reported. /me points at runtime diagnostic tracepoints like trace_xfs_file_dax_read() and notes that dax is sadly lacking in diagnostic tracepoints. Besides, userspace can't do anything useful with this information, because the FS_XFLAG_DAX can be changed /at any time/ by an admin. And the filesystem is free to remove it at any time, too, if it needs to (e.g. file gets reflinked or snapshotted). That's right, an inode can dynamically change from DAX to non-DAX underneath the application, and the application /will not notice/. That's because changing the flag will sync and invalidate the existing mappings and the next application access will simply fault it back in using whatever mechanism the inode is now configured with. Plain and simple: userspace has absolutely no fucking idea of whether DAX is enabled or not, and whatever the kernel returns to userspace above the DAX configuration is stale before it even got out of the kernel.... Cheers, Dave.
On Thu, Sep 15, 2016 at 10:36 PM, Dave Chinner <david@fromorbit.com> wrote: > On Thu, Sep 15, 2016 at 07:04:27PM -0700, Dan Williams wrote: >> On Thu, Sep 15, 2016 at 6:24 PM, Dave Chinner <david@fromorbit.com> wrote: >> > On Thu, Sep 15, 2016 at 05:16:42PM -0700, Dan Williams wrote: >> >> On Thu, Sep 15, 2016 at 4:07 PM, Dave Chinner <david@fromorbit.com> wrote: >> >> > On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote: >> >> >> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig <hch@lst.de> wrote: >> >> >> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote: >> >> >> >> The DAX property, page cache bypass, of a VMA is only detectable via the >> >> >> >> vma_is_dax() helper to check the S_DAX inode flag. However, this is >> >> >> >> only available internal to the kernel and is a property that userspace >> >> >> >> applications would like to interrogate. >> >> >> > >> >> >> > They have absolutely no business knowing such an implementation detail. >> >> >> >> >> >> Hasn't that train already left the station with FS_XFLAG_DAX? >> >> > >> >> > No, that's an admin flag, not a runtime hint for applications. Just >> >> > because that flag is set on an inode, it does not mean that DAX is >> >> > actually in use - it will be ignored if the backing dev is not dax >> >> > capable. >> >> >> >> What's the point of an admin flag if an admin can't do cat /proc/<pid >> >> of interest>/smaps, or some other mechanism, to validate that the >> >> setting the admin cares about is in effect? >> > >> > Sorry, I don't follow - why would you be looking at mapping file >> > regions in /proc to determine if some file somewhere in a filesystem >> > has a specific flag set on it or not? >> > >> > FS_XFLAG_DAX is an inode attribute flag, not something you can >> > query or administrate through mmap: >> > >> > I.e. >> > # xfs_io -c "lsattr" -c "chattr +x" -c lsattr -c "chattr -x" -c "lsattr" foo >> > --------------- foo >> > --------------x foo >> > --------------- foo >> > # >> > >> > What happens when that flag is set on an inode is determined by a >> > whole bunch of other things that are completely separate to the >> > management of the inode flag itself. >> >> Right, I understand that, but how does an admin audit those "bunch of >> other things" > > Filesystem mounts checks all the various stuff that determines > whether DAX can be used. It logs to the console that it is "Dax > capable". Any file that then has FS_XFLAG_DAX set will result in DAX > being used. There is no other possibility when these two things are > reported. > > /me points at runtime diagnostic tracepoints like > trace_xfs_file_dax_read() and notes that dax is sadly lacking in > diagnostic tracepoints. > > Besides, userspace can't do anything useful with this information, > because the FS_XFLAG_DAX can be changed /at any time/ by an admin. > And the filesystem is free to remove it at any time, too, if it > needs to (e.g. file gets reflinked or snapshotted). > > That's right, an inode can dynamically change from DAX to non-DAX > underneath the application, and the application /will not notice/. > That's because changing the flag will sync and invalidate the > existing mappings and the next application access will simply fault > it back in using whatever mechanism the inode is now configured > with. > > Plain and simple: userspace has absolutely no fucking idea of > whether DAX is enabled or not, and whatever the kernel returns to > userspace above the DAX configuration is stale before it even got > out of the kernel.... smaps is already known to be an ephemeral interface, but we output useful information there nonetheless. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/tests/xfs/086 +++ b/tests/xfs/086 @@ -96,7 +96,8 @@ _scratch_mount echo "+ modify files" for x in `seq 1 64`; do - $XFS_IO_PROG -f -c "pwrite -S 0x62 0 ${blksz}" "${TESTFILE}.${x}" >> $seqres.full + $XFS_IO_PROG -f -c "pwrite -S 0x62 0 ${blksz}" "${TESTFILE}.${x}" \ + >> $seqres.full 2>&1 done umount "${SCRATCH_MNT}"