[0/3] ext2, ext4, xfs: hard fail dax mount on unsupported devices

Message ID 1539027169-23332-1-git-send-email-sandeen@sandeen.net

Message

Eric Sandeen Oct. 8, 2018, 7:32 p.m. UTC
In response to an earlier xfs patch to change how xfs reacts to
dax incompatibilities, Dave said:

> I suspect we need to be more harsh at rejecting mounts with -o dax
> on devices DAX isn't supported on. This mount option is going into
> production systems - it's not just for "testing" as the comments all
> claim. Things will break in production systems if DAX isn't
> enabled and they are expecting it to be enabled.

and I tend to agree, so proposing this change to hard-fail a dax mount if
the device doesn't support it, instead of silently disabling the
functionality.  Proposing for ext2, ext4, and xfs to keep behavior in
sync.

Thanks,
-Eric
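
To illustrate the behavior change proposed above, here is a minimal userspace C sketch contrasting the two policies. It is not the patch itself: the struct, the function names, and the choice of -EINVAL as the error code are assumptions made for the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for what a filesystem knows at mount time. */
struct mount_request {
	bool want_dax;          /* "-o dax" was passed */
	bool dev_supports_dax;  /* the backing device can do DAX */
};

/* Silent fallback: the mount always succeeds, DAX may be quietly dropped. */
static int mount_silent_fallback(const struct mount_request *req, bool *dax_on)
{
	*dax_on = req->want_dax && req->dev_supports_dax;
	return 0;
}

/* Hard fail: refuse the mount if "-o dax" cannot be honored. */
static int mount_hard_fail(const struct mount_request *req, bool *dax_on)
{
	*dax_on = false;
	if (req->want_dax && !req->dev_supports_dax)
		return -EINVAL; /* error code is an assumption */
	*dax_on = req->want_dax;
	return 0;
}
```

With want_dax set on a device that cannot do DAX, the first function returns success with DAX off, while the second returns an error and the administrator finds out immediately.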

Comments

Jan Kara Oct. 11, 2018, 10:36 a.m. UTC | #1
On Mon 08-10-18 14:32:46, Eric Sandeen wrote:
> In response to an earlier xfs patch to change how xfs reacts to
> dax incompatibilities, Dave said:
> 
> > I suspect we need to be more harsh at rejecting mounts with -o dax
> > on devices DAX isn't supported on. This mount option is going into
> > production systems - it's not just for "testing" as the comments all
> > claim. Things will break in production systems if DAX isn't
> > enabled and they are expecting it to be enabled.
> 
> and I tend to agree, so proposing this change to hard-fail a dax mount if
> the device doesn't support it, instead of silently disabling the
> functionality.  Proposing for ext2, ext4, and xfs to keep behavior in
> sync.

Let me include Dan and Ross into the discussion since they were the ones
proposing the "silent fallback" behavior (ext4 actually did fail the mount
instead not so long ago - see 24f3478d664b "ext4: auto disable dax instead
of failing mount" from December). Guys, why did you choose the fallback
path instead of a failure?

								Honza
Dan Williams Oct. 11, 2018, 6:08 p.m. UTC | #2
On Thu, Oct 11, 2018 at 3:37 AM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 08-10-18 14:32:46, Eric Sandeen wrote:
> > In response to an earlier xfs patch to change how xfs reacts to
> > dax incompatibilities, Dave said:
> >
> > > I suspect we need to be more harsh at rejecting mounts with -o dax
> > > on devices DAX isn't supported on. This mount option is going into
> > > production systems - it's not just for "testing" as the comments all
> > > claim. Things will break in production systems if DAX isn't
> > > enabled and they are expecting it to be enabled.
> >
> > and I tend to agree, so proposing this change to hard-fail a dax mount if
> > the device doesn't support it, instead of silently disabling the
> > functionality.  Proposing for ext2, ext4, and xfs to keep behavior in
> > sync.
>
> Let me include Dan and Ross into the discussion since they were the ones
> proposing the "silent fallback" behavior (ext4 actually did fail the mount
> instead not so long ago - see 24f3478d664b "ext4: auto disable dax instead
> of failing mount" from December). Guys, why did you choose the fallback
> path instead of a failure?

The different behavior between filesystems was confusing customers so
we had to align them, then the question was which default to pick.
Honestly, we came to the decision to bring ext4 in line with the xfs
behavior because we thought that would be easier than the alternative.
Dave and Christoph made repeated arguments that DAX is just a hidden
performance optimization that no application should rely on, so we
went the path of least resistance and changed the ext4 default.

I'm perfectly fine switching both to the "fail if not DAX device" behavior.
Eric Sandeen Oct. 11, 2018, 6:38 p.m. UTC | #3
On 10/11/18 1:08 PM, Dan Williams wrote:
> On Thu, Oct 11, 2018 at 3:37 AM Jan Kara <jack@suse.cz> wrote:
>>
>> On Mon 08-10-18 14:32:46, Eric Sandeen wrote:
>>> In response to an earlier xfs patch to change how xfs reacts to
>>> dax incompatibilities, Dave said:
>>>
>>>> I suspect we need to be more harsh at rejecting mounts with -o dax
>>>> on devices DAX isn't supported on. This mount option is going into
>>>> production systems - it's not just for "testing" as the comments all
>>>> claim. Things will break in production systems if DAX isn't
>>>> enabled and they are expecting it to be enabled.
>>>
>>> and I tend to agree, so proposing this change to hard-fail a dax mount if
>>> the device doesn't support it, instead of silently disabling the
>>> functionality.  Proposing for ext2, ext4, and xfs to keep behavior in
>>> sync.
>>
>> Let me include Dan and Ross into the discussion since they were the ones
>> proposing the "silent fallback" behavior (ext4 actually did fail the mount
>> instead not so long ago - see 24f3478d664b "ext4: auto disable dax instead
>> of failing mount" from December). Guys, why did you choose the fallback
>> path instead of a failure?
> 
> The different behavior between filesystems was confusing customers so
> we had to align them, then the question was which default to pick.
> Honestly, we came to the decision to bring ext4 in line with the xfs
> behavior because we thought that would be easier than the alternative.
> Dave and Christoph made repeated arguments that DAX is just a hidden
> performance optimization that no application should rely on, so we
> went the path of least resistance and changed the ext4 default.

Ok, well, I guess we'd better reconcile "it's a hidden performance hint"
with "if the administrator asked they must receive..." before making this
change... cc: hch for bonus input.

-Eric

> I'm perfectly fine switching both to the "fail if not DAX device" behavior.
>
Theodore Y. Ts'o Oct. 12, 2018, 2:21 a.m. UTC | #4
On Thu, Oct 11, 2018 at 01:38:34PM -0500, Eric Sandeen wrote:
> > The different behavior between filesystems was confusing customers so
> > we had to align them, then the question was which default to pick.
> > Honestly, we came to the decision to bring ext4 in line with the xfs
> > behavior because we thought that would be easier than the alternative.
> > Dave and Christoph made repeated arguments that DAX is just a hidden
> > performance optimization that no application should rely on, so we
> > went the path of least resistance and changed the ext4 default.
> 
> Ok, well, I guess we'd better reconcile "it's a hidden performance hint"
> with "if the administrator asked they must receive..." before making this
> change... cc: hch for bonus input.

And it's not so hidden if there are some applications that are
demanding that they know whether "dax" is turned on....

I don't really care, but it would be nice if we settled all of these
disagreements about what dax is one way or another.  Flip a coin if
necessary; ext4 isn't supporting a per-file dax flag right now since
it hasn't been clear whether or not XFS is going to drop support
(which it is claimed we can do since dax is still "experimental").
 
But whether it's flipping a coin or super-soakers at 20 paces, can we
please figure this out?  One way or another?  I'll provide some
suitable coin at LSF/MM if we can't figure it out sooner --- but I
really would prefer that it be sooner.  :-)

Thanks,

						- Ted
Christoph Hellwig Oct. 12, 2018, 8:21 a.m. UTC | #5
On Thu, Oct 11, 2018 at 01:38:34PM -0500, Eric Sandeen wrote:
> > The different behavior between filesystems was confusing customers so
> > we had to align them, then the question was which default to pick.
> > Honestly, we came to the decision to bring ext4 in line with the xfs
> > behavior because we thought that would be easier than the alternative.
> > Dave and Christoph made repeated arguments that DAX is just a hidden
> > performance optimization that no application should rely on, so we
> > went the path of least resistance and changed the ext4 default.
> 
> Ok, well, I guess we'd better reconcile "it's a hidden performance hint"
> with "if the administrator asked they must receive..." before making this
> change... cc: hch for bonus input.

I don't really care too much about the mount options, the important bit
was the application behavior.

I fully agree with Dan that we should have the same behavior for every
file system, though.
Ross Zwisler Oct. 13, 2018, 4:05 p.m. UTC | #6
On Fri, Oct 12, 2018 at 2:21 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Oct 11, 2018 at 01:38:34PM -0500, Eric Sandeen wrote:
> > > The different behavior between filesystems was confusing customers so
> > > we had to align them, then the question was which default to pick.
> > > Honestly, we came to the decision to bring ext4 in line with the xfs
> > > behavior because we thought that would be easier than the alternative.
> > > Dave and Christoph made repeated arguments that DAX is just a hidden
> > > performance optimization that no application should rely on, so we
> > > went the path of least resistance and changed the ext4 default.
> >
> > Ok, well, I guess we'd better reconcile "it's a hidden performance hint"
> > with "if the administrator asked they must receive..." before making this
> > change... cc: hch for bonus input.
>
> I don't really care too much about the mount options, the important bit
> was the application behavior.
>
> I fully agree with Dan that we should have the same behavior for every
> file system, though.

One factor that might influence this is how we expect users to detect
whether or not DAX is being used, and whether that can vary per-inode
within a filesystem.  If we choose to only have the mount option then
I agree that a hard failure when -o dax doesn't work seems fine.  And
of course keeping the filesystems behaving the same is desirable.

If we eventually do go back to having a per-inode DAX option, though,
the mount option becomes a hint as to what the default behavior is,
and the user will need another way to detect whether or not DAX is
being used for a given inode.  In that case having the mount option
fail loudly isn't as important because all we've really changed is the
filesystem's default, and the application will still need a consistent
way of detecting whether the inode they are actually using is DAX or
not.

I'm not sure if per-inode DAX is still a goal for anyone.  If not,
then sure, using the DAX mount option as the one source of truth and
making it a hard failure when it doesn't work seems reasonable.
Eric Sandeen Oct. 17, 2018, 7:42 p.m. UTC | #7
On 10/13/18 11:05 AM, Ross Zwisler wrote:
> On Fri, Oct 12, 2018 at 2:21 AM Christoph Hellwig <hch@lst.de> wrote:
>>
>> On Thu, Oct 11, 2018 at 01:38:34PM -0500, Eric Sandeen wrote:
>>>> The different behavior between filesystems was confusing customers so
>>>> we had to align them, then the question was which default to pick.
>>>> Honestly, we came to the decision to bring ext4 in line with the xfs
>>>> behavior because we thought that would be easier than the alternative.
>>>> Dave and Christoph made repeated arguments that DAX is just a hidden
>>>> performance optimization that no application should rely on, so we
>>>> went the path of least resistance and changed the ext4 default.
>>>
>>> Ok, well, I guess we'd better reconcile "it's a hidden performance hint"
>>> with "if the administrator asked they must receive..." before making this
>>> change... cc: hch for bonus input.
>>
>> I don't really care too much about the mount options, the important bit
>> was the application behavior.
>>
>> I fully agree with Dan that we should have the same behavior for every
>> file system, though.
> 
> One factor that might influence this is how we expect users to detect
> whether or not DAX is being used, and whether that can vary per-inode
> within a filesystem.  If we choose to only have the mount option then
> I agree that a hard failure when -o dax doesn't work seems fine.  And
> of course keeping the filesystems behaving the same is desirable.
> 
> If we eventually do go back to having a per-inode DAX option, though,
> the mount option becomes a hint as to what the default behavior is,
> and the user will need another way to detect whether or not DAX is
> being used for a given inode.  In that case having the mount option
> fail loudly isn't as important because all we've really changed is the
> filesystem's default, and the application will still need a consistent
> way of detecting whether the inode they are actually using is DAX or
> not.
> 
> I'm not sure if per-inode DAX is still a goal for anyone.  If not,
> then sure, using the DAX mount option as the one source of truth and
> making it a hard failure when it doesn't work seems reasonable.

I've been thinking about the per-inode stuff a bit, and while I don't know
how to resolve some of the trickier issues, at least the expected behavior
seems like something we can narrow down and specify.

Because it's an on-disk flag (in xfs today, in any case) it seems that
the only sane behavior to expect is either/or, i.e.:

Mount option: All files always dax, per-inode flags ignored (or rejected)
Per-inode: Mount option cannot be specified; only inodes explicitly flagged are dax

Think about it; what would mount-option-plus-per-inode mean?  We have
no "negative" dax flag, so while mount-option-with-flag surely means
"dax", what the heck does mount-option-without-flag mean, and how is it
distinguishable from mount option only?

I submit that flags can only have meaning w/o the fs-wide mount option
enabled, so the question of "should we hard fail mount -o dax for devices
that cannot support it" seems to be orthogonal to the per-inode question.

i.e. mount -o dax really can only mean "I want dax on everything" and so
again, I think we probably need to fail the mount if that can't be honored.

-Eric
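
The either/or semantics described above can be sketched as a small decision function. This is a toy model in plain C, not XFS code; the function and parameter names are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical model: mount_dax is the fs-wide "-o dax" option,
 * inode_flag is the on-disk per-inode DAX flag.
 */
static bool inode_is_dax(bool mount_dax, bool inode_flag)
{
	if (mount_dax)
		return true;    /* mount option: all files dax, flag ignored */
	return inode_flag;      /* no mount option: only flagged inodes are dax */
}
```

Because there is no "negative" flag, inode_is_dax(true, false) and inode_is_dax(true, true) give the same answer, which is the ambiguity the message points out: the flag only carries information when the mount option is absent.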
Ross Zwisler Oct. 17, 2018, 7:51 p.m. UTC | #8
On Wed, Oct 17, 2018 at 1:42 PM Eric Sandeen <sandeen@sandeen.net> wrote:
> On 10/13/18 11:05 AM, Ross Zwisler wrote:
> > On Fri, Oct 12, 2018 at 2:21 AM Christoph Hellwig <hch@lst.de> wrote:
> >>
> >> On Thu, Oct 11, 2018 at 01:38:34PM -0500, Eric Sandeen wrote:
> >>>> The different behavior between filesystems was confusing customers so
> >>>> we had to align them, then the question was which default to pick.
> >>>> Honestly, we came to the decision to bring ext4 in line with the xfs
> >>>> behavior because we thought that would be easier than the alternative.
> >>>> Dave and Christoph made repeated arguments that DAX is just a hidden
> >>>> performance optimization that no application should rely on, so we
> >>>> went the path of least resistance and changed the ext4 default.
> >>>
> >>> Ok, well, I guess we'd better reconcile "it's a hidden performance hint"
> >>> with "if the administrator asked they must receive..." before making this
> >>> change... cc: hch for bonus input.
> >>
> >> I don't really care too much about the mount options, the important bit
> >> was the application behavior.
> >>
> >> I fully agree with Dan that we should have the same behavior for every
> >> file system, though.
> >
> > One factor that might influence this is how we expect users to detect
> > whether or not DAX is being used, and whether that can vary per-inode
> > within a filesystem.  If we choose to only have the mount option then
> > I agree that a hard failure when -o dax doesn't work seems fine.  And
> > of course keeping the filesystems behaving the same is desirable.
> >
> > If we eventually do go back to having a per-inode DAX option, though,
> > the mount option becomes a hint as to what the default behavior is,
> > and the user will need another way to detect whether or not DAX is
> > being used for a given inode.  In that case having the mount option
> > fail loudly isn't as important because all we've really changed is the
> > filesystem's default, and the application will still need a consistent
> > way of detecting whether the inode they are actually using is DAX or
> > not.
> >
> > I'm not sure if per-inode DAX is still a goal for anyone.  If not,
> > then sure, using the DAX mount option as the one source of truth and
> > making it a hard failure when it doesn't work seems reasonable.
>
> I've been thinking about the per-inode stuff a bit, and while I don't know
> how to resolve some of the trickier issues, at least the expected behavior
> seems like something we can narrow down and specify.
>
> Because it's an on-disk flag (in xfs today, in any case) it seems that
> the only sane behavior to expect is either/or, i.e.:
>
> Mount option: All files always dax, per-inode flags ignored (or rejected)
> Per-inode: Mount option cannot be specified; only inodes explicitly flagged are dax
>
> Think about it; what would mount-option-plus-per-inode mean?  We have
> no "negative" dax flag, so while mount-option-with-flag surely means
> "dax", what the heck does mount-option-without-flag mean, and how is it
> distinguishable from mount option only?
>
> I submit that flags can only have meaning w/o the fs-wide mount option
> enabled, so the question of "should we hard fail mount -o dax for devices
> that cannot support it" seems to be orthogonal to the per-inode question.
>
> i.e. mount -o dax really can only mean "I want dax on everything" and so
> again, I think we probably need to fail the mount if that can't be honored.

Works for me.
Dan Williams Oct. 17, 2018, 7:52 p.m. UTC | #9
On Wed, Oct 17, 2018 at 12:42 PM Eric Sandeen <sandeen@sandeen.net> wrote:
>
>
>
> On 10/13/18 11:05 AM, Ross Zwisler wrote:
> > On Fri, Oct 12, 2018 at 2:21 AM Christoph Hellwig <hch@lst.de> wrote:
> >>
> >> On Thu, Oct 11, 2018 at 01:38:34PM -0500, Eric Sandeen wrote:
> >>>> The different behavior between filesystems was confusing customers so
> >>>> we had to align them, then the question was which default to pick.
> >>>> Honestly, we came to the decision to bring ext4 in line with the xfs
> >>>> behavior because we thought that would be easier than the alternative.
> >>>> Dave and Christoph made repeated arguments that DAX is just a hidden
> >>>> performance optimization that no application should rely on, so we
> >>>> went the path of least resistance and changed the ext4 default.
> >>>
> >>> Ok, well, I guess we'd better reconcile "it's a hidden performance hint"
> >>> with "if the administrator asked they must receive..." before making this
> >>> change... cc: hch for bonus input.
> >>
> >> I don't really care too much about the mount options, the important bit
> >> was the application behavior.
> >>
> >> I fully agree with Dan that we should have the same behavior for every
> >> file system, though.
> >
> > One factor that might influence this is how we expect users to detect
> > whether or not DAX is being used, and whether that can vary per-inode
> > within a filesystem.  If we choose to only have the mount option then
> > I agree that a hard failure when -o dax doesn't work seems fine.  And
> > of course keeping the filesystems behaving the same is desirable.
> >
> > If we eventually do go back to having a per-inode DAX option, though,
> > the mount option becomes a hint as to what the default behavior is,
> > and the user will need another way to detect whether or not DAX is
> > being used for a given inode.  In that case having the mount option
> > fail loudly isn't as important because all we've really changed is the
> > filesystem's default, and the application will still need a consistent
> > way of detecting whether the inode they are actually using is DAX or
> > not.
> >
> > I'm not sure if per-inode DAX is still a goal for anyone.  If not,
> > then sure, using the DAX mount option as the one source of truth and
> > making it a hard failure when it doesn't work seems reasonable.
>
> I've been thinking about the per-inode stuff a bit, and while I don't know
> how to resolve some of the trickier issues, at least the expected behavior
> seems like something we can narrow down and specify.
>
> Because it's an on-disk flag (in xfs today, in any case) it seems that
> the only sane behavior to expect is either/or, i.e.:
>
> Mount option: All files always dax, per-inode flags ignored (or rejected)
> Per-inode: Mount option cannot be specified; only inodes explicitly flagged are dax
>
> Think about it; what would mount-option-plus-per-inode mean?  We have
> no "negative" dax flag, so while mount-option-with-flag surely means
> "dax", what the heck does mount-option-without-flag mean, and how is it
> distinguishable from mount option only?
>
> I submit that flags can only have meaning w/o the fs-wide mount option
> enabled, so the question of "should we hard fail mount -o dax for devices
> that cannot support it" seems to be orthogonal to the per-inode question.
>
> i.e. mount -o dax really can only mean "I want dax on everything" and so
> again, I think we probably need to fail the mount if that can't be honored.

+1 from me. The mount option is a blunt global override and we should
proceed with the finer-grained enabling. DAX is not guaranteed to have
neutral to positive performance impact so it should be enabled
consciously.
Jeff Moyer Oct. 17, 2018, 9:31 p.m. UTC | #10
Eric Sandeen <sandeen@sandeen.net> writes:

> I've been thinking about the per-inode stuff a bit, and while I don't know
> how to resolve some of the trickier issues, at least the expected behavior
> seems like something we can narrow down and specify.
>
> Because it's an on-disk flag (in xfs today, in any case) it seems that
> the only sane behavior to expect is either/or, i.e.:
>
> Mount option: All files always dax, per-inode flags ignored (or rejected)
> Per-inode: Mount option cannot be specified; only inodes explicitly flagged are dax
>
> Think about it; what would mount-option-plus-per-inode mean?  We have
> no "negative" dax flag, so while mount-option-with-flag surely means
> "dax", what the heck does mount-option-without-flag mean, and how is it
> distinguishable from mount option only?
>
> I submit that flags can only have meaning w/o the fs-wide mount option
> enabled, so the question of "should we hard fail mount -o dax for devices
> that cannot support it" seems to be orthogonal to the per-inode question.
>
> i.e. mount -o dax really can only mean "I want dax on everything" and so
> again, I think we probably need to fail the mount if that can't be honored.

I hate to even open up this can of worms, but what about killing the dax
mount option?

To quote Christoph:
  How does an application "make use of DAX"?  What actual user visible
  semantics are associated with a file that has this flag set?

We're already talking about making caching decisions automatically, so
does DAX even mean anything at that point?  If the storage and the file
system support it, enable it.

From what we've seen so far, applications want:
1) to be able to make data persistent from userspace
   For this, we have MAP_SYNC.
2) to determine whether or not page cache will be used
   For this, we have O_DIRECT for read/write access, and MAP_SYNC for
   mmap access (and maybe a third option coming, we'll see).

The only thing users gain from a mount option is the ability to turn OFF
dax.  I suppose there might be a use case that wants this, but I'm not
aware of it.

Cheers,
Jeff
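
As a rough illustration of point 1 above, an application can ask for MAP_SYNC and fall back to an ordinary shared mapping when the kernel or filesystem refuses it. The helper name and the fallback policy below are assumptions for the example; the fallback #defines carry the values used by the generic Linux UAPI headers (x86 among others) and only apply when the libc headers lack them.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Fallbacks for older libc headers; values from the generic Linux UAPI. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper: map len bytes of fd shared and writable, preferring
 * MAP_SYNC. On return *is_sync is 1 if the MAP_SYNC mapping succeeded and
 * 0 if we fell back to a plain MAP_SHARED mapping.
 * Returns MAP_FAILED if both attempts fail.
 */
static void *map_shared_prefer_sync(int fd, size_t len, int *is_sync)
{
	/* MAP_SHARED_VALIDATE makes the kernel reject flags it cannot honor. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (p != MAP_FAILED) {
		*is_sync = 1;
		return p;
	}
	*is_sync = 0;
	return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

When is_sync comes back 0, the application still has to call msync() or fsync() to make its stores durable. The sketch says nothing about whether the page cache is involved.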
Dan Williams Oct. 17, 2018, 9:44 p.m. UTC | #11
On Wed, Oct 17, 2018 at 2:31 PM Jeff Moyer <jmoyer@redhat.com> wrote:
>
> Eric Sandeen <sandeen@sandeen.net> writes:
>
> > I've been thinking about the per-inode stuff a bit, and while I don't know
> > how to resolve some of the trickier issues, at least the expected behavior
> > seems like something we can narrow down and specify.
> >
> > Because it's an on-disk flag (in xfs today, in any case) it seems that
> > the only sane behavior to expect is either/or, i.e.:
> >
> > Mount option: All files always dax, per-inode flags ignored (or rejected)
> > Per-inode: Mount option cannot be specified; only inodes explicitly flagged are dax
> >
> > Think about it; what would mount-option-plus-per-inode mean?  We have
> > no "negative" dax flag, so while mount-option-with-flag surely means
> > "dax", what the heck does mount-option-without-flag mean, and how is it
> > distinguishable from mount option only?
> >
> > I submit that flags can only have meaning w/o the fs-wide mount option
> > enabled, so the question of "should we hard fail mount -o dax for devices
> > that cannot support it" seems to be orthogonal to the per-inode question.
> >
> > i.e. mount -o dax really can only mean "I want dax on everything" and so
> > again, I think we probably need to fail the mount if that can't be honored.
>
> I hate to even open up this can of worms, but what about killing the dax
> mount option?
>
> To quote Christoph:
>   How does an application "make use of DAX"?  What actual user visible
>   semantics are associated with a file that has this flag set?
>
> We're already talking about making caching decisions automatically, so
> does DAX even mean anything at that point?  If the storage and the file
> system support it, enable it.
>
> From what we've seen so far, applications want:
> 1) to be able to make data persistent from userspace
>    For this, we have MAP_SYNC.
> 2) to determine whether or not page cache will be used
>    For this, we have O_DIRECT for read/write access, and MAP_SYNC for
>    mmap access (and maybe a third option coming, we'll see).

As Jan has said, it's not safe to assume that 'no page cache' is
implied with MAP_SYNC. It's a side effect not a contract of the
current implementation.

> The only thing users gain from a mount option is the ability to turn OFF
> dax.  I suppose there might be a use case that wants this, but I'm not
> aware of it.

I think we're stuck with it as many scripts would break if it ever
went completely away. However, we could mark it deprecated / ignored
provided we had a way for applications to query and override if DAX is
enabled. I also think it's important to keep separate the dax-mmap
behavior from the dax-read/write behavior. dax-mmap is where an
application would make different decisions if it can get a mapping
without page cache, dax-read/write does not appear to have any
justification to be advertised because the application would not do
anything different whether that is present or not.
Dave Chinner Oct. 18, 2018, 1:05 a.m. UTC | #12
On Wed, Oct 17, 2018 at 02:44:55PM -0700, Dan Williams wrote:
> On Wed, Oct 17, 2018 at 2:31 PM Jeff Moyer <jmoyer@redhat.com> wrote:
> >
> > Eric Sandeen <sandeen@sandeen.net> writes:
> >
> > > I've been thinking about the per-inode stuff a bit, and while I don't know
> > > how to resolve some of the trickier issues, at least the expected behavior
> > > seems like something we can narrow down and specify.
> > >
> > > Because it's an on-disk flag (in xfs today, in any case) it seems that
> > > the only sane behavior to expect is either/or, i.e.:
> > >
> > > Mount option: All files always dax, per-inode flags ignored (or rejected)
> > > Per-inode: Mount option cannot be specified; only inodes explicitly flagged are dax
> > >
> > > Think about it; what would mount-option-plus-per-inode mean?  We have
> > > no "negative" dax flag, so while mount-option-with-flag surely means
> > > "dax", what the heck does mount-option-without-flag mean, and how is it
> > > distinguishable from mount option only?
> > >
> > > I submit that flags can only have meaning w/o the fs-wide mount option
> > > enabled, so the question of "should we hard fail mount -o dax for devices
> > > that cannot support it" seems to be orthogonal to the per-inode question.
> > >
> > > i.e. mount -o dax really can only mean "I want dax on everything" and so
> > > again, I think we probably need to fail the mount if that can't be honored.
> >
> > I hate to even open up this can of worms, but what about killing the dax
> > mount option?
> >
> > To quote Christoph:
> >   How does an application "make use of DAX"?  What actual user visible
> >   semantics are associated with a file that has this flag set?
> >
> > We're already talking about making caching decisions automatically, so
> > does DAX even mean anything at that point?  If the storage and the file
> > system support it, enable it.
> >
> > From what we've seen so far, applications want:
> > 1) to be able to make data persistent from userspace
> >    For this, we have MAP_SYNC.
> > 2) to determine whether or not page cache will be used
> >    For this, we have O_DIRECT for read/write access, and MAP_SYNC for
> >    mmap access (and maybe a third option coming, we'll see).
> 
> As Jan has said, it's not safe to assume that 'no page cache' is
> implied with MAP_SYNC. It's a side effect not a contract of the
> current implementation.

Even MAP_DIRECT shouldn't mean "no page cache". O_DIRECT is a hint,
not a guarantee, and so it may very well use the page cache if it
needs to (as I've just explained in detail in a different thread).

> > The only thing users gain from a mount option is the ability to turn OFF
> > dax.  I suppose there might be a use case that wants this, but I'm not
> > aware of it.
> 
> I think we're stuck with it as many scripts would break if it ever
> went completely away. However, we could mark it deprecated / ignored

I don't really care that much about this - it is still marked
experimental.

That said, deprecation is the best way forward here if we are going
to remove the mount option. We've done this for other XFS mount
options recently (e.g. barrier/nobarrier) where the functionality is
now fully baked into the filesystem and there's no user option to
control it anymore.

What we really need is a document describing the expected behaviour
of filesystems on dax-capable storage. Let's nail down exactly what
we need to do to pull DAX out of the experimental state before we
start changing things. We've been doing things in a very ad-hoc way
for a while now, and we're not really converging on an endpoint where we
can say "we're done, have at it".

I think we need to decide on:

- default filesystem behaviour on dax-capable block devices
- what information about DAX do applications actually need? What
  makes sense to provide them with that information?
- how to provide hints to the kernel for desired behaviour
  - on-disk inode flags, or something else?
  - dax/nodax mount options or root dir inode flags become default
    global hints?
  - is a single hint flag sufficient or do we also need an
    explicit "do not use dax" flag?
- behaviour of MAP_SYNC w.r.t. non-DAX filesystems that can provide
  required MAP_SYNC semantics
- behaviour of MAP_DIRECT - hint like O_DIRECT or guarantee?
- default read/write path behaviour of dax-capable block devices
  - automatically bypass the pagecache if bdev is capable?
- default mmap behaviour on dax capable devices
  - use dax always?
- DAX vs get_user_pages_longterm
  - turns off DAX dynamically?
  - how do DAX-enabled filesystems interact with page fault capable
    hardware? Can we allow DAX in those cases?

I'm sure there's a heap more we need to document and nail down.
There's a lot of stuff to sort out before we start hammering on
random bits of code....
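
One of the open questions in the list above is whether a single hint flag is enough or an explicit "do not use dax" flag is also needed. Purely as a sketch of what the second answer could look like, and not something the thread has agreed on, a tri-state hint might resolve like this. All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical tri-state hint: unset defers to the next level up. */
enum dax_hint { DAX_HINT_UNSET = 0, DAX_HINT_ON, DAX_HINT_OFF };

/*
 * fs_default models a global hint (mount option or root dir flag),
 * inode_hint models a per-inode flag, dev_capable says whether the
 * block device can do DAX at all.
 */
static bool file_uses_dax(enum dax_hint fs_default, enum dax_hint inode_hint,
			  bool dev_capable)
{
	enum dax_hint effective =
		(inode_hint != DAX_HINT_UNSET) ? inode_hint : fs_default;

	return dev_capable && effective == DAX_HINT_ON;
}
```

Here an inode can opt out under a dax-by-default filesystem, which a single on/off flag cannot express. Whether that is wanted is exactly what the document would have to decide.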

> provided we had a way for applications to query and override if DAX is
> enabled. I also think it's important to keep separate the dax-mmap
> behavior from the dax-read/write behavior. dax-mmap is where an
> application would make different decisions if it can get a mapping
> without page cache,

The functionality people keep saying "requires DAX" really doesn't -
what it really requires is that mmap() exposes filesystem tracked
pmem in a CPU addressable memory range. DAX is not the only way to
do that - a filesystem with a pmem-based persistent page cache can
provide MAP_SYNC semantics to userspace without being a DAX
filesystem.

(see other thread again)

Cheers,

Dave.
Dan Williams Oct. 18, 2018, 2:01 a.m. UTC | #13
On Wed, Oct 17, 2018 at 6:05 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Oct 17, 2018 at 02:44:55PM -0700, Dan Williams wrote:
> > On Wed, Oct 17, 2018 at 2:31 PM Jeff Moyer <jmoyer@redhat.com> wrote:
> > >
> > > Eric Sandeen <sandeen@sandeen.net> writes:
> > >
> > > > I've been thinking about the per-inode stuff a bit, and while I don't know
> > > > how to resolve some of the trickier issues, at least the expected behavior
> > > > seems like something we can narrow down and specify.
> > > >
> > > > Because it's an on-disk flag (in xfs today, in any case) it seems that
> > > > the only sane behavior to expect is either/or, i.e.:
> > > >
> > > > Mount option: All files always dax, per-inode flags ignored (or rejected)
> > > > Per-inode: Mount option cannot be specified; only inodes explicitly flagged are dax
> > > >
> > > > Think about it; what would mount-option-plus-per-inode mean?  We have
> > > > no "negative" dax flag, so while mount-option-with-flag surely means
> > > > "dax", what the heck does mount-option-without-flag mean, and how is it
> > > > distinguishable from mount option only?
> > > >
> > > > I submit that flags can only have meaning w/o the fs-wide mount option
> > > > enabled, so the question of "should we hard fail mount -o dax for devices
> > > > that cannot support it" seems to be orthogonal to the per-inode question.
> > > >
> > > > i.e. mount -o dax really can only mean "I want dax on everything" and so
> > > > again, I think we probably need to fail the mount if that can't be honored.
> > >
> > > I hate to even open up this can of worms, but what about killing the dax
> > > mount option?
> > >
> > > To quote Christoph:
> > >   How does an application "make use of DAX"?  What actual user visible
> > >   semantics are associated with a file that has this flag set?
> > >
> > > We're already talking about making caching decisions automatically, so
> > > does DAX even mean anything at that point?  If the storage and the file
> > > system support it, enable it.
> > >
> > > From what we've seen so far, applications want:
> > > 1) to be able to make data persistent from userspace
> > >    For this, we have MAP_SYNC.
> > > 2) to determine whether or not page cache will be used
> > >    For this, we have O_DIRECT for read/write access, and MAP_SYNC for
> > >    mmap access (and maybe a third option coming, we'll see).
> >
> > As Jan has said, it's not safe to assume that 'no page cache' is
> > implied with MAP_SYNC. It's a side effect not a contract of the
> > current implementation.
>
> Even MAP_DIRECT shouldn't mean "no page cache". O_DIRECT is a hint,
> not a guarantee, and so it may very well use the page cache if it
> needs to (as I've just explained in detail in a different thread).
>
> > > The only thing users gain from a mount option is the ability to turn OFF
> > > dax.  I suppose there might be a use case that wants this, but I'm not
> > > aware of it.
> >
> > I think we're stuck with it as many scripts would break if it ever
> > went completely away. However, we could mark it deprecated / ignored
>
> I don't really care that much about this - it is still marked
> experimental.
>
> That said, deprecation is the best way forward here if we are going
> to remove the mount option. We've done this for other XFS mount
> options recently (e.g. barrier/nobarrier) where the functionality is
> now fully baked into the filesystem and there's no user option to
> control it anymore.
>
> What we really need is a document describing the expected behaviour
> of filesystems on dax-capable storage. Let's nail down exactly what
> we need to do to pull DAX out of the experimental state before we
> start changing things. We've been doing things in a very ad-hoc way
> for a while now, and we're not really converging on an endpoint where we
> can say "we're done, have at it".
>
> I think we need to decide on:
>
> - default filesystem behaviour on dax-capable block devices
> - what information about DAX do applications actually need? What
>   makes sense to provide them with that information?
> - how to provide hints to the kernel for desired behaviour
>   - on-disk inode flags, or something else?
>   - dax/nodax mount options or root dir inode flags become default
>     global hints?
>   - is a single hint flag sufficient or do we also need an
>     explicit "do not use dax" flag?
> - behaviour of MAP_SYNC w.r.t. non-DAX filesystems that can provide
>   required MAP_SYNC semantics
> - behaviour of MAP_DIRECT - hint like O_DIRECT or guarantee?
> - default read/write path behaviour of dax-capable block devices
>   - automatically bypass the pagecache if bdev is capable?
> - default mmap behaviour on dax capable devices
>   - use dax always?
> - DAX vs get_user_pages_longterm
>   - turns off DAX dynamically?
>   - how do DAX-enabled filesystems interact with page fault capable
>     hardware? Can we allow DAX in those cases?
>
> I'm sure there's a heap more we need to document and nail down.
> There's a lot of stuff to sort out before we start hammering on
> random bits of code....

Nice, yes, I'll add some more:

- Is MADV_DIRECT_ACCESS a hint or a requirement?
- How does the kernel communicate the effective mode of a mapping
  taking into account madvise(), inode flags, mount options, and / or
  default fs behavior? New madvice() syscall?
- What is the behavior of dax in the presence of reflink'd extents?
  Just failing seems the 'experimental' behavior. What to do about
  page->index when page belongs to more than 1 file via reflink?
- Is there ever a case to force disable dax operation? To date we've
  only ever thought about interfaces to force *enable* dax operation
- The virtio-pmem use case wants dax mappings but requires an explicit
  fsync() instead of MAP_SYNC to flush software buffers. It's a DAX
  sub-set; should it have its own name?
- DAX operation is loosely tied to block devices. There have been
  discussions of mounting filesystems on /dev/dax devices directly.
  Should we take that to its logical conclusion and support a
  block-layer-less conversion of dax-capable file systems?
- Willy has proposed that the XArray cache file-offset-to-physical
  address lookups; currently it only tracks dirty mapping state
- The NVDIMM sub-system tracks badblocks, but the filesystem currently
  only finds out about them late when it attempts dax_direct_access().
  Applications want to be able to list files+offsets that have
  experienced media corruption.

> > provided we had a way for applications to query and override if DAX is
> > enabled. I also think it's important to keep separate the dax-mmap
> > behavior from the dax-read/write behavior. dax-mmap is where an
> > application would make different decisions if it can get a mapping
> > without page cache,
>
> The functionality people keep saying "requires DAX" really doesn't -
> what it really requires is that mmap() exposes filesystem tracked
> pmem in a CPU addressable memory range. DAX is not the only way to
> do that - a filesystem with a pmem-based persistent page cache can
> provide MAP_SYNC semantics to userspace without being a DAX
> filesystem.

*nod*