Message ID | 20200824203724.13477-1-ailiop@suse.com (mailing list archive) |
---|---|
Headers | show |
Series | xfsprogs: blockdev dax detection and warnings | expand |
On Mon, Aug 24, 2020 at 10:37:18PM +0200, Anthony Iliopoulos wrote: > Hi all, > > This short series adds blockdev dax capability detection via libblkid, > and subsequently uses this bit to warn on a couple of incompatible > configurations during mkfs. > > The first patch adds the detection capability to libtopology, and the > following two patches add mkfs warnings that are issued when the fs > block size is not matching the page size, and when reflink is being > enabled in conjunction with dax. This makes the assumption that anyone running mkfs on a dax capable device is going to use DAX, and prevents mkfs from running if the config is not DAX compatible. The issue here is that making a filesystem that is not DAX compatible on a DAX capable device is not a fatal error. The filesystem will work just fine using buffered and direct IO, and there are definitely workloads where we want to use buffered IO on pmem and not DAX. Why? Because existing pmem is terribly slow for write intensive applications compared to page cache based mmap(). And even buffered writes are faster than DAX direct writes because the slow writeback is done in the background via async writeback. Also, what happens if you have a 64kB page size? mkfs defaults to 4kB block size, so with these changes mkfs will refuse to run on a dax capable device unless the user specifically directs it to do something different. That's not a good behaviour for the default config to have.... Hence I don't think that preventing mkfs from running unless the config is exactly waht DAX requires or the "force" option is set is the right policy here. I agree that mkfs needs to be aware of DAX capability of the block device, but that capability existing should not cause mkfs to fail. If we want users to be able to direct mkfs to to create a DAX capable filesystem then adding a -d dax option would be a better idea. This would direct mkfs to align/size all the data options to use a DAX compatible topology if blkid supports reporting the DAX topology. It would also do things like turn off reflink (until that is supported w/ DAX), etc. i.e. if the user knows they are going to use DAX (and they will) then they can tell mkfs to make a DAX compatible filesystem. > The next patch adds a new cli option that can be used to override > warnings, and converts all warnings that can be forced to this option > instead the overloaded -f option. This avoids cases where forcing a > warning may also be implicitly forcing overwriting an existing > partition. I don't want both an "ignore warnings" and a "force" CLI option. They both do the same thing - allow the user to override things that are potentially destructive or result in an unusable config - so why should we add the complexity of having a different "force" options for every different possible thing that can be overridden? Then we'll grow a "--force-all" option to override everything. i.e. we end up with this insanity: $ dpkg --help |grep force --force-help Show help on forcing. --[no-]triggers Skip or force consequential trigger processing. --force-<thing>[,...] Override problems (see --force-help). --no-force-<thing>[,...] Stop when problems encountered. and "--force-help" gives you a list of about 30 different things you can force the behaviour of, including "--force-all". This is simply not a good UI model... > The last two patches are small cleanups that remove redundant code. They should be first. Cheers, Dave.
On Tue, Aug 25, 2020 at 08:55:33AM +1000, Dave Chinner wrote: > On Mon, Aug 24, 2020 at 10:37:18PM +0200, Anthony Iliopoulos wrote: > > Hi all, > > > > This short series adds blockdev dax capability detection via libblkid, > > and subsequently uses this bit to warn on a couple of incompatible > > configurations during mkfs. > > > > The first patch adds the detection capability to libtopology, and the > > following two patches add mkfs warnings that are issued when the fs > > block size is not matching the page size, and when reflink is being > > enabled in conjunction with dax. > > This makes the assumption that anyone running mkfs on a dax capable > device is going to use DAX, and prevents mkfs from running if the > config is not DAX compatible. > > The issue here is that making a filesystem that is not DAX > compatible on a DAX capable device is not a fatal error. The > filesystem will work just fine using buffered and direct IO, and > there are definitely workloads where we want to use buffered IO on > pmem and not DAX. Why? Because existing pmem is terribly slow for > write intensive applications compared to page cache based mmap(). > And even buffered writes are faster than DAX direct writes because > the slow writeback is done in the background via async writeback. > > Also, what happens if you have a 64kB page size? mkfs defaults to > 4kB block size, so with these changes mkfs will refuse to run on a > dax capable device unless the user specifically directs it to do > something different. That's not a good behaviour for the default > config to have.... > > Hence I don't think that preventing mkfs from running unless the config > is exactly waht DAX requires or the "force" option is set is the > right policy here. > > I agree that mkfs needs to be aware of DAX capability of the block > device, but that capability existing should not cause mkfs to fail. > If we want users to be able to direct mkfs to to create a DAX > capable filesystem then adding a -d dax option would be a better > idea. This would direct mkfs to align/size all the data options to > use a DAX compatible topology if blkid supports reporting the DAX > topology. It would also do things like turn off reflink (until that > is supported w/ DAX), etc. I do like the idea of adding an explicit dax option, but I'm not sure what the right policy would be: 1. -d dax option specified, set dax-compatible parameters irrespective of dax capability (blkid detection not strictly required) 2. -d dax option specified, set dax-compatible parameters only if blockdev is dax capable; fallback to default params otherwise 3. autodetect dax capability and automatically set dax-compatible params, irrespective of if the option is specified or not (potentially also needs an explicit override option e.g. -d dax=0). I'd be inclined to go with the second option which should lead to the least amount of surprises for users. Still, this doesn't address the exact thing I was trying to do (see below). > i.e. if the user knows they are going to use DAX (and they will) > then they can tell mkfs to make a DAX compatible filesystem. So I've been trying to prevent cases where users create a filesystem with the default params, go on to populate it, and at some later point of time find themselves wanting to mount it with dax only to realize that this is not possible without mkfs (most commonly due to the mismatch of the 64kB page size on ppc64). We could potentially just issue the warning and not force mkfs to bail out, but I'm afraid that warnings aren't very discernible and are easily missed. I do agree though that requiring an override may not be the best model here. Would it make sense to simply emit the warnings and drop the bail-out and override logic altogether? Alternatively the third option above (autodetection), would take care of those cases at the expense of overriding otherwise potentially desirable options (e.g. switching off reflink), which will come as a surprise to users that don't intent to use dax. I don't think this would be a good default policy. > > The next patch adds a new cli option that can be used to override > > warnings, and converts all warnings that can be forced to this option > > instead the overloaded -f option. This avoids cases where forcing a > > warning may also be implicitly forcing overwriting an existing > > partition. > > I don't want both an "ignore warnings" and a "force" CLI option. > They both do the same thing - allow the user to override things that > are potentially destructive or result in an unusable config - so why > should we add the complexity of having a different "force" options > for every different possible thing that can be overridden? The rationale here was to only make a distinction between destructive and (conditionally) unusable, otherwise we would indeed need an override toggle per warning - which I totally agree is an overkill. If implicitly suppressing the destructive operation confirmation isn't a concern, then I'd definitely drop this patch. Thanks for the feedback! Anthony
On 8/24/20 5:55 PM, Dave Chinner wrote: > I agree that mkfs needs to be aware of DAX capability of the block > device, but that capability existing should not cause mkfs to fail. > If we want users to be able to direct mkfs to to create a DAX > capable filesystem then adding a -d dax option would be a better > idea. This would direct mkfs to align/size all the data options to > use a DAX compatible topology if blkid supports reporting the DAX > topology. It would also do things like turn off reflink (until that > is supported w/ DAX), etc. > > i.e. if the user knows they are going to use DAX (and they will) > then they can tell mkfs to make a DAX compatible filesystem. FWIW, Darrick /just/ added a -d daxinherit option, though all it does now is set the inheritable dax flag on the root dir, it doesn't enforce things like page vs block size, etc. That change is currently staged in my local tree. I suppose we could condition that on other requirements, although we've always had the ability to mkfs a filesystem that can't necessarily be used on the current machine - i.e. you can make a 64k block size filesystem on a 4k page machine, etc. So I'm not sure we want to tie mkfs abilities to the current mkfs environment.... Still, I wonder if I should hold off on "-d daxinherit" patch until we have thought through things like reflink conflicts, for now. (though again, mkfs is "perfectly capapable" of making a consistent reflink+dax filesystem, it's just that no kernel can mount it today...) -Eric
On Tue, Aug 25, 2020 at 08:59:39AM -0500, Eric Sandeen wrote: > On 8/24/20 5:55 PM, Dave Chinner wrote: > > I agree that mkfs needs to be aware of DAX capability of the block > > device, but that capability existing should not cause mkfs to fail. > > If we want users to be able to direct mkfs to to create a DAX > > capable filesystem then adding a -d dax option would be a better > > idea. This would direct mkfs to align/size all the data options to > > use a DAX compatible topology if blkid supports reporting the DAX > > topology. It would also do things like turn off reflink (until that > > is supported w/ DAX), etc. > > > > i.e. if the user knows they are going to use DAX (and they will) > > then they can tell mkfs to make a DAX compatible filesystem. > > FWIW, Darrick /just/ added a -d daxinherit option, though all it does > now is set the inheritable dax flag on the root dir, it doesn't enforce > things like page vs block size, etc. > > That change is currently staged in my local tree. > > I suppose we could condition that on other requirements, although we've > always had the ability to mkfs a filesystem that can't necessarily be > used on the current machine - i.e. you can make a 64k block size filesystem > on a 4k page machine, etc. So I'm not sure we want to tie mkfs abilities > to the current mkfs environment.... > > Still, I wonder if I should hold off on "-d daxinherit" patch until we > have thought through things like reflink conflicts, for now. > > (though again, mkfs is "perfectly capapable" of making a consistent > reflink+dax filesystem, it's just that no kernel can mount it today...) No, please don't layer additional meanings onto daxinherit=1. I actually /do/ want to have a -d dax=1 option for "set up this filesystem for DAX" that will configure the geometry for that device to play nicely with the things that (some) DAX users want. IOWs, you say "-d dax=1" and that means that mkfs sniffs out the DAXiness of the underlying device and the PMD size. Then it turns off reflink by default, sets the daxinherit=1 hint, and configures the extent size and su/sw hints to match the PMD size. Or, you say "-r dax=1" for the realtime device, and now it sets the allocation unit to the PMD size for people running huge databases and want only huge pages to back their table data<cough>. Zooming out a bit, maybe we should instead introduce a new "tuning" parameter for -d and -r so that administrators could tune the filesystem for specific purposes: -d tune=dax: Reject if device not dax, set daxinherit=1, set extsize/su/sw to match PMD -d tune=ssd: Set agcount to match the number of CPUs if possible, make the log larger to support a large number of threads and iops. -d tune=rotational: Probably does nothing. ;) -d tune=auto: Query blkid to guess which of the above three profiles we should use. -d tune=none: No tuning. And then you'd do the same for the realtime device. This would help us get rid of the seeeekret mkfs wrapper that we use to make it easier for our internal customers to use DAX since mkfs.xfs doesn't support config files. --D > -Eric
On Tue, Aug 25, 2020 at 08:59:39AM -0500, Eric Sandeen wrote: > On 8/24/20 5:55 PM, Dave Chinner wrote: > > I agree that mkfs needs to be aware of DAX capability of the block > > device, but that capability existing should not cause mkfs to fail. > > If we want users to be able to direct mkfs to to create a DAX > > capable filesystem then adding a -d dax option would be a better > > idea. This would direct mkfs to align/size all the data options to > > use a DAX compatible topology if blkid supports reporting the DAX > > topology. It would also do things like turn off reflink (until that > > is supported w/ DAX), etc. > > > > i.e. if the user knows they are going to use DAX (and they will) > > then they can tell mkfs to make a DAX compatible filesystem. > > FWIW, Darrick /just/ added a -d daxinherit option, though all it does > now is set the inheritable dax flag on the root dir, it doesn't enforce > things like page vs block size, etc. I am aware of that patch, but I considered the option to be somewhat orthogonal, given that FS_XFLAG_DAX can be set (and inherited) irrespective of dax support in the block device (and overridden via mount opts if need be), so I didn't want to overload daxinherit. > That change is currently staged in my local tree. > > I suppose we could condition that on other requirements, although we've > always had the ability to mkfs a filesystem that can't necessarily be > used on the current machine - i.e. you can make a 64k block size filesystem > on a 4k page machine, etc. So I'm not sure we want to tie mkfs abilities > to the current mkfs environment.... Agreed, so I suppose any dax option should be an opt-in, e.g. similar to the -d dax=1 proposal. That won't prevent users from neglecting it and creating a fs which will be later incompatible with -o dax, but that's a different story I guess.. - Anthony
On 8/25/20 10:09 AM, Anthony Iliopoulos wrote: > On Tue, Aug 25, 2020 at 08:59:39AM -0500, Eric Sandeen wrote: >> On 8/24/20 5:55 PM, Dave Chinner wrote: >>> I agree that mkfs needs to be aware of DAX capability of the block >>> device, but that capability existing should not cause mkfs to fail. >>> If we want users to be able to direct mkfs to to create a DAX >>> capable filesystem then adding a -d dax option would be a better >>> idea. This would direct mkfs to align/size all the data options to >>> use a DAX compatible topology if blkid supports reporting the DAX >>> topology. It would also do things like turn off reflink (until that >>> is supported w/ DAX), etc. >>> >>> i.e. if the user knows they are going to use DAX (and they will) >>> then they can tell mkfs to make a DAX compatible filesystem. >> >> FWIW, Darrick /just/ added a -d daxinherit option, though all it does >> now is set the inheritable dax flag on the root dir, it doesn't enforce >> things like page vs block size, etc. > > I am aware of that patch, but I considered the option to be somewhat > orthogonal, given that FS_XFLAG_DAX can be set (and inherited) > irrespective of dax support in the block device (and overridden via > mount opts if need be), so I didn't want to overload daxinherit. OK fair enough. >> That change is currently staged in my local tree. >> >> I suppose we could condition that on other requirements, although we've >> always had the ability to mkfs a filesystem that can't necessarily be >> used on the current machine - i.e. you can make a 64k block size filesystem >> on a 4k page machine, etc. So I'm not sure we want to tie mkfs abilities >> to the current mkfs environment.... > > Agreed, so I suppose any dax option should be an opt-in, e.g. similar to > the -d dax=1 proposal. That won't prevent users from neglecting it and > creating a fs which will be later incompatible with -o dax, but that's a > different story I guess.. I guess my overarching desire here is to not try to predict too much, or to make predictions that won't stand the test of time. Inferring what the admin wants can be tricky. :) I also want to be consistent; i.e. today we can mkfs a 64k block filesystem on a 4k block machine, and no warning is emitted. If we mkfs a V5/CRC filesystem or a reflink filesystem on an old kernel that doesn't support it, no warning is emitted. If we're going to warn every time when "this filesystem can't be used under the currently running kernel" then there are quite a lot of cases to handle... I don't want to warn about some things and not others, so we need to decide if this is actually something we want to do in general, not just for dax. I'm somewhat disinclined, TBH, and would rather rely on a mount failure to alert the admin to the problem. (It's not like there are a lot of resources invested between mkfs & mount). I'll reread this thread & Dave's responses and give this some more thought. Thanks, -Eric > - Anthony >
On Tue, Aug 25, 2020 at 07:40:15AM -0700, Darrick J. Wong wrote: > Zooming out a bit, maybe we should instead introduce a new "tuning" > parameter for -d and -r so that administrators could tune the filesystem > for specific purposes: > > -d tune=dax: Reject if device not dax, set daxinherit=1, set > extsize/su/sw to match PMD > > -d tune=ssd: Set agcount to match the number of CPUs if > possible, make the log larger to support a large number of > threads and iops. > > -d tune=rotational: Probably does nothing. ;) > > -d tune=auto: Query blkid to guess which of the above three > profiles we should use. > > -d tune=none: No tuning. > > And then you'd do the same for the realtime device. Please, no. The problem with this is that a specific "tune" will need to vary over time (e.g. when reflink is supported with DAX) and so now we back to the same situation where the definition of "tune=dax" changes depending on what version of mkfs you use. Hence you can still make a filesystem that the kernel won't mount because you have a xfsprogs that supports DAX+reflink and a kernel that doesn't. I just don't see this as a viable way to produce filesystems that work for specific situations because it doesn't solve the kernel vs xfsprogs version support issue that requires tweaking mkfs parameters manually to avoid... > This would help us get rid of the seeeekret mkfs wrapper that we use to > make it easier for our internal customers to use DAX since mkfs.xfs > doesn't support config files. Let's fix that, then. I've written a bunch of stuff in the past couple of years that uses basic ini config files via a simple library and it just works. If people are happy with ini format config files via a library, then I'll just go do that, eh? Cheers, Dave.