Message ID | 20240528143305.18374-1-mariusz.tkaczyk@linux.intel.com (mailing list archive) |
---|---|
State | Not Applicable |
Series | [RFC] mdadm: add --fast-initialize |
Hi Mariusz

The discard can't promise to write zero to nvme disks, right? If so,
we can't use it for resync, because it can't make sure the raid is in
sync state.

Best Regards
Xiao

On Tue, May 28, 2024 at 10:33 PM Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> wrote:
>
> This is not complete change but I would like to get the feedback on
> concept proposed. There are few features for optimized space zeroing.
> We already support --write-zeroes but Intel would like to add support of
> deallocate command (discard) in the future. There is also Sata trim
> which could be potentially used.
>
> The goal of this RFC is to get feedback about proposing one option to
> check for few features which can be used for performing smarter
> initialization instead of resync. With that, user may just type
> --fast-initialize and mdadm will determine what can be used, else abort.
>
> This won't be merged.
>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
>
> ---
>  mdadm.8.in | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/mdadm.8.in b/mdadm.8.in
> index aa0c540399f6..be592d70ac9b 100644
> --- a/mdadm.8.in
> +++ b/mdadm.8.in
> @@ -849,6 +849,17 @@ each disk is zeroed in parallel with the others.
>  .IP
>  This is only meaningful with --create.
>
> +.TP
> +.BR \-\-fast-initialize
> +When creating an array, check disks for optional features to perform optimized initialization
> +instead of resync. These features are: NVMe's write-zeros or deallocate and Sata trims. If there is
> +feature supported by all drives, it is executed, otherwise error is returned. This option invokes
> +.B \-\-assume\-clean
> +.This is intended for use with devices that have hardware offload for zeroing, but despite this
> +zeroing can still take several minutes for large disks to complete.
> +.IP
> +This is only meaningful with --create.
> +
>  .TP
>  .BR \-\-backup\-file=
>  This is needed when
> --
> 2.35.3
>
>

On 2024-06-04 06:46, Xiao Ni wrote:
> Hi Mariusz
>
> The discard can't promise to write zero to nvme disks, right? If so,
> we can't use it for resync, because it can't make sure the raid is in
> sync state.

Yes, discard requests are a best effort and the drive is free to ignore
some or all of the request. See [1] for more information from Martin
Petersen.

I think if we have a device that has a fast zero operation that we know
guarantees zeroing then the kernel's write-zeros operation should be
changed to use it. We shouldn't make fast-but-dangerous options in mdadm.

Thanks,

Logan

[1] https://lore.kernel.org/all/yq1fsgwbijv.fsf@ca-mkp.ca.oracle.com/T/#u

On Tue, 4 Jun 2024 10:19:59 -0600
Logan Gunthorpe <logang@deltatee.com> wrote:

> On 2024-06-04 06:46, Xiao Ni wrote:
> > Hi Mariusz
> >
> > The discard can't promise to write zero to nvme disks, right? If so,
> > we can't use it for resync, because it can't make sure the raid is in
> > sync state.
>
> Yes, discard requests are a best effort and the drive is free to ignore
> some or all of the request. See [1] for more information from Martin
> Petersen.
>
> I think if we have a device that has a fast zero operation that we know
> guarantees zeroing then the kernel's write-zeros operation should be
> changed to use it. We shouldn't make fast-but-dangerous options in mdadm.
>
> Thanks,
>
> Logan
>
>
> [1] https://lore.kernel.org/all/yq1fsgwbijv.fsf@ca-mkp.ca.oracle.com/T/#u

Thanks for the valuable feedback. I'm not directly involved in the
technical details of this implementation, and in fact I haven't read the
previous discussion yet. You pointed out a serious problem and I will
make sure that it is addressed.

My question was about the mdadm API, independent of the technical
implementation. I would like to propose one option that integrates the
existing method (--write-zeroes) with potential new ones (if any other
fast-initialization capability turns out to be safe to add). Do you see
this as the right approach, or should we keep them separate?

Mariusz

diff --git a/mdadm.8.in b/mdadm.8.in
index aa0c540399f6..be592d70ac9b 100644
--- a/mdadm.8.in
+++ b/mdadm.8.in
@@ -849,6 +849,17 @@ each disk is zeroed in parallel with the others.
 .IP
 This is only meaningful with --create.
 
+.TP
+.BR \-\-fast-initialize
+When creating an array, check the disks for optional features that allow optimized initialization
+instead of a resync. These features are NVMe Write Zeroes or Deallocate, and SATA TRIM. If a
+feature is supported by all drives, it is used; otherwise an error is returned. This option implies
+.BR \-\-assume\-clean .
+It is intended for devices that have hardware offload for zeroing; even so,
+zeroing large disks can still take several minutes to complete.
+.IP
+This is only meaningful with --create.
+
 .TP
 .BR \-\-backup\-file=
 This is needed when

This is not a complete change, but I would like to get feedback on the
proposed concept. There are a few features for optimized space zeroing.
We already support --write-zeroes, but Intel would like to add support
for the deallocate command (discard) in the future. There is also SATA
TRIM, which could potentially be used.

The goal of this RFC is to get feedback on a single option that checks
for the features that can be used to perform a smarter initialization
instead of a resync. With that, the user may just type --fast-initialize
and mdadm will determine which feature can be used, or abort if none is
available.

This won't be merged.

Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
---
 mdadm.8.in | 11 +++++++++++
 1 file changed, 11 insertions(+)
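
The selection rule described in the commit message (use a feature only if
every member drive supports it, otherwise abort) can be sketched as
follows. This is an illustration only, not mdadm code: the function
names, the feature labels and the preference order are assumptions made
for the example. The two sysfs queue attributes are the ones the kernel
block layer exposes, where a value of 0 means the operation is not
advertised. Note that a non-zero discard_max_bytes says nothing about
whether discarded blocks read back as zeroes, which is the concern
raised in the replies.

```python
from pathlib import Path

# Hypothetical preference order; these labels are not mdadm identifiers.
PREFERENCE = ("write_zeroes", "discard")

# Block-layer queue attributes in sysfs; 0 means "not advertised".
SYSFS_ATTR = {
    "write_zeroes": "write_zeroes_max_bytes",
    "discard": "discard_max_bytes",
}


def probe(dev, sysfs_root="/sys/block"):
    """Return the set of feature labels that block device `dev` advertises."""
    features = set()
    for name, attr in SYSFS_ATTR.items():
        path = Path(sysfs_root) / dev / "queue" / attr
        try:
            if int(path.read_text().strip()) > 0:
                features.add(name)
        except (OSError, ValueError):
            # Missing or unreadable attribute: treat as unsupported.
            pass
    return features


def pick_method(per_device, allowed=PREFERENCE):
    """Pick the first allowed feature supported by all devices, else raise.

    `per_device` maps a device name to its set of feature labels.
    """
    common = set.intersection(*per_device.values()) if per_device else set()
    for name in allowed:
        if name in common:
            return name
    raise RuntimeError("no fast initialization feature is supported by all drives")
```

A caller that only trusts operations guaranteed to zero the device could
pass allowed=("write_zeroes",), so that a set of drives offering only
discard ends in the error path instead of a silent fallback.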