
[RFC] mdadm: add --fast-initialize

Message ID 20240528143305.18374-1-mariusz.tkaczyk@linux.intel.com (mailing list archive)
State Not Applicable

Commit Message

Mariusz Tkaczyk May 28, 2024, 2:33 p.m. UTC
This is not a complete change, but I would like to get feedback on the
proposed concept. There are a few features for optimized space zeroing.
We already support --write-zeroes, but Intel would like to add support for
the deallocate command (discard) in the future. There is also SATA trim,
which could potentially be used.

The goal of this RFC is to get feedback on proposing a single option that
checks for the features which can be used to perform a smarter
initialization instead of a resync. With that, the user may just type
--fast-initialize and mdadm will determine what can be used, or abort if
nothing suitable is found.
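
To illustrate the "determine what can be used, else abort" step, here is a
minimal sketch of the selection logic only. The InitMethod names, the
preference order and pick_init_method() are assumptions made for this
example and are not part of mdadm. The sketch does not say anything about
whether a given method is safe to rely on.

```python
from enum import Enum
from typing import Iterable, Optional, Set


class InitMethod(Enum):
    # Hypothetical labels for the candidate features named in this RFC.
    WRITE_ZEROES = "write-zeroes"
    DEALLOCATE = "deallocate"
    SATA_TRIM = "sata-trim"


# Assumed preference order: the method mdadm already supports comes first.
PREFERENCE = (InitMethod.WRITE_ZEROES, InitMethod.DEALLOCATE, InitMethod.SATA_TRIM)


def pick_init_method(per_disk_caps: Iterable[Set[InitMethod]]) -> Optional[InitMethod]:
    """Return the first preferred method supported by every disk, else None."""
    caps = [set(c) for c in per_disk_caps]
    if not caps:
        return None
    # A method is usable only if it is in the intersection of all disks' features.
    common = caps[0].intersection(*caps[1:])
    for method in PREFERENCE:
        if method in common:
            return method
    return None  # caller would abort array creation here
```

A caller would treat a None result as the "abort" case described above.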

This won't be merged.

Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>

---
 mdadm.8.in | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Xiao Ni June 4, 2024, 12:46 p.m. UTC | #1
Hi Mariusz

Discard can't guarantee that zeroes are written to NVMe disks, right? If so,
we can't use it in place of resync, because it can't ensure that the RAID is
in a sync state.
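
A small sketch of why the zero guarantee matters, using RAID5-style XOR
parity as the example. The helper names are made up for illustration, and
the byte values stand in for whatever a drive returns after a discard it
chose to ignore.

```python
from functools import reduce
from typing import Sequence


def xor_parity(chunks: Sequence[bytes]) -> bytes:
    """RAID5-style parity: byte-wise XOR of equally sized data chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def stripe_in_sync(data: Sequence[bytes], parity: bytes) -> bool:
    """A stripe is in sync when the stored parity matches the data chunks."""
    return xor_parity(data) == parity


# Every member really reads back zeroes: parity of zeroes is zeroes.
all_zeroed = stripe_in_sync([bytes(4), bytes(4)], parity=bytes(4))

# One member ignored the discard and still holds old data, while the
# parity member reads back zeroes: the stripe is no longer consistent.
one_stale = stripe_in_sync([bytes(4), b"\xde\xad\xbe\xef"], parity=bytes(4))
```

Here all_zeroed is true and one_stale is false, which is the case where
skipping the resync would leave the array out of sync.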

Best Regards
Xiao

On Tue, May 28, 2024 at 10:33 PM Mariusz Tkaczyk
<mariusz.tkaczyk@linux.intel.com> wrote:
>
> This is not complete change but I would like to get the feedback on
> concept proposed. There are few features for optimized space zeroing.
> We already support --write-zeroes but Intel would like to add support of
> deallocate command (discard) in the future. There is also Sata trim
> which could be potentially used.
>
> The goal of this RFC is to get feedback about proposing one option to
> check for few features which can be used for performing smarter
> initialization instead of resync. With that, user may just type
> --fast-initialize and mdadm will determine what can be used, else abort.
>
> This won't be merged.
>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
>
> ---
>  mdadm.8.in | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/mdadm.8.in b/mdadm.8.in
> index aa0c540399f6..be592d70ac9b 100644
> --- a/mdadm.8.in
> +++ b/mdadm.8.in
> @@ -849,6 +849,17 @@ each disk is zeroed in parallel with the others.
>  .IP
>  This is only meaningful with --create.
>
> +.TP
> +.BR \-\-fast-initialize
> +When creating an array, check disks for optional features to perform optimized initialization
> +instead of resync. These features are: NVMe's write-zeros or deallocate and Sata trims. If there is
> +feature supported by all drives, it is executed, otherwise error is returned. This option invokes
> +.B \-\-assume\-clean
> +.This is intended for use with devices that have hardware offload for zeroing, but despite this
> +zeroing can still take several minutes for large disks to complete.
> +.IP
> +This is only meaningful with --create.
> +
>  .TP
>  .BR \-\-backup\-file=
>  This is needed when
> --
> 2.35.3
>
>
Logan Gunthorpe June 4, 2024, 4:19 p.m. UTC | #2
On 2024-06-04 06:46, Xiao Ni wrote:
> Hi Mariusz
> 
> The discard can't promise to write zero to nvme disks, right? If so,
> we can't use it for resync, because it can't make sure the raid is in
> sync state.

Yes, discard requests are best effort and the drive is free to ignore
some or all of a request. See [1] for more information from Martin
Petersen.

I think if we have a device that has a fast zero operation that we know
guarantees zeroing, then the kernel's write-zeroes operation should be
changed to use it. We shouldn't add fast-but-dangerous options to mdadm.
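
For reference, the block layer reports the two capabilities separately in
sysfs, so they can be told apart per device. The sketch below reads the
queue attributes write_zeroes_max_bytes and discard_max_bytes, where a value
of 0 means the operation is not supported. The function names and the
sysfs_root parameter are assumptions added for this example. A non-zero
discard limit says nothing about what the discarded blocks read back as.

```python
from pathlib import Path


def read_queue_limit(dev: str, attr: str, sysfs_root: str = "/sys/block") -> int:
    """Read an integer queue attribute; a missing or odd file counts as 0."""
    path = Path(sysfs_root) / dev / "queue" / attr
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return 0


def zeroing_report(dev: str, sysfs_root: str = "/sys/block") -> dict:
    """Report which offloads the block layer advertises for one device."""
    return {
        # Non-zero means the driver accepts write-zeroes requests.
        "write_zeroes": read_queue_limit(dev, "write_zeroes_max_bytes", sysfs_root) > 0,
        # Non-zero means discard is accepted; it does not promise zeroed data.
        "discard": read_queue_limit(dev, "discard_max_bytes", sysfs_root) > 0,
    }
```

For example, zeroing_report("nvme0n1") would inspect
/sys/block/nvme0n1/queue/ on a system where that device exists.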

Thanks,

Logan


[1] https://lore.kernel.org/all/yq1fsgwbijv.fsf@ca-mkp.ca.oracle.com/T/#u
Mariusz Tkaczyk June 10, 2024, 8:57 a.m. UTC | #3
On Tue, 4 Jun 2024 10:19:59 -0600
Logan Gunthorpe <logang@deltatee.com> wrote:

> On 2024-06-04 06:46, Xiao Ni wrote:
> > Hi Mariusz
> > 
> > The discard can't promise to write zero to nvme disks, right? If so,
> > we can't use it for resync, because it can't make sure the raid is in
> > sync state.  
> 
> Yes, discard requests are a best effort and the drive is free to ignore
> some or all of the request. See [1] for more information from Martin
> Peterson.
> 
> I think if we have a device that has a fast zero operation that we know
> guarantees zeroing then the kernel's write-zeros operation should be
> changed to use it. We shouldn't make fast-but-dangerous options in mdadm.
> 
> Thanks,
> 
> Logan
> 
> 
> [1] https://lore.kernel.org/all/yq1fsgwbijv.fsf@ca-mkp.ca.oracle.com/T/#u

Thanks for the valuable feedback. I'm not directly involved in the technical
details of this implementation, and in fact I haven't read the previous
discussion yet. You pointed out a serious problem and I will make sure that
it is addressed.

I was asking about the mdadm API, regardless of the technical implementation.
I would like to propose one command that integrates the existing way
(--write-zeroes) and potentially new ways (if any other fast-initialization
capability turns out to be safe to add).

Do you see this as the right approach, or should we keep them separate?

Mariusz

Patch

diff --git a/mdadm.8.in b/mdadm.8.in
index aa0c540399f6..be592d70ac9b 100644
--- a/mdadm.8.in
+++ b/mdadm.8.in
@@ -849,6 +849,17 @@  each disk is zeroed in parallel with the others.
 .IP
 This is only meaningful with --create.
 
+.TP
+.BR \-\-fast\-initialize
+When creating an array, check the member disks for optional features that allow optimized
+initialization instead of a resync. These features are: NVMe write zeroes or deallocate, and SATA
+trim. If a feature is supported by all drives, it is used; otherwise an error is returned. This option implies
+.BR \-\-assume\-clean .
+This is intended for use with devices that have hardware offload for zeroing, but even so,
+zeroing can still take several minutes for large disks to complete.
+.IP
+This is only meaningful with --create.
+
 .TP
 .BR \-\-backup\-file=
 This is needed when