diff mbox

dm-raid: add RAID discard support

Message ID 20141002093403.25cc832f@notabene.brown (mailing list archive)
State Superseded, archived
Delegated to: Mike Snitzer
Headers show

Commit Message

NeilBrown Oct. 1, 2014, 11:34 p.m. UTC
On Wed, 1 Oct 2014 09:32:37 -0400 Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Sep 30 2014 at 10:56pm -0400,
> NeilBrown <neilb@suse.de> wrote:
> 
> > On Wed, 24 Sep 2014 13:02:28 +0200 Heinz Mauelshagen <heinzm@redhat.com>
> > wrote:
> > 
> > > 
> > > Martin,
> > > 
> > > thanks for the good explanation of the state of the discard union.
> > > Do you have an ETA for the 'zeroout, deallocate' ... support you mentioned?
> > > 
> > > I was planning to have a followup patch for dm-raid supporting a dm-raid 
> > > table
> > > line argument to prohibit discard passdown.
> > > 
> > > In lieu of the fuzzy field situation wrt SSD fw and discard_zeroes_data 
> > > support
> > > related to RAID4/5/6, we need that in upstream together with the initial 
> > > patch.
> > > 
> > > That 'no_discard_passdown' table line can be added to dm-raid RAID4/5/6 
> > > table
> > > lines to avoid possible data corruption but can be avoided on RAID1/10 
> > > table lines,
> > > because the latter are not suffering from any  discard_zeroes_data flaw.
> > > 
> > > 
> > > Neil,
> > > 
> > > are you going to disable discards in RAID4/5/6 shortly
> > > or rather go with your bitmap solution?
> > 
> > Can I just close my eyes and hope it goes away?
> > 
> > The idea of a bitmap of uninitialised areas is not a short-term solution.
> > But I'm not really keen on simply disabling discard for RAID4/5/6 either. It
> > would  mean that people with good sensible hardware wouldn't be able to use
> > it properly.
> > 
> > I would really rather that discard_zeroes_data were only set on devices where
> > it was actually true.  Then it wouldn't be my problem any more.
> > 
> > Maybe I could do a loud warning
> >   "Not enabling DISCARD on RAID5 because we cannot trust committees.
> >    Set "md_mod.willing_to_risk_discard=Y" if your devices reads discarded
> >    sectors as zeros"
> > 
> > and add an appropriate module parameter......
> 
> I had the same thought and would be happy with this too.  I was going to
> update Heinz's patch to have the same default off but allow user to
> enable:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=8e0cff64f35971135a6de7907bbc8c3a010aff8f
> 
> But I'd love to just follow what you arrive at with MD (using the same
> name for the module param in dm-raid).
> 
> I'm open to getting this done now and included in 3.18 if you are.
> 
> Mike

How about something like this?
I want to keep it well away from regular API stuff as I hope it is just a
temporary hack until a more general solution can be found and implemented.

Thanks,
NeilBrown
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Mike Snitzer Oct. 2, 2014, 1:31 a.m. UTC | #1
Hi,

On Wed, Oct 01 2014 at  7:34pm -0400,
NeilBrown <neilb@suse.de> wrote:

> On Wed, 1 Oct 2014 09:32:37 -0400 Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > 
> > I had the same thought and would be happy with this too.  I was going to
> > update Heinz's patch to have the same default off but allow user to
> > enable:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=8e0cff64f35971135a6de7907bbc8c3a010aff8f
> > 
> > But I'd love to just follow what you arrive at with MD (using the same
> > name for the module param in dm-raid).
> > 
> > I'm open to getting this done now and included in 3.18 if you are.
> > 
> > Mike
> 
> How about something like this?
> I want to keep it well away from regular API stuff as I hope it is just a
> temporary hack until a more general solution can be found and implemented.
> 
> Thanks,
> NeilBrown
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 183588b11fc1..3ed668c5378c 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -64,6 +64,10 @@
>  #define cpu_to_group(cpu) cpu_to_node(cpu)
>  #define ANY_GROUP NUMA_NO_NODE
>  
> +static bool devices_handle_discard_safely = false;
> +module_param(devices_handle_discard_safely, bool, false);
> +MODULE_PARM_DESC(devices_handle_discard_safely,
> +		 "Set to Y if all devices in array reliably return zeroes on reads from discarded regions");
>  static struct workqueue_struct *raid5_wq;
>  /*
>   * Stripe cache
> @@ -6208,7 +6212,7 @@ static int run(struct mddev *mddev)
>  		mddev->queue->limits.discard_granularity = stripe;
>  		/*
>  		 * unaligned part of discard request will be ignored, so can't
> -		 * guarantee discard_zerors_data
> +		 * guarantee discard_zeroes_data
>  		 */
>  		mddev->queue->limits.discard_zeroes_data = 0;
>  
> @@ -6233,6 +6237,18 @@ static int run(struct mddev *mddev)
>  			    !bdev_get_queue(rdev->bdev)->
>  						limits.discard_zeroes_data)
>  				discard_supported = false;
> +			/* Unfortunately, discard_zeroes_data is not currently
> +			 * a guarantee - just a hint.  So we only allow DISCARD
> +			 * if the sysadmin has confirmed that only safe devices
> +			 * are in use but setting a module parameter.
> +			 */
> +			if (!devices_handle_discard_safely) {
> +				if (discard_supported) {
> +					pr_info("md/raid456: discard support disabled due to uncertainty.\n");
> +					pr_info("Set raid456.devices_handle_discard_safely=Y to override.\n");
> +				}
> +				discard_supported = false;
> +			}
>  		}
>  
>  		if (discard_supported &&


There is a typo in the new block comment above: "are in use but setting
a module parameter".  s/but/by/

But thinking further: should this be a per array override?  E.g. for DM
this could easily be a dm-raid table parameter.  But I know the MD
implementation would likely be more cumbersome (superblock update?).

Though given the (hopefully) temporary nature of this, maybe it isn't
worth it for MD.  Maybe be a bit more precise in the MODULE_PARM_DESC
with: 
"Set to Y if all devices in each array reliably returns zeroes on reads
from discarded regions" ?

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
NeilBrown Oct. 2, 2014, 2 a.m. UTC | #2
On Wed, 1 Oct 2014 21:31:36 -0400 Mike Snitzer <snitzer@redhat.com> wrote:

> Hi,
> 
> On Wed, Oct 01 2014 at  7:34pm -0400,
> NeilBrown <neilb@suse.de> wrote:
> 
> > On Wed, 1 Oct 2014 09:32:37 -0400 Mike Snitzer <snitzer@redhat.com> wrote:
> > 
> > > 
> > > I had the same thought and would be happy with this too.  I was going to
> > > update Heinz's patch to have the same default off but allow user to
> > > enable:
> > > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=8e0cff64f35971135a6de7907bbc8c3a010aff8f
> > > 
> > > But I'd love to just follow what you arrive at with MD (using the same
> > > name for the module param in dm-raid).
> > > 
> > > I'm open to getting this done now and included in 3.18 if you are.
> > > 
> > > Mike
> > 
> > How about something like this?
> > I want to keep it well away from regular API stuff as I hope it is just a
> > temporary hack until a more general solution can be found and implemented.
> > 
> > Thanks,
> > NeilBrown
> > 
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index 183588b11fc1..3ed668c5378c 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -64,6 +64,10 @@
> >  #define cpu_to_group(cpu) cpu_to_node(cpu)
> >  #define ANY_GROUP NUMA_NO_NODE
> >  
> > +static bool devices_handle_discard_safely = false;
> > +module_param(devices_handle_discard_safely, bool, false);
> > +MODULE_PARM_DESC(devices_handle_discard_safely,
> > +		 "Set to Y if all devices in array reliably return zeroes on reads from discarded regions");
> >  static struct workqueue_struct *raid5_wq;
> >  /*
> >   * Stripe cache
> > @@ -6208,7 +6212,7 @@ static int run(struct mddev *mddev)
> >  		mddev->queue->limits.discard_granularity = stripe;
> >  		/*
> >  		 * unaligned part of discard request will be ignored, so can't
> > -		 * guarantee discard_zerors_data
> > +		 * guarantee discard_zeroes_data
> >  		 */
> >  		mddev->queue->limits.discard_zeroes_data = 0;
> >  
> > @@ -6233,6 +6237,18 @@ static int run(struct mddev *mddev)
> >  			    !bdev_get_queue(rdev->bdev)->
> >  						limits.discard_zeroes_data)
> >  				discard_supported = false;
> > +			/* Unfortunately, discard_zeroes_data is not currently
> > +			 * a guarantee - just a hint.  So we only allow DISCARD
> > +			 * if the sysadmin has confirmed that only safe devices
> > +			 * are in use but setting a module parameter.
> > +			 */
> > +			if (!devices_handle_discard_safely) {
> > +				if (discard_supported) {
> > +					pr_info("md/raid456: discard support disabled due to uncertainty.\n");
> > +					pr_info("Set raid456.devices_handle_discard_safely=Y to override.\n");
> > +				}
> > +				discard_supported = false;
> > +			}
> >  		}
> >  
> >  		if (discard_supported &&
> 
> 
> There is a typo in the new block comment above: "are in use but setting
> a module parameter".  s/but/by/

Fixed, thanks.


> 
> But thinking further: should this be a per array override?  E.g. for DM
> this could easily be a dm-raid table parameter.  But I know the MD
> implementation would likely be more cumbersome (superblock update?).

If you want to use discard selectively on some arrays, you can mount
filesystems with appropriate options, or be careful in your use of 'fstrim'.

I see the module option as something to force people to think or ask before
"blindly" using DISCARD.  I don't see it is a configuration tool.

> 
> Though given the (hopefully) temporary nature of this, maybe it isn't
> worth it for MD.  Maybe be a bit more precise in the MODULE_PARM_DESC
> with: 
> "Set to Y if all devices in each array reliably returns zeroes on reads
> from discarded regions" ?

I added the 'each'.  I don't agree with the 's' on 'returns' :-(

Thanks,
NeilBrown
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 183588b11fc1..3ed668c5378c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -64,6 +64,10 @@ 
 #define cpu_to_group(cpu) cpu_to_node(cpu)
 #define ANY_GROUP NUMA_NO_NODE
 
+static bool devices_handle_discard_safely = false;
+module_param(devices_handle_discard_safely, bool, false);
+MODULE_PARM_DESC(devices_handle_discard_safely,
+		 "Set to Y if all devices in array reliably return zeroes on reads from discarded regions");
 static struct workqueue_struct *raid5_wq;
 /*
  * Stripe cache
@@ -6208,7 +6212,7 @@  static int run(struct mddev *mddev)
 		mddev->queue->limits.discard_granularity = stripe;
 		/*
 		 * unaligned part of discard request will be ignored, so can't
-		 * guarantee discard_zerors_data
+		 * guarantee discard_zeroes_data
 		 */
 		mddev->queue->limits.discard_zeroes_data = 0;
 
@@ -6233,6 +6237,18 @@  static int run(struct mddev *mddev)
 			    !bdev_get_queue(rdev->bdev)->
 						limits.discard_zeroes_data)
 				discard_supported = false;
+			/* Unfortunately, discard_zeroes_data is not currently
+			 * a guarantee - just a hint.  So we only allow DISCARD
+			 * if the sysadmin has confirmed that only safe devices
+			 * are in use but setting a module parameter.
+			 */
+			if (!devices_handle_discard_safely) {
+				if (discard_supported) {
+					pr_info("md/raid456: discard support disabled due to uncertainty.\n");
+					pr_info("Set raid456.devices_handle_discard_safely=Y to override.\n");
+				}
+				discard_supported = false;
+			}
 		}
 
 		if (discard_supported &&