diff mbox

[v5,2/8] md: Neither resync nor reshape while the system is frozen

Message ID 20171002225218.18548-3-bart.vanassche@wdc.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bart Van Assche Oct. 2, 2017, 10:52 p.m. UTC
Some people use the md driver on laptops and use the suspend and
resume functionality. Since it is essential that submitting of
new I/O requests stops before a hibernation image is created,
interrupt the md resync and reshape actions if the system is
being frozen. Note: the resync and reshape will restart after
the system is resumed and a message similar to the following
will appear in the system log:

md: md0: data-check interrupted.

Additionally, only allow resync or reshape activity to be
started after system freezing is over.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/md/md.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

Comments

Luis Chamberlain Oct. 4, 2017, 1:16 a.m. UTC | #1
On Mon, Oct 02, 2017 at 03:52:12PM -0700, Bart Van Assche wrote:
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 3f7426120a3b..a2cf2a93b0cb 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8961,6 +8963,37 @@ static void md_stop_all_writes(void)
>  		mdelay(1000*1);
>  }
>  
> +/*
> + * Ensure that neither resyncing nor reshaping occurs while the system is
> + * frozen.
> + */
> +static int md_notify_pm(struct notifier_block *bl, unsigned long state,
> +			void *unused)
> +{
> +	struct mddev *mddev;
> +	struct list_head *tmp;
> +
> +	pr_debug("%s: state = %ld; system_freezing_cnt = %d\n", __func__, state,
> +		 atomic_read(&system_freezing_cnt));
> +
> +	switch (state) {
> +	case PM_HIBERNATION_PREPARE:

Hm, why not also include and use this for PM_SUSPEND_PREPARE and/or
a PM_RESTORE_PREPARE.

        case PM_HIBERNATION_PREPARE:
        case PM_SUSPEND_PREPARE:
        case PM_RESTORE_PREPARE:

> +		md_stop_all_writes();
> +		break;
> +	case PM_POST_HIBERNATION:

Likewise here:

        case PM_POST_SUSPEND:
        case PM_POST_HIBERNATION:
        case PM_POST_RESTORE:

I have revised the kernel suspend ordering and indeed things issued
with the pm notifier should suffice to be processed given we actually
call the pm ops (dpm_prepare()) for device drivers *after* the notifier
is called and then after userspace is frozen. That is:

   pm_suspend() --> enter_state() -->
     sys_sync()
     suspend_prepare() -->
             __pm_notifier_call_chain(PM_SUSPEND_PREPARE, ...);
             suspend_freeze_processes() -->
                     freeze_processes() -->
                             __usermodehelper_set_disable_depth(UMH_DISABLED);
                             freeze all tasks ...
                     freeze_kernel_threads()
     suspend_devices_and_enter() -->
             dpm_suspend_start() -->
                             dpm_prepare()
                             dpm_suspend()
             suspend_enter()  -->
                     platform_suspend_prepare()
                     dpm_suspend_late()
                     freeze_enter()
                     syscore_suspend()

On our way back up:

     enter_state() -->
     suspend_devices_and_enter() --> (bail from loop)
             dpm_resume_end() -->
                     dpm_resume()
                     dpm_complete()
     suspend_finish() -->
             suspend_thaw_processes() -->
                     thaw_processes() -->
                             __usermodehelper_set_disable_depth(UMH_FREEZING);
                             thaw_workqueues();
                             thaw all processes ...
                             usermodehelper_enable();
             pm_notifier_call_chain(PM_POST_SUSPEND);

So notifier would indeed be the right tooling to use here.

  Luis
Ming Lei Oct. 4, 2017, 11:30 a.m. UTC | #2
On Mon, Oct 02, 2017 at 03:52:12PM -0700, Bart Van Assche wrote:
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before a hibernation image is created,
> interrupt the md resync and reshape actions if the system is
> being frozen. Note: the resync and reshape will restart after
> the system is resumed and a message similar to the following
> will appear in the system log:
> 
> md: md0: data-check interrupted.
> 
> Additionally, only allow resync or reshape activity to be
> started after system freezing is over.

Last time, I asked which issue is fixed by this patch, but
never get answered, so again, what is issue fixed by
this patch?

Given the following patches for making SCSI quiesce safe
is required in either PM suspend/resume or other path
of transport_spi revalidate. So once theses patches are
in, I don't see any I/O hang in suspend/resume given
the SCSI quiesce issue is very specific one, and not
see this issue on other kind of device.

Thanks,
Ming
Bart Van Assche Oct. 4, 2017, 3:38 p.m. UTC | #3
On Wed, 2017-10-04 at 19:30 +0800, Ming Lei wrote:
> Last time, I asked which issue is fixed by this patch, but

> never get answered, so again, what is issue fixed by

> this patch?


That's not correct. I did answer your question. See e.g.
https://marc.info/?l=linux-block&m=150695879309606.

> Given the following patches for making SCSI quiesce safe

> is required in either PM suspend/resume or other path

> of transport_spi revalidate. So once theses patches are

> in, I don't see any I/O hang in suspend/resume given

> the SCSI quiesce issue is very specific one, and not

> see this issue on other kind of device.


There are many more storage drivers and filesystems than the md driver that
can submit I/O after user space processes have been frozen, e.g. the BTRFS
RAID and scrub functionalities. It will take time to fix all these drivers
and also to add filesystem freeze/thaw functionality to the suspend and
resume sequences. As long as not all drivers have been fixed we will need
SCSI device quiescing and resuming as a workaround to avoid that any I/O
occurs while a hibernation image is being created.

Bart.
Bart Van Assche Oct. 4, 2017, 4:58 p.m. UTC | #4
On Wed, 2017-10-04 at 03:16 +0200, Luis R. Rodriguez wrote:
> On Mon, Oct 02, 2017 at 03:52:12PM -0700, Bart Van Assche wrote:

> > diff --git a/drivers/md/md.c b/drivers/md/md.c

> > index 3f7426120a3b..a2cf2a93b0cb 100644

> > --- a/drivers/md/md.c

> > +++ b/drivers/md/md.c

> > @@ -8961,6 +8963,37 @@ static void md_stop_all_writes(void)

> >  		mdelay(1000*1);

> >  }

> >  

> > +/*

> > + * Ensure that neither resyncing nor reshaping occurs while the system is

> > + * frozen.

> > + */

> > +static int md_notify_pm(struct notifier_block *bl, unsigned long state,

> > +			void *unused)

> > +{

> > +	struct mddev *mddev;

> > +	struct list_head *tmp;

> > +

> > +	pr_debug("%s: state = %ld; system_freezing_cnt = %d\n", __func__, state,

> > +		 atomic_read(&system_freezing_cnt));

> > +

> > +	switch (state) {

> > +	case PM_HIBERNATION_PREPARE:

> 

> Hm, why not also include and use this for PM_SUSPEND_PREPARE and/or

> a PM_RESTORE_PREPARE.

> 

>         case PM_HIBERNATION_PREPARE:

>         case PM_SUSPEND_PREPARE:

>         case PM_RESTORE_PREPARE:


Thanks for the feedback Luis. I will make sure that resync activity is also
stopped for the PM_SUSPEND_PREPARE and PM_RESTORE_PREPARE cases.

Bart.
diff mbox

Patch

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3f7426120a3b..a2cf2a93b0cb 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -66,6 +66,7 @@ 
 #include <linux/raid/md_u.h>
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
+#include <linux/suspend.h>
 
 #include <trace/events/block.h>
 #include "md.h"
@@ -7450,7 +7451,8 @@  static int md_thread(void *arg)
 
 		wait_event_interruptible_timeout
 			(thread->wqueue,
-			 test_bit(THREAD_WAKEUP, &thread->flags)
+			 (test_bit(THREAD_WAKEUP, &thread->flags) &&
+			  atomic_read(&system_freezing_cnt) == 0)
 			 || kthread_should_stop() || kthread_should_park(),
 			 thread->timeout);
 
@@ -8961,6 +8963,37 @@  static void md_stop_all_writes(void)
 		mdelay(1000*1);
 }
 
+/*
+ * Ensure that neither resyncing nor reshaping occurs while the system is
+ * frozen.
+ */
+static int md_notify_pm(struct notifier_block *bl, unsigned long state,
+			void *unused)
+{
+	struct mddev *mddev;
+	struct list_head *tmp;
+
+	pr_debug("%s: state = %ld; system_freezing_cnt = %d\n", __func__, state,
+		 atomic_read(&system_freezing_cnt));
+
+	switch (state) {
+	case PM_HIBERNATION_PREPARE:
+		md_stop_all_writes();
+		break;
+	case PM_POST_HIBERNATION:
+		rcu_read_lock();
+		for_each_mddev(mddev, tmp)
+			wake_up(&mddev->thread->wqueue);
+		rcu_read_unlock();
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block md_pm_notifier = {
+	.notifier_call	= md_notify_pm,
+};
+
 static int md_notify_reboot(struct notifier_block *this,
 			    unsigned long code, void *x)
 {
@@ -9007,6 +9040,7 @@  static int __init md_init(void)
 			    md_probe, NULL, NULL);
 
 	register_reboot_notifier(&md_reboot_notifier);
+	register_pm_notifier(&md_pm_notifier);
 	raid_table_header = register_sysctl_table(raid_root_table);
 
 	md_geninit();
@@ -9246,6 +9280,7 @@  static __exit void md_exit(void)
 
 	unregister_blkdev(MD_MAJOR,"md");
 	unregister_blkdev(mdp_major, "mdp");
+	unregister_pm_notifier(&md_pm_notifier);
 	unregister_reboot_notifier(&md_reboot_notifier);
 	unregister_sysctl_table(raid_table_header);