diff mbox

[v3,1/6] md: Make md resync and reshape threads freezable

Message ID 20170922221405.22091-2-bart.vanassche@wdc.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bart Van Assche Sept. 22, 2017, 10:14 p.m. UTC
Some people use the md driver on laptops and use the suspend and
resume functionality. Since it is essential that submitting of
new I/O requests stops before device quiescing starts, make the
md resync and reshape threads freezable.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/md/md.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

Comments

Ming Lei Sept. 25, 2017, 2:38 a.m. UTC | #1
On Fri, Sep 22, 2017 at 03:14:00PM -0700, Bart Van Assche wrote:
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before device quiescing starts, make the
> md resync and reshape threads freezable.
> 

Why is this patch part of this patchset? I don't think
MD is special enough wrt. the SCSI quiesce issue.
Bart Van Assche Sept. 25, 2017, 4:22 p.m. UTC | #2
On Mon, 2017-09-25 at 10:38 +0800, Ming Lei wrote:
> On Fri, Sep 22, 2017 at 03:14:00PM -0700, Bart Van Assche wrote:

> > Some people use the md driver on laptops and use the suspend and

> > resume functionality. Since it is essential that submitting of

> > new I/O requests stops before device quiescing starts, make the

> > md resync and reshape threads freezable.

> 

> Why is this patch part of this patchset? I don't think

> MD is special enough wrt. the SCSI quiesce issue.


Because the scope of this patch series is not only to make SCSI quiesce safe
but also to avoid that suspend and resume lock up with a storage stack that
uses the md driver.

Bart.
Ming Lei Sept. 25, 2017, 10:45 p.m. UTC | #3
On Mon, Sep 25, 2017 at 04:22:29PM +0000, Bart Van Assche wrote:
> On Mon, 2017-09-25 at 10:38 +0800, Ming Lei wrote:
> > On Fri, Sep 22, 2017 at 03:14:00PM -0700, Bart Van Assche wrote:
> > > Some people use the md driver on laptops and use the suspend and
> > > resume functionality. Since it is essential that submitting of
> > > new I/O requests stops before device quiescing starts, make the
> > > md resync and reshape threads freezable.
> > 
> > Why is this patch part of this patchset? I don't think
> > MD is special enough wrt. the SCSI quiesce issue.
> 
> Because the scope of this patch series is not only to make SCSI quiesce safe
> but also to avoid that suspend and resume lock up with a storage stack that
> uses the md driver.

If SCSI quiesce is safe, there shouldn't be suspend/resume lockup on any
SCSI device wrt. quiesce, right?

Did you see Martin's report which is on BTRFS(RAID) instead of MD?

	https://marc.info/?l=linux-block&m=150634883816965&w=2
Bart Van Assche Sept. 25, 2017, 10:48 p.m. UTC | #4
On Tue, 2017-09-26 at 06:45 +0800, Ming Lei wrote:
> On Mon, Sep 25, 2017 at 04:22:29PM +0000, Bart Van Assche wrote:

> > On Mon, 2017-09-25 at 10:38 +0800, Ming Lei wrote:

> > > On Fri, Sep 22, 2017 at 03:14:00PM -0700, Bart Van Assche wrote:

> > > > Some people use the md driver on laptops and use the suspend and

> > > > resume functionality. Since it is essential that submitting of

> > > > new I/O requests stops before device quiescing starts, make the

> > > > md resync and reshape threads freezable.

> > > 

> > > Why is this patch part of this patchset? I don't think

> > > MD is special enough wrt. the SCSI quiesce issue.

> > 

> > Because the scope of this patch series is not only to make SCSI quiesce safe

> > but also to avoid that suspend and resume lock up with a storage stack that

> > uses the md driver.

> 

> If SCSI quiesce is safe, there shouldn't be suspend/resume lockup on any

> SCSI device wrt. quiesce, right?


Please reread my e-mail. You misread it. I didn't write that SCSI quiesce is safe.

Bart.
diff mbox

Patch

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 08fcaebc61bd..26a12bd0db65 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -66,6 +66,7 @@ 
 #include <linux/raid/md_u.h>
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
+#include <linux/freezer.h>
 
 #include <trace/events/block.h>
 #include "md.h"
@@ -7424,6 +7425,7 @@  static int md_thread(void *arg)
 	 */
 
 	allow_signal(SIGKILL);
+	set_freezable();
 	while (!kthread_should_stop()) {
 
 		/* We need to wait INTERRUPTIBLE so that
@@ -7434,7 +7436,7 @@  static int md_thread(void *arg)
 		if (signal_pending(current))
 			flush_signals(current);
 
-		wait_event_interruptible_timeout
+		wait_event_freezable_timeout
 			(thread->wqueue,
 			 test_bit(THREAD_WAKEUP, &thread->flags)
 			 || kthread_should_stop() || kthread_should_park(),
@@ -8133,6 +8135,8 @@  void md_do_sync(struct md_thread *thread)
 		return;
 	}
 
+	set_freezable();
+
 	if (mddev_is_clustered(mddev)) {
 		ret = md_cluster_ops->resync_start(mddev);
 		if (ret)
@@ -8324,7 +8328,7 @@  void md_do_sync(struct md_thread *thread)
 		     mddev->curr_resync_completed > mddev->resync_max
 			    )) {
 			/* time to update curr_resync_completed */
-			wait_event(mddev->recovery_wait,
+			wait_event_freezable(mddev->recovery_wait,
 				   atomic_read(&mddev->recovery_active) == 0);
 			mddev->curr_resync_completed = j;
 			if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) &&
@@ -8342,10 +8346,10 @@  void md_do_sync(struct md_thread *thread)
 			 * to avoid triggering warnings.
 			 */
 			flush_signals(current); /* just in case */
-			wait_event_interruptible(mddev->recovery_wait,
-						 mddev->resync_max > j
-						 || test_bit(MD_RECOVERY_INTR,
-							     &mddev->recovery));
+			wait_event_freezable(mddev->recovery_wait,
+					     mddev->resync_max > j ||
+					     test_bit(MD_RECOVERY_INTR,
+						      &mddev->recovery));
 		}
 
 		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
@@ -8421,7 +8425,7 @@  void md_do_sync(struct md_thread *thread)
 				 * Give other IO more of a chance.
 				 * The faster the devices, the less we wait.
 				 */
-				wait_event(mddev->recovery_wait,
+				wait_event_freezable(mddev->recovery_wait,
 					   !atomic_read(&mddev->recovery_active));
 			}
 		}
@@ -8433,7 +8437,8 @@  void md_do_sync(struct md_thread *thread)
 	 * this also signals 'finished resyncing' to md_stop
 	 */
 	blk_finish_plug(&plug);
-	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
+	wait_event_freezable(mddev->recovery_wait,
+			     !atomic_read(&mddev->recovery_active));
 
 	if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&