linux-next scsi-mq hang in suspend-resume

Message ID 20170717075341.GA19733@infradead.org
State New

Commit Message

Christoph Hellwig July 17, 2017, 7:53 a.m. UTC
I still haven't gotten hold of an i915 machine where I could
run the actual test suite.

But I did some audit of the code, and it seems blk-mq is lacking
support for the RQF_PM flag.  While I can't directly see how
this would cause the hang you saw, it's at least easy to test.

Can you apply the patch below and test with the use_blk_mq=0 parameter?

Note that implementing RQF_PM for blk-mq shouldn't be too hard either,
but if we don't get rid of the nr_pending counter somehow it would
be a severe performance penalty for all scsi devices.
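
For readers unfamiliar with RQF_PM, here is a rough, self-contained toy
model of the gating idea, not kernel code. It assumes that a queue which
is not fully active only lets through requests marked as power-management
requests, and that a shared nr_pending counter tracks ordinary requests.
All toy_* names are hypothetical stand-ins invented for this illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the runtime PM states of a request queue. */
enum toy_rpm_status {
	TOY_RPM_ACTIVE,
	TOY_RPM_RESUMING,
	TOY_RPM_SUSPENDED,
	TOY_RPM_SUSPENDING,
};

/* Toy queue: only the fields needed to illustrate the gating. */
struct toy_queue {
	enum toy_rpm_status rpm_status;
	int nr_pending;		/* ordinary (non-PM) requests in flight */
};

/* Toy request: is_pm plays the role of the RQF_PM flag. */
struct toy_request {
	bool is_pm;
};

/*
 * Assumed policy: nothing is dispatched while fully suspended; while
 * suspending or resuming only PM requests pass; when active, everything
 * passes.
 */
static bool toy_may_dispatch(const struct toy_queue *q,
			     const struct toy_request *rq)
{
	if (q->rpm_status == TOY_RPM_SUSPENDED)
		return false;
	if (q->rpm_status != TOY_RPM_ACTIVE && !rq->is_pm)
		return false;
	return true;
}

/*
 * Every ordinary request bumps one shared counter. Doing this on every
 * submission is the kind of per-request cost the message above worries
 * about for a multiqueue path.
 */
static void toy_add_request(struct toy_queue *q,
			    const struct toy_request *rq)
{
	if (!rq->is_pm)
		q->nr_pending++;
}
```

In this model an ordinary I/O request submitted while the queue is
suspending is held back, while a request flagged as PM still goes through.
A path that ignores the flag cannot make that distinction.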

Comments

Tomi Sarvela July 17, 2017, 10:30 a.m. UTC | #1
On 17/07/17 10:53, Christoph Hellwig wrote:
> I still haven't gotten hold of an i915 machine where I could
> run the actual test suite.
> 
> But I did some audit of the code, and it seems blk-mq is lacking
> support for the RQF_PM flag.  While I can't directly see how
> this would cause the hang you saw, it's at least easy to test.
> 
> Can you apply the patch below and test with the use_blk_mq=0 parameter?
> 
> Note that implementing RQF_PM for blk-mq shouldn't be too hard either,
> but if we don't get rid of the nr_pending counter somehow it would
> be a severe performance penalty for all scsi devices.

First, tested that next-20170717 still triggers the problem when no 
extra options given. Adding scsi_mod.use_blk_mq=0 makes tests work.

Then I tried with sd.diff patched next-20170717. Works (still) with 
use_blk_mq=0. Also works when no options given, so this patch avoids the 
hang when using the new block-mq.

These tests were run on a generic Haswell 4790K desktop machine.

Best regards,

Tomi

Christoph Hellwig July 17, 2017, 10:35 a.m. UTC | #2
On Mon, Jul 17, 2017 at 01:30:00PM +0300, Tomi Sarvela wrote:
> First, tested that next-20170717 still triggers the problem when no extra
> options given. Adding scsi_mod.use_blk_mq=0 makes tests work.
> 
> Then I tried with sd.diff patched next-20170717. Works (still) with
> use_blk_mq=0. Also works when no options given, so this patch avoids the
> hang when using the new block-mq.
> 
> These tests were run on a generic Haswell 4790K desktop machine.

Thanks Tomi,

this seems to confirm it's runtime PM related, although I don't
really understand why that's an issue.  Let me spin up an implementation
of RQF_PM for blk-mq and give it to you for testing.

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bea36adeee17..5c3818ebee9c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -554,7 +554,7 @@  static struct scsi_driver sd_template = {
 		.probe		= sd_probe,
 		.remove		= sd_remove,
 		.shutdown	= sd_shutdown,
-		.pm		= &sd_pm_ops,
+//		.pm		= &sd_pm_ops,
 	},
 	.rescan			= sd_rescan,
 	.init_command		= sd_init_command,
@@ -3249,7 +3249,7 @@  static void sd_probe_async(void *data, async_cookie_t cookie)
 		gd->events |= DISK_EVENT_MEDIA_CHANGE;
 	}
 
-	blk_pm_runtime_init(sdp->request_queue, dev);
+//	blk_pm_runtime_init(sdp->request_queue, dev);
 	device_add_disk(dev, gd);
 	if (sdkp->capacity)
 		sd_dif_config_host(sdkp);