[v2,2/2] scsi: sd: Rework asynchronous resume support

Message ID	20220630195703.10155-3-bvanassche@acm.org (mailing list archive)
State	Accepted
Headers	show Return-Path: <linux-scsi-owner@kernel.org> From: Bart Van Assche <bvanassche@acm.org> To: "Martin K . Petersen" <martin.petersen@oracle.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org>, linux-scsi@vger.kernel.org, Bart Van Assche <bvanassche@acm.org>, Ming Lei <ming.lei@redhat.com>, Hannes Reinecke <hare@suse.de>, John Garry <john.garry@huawei.com>, ericspero@icloud.com, jason600.groome@gmail.com Subject: [PATCH v2 2/2] scsi: sd: Rework asynchronous resume support Date: Thu, 30 Jun 2022 12:57:03 -0700 Message-Id: <20220630195703.10155-3-bvanassche@acm.org> In-Reply-To: <20220630195703.10155-1-bvanassche@acm.org> References: <20220630195703.10155-1-bvanassche@acm.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Reduce ATA disk resume time \| expand [v2,0/2] Reduce ATA disk resume time [v2,1/2] scsi: core: Move the definition of SCSI_QUEUE_DELAY [v2,2/2] scsi: sd: Rework asynchronous resume support

Bart Van Assche June 30, 2022, 7:57 p.m. UTC

For some technologies, e.g. an ATA bus, resuming can take multiple
seconds. Waiting for resume to finish can cause a very noticeable delay.
Hence this patch that restores the behavior from before patch "scsi:
core: pm: Rely on the device driver core for async power management" for
most SCSI devices.

This patch introduces a behavior change: if the START command fails, do
not consider this as a SCSI disk resume failure.

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.garry@huawei.com>
Cc: ericspero@icloud.com
Cc: jason600.groome@gmail.com
Tested-by: jason600.groome@gmail.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215880
Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/sd.c | 84 +++++++++++++++++++++++++++++++++++++----------
 drivers/scsi/sd.h |  5 +++
 2 files changed, 71 insertions(+), 18 deletions(-)

Geert Uytterhoeven July 19, 2022, 9:26 a.m. UTC | #1

Hoi Bart,

On Thu, 30 Jun 2022, Bart Van Assche wrote:
> For some technologies, e.g. an ATA bus, resuming can take multiple
> seconds. Waiting for resume to finish can cause a very noticeable delay.
> Hence this patch that restores the behavior from before patch "scsi:
> core: pm: Rely on the device driver core for async power management" for
> most SCSI devices.
>
> This patch introduces a behavior change: if the START command fails, do
> not consider this as a SCSI disk resume failure.
>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: John Garry <john.garry@huawei.com>
> Cc: ericspero@icloud.com
> Cc: jason600.groome@gmail.com
> Tested-by: jason600.groome@gmail.com
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215880
> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Thanks for your patch, which is now commit 88f1669019bd62b3 ("scsi: sd:
Rework asynchronous resume support") in scsi/for-next.

On the Salvator-XS development board[1] with a SATA hard drive
connected, accessing the hard drive after resume from s2idle hangs.
I have bisected this to the aformentioned commit, and reverting this
commit fixes the issue.

[1] arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts

> ---
> drivers/scsi/sd.c | 84 +++++++++++++++++++++++++++++++++++++----------
> drivers/scsi/sd.h |  5 +++
> 2 files changed, 71 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 895b56c8f25e..84696b3652ee 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -103,6 +103,7 @@ static void sd_config_discard(struct scsi_disk *, unsigned int);
> static void sd_config_write_same(struct scsi_disk *);
> static int  sd_revalidate_disk(struct gendisk *);
> static void sd_unlock_native_capacity(struct gendisk *disk);
> +static void sd_start_done_work(struct work_struct *work);
> static int  sd_probe(struct device *);
> static int  sd_remove(struct device *);
> static void sd_shutdown(struct device *);
> @@ -3463,6 +3464,7 @@ static int sd_probe(struct device *dev)
> 	sdkp->max_retries = SD_MAX_RETRIES;
> 	atomic_set(&sdkp->openers, 0);
> 	atomic_set(&sdkp->device->ioerr_cnt, 0);
> +	INIT_WORK(&sdkp->start_done_work, sd_start_done_work);
>
> 	if (!sdp->request_queue->rq_timeout) {
> 		if (sdp->type != TYPE_MOD)
> @@ -3585,12 +3587,69 @@ static void scsi_disk_release(struct device *dev)
> 	kfree(sdkp);
> }
>
> +/* Process sense data after a START command finished. */
> +static void sd_start_done_work(struct work_struct *work)
> +{
> +	struct scsi_disk *sdkp = container_of(work, typeof(*sdkp),
> +					      start_done_work);
> +	struct scsi_sense_hdr sshdr;
> +	int res = sdkp->start_result;
> +
> +	if (res == 0)
> +		return;
> +
> +	sd_print_result(sdkp, "Start/Stop Unit failed", res);
> +
> +	if (res < 0)
> +		return;
> +
> +	if (scsi_normalize_sense(sdkp->start_sense_buffer,
> +				 sdkp->start_sense_len, &sshdr))
> +		sd_print_sense_hdr(sdkp, &sshdr);
> +}
> +
> +/* A START command finished. May be called from interrupt context. */
> +static void sd_start_done(struct request *req, blk_status_t status)
> +{
> +	const struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
> +	struct scsi_disk *sdkp = scsi_disk(req->q->disk);
> +
> +	sdkp->start_result = scmd->result;
> +	WARN_ON_ONCE(scmd->sense_len > SCSI_SENSE_BUFFERSIZE);
> +	sdkp->start_sense_len = scmd->sense_len;
> +	memcpy(sdkp->start_sense_buffer, scmd->sense_buffer,
> +	       ARRAY_SIZE(sdkp->start_sense_buffer));
> +	WARN_ON_ONCE(!schedule_work(&sdkp->start_done_work));
> +}
> +
> +/* Submit a START command asynchronously. */
> +static int sd_submit_start(struct scsi_disk *sdkp, u8 cmd[], u8 cmd_len)
> +{
> +	struct scsi_device *sdev = sdkp->device;
> +	struct request_queue *q = sdev->request_queue;
> +	struct request *req;
> +	struct scsi_cmnd *scmd;
> +
> +	req = scsi_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM);
> +	if (IS_ERR(req))
> +		return PTR_ERR(req);
> +
> +	scmd = blk_mq_rq_to_pdu(req);
> +	scmd->cmd_len = cmd_len;
> +	memcpy(scmd->cmnd, cmd, cmd_len);
> +	scmd->allowed = sdkp->max_retries;
> +	req->timeout = SD_TIMEOUT;
> +	req->rq_flags |= RQF_PM | RQF_QUIET;
> +	req->end_io = sd_start_done;
> +	blk_execute_rq_nowait(req, /*at_head=*/true);
> +
> +	return 0;
> +}
> +
> static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
> {
> 	unsigned char cmd[6] = { START_STOP };	/* START_VALID */
> -	struct scsi_sense_hdr sshdr;
> 	struct scsi_device *sdp = sdkp->device;
> -	int res;
>
> 	if (start)
> 		cmd[4] |= 1;	/* START */
> @@ -3601,23 +3660,10 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
> 	if (!scsi_device_online(sdp))
> 		return -ENODEV;
>
> -	res = scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
> -			SD_TIMEOUT, sdkp->max_retries, 0, RQF_PM, NULL);
> -	if (res) {
> -		sd_print_result(sdkp, "Start/Stop Unit failed", res);
> -		if (res > 0 && scsi_sense_valid(&sshdr)) {
> -			sd_print_sense_hdr(sdkp, &sshdr);
> -			/* 0x3a is medium not present */
> -			if (sshdr.asc == 0x3a)
> -				res = 0;
> -		}
> -	}
> +	/* Wait until processing of sense data has finished. */
> +	flush_work(&sdkp->start_done_work);
>
> -	/* SCSI error codes must not go to the generic layer */
> -	if (res)
> -		return -EIO;
> -
> -	return 0;
> +	return sd_submit_start(sdkp, cmd, sizeof(cmd));
> }
>
> /*
> @@ -3644,6 +3690,8 @@ static void sd_shutdown(struct device *dev)
> 		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
> 		sd_start_stop_device(sdkp, 0);
> 	}
> +
> +	flush_work(&sdkp->start_done_work);
> }
>
> static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 5eea762f84d1..b89187761d61 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -150,6 +150,11 @@ struct scsi_disk {
> 	unsigned	urswrz : 1;
> 	unsigned	security : 1;
> 	unsigned	ignore_medium_access_errors : 1;
> +
> +	int		start_result;
> +	u32		start_sense_len;
> +	u8		start_sense_buffer[SCSI_SENSE_BUFFERSIZE];
> +	struct work_struct start_done_work;
> };
> #define to_scsi_disk(obj) container_of(obj, struct scsi_disk, disk_dev)
>

Gr{oetje,eeting}s,

 						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
 							    -- Linus Torvalds

Bart Van Assche July 19, 2022, 6:14 p.m. UTC | #2

On 7/19/22 02:26, Geert Uytterhoeven wrote:
> Thanks for your patch, which is now commit 88f1669019bd62b3 ("scsi: sd:
> Rework asynchronous resume support") in scsi/for-next.
> 
> On the Salvator-XS development board[1] with a SATA hard drive
> connected, accessing the hard drive after resume from s2idle hangs.
> I have bisected this to the aformentioned commit, and reverting this
> commit fixes the issue.
> 
> [1] arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts

Hi Geert,

Thank you for having reported this. How about creating a bug report on 
https://bugzilla.kernel.org/ and attaching the kernel logs with and 
without this patch? It's probably better to upload the kernel logs 
somewhere rather than to post these on the linux-scsi mailing list.

Thanks,

Bart.

Geert Uytterhoeven July 20, 2022, 7:26 a.m. UTC | #3

Hoi Bart,

On Tue, Jul 19, 2022 at 8:14 PM Bart Van Assche <bvanassche@acm.org> wrote:
> On 7/19/22 02:26, Geert Uytterhoeven wrote:
> > Thanks for your patch, which is now commit 88f1669019bd62b3 ("scsi: sd:
> > Rework asynchronous resume support") in scsi/for-next.
> >
> > On the Salvator-XS development board[1] with a SATA hard drive
> > connected, accessing the hard drive after resume from s2idle hangs.
> > I have bisected this to the aformentioned commit, and reverting this
> > commit fixes the issue.
> >
> > [1] arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
>
> Thank you for having reported this. How about creating a bug report on
> https://bugzilla.kernel.org/ and attaching the kernel logs with and

I never created a bug report on bko before...

> without this patch? It's probably better to upload the kernel logs

Unfortunately the logs do not indicate anything went wrong.
I.e. no difference before/after.

> somewhere rather than to post these on the linux-scsi mailing list.

Lore links are great for after-the-fact reporting.  If you follow
the lore link in the buggy commit, you arrive in this thread ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Geert Uytterhoeven July 20, 2022, 7:47 a.m. UTC | #4

On Wed, Jul 20, 2022 at 9:26 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Tue, Jul 19, 2022 at 8:14 PM Bart Van Assche <bvanassche@acm.org> wrote:
> > On 7/19/22 02:26, Geert Uytterhoeven wrote:
> > > Thanks for your patch, which is now commit 88f1669019bd62b3 ("scsi: sd:
> > > Rework asynchronous resume support") in scsi/for-next.
> > >
> > > On the Salvator-XS development board[1] with a SATA hard drive
> > > connected, accessing the hard drive after resume from s2idle hangs.
> > > I have bisected this to the aformentioned commit, and reverting this
> > > commit fixes the issue.
> > >
> > > [1] arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
> >
> > Thank you for having reported this. How about creating a bug report on
> > https://bugzilla.kernel.org/ and attaching the kernel logs with and
>
> I never created a bug report on bko before...
>
> > without this patch? It's probably better to upload the kernel logs
>
> Unfortunately the logs do not indicate anything went wrong.
> I.e. no difference before/after.

With more debug options enabled, it prints:

INFO: task kworker/0:7:283 blocked for more than 120 seconds.
      Not tainted 5.19.0-rc7-salvator-x-00794-g6780eb02b605 #1287
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:7     state:D stack:    0 pid:  283 ppid:     2 flags:0x00000008
Workqueue: events ata_scsi_dev_rescan
Call trace:
 __switch_to+0xbc/0x124
 __schedule+0x540/0x71c
 schedule+0x58/0xa0
 io_schedule+0x18/0x34
 blk_mq_get_tag+0x138/0x244
 __blk_mq_alloc_requests+0x130/0x2f0
 blk_mq_alloc_request+0x74/0xa8
 scsi_alloc_request+0x10/0x30
 __scsi_execute+0x5c/0x18c
 scsi_vpd_inquiry+0x7c/0xdc
 scsi_get_vpd_size+0x34/0xa8
 scsi_get_vpd_buf+0x28/0xf4
 scsi_attach_vpd+0x44/0x170
 scsi_rescan_device+0x30/0x98
 ata_scsi_dev_rescan+0xc8/0xfc
 process_one_work+0x2e0/0x474
 worker_thread+0x1cc/0x270
 kthread+0xd8/0xe8
 ret_from_fork+0x10/0x20

This doesn't look like it's blocked in the R-Car SATA driver, but on
some I/O scheduling event in the block core?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Bart Van Assche July 20, 2022, 4:51 p.m. UTC | #5

On 7/20/22 00:47, Geert Uytterhoeven wrote:
> With more debug options enabled, it prints:
> 
> INFO: task kworker/0:7:283 blocked for more than 120 seconds.
>        Not tainted 5.19.0-rc7-salvator-x-00794-g6780eb02b605 #1287
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/0:7     state:D stack:    0 pid:  283 ppid:     2 flags:0x00000008
> Workqueue: events ata_scsi_dev_rescan
> Call trace:
>   __switch_to+0xbc/0x124
>   __schedule+0x540/0x71c
>   schedule+0x58/0xa0
>   io_schedule+0x18/0x34
>   blk_mq_get_tag+0x138/0x244
>   __blk_mq_alloc_requests+0x130/0x2f0
>   blk_mq_alloc_request+0x74/0xa8
>   scsi_alloc_request+0x10/0x30
>   __scsi_execute+0x5c/0x18c
>   scsi_vpd_inquiry+0x7c/0xdc
>   scsi_get_vpd_size+0x34/0xa8
>   scsi_get_vpd_buf+0x28/0xf4
>   scsi_attach_vpd+0x44/0x170
>   scsi_rescan_device+0x30/0x98
>   ata_scsi_dev_rescan+0xc8/0xfc
>   process_one_work+0x2e0/0x474
>   worker_thread+0x1cc/0x270
>   kthread+0xd8/0xe8
>   ret_from_fork+0x10/0x20
> 
> This doesn't look like it's blocked in the R-Car SATA driver, but on
> some I/O scheduling event in the block core?

I'm not familiar with the SATA code but from a quick look it seems like 
the above code is only triggered from inside the ATA error handler 
(ata_do_eh() -> ata_eh_recover() -> ata_eh_revalidate_and_attach() -> 
schedule_work(&(ap->scsi_rescan_task) -> ata_scsi_dev_rescan()). It 
doesn't seem normal to me that the ATA error handler gets invoked during 
a resume. How about testing the following two code changes?
* In sd_start_stop_device(), change "return sd_submit_start(sdkp, cmd, 
sizeof(cmd))" into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and below 
that call add "flush_work(&sdkp->start_done_work)". This makes 
sd_start_stop_device() again synchronous. This will learn us whether the 
behavior change is caused by submitting the START command from another 
context or by not waiting until the START command has finished.
* Back out the above change, change "return sd_submit_start(sdkp, cmd, 
sizeof(cmd))" again into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and 
below that statement add a call to 
scsi_run_queue(sdkp->device->request_queue). If this change helps it 
means that the scsi_run_queue() call is necessary to prevent reordering 
of the START command with other SCSI commands.

Thanks,

Bart.

Geert Uytterhoeven July 20, 2022, 5:44 p.m. UTC | #6

Hi Bart,

On Wed, Jul 20, 2022 at 6:51 PM Bart Van Assche <bvanassche@acm.org> wrote:
> On 7/20/22 00:47, Geert Uytterhoeven wrote:
> > With more debug options enabled, it prints:
> >
> > INFO: task kworker/0:7:283 blocked for more than 120 seconds.
> >        Not tainted 5.19.0-rc7-salvator-x-00794-g6780eb02b605 #1287
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:kworker/0:7     state:D stack:    0 pid:  283 ppid:     2 flags:0x00000008
> > Workqueue: events ata_scsi_dev_rescan
> > Call trace:
> >   __switch_to+0xbc/0x124
> >   __schedule+0x540/0x71c
> >   schedule+0x58/0xa0
> >   io_schedule+0x18/0x34
> >   blk_mq_get_tag+0x138/0x244
> >   __blk_mq_alloc_requests+0x130/0x2f0
> >   blk_mq_alloc_request+0x74/0xa8
> >   scsi_alloc_request+0x10/0x30
> >   __scsi_execute+0x5c/0x18c
> >   scsi_vpd_inquiry+0x7c/0xdc
> >   scsi_get_vpd_size+0x34/0xa8
> >   scsi_get_vpd_buf+0x28/0xf4
> >   scsi_attach_vpd+0x44/0x170
> >   scsi_rescan_device+0x30/0x98
> >   ata_scsi_dev_rescan+0xc8/0xfc
> >   process_one_work+0x2e0/0x474
> >   worker_thread+0x1cc/0x270
> >   kthread+0xd8/0xe8
> >   ret_from_fork+0x10/0x20
> >
> > This doesn't look like it's blocked in the R-Car SATA driver, but on
> > some I/O scheduling event in the block core?
>
> I'm not familiar with the SATA code but from a quick look it seems like
> the above code is only triggered from inside the ATA error handler
> (ata_do_eh() -> ata_eh_recover() -> ata_eh_revalidate_and_attach() ->
> schedule_work(&(ap->scsi_rescan_task) -> ata_scsi_dev_rescan()). It
> doesn't seem normal to me that the ATA error handler gets invoked during
> a resume. How about testing the following two code changes?

Thanks for your suggestions!

> * In sd_start_stop_device(), change "return sd_submit_start(sdkp, cmd,
> sizeof(cmd))" into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and below
> that call add "flush_work(&sdkp->start_done_work)". This makes
> sd_start_stop_device() again synchronous. This will learn us whether the
> behavior change is caused by submitting the START command from another
> context or by not waiting until the START command has finished.

Unfortunately this doesn't have any impact.

> * Back out the above change, change "return sd_submit_start(sdkp, cmd,
> sizeof(cmd))" again into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and
> below that statement add a call to
> scsi_run_queue(sdkp->device->request_queue). If this change helps it

(that's the static scsi_run_queue() in drivers/scsi/scsi_lib.c?)

> means that the scsi_run_queue() call is necessary to prevent reordering
> of the START command with other SCSI commands.

Unfortunately this doesn't have any impact either.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Bart Van Assche July 20, 2022, 6:04 p.m. UTC | #7

On 7/20/22 10:44, Geert Uytterhoeven wrote:
> On Wed, Jul 20, 2022 at 6:51 PM Bart Van Assche <bvanassche@acm.org> wrote:
>> I'm not familiar with the SATA code but from a quick look it seems like
>> the above code is only triggered from inside the ATA error handler
>> (ata_do_eh() -> ata_eh_recover() -> ata_eh_revalidate_and_attach() ->
>> schedule_work(&(ap->scsi_rescan_task) -> ata_scsi_dev_rescan()). It
>> doesn't seem normal to me that the ATA error handler gets invoked during
>> a resume. How about testing the following two code changes?
> 
> Thanks for your suggestions!
> 
>> * In sd_start_stop_device(), change "return sd_submit_start(sdkp, cmd,
>> sizeof(cmd))" into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and below
>> that call add "flush_work(&sdkp->start_done_work)". This makes
>> sd_start_stop_device() again synchronous. This will learn us whether the
>> behavior change is caused by submitting the START command from another
>> context or by not waiting until the START command has finished.
> 
> Unfortunately this doesn't have any impact.
> 
>> * Back out the above change, change "return sd_submit_start(sdkp, cmd,
>> sizeof(cmd))" again into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and
>> below that statement add a call to
>> scsi_run_queue(sdkp->device->request_queue). If this change helps it
> 
> (that's the static scsi_run_queue() in drivers/scsi/scsi_lib.c?)
> 
>> means that the scsi_run_queue() call is necessary to prevent reordering
>> of the START command with other SCSI commands.
> 
> Unfortunately this doesn't have any impact either.

That's surprising. Is there anything unusual about the test setup that I 
should know, e.g. very small number of CPU cores or a very small queue 
depth of the SATA device? How about adding pr_info() statements at the 
start and end of the following functions and also before the return 
statements in these functions to determine where execution of the START 
command hangs?
* sd_start_done().
* sd_start_done_work().

Thanks,

Bart.

Hannes Reinecke July 21, 2022, 5:40 a.m. UTC | #8

On 7/20/22 18:51, Bart Van Assche wrote:
> On 7/20/22 00:47, Geert Uytterhoeven wrote:
>> With more debug options enabled, it prints:
>>
>> INFO: task kworker/0:7:283 blocked for more than 120 seconds.
>>        Not tainted 5.19.0-rc7-salvator-x-00794-g6780eb02b605 #1287
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:kworker/0:7     state:D stack:    0 pid:  283 ppid:     2 
>> flags:0x00000008
>> Workqueue: events ata_scsi_dev_rescan
>> Call trace:
>>   __switch_to+0xbc/0x124
>>   __schedule+0x540/0x71c
>>   schedule+0x58/0xa0
>>   io_schedule+0x18/0x34
>>   blk_mq_get_tag+0x138/0x244
>>   __blk_mq_alloc_requests+0x130/0x2f0
>>   blk_mq_alloc_request+0x74/0xa8
>>   scsi_alloc_request+0x10/0x30
>>   __scsi_execute+0x5c/0x18c
>>   scsi_vpd_inquiry+0x7c/0xdc
>>   scsi_get_vpd_size+0x34/0xa8
>>   scsi_get_vpd_buf+0x28/0xf4
>>   scsi_attach_vpd+0x44/0x170
>>   scsi_rescan_device+0x30/0x98
>>   ata_scsi_dev_rescan+0xc8/0xfc
>>   process_one_work+0x2e0/0x474
>>   worker_thread+0x1cc/0x270
>>   kthread+0xd8/0xe8
>>   ret_from_fork+0x10/0x20
>>
>> This doesn't look like it's blocked in the R-Car SATA driver, but on
>> some I/O scheduling event in the block core?
> 
> I'm not familiar with the SATA code but from a quick look it seems like 
> the above code is only triggered from inside the ATA error handler 
> (ata_do_eh() -> ata_eh_recover() -> ata_eh_revalidate_and_attach() -> 
> schedule_work(&(ap->scsi_rescan_task) -> ata_scsi_dev_rescan()). It 
> doesn't seem normal to me that the ATA error handler gets invoked during 
> a resume. How about testing the following two code changes?

Sadly, that _is_ the normal workflow. SATA EH gets invoked regularly, 
and is handling the device resets after resume.

Cheers,

Hannes

Geert Uytterhoeven July 21, 2022, 8:07 a.m. UTC | #9

Hoi Bart,

On Wed, Jul 20, 2022 at 8:04 PM Bart Van Assche <bvanassche@acm.org> wrote:
> On 7/20/22 10:44, Geert Uytterhoeven wrote:
> > On Wed, Jul 20, 2022 at 6:51 PM Bart Van Assche <bvanassche@acm.org> wrote:
> >> I'm not familiar with the SATA code but from a quick look it seems like
> >> the above code is only triggered from inside the ATA error handler
> >> (ata_do_eh() -> ata_eh_recover() -> ata_eh_revalidate_and_attach() ->
> >> schedule_work(&(ap->scsi_rescan_task) -> ata_scsi_dev_rescan()). It
> >> doesn't seem normal to me that the ATA error handler gets invoked during
> >> a resume. How about testing the following two code changes?
> >
> > Thanks for your suggestions!
> >
> >> * In sd_start_stop_device(), change "return sd_submit_start(sdkp, cmd,
> >> sizeof(cmd))" into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and below
> >> that call add "flush_work(&sdkp->start_done_work)". This makes
> >> sd_start_stop_device() again synchronous. This will learn us whether the
> >> behavior change is caused by submitting the START command from another
> >> context or by not waiting until the START command has finished.
> >
> > Unfortunately this doesn't have any impact.
> >
> >> * Back out the above change, change "return sd_submit_start(sdkp, cmd,
> >> sizeof(cmd))" again into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and
> >> below that statement add a call to
> >> scsi_run_queue(sdkp->device->request_queue). If this change helps it
> >
> > (that's the static scsi_run_queue() in drivers/scsi/scsi_lib.c?)
> >
> >> means that the scsi_run_queue() call is necessary to prevent reordering
> >> of the START command with other SCSI commands.
> >
> > Unfortunately this doesn't have any impact either.
>
> That's surprising. Is there anything unusual about the test setup that I
> should know, e.g. very small number of CPU cores or a very small queue
> depth of the SATA device? How about adding pr_info() statements at the
> start and end of the following functions and also before the return
> statements in these functions to determine where execution of the START
> command hangs?
> * sd_start_done().
> * sd_start_done_work().

None of these functions seem to be called at all?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Bart Van Assche July 21, 2022, 6:14 p.m. UTC | #10

On 7/21/22 01:07, Geert Uytterhoeven wrote:
> On Wed, Jul 20, 2022 at 8:04 PM Bart Van Assche <bvanassche@acm.org> wrote:
>> That's surprising. Is there anything unusual about the test setup that I
>> should know, e.g. very small number of CPU cores or a very small queue
>> depth of the SATA device? How about adding pr_info() statements at the
>> start and end of the following functions and also before the return
>> statements in these functions to determine where execution of the START
>> command hangs?
>> * sd_start_done().
>> * sd_start_done_work().
> 
> None of these functions seem to be called at all?
That's weird. This means that either sd_submit_start() hangs or that the 
execution of the START command never finishes. The latter is unlikely 
since the SCSI error handler is assumed to abort commands that hang. It 
would also be weird if sd_submit_start() would hang before the START 
command is submitted since the code flow for submitting the START 
command is very similar to the code flow for submitting the START 
command without patch "scsi: sd: Rework asynchronous resume support" 
(calling scsi_execute()).

What is also weird is that there are at least two SATA setups on which 
this code works fine, including my Qemu setup.

Although it is possible to enable tracing at boot time, adding the 
following parameters to the kernel command line would generate too much 
logging data:

tp_printk 
trace_event=block_rq_complete,block_rq_error,block_rq_insert,block_rq_issue,block_rq_merge,block_rq_remap,block_rq_requeue,scsi_dispatch_cmd_done,scsi_dispatch_cmd_start,scsi_eh_wakeup,scsi_dispatch_cmd_error,scsi_dispatch_cmd_timeout 
scsi_mod.scsi_logging_level=32256

I'm not sure what the best way is to proceed since I cannot reproduce 
this issue.

Bart.

Geert Uytterhoeven July 22, 2022, 8:53 a.m. UTC | #11

Hoi Bart,

On Thu, Jul 21, 2022 at 8:15 PM Bart Van Assche <bvanassche@acm.org> wrote:
> On 7/21/22 01:07, Geert Uytterhoeven wrote:
> > On Wed, Jul 20, 2022 at 8:04 PM Bart Van Assche <bvanassche@acm.org> wrote:
> >> That's surprising. Is there anything unusual about the test setup that I
> >> should know, e.g. very small number of CPU cores or a very small queue
> >> depth of the SATA device? How about adding pr_info() statements at the
> >> start and end of the following functions and also before the return
> >> statements in these functions to determine where execution of the START
> >> command hangs?
> >> * sd_start_done().
> >> * sd_start_done_work().
> >
> > None of these functions seem to be called at all?
>
> That's weird. This means that either sd_submit_start() hangs or that the
> execution of the START command never finishes. The latter is unlikely
> since the SCSI error handler is assumed to abort commands that hang. It
> would also be weird if sd_submit_start() would hang before the START
> command is submitted since the code flow for submitting the START
> command is very similar to the code flow for submitting the START
> command without patch "scsi: sd: Rework asynchronous resume support"
> (calling scsi_execute()).

I think you misunderstood: none of these functions seem to be called,
even when reading from hard drive works fine.

> What is also weird is that there are at least two SATA setups on which
> this code works fine, including my Qemu setup.
>
> Although it is possible to enable tracing at boot time, adding the
> following parameters to the kernel command line would generate too much
> logging data:
>
> tp_printk
> trace_event=block_rq_complete,block_rq_error,block_rq_insert,block_rq_issue,block_rq_merge,block_rq_remap,block_rq_requeue,scsi_dispatch_cmd_done,scsi_dispatch_cmd_start,scsi_eh_wakeup,scsi_dispatch_cmd_error,scsi_dispatch_cmd_timeout
> scsi_mod.scsi_logging_level=32256
>
> I'm not sure what the best way is to proceed since I cannot reproduce
> this issue.

During s2idle, the following trace data is generated:

   kworker/u16:9-325     [000] ...2.   230.478731: block_rq_issue: 8,0
N 0 () 0 + 0 [kworker/u16:9]
   kworker/u16:9-325     [000] ...2.   230.478745:
scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=0
prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=0
cmnd=(SYNCHRONIZE_CACHE - raw=35 00 00 00 00 00 00 00 00 00)
          <idle>-0       [007] d.h3.   230.478832:
scsi_dispatch_cmd_done: host_no=0 channel=0 id=0 lun=0 data_sgl=0
prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=0
cmnd=(SYNCHRONIZE_CACHE - raw=35 00 00 00 00 00 00 00 00 00)
result=(driver=DRIVER_OK host=DID_OK message=COMMAND_COMPLETE
status=SAM_STAT_GOOD)
          <idle>-0       [000] ..s2.   230.478851: block_rq_complete:
8,0 N () 18446744073709551615 + 0 [0]
   kworker/u16:9-325     [000] ...2.   230.483134: block_rq_issue: 8,0
N 0 () 0 + 0 [kworker/u16:9]
   kworker/u16:9-325     [000] ...2.   230.483136:
scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=0
prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=1
cmnd=(START_STOP - raw=1b 00 00 00 00 00)
          <idle>-0       [007] d.h3.   230.624530:
scsi_dispatch_cmd_done: host_no=0 channel=0 id=0 lun=0 data_sgl=0
prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=1
cmnd=(START_STOP - raw=1b 00 00 00 00 00) result=(driver=DRIVER_OK
host=DID_OK message=COMMAND_COMPLETE status=SAM_STAT_GOOD)
          <idle>-0       [000] d.s4.   230.624634: scsi_eh_wakeup: host_no=0
          <idle>-0       [000] ..s2.   230.624642: block_rq_complete:
8,0 N () 18446744073709551615 + 0 [0]
  kworker/u16:14-1027    [007] d..3.   231.393642: scsi_eh_wakeup: host_no=0

When reading from hard drive after s2idle, no more trace data
is generated.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Bart Van Assche July 22, 2022, 5:56 p.m. UTC | #12

On 7/22/22 01:53, Geert Uytterhoeven wrote:
> During s2idle, the following trace data is generated:
> 
>     kworker/u16:9-325     [000] ...2.   230.478731: block_rq_issue: 8,0
> N 0 () 0 + 0 [kworker/u16:9]
>     kworker/u16:9-325     [000] ...2.   230.478745:
> scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=0
> prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=0
> cmnd=(SYNCHRONIZE_CACHE - raw=35 00 00 00 00 00 00 00 00 00)
>            <idle>-0       [007] d.h3.   230.478832:
> scsi_dispatch_cmd_done: host_no=0 channel=0 id=0 lun=0 data_sgl=0
> prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=0
> cmnd=(SYNCHRONIZE_CACHE - raw=35 00 00 00 00 00 00 00 00 00)
> result=(driver=DRIVER_OK host=DID_OK message=COMMAND_COMPLETE
> status=SAM_STAT_GOOD)
>            <idle>-0       [000] ..s2.   230.478851: block_rq_complete:
> 8,0 N () 18446744073709551615 + 0 [0]
>     kworker/u16:9-325     [000] ...2.   230.483134: block_rq_issue: 8,0
> N 0 () 0 + 0 [kworker/u16:9]
>     kworker/u16:9-325     [000] ...2.   230.483136:
> scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=0
> prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=1
> cmnd=(START_STOP - raw=1b 00 00 00 00 00)
>            <idle>-0       [007] d.h3.   230.624530:
> scsi_dispatch_cmd_done: host_no=0 channel=0 id=0 lun=0 data_sgl=0
> prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=1
> cmnd=(START_STOP - raw=1b 00 00 00 00 00) result=(driver=DRIVER_OK
> host=DID_OK message=COMMAND_COMPLETE status=SAM_STAT_GOOD)
>            <idle>-0       [000] d.s4.   230.624634: scsi_eh_wakeup: host_no=0
>            <idle>-0       [000] ..s2.   230.624642: block_rq_complete:
> 8,0 N () 18446744073709551615 + 0 [0]
>    kworker/u16:14-1027    [007] d..3.   231.393642: scsi_eh_wakeup: host_no=0
> 
> When reading from hard drive after s2idle, no more trace data
> is generated.

I think the above commands come from the suspend sequence. '1b 00 00 00 
00 00' stops a block device. The lowest bit in byte 4 needs to be set to 
start a block device.

Something that is not yet clear is whether or not sd_submit_start() 
hangs during the resume process. How about verifying whether or not 
sd_submit_start() hangs by either issuing SysRq-t or by adding pr_info() 
statements in that function?

Thanks,

Bart.

Geert Uytterhoeven Aug. 12, 2022, 10:48 a.m. UTC | #13

Hi Bart,

CC linux-ide

On Fri, Jul 22, 2022 at 7:56 PM Bart Van Assche <bvanassche@acm.org> wrote:
> On 7/22/22 01:53, Geert Uytterhoeven wrote:
> > During s2idle, the following trace data is generated:
> >
> >     kworker/u16:9-325     [000] ...2.   230.478731: block_rq_issue: 8,0
> > N 0 () 0 + 0 [kworker/u16:9]
> >     kworker/u16:9-325     [000] ...2.   230.478745:
> > scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=0
> > prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=0
> > cmnd=(SYNCHRONIZE_CACHE - raw=35 00 00 00 00 00 00 00 00 00)
> >            <idle>-0       [007] d.h3.   230.478832:
> > scsi_dispatch_cmd_done: host_no=0 channel=0 id=0 lun=0 data_sgl=0
> > prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=0
> > cmnd=(SYNCHRONIZE_CACHE - raw=35 00 00 00 00 00 00 00 00 00)
> > result=(driver=DRIVER_OK host=DID_OK message=COMMAND_COMPLETE
> > status=SAM_STAT_GOOD)
> >            <idle>-0       [000] ..s2.   230.478851: block_rq_complete:
> > 8,0 N () 18446744073709551615 + 0 [0]
> >     kworker/u16:9-325     [000] ...2.   230.483134: block_rq_issue: 8,0
> > N 0 () 0 + 0 [kworker/u16:9]
> >     kworker/u16:9-325     [000] ...2.   230.483136:
> > scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=0
> > prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=1
> > cmnd=(START_STOP - raw=1b 00 00 00 00 00)
> >            <idle>-0       [007] d.h3.   230.624530:
> > scsi_dispatch_cmd_done: host_no=0 channel=0 id=0 lun=0 data_sgl=0
> > prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=1
> > cmnd=(START_STOP - raw=1b 00 00 00 00 00) result=(driver=DRIVER_OK
> > host=DID_OK message=COMMAND_COMPLETE status=SAM_STAT_GOOD)
> >            <idle>-0       [000] d.s4.   230.624634: scsi_eh_wakeup: host_no=0
> >            <idle>-0       [000] ..s2.   230.624642: block_rq_complete:
> > 8,0 N () 18446744073709551615 + 0 [0]
> >    kworker/u16:14-1027    [007] d..3.   231.393642: scsi_eh_wakeup: host_no=0
> >
> > When reading from hard drive after s2idle, no more trace data
> > is generated.
>
> I think the above commands come from the suspend sequence. '1b 00 00 00
> 00 00' stops a block device. The lowest bit in byte 4 needs to be set to
> start a block device.
>
> Something that is not yet clear is whether or not sd_submit_start()
> hangs during the resume process. How about verifying whether or not
> sd_submit_start() hangs by either issuing SysRq-t or by adding pr_info()
> statements in that function?

sd_submit_start() is called once during suspend, and once during
resume.  It does not hang.

Reading from /dev/sda hangs after resume (not in sd_submit_start(),
which is never called for reading).

Two tasks are blocked in blk_mq_get_tag() calling io_schedule():

task:kworker/7:1     state:D stack:    0 pid:  122 ppid:     2 flags:0x00000008
Workqueue: events ata_scsi_dev_rescan
Call trace:
 __switch_to+0xbc/0x124
 __schedule+0x540/0x71c
 schedule+0x58/0xa0
 io_schedule+0x18/0x34
 blk_mq_get_tag+0x138/0x244
 __blk_mq_alloc_requests+0x130/0x2f0
 blk_mq_alloc_request+0x74/0xa8
 scsi_alloc_request+0x10/0x30
 __scsi_execute+0x5c/0x18c
 scsi_vpd_inquiry+0x7c/0xdc
 scsi_get_vpd_size+0x34/0xa8
 scsi_get_vpd_buf+0x28/0xf4
 scsi_attach_vpd+0x44/0x170
 scsi_rescan_device+0x30/0x98
 ata_scsi_dev_rescan+0xc8/0xfc
 process_one_work+0x2e0/0x474
 worker_thread+0x1cc/0x270
 kthread+0xd8/0xe8
 ret_from_fork+0x10/0x20


task:hd              state:D stack:    0 pid: 1163 ppid:  1076 flags:0x00000000
Call trace:
 __switch_to+0xbc/0x124
 __schedule+0x540/0x71c
 schedule+0x58/0xa0
 io_schedule+0x18/0x34
 blk_mq_get_tag+0x138/0x244
 __blk_mq_alloc_requests+0x130/0x2f0
 blk_mq_submit_bio+0x44c/0x5b4
 __submit_bio+0x24/0x5c
 submit_bio_noacct_nocheck+0x8c/0x178
 submit_bio_noacct+0x380/0x3b0
 submit_bio+0x34/0x3c
 mpage_bio_submit+0x28/0x38
 mpage_readahead+0xa8/0x178
 blkdev_readahead+0x14/0x1c
 read_pages+0x4c/0x158
 page_cache_ra_unbounded+0xd8/0x174
 do_page_cache_ra+0x40/0x4c
 page_cache_ra_order+0x14/0x1c
 ondemand_readahead+0x124/0x2fc
 page_cache_sync_ra+0x50/0x54
 filemap_read+0x130/0x6e8
 blkdev_read_iter+0xf0/0x164
 new_sync_read+0x74/0xc0
 vfs_read+0xbc/0xd8
 ksys_read+0x6c/0xd4
 __arm64_sys_read+0x14/0x1c
 invoke_syscall+0x70/0xf4
 el0_svc_common.constprop.0+0xbc/0xf0
 do_el0_svc+0x18/0x20
 el0_svc+0x30/0x84
 el0t_64_sync_handler+0x90/0xf8
 el0t_64_sync+0x14c/0x150

I hope this helps.
Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Bart Van Assche Aug. 12, 2022, 3:53 p.m. UTC | #14

On 8/12/22 03:48, Geert Uytterhoeven wrote:
> sd_submit_start() is called once during suspend, and once during
> resume.  It does not hang.
> 
> Reading from /dev/sda hangs after resume (not in sd_submit_start(),
> which is never called for reading).
> 
> Two tasks are blocked in blk_mq_get_tag() calling io_schedule():
> 
> task:kworker/7:1     state:D stack:    0 pid:  122 ppid:     2 flags:0x00000008
> Workqueue: events ata_scsi_dev_rescan
> Call trace:
>   __switch_to+0xbc/0x124
>   __schedule+0x540/0x71c
>   schedule+0x58/0xa0
>   io_schedule+0x18/0x34
>   blk_mq_get_tag+0x138/0x244
>   __blk_mq_alloc_requests+0x130/0x2f0
>   blk_mq_alloc_request+0x74/0xa8
>   scsi_alloc_request+0x10/0x30
>   __scsi_execute+0x5c/0x18c
>   scsi_vpd_inquiry+0x7c/0xdc
>   scsi_get_vpd_size+0x34/0xa8
>   scsi_get_vpd_buf+0x28/0xf4
>   scsi_attach_vpd+0x44/0x170
>   scsi_rescan_device+0x30/0x98
>   ata_scsi_dev_rescan+0xc8/0xfc
>   process_one_work+0x2e0/0x474
>   worker_thread+0x1cc/0x270
>   kthread+0xd8/0xe8
>   ret_from_fork+0x10/0x20
> 
> 
> task:hd              state:D stack:    0 pid: 1163 ppid:  1076 flags:0x00000000
> Call trace:
>   __switch_to+0xbc/0x124
>   __schedule+0x540/0x71c
>   schedule+0x58/0xa0
>   io_schedule+0x18/0x34
>   blk_mq_get_tag+0x138/0x244
>   __blk_mq_alloc_requests+0x130/0x2f0
>   blk_mq_submit_bio+0x44c/0x5b4
>   __submit_bio+0x24/0x5c
>   submit_bio_noacct_nocheck+0x8c/0x178
>   submit_bio_noacct+0x380/0x3b0
>   submit_bio+0x34/0x3c
>   mpage_bio_submit+0x28/0x38
>   mpage_readahead+0xa8/0x178
>   blkdev_readahead+0x14/0x1c
>   read_pages+0x4c/0x158
>   page_cache_ra_unbounded+0xd8/0x174
>   do_page_cache_ra+0x40/0x4c
>   page_cache_ra_order+0x14/0x1c
>   ondemand_readahead+0x124/0x2fc
>   page_cache_sync_ra+0x50/0x54
>   filemap_read+0x130/0x6e8
>   blkdev_read_iter+0xf0/0x164
>   new_sync_read+0x74/0xc0
>   vfs_read+0xbc/0xd8
>   ksys_read+0x6c/0xd4
>   __arm64_sys_read+0x14/0x1c
>   invoke_syscall+0x70/0xf4
>   el0_svc_common.constprop.0+0xbc/0xf0
>   do_el0_svc+0x18/0x20
>   el0_svc+0x30/0x84
>   el0t_64_sync_handler+0x90/0xf8
>   el0t_64_sync+0x14c/0x150

Hi Geert,

All that can be concluded from the above is that blk_mq_get_tag() is 
waiting for other I/O request(s) to finish. One or more other requests 
are in progress and either scsi_done() has not been called for these 
requests or the error handler got stuck. Since the issue reported above 
is not observed with other ATA interfaces, this may be related to the 
ATA interface driver used in your test setup.

Bart.

Geert Uytterhoeven Aug. 15, 2022, 10:13 a.m. UTC | #15

Hoi Bart,

On Fri, Aug 12, 2022 at 5:53 PM Bart Van Assche <bvanassche@acm.org> wrote:
> On 8/12/22 03:48, Geert Uytterhoeven wrote:
> > sd_submit_start() is called once during suspend, and once during
> > resume.  It does not hang.
> >
> > Reading from /dev/sda hangs after resume (not in sd_submit_start(),
> > which is never called for reading).

FTR, this issue is now present in v6.0-rc1. Reverting commit
88f1669019bd62b3 ("scsi: sd: Rework asynchronous resume support")
fixes it.

> > Two tasks are blocked in blk_mq_get_tag() calling io_schedule():
> >
> > task:kworker/7:1     state:D stack:    0 pid:  122 ppid:     2 flags:0x00000008
> > Workqueue: events ata_scsi_dev_rescan
> > Call trace:
> >   __switch_to+0xbc/0x124
> >   __schedule+0x540/0x71c
> >   schedule+0x58/0xa0
> >   io_schedule+0x18/0x34
> >   blk_mq_get_tag+0x138/0x244
> >   __blk_mq_alloc_requests+0x130/0x2f0
> >   blk_mq_alloc_request+0x74/0xa8
> >   scsi_alloc_request+0x10/0x30
> >   __scsi_execute+0x5c/0x18c
> >   scsi_vpd_inquiry+0x7c/0xdc
> >   scsi_get_vpd_size+0x34/0xa8
> >   scsi_get_vpd_buf+0x28/0xf4
> >   scsi_attach_vpd+0x44/0x170
> >   scsi_rescan_device+0x30/0x98
> >   ata_scsi_dev_rescan+0xc8/0xfc
> >   process_one_work+0x2e0/0x474
> >   worker_thread+0x1cc/0x270
> >   kthread+0xd8/0xe8
> >   ret_from_fork+0x10/0x20
> >
> >
> > task:hd              state:D stack:    0 pid: 1163 ppid:  1076 flags:0x00000000
> > Call trace:
> >   __switch_to+0xbc/0x124
> >   __schedule+0x540/0x71c
> >   schedule+0x58/0xa0
> >   io_schedule+0x18/0x34
> >   blk_mq_get_tag+0x138/0x244
> >   __blk_mq_alloc_requests+0x130/0x2f0
> >   blk_mq_submit_bio+0x44c/0x5b4
> >   __submit_bio+0x24/0x5c
> >   submit_bio_noacct_nocheck+0x8c/0x178
> >   submit_bio_noacct+0x380/0x3b0
> >   submit_bio+0x34/0x3c
> >   mpage_bio_submit+0x28/0x38
> >   mpage_readahead+0xa8/0x178
> >   blkdev_readahead+0x14/0x1c
> >   read_pages+0x4c/0x158
> >   page_cache_ra_unbounded+0xd8/0x174
> >   do_page_cache_ra+0x40/0x4c
> >   page_cache_ra_order+0x14/0x1c
> >   ondemand_readahead+0x124/0x2fc
> >   page_cache_sync_ra+0x50/0x54
> >   filemap_read+0x130/0x6e8
> >   blkdev_read_iter+0xf0/0x164
> >   new_sync_read+0x74/0xc0
> >   vfs_read+0xbc/0xd8
> >   ksys_read+0x6c/0xd4
> >   __arm64_sys_read+0x14/0x1c
> >   invoke_syscall+0x70/0xf4
> >   el0_svc_common.constprop.0+0xbc/0xf0
> >   do_el0_svc+0x18/0x20
> >   el0_svc+0x30/0x84
> >   el0t_64_sync_handler+0x90/0xf8
> >   el0t_64_sync+0x14c/0x150
>
> All that can be concluded from the above is that blk_mq_get_tag() is
> waiting for other I/O request(s) to finish. One or more other requests
> are in progress and either scsi_done() has not been called for these
> requests or the error handler got stuck. Since the issue reported above
> is not observed with other ATA interfaces, this may be related to the
> ATA interface driver used in your test setup.

I have added debug prints to all ata_port_operations in
sata_rcar_port_ops.  After s2idle, running "hd /dev/sda | head -70"
hangs before any of these functions are called.

Showing all locks held in the system:
1 lock held by rcu_tasks_kthre/10:
 #0: ffff800009575c38 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at:
rcu_tasks_one_gp+0x34/0x4c8
4 locks held by kworker/0:10/104:
 #0: ffff0004c0008738 ((wq_completion)events){+.+.}-{0:0}, at:
process_one_work+0x1f4/0x6a0
 #1: ffff80000a90bde0
((work_completion)(&ap->scsi_rescan_task)){+.+.}-{0:0}, at:
process_one_work+0x1f4/0x6a0
 #2: ffff0004c2b6bf60 (&ap->scsi_scan_mutex){+.+.}-{3:3}, at:
ata_scsi_dev_rescan+0x28/0x118
 #3: ffff0004c2902368 (&dev->mutex){....}-{3:3}, at:
scsi_rescan_device+0x28/0x78
1 lock held by in:imklog/636:
 #0: ffff0004c5ee86e8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x54/0x68
1 lock held by hd/1013:
 #0: ffff0004c06388b8 (mapping.invalidate_lock#2){.+.+}-{3:3}, at:
page_cache_ra_unbounded+0x64/0x1a8

I've just tried with a USB storage device on the same platform,
and it can be read fine after s2idle.  So it looks like the issue
is related to SATA.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Bart Van Assche Aug. 15, 2022, 1:49 p.m. UTC | #16

On 8/15/22 03:13, Geert Uytterhoeven wrote:
> Showing all locks held in the system:
> 1 lock held by rcu_tasks_kthre/10:
>   #0: ffff800009575c38 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at:
> rcu_tasks_one_gp+0x34/0x4c8
> 4 locks held by kworker/0:10/104:
>   #0: ffff0004c0008738 ((wq_completion)events){+.+.}-{0:0}, at:
> process_one_work+0x1f4/0x6a0
>   #1: ffff80000a90bde0
> ((work_completion)(&ap->scsi_rescan_task)){+.+.}-{0:0}, at:
> process_one_work+0x1f4/0x6a0
>   #2: ffff0004c2b6bf60 (&ap->scsi_scan_mutex){+.+.}-{3:3}, at:
> ata_scsi_dev_rescan+0x28/0x118
>   #3: ffff0004c2902368 (&dev->mutex){....}-{3:3}, at:
> scsi_rescan_device+0x28/0x78
> 1 lock held by in:imklog/636:
>   #0: ffff0004c5ee86e8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x54/0x68
> 1 lock held by hd/1013:
>   #0: ffff0004c06388b8 (mapping.invalidate_lock#2){.+.+}-{3:3}, at:
> page_cache_ra_unbounded+0x64/0x1a8

Thank you for having shared this information. I will take a closer look 
and see what I can derive from the above information.

> I've just tried with a USB storage device on the same platform,
> and it can be read fine after s2idle.  So it looks like the issue
> is related to SATA.

Unfortunately the above does not learn us anything new. The code 
modified by commit 88f1669019bd ("scsi: sd: Rework asynchronous resume 
support") is only called if sdev->manage_start_stop != 1. Only the SATA 
code, the Firewire code and the manage_start_stop sysfs attribute store 
method set that member variable:

$ git grep -nH 'manage_start_stop = '
drivers/ata/libata-scsi.c:1083:		sdev->manage_start_stop = 1;
drivers/firewire/sbp2.c:1521:		sdev->manage_start_stop = 1;
drivers/scsi/sd.c:240:	sdp->manage_start_stop = v;

Would it be possible to share the output of the command below? That 
should reveal which ATA driver is active on the test setup.

find /sys -name proc_name | xargs grep -aH .

Thanks,

Bart.

Geert Uytterhoeven Aug. 15, 2022, 6:26 p.m. UTC | #17

Hoi Bart,

On Mon, Aug 15, 2022 at 3:49 PM Bart Van Assche <bvanassche@acm.org> wrote:
> On 8/15/22 03:13, Geert Uytterhoeven wrote:
> > Showing all locks held in the system:
> > 1 lock held by rcu_tasks_kthre/10:
> >   #0: ffff800009575c38 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at:
> > rcu_tasks_one_gp+0x34/0x4c8
> > 4 locks held by kworker/0:10/104:
> >   #0: ffff0004c0008738 ((wq_completion)events){+.+.}-{0:0}, at:
> > process_one_work+0x1f4/0x6a0
> >   #1: ffff80000a90bde0
> > ((work_completion)(&ap->scsi_rescan_task)){+.+.}-{0:0}, at:
> > process_one_work+0x1f4/0x6a0
> >   #2: ffff0004c2b6bf60 (&ap->scsi_scan_mutex){+.+.}-{3:3}, at:
> > ata_scsi_dev_rescan+0x28/0x118
> >   #3: ffff0004c2902368 (&dev->mutex){....}-{3:3}, at:
> > scsi_rescan_device+0x28/0x78
> > 1 lock held by in:imklog/636:
> >   #0: ffff0004c5ee86e8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x54/0x68
> > 1 lock held by hd/1013:
> >   #0: ffff0004c06388b8 (mapping.invalidate_lock#2){.+.+}-{3:3}, at:
> > page_cache_ra_unbounded+0x64/0x1a8
>
> Thank you for having shared this information. I will take a closer look
> and see what I can derive from the above information.
>
> > I've just tried with a USB storage device on the same platform,
> > and it can be read fine after s2idle.  So it looks like the issue
> > is related to SATA.
>
> Unfortunately the above does not learn us anything new. The code
> modified by commit 88f1669019bd ("scsi: sd: Rework asynchronous resume
> support") is only called if sdev->manage_start_stop != 1. Only the SATA
> code, the Firewire code and the manage_start_stop sysfs attribute store
> method set that member variable:
>
> $ git grep -nH 'manage_start_stop = '
> drivers/ata/libata-scsi.c:1083:         sdev->manage_start_stop = 1;
> drivers/firewire/sbp2.c:1521:           sdev->manage_start_stop = 1;
> drivers/scsi/sd.c:240:  sdp->manage_start_stop = v;
>
> Would it be possible to share the output of the command below? That
> should reveal which ATA driver is active on the test setup.
>
> find /sys -name proc_name | xargs grep -aH .

/sys/devices/platform/soc/ee300000.sata/ata1/host0/scsi_host/host0/proc_name:sata_rcar

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Bart Van Assche Aug. 16, 2022, 8:21 p.m. UTC | #18

On 8/15/22 11:26, Geert Uytterhoeven wrote:
> On Mon, Aug 15, 2022 at 3:49 PM Bart Van Assche <bvanassche@acm.org> wrote:
>> Would it be possible to share the output of the command below? That
>> should reveal which ATA driver is active on the test setup.
>>
>> find /sys -name proc_name | xargs grep -aH .
> 
> /sys/devices/platform/soc/ee300000.sata/ata1/host0/scsi_host/host0/proc_name:sata_rcar

Thanks Geert for the help. Although I already posted a revert, I'm still 
trying to root-cause this issue. Do you perhaps know whether sata_rcar 
controllers support NCQ and if so, what queue depth these controllers 
support? I think that information is available in sysfs. Here is an 
example for a VM:

# (cd /sys/class/scsi_device && for a in */device/*/*/ncq_prio_enable; 
do p=${a%/ncq_prio_enable}; grep -qi ata $p/inquiry || continue; grep 
-aH . $p/{queue_depth,ncq*}; done)
2:0:0:0/device/driver/2:0:0:0/queue_depth:32
2:0:0:0/device/driver/2:0:0:0/ncq_prio_enable:0
2:0:0:0/device/driver/2:0:0:0/ncq_prio_supported:0
2:0:0:0/device/generic/device/queue_depth:32
2:0:0:0/device/generic/device/ncq_prio_enable:0
2:0:0:0/device/generic/device/ncq_prio_supported:0
6:0:0:1/device/driver/2:0:0:0/queue_depth:32
6:0:0:1/device/driver/2:0:0:0/ncq_prio_enable:0
6:0:0:1/device/driver/2:0:0:0/ncq_prio_supported:0

Thanks,

Bart.

Sergey Shtylyov Aug. 17, 2022, 8:53 a.m. UTC | #19

Hello!

On 8/16/22 11:21 PM, Bart Van Assche wrote:

>> On Mon, Aug 15, 2022 at 3:49 PM Bart Van Assche <bvanassche@acm.org> wrote:
>>> Would it be possible to share the output of the command below? That
>>> should reveal which ATA driver is active on the test setup.
>>>
>>> find /sys -name proc_name | xargs grep -aH .
>>
>> /sys/devices/platform/soc/ee300000.sata/ata1/host0/scsi_host/host0/proc_name:sata_rcar
> 
> Thanks Geert for the help. Although I already posted a revert, I'm still trying to
> root-cause this issue. Do you perhaps know whether sata_rcar controllers support NCQ

   They don't. :-)

> and if so, what queue depth these controllers support? I think that information is available in sysfs. Here is an example for a VM:
> 
> # (cd /sys/class/scsi_device && for a in */device/*/*/ncq_prio_enable; do p=${a%/ncq_prio_enable}; grep -qi ata $p/inquiry || continue; grep -aH . $p/{queue_depth,ncq*}; done)
> 2:0:0:0/device/driver/2:0:0:0/queue_depth:32
> 2:0:0:0/device/driver/2:0:0:0/ncq_prio_enable:0
> 2:0:0:0/device/driver/2:0:0:0/ncq_prio_supported:0
> 2:0:0:0/device/generic/device/queue_depth:32
> 2:0:0:0/device/generic/device/ncq_prio_enable:0
> 2:0:0:0/device/generic/device/ncq_prio_supported:0
> 6:0:0:1/device/driver/2:0:0:0/queue_depth:32
> 6:0:0:1/device/driver/2:0:0:0/ncq_prio_enable:0
> 6:0:0:1/device/driver/2:0:0:0/ncq_prio_supported:0
> 
> Thanks,
> 
> Bart.

MBR, Sergey

Vlastimil Babka Aug. 17, 2022, 7:07 p.m. UTC | #20

Hi, I have a T460 hanging on resume from suspend to ram in 6.0-rc1 that
I bisected to this commit.

> Unfortunately the above does not learn us anything new. The code 
> modified by commit 88f1669019bd ("scsi: sd: Rework asynchronous resume 
> support") is only called if sdev->manage_start_stop != 1. Only the SATA 
> code, the Firewire code and the manage_start_stop sysfs attribute store 
> method set that member variable:
> 
> $ git grep -nH 'manage_start_stop = '
> drivers/ata/libata-scsi.c:1083:		sdev->manage_start_stop = 1;
> drivers/firewire/sbp2.c:1521:		sdev->manage_start_stop = 1;
> drivers/scsi/sd.c:240:	sdp->manage_start_stop = v;
> 
> Would it be possible to share the output of the command below? That 
> should reveal which ATA driver is active on the test setup.
> 
> find /sys -name proc_name | xargs grep -aH .

In my case it's
/sys/devices/pci0000:00/0000:00:17.0/ata1/host0/scsi_host/host0/proc_name:ahci
/sys/devices/pci0000:00/0000:00:17.0/ata2/host1/scsi_host/host1/proc_name:ahci

Some more details from dmesg

[    0.849373] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    0.852849] ata2.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
[    0.854671] ata2.00: supports DRM functions and may not be fully accessible
[    0.856181] ata2.00: ATA-9: SAMSUNG MZ7LN512HMJP-000L7, MAV01L6Q, max UDMA/133
[    0.858115] ata2.00: 1000215216 sectors, multi 1: LBA48 NCQ (depth 32), AA
[    0.861584] ata2.00: Features: Trust Dev-Sleep NCQ-sndrcv
[    0.863749] ata2.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
[    0.865481] ata2.00: supports DRM functions and may not be fully accessible
[    0.870043] ata2.00: configured for UDMA/133
[    0.871871] scsi 1:0:0:0: Direct-Access     ATA      SAMSUNG MZ7LN512 1L6Q PQ: 0 ANSI: 5

Please Cc me on further questions/steps to try/patches to test.

#regzbot introduced: 88f1669019bd62b3
#regzbot monitor: https://lore.kernel.org/all/20220816172638.538734-1-bvanassche@acm.org/

> Thanks,
> 
> Bart.

Bart Van Assche Aug. 17, 2022, 7:28 p.m. UTC | #21

On 8/17/22 12:07, Vlastimil Babka wrote:
> In my case it's
> /sys/devices/pci0000:00/0000:00:17.0/ata1/host0/scsi_host/host0/proc_name:ahci
> /sys/devices/pci0000:00/0000:00:17.0/ata2/host1/scsi_host/host1/proc_name:ahci
> 
> Some more details from dmesg
> 
> [    0.849373] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [    0.852849] ata2.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
> [    0.854671] ata2.00: supports DRM functions and may not be fully accessible
> [    0.856181] ata2.00: ATA-9: SAMSUNG MZ7LN512HMJP-000L7, MAV01L6Q, max UDMA/133
> [    0.858115] ata2.00: 1000215216 sectors, multi 1: LBA48 NCQ (depth 32), AA
> [    0.861584] ata2.00: Features: Trust Dev-Sleep NCQ-sndrcv
> [    0.863749] ata2.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
> [    0.865481] ata2.00: supports DRM functions and may not be fully accessible
> [    0.870043] ata2.00: configured for UDMA/133
> [    0.871871] scsi 1:0:0:0: Direct-Access     ATA      SAMSUNG MZ7LN512 1L6Q PQ: 0 ANSI: 5
> 
> Please Cc me on further questions/steps to try/patches to test.

Hi Vlastimil,

Thank you for having provided the above information. The root cause of 
the hang is not yet clear to me. I was wondering whether the hang 
perhaps would be triggered by controllers that only support queue depth 
1. However, in the above output I see "depth 32".

As already reported in this email thread a revert for commit 
88f1669019bd62b3 has been posted on the linux-scsi mailing list. 
Additionally, Greg KH has been asked to drop that patch from the stable 
trees.

Thanks,

Bart.

Linux regression tracking (Thorsten Leemhuis) Aug. 28, 2022, 11:52 a.m. UTC | #22

TWIMC: this mail is primarily send for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

On 17.08.22 21:07, Vlastimil Babka wrote:
> Hi, I have a T460 hanging on resume from suspend to ram in 6.0-rc1 that
> I bisected to this commit.
> 
>> Unfortunately the above does not learn us anything new. The code 
>> modified by commit 88f1669019bd ("scsi: sd: Rework asynchronous resume 
>> support") is only called if sdev->manage_start_stop != 1. Only the SATA 
>> code, the Firewire code and the manage_start_stop sysfs attribute store 
>> method set that member variable:
> [...]
> #regzbot introduced: 88f1669019bd62b3
> #regzbot monitor: https://lore.kernel.org/all/20220816172638.538734-1-bvanassche@acm.org/

#regzbot fixed-by: 785538bfdd682c8e962341d585f9b88262a0475

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

[v2,2/2] scsi: sd: Rework asynchronous resume support

Commit Message

Comments

Patch