scsi-mq: Always unprepare before requeuing a request

Message ID 20170803214014.20332-1-bart.vanassche@wdc.com (mailing list archive)
State Accepted, archived

Commit Message

Bart Van Assche Aug. 3, 2017, 9:40 p.m. UTC
One of the two scsi-mq functions that requeue a request unprepares
the request before requeuing it (scsi_io_completion()) but the other
function does not (__scsi_queue_insert()). Make sure that a request is
always unprepared before it is requeued.

Fixes: commit d285203cf647 ("scsi: add support for a blk-mq based I/O path.")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/scsi/scsi_lib.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Damien Le Moal Aug. 4, 2017, 8:06 a.m. UTC | #1
On 8/4/17 06:40, Bart Van Assche wrote:
> One of the two scsi-mq functions that requeue a request unprepares
> a request before requeueing (scsi_io_completion()) but the other
> function not (__scsi_queue_insert()). Make sure that a request is
> unprepared before requeuing it.
> 
> Fixes: commit d285203cf647 ("scsi: add support for a blk-mq based I/O path.")
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/scsi/scsi_lib.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 4a2f705cdb14..c7514f3b444a 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -44,6 +44,8 @@ static struct kmem_cache *scsi_sense_cache;
>  static struct kmem_cache *scsi_sense_isadma_cache;
>  static DEFINE_MUTEX(scsi_sense_cache_mutex);
>  
> +static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd);
> +
>  static inline struct kmem_cache *
>  scsi_select_sense_cache(bool unchecked_isa_dma)
>  {
> @@ -140,6 +142,12 @@ static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd)
>  {
>  	struct scsi_device *sdev = cmd->device;
>  
> +	if (cmd->request->rq_flags & RQF_DONTPREP) {
> +		cmd->request->rq_flags &= ~RQF_DONTPREP;
> +		scsi_mq_uninit_cmd(cmd);
> +	} else {
> +		WARN_ON_ONCE(true);
> +	}
>  	blk_mq_requeue_request(cmd->request, true);
>  	put_device(&sdev->sdev_gendev);
>  }
> @@ -995,8 +1003,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
>  		 * A new command will be prepared and issued.
>  		 */
>  		if (q->mq_ops) {
> -			cmd->request->rq_flags &= ~RQF_DONTPREP;
> -			scsi_mq_uninit_cmd(cmd);
>  			scsi_mq_requeue_cmd(cmd);
>  		} else {
>  			scsi_release_buffers(cmd);
> 

Tested-by: Damien Le Moal <damien.lemoal@wdc.com>

This patch is needed for the V2 of the series "Zoned block device
support fixes" that I sent.

Best regards.
Christoph Hellwig Aug. 5, 2017, 11:36 a.m. UTC | #2
Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>
Johannes Thumshirn Aug. 7, 2017, 7:33 a.m. UTC | #3
Thanks Bart,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Martin K. Petersen Aug. 7, 2017, 5:49 p.m. UTC | #4
Bart,

> One of the two scsi-mq functions that requeue a request unprepares a
> request before requeueing (scsi_io_completion()) but the other
> function not (__scsi_queue_insert()). Make sure that a request is
> unprepared before requeuing it.

Applied to 4.13/scsi-fixes. Thanks much!
Bart Van Assche Aug. 10, 2017, 3:26 p.m. UTC | #5
On Thu, 2017-08-10 at 20:32 +1000, Michael Ellerman wrote:
> "Martin K. Petersen" <martin.petersen@oracle.com> writes:
> > > One of the two scsi-mq functions that requeue a request unprepares a
> > > request before requeueing (scsi_io_completion()) but the other
> > > function not (__scsi_queue_insert()). Make sure that a request is
> > > unprepared before requeuing it.
> >
> > Applied to 4.13/scsi-fixes. Thanks much!
>
> This seems to be preventing my Power8 box, which uses IPR, from booting.
>
> Bisect said so:
>
> # first bad commit: [270065e92c317845d69095ec8e3d18616b5b39d5] scsi: scsi-mq: Always unprepare before requeuing a request
>
> And if I revert that on top of next-20170809 my system boots again.
>
> The symptom is that it just gets "stuck" during boot and never gets to
> mounting root, full log below, the end is:
>
>   .
>   ready
>   ready
>   sd 0:2:4:0: [sde] 554287104 512-byte logical blocks: (284 GB/264 GiB)
>   sd 0:2:4:0: [sde] 4096-byte physical blocks
>   sd 0:2:5:0: [sdf] 272646144 512-byte logical blocks: (140 GB/130 GiB)
>   sd 0:2:5:0: [sdf] 4096-byte physical blocks
>   sd 0:2:4:0: [sde] Write Protect is off
>   sd 0:2:4:0: [sde] Mode Sense: 0b 00 00 08
>   sd 0:2:5:0: [sdf] Write Protect is off
>   sd 0:2:5:0: [sdf] Mode Sense: 0b 00 00 08
>
>
> And it just sits there for at least hours.
>
> I compared a good and bad boot log and there appears to be essentially
> no difference. Certainly nothing that looks suspicious.

Hello Michael,

Thanks for having reported this early. Is there any chance that you can
reproduce this state, press SysRq-w on the console and collect the task
overview that is reported on the console (see also
Documentation/admin-guide/sysrq.rst)? If this is not possible or if that
task overview does not report any blocked tasks, can you add
scsi_mod.scsi_logging_level=-1 to the kernel command line (either through
/etc/default/grub or in /boot/grub2/grub.cfg when using GRUB), reboot the
system, capture the console output and report that output as a reply to
this e-mail?

Thanks,

Bart.
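
Where a shell is still reachable and no console keyboard is at hand, the
same blocked-task dump that SysRq-w produces can usually be requested by
writing the character 'w' to /proc/sysrq-trigger (root is required and the
kernel must be built with CONFIG_MAGIC_SYSRQ). The following is only a
minimal sketch of that write; the helper name write_sysrq_command() is
hypothetical and the path is passed in as a parameter so that the function
can be tried against an ordinary file first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write a single SysRq command character to a trigger file.
 * On a live system trigger_path would be "/proc/sysrq-trigger" and
 * command would be 'w' (dump tasks in uninterruptible/blocked state).
 * Returns 0 on success and -1 if the file cannot be opened or written.
 * The function name is an illustrative assumption, not an existing API.
 */
static int write_sysrq_command(const char *trigger_path, char command)
{
	FILE *f;
	int ok;

	assert(trigger_path != NULL);

	f = fopen(trigger_path, "w");
	if (!f)
		return -1;

	ok = (fputc(command, f) != EOF);
	/* fclose() flushes; a failure here means the write did not land. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling write_sysrq_command("/proc/sysrq-trigger", 'w') as root should make
the kernel print the "Show Blocked State" report to the kernel log, which can
then be collected from the console or with dmesg.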
Brian King Aug. 10, 2017, 3:56 p.m. UTC | #6
On 08/10/2017 10:26 AM, Bart Van Assche wrote:
> On Thu, 2017-08-10 at 20:32 +1000, Michael Ellerman wrote:
>> "Martin K. Petersen" <martin.petersen@oracle.com> writes:
>>>> One of the two scsi-mq functions that requeue a request unprepares a
>>>> request before requeueing (scsi_io_completion()) but the other
>>>> function not (__scsi_queue_insert()). Make sure that a request is
>>>> unprepared before requeuing it.
>>>
>>> Applied to 4.13/scsi-fixes. Thanks much!
>>
>> This seems to be preventing my Power8 box, which uses IPR, from booting.
>>
>> Bisect said so:
>>
>> # first bad commit: [270065e92c317845d69095ec8e3d18616b5b39d5] scsi: scsi-mq: Always unprepare before requeuing a request
>>
>> And if I revert that on top of next-20170809 my system boots again.
>>
>> The symptom is that it just gets "stuck" during boot and never gets to
>> mounting root, full log below, the end is:
>>
>>   .
>>   ready
>>   ready
>>   sd 0:2:4:0: [sde] 554287104 512-byte logical blocks: (284 GB/264 GiB)
>>   sd 0:2:4:0: [sde] 4096-byte physical blocks
>>   sd 0:2:5:0: [sdf] 272646144 512-byte logical blocks: (140 GB/130 GiB)
>>   sd 0:2:5:0: [sdf] 4096-byte physical blocks
>>   sd 0:2:4:0: [sde] Write Protect is off
>>   sd 0:2:4:0: [sde] Mode Sense: 0b 00 00 08
>>   sd 0:2:5:0: [sdf] Write Protect is off
>>   sd 0:2:5:0: [sdf] Mode Sense: 0b 00 00 08
>>
>>
>> And it just sits there for at least hours.
>>
>> I compared a good and bad boot log and there appears to be essentially
>> no difference. Certainly nothing that looks suspicous.
> 
> Hello Michael,
> 
> Thanks for having reported this early. Is there any chance that you can
> reproduce this state, press SysRq-w on the console and collect the task
> overview that is reported on the console (see also Documentation/admin-guide/
> sysrq.rst)? If this is not possible or if that task overview does not report
> any blocked tasks, can you add scsi_mod.scsi_logging_level=-1 to the kernel
> command line (either through /etc/default/grub or in /boot/grub2/grub.cfg
> when using GRUB), reboot the system, capture the console output and report
> that output as a reply to this e-mail?

I'm building a kernel to try to reproduce this on a machine with ipr.


-Brian
Michael Ellerman Aug. 11, 2017, 1:05 a.m. UTC | #7
Bart Van Assche <Bart.VanAssche@wdc.com> writes:
> On Thu, 2017-08-10 at 20:32 +1000, Michael Ellerman wrote:
>> "Martin K. Petersen" <martin.petersen@oracle.com> writes:
>> > > One of the two scsi-mq functions that requeue a request unprepares a
>> > > request before requeueing (scsi_io_completion()) but the other
>> > > function not (__scsi_queue_insert()). Make sure that a request is
>> > > unprepared before requeuing it.
>> > 
>> > Applied to 4.13/scsi-fixes. Thanks much!
>> 
>> This seems to be preventing my Power8 box, which uses IPR, from booting.
...
>
> Hello Michael,
>
> Thanks for having reported this early.

No worries.

> Is there any chance that you can
> reproduce this state, press SysRq-w on the console and collect the task
> overview that is reported on the console (see also Documentation/admin-guide/
> sysrq.rst)?

Here it is below. Doesn't seem to change over time.

I can do the scsi_logging_level thing as well as soon as I've got some
coffee :)

cheers


sysrq: SysRq : Show Blocked State
  task                        PC stack   pid father
swapper/0       D10080     1      0 0x00000800
Call Trace:
[c0000003f7583890] [c0000003f75838e0] 0xc0000003f75838e0 (unreliable)
[c0000003f7583a60] [c00000000001b3b8] __switch_to+0x2a8/0x460
[c0000003f7583ac0] [c0000000009abc60] __schedule+0x320/0xaa0
[c0000003f7583b90] [c0000000009ac420] schedule+0x40/0xb0
[c0000003f7583bc0] [c000000000110fc4] async_synchronize_cookie_domain+0xd4/0x150
[c0000003f7583c30] [c000000000619f94] wait_for_device_probe+0x44/0xe0
[c0000003f7583c90] [c000000000c64ce4] prepare_namespace+0x58/0x248
[c0000003f7583d00] [c000000000c64478] kernel_init_freeable+0x310/0x348
[c0000003f7583dc0] [c00000000000d6e4] kernel_init+0x24/0x150
[c0000003f7583e30] [c00000000000ba1c] ret_from_kernel_thread+0x5c/0xc0
kworker/u193:0  D12736     6      2 0x00000800
Workqueue: events_unbound async_run_entry_fn
Call Trace:
[c0000003f7597410] [c000000000150d00] console_unlock+0x330/0x770 (unreliable)
[c0000003f75975e0] [c00000000001b3b8] __switch_to+0x2a8/0x460
[c0000003f7597640] [c0000000009abc60] __schedule+0x320/0xaa0
[c0000003f7597710] [c0000000009ac420] schedule+0x40/0xb0
[c0000003f7597740] [c0000000009b09d4] schedule_timeout+0x254/0x440
[c0000003f7597820] [c0000000009aca80] io_schedule_timeout+0x30/0x60
[c0000003f7597850] [c0000000009ad75c] wait_for_common_io.constprop.2+0xbc/0x1e0
[c0000003f75978d0] [c000000000509e6c] blk_execute_rq+0x4c/0x70
[c0000003f7597920] [c000000000654abc] scsi_execute+0xfc/0x260
[c0000003f7597990] [c000000000654f98] scsi_mode_sense+0x218/0x410
[c0000003f7597aa0] [c00000000068ee68] sd_revalidate_disk+0x908/0x1cd0
[c0000003f7597be0] [c000000000690434] sd_probe_async+0xb4/0x220
[c0000003f7597c60] [c000000000110a20] async_run_entry_fn+0x70/0x170
[c0000003f7597ca0] [c000000000103904] process_one_work+0x2b4/0x560
[c0000003f7597d30] [c000000000103c38] worker_thread+0x88/0x5a0
[c0000003f7597dc0] [c00000000010bfcc] kthread+0x15c/0x1a0
[c0000003f7597e30] [c00000000000ba1c] ret_from_kernel_thread+0x5c/0xc0
kworker/u193:1  D12480   412      2 0x00000800
Workqueue: events_unbound async_run_entry_fn
Call Trace:
[c0000003f5907410] [c000000000150d00] console_unlock+0x330/0x770 (unreliable)
[c0000003f59075e0] [c00000000001b3b8] __switch_to+0x2a8/0x460
[c0000003f5907640] [c0000000009abc60] __schedule+0x320/0xaa0
[c0000003f5907710] [c0000000009ac420] schedule+0x40/0xb0
[c0000003f5907740] [c0000000009b09d4] schedule_timeout+0x254/0x440
[c0000003f5907820] [c0000000009aca80] io_schedule_timeout+0x30/0x60
[c0000003f5907850] [c0000000009ad75c] wait_for_common_io.constprop.2+0xbc/0x1e0
[c0000003f59078d0] [c000000000509e6c] blk_execute_rq+0x4c/0x70
[c0000003f5907920] [c000000000654abc] scsi_execute+0xfc/0x260
[c0000003f5907990] [c000000000654f98] scsi_mode_sense+0x218/0x410
[c0000003f5907aa0] [c00000000068ee68] sd_revalidate_disk+0x908/0x1cd0
[c0000003f5907be0] [c000000000690434] sd_probe_async+0xb4/0x220
[c0000003f5907c60] [c000000000110a20] async_run_entry_fn+0x70/0x170
[c0000003f5907ca0] [c000000000103904] process_one_work+0x2b4/0x560
[c0000003f5907d30] [c000000000103c38] worker_thread+0x88/0x5a0
[c0000003f5907dc0] [c00000000010bfcc] kthread+0x15c/0x1a0
[c0000003f5907e30] [c00000000000ba1c] ret_from_kernel_thread+0x5c/0xc0
kworker/u193:2  D12832   421      2 0x00000800
Workqueue: events_unbound async_run_entry_fn
Call Trace:
[c0000003f4103410] [c0000003f41035f0] 0xc0000003f41035f0 (unreliable)
[c0000003f41035e0] [c00000000001b3b8] __switch_to+0x2a8/0x460
[c0000003f4103640] [c0000000009abc60] __schedule+0x320/0xaa0
[c0000003f4103710] [c0000000009ac420] schedule+0x40/0xb0
[c0000003f4103740] [c0000000009b09d4] schedule_timeout+0x254/0x440
[c0000003f4103820] [c0000000009aca80] io_schedule_timeout+0x30/0x60
[c0000003f4103850] [c0000000009ad75c] wait_for_common_io.constprop.2+0xbc/0x1e0
[c0000003f41038d0] [c000000000509e6c] blk_execute_rq+0x4c/0x70
[c0000003f4103920] [c000000000654abc] scsi_execute+0xfc/0x260
[c0000003f4103990] [c000000000654f98] scsi_mode_sense+0x218/0x410
[c0000003f4103aa0] [c00000000068ee68] sd_revalidate_disk+0x908/0x1cd0
[c0000003f4103be0] [c000000000690434] sd_probe_async+0xb4/0x220
[c0000003f4103c60] [c000000000110a20] async_run_entry_fn+0x70/0x170
[c0000003f4103ca0] [c000000000103904] process_one_work+0x2b4/0x560
[c0000003f4103d30] [c000000000103c38] worker_thread+0x88/0x5a0
[c0000003f4103dc0] [c00000000010bfcc] kthread+0x15c/0x1a0
[c0000003f4103e30] [c00000000000ba1c] ret_from_kernel_thread+0x5c/0xc0
kworker/u193:3  D12832   428      2 0x00000800
Workqueue: events_unbound async_run_entry_fn
Call Trace:
[c0000003f4603410] [c000000000150d00] console_unlock+0x330/0x770 (unreliable)
[c0000003f46035e0] [c00000000001b3b8] __switch_to+0x2a8/0x460
[c0000003f4603640] [c0000000009abc60] __schedule+0x320/0xaa0
[c0000003f4603710] [c0000000009ac420] schedule+0x40/0xb0
[c0000003f4603740] [c0000000009b09d4] schedule_timeout+0x254/0x440
[c0000003f4603820] [c0000000009aca80] io_schedule_timeout+0x30/0x60
[c0000003f4603850] [c0000000009ad75c] wait_for_common_io.constprop.2+0xbc/0x1e0
[c0000003f46038d0] [c000000000509e6c] blk_execute_rq+0x4c/0x70
[c0000003f4603920] [c000000000654abc] scsi_execute+0xfc/0x260
[c0000003f4603990] [c000000000654f98] scsi_mode_sense+0x218/0x410
[c0000003f4603aa0] [c00000000068ee68] sd_revalidate_disk+0x908/0x1cd0
[c0000003f4603be0] [c000000000690434] sd_probe_async+0xb4/0x220
[c0000003f4603c60] [c000000000110a20] async_run_entry_fn+0x70/0x170
[c0000003f4603ca0] [c000000000103904] process_one_work+0x2b4/0x560
[c0000003f4603d30] [c000000000103c38] worker_thread+0x88/0x5a0
[c0000003f4603dc0] [c00000000010bfcc] kthread+0x15c/0x1a0
[c0000003f4603e30] [c00000000000ba1c] ret_from_kernel_thread+0x5c/0xc0
kworker/u193:4  D12832   546      2 0x00000800
Workqueue: events_unbound async_run_entry_fn
Call Trace:
[c0000003f4607410] [c0000003f46075f0] 0xc0000003f46075f0 (unreliable)
[c0000003f46075e0] [c00000000001b3b8] __switch_to+0x2a8/0x460
[c0000003f4607640] [c0000000009abc60] __schedule+0x320/0xaa0
[c0000003f4607710] [c0000000009ac420] schedule+0x40/0xb0
[c0000003f4607740] [c0000000009b09d4] schedule_timeout+0x254/0x440
[c0000003f4607820] [c0000000009aca80] io_schedule_timeout+0x30/0x60
[c0000003f4607850] [c0000000009ad75c] wait_for_common_io.constprop.2+0xbc/0x1e0
[c0000003f46078d0] [c000000000509e6c] blk_execute_rq+0x4c/0x70
[c0000003f4607920] [c000000000654abc] scsi_execute+0xfc/0x260
[c0000003f4607990] [c000000000654f98] scsi_mode_sense+0x218/0x410
[c0000003f4607aa0] [c00000000068ee68] sd_revalidate_disk+0x908/0x1cd0
[c0000003f4607be0] [c000000000690434] sd_probe_async+0xb4/0x220
[c0000003f4607c60] [c000000000110a20] async_run_entry_fn+0x70/0x170
[c0000003f4607ca0] [c000000000103904] process_one_work+0x2b4/0x560
[c0000003f4607d30] [c000000000103c38] worker_thread+0x88/0x5a0
[c0000003f4607dc0] [c00000000010bfcc] kthread+0x15c/0x1a0
[c0000003f4607e30] [c00000000000ba1c] ret_from_kernel_thread+0x5c/0xc0
kworker/u193:5  D12848  1893      2 0x00000800
Workqueue: events_unbound async_run_entry_fn
Call Trace:
[c0000003ec46f410] [c0000003ec46f5f0] 0xc0000003ec46f5f0 (unreliable)
[c0000003ec46f5e0] [c00000000001b3b8] __switch_to+0x2a8/0x460
[c0000003ec46f640] [c0000000009abc60] __schedule+0x320/0xaa0
[c0000003ec46f710] [c0000000009ac420] schedule+0x40/0xb0
[c0000003ec46f740] [c0000000009b09d4] schedule_timeout+0x254/0x440
[c0000003ec46f820] [c0000000009aca80] io_schedule_timeout+0x30/0x60
[c0000003ec46f850] [c0000000009ad75c] wait_for_common_io.constprop.2+0xbc/0x1e0
[c0000003ec46f8d0] [c000000000509e6c] blk_execute_rq+0x4c/0x70
[c0000003ec46f920] [c000000000654abc] scsi_execute+0xfc/0x260
[c0000003ec46f990] [c000000000654f98] scsi_mode_sense+0x218/0x410
[c0000003ec46faa0] [c00000000068ee68] sd_revalidate_disk+0x908/0x1cd0
[c0000003ec46fbe0] [c000000000690434] sd_probe_async+0xb4/0x220
[c0000003ec46fc60] [c000000000110a20] async_run_entry_fn+0x70/0x170
[c0000003ec46fca0] [c000000000103904] process_one_work+0x2b4/0x560
[c0000003ec46fd30] [c000000000103c38] worker_thread+0x88/0x5a0
[c0000003ec46fdc0] [c00000000010bfcc] kthread+0x15c/0x1a0
[c0000003ec46fe30] [c00000000000ba1c] ret_from_kernel_thread+0x5c/0xc0
Michael Ellerman Aug. 11, 2017, 3:18 a.m. UTC | #8
Bart Van Assche <Bart.VanAssche@wdc.com> writes:

> On Thu, 2017-08-10 at 20:32 +1000, Michael Ellerman wrote:
>> "Martin K. Petersen" <martin.petersen@oracle.com> writes:
>> > > One of the two scsi-mq functions that requeue a request unprepares a
>> > > request before requeueing (scsi_io_completion()) but the other
>> > > function not (__scsi_queue_insert()). Make sure that a request is
>> > > unprepared before requeuing it.
>> > 
>> > Applied to 4.13/scsi-fixes. Thanks much!
>> 
>> This seems to be preventing my Power8 box, which uses IPR, from booting.
..
>
> Thanks for having reported this early. Is there any chance that you can
> reproduce this state, press SysRq-w on the console and collect the task
> overview that is reported on the console (see also Documentation/admin-guide/
> sysrq.rst)? If this is not possible or if that task overview does not report
> any blocked tasks, can you add scsi_mod.scsi_logging_level=-1 to the kernel
> command line

That didn't seem to do anything?

I guess I need CONFIG_SCSI_LOGGING=y ? ...

Yes that fixed it.

OK so lots of output, it looks like it's just repeating but rather than
cut it off too early I let it run for ~60s, so it's a fairly big log,
attached.

One thing I didn't mention which might be relevant is that my bootloader
is Linux, so this kernel is started via kexec.

cheers
Bart Van Assche Aug. 11, 2017, 3:37 p.m. UTC | #9
On Fri, 2017-08-11 at 11:05 +1000, Michael Ellerman wrote:
> kworker/u193:0  D12736     6      2 0x00000800
> Workqueue: events_unbound async_run_entry_fn
> Call Trace:
> [c0000003f7597410] [c000000000150d00] console_unlock+0x330/0x770 (unreliable)
> [c0000003f75975e0] [c00000000001b3b8] __switch_to+0x2a8/0x460
> [c0000003f7597640] [c0000000009abc60] __schedule+0x320/0xaa0
> [c0000003f7597710] [c0000000009ac420] schedule+0x40/0xb0
> [c0000003f7597740] [c0000000009b09d4] schedule_timeout+0x254/0x440
> [c0000003f7597820] [c0000000009aca80] io_schedule_timeout+0x30/0x60
> [c0000003f7597850] [c0000000009ad75c] wait_for_common_io.constprop.2+0xbc/0x1e0
> [c0000003f75978d0] [c000000000509e6c] blk_execute_rq+0x4c/0x70
> [c0000003f7597920] [c000000000654abc] scsi_execute+0xfc/0x260
> [c0000003f7597990] [c000000000654f98] scsi_mode_sense+0x218/0x410
> [c0000003f7597aa0] [c00000000068ee68] sd_revalidate_disk+0x908/0x1cd0
> [c0000003f7597be0] [c000000000690434] sd_probe_async+0xb4/0x220
> [c0000003f7597c60] [c000000000110a20] async_run_entry_fn+0x70/0x170
> [c0000003f7597ca0] [c000000000103904] process_one_work+0x2b4/0x560
> [c0000003f7597d30] [c000000000103c38] worker_thread+0x88/0x5a0
> [c0000003f7597dc0] [c00000000010bfcc] kthread+0x15c/0x1a0
> [c0000003f7597e30] [c00000000000ba1c] ret_from_kernel_thread+0x5c/0xc0

Hello Michael,

It is suspicious that entries like the above appear in the SysRq-w output.
Every time I saw this in the past it was caused by a block driver not having
called blk_end_request() or a SCSI LLD not having called .scsi_done().
Additionally, it is unlikely that the patch at the start of this thread
introduced this issue. Can you have another look at the ipr driver? If a
shell is available at the time this lockup occurs, it will probably be
helpful to have a look at the debugfs entries under /sys/kernel/debug/block/.

Thanks,

Bart.
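
The hypothesis in the message above (a waiter in blk_execute_rq() stays
blocked when the low-level driver never invokes the completion callback) can
be illustrated with a small user-space model. This is a sketch under stated
assumptions only: struct model_cmd, model_execute() and the two
*_lld_queuecommand() functions are hypothetical stand-ins, not kernel or ipr
driver code, and the model makes no claim about where the actual bug was.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI command with a completion callback. */
struct model_cmd {
	bool completed;                    /* set once the callback has run */
	void (*done)(struct model_cmd *);  /* models the .scsi_done() hook */
};

/* Models the midlayer's completion handler: marks the command finished. */
static void model_done(struct model_cmd *cmd)
{
	cmd->completed = true;
}

/* A well-behaved driver reports completion for every command it accepts. */
static void good_lld_queuecommand(struct model_cmd *cmd)
{
	cmd->done(cmd);
}

/* A buggy driver accepts the command but never reports completion. */
static void buggy_lld_queuecommand(struct model_cmd *cmd)
{
	(void)cmd;
}

/*
 * Models a synchronous submit: returns true if the submitter would be
 * woken up, false if it would wait forever (the state seen in SysRq-w).
 */
static bool model_execute(struct model_cmd *cmd,
			  void (*queuecommand)(struct model_cmd *))
{
	assert(cmd != NULL && queuecommand != NULL);

	cmd->completed = false;
	cmd->done = model_done;
	queuecommand(cmd);
	return cmd->completed;
}
```

With the well-behaved driver model_execute() reports that the waiter is
released; with the buggy one it reports a command that never completes, which
corresponds to the tasks parked in wait_for_common_io() in the traces quoted
above.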

Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 4a2f705cdb14..c7514f3b444a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -44,6 +44,8 @@  static struct kmem_cache *scsi_sense_cache;
 static struct kmem_cache *scsi_sense_isadma_cache;
 static DEFINE_MUTEX(scsi_sense_cache_mutex);
 
+static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd);
+
 static inline struct kmem_cache *
 scsi_select_sense_cache(bool unchecked_isa_dma)
 {
@@ -140,6 +142,12 @@  static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd)
 {
 	struct scsi_device *sdev = cmd->device;
 
+	if (cmd->request->rq_flags & RQF_DONTPREP) {
+		cmd->request->rq_flags &= ~RQF_DONTPREP;
+		scsi_mq_uninit_cmd(cmd);
+	} else {
+		WARN_ON_ONCE(true);
+	}
 	blk_mq_requeue_request(cmd->request, true);
 	put_device(&sdev->sdev_gendev);
 }
@@ -995,8 +1003,6 @@  void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 		 * A new command will be prepared and issued.
 		 */
 		if (q->mq_ops) {
-			cmd->request->rq_flags &= ~RQF_DONTPREP;
-			scsi_mq_uninit_cmd(cmd);
 			scsi_mq_requeue_cmd(cmd);
 		} else {
 			scsi_release_buffers(cmd);
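
The control flow that this patch gives scsi_mq_requeue_cmd() can be tried
outside the kernel with a small model. The sketch below is an illustration
under stated assumptions: struct model_request, MODEL_RQF_DONTPREP and the
model_*() helpers are hypothetical stand-ins for struct request,
RQF_DONTPREP, scsi_mq_uninit_cmd() and blk_mq_requeue_request(), and a
counter takes the place of WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for RQF_DONTPREP; the real flag value in the kernel differs. */
#define MODEL_RQF_DONTPREP (1u << 0)

struct model_request {
	unsigned int rq_flags;  /* models request->rq_flags */
	bool prepared;          /* true while per-command resources are held */
	int requeue_count;      /* models calls to blk_mq_requeue_request() */
	int warn_count;         /* models WARN_ON_ONCE() being reached */
};

/* Models command preparation: take resources and set the DONTPREP flag. */
static void model_prep(struct model_request *rq)
{
	rq->prepared = true;
	rq->rq_flags |= MODEL_RQF_DONTPREP;
}

/* Models scsi_mq_uninit_cmd(): release what preparation acquired. */
static void model_uninit_cmd(struct model_request *rq)
{
	rq->prepared = false;
}

/* Models the patched scsi_mq_requeue_cmd(): unprepare, then requeue. */
static void model_requeue_cmd(struct model_request *rq)
{
	assert(rq != NULL);

	if (rq->rq_flags & MODEL_RQF_DONTPREP) {
		rq->rq_flags &= ~MODEL_RQF_DONTPREP;
		model_uninit_cmd(rq);
	} else {
		/* Requeuing a request that was never prepared is unexpected. */
		rq->warn_count++;
	}
	rq->requeue_count++;
}
```

Because the unprepare step now lives inside the requeue helper, every caller
gets it, which is why the two lines removed from scsi_io_completion() in the
last hunk are no longer needed there.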