diff mbox series

block: Fix bug in runtime-resume handling

Message ID 20200823145733.GC303967@rowland.harvard.edu (mailing list archive)
State Not Applicable
Headers show
Series block: Fix bug in runtime-resume handling | expand

Commit Message

Alan Stern Aug. 23, 2020, 2:57 p.m. UTC
Runtime power-management support in the block layer has somehow gotten
messed up in the past few years.  This area of code has been through a
lot of churn and it's not easy to track exactly when the bug addressed
here was introduced; it was present for a while, then fixed for a
while, and then it returned.

At any rate, the problem is that the block layer code thinks that it's
okay to issue a request with the BLK_MQ_REQ_PREEMPT flag set at any
time, even when the queue and underlying device are runtime suspended.
This belief is wrong; the flag merely indicates that it's okay to
issue the request while the queue and device are in the process of
suspending or resuming.  When they are actually suspended, no requests
may be issued.

The symptom of this bug is that a runtime-suspended block device
doesn't resume as it should.  The request which should cause a runtime
resume instead gets issued directly, without resuming the device
first.  Of course the device can't handle it properly, the I/O
fails, and the device remains suspended.

The problem is fixed by checking that the queue's runtime-PM status
isn't RPM_SUSPENDED before allowing a request to be issued, and
queuing a runtime-resume request if it is.  In particular, the inline
blk_pm_request_resume() routine is renamed blk_pm_resume_queue() and
the code is unified by merging the surrounding checks into the
routine.  If the queue isn't set up for runtime PM, or there currently
is no restriction on allowed requests, the request is allowed.
Likewise if the BLK_MQ_REQ_PREEMPT flag is set and the status isn't
RPM_SUSPENDED.  Otherwise a runtime resume is queued and the request
is blocked until conditions are more suitable.

Reported-and-tested-by: Martin Kepplinger <martin.kepplinger@puri.sm>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Bart Van Assche <bvanassche@acm.org>
CC: <stable@vger.kernel.org> # 5.2

---

The bug goes back way before 5.2, but other changes will prevent the
patch from applying directly to earlier kernels, so I'm limiting the 
@stable updates to 5.2 and after.


[as1941]


 block/blk-core.c |    6 +++---
 block/blk-pm.h   |   14 +++++++++-----
 2 files changed, 12 insertions(+), 8 deletions(-)

Comments

Bart Van Assche Aug. 24, 2020, 5:48 p.m. UTC | #1
On 2020-08-23 07:57, Alan Stern wrote:
> The problem is fixed by checking that the queue's runtime-PM status
> isn't RPM_SUSPENDED before allowing a request to be issued, [ ... ]

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

Can, can you help with testing this patch?

Thanks,

Bart.
Alan Stern Aug. 24, 2020, 8:13 p.m. UTC | #2
On Mon, Aug 24, 2020 at 10:48:32AM -0700, Bart Van Assche wrote:
> On 2020-08-23 07:57, Alan Stern wrote:
> > The problem is fixed by checking that the queue's runtime-PM status
> > isn't RPM_SUSPENDED before allowing a request to be issued, [ ... ]
> 
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> 
> Can, can you help with testing this patch?
> 
> Thanks,
> 
> Bart.

Martin:

(I forgot to ask this question several weeks ago, while you were running 
your tests.  Better ask it now before I forget again...)

I suspect the old runtime-PM code in the block layer would have worked 
okay in your SD cardreader test if the BLK_MQ_REQ_PREEMPT flag had not 
been set.  Do you know why the flag was set, or what line of code caused 
it to be set?

As I recall, the request that failed was a read-ahead.  It's not clear 
to me why such a request would need to have BLK_MQ_REQ_PREEMPT set.  
Although there may be other reasons for having that flag, at this point 
we're concerned with requests that are part of the runtime-resume 
procedure.  A read-ahead does not fall into that category.

Do you think you can find the answer?  Perhaps we should add a separate 
BLK_MQ_REQ_PM flag.

Alan Stern
Martin Kepplinger Aug. 26, 2020, 7:48 a.m. UTC | #3
On 24.08.20 22:13, Alan Stern wrote:
> On Mon, Aug 24, 2020 at 10:48:32AM -0700, Bart Van Assche wrote:
>> On 2020-08-23 07:57, Alan Stern wrote:
>>> The problem is fixed by checking that the queue's runtime-PM status
>>> isn't RPM_SUSPENDED before allowing a request to be issued, [ ... ]
>>
>> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
>>
>> Can, can you help with testing this patch?
>>
>> Thanks,
>>
>> Bart.
> 
> Martin:
> 
> (I forgot to ask this question several weeks ago, while you were running 
> your tests.  Better ask it now before I forget again...)
> 
> I suspect the old runtime-PM code in the block layer would have worked 
> okay in your SD cardreader test if the BLK_MQ_REQ_PREEMPT flag had not 
> been set.  Do you know why the flag was set, or what line of code caused 
> it to be set?

Correct. if not set, I could handle all I need in the scsi error path.

> 
> As I recall, the request that failed was a read-ahead.  It's not clear 
> to me why such a request would need to have BLK_MQ_REQ_PREEMPT set.  
> Although there may be other reasons for having that flag, at this point 
> we're concerned with requests that are part of the runtime-resume 
> procedure.  A read-ahead does not fall into that category.
> 
> Do you think you can find the answer?  Perhaps we should add a separate 
> BLK_MQ_REQ_PM flag.

I guess I can but give me some time as I'm on vacations currently.

> 
> Alan Stern
>
Martin Kepplinger Aug. 27, 2020, 5:42 p.m. UTC | #4
On 26.08.20 09:48, Martin Kepplinger wrote:
> On 24.08.20 22:13, Alan Stern wrote:
>> On Mon, Aug 24, 2020 at 10:48:32AM -0700, Bart Van Assche wrote:
>>> On 2020-08-23 07:57, Alan Stern wrote:
>>>> The problem is fixed by checking that the queue's runtime-PM status
>>>> isn't RPM_SUSPENDED before allowing a request to be issued, [ ... ]
>>>
>>> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
>>>
>>> Can, can you help with testing this patch?
>>>
>>> Thanks,
>>>
>>> Bart.
>>
>> Martin:
>>
>> (I forgot to ask this question several weeks ago, while you were running 
>> your tests.  Better ask it now before I forget again...)
>>
>> I suspect the old runtime-PM code in the block layer would have worked 
>> okay in your SD cardreader test if the BLK_MQ_REQ_PREEMPT flag had not 
>> been set.  Do you know why the flag was set, or what line of code caused 
>> it to be set?
> 
> Correct. if not set, I could handle all I need in the scsi error path.

this thread becomes a bit confusing. I thought about REQ_FAILFAST_DEV
but you're talking about something different.

the only place I see BLK_MQ_REQ_PREEMPT getting passed on is in
__scsi_execute() which is the case when mounting/unmounting. At least
that about the only place I can find.

I remember *only* your block pm fix would let me mount/unmount, but not
use files yet (REQ_FAILFAST_DEV and so on).

When I revert your fix and remove BLK_MQ_REQ_PREEMPT from being passed
on to blk_get_request() in __scsi_execute(), that line gets executed
exactly once during startup and I'm missing the /dev/sda device from the
cardreader then.

Is this what you're asking?


> 
>>
>> As I recall, the request that failed was a read-ahead.  It's not clear 
>> to me why such a request would need to have BLK_MQ_REQ_PREEMPT set.  
>> Although there may be other reasons for having that flag, at this point 
>> we're concerned with requests that are part of the runtime-resume 
>> procedure.  A read-ahead does not fall into that category.
>>
>> Do you think you can find the answer?  Perhaps we should add a separate 
>> BLK_MQ_REQ_PM flag.
Alan Stern Aug. 27, 2020, 8:29 p.m. UTC | #5
On Thu, Aug 27, 2020 at 07:42:43PM +0200, Martin Kepplinger wrote:
> On 26.08.20 09:48, Martin Kepplinger wrote:
> > On 24.08.20 22:13, Alan Stern wrote:

> >> Martin:
> >>
> >> (I forgot to ask this question several weeks ago, while you were running 
> >> your tests.  Better ask it now before I forget again...)
> >>
> >> I suspect the old runtime-PM code in the block layer would have worked 
> >> okay in your SD cardreader test if the BLK_MQ_REQ_PREEMPT flag had not 
> >> been set.  Do you know why the flag was set, or what line of code caused 
> >> it to be set?
> > 
> > Correct. if not set, I could handle all I need in the scsi error path.
> 
> this thread becomes a bit confusing. I thought about REQ_FAILFAST_DEV
> but you're talking about something different.
> 
> the only place I see BLK_MQ_REQ_PREEMPT getting passed on is in
> __scsi_execute() which is the case when mounting/unmounting. At least
> that about the only place I can find.

Ah yes, I see what you mean.

> I remember *only* your block pm fix would let me mount/unmount, but not
> use files yet (REQ_FAILFAST_DEV and so on).
> 
> When I revert your fix and remove BLK_MQ_REQ_PREEMPT from being passed
> on to blk_get_request() in __scsi_execute(), that line gets executed
> exactly once during startup and I'm missing the /dev/sda device from the
> cardreader then.
> 
> Is this what you're asking?

Not quite sure, but it doesn't matter.  Removing BLK_MQ_REQ_PREEMPT in 
__scsi_execute() is probably not a safe thing to do.

Instead, look at sd_resume().  That routine calls __scsi_execute() 
indirectly through sd_start_stop_device(), and the only reason it does 
this is because the sdkp->device->manage_start_stop flag is set.  You 
ought to be able to clear this flag in sysfs, by writing to 
/sys/block/sda/device/scsi_disk/*/manage_start_stop.  If you do this 
before allowing the card reader to go into runtime suspend, does it then 
resume okay?

(Yes, I know you still won't be able to read it because of the FAILFAST 
flag.  I just want to know if the runtime resume actually takes place.)

Alan Stern
Martin Kepplinger Aug. 29, 2020, 7:24 a.m. UTC | #6
On 27.08.20 22:29, Alan Stern wrote:
> On Thu, Aug 27, 2020 at 07:42:43PM +0200, Martin Kepplinger wrote:
>> On 26.08.20 09:48, Martin Kepplinger wrote:
>>> On 24.08.20 22:13, Alan Stern wrote:
> 
>>>> Martin:
>>>>
>>>> (I forgot to ask this question several weeks ago, while you were running 
>>>> your tests.  Better ask it now before I forget again...)
>>>>
>>>> I suspect the old runtime-PM code in the block layer would have worked 
>>>> okay in your SD cardreader test if the BLK_MQ_REQ_PREEMPT flag had not 
>>>> been set.  Do you know why the flag was set, or what line of code caused 
>>>> it to be set?
>>>
>>> Correct. if not set, I could handle all I need in the scsi error path.
>>
>> this thread becomes a bit confusing. I thought about REQ_FAILFAST_DEV
>> but you're talking about something different.
>>
>> the only place I see BLK_MQ_REQ_PREEMPT getting passed on is in
>> __scsi_execute() which is the case when mounting/unmounting. At least
>> that about the only place I can find.
> 
> Ah yes, I see what you mean.
> 
>> I remember *only* your block pm fix would let me mount/unmount, but not
>> use files yet (REQ_FAILFAST_DEV and so on).
>>
>> When I revert your fix and remove BLK_MQ_REQ_PREEMPT from being passed
>> on to blk_get_request() in __scsi_execute(), that line gets executed
>> exactly once during startup and I'm missing the /dev/sda device from the
>> cardreader then.
>>
>> Is this what you're asking?
> 
> Not quite sure, but it doesn't matter.  Removing BLK_MQ_REQ_PREEMPT in 
> __scsi_execute() is probably not a safe thing to do.
> 
> Instead, look at sd_resume().  That routine calls __scsi_execute() 
> indirectly through sd_start_stop_device(), and the only reason it does 
> this is because the sdkp->device->manage_start_stop flag is set.  You 
> ought to be able to clear this flag in sysfs, by writing to 
> /sys/block/sda/device/scsi_disk/*/manage_start_stop.  If you do this 
> before allowing the card reader to go into runtime suspend, does it then 
> resume okay?

manage_start_stop in sysfs is 0 here.

> 
> (Yes, I know you still won't be able to read it because of the FAILFAST 
> flag.  I just want to know if the runtime resume actually takes place.)
> 
> Alan Stern
>
Alan Stern Aug. 29, 2020, 3:26 p.m. UTC | #7
On Sat, Aug 29, 2020 at 09:24:30AM +0200, Martin Kepplinger wrote:
> On 27.08.20 22:29, Alan Stern wrote:
> > Instead, look at sd_resume().  That routine calls __scsi_execute() 
> > indirectly through sd_start_stop_device(), and the only reason it does 
> > this is because the sdkp->device->manage_start_stop flag is set.  You 
> > ought to be able to clear this flag in sysfs, by writing to 
> > /sys/block/sda/device/scsi_disk/*/manage_start_stop.  If you do this 
> > before allowing the card reader to go into runtime suspend, does it then 
> > resume okay?
> 
> manage_start_stop in sysfs is 0 here.

Hmmm.  I'm wondering about something you wrote back in June 
(https://marc.info/?l=linux-scsi&m=159345778431615&w=2):

	blk_queue_enter() always - especially when sd is runtime 
	suspended and I try to mount as above - sets success to be true 
	for me, so never continues down to bkl_pm_request_resume(). All 
	I see is "PM: Removing info for No Bus:sda1".

blk_queue_enter() would always set success to be true because pm 
(derived from the BLK_MQ_REQ_PREEMPT flag) is true.  But why was the 
BLK_MQ_REQ_PREEMPT flag set?  In other words, where was 
blk_queue_enter() called from?

Can you get a stack trace (i.e., call dump_stack()) at exactly this 
point, that is, when pm is true and q->rpm_status is RPM_SUSPENDED?  Or 
do you already know the answer?

Alan Stern
Martin Kepplinger Aug. 29, 2020, 4:33 p.m. UTC | #8
On 29.08.20 17:26, Alan Stern wrote:
> On Sat, Aug 29, 2020 at 09:24:30AM +0200, Martin Kepplinger wrote:
>> On 27.08.20 22:29, Alan Stern wrote:
>>> Instead, look at sd_resume().  That routine calls __scsi_execute() 
>>> indirectly through sd_start_stop_device(), and the only reason it does 
>>> this is because the sdkp->device->manage_start_stop flag is set.  You 
>>> ought to be able to clear this flag in sysfs, by writing to 
>>> /sys/block/sda/device/scsi_disk/*/manage_start_stop.  If you do this 
>>> before allowing the card reader to go into runtime suspend, does it then 
>>> resume okay?
>>
>> manage_start_stop in sysfs is 0 here.
> 
> Hmmm.  I'm wondering about something you wrote back in June 
> (https://marc.info/?l=linux-scsi&m=159345778431615&w=2):
> 
> 	blk_queue_enter() always - especially when sd is runtime 
> 	suspended and I try to mount as above - sets success to be true 
> 	for me, so never continues down to bkl_pm_request_resume(). All 
> 	I see is "PM: Removing info for No Bus:sda1".
> 
> blk_queue_enter() would always set success to be true because pm 
> (derived from the BLK_MQ_REQ_PREEMPT flag) is true.  But why was the 
> BLK_MQ_REQ_PREEMPT flag set?  In other words, where was 
> blk_queue_enter() called from?
> 
> Can you get a stack trace (i.e., call dump_stack()) at exactly this 
> point, that is, when pm is true and q->rpm_status is RPM_SUSPENDED?  Or 
> do you already know the answer?
> 
>

I reverted any scsi/block out-of-tree fixes for this.

when I try to mount, pm is TRUE (BLK_MQ_REQ_PREEMT set) and that's the
first stack trace I get in this condition, inside of blk_queue_enter():

There is more, but I don't know if that's interesting.

[   38.642202] CPU: 2 PID: 1522 Comm: mount Not tainted 5.8.0-1-librem5 #487
[   38.642207] Hardware name: Purism Librem 5r3 (DT)
[   38.642213] Call trace:
[   38.642233]  dump_backtrace+0x0/0x210
[   38.642242]  show_stack+0x20/0x30
[   38.642252]  dump_stack+0xc8/0x128
[   38.642262]  blk_queue_enter+0x1b8/0x2d8
[   38.642271]  blk_mq_alloc_request+0x54/0xb0
[   38.642277]  blk_get_request+0x34/0x78
[   38.642286]  __scsi_execute+0x60/0x1c8
[   38.642291]  scsi_test_unit_ready+0x88/0x118
[   38.642298]  sd_check_events+0x110/0x158
[   38.642306]  disk_check_events+0x68/0x188
[   38.642312]  disk_clear_events+0x84/0x198
[   38.642320]  check_disk_change+0x38/0x90
[   38.642325]  sd_open+0x60/0x148
[   38.642330]  __blkdev_get+0xcc/0x4c8
[   38.642335]  __blkdev_get+0x278/0x4c8
[   38.642339]  blkdev_get+0x128/0x1a8
[   38.642345]  blkdev_open+0x98/0xb0
[   38.642354]  do_dentry_open+0x130/0x3c8
[   38.642359]  vfs_open+0x34/0x40
[   38.642366]  path_openat+0xa30/0xe40
[   38.642372]  do_filp_open+0x84/0x100
[   38.642377]  do_sys_openat2+0x1f4/0x2b0
[   38.642382]  do_sys_open+0x60/0xa8
(...)

and of course it doesn't work and /dev/sda1 disappears, see the initial
discussion that led to your fix.
Alan Stern Aug. 29, 2020, 6:56 p.m. UTC | #9
On Sat, Aug 29, 2020 at 06:33:26PM +0200, Martin Kepplinger wrote:
> On 29.08.20 17:26, Alan Stern wrote:
> > Hmmm.  I'm wondering about something you wrote back in June 
> > (https://marc.info/?l=linux-scsi&m=159345778431615&w=2):
> > 
> > 	blk_queue_enter() always - especially when sd is runtime 
> > 	suspended and I try to mount as above - sets success to be true 
> > 	for me, so never continues down to bkl_pm_request_resume(). All 
> > 	I see is "PM: Removing info for No Bus:sda1".
> > 
> > blk_queue_enter() would always set success to be true because pm 
> > (derived from the BLK_MQ_REQ_PREEMPT flag) is true.  But why was the 
> > BLK_MQ_REQ_PREEMPT flag set?  In other words, where was 
> > blk_queue_enter() called from?
> > 
> > Can you get a stack trace (i.e., call dump_stack()) at exactly this 
> > point, that is, when pm is true and q->rpm_status is RPM_SUSPENDED?  Or 
> > do you already know the answer?
> > 
> >
> 
> I reverted any scsi/block out-of-tree fixes for this.
> 
> when I try to mount, pm is TRUE (BLK_MQ_REQ_PREEMT set) and that's the
> first stack trace I get in this condition, inside of blk_queue_enter():
> 
> There is more, but I don't know if that's interesting.
> 
> [   38.642202] CPU: 2 PID: 1522 Comm: mount Not tainted 5.8.0-1-librem5 #487
> [   38.642207] Hardware name: Purism Librem 5r3 (DT)
> [   38.642213] Call trace:
> [   38.642233]  dump_backtrace+0x0/0x210
> [   38.642242]  show_stack+0x20/0x30
> [   38.642252]  dump_stack+0xc8/0x128
> [   38.642262]  blk_queue_enter+0x1b8/0x2d8
> [   38.642271]  blk_mq_alloc_request+0x54/0xb0
> [   38.642277]  blk_get_request+0x34/0x78
> [   38.642286]  __scsi_execute+0x60/0x1c8
> [   38.642291]  scsi_test_unit_ready+0x88/0x118
> [   38.642298]  sd_check_events+0x110/0x158
> [   38.642306]  disk_check_events+0x68/0x188
> [   38.642312]  disk_clear_events+0x84/0x198
> [   38.642320]  check_disk_change+0x38/0x90
> [   38.642325]  sd_open+0x60/0x148
> [   38.642330]  __blkdev_get+0xcc/0x4c8
> [   38.642335]  __blkdev_get+0x278/0x4c8
> [   38.642339]  blkdev_get+0x128/0x1a8
> [   38.642345]  blkdev_open+0x98/0xb0
> [   38.642354]  do_dentry_open+0x130/0x3c8
> [   38.642359]  vfs_open+0x34/0x40
> [   38.642366]  path_openat+0xa30/0xe40
> [   38.642372]  do_filp_open+0x84/0x100
> [   38.642377]  do_sys_openat2+0x1f4/0x2b0
> [   38.642382]  do_sys_open+0x60/0xa8
> (...)
> 
> and of course it doesn't work and /dev/sda1 disappears, see the initial
> discussion that led to your fix.

Great!  That's exactly what I was looking for, thank you.

Bart, this is a perfect example of the potential race I've been talking 
about in the other email thread.  Suppose thread 0 is carrying out a 
runtime suspend of a SCSI disk and at the same time, thread 1 is opening 
the disk's block device (as we see in the stack trace here).  Then we 
could have the following:

	Thread 0		Thread 1
	--------		--------
	Start runtime suspend
	blk_pre_runtime_suspend calls
	  blk_set_pm_only and sets
	  q->rpm_status to RPM_SUSPENDING

				Call sd_open -> ... -> scsi_test_unit_ready
				  -> __scsi_execute -> ...
				  -> blk_queue_enter
				Sees BLK_MQ_REQ_PREEMPT set and
				  RPM_SUSPENDING queue status, so does 
				  not postpone the request

	blk_post_runtime_suspend sets
	  q->rpm_status to RPM_SUSPENDED
	The drive goes into runtime suspend

				Issues the TEST UNIT READY request
				Request fails because the drive is suspended

One way to avoid this race is mutual exclusion: We could make sd_open 
prevent the drive from being runtime suspended until it returns.  
However I don't like this approach; it would mean tracking down every 
possible pathway to __scsi_execute and making sure that runtime suspend 
is blocked.

A more fine-grained approach would be to have __scsi_execute itself call 
scsi_autopm_get/put_device whenever the rq_flags argument doesn't 
contain RQF_PM.  This way we wouldn't have to worry about missing any 
possiible pathways.  But it relies on an implicit assumption that 
__scsi_execute is the only place where the PREEMPT flag gets set.

A third possibility is the approach I outlined before, adding a 
BLK_MQ_REQ_PM flag.  But to avoid the deadlock you pointed out, I would 
make blk_queue_enter smarter about whether to postpone a request.  The 
logic would go like this:

	If !blk_queue_pm_only:
		Allow
	If !BLK_MQ_REQ_PREEMPT:
		Postpone
	If q->rpm_status is RPM_ACTIVE:
		Allow
	If !BLK_MQ_REQ_PM:
		Postpone
	If q->rpm_status is RPM_SUSPENDED:
		Postpone
	Else:
		Allow

The assumption here is that the PREEMPT flag is set whenever the PM flag 
is.

I believe either the second or third possibility would work.  The second 
looks to be the simplest

What do you think?

Alan Stern
Bart Van Assche Aug. 30, 2020, 12:38 a.m. UTC | #10
On 2020-08-29 11:56, Alan Stern wrote:
> A third possibility is the approach I outlined before, adding a 
> BLK_MQ_REQ_PM flag.  But to avoid the deadlock you pointed out, I would 
> make blk_queue_enter smarter about whether to postpone a request.  The 
> logic would go like this:
> 
> 	If !blk_queue_pm_only:
> 		Allow
> 	If !BLK_MQ_REQ_PREEMPT:
> 		Postpone
> 	If q->rpm_status is RPM_ACTIVE:
> 		Allow
> 	If !BLK_MQ_REQ_PM:
> 		Postpone
> 	If q->rpm_status is RPM_SUSPENDED:
> 		Postpone
> 	Else:
> 		Allow
> 
> The assumption here is that the PREEMPT flag is set whenever the PM flag 
> is.

Hi Alan,

Although interesting, these solutions sound like workarounds to me. How
about fixing the root cause by modifying the SCSI DV implementation such
that it doesn't use scsi_device_quiesce() anymore()? That change would
allow to remove BLK_MQ_REQ_PREEMPT / RQF_PREEMPT from the block layer and
move these flags into the SCSI and IDE code.

Thanks,

Bart.
Alan Stern Aug. 30, 2020, 1:06 a.m. UTC | #11
On Sat, Aug 29, 2020 at 05:38:50PM -0700, Bart Van Assche wrote:
> On 2020-08-29 11:56, Alan Stern wrote:
> > A third possibility is the approach I outlined before, adding a 
> > BLK_MQ_REQ_PM flag.  But to avoid the deadlock you pointed out, I would 
> > make blk_queue_enter smarter about whether to postpone a request.  The 
> > logic would go like this:
> > 
> > 	If !blk_queue_pm_only:
> > 		Allow
> > 	If !BLK_MQ_REQ_PREEMPT:
> > 		Postpone
> > 	If q->rpm_status is RPM_ACTIVE:
> > 		Allow
> > 	If !BLK_MQ_REQ_PM:
> > 		Postpone
> > 	If q->rpm_status is RPM_SUSPENDED:
> > 		Postpone
> > 	Else:
> > 		Allow
> > 
> > The assumption here is that the PREEMPT flag is set whenever the PM flag 
> > is.
> 
> Hi Alan,
> 
> Although interesting, these solutions sound like workarounds to me. How
> about fixing the root cause by modifying the SCSI DV implementation such
> that it doesn't use scsi_device_quiesce() anymore()? That change would
> allow to remove BLK_MQ_REQ_PREEMPT / RQF_PREEMPT from the block layer and
> move these flags into the SCSI and IDE code.

That's a perfectly reasonable approach, but I have no idea how to do it.
Any suggestions?

Alan Stern
diff mbox series

Patch

Index: usb-devel/block/blk-core.c
===================================================================
--- usb-devel.orig/block/blk-core.c
+++ usb-devel/block/blk-core.c
@@ -423,7 +423,8 @@  int blk_queue_enter(struct request_queue
 			 * responsible for ensuring that that counter is
 			 * globally visible before the queue is unfrozen.
 			 */
-			if (pm || !blk_queue_pm_only(q)) {
+			if ((pm && q->rpm_status != RPM_SUSPENDED) ||
+			    !blk_queue_pm_only(q)) {
 				success = true;
 			} else {
 				percpu_ref_put(&q->q_usage_counter);
@@ -448,8 +449,7 @@  int blk_queue_enter(struct request_queue
 
 		wait_event(q->mq_freeze_wq,
 			   (!q->mq_freeze_depth &&
-			    (pm || (blk_pm_request_resume(q),
-				    !blk_queue_pm_only(q)))) ||
+			    blk_pm_resume_queue(pm, q)) ||
 			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
Index: usb-devel/block/blk-pm.h
===================================================================
--- usb-devel.orig/block/blk-pm.h
+++ usb-devel/block/blk-pm.h
@@ -6,11 +6,14 @@ 
 #include <linux/pm_runtime.h>
 
 #ifdef CONFIG_PM
-static inline void blk_pm_request_resume(struct request_queue *q)
+static inline int blk_pm_resume_queue(const bool pm, struct request_queue *q)
 {
-	if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
-		       q->rpm_status == RPM_SUSPENDING))
-		pm_request_resume(q->dev);
+	if (!q->dev || !blk_queue_pm_only(q))
+		return 1;	/* Nothing to do */
+	if (pm && q->rpm_status != RPM_SUSPENDED)
+		return 1;	/* Request allowed */
+	pm_request_resume(q->dev);
+	return 0;
 }
 
 static inline void blk_pm_mark_last_busy(struct request *rq)
@@ -44,8 +47,9 @@  static inline void blk_pm_put_request(st
 		--rq->q->nr_pending;
 }
 #else
-static inline void blk_pm_request_resume(struct request_queue *q)
+static inline int blk_pm_resume_queue(const bool pm, struct request_queue *q)
 {
+	return 1;
 }
 
 static inline void blk_pm_mark_last_busy(struct request *rq)