diff mbox

block: fix NPE when resuming SCSI devices using blk-mq

Message ID b712b87a509dc64619a3aec7c673935ed09be5de.1531488173.git.ps@pks.im (mailing list archive)
State New, archived
Headers show

Commit Message

Patrick Steinhardt July 13, 2018, 1:29 p.m. UTC
When power management for SCSI is enabled and if a device uses blk-mq,
it is possible to trigger a `NULL` pointer exception when resuming that
device. The NPE is triggered when trying to dereference the `request_fn`
function pointer of the device's `request_queue`:

    __blk_run_queue_uncond:470
    __blk_run_queue:490
    blk_post_runtime_resume:3889
    sdev_runtime_resume:263
    scsi_runtime_resume:275

When the SCSI device is being allocated by `scsi_alloc_sdev`, the
device's request queue will either be initialized via
`scsi_mq_alloc_queue` or `scsi_old_alloc_queue`. But the `request_fn`
member of the request queue is in fact only being set in
`scsi_old_alloc_queue`, which will then later cause the mentioned NPE.

Fix the issue by checking whether the `request_fn` is set in
`__blk_run_queue_uncond`. In case it is unset, we'll silently return and
not try to invoke the callback, thus fixing the NPE.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---

Since at least v4.14, I am easily able to trigger above NPE by
unplugging USB mass storage devices on my computer (Skylake, ASUS
Z170I) with CONFIG_SCSI_MQ_DEFAULT=y. The attached patch fixes
the issue, but keep in mind that this is my first patch, so the
proposed fix may not be appropriate at all. Feedback would be
highly appreciated.

 block/blk-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ming Lei July 13, 2018, 1:41 p.m. UTC | #1
On Fri, Jul 13, 2018 at 9:29 PM, Patrick Steinhardt <ps@pks.im> wrote:
> When power management for SCSI is enabled and if a device uses blk-mq,
> it is possible to trigger a `NULL` pointer exception when resuming that
> device. The NPE is triggered when trying to dereference the `request_fn`
> function pointer of the device's `request_queue`:
>
>     __blk_run_queue_uncond:470
>     __blk_run_queue:490
>     blk_post_runtime_resume:3889
>     sdev_runtime_resume:263
>     scsi_runtime_resume:275
>
> When the SCSI device is being allocated by `scsi_alloc_sdev`, the
> device's request queue will either be initialized via
> `scsi_mq_alloc_queue` or `scsi_old_alloc_queue`. But the `request_fn`
> member of the request queue is in fact only being set in
> `scsi_old_alloc_queue`, which will then later cause the mentioned NPE.
>
> Fix the issue by checking whether the `request_fn` is set in
> `__blk_run_queue_uncond`. In case it is unset, we'll silently return and
> not try to invoke the callback, thus fixing the NPE.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>
> Since at least v4.14, I am easily able to trigger above NPE by
> unplugging USB mass storage devices on my computer (Skylake, ASUS
> Z170I) with CONFIG_SCSI_MQ_DEFAULT=y. The attached patch fixes
> the issue, but keep in mind that this is my first patch, so the
> proposed fix may not be appropriate at all. Feedback would be
> highly appreciated.
>
>  block/blk-core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index f84a9b7b6f5a..0a2041660cd9 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -456,7 +456,7 @@ inline void __blk_run_queue_uncond(struct request_queue *q)
>         lockdep_assert_held(q->queue_lock);
>         WARN_ON_ONCE(q->mq_ops);
>
> -       if (unlikely(blk_queue_dead(q)))
> +       if (unlikely(!q->request_fn) || unlikely(blk_queue_dead(q)))
>                 return;
>

Now runtime PM is disabled for blk-mq/scsi_mq, not sure how this issue is
triggered on your machine.

Could you share the steps for reproducing this issue?

Thanks,
Ming Lei
Patrick Steinhardt July 16, 2018, 3:11 p.m. UTC | #2
On Fri, Jul 13, 2018 at 09:41:41PM +0800, Ming Lei wrote:
> On Fri, Jul 13, 2018 at 9:29 PM, Patrick Steinhardt <ps@pks.im> wrote:
> > When power management for SCSI is enabled and if a device uses blk-mq,
> > it is possible to trigger a `NULL` pointer exception when resuming that
> > device. The NPE is triggered when trying to dereference the `request_fn`
> > function pointer of the device's `request_queue`:
> >
> >     __blk_run_queue_uncond:470
> >     __blk_run_queue:490
> >     blk_post_runtime_resume:3889
> >     sdev_runtime_resume:263
> >     scsi_runtime_resume:275
> >
> > When the SCSI device is being allocated by `scsi_alloc_sdev`, the
> > device's request queue will either be initialized via
> > `scsi_mq_alloc_queue` or `scsi_old_alloc_queue`. But the `request_fn`
> > member of the request queue is in fact only being set in
> > `scsi_old_alloc_queue`, which will then later cause the mentioned NPE.
> >
> > Fix the issue by checking whether the `request_fn` is set in
> > `__blk_run_queue_uncond`. In case it is unset, we'll silently return and
> > not try to invoke the callback, thus fixing the NPE.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >
> > Since at least v4.14, I am easily able to trigger above NPE by
> > unplugging USB mass storage devices on my computer (Skylake, ASUS
> > Z170I) with CONFIG_SCSI_MQ_DEFAULT=y. The attached patch fixes
> > the issue, but keep in mind that this is my first patch, so the
> > proposed fix may not be appropriate at all. Feedback would be
> > highly appreciated.
> >
> >  block/blk-core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index f84a9b7b6f5a..0a2041660cd9 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -456,7 +456,7 @@ inline void __blk_run_queue_uncond(struct request_queue *q)
> >         lockdep_assert_held(q->queue_lock);
> >         WARN_ON_ONCE(q->mq_ops);
> >
> > -       if (unlikely(blk_queue_dead(q)))
> > +       if (unlikely(!q->request_fn) || unlikely(blk_queue_dead(q)))
> >                 return;
> >
> 
> Now runtime PM is disabled for blk-mq/scsi_mq, not sure how this issue is
> triggered on your machine.
> 
> Could you share the steps for reproducing this issue?

I bet that the issue stems from custom hotplugging scripts then,
which change the value of power/control. See the attachment of
this mail for all sysfs changes that are being performed after
plugging in the USB stick. Basically, the reproduction steps on
my machine are:

1. plug in USB stick (assumed to be /dev/sdb now)
2. wait a short amount of time
3. dd if=/dev/sdb of=/dev/null bs=4M
4. wait a short amount of time
5. unplug the USB stick, which immediately crashes the system

The issue isn't always reproducible, I think there is some
variance depending on how much time passes by at step 2 and/or 4.
It probably is related to the autosuspend delay. From my
experience the crash becomes more likely the longer I wait after
step 4.

Regards
Patrick
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/power/autosuspend_delay_ms -> 60000
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/power/control -> auto
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/power/control -> on
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/power/autosuspend_delay_ms -> 60000
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/power/control -> auto
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/scsi_host/host4/power/autosuspend_delay_ms -> 60000
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/scsi_host/host4/power/control -> auto
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/power/autosuspend_delay_ms -> 60000
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/scsi_device/4:0:0:0/power/autosuspend_delay_ms -> 60000
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/bsg/4:0:0:0/power/autosuspend_delay_ms -> 60000
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/power/control -> auto
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/scsi_device/4:0:0:0/power/control -> auto
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/bsg/4:0:0:0/power/control -> auto
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/power/autosuspend_delay_ms -> 60000
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/power/control -> auto
/sys//devices/virtual/bdi/8:16/power/autosuspend_delay_ms -> 60000
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/scsi_disk/4:0:0:0/power/autosuspend_delay_ms -> 60000
/sys//devices/virtual/bdi/8:16/power/control -> auto
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/scsi_disk/4:0:0:0/power/control -> auto
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/block/sdb/power/autosuspend_delay_ms -> 60000
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/block/sdb/sdb2/power/autosuspend_delay_ms -> 60000
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/block/sdb/power/control -> auto
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/block/sdb/sdb2/power/control -> auto
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/block/sdb/sdb1/power/autosuspend_delay_ms -> 60000
/sys//devices/pci0000:00/0000:00:14.0/usb2/2-2/2-2:1.0/host4/target4:0:0/4:0:0:0/block/sdb/sdb1/power/control -> auto
Bart Van Assche July 25, 2018, 6:13 p.m. UTC | #3
On Fri, 2018-07-13 at 15:29 +0200, Patrick Steinhardt wrote:
> When power management for SCSI is enabled and if a device uses blk-mq,
> it is possible to trigger a `NULL` pointer exception when resuming that
> device. The NPE is triggered when trying to dereference the `request_fn`
> function pointer of the device's `request_queue`:
> 
>     __blk_run_queue_uncond:470
>     __blk_run_queue:490
>     blk_post_runtime_resume:3889
>     sdev_runtime_resume:263
>     scsi_runtime_resume:275
> 
> When the SCSI device is being allocated by `scsi_alloc_sdev`, the
> device's request queue will either be initialized via
> `scsi_mq_alloc_queue` or `scsi_old_alloc_queue`. But the `request_fn`
> member of the request queue is in fact only being set in
> `scsi_old_alloc_queue`, which will then later cause the mentioned NPE.
> 
> Fix the issue by checking whether the `request_fn` is set in
> `__blk_run_queue_uncond`. In case it is unset, we'll silently return and
> not try to invoke the callback, thus fixing the NPE.

Which kernel version are you using? Can you check whether the following two
commits are in your kernel tree?

* 4fd41a8552af ("SCSI: Fix NULL pointer dereference in runtime PM"; December
  2015).
* 765e40b675a9 ("block: disable runtime-pm for blk-mq"; July 2017).

Thanks,

Bart.
Patrick Steinhardt July 26, 2018, 8:38 a.m. UTC | #4
On Wed, Jul 25, 2018 at 06:13:02PM +0000, Bart Van Assche wrote:
> On Fri, 2018-07-13 at 15:29 +-0200, Patrick Steinhardt wrote:
> +AD4- When power management for SCSI is enabled and if a device uses blk-mq,
> +AD4- it is possible to trigger a +AGA-NULL+AGA- pointer exception when resuming that
> +AD4- device. The NPE is triggered when trying to dereference the +AGA-request+AF8-fn+AGA-
> +AD4- function pointer of the device's +AGA-request+AF8-queue+AGA-:
> +AD4- 
> +AD4-     +AF8AXw-blk+AF8-run+AF8-queue+AF8-uncond:470
> +AD4-     +AF8AXw-blk+AF8-run+AF8-queue:490
> +AD4-     blk+AF8-post+AF8-runtime+AF8-resume:3889
> +AD4-     sdev+AF8-runtime+AF8-resume:263
> +AD4-     scsi+AF8-runtime+AF8-resume:275
> +AD4- 
> +AD4- When the SCSI device is being allocated by +AGA-scsi+AF8-alloc+AF8-sdev+AGA-, the
> +AD4- device's request queue will either be initialized via
> +AD4- +AGA-scsi+AF8-mq+AF8-alloc+AF8-queue+AGA- or +AGA-scsi+AF8-old+AF8-alloc+AF8-queue+AGA-. But the +AGA-request+AF8-fn+AGA-
> +AD4- member of the request queue is in fact only being set in
> +AD4- +AGA-scsi+AF8-old+AF8-alloc+AF8-queue+AGA-, which will then later cause the mentioned NPE.
> +AD4- 
> +AD4- Fix the issue by checking whether the +AGA-request+AF8-fn+AGA- is set in
> +AD4- +AGAAXwBf-blk+AF8-run+AF8-queue+AF8-uncond+AGA-. In case it is unset, we'll silently return and
> +AD4- not try to invoke the callback, thus fixing the NPE.
> 
> Which kernel version are you using? Can you check whether the following two
> commits are in your kernel tree?
> 
> +ACo- 4fd41a8552af (+ACI-SCSI: Fix NULL pointer dereference in runtime PM+ACIAOw- December
>   2015).
> +ACo- 765e40b675a9 (+ACI-block: disable runtime-pm for blk-mq+ACIAOw- July 2017).

Commit 765e40b675a9 (block: disable runtime-pm for blk-mq July
2017) was indeed not part of my kernel. Back when I upgraded to
v4.14, suspend to RAM on my laptop was broken, and a bisect
showed that this particular commit was the culprit. So I reverted
it and forgot until it bit me now -- so it's been my own
stupidity. Guess it serves me right for not checking my own
changes.

That still leaves the other problem of broken suspend. I've just
checked with v4.17.10, and it's still there: as soon as I resume
from suspend, the kernel oopses and reboots the machine. I guess
I'll have to do another debugging session and see where it fails
exactly (and no, this time there are no more changes to the
kernel tree).

Anyway, thanks for pointing out my own mistakes.

Regards
Patrick
Bart Van Assche July 26, 2018, 1:51 p.m. UTC | #5
On Thu, 2018-07-26 at 10:38 +0200, Patrick Steinhardt wrote:
> That still leaves the other problem of broken suspend. I've just
> checked with v4.17.10, and it's still there: as soon as I resume
> from suspend, the kernel oopses and reboots the machine. I guess
> I'll have to do another debugging session and see where it fails
> exactly (and no, this time there are no more changes to the
> kernel tree).

Can you share the kernel oops details before you start running a bisect?

Thanks,

Bart.
Patrick Steinhardt July 27, 2018, 12:35 p.m. UTC | #6
On Thu, Jul 26, 2018 at 01:51:48PM +0000, Bart Van Assche wrote:
> On Thu, 2018-07-26 at 10:38 +-0200, Patrick Steinhardt wrote:
> +AD4- That still leaves the other problem of broken suspend. I've just
> +AD4- checked with v4.17.10, and it's still there: as soon as I resume
> +AD4- from suspend, the kernel oopses and reboots the machine. I guess
> +AD4- I'll have to do another debugging session and see where it fails
> +AD4- exactly (and no, this time there are no more changes to the
> +AD4- kernel tree).
> 
> Can you share the kernel oops details before you start running a bisect?

I've already done the bisect a few months ago, which resulted in
commit 765e40b675a9 (block: disable runtime-pm for blk-mq July
2017). I've never been able to get the Oops, as the screen was
blank, but I tried a little harder with pstore today and finally
got hold of it. Please see below for the Oops and previous dmesg
entries. Kernel version was v4.17.10.

Thanks
Patrick

<6>[   49.088931] PM: suspend entry (deep)
<6>[   49.088933] PM: Syncing filesystems ... done.
<6>[   49.093066] Freezing user space processes ... (elapsed 0.001 seconds) done.
<6>[   49.094283] OOM killer disabled.
<6>[   49.094284] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
<6>[   49.095470] Suspending console(s) (use no_console_suspend to debug)
<6>[   50.350548] ACPI: EC: interrupt blocked
<6>[   50.383913] ACPI: Preparing to enter system sleep state S3
<6>[   50.397529] ACPI: EC: EC stopped
<6>[   50.397529] ACPI: EC: event blocked
<6>[   50.397530] PM: Saving platform NVS memory
<6>[   50.397605] Disabling non-boot CPUs ...
<6>[   50.414940] smpboot: CPU 1 is now offline
<6>[   50.445153] smpboot: CPU 2 is now offline
<6>[   50.471702] smpboot: CPU 3 is now offline
<6>[   50.473710] ACPI: Low-level resume complete
<6>[   50.473770] ACPI: EC: EC started
<6>[   50.473771] PM: Restoring platform NVS memory
<6>[   50.476149] Enabling non-boot CPUs ...
<6>[   50.476175] x86: Booting SMP configuration:
<6>[   50.476176] smpboot: Booting Node 0 Processor 1 APIC 0x2
<4>[   50.478031]  cache: parent cpu1 should not be sleeping
<6>[   50.478333] CPU1 is up
<6>[   50.478348] smpboot: Booting Node 0 Processor 2 APIC 0x1
<4>[   50.478950]  cache: parent cpu2 should not be sleeping
<6>[   50.479194] CPU2 is up
<6>[   50.479206] smpboot: Booting Node 0 Processor 3 APIC 0x3
<4>[   50.479824]  cache: parent cpu3 should not be sleeping
<6>[   50.480470] CPU3 is up
<6>[   50.490620] ACPI: Waking up from system sleep state S3
<6>[   53.334256] ACPI: EC: interrupt unblocked
<6>[   53.411349] ACPI: EC: event unblocked
<1>[   53.411700] BUG: unable to handle kernel NULL pointer dereference at 00000000000001a8
<6>[   53.411705] PGD 0 P4D 0 
<4>[   53.411711] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC PTI
<4>[   53.411716] CPU: 3 PID: 1600 Comm: kworker/u8:48 Tainted: G     U            4.17.10 #9
<4>[   53.411717] Hardware name: Dell Inc. XPS 13 9343/0310JH, BIOS A15 01/23/2018
<4>[   53.411724] Workqueue: events_unbound async_run_entry_fn
<4>[   53.411733] RIP: 0010:blk_set_runtime_active+0x2d/0x50
<4>[   53.411736] RSP: 0000:ffff88020ef37df0 EFLAGS: 00010046
<4>[   53.411739] RAX: 0000000000000000 RBX: ffff880210a9a000 RCX: 0000000000000000
<4>[   53.411742] RDX: 00000000fffedf06 RSI: 0000000000000009 RDI: ffff880210a9a1b0
<4>[   53.411745] RBP: ffffffff815158d0 R08: ffffffff81c13ce0 R09: 0000000000000000
<4>[   53.411747] R10: 0000000000004400 R11: 0000000000005cd3 R12: ffff8802141671e8
<4>[   53.411749] R13: 0000000000000000 R14: ffff880215414c98 R15: 0000000000000000
<4>[   53.411753] FS:  0000000000000000(0000) GS:ffff88021e780000(0000) knlGS:0000000000000000
<4>[   53.411755] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[   53.411757] CR2: 00000000000001a8 CR3: 000000000300a003 CR4: 00000000003606e0
<4>[   53.411758] Call Trace:
<4>[   53.411766]  scsi_bus_resume_common+0xf8/0x110
<4>[   53.411771]  ? scsi_bus_thaw+0x10/0x10
<4>[   53.411776]  dpm_run_callback.isra.16+0x27/0x70
<4>[   53.411781]  device_resume+0xab/0x170
<4>[   53.411785]  async_resume+0x14/0x40
<4>[   53.411788]  async_run_entry_fn+0x34/0x100
<4>[   53.411793]  process_one_work+0x14d/0x2c0
<4>[   53.411797]  worker_thread+0x29/0x370
<4>[   53.411800]  ? process_one_work+0x2c0/0x2c0
<4>[   53.411803]  kthread+0x109/0x120
<4>[   53.411806]  ? __kthread_create_on_node+0x190/0x190
<4>[   53.411810]  ret_from_fork+0x35/0x40
<4>[   53.411813] Code: 89 fb 48 8b bf b8 01 00 00 e8 30 8c 4e 00 48 8b 83 40 02 00 00 be 09 00 00 00 c7 83 48 02 00 00 00 00 00 00 48 8b 15 03 31 cf 00 <48> 89 90 a8 01 00 00 48 8b bb 40 02 00 00 e8 00 1e 1e 00 48 8b 
<1>[   53.411873] RIP: blk_set_runtime_active+0x2d/0x50 RSP: ffff88020ef37df0
<4>[   53.411874] CR2: 00000000000001a8
<4>[   53.411877] ---[ end trace bbd10b6fcf31d6c4 ]---
Bart Van Assche July 27, 2018, 3:03 p.m. UTC | #7
On Fri, 2018-07-27 at 14:35 +0200, Patrick Steinhardt wrote:
> <1>[   53.411700] BUG: unable to handle kernel NULL pointer dereference at 00000000000001a8
> <4>[   53.411733] RIP: 0010:blk_set_runtime_active+0x2d/0x50

Are you sure commit 4fd41a8552af ("SCSI: Fix NULL pointer dereference in
runtime PM"; December 2015) is in your kernel tree?

Bart.
Tomas Janousek July 29, 2018, 9:41 a.m. UTC | #8
Hi,

On Fri, Jul 13, 2018 at 09:41:41PM +0800, Ming Lei wrote:
> Now runtime PM is disabled for blk-mq/scsi_mq, not sure how this issue is
> triggered on your machine.

While Patrick did miss the following patch:

* 765e40b675a9 ("block: disable runtime-pm for blk-mq"; July 2017).

there is at least one other way to trigger it -- enable laptop-mode-tools
or tlp which enable runtime-pm for all devices.

The "disable runtime-pm for blk-mq" only disables it _by_default_, but doesn't
prevent it from being enabled again from user-space, which it is unless one
manually blacklists sd devices from runtime-pm enablement. It's bitten a few
people already: https://github.com/rickysarraf/laptop-mode-tools/issues/123

(I found this thread because I'm also getting the NULL pointer dereference at
00000000000001a8 on resume from suspend.)
Patrick Steinhardt July 30, 2018, 7:59 a.m. UTC | #9
On Sun, Jul 29, 2018 at 11:41:31AM +0200, Tomas Janousek wrote:
> Hi,
> 
> On Fri, Jul 13, 2018 at 09:41:41PM +0800, Ming Lei wrote:
> > Now runtime PM is disabled for blk-mq/scsi_mq, not sure how this issue is
> > triggered on your machine.
> 
> While Patrick did miss the following patch:
> 
> * 765e40b675a9 ("block: disable runtime-pm for blk-mq"; July 2017).
> 
> there is at least one other way to trigger it -- enable laptop-mode-tools
> or tlp which enable runtime-pm for all devices.
> 
> The "disable runtime-pm for blk-mq" only disables it _by_default_, but doesn't
> prevent it from being enabled again from user-space, which it is unless one
> manually blacklists sd devices from runtime-pm enablement. It's bitten a few
> people already: https://github.com/rickysarraf/laptop-mode-tools/issues/123
> 
> (I found this thread because I'm also getting the NULL pointer dereference at
> 00000000000001a8 on resume from suspend.)

Huh, I did send out some more details on how I reproduce the
issue, but it seems like my mail didn't get through. While I
don't use laptop-mode-tools, I do have some custom hotplugging
scripts which do in fact enable runtime-PM for most devices.

Regards
Patrick
Ming Lei July 30, 2018, 11:50 a.m. UTC | #10
On Mon, Jul 30, 2018 at 09:59:49AM +0200, Patrick Steinhardt wrote:
> On Sun, Jul 29, 2018 at 11:41:31AM +0200, Tomas Janousek wrote:
> > Hi,
> > 
> > On Fri, Jul 13, 2018 at 09:41:41PM +0800, Ming Lei wrote:
> > > Now runtime PM is disabled for blk-mq/scsi_mq, not sure how this issue is
> > > triggered on your machine.
> > 
> > While Patrick did miss the following patch:
> > 
> > * 765e40b675a9 ("block: disable runtime-pm for blk-mq"; July 2017).
> > 
> > there is at least one other way to trigger it -- enable laptop-mode-tools
> > or tlp which enable runtime-pm for all devices.
> > 
> > The "disable runtime-pm for blk-mq" only disables it _by_default_, but doesn't
> > prevent it from being enabled again from user-space, which it is unless one
> > manually blacklists sd devices from runtime-pm enablement. It's bitten a few
> > people already: https://github.com/rickysarraf/laptop-mode-tools/issues/123
> > 
> > (I found this thread because I'm also getting the NULL pointer dereference at
> > 00000000000001a8 on resume from suspend.)
> 
> Huh, I did send out some more details on how I reproduce the
> issue, but it seems like my mail didn't get through. While I
> don't use laptop-mode-tools, I do have some custom hotplugging
> scripts which do in fact enable runtime-PM for most devices.

Now runtime PM is still enabled for sd/sr, and I believe the following
patch is needed until we figure out one perfect way for supporting it
well:

diff --git a/block/blk-core.c b/block/blk-core.c
index 03a4ea93a5f3..090b782df129 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3769,9 +3769,11 @@ EXPORT_SYMBOL(blk_finish_plug);
  */
 void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 {
-	/* not support for RQF_PM and ->rpm_status in blk-mq yet */
-	if (q->mq_ops)
+	/* Don't enable runtime PM for blk-mq until it is ready */
+	if (q->mq_ops) {
+		pm_runtime_disable(dev);
 		return;
+	}
 
 	q->dev = dev;
 	q->rpm_status = RPM_ACTIVE;

Thanks,
Ming
Tomas Janousek July 30, 2018, 3 p.m. UTC | #11
Hi Ming,

On Mon, Jul 30, 2018 at 07:50:03PM +0800, Ming Lei wrote:
> Now runtime PM is still enabled for sd/sr, and I believe the following
> patch is needed until we figure out one perfect way for supporting it
> well:
> [...]
> +		pm_runtime_disable(dev);

Awesome, thanks for the quick reaction! :-)

(I don't happen to have a .config for the hardware I'm experincing the problem
on, so don't expect a Tested-By from me, but I don't see why it wouldn't fix
the issue, and it did for Patrick.)
diff mbox

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index f84a9b7b6f5a..0a2041660cd9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -456,7 +456,7 @@  inline void __blk_run_queue_uncond(struct request_queue *q)
 	lockdep_assert_held(q->queue_lock);
 	WARN_ON_ONCE(q->mq_ops);
 
-	if (unlikely(blk_queue_dead(q)))
+	if (unlikely(!q->request_fn) || unlikely(blk_queue_dead(q)))
 		return;
 
 	/*