diff mbox

NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]

Message ID Pine.LNX.4.44L0.1601191532210.1312-200000@iolanthe.rowland.org (mailing list archive)
State Accepted, archived
Headers show

Commit Message

Alan Stern Jan. 19, 2016, 9:08 p.m. UTC
On Tue, 19 Jan 2016, Paul Menzel wrote:

> Could you please attach the debugging patch. Hopefully Alexandre, Erich,
> or I will have some spare time to build an image from it.

Actually, this patch is an attempt at a fix.  After looking more 
carefully at your log pictures, I realized what the problem must be.  

It's too bad nobody was able to capture a log where the error 
occurred in sr_runtime_suspend, though -- all the logs in the bug 
report show sd_runtime_resume.

> Alan, thank you a lot for being so responsive and helpful!

You're welcome.

Alan Stern
drivers/scsi/sd.c |    7 +++++--
 drivers/scsi/sr.c |    4 ++++
 2 files changed, 9 insertions(+), 2 deletions(-)

Comments

Paul Menzel Jan. 19, 2016, 11:20 p.m. UTC | #1
Dear Alan,


Am Dienstag, den 19.01.2016, 16:08 -0500 schrieb Alan Stern:
> On Tue, 19 Jan 2016, Paul Menzel wrote:
>
> > Could you please attach the debugging patch. Hopefully Alexandre, Erich,
> > or I will have some spare time to build an image from it.
> 
> Actually, this patch is an attempt at a fix.  After looking more 
> carefully at your log pictures, I realized what the problem must be.  

that indeed fixed it for me. I applied your patch on
linux-image-4.4.0-rc8-686 [1] and was able to get to the LUKS passphrase
dialog. Awesome! Thank you very, very much!

[…]


Thanks,

Paul


[1] https://packages.debian.org/experimental/linux-image-4.4.0-rc8-686
Alexandre Rossi Jan. 20, 2016, 10:07 p.m. UTC | #2
Hi,

>> Could you please attach the debugging patch. Hopefully Alexandre, Erich,
>> or I will have some spare time to build an image from it.
>
> Actually, this patch is an attempt at a fix.  After looking more
> carefully at your log pictures, I realized what the problem must be.
>
> It's too bad nobody was able to capture a log where the error
> occurred in sr_runtime_suspend, though -- all the logs in the bug
> report show sd_runtime_resume.

I just tested the patch applied on top of 4.3.3 (4.3.3-6 in Debian).

It still crashes at boot, but the stacktrace is different : it happens
in blk_post_runtime_resume . Maybe I'm bit by a different bug or maybe
the I need to try with 4.4.

I'll post the captured log when I have access to a wired network. I'd
be happy to provide the logs of a debugging patch.

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Menzel Feb. 9, 2016, 4:47 p.m. UTC | #3
Dear Debian and Linux folks,


Am Mittwoch, den 20.01.2016, 23:07 +0100 schrieb Alexandre Rossi:

> >> Could you please attach the debugging patch. Hopefully Alexandre, Erich,
> >> or I will have some spare time to build an image from it.
> >
> > Actually, this patch is an attempt at a fix.  After looking more
> > carefully at your log pictures, I realized what the problem must be.
> >
> > It's too bad nobody was able to capture a log where the error
> > occurred in sr_runtime_suspend, though -- all the logs in the bug
> > report show sd_runtime_resume.
> 
> I just tested the patch applied on top of 4.3.3 (4.3.3-6 in Debian).
> 
> It still crashes at boot, but the stacktrace is different : it happens
> in blk_post_runtime_resume . Maybe I'm bit by a different bug or maybe
> the I need to try with 4.4.
> 
> I'll post the captured log when I have access to a wired network. I'd
> be happy to provide the logs of a debugging patch.

I tried Linux 4.3.5-1 [1], which entered Debian Sid/unstable yesterday,
and I get the same null pointer dereference as Alexandre.

As this is Linux 4.3 and not 4.4, I guess this is a different problem
though. Alexandre, where you able to capture the stack trace? I’d submit
a new bug report with this.


Thanks,

Paul


[1] http://metadata.ftp-master.debian.org/changelogs/main/l/linux/linux_4.3.5-1_changelog
Alexandre Rossi Feb. 9, 2016, 7:56 p.m. UTC | #4
Hi,

netconsole does not seem to work so early in the boot process this time.

> As this is Linux 4.3 and not 4.4, I guess this is a different problem
> though. Alexandre, where you able to capture the stack trace? I’d submit
> a new bug report with this.

Here is a photo. Please ping me if you need to test some debugging patches.

Alex
Ben Hutchings Feb. 9, 2016, 8:51 p.m. UTC | #5
On Tue, 2016-02-09 at 20:56 +0100, Alexandre Rossi wrote:
> Hi,
> 
> netconsole does not seem to work so early in the boot process this time.
> 
> > As this is Linux 4.3 and not 4.4, I guess this is a different problem
> > though. Alexandre, where you able to capture the stack trace? I’d submit
> > a new bug report with this.
> 
> Here is a photo. Please ping me if you need to test some debugging patches.

I'm pretty sure this crash is fixed by commit 4fd41a8552af ("SCSI: Fix NULL
pointer dereference in runtime PM"), which I've now queued up for 4.3
(though it's already in 4.4 which I'll probably upload to unstable soon).

Ben.
Alan Stern Feb. 18, 2016, 4:27 p.m. UTC | #6
On Tue, 9 Feb 2016, Alexandre Rossi wrote:

> Hi,
> 
> netconsole does not seem to work so early in the boot process this time.
> 
> > As this is Linux 4.3 and not 4.4, I guess this is a different problem
> > though. Alexandre, where you able to capture the stack trace? I’d submit
> > a new bug report with this.
> 
> Here is a photo. Please ping me if you need to test some debugging patches.

It looks like the problem occurs in blk_post_runtime_resume().  Since 
there have been recent changes to this routine, it's hard to tell 
whether you're using the most up-to-date code.

In particular, the first few lines of blk_post_runtime_resume() in 
block/blk-core.c should look like this:

void blk_post_runtime_resume(struct request_queue *q, int err)
{
	if (!q->dev)
		return;

The test was introduced by commit 4fd41a8552af ("SCSI: Fix NULL pointer
dereference in runtime PM"), which was added to the mainline kernel
between 4.3 and 4.4.  I don't know what the commit ID would be for a
.stable kernel.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexandre Rossi Feb. 22, 2016, 11:15 p.m. UTC | #7
Hello,

>> > As this is Linux 4.3 and not 4.4, I guess this is a different problem
>> > though. Alexandre, where you able to capture the stack trace? I’d submit
>> > a new bug report with this.
>>
>> Here is a photo. Please ping me if you need to test some debugging patches.
>
> It looks like the problem occurs in blk_post_runtime_resume().  Since
> there have been recent changes to this routine, it's hard to tell
> whether you're using the most up-to-date code.
>
> In particular, the first few lines of blk_post_runtime_resume() in
> block/blk-core.c should look like this:
>
> void blk_post_runtime_resume(struct request_queue *q, int err)
> {
>         if (!q->dev)
>                 return;
>
> The test was introduced by commit 4fd41a8552af ("SCSI: Fix NULL pointer
> dereference in runtime PM"), which was added to the mainline kernel
> between 4.3 and 4.4.  I don't know what the commit ID would be for a
> .stable kernel.

Okay now I've tried with 4.4. The oops does not occur. So this is
fixed for me in 4.4.

If there is interest in backporting to 4.3, 13b438914341 ("SCSI: fix
crashes in sd and sr runtime PM") is not enough to backport. Something
in 4.4, most probably 4fd41a8552af ("SCSI: Fix NULL pointer
dereference in runtime PM") is also needed.

Thanks a lot,

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Feb. 23, 2016, 3:14 p.m. UTC | #8
On Tue, 23 Feb 2016, Alexandre Rossi wrote:

> Okay now I've tried with 4.4. The oops does not occur. So this is
> fixed for me in 4.4.
> 
> If there is interest in backporting to 4.3, 13b438914341 ("SCSI: fix
> crashes in sd and sr runtime PM") is not enough to backport. Something
> in 4.4, most probably 4fd41a8552af ("SCSI: Fix NULL pointer
> dereference in runtime PM") is also needed.

Although that commit isn't in 4.3.x yet, it should be added soon.  
Maybe in the next release.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: usb-4.4/drivers/scsi/sd.c
===================================================================
--- usb-4.4.orig/drivers/scsi/sd.c
+++ usb-4.4/drivers/scsi/sd.c
@@ -3275,8 +3275,8 @@  static int sd_suspend_common(struct devi
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
 	int ret = 0;
 
-	if (!sdkp)
-		return 0;	/* this can happen */
+	if (!sdkp)	/* E.g.: runtime suspend following sd_remove() */
+		return 0;
 
 	if (sdkp->WCE && sdkp->media_present) {
 		sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n");
@@ -3315,6 +3315,9 @@  static int sd_resume(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
 
+	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
+		return 0;
+
 	if (!sdkp->device->manage_start_stop)
 		return 0;
 
Index: usb-4.4/drivers/scsi/sr.c
===================================================================
--- usb-4.4.orig/drivers/scsi/sr.c
+++ usb-4.4/drivers/scsi/sr.c
@@ -144,6 +144,9 @@  static int sr_runtime_suspend(struct dev
 {
 	struct scsi_cd *cd = dev_get_drvdata(dev);
 
+	if (!cd)	/* E.g.: runtime suspend following sr_remove() */
+		return 0;
+
 	if (cd->media_present)
 		return -EBUSY;
 	else
@@ -985,6 +988,7 @@  static int sr_remove(struct device *dev)
 	scsi_autopm_get_device(cd->device);
 
 	del_gendisk(cd->disk);
+	dev_set_drvdata(dev, NULL);
 
 	mutex_lock(&sr_ref_mutex);
 	kref_put(&cd->kref, sr_kref_release);