Message ID | Pine.LNX.4.44L0.1601191532210.1312-200000@iolanthe.rowland.org (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
Dear Alan, Am Dienstag, den 19.01.2016, 16:08 -0500 schrieb Alan Stern: > On Tue, 19 Jan 2016, Paul Menzel wrote: > > > Could you please attach the debugging patch. Hopefully Alexandre, Erich, > > or I will have some spare time to build an image from it. > > Actually, this patch is an attempt at a fix. After looking more > carefully at your log pictures, I realized what the problem must be. that indeed fixed it for me. I applied your patch on linux-image-4.4.0-rc8-686 [1] and was able to get to the LUKS passphrase dialog. Awesome! Thank you very, very much! […] Thanks, Paul [1] https://packages.debian.org/experimental/linux-image-4.4.0-rc8-686
Hi, >> Could you please attach the debugging patch. Hopefully Alexandre, Erich, >> or I will have some spare time to build an image from it. > > Actually, this patch is an attempt at a fix. After looking more > carefully at your log pictures, I realized what the problem must be. > > It's too bad nobody was able to capture a log where the error > occurred in sr_runtime_suspend, though -- all the logs in the bug > report show sd_runtime_resume. I just tested the patch applied on top of 4.3.3 (4.3.3-6 in Debian). It still crashes at boot, but the stacktrace is different : it happens in blk_post_runtime_resume . Maybe I'm bit by a different bug or maybe the I need to try with 4.4. I'll post the captured log when I have access to a wired network. I'd be happy to provide the logs of a debugging patch. Alex -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Dear Debian and Linux folks, Am Mittwoch, den 20.01.2016, 23:07 +0100 schrieb Alexandre Rossi: > >> Could you please attach the debugging patch. Hopefully Alexandre, Erich, > >> or I will have some spare time to build an image from it. > > > > Actually, this patch is an attempt at a fix. After looking more > > carefully at your log pictures, I realized what the problem must be. > > > > It's too bad nobody was able to capture a log where the error > > occurred in sr_runtime_suspend, though -- all the logs in the bug > > report show sd_runtime_resume. > > I just tested the patch applied on top of 4.3.3 (4.3.3-6 in Debian). > > It still crashes at boot, but the stacktrace is different : it happens > in blk_post_runtime_resume . Maybe I'm bit by a different bug or maybe > the I need to try with 4.4. > > I'll post the captured log when I have access to a wired network. I'd > be happy to provide the logs of a debugging patch. I tried Linux 4.3.5-1 [1], which entered Debian Sid/unstable yesterday, and I get the same null pointer dereference as Alexandre. As this is Linux 4.3 and not 4.4, I guess this is a different problem though. Alexandre, where you able to capture the stack trace? I’d submit a new bug report with this. Thanks, Paul [1] http://metadata.ftp-master.debian.org/changelogs/main/l/linux/linux_4.3.5-1_changelog
Hi, netconsole does not seem to work so early in the boot process this time. > As this is Linux 4.3 and not 4.4, I guess this is a different problem > though. Alexandre, where you able to capture the stack trace? I’d submit > a new bug report with this. Here is a photo. Please ping me if you need to test some debugging patches. Alex
On Tue, 2016-02-09 at 20:56 +0100, Alexandre Rossi wrote: > Hi, > > netconsole does not seem to work so early in the boot process this time. > > > As this is Linux 4.3 and not 4.4, I guess this is a different problem > > though. Alexandre, where you able to capture the stack trace? I’d submit > > a new bug report with this. > > Here is a photo. Please ping me if you need to test some debugging patches. I'm pretty sure this crash is fixed by commit 4fd41a8552af ("SCSI: Fix NULL pointer dereference in runtime PM"), which I've now queued up for 4.3 (though it's already in 4.4 which I'll probably upload to unstable soon). Ben.
On Tue, 9 Feb 2016, Alexandre Rossi wrote: > Hi, > > netconsole does not seem to work so early in the boot process this time. > > > As this is Linux 4.3 and not 4.4, I guess this is a different problem > > though. Alexandre, where you able to capture the stack trace? I’d submit > > a new bug report with this. > > Here is a photo. Please ping me if you need to test some debugging patches. It looks like the problem occurs in blk_post_runtime_resume(). Since there have been recent changes to this routine, it's hard to tell whether you're using the most up-to-date code. In particular, the first few lines of blk_post_runtime_resume() in block/blk-core.c should look like this: void blk_post_runtime_resume(struct request_queue *q, int err) { if (!q->dev) return; The test was introduced by commit 4fd41a8552af ("SCSI: Fix NULL pointer dereference in runtime PM"), which was added to the mainline kernel between 4.3 and 4.4. I don't know what the commit ID would be for a .stable kernel. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, >> > As this is Linux 4.3 and not 4.4, I guess this is a different problem >> > though. Alexandre, where you able to capture the stack trace? I’d submit >> > a new bug report with this. >> >> Here is a photo. Please ping me if you need to test some debugging patches. > > It looks like the problem occurs in blk_post_runtime_resume(). Since > there have been recent changes to this routine, it's hard to tell > whether you're using the most up-to-date code. > > In particular, the first few lines of blk_post_runtime_resume() in > block/blk-core.c should look like this: > > void blk_post_runtime_resume(struct request_queue *q, int err) > { > if (!q->dev) > return; > > The test was introduced by commit 4fd41a8552af ("SCSI: Fix NULL pointer > dereference in runtime PM"), which was added to the mainline kernel > between 4.3 and 4.4. I don't know what the commit ID would be for a > .stable kernel. Okay now I've tried with 4.4. The oops does not occur. So this is fixed for me in 4.4. If there is interest in backporting to 4.3, 13b438914341 ("SCSI: fix crashes in sd and sr runtime PM") is not enough to backport. Something in 4.4, most probably 4fd41a8552af ("SCSI: Fix NULL pointer dereference in runtime PM") is also needed. Thanks a lot, Alex -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 23 Feb 2016, Alexandre Rossi wrote: > Okay now I've tried with 4.4. The oops does not occur. So this is > fixed for me in 4.4. > > If there is interest in backporting to 4.3, 13b438914341 ("SCSI: fix > crashes in sd and sr runtime PM") is not enough to backport. Something > in 4.4, most probably 4fd41a8552af ("SCSI: Fix NULL pointer > dereference in runtime PM") is also needed. Although that commit isn't in 4.3.x yet, it should be added soon. Maybe in the next release. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: usb-4.4/drivers/scsi/sd.c =================================================================== --- usb-4.4.orig/drivers/scsi/sd.c +++ usb-4.4/drivers/scsi/sd.c @@ -3275,8 +3275,8 @@ static int sd_suspend_common(struct devi struct scsi_disk *sdkp = dev_get_drvdata(dev); int ret = 0; - if (!sdkp) - return 0; /* this can happen */ + if (!sdkp) /* E.g.: runtime suspend following sd_remove() */ + return 0; if (sdkp->WCE && sdkp->media_present) { sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n"); @@ -3315,6 +3315,9 @@ static int sd_resume(struct device *dev) { struct scsi_disk *sdkp = dev_get_drvdata(dev); + if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */ + return 0; + if (!sdkp->device->manage_start_stop) return 0; Index: usb-4.4/drivers/scsi/sr.c =================================================================== --- usb-4.4.orig/drivers/scsi/sr.c +++ usb-4.4/drivers/scsi/sr.c @@ -144,6 +144,9 @@ static int sr_runtime_suspend(struct dev { struct scsi_cd *cd = dev_get_drvdata(dev); + if (!cd) /* E.g.: runtime suspend following sr_remove() */ + return 0; + if (cd->media_present) return -EBUSY; else @@ -985,6 +988,7 @@ static int sr_remove(struct device *dev) scsi_autopm_get_device(cd->device); del_gendisk(cd->disk); + dev_set_drvdata(dev, NULL); mutex_lock(&sr_ref_mutex); kref_put(&cd->kref, sr_kref_release);