
[0/13,v2] block: Fix block device shutdown related races

Message ID 20170222102425.GB23312@quack2.suse.cz (mailing list archive)
State New, archived

Commit Message

Jan Kara Feb. 22, 2017, 10:24 a.m. UTC
On Tue 21-02-17 10:19:28, Jens Axboe wrote:
> On 02/21/2017 10:09 AM, Jan Kara wrote:
> > Hello,
> > 
> > this is a second revision of the patch set to fix several different races and
> > issues I've found when testing device shutdown and reuse. The first three
> > patches are fixes to problems in my previous series fixing BDI lifetime issues.
> > Patch 4 fixes issues with reuse of the BDI name with SCSI devices. With it I cannot
> > reproduce the BDI name reuse issues using Omar's scsi_debug stress test,
> > so it can be used as a replacement for Dan's patches. Patches 5-11 fix an oops
> > triggered by __blkdev_put() calling inode_detach_wb() too early (the problem
> > reported by Thiago). Patches 12 and 13 fix an oops due to a bug in the gendisk
> > code where get_gendisk() can return an already freed gendisk structure (again
> > triggered by Omar's stress test).
> > 
> > People, please have a look at the patches. They are mostly simple; however, the
> > interactions are rather complex, so I may have missed something. Also I'm
> > happy for any additional testing these patches can get - I've stressed them
> > with Omar's script, tested memcg writeback, and tested static (not udev
> > managed) device inodes.
> > 
> > Jens, I think at least patches 1-3 should go in together with my fixes you
> > already have in your tree (or shortly after them). It is up to you whether
> > you decide to delay my first fixes or pick these up quickly. Patch 4 is
> > (IMHO) a cleaner replacement for Dan's patches, so consider whether you want
> > to use it instead of those patches.
> 
> I have applied 1-3 to my for-linus branch, which will go in after
> the initial pull request has been pulled by Linus. Consider fixing up
> #4 so it applies; I like it.

OK, attached is patch 4 rebased on top of Linus' tree from today, which
already has the linux-block changes pulled in. I've left put_disk_devt() in
blk_cleanup_queue() to maintain the logic in the original patch (now commit
0dba1314d4f8) that request_queue and gendisk each hold one devt reference.
The bdi_unregister() call that this patch moves to del_gendisk() is now
protected by the gendisk reference instead of the request_queue one, so it
still maintains the property that the devt reference protects the BDI
registration-unregistration lifetime (even though that is no longer needed
after this patch).

I have also updated the comment in the code and the changelog - they had
become somewhat stale after the changes to the whole series that Tejun
suggested.

								Honza

Comments

Omar Sandoval March 1, 2017, 7:25 a.m. UTC | #1
On Wed, Feb 22, 2017 at 11:24:25AM +0100, Jan Kara wrote:
> [...]
> 
> OK, attached is patch 4 rebased on top of Linus' tree from today which
> already has linux-block changes pulled in. I've left put_disk_devt() in
> blk_cleanup_queue() to maintain the logic in the original patch (now commit
> 0dba1314d4f8) that request_queue and gendisk each hold one devt reference.
> The bdi_unregister() call that is moved to del_gendisk() by this patch is
> now protected by the gendisk reference instead of the request_queue one
> so it still maintains the property that devt reference protects bdi
> registration-unregistration lifetime (as much as that is not needed anymore
> after this patch).
> 
> I have also updated the comment in the code and the changelog - they were
> somewhat stale after changes to the whole series Tejun suggested.
> 
> 								Honza

Hey, Jan, I just tested this out when I was seeing similar crashes with
sr instead of sd, and this fixed it.

Tested-by: Omar Sandoval <osandov@fb.com>
Omar Sandoval March 1, 2017, 7:26 a.m. UTC | #2
On Tue, Feb 28, 2017 at 11:25:28PM -0800, Omar Sandoval wrote:
> [...]
> 
> Hey, Jan, I just tested this out when I was seeing similar crashes with
> sr instead of sd, and this fixed it.
> 
> Tested-by: Omar Sandoval <osandov@fb.com>

Just realized it wasn't clear, I'm talking about patch 4 specifically.
Jan Kara March 1, 2017, 3:11 p.m. UTC | #3
On Tue 28-02-17 23:26:53, Omar Sandoval wrote:
> On Tue, Feb 28, 2017 at 11:25:28PM -0800, Omar Sandoval wrote:
> > [...]
> > 
> > Hey, Jan, I just tested this out when I was seeing similar crashes with
> > sr instead of sd, and this fixed it.
> > 
> > Tested-by: Omar Sandoval <osandov@fb.com>
> 
> Just realized it wasn't clear, I'm talking about patch 4 specifically.

Thanks for confirmation!

								Honza
Omar Sandoval March 6, 2017, 8:38 p.m. UTC | #4
On Wed, Mar 01, 2017 at 04:11:09PM +0100, Jan Kara wrote:
> On Tue 28-02-17 23:26:53, Omar Sandoval wrote:
> > [...]
> > 
> > Just realized it wasn't clear, I'm talking about patch 4 specifically.
> 
> Thanks for confirmation!
> 
> 								Honza

Unfortunately, this patch actually seems to have introduced a
regression. Reverting the patch fixes it.

I'm running a QEMU kvm vm with a virtio-scsi device and I get oopses
like this:

[    6.741773] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[    6.744401] IP: virtscsi_queuecommand_single+0x21/0x40 [virtio_scsi]
[    6.744985] PGD 0
[    6.744985]
[    6.744985] Oops: 0002 [#1] PREEMPT SMP
[    6.744985] Modules linked in: virtio_scsi scsi_mod virtio_net
[    6.744985] CPU: 3 PID: 5 Comm: kworker/u8:0 Not tainted 4.11.0-rc1 #36
[    6.744985] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
[    6.744985] Workqueue: events_unbound async_run_entry_fn
[    6.744985] task: ffff88007c8672c0 task.stack: ffffc9000033c000
[    6.750038] RIP: 0010:virtscsi_queuecommand_single+0x21/0x40 [virtio_scsi]
[    6.750038] RSP: 0018:ffffc9000033f8b0 EFLAGS: 00010246
[    6.750038] RAX: ffff88007c98d000 RBX: ffff880078c814c8 RCX: 0000000000000000
[    6.750038] RDX: ffff880078c814c8 RSI: ffff880078c814c8 RDI: ffff88007c98d000
[    6.750038] RBP: ffffc9000033f8b0 R08: 0000000000000000 R09: 0000000000000400
[    6.750038] R10: ffff88007b1fe748 R11: 0000000000000024 R12: ffff880078c814c8
[    6.750038] R13: ffff88007c98d000 R14: ffff880078c81380 R15: ffff88007b532000
[    6.750038] FS:  0000000000000000(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000
[    6.750038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.750038] CR2: 0000000000000004 CR3: 0000000001c09000 CR4: 00000000000006e0
[    6.750038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.750038] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    6.750038] Call Trace:
[    6.750038]  scsi_dispatch_cmd+0xa3/0x1d0 [scsi_mod]
[    6.750038]  scsi_queue_rq+0x586/0x740 [scsi_mod]
[    6.750038]  blk_mq_dispatch_rq_list+0x22a/0x420
[    6.750038]  blk_mq_sched_dispatch_requests+0x142/0x1b0
[    6.750038]  __blk_mq_run_hw_queue+0x94/0xb0
[    6.750038]  blk_mq_run_hw_queue+0x82/0xa0
[    6.750038]  blk_mq_sched_insert_request+0x89/0x1a0
[    6.750038]  ? blk_execute_rq+0xc0/0xc0
[    6.750038]  blk_execute_rq_nowait+0x9a/0x140
[    6.750038]  ? bio_phys_segments+0x19/0x20
[    6.750038]  blk_execute_rq+0x56/0xc0
[    6.750038]  scsi_execute+0x128/0x270 [scsi_mod]
[    6.750038]  scsi_probe_and_add_lun+0x247/0xc60 [scsi_mod]
[    6.750038]  ? __pm_runtime_resume+0x4c/0x60
[    6.750038]  __scsi_scan_target+0x102/0x520 [scsi_mod]
[    6.750038]  ? account_entity_dequeue+0xab/0xe0
[    6.750038]  scsi_scan_channel+0x81/0xa0 [scsi_mod]
[    6.750038]  scsi_scan_host_selected+0xcc/0x100 [scsi_mod]
[    6.750038]  do_scsi_scan_host+0x8d/0x90 [scsi_mod]
[    6.750038]  do_scan_async+0x1c/0x1a0 [scsi_mod]
[    6.750038]  async_run_entry_fn+0x4a/0x170
[    6.750038]  process_one_work+0x165/0x430
[    6.750038]  worker_thread+0x4e/0x490
[    6.750038]  kthread+0x101/0x140
[    6.750038]  ? process_one_work+0x430/0x430
[    6.750038]  ? kthread_create_on_node+0x60/0x60
[    6.750038]  ret_from_fork+0x2c/0x40
[    6.750038] Code: ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 4e 30 48 89 f8 48 89 f2 48 89 e5 48 8b 89 88 01 00 00 48 8b 89 08 03 00 00 <f0> ff 41 04 48 8d bf c0 07 00 00 48 8d b0 c8 09 00 00 e8 08 fd
[    6.750038] RIP: virtscsi_queuecommand_single+0x21/0x40 [virtio_scsi] RSP: ffffc9000033f8b0
[    6.750038] CR2: 0000000000000004
[    6.750038] ---[ end trace cee5df55872abfa0 ]---
[    6.750038] note: kworker/u8:0[5] exited with preempt_count 1

Arch Linux 4.11.0-rc1 (ttyS0)

silver login: [   27.370267] scsi 0:0:53:0: tag#766 abort
[   27.374501] scsi 0:0:53:0: device reset
[   27.378539] scsi 0:0:53:0: Device offlined - not ready after error recovery

That crash is because tgt is NULL here:

----
static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
					struct scsi_cmnd *sc)
{
	struct virtio_scsi *vscsi = shost_priv(sh);
	struct virtio_scsi_target_state *tgt =
				scsi_target(sc->device)->hostdata;

======> atomic_inc(&tgt->reqs);
	return virtscsi_queuecommand(vscsi, &vscsi->req_vqs[0], sc);
}
----

Applying the rest of this patch set didn't fix it. Additionally,
enabling KASAN gives this use-after-free spew:
----
[    7.789280] SCSI subsystem initialized
[    7.809991] virtio_net virtio0 ens2: renamed from eth0
[    7.828301] scsi host0: Virtio SCSI HBA
[    7.916214] scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[    7.946713] ==================================================================
[    7.947609] BUG: KASAN: use-after-free in rb_erase+0x13a2/0x1a20 at addr ffff88006a915e30
[    7.948583] Write of size 8 by task dhcpcd-run-hook/146
[    7.949202] CPU: 0 PID: 146 Comm: dhcpcd-run-hook Not tainted 4.11.0-rc1-00011-gc35ccd2d43e9 #42
[    7.950021] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
[    7.950021] Call Trace:
[    7.950021]  <IRQ>
[    7.950021]  dump_stack+0x63/0x86
[    7.950021]  kasan_object_err+0x21/0x70
[    7.950021]  kasan_report.part.1+0x23a/0x520
[    7.950021]  ? rb_erase+0x13a2/0x1a20
[    7.950021]  ? swake_up_locked+0x94/0x1c0
[    7.950021]  __asan_report_store8_noabort+0x31/0x40
[    7.950021]  rb_erase+0x13a2/0x1a20
[    7.950021]  wb_congested_put+0x6a/0xd0
[    7.950021]  __blkg_release_rcu+0x16b/0x230
[    7.950021]  rcu_process_callbacks+0x602/0xfc0
[    7.950021]  __do_softirq+0x14e/0x5b3
[    7.950021]  irq_exit+0x14e/0x180
[    7.950021]  smp_apic_timer_interrupt+0x7b/0xa0
[    7.950021]  apic_timer_interrupt+0x89/0x90
[    7.950021] RIP: 0010:copy_strings.isra.7+0x312/0x6b0
[    7.950021] RSP: 0018:ffff880063bffcd0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[    7.950021] RAX: 0000000000002000 RBX: 0000000000000fe3 RCX: 0000000000000002
[    7.950021] RDX: ffff88006a734880 RSI: 0000000000000002 RDI: ffff880069750808
[    7.950021] RBP: ffff880063bffdd0 R08: 8000000063ec3067 R09: 0000000000000040
[    7.950021] R10: ffffed000c7d8600 R11: ffffffff82afa6a0 R12: 00007fffffffefe3
[    7.950021] R13: 0000000000000015 R14: ffff880069750780 R15: dffffc0000000000
[    7.950021]  </IRQ>
[    7.950021]  ? set_binfmt+0x120/0x120
[    7.950021]  ? insert_vm_struct+0x148/0x2e0
[    7.950021]  ? kasan_slab_alloc+0x12/0x20
[    7.950021]  ? count.isra.6.constprop.16+0x52/0x100
[    7.950021]  do_execveat_common.isra.14+0xef1/0x1b80
[    7.950021]  ? prepare_bprm_creds+0x100/0x100
[    7.950021]  ? getname_flags+0x90/0x3f0
[    7.950021]  ? __do_page_fault+0x5cc/0xbc0
[    7.950021]  SyS_execve+0x3a/0x50
[    7.950021]  ? ptregs_sys_vfork+0x10/0x10
[    7.950021]  do_syscall_64+0x180/0x380
[    7.950021]  entry_SYSCALL64_slow_path+0x25/0x25
[    7.950021] RIP: 0033:0x7f85de145477
[    7.950021] RSP: 002b:00007ffdf2da3568 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[    7.950021] RAX: ffffffffffffffda RBX: 0000000000d771c0 RCX: 00007f85de145477
[    7.950021] RDX: 0000000000d35fb0 RSI: 0000000000d697e0 RDI: 0000000000d771c0
[    7.950021] RBP: 0000000000000000 R08: 00007ffdf2da3560 R09: 0000000000000030
[    7.950021] R10: 000000000000059a R11: 0000000000000202 R12: 0000000000d771c0
[    7.950021] R13: 0000000000d697e0 R14: 0000000000d35fb0 R15: 0000000000000000
[    7.950021] Object at ffff88006a915b00, in cache kmalloc-1024 size: 1024
[    7.950021] Allocated:
[    7.950021] PID = 72
[    7.950021]  save_stack_trace+0x1b/0x20
[    7.950021]  kasan_kmalloc+0xee/0x190
[    7.950021]  kmem_cache_alloc_node_trace+0x138/0x200
[    7.950021]  bdi_alloc_node+0x4c/0xa0
[    7.950021]  blk_alloc_queue_node+0xdd/0x870
[    7.950021]  blk_mq_init_queue+0x41/0x90
[    7.950021]  scsi_mq_alloc_queue+0x41/0x130 [scsi_mod]
[    7.950021]  scsi_alloc_sdev+0x90e/0xd00 [scsi_mod]
[    7.950021]  scsi_probe_and_add_lun+0x14b3/0x2250 [scsi_mod]
[    7.950021]  __scsi_scan_target+0x23d/0xb60 [scsi_mod]
[    7.950021]  scsi_scan_channel+0x107/0x160 [scsi_mod]
[    7.950021]  scsi_scan_host_selected+0x1bf/0x250 [scsi_mod]
[    7.950021]  do_scsi_scan_host+0x1bc/0x250 [scsi_mod]
[    7.950021]  do_scan_async+0x41/0x4b0 [scsi_mod]
[    7.950021]  async_run_entry_fn+0xc3/0x630
[    7.950021]  process_one_work+0x531/0xf40
[    7.950021]  worker_thread+0xe4/0x10d0
[    7.950021]  kthread+0x298/0x390
[    7.950021]  ret_from_fork+0x2c/0x40
[    7.950021] Freed:
[    7.950021] PID = 72
[    7.950021]  save_stack_trace+0x1b/0x20
[    7.950021]  kasan_slab_free+0xb0/0x180
[    7.950021]  kfree+0x9f/0x1d0
[    7.950021]  bdi_put+0x2a/0x30
[    7.950021]  blk_release_queue+0x51/0x320
[    7.950021]  kobject_put+0x154/0x430
[    7.950021]  blk_put_queue+0x15/0x20
[    7.950021]  scsi_device_dev_release_usercontext+0x59b/0x880 [scsi_mod]
[    7.950021]  execute_in_process_context+0xd9/0x130
[    7.950021]  scsi_device_dev_release+0x1c/0x20 [scsi_mod]
[    7.950021]  device_release+0x76/0x1e0
[    7.950021]  kobject_put+0x154/0x430
[    7.950021]  put_device+0x17/0x20
[    7.950021]  __scsi_remove_device+0x18d/0x250 [scsi_mod]
[    7.950021]  scsi_probe_and_add_lun+0x14d6/0x2250 [scsi_mod]
[    7.950021]  __scsi_scan_target+0x23d/0xb60 [scsi_mod]
[    7.950021]  scsi_scan_channel+0x107/0x160 [scsi_mod]
[    7.950021]  scsi_scan_host_selected+0x1bf/0x250 [scsi_mod]
[    7.950021]  do_scsi_scan_host+0x1bc/0x250 [scsi_mod]
[    7.950021]  do_scan_async+0x41/0x4b0 [scsi_mod]
[    7.950021]  async_run_entry_fn+0xc3/0x630
[    7.950021]  process_one_work+0x531/0xf40
[    7.950021]  worker_thread+0xe4/0x10d0
[    7.950021]  kthread+0x298/0x390
[    7.950021]  ret_from_fork+0x2c/0x40
[    7.950021] Memory state around the buggy address:
[    7.950021]  ffff88006a915d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    7.950021]  ffff88006a915d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    7.950021] >ffff88006a915e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    7.950021]                                      ^
[    7.950021]  ffff88006a915e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[    7.950021]  ffff88006a915f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
----

Any ideas, Jan?

Thanks.
Jan Kara March 7, 2017, 1:57 p.m. UTC | #5
On Mon 06-03-17 12:38:18, Omar Sandoval wrote:
> On Wed, Mar 01, 2017 at 04:11:09PM +0100, Jan Kara wrote:
> > [...]
> 
> Unfortunately, this patch actually seems to have introduced a
> regression. Reverting the patch fixes it.
> 
> I'm running a QEMU kvm vm with a virtio-scsi device and I get oopses
> like this:

What workload do you run? Or is it enough to just boot the kvm with
virtio-scsi enabled? Can you send me the kvm setup so that I can reproduce
this?

At least the KASAN error could be the result of a situation where someone
called bdi_register() but didn't call bdi_unregister() before dropping the
last bdi reference (which would make sense given that I've moved the
bdi_unregister() call). However, I'd expect a warning from bdi_exit() in that
case and I don't see one in your kernel log. So I'll try to reproduce the
issue and debug more.

								Honza

> [...]
> 
> ======> atomic_inc(&tgt->reqs);
> 	return virtscsi_queuecommand(vscsi, &vscsi->req_vqs[0], sc);
> }
> ----
> 
> Applying the rest of this patch set didn't fix it. Additionally,
> enabling KASAN gives this use-after-free spew:
> ----
> [    7.789280] SCSI subsystem initialized
> [    7.809991] virtio_net virtio0 ens2: renamed from eth0
> [    7.828301] scsi host0: Virtio SCSI HBA
> [    7.916214] scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
> [    7.946713] ==================================================================
> [    7.947609] BUG: KASAN: use-after-free in rb_erase+0x13a2/0x1a20 at addr ffff88006a915e30
> [    7.948583] Write of size 8 by task dhcpcd-run-hook/146
> [    7.949202] CPU: 0 PID: 146 Comm: dhcpcd-run-hook Not tainted 4.11.0-rc1-00011-gc35ccd2d43e9 #42
> [    7.950021] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
> [    7.950021] Call Trace:
> [    7.950021]  <IRQ>
> [    7.950021]  dump_stack+0x63/0x86
> [    7.950021]  kasan_object_err+0x21/0x70
> [    7.950021]  kasan_report.part.1+0x23a/0x520
> [    7.950021]  ? rb_erase+0x13a2/0x1a20
> [    7.950021]  ? swake_up_locked+0x94/0x1c0
> [    7.950021]  __asan_report_store8_noabort+0x31/0x40
> [    7.950021]  rb_erase+0x13a2/0x1a20
> [    7.950021]  wb_congested_put+0x6a/0xd0
> [    7.950021]  __blkg_release_rcu+0x16b/0x230
> [    7.950021]  rcu_process_callbacks+0x602/0xfc0
> [    7.950021]  __do_softirq+0x14e/0x5b3
> [    7.950021]  irq_exit+0x14e/0x180
> [    7.950021]  smp_apic_timer_interrupt+0x7b/0xa0
> [    7.950021]  apic_timer_interrupt+0x89/0x90
> [    7.950021] RIP: 0010:copy_strings.isra.7+0x312/0x6b0
> [    7.950021] RSP: 0018:ffff880063bffcd0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
> [    7.950021] RAX: 0000000000002000 RBX: 0000000000000fe3 RCX: 0000000000000002
> [    7.950021] RDX: ffff88006a734880 RSI: 0000000000000002 RDI: ffff880069750808
> [    7.950021] RBP: ffff880063bffdd0 R08: 8000000063ec3067 R09: 0000000000000040
> [    7.950021] R10: ffffed000c7d8600 R11: ffffffff82afa6a0 R12: 00007fffffffefe3
> [    7.950021] R13: 0000000000000015 R14: ffff880069750780 R15: dffffc0000000000
> [    7.950021]  </IRQ>
> [    7.950021]  ? set_binfmt+0x120/0x120
> [    7.950021]  ? insert_vm_struct+0x148/0x2e0
> [    7.950021]  ? kasan_slab_alloc+0x12/0x20
> [    7.950021]  ? count.isra.6.constprop.16+0x52/0x100
> [    7.950021]  do_execveat_common.isra.14+0xef1/0x1b80
> [    7.950021]  ? prepare_bprm_creds+0x100/0x100
> [    7.950021]  ? getname_flags+0x90/0x3f0
> [    7.950021]  ? __do_page_fault+0x5cc/0xbc0
> [    7.950021]  SyS_execve+0x3a/0x50
> [    7.950021]  ? ptregs_sys_vfork+0x10/0x10
> [    7.950021]  do_syscall_64+0x180/0x380
> [    7.950021]  entry_SYSCALL64_slow_path+0x25/0x25
> [    7.950021] RIP: 0033:0x7f85de145477
> [    7.950021] RSP: 002b:00007ffdf2da3568 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
> [    7.950021] RAX: ffffffffffffffda RBX: 0000000000d771c0 RCX: 00007f85de145477
> [    7.950021] RDX: 0000000000d35fb0 RSI: 0000000000d697e0 RDI: 0000000000d771c0
> [    7.950021] RBP: 0000000000000000 R08: 00007ffdf2da3560 R09: 0000000000000030
> [    7.950021] R10: 000000000000059a R11: 0000000000000202 R12: 0000000000d771c0
> [    7.950021] R13: 0000000000d697e0 R14: 0000000000d35fb0 R15: 0000000000000000
> [    7.950021] Object at ffff88006a915b00, in cache kmalloc-1024 size: 1024
> [    7.950021] Allocated:
> [    7.950021] PID = 72
> [    7.950021]  save_stack_trace+0x1b/0x20
> [    7.950021]  kasan_kmalloc+0xee/0x190
> [    7.950021]  kmem_cache_alloc_node_trace+0x138/0x200
> [    7.950021]  bdi_alloc_node+0x4c/0xa0
> [    7.950021]  blk_alloc_queue_node+0xdd/0x870
> [    7.950021]  blk_mq_init_queue+0x41/0x90
> [    7.950021]  scsi_mq_alloc_queue+0x41/0x130 [scsi_mod]
> [    7.950021]  scsi_alloc_sdev+0x90e/0xd00 [scsi_mod]
> [    7.950021]  scsi_probe_and_add_lun+0x14b3/0x2250 [scsi_mod]
> [    7.950021]  __scsi_scan_target+0x23d/0xb60 [scsi_mod]
> [    7.950021]  scsi_scan_channel+0x107/0x160 [scsi_mod]
> [    7.950021]  scsi_scan_host_selected+0x1bf/0x250 [scsi_mod]
> [    7.950021]  do_scsi_scan_host+0x1bc/0x250 [scsi_mod]
> [    7.950021]  do_scan_async+0x41/0x4b0 [scsi_mod]
> [    7.950021]  async_run_entry_fn+0xc3/0x630
> [    7.950021]  process_one_work+0x531/0xf40
> [    7.950021]  worker_thread+0xe4/0x10d0
> [    7.950021]  kthread+0x298/0x390
> [    7.950021]  ret_from_fork+0x2c/0x40
> [    7.950021] Freed:
> [    7.950021] PID = 72
> [    7.950021]  save_stack_trace+0x1b/0x20
> [    7.950021]  kasan_slab_free+0xb0/0x180
> [    7.950021]  kfree+0x9f/0x1d0
> [    7.950021]  bdi_put+0x2a/0x30
> [    7.950021]  blk_release_queue+0x51/0x320
> [    7.950021]  kobject_put+0x154/0x430
> [    7.950021]  blk_put_queue+0x15/0x20
> [    7.950021]  scsi_device_dev_release_usercontext+0x59b/0x880 [scsi_mod]
> [    7.950021]  execute_in_process_context+0xd9/0x130
> [    7.950021]  scsi_device_dev_release+0x1c/0x20 [scsi_mod]
> [    7.950021]  device_release+0x76/0x1e0
> [    7.950021]  kobject_put+0x154/0x430
> [    7.950021]  put_device+0x17/0x20
> [    7.950021]  __scsi_remove_device+0x18d/0x250 [scsi_mod]
> [    7.950021]  scsi_probe_and_add_lun+0x14d6/0x2250 [scsi_mod]
> [    7.950021]  __scsi_scan_target+0x23d/0xb60 [scsi_mod]
> [    7.950021]  scsi_scan_channel+0x107/0x160 [scsi_mod]
> [    7.950021]  scsi_scan_host_selected+0x1bf/0x250 [scsi_mod]
> [    7.950021]  do_scsi_scan_host+0x1bc/0x250 [scsi_mod]
> [    7.950021]  do_scan_async+0x41/0x4b0 [scsi_mod]
> [    7.950021]  async_run_entry_fn+0xc3/0x630
> [    7.950021]  process_one_work+0x531/0xf40
> [    7.950021]  worker_thread+0xe4/0x10d0
> [    7.950021]  kthread+0x298/0x390
> [    7.950021]  ret_from_fork+0x2c/0x40
> [    7.950021] Memory state around the buggy address:
> [    7.950021]  ffff88006a915d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [    7.950021]  ffff88006a915d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [    7.950021] >ffff88006a915e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [    7.950021]                                      ^
> [    7.950021]  ffff88006a915e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [    7.950021]  ffff88006a915f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ----
> 
> Any ideas Jan?
> 
> Thanks.
Omar Sandoval March 7, 2017, 4:28 p.m. UTC | #6
On Tue, Mar 07, 2017 at 02:57:30PM +0100, Jan Kara wrote:
> On Mon 06-03-17 12:38:18, Omar Sandoval wrote:
> > Unfortunately, this patch actually seems to have introduced a
> > regression. Reverting the patch fixes it.
> > 
> > I'm running a QEMU kvm vm with a virtio-scsi device and I get oopses
> > like this:
> 
> What workload do you run? Or is it enough to just boot the kvm with
> virtio-scsi enabled? Can you send me the kvm setup so that I can reproduce
> this?

This is just while booting. In fact, you don't even need a virtio-scsi
disk attached; it's enough to have the virtio-scsi-pci controller. This
is my exact command line:

qemu-system-x86_64 -nodefaults -nographic -serial mon:stdio -cpu kvm64 \
	-enable-kvm -smp 1 -m 2G -watchdog i6300esb \
	-netdev user,id=vlan0,hostfwd=tcp:127.0.0.1:2222-:22 \
	-device virtio-net,netdev=vlan0 \
	-drive file=Silver/Silver.qcow2,index=0,media=disk,if=virtio,cache=none \
	-device virtio-scsi-pci \
	-kernel /home/osandov/linux/builds/virtio-scsi-crash/arch/x86/boot/bzImage \
	-virtfs local,path=/home/osandov/linux/builds/virtio-scsi-crash,security_model=none,readonly,mount_tag=modules \
	-append 'root=/dev/vda1 console=ttyS0,115200 scsi_mod.use_blk_mq=y'

My kernel config is here:
https://github.com/osandov/osandov-linux/blob/master/configs/qemu.config

> At least the KASAN error could be a result of the situation where someone
> called bdi_register() but didn't call bdi_unregister() before dropping the
> last bdi reference (which would make sense given I've moved
> bdi_unregister() call). However I'd expect a warning from bdi_exit() in that
> case and I don't see that in your kernel log. So I'll try to reproduce the
> issue and debug more.
> 
> 								Honza

Thanks for taking a look!

Patch

From 9abe9565c83af6b653b159a7bf5b895aff638c65 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 8 Feb 2017 08:05:56 +0100
Subject: [PATCH] block: Move bdi_unregister() to del_gendisk()

Commit 6cd18e711dd8 "block: destroy bdi before blockdev is
unregistered." moved bdi unregistration (at that time through
bdi_destroy()) from blk_release_queue() to blk_cleanup_queue() because
it needs to happen before the blk_unregister_region() call in
del_gendisk() for MD. SCSI, though, frees up the device number from
sd_remove(), called through a maze of callbacks from device_del() in
__scsi_remove_device(), before blk_cleanup_queue(), and thus races
similar to those described in 6cd18e711dd8 can happen for SCSI as well,
as reported by Omar [1].

Moving bdi_unregister() to del_gendisk() works for MD and fixes the
problem for SCSI since del_gendisk() gets called from sd_remove() before
freeing the device number.

This also makes device_add_disk() (calling bdi_register_owner()) more
symmetric with del_gendisk().

[1] http://marc.info/?l=linux-block&m=148554717109098&w=2

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-core.c | 1 -
 block/genhd.c    | 5 +++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b9e857f4afe8..1086dac8724c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -578,7 +578,6 @@  void blk_cleanup_queue(struct request_queue *q)
 		q->queue_lock = &q->__queue_lock;
 	spin_unlock_irq(lock);
 
-	bdi_unregister(q->backing_dev_info);
 	put_disk_devt(q->disk_devt);
 
 	/* @q is and will stay empty, shutdown and put */
diff --git a/block/genhd.c b/block/genhd.c
index 2f444b87a5f2..b26a5ea115d0 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -681,6 +681,11 @@  void del_gendisk(struct gendisk *disk)
 	disk->flags &= ~GENHD_FL_UP;
 
 	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
+	/*
+	 * Unregister bdi before releasing device numbers (as they can get
+	 * reused and we'd get clashes in sysfs).
+	 */
+	bdi_unregister(disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
 	blk_unregister_region(disk_devt(disk), disk->minors);
 
-- 
2.10.2