Message ID | 20151001083149.GA13075@xzibit.linux.bs1.fc.nec.co.jp |
---|---|
State | Accepted, archived |
Delegated to: | Mike Snitzer |
I think this patch is OK. It should also be backported to stable kernels starting with 3.11. I think older versions are not affected because they don't have SRCU.

Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # 3.11+

Mikulas

On Thu, 1 Oct 2015, Junichi Nomura wrote:

> __dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) and
> suspend_lock in reverse order. That can cause AB-BA deadlock:
>
> Example:
>
>   __dm_destroy                    dm_swap_table
>   ---------------------------------------------------
>                                   mutex_lock(suspend_lock)
>   dm_get_live_table()
>     srcu_read_lock(io_barrier)
>                                   dm_sync_table()
>                                     synchronize_srcu(io_barrier)
>                                       .. waiting for dm_put_live_table()
>   mutex_lock(suspend_lock)
>     .. waiting for suspend_lock
>
> This patch fixes the lock ordering.
>
> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> Fixes: ab7c7bb6f4ab ("dm: hold suspend_lock while suspending device during device deletion")
> Cc: Mikulas Patocka <mpatocka@redhat.com>
> ---
> The problem could be reproduced with this script but it might take long.
> (In my environment, it took more than 10 minutes)
>
> -- cut here --
> #!/bin/bash
>
> t0="0 1024 zero"
> t1="0 1024 error"
> mapname=testmap
>
> work1()
> {
> 	while true; do
> 		dmsetup create --notable $mapname
> 		echo "$t0" | dmsetup load $mapname
> 		dmsetup resume $mapname
> 		dmsetup remove_all
> 	done
> }
>
> work2()
> {
> 	while true; do
> 		echo "$t1" | dmsetup load $mapname
> 		dmsetup resume $mapname
> 		echo "$t0" | dmsetup load $mapname
> 		dmsetup resume $mapname
> 	done
> }
>
> work1 &
> work2 &
> wait
> -- cut here --
>
> When starting the script, it will emit a lot of errors such as "No such
> device or address" and stops when the deadlock occurs.
> Backtrace of dmsetup will look like this:
>
> # ps auxw|grep dmsetup
> root 32209 0.0 0.0 130024 3060 pts/0 D+ 03:26 0:00 dmsetup resume testmap
> root 32210 0.0 0.0 130024 3048 pts/0 D+ 03:26 0:00 dmsetup remove_all
>
> # cat /proc/32210/stack
> [<ffffffffa00029ea>] __dm_destroy+0xba/0x280 [dm_mod]
> [<ffffffffa0003ec3>] dm_destroy+0x13/0x20 [dm_mod]
> [<ffffffffa0007edd>] dm_hash_remove_all+0x6d/0x130 [dm_mod]
> [<ffffffffa0007fc2>] remove_all+0x22/0x30 [dm_mod]
> [<ffffffffa0009a65>] ctl_ioctl+0x255/0x4d0 [dm_mod]
> [<ffffffffa0009cf3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
> [<ffffffff81210c82>] do_vfs_ioctl+0x2d2/0x4b0
> [<ffffffff81210ed9>] SyS_ioctl+0x79/0x90
> [<ffffffff816859ee>] entry_SYSCALL_64_fastpath+0x12/0x71
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> # cat /proc/32209/stack
> [<ffffffff810e1d34>] __synchronize_srcu+0xf4/0x130
> [<ffffffff810e1d94>] synchronize_srcu+0x24/0x30
> [<ffffffffa000406d>] dm_swap_table+0x17d/0x2e0 [dm_mod]
> [<ffffffffa00090fa>] dev_suspend+0x9a/0x240 [dm_mod]
> [<ffffffffa0009a65>] ctl_ioctl+0x255/0x4d0 [dm_mod]
> [<ffffffffa0009cf3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
> [<ffffffff81210c82>] do_vfs_ioctl+0x2d2/0x4b0
> [<ffffffff81210ed9>] SyS_ioctl+0x79/0x90
> [<ffffffff816859ee>] entry_SYSCALL_64_fastpath+0x12/0x71
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 6264781..7289ece 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2837,8 +2837,6 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
>  
>  	might_sleep();
>  
> -	map = dm_get_live_table(md, &srcu_idx);
> -
>  	spin_lock(&_minor_lock);
>  	idr_replace(&_minor_idr, MINOR_ALLOCED, MINOR(disk_devt(dm_disk(md))));
>  	set_bit(DMF_FREEING, &md->flags);
> @@ -2852,14 +2850,14 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
>  	 * do not race with internal suspend.
>  	 */
>  	mutex_lock(&md->suspend_lock);
> +	map = dm_get_live_table(md, &srcu_idx);
>  	if (!dm_suspended_md(md)) {
>  		dm_table_presuspend_targets(map);
>  		dm_table_postsuspend_targets(map);
>  	}
> -	mutex_unlock(&md->suspend_lock);
> -
>  	/* dm_put_live_table must be before msleep, otherwise deadlock is possible */
>  	dm_put_live_table(md, srcu_idx);
> +	mutex_unlock(&md->suspend_lock);
>  
>  	/*
>  	 * Rare, but there may be I/O requests still going to complete,

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
On Thu, Oct 01 2015 at 4:31am -0400,
Junichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> __dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) and
> suspend_lock in reverse order. That can cause AB-BA deadlock:
>
> Example:
>
>   __dm_destroy                    dm_swap_table
>   ---------------------------------------------------
>                                   mutex_lock(suspend_lock)
>   dm_get_live_table()
>     srcu_read_lock(io_barrier)
>                                   dm_sync_table()
>                                     synchronize_srcu(io_barrier)
>                                       .. waiting for dm_put_live_table()
>   mutex_lock(suspend_lock)
>     .. waiting for suspend_lock
>
> This patch fixes the lock ordering.
>
> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> Fixes: ab7c7bb6f4ab ("dm: hold suspend_lock while suspending device during device deletion")
> Cc: Mikulas Patocka <mpatocka@redhat.com>
> ---
> The problem could be reproduced with this script but it might take long.
> (In my environment, it took more than 10 minutes)

Hi,

Thanks for fixing this. What prompted you to chase this down? Was it
the work you were doing to reproduce Bart's blk-mq mpath failure that
exposed this issue?

FYI, interestingly, your fix looks to be applicable to this issue too:
https://bugzilla.redhat.com/show_bug.cgi?id=1267650

Thanks again,
Mike
On 10/02/15 05:45, Mike Snitzer wrote:
>> The problem could be reproduced with this script but it might take long.
>> (In my environment, it took more than 10 minutes)
>
> Hi,
>
> Thanks for fixing this. What prompted you to chase this down? Was it
> the work you were doing to reproduce Bart's blk-mq mpath failure that
> exposed this issue?

Yes.

> FYI, interestingly, your fix looks to be applicable to this issue too:
> https://bugzilla.redhat.com/show_bug.cgi?id=1267650

Oh, good coincidence.
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 6264781..7289ece 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2837,8 +2837,6 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
 
 	might_sleep();
 
-	map = dm_get_live_table(md, &srcu_idx);
-
 	spin_lock(&_minor_lock);
 	idr_replace(&_minor_idr, MINOR_ALLOCED, MINOR(disk_devt(dm_disk(md))));
 	set_bit(DMF_FREEING, &md->flags);
@@ -2852,14 +2850,14 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
 	 * do not race with internal suspend.
 	 */
 	mutex_lock(&md->suspend_lock);
+	map = dm_get_live_table(md, &srcu_idx);
 	if (!dm_suspended_md(md)) {
 		dm_table_presuspend_targets(map);
 		dm_table_postsuspend_targets(map);
 	}
-	mutex_unlock(&md->suspend_lock);
-
 	/* dm_put_live_table must be before msleep, otherwise deadlock is possible */
 	dm_put_live_table(md, srcu_idx);
+	mutex_unlock(&md->suspend_lock);
 
 	/*
 	 * Rare, but there may be I/O requests still going to complete,
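The effect of moving dm_get_live_table() inside suspend_lock can be illustrated with a small userspace model. The sketch below is a hypothetical toy in C with POSIX threads, not kernel code. `toy_srcu` and its helpers stand in for the io_barrier SRCU domain. This is a simplification: real synchronize_srcu() only waits for readers that were already inside a read-side critical section when it started. `destroy_path()` and `swap_path()` mimic the lock pattern of the patched __dm_destroy() and of dm_swap_table().

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical toy stand-in for an SRCU domain (NOT the kernel API):
 * it counts active readers, and toy_synchronize_srcu() waits until the
 * count is zero. */
struct toy_srcu {
	pthread_mutex_t lock;
	pthread_cond_t idle;
	int readers;
};

static struct toy_srcu io_barrier = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.idle = PTHREAD_COND_INITIALIZER,
	.readers = 0,
};
static pthread_mutex_t suspend_lock = PTHREAD_MUTEX_INITIALIZER;

static void toy_srcu_read_lock(struct toy_srcu *s)
{
	pthread_mutex_lock(&s->lock);
	s->readers++;
	pthread_mutex_unlock(&s->lock);
}

static void toy_srcu_read_unlock(struct toy_srcu *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->readers > 0);
	if (--s->readers == 0)
		pthread_cond_broadcast(&s->idle); /* wake waiting synchronizers */
	pthread_mutex_unlock(&s->lock);
}

static void toy_synchronize_srcu(struct toy_srcu *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->readers > 0)
		pthread_cond_wait(&s->idle, &s->lock);
	pthread_mutex_unlock(&s->lock);
}

struct path_arg {
	int iterations;
	int completed;
};

/* Patched __dm_destroy() order: suspend_lock first, then the SRCU read side. */
static void *destroy_path(void *p)
{
	struct path_arg *arg = p;

	for (int i = 0; i < arg->iterations; i++) {
		pthread_mutex_lock(&suspend_lock);
		toy_srcu_read_lock(&io_barrier);   /* like dm_get_live_table() */
		/* presuspend/postsuspend of the targets would run here */
		toy_srcu_read_unlock(&io_barrier); /* like dm_put_live_table() */
		pthread_mutex_unlock(&suspend_lock);
		arg->completed++;
	}
	return NULL;
}

/* dm_swap_table() pattern: wait for SRCU readers while holding suspend_lock. */
static void *swap_path(void *p)
{
	struct path_arg *arg = p;

	for (int i = 0; i < arg->iterations; i++) {
		pthread_mutex_lock(&suspend_lock);
		toy_synchronize_srcu(&io_barrier); /* like dm_sync_table() */
		pthread_mutex_unlock(&suspend_lock);
		arg->completed++;
	}
	return NULL;
}

/* Runs both paths concurrently; returns total completed iterations, or -1. */
int run_patched_model(int iterations)
{
	struct path_arg d = { iterations, 0 }, s = { iterations, 0 };
	pthread_t td, ts;

	if (pthread_create(&td, NULL, destroy_path, &d) != 0)
		return -1;
	if (pthread_create(&ts, NULL, swap_path, &s) != 0) {
		pthread_join(td, NULL);
		return -1;
	}
	pthread_join(td, NULL);
	pthread_join(ts, NULL);
	return d.completed + s.completed;
}
```

Calling `run_patched_model(10000)` from a `main()` should return 20000. Both paths now take suspend_lock before touching io_barrier. A reader can therefore only exist while destroy_path() holds suspend_lock, so swap_path() never waits on a reader that is itself waiting for suspend_lock. The pre-patch code called dm_get_live_table() before taking suspend_lock. Moving toy_srcu_read_lock() back in front of pthread_mutex_lock(&suspend_lock) in destroy_path() to match that order reintroduces the AB-BA cycle and can make the two threads hang.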
__dm_destroy() takes the io_barrier SRCU lock (via dm_get_live_table()) and
suspend_lock in the reverse of the order used by dm_swap_table(). That can
cause an AB-BA deadlock:

Example:

  __dm_destroy                    dm_swap_table
  ---------------------------------------------------
                                  mutex_lock(suspend_lock)
  dm_get_live_table()
    srcu_read_lock(io_barrier)
                                  dm_sync_table()
                                    synchronize_srcu(io_barrier)
                                      .. waiting for dm_put_live_table()
  mutex_lock(suspend_lock)
    .. waiting for suspend_lock

This patch fixes the lock ordering.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Fixes: ab7c7bb6f4ab ("dm: hold suspend_lock while suspending device during device deletion")
Cc: Mikulas Patocka <mpatocka@redhat.com>
---
The problem can be reproduced with this script, but it might take a long time
(in my environment, it took more than 10 minutes).

-- cut here --
#!/bin/bash

t0="0 1024 zero"
t1="0 1024 error"
mapname=testmap

work1()
{
	while true; do
		dmsetup create --notable $mapname
		echo "$t0" | dmsetup load $mapname
		dmsetup resume $mapname
		dmsetup remove_all
	done
}

work2()
{
	while true; do
		echo "$t1" | dmsetup load $mapname
		dmsetup resume $mapname
		echo "$t0" | dmsetup load $mapname
		dmsetup resume $mapname
	done
}

work1 &
work2 &
wait
-- cut here --

When started, the script emits a lot of errors such as "No such device or
address" and stops when the deadlock occurs.
Backtraces of the hung dmsetup processes look like this:

# ps auxw|grep dmsetup
root 32209 0.0 0.0 130024 3060 pts/0 D+ 03:26 0:00 dmsetup resume testmap
root 32210 0.0 0.0 130024 3048 pts/0 D+ 03:26 0:00 dmsetup remove_all

# cat /proc/32210/stack
[<ffffffffa00029ea>] __dm_destroy+0xba/0x280 [dm_mod]
[<ffffffffa0003ec3>] dm_destroy+0x13/0x20 [dm_mod]
[<ffffffffa0007edd>] dm_hash_remove_all+0x6d/0x130 [dm_mod]
[<ffffffffa0007fc2>] remove_all+0x22/0x30 [dm_mod]
[<ffffffffa0009a65>] ctl_ioctl+0x255/0x4d0 [dm_mod]
[<ffffffffa0009cf3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[<ffffffff81210c82>] do_vfs_ioctl+0x2d2/0x4b0
[<ffffffff81210ed9>] SyS_ioctl+0x79/0x90
[<ffffffff816859ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[<ffffffffffffffff>] 0xffffffffffffffff

# cat /proc/32209/stack
[<ffffffff810e1d34>] __synchronize_srcu+0xf4/0x130
[<ffffffff810e1d94>] synchronize_srcu+0x24/0x30
[<ffffffffa000406d>] dm_swap_table+0x17d/0x2e0 [dm_mod]
[<ffffffffa00090fa>] dev_suspend+0x9a/0x240 [dm_mod]
[<ffffffffa0009a65>] ctl_ioctl+0x255/0x4d0 [dm_mod]
[<ffffffffa0009cf3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[<ffffffff81210c82>] do_vfs_ioctl+0x2d2/0x4b0
[<ffffffff81210ed9>] SyS_ioctl+0x79/0x90
[<ffffffff816859ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[<ffffffffffffffff>] 0xffffffffffffffff
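The AB-BA pattern in the commit message can also be phrased as a cycle in a lock-order graph. This is the idea behind lock-order validators such as the kernel's lockdep. The sketch below is hypothetical and is not lockdep's implementation. For each path, it records "acquired B while holding A" edges and then searches for a cycle. As an assumption of this model, the synchronize_srcu(io_barrier) wait in dm_sync_table() is treated as an acquisition of io_barrier.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers used only in this sketch. */
enum lock_id { SUSPEND_LOCK, IO_BARRIER, NR_LOCKS };

#define ARRAY_LEN(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* edge[a][b] is true if some path acquires (or waits on) b while holding a. */
struct lock_graph {
	bool edge[NR_LOCKS][NR_LOCKS];
};

/* Record a path that takes seq[0], seq[1], ... while still holding
 * every lock taken earlier in the sequence. */
static void record_path(struct lock_graph *g, const enum lock_id *seq, int n)
{
	for (int i = 0; i < n; i++) {
		assert(seq[i] < NR_LOCKS);
		for (int j = i + 1; j < n; j++)
			g->edge[seq[i]][seq[j]] = true;
	}
}

/* Depth-first search; state: 0 = unvisited, 1 = on current path, 2 = done. */
static bool dfs_finds_cycle(const struct lock_graph *g, int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!g->edge[node][next])
			continue;
		if (state[next] == 1)
			return true; /* back edge: lock-order cycle */
		if (state[next] == 0 && dfs_finds_cycle(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

static bool has_cycle(const struct lock_graph *g)
{
	int state[NR_LOCKS] = { 0 };

	for (int i = 0; i < NR_LOCKS; i++)
		if (state[i] == 0 && dfs_finds_cycle(g, i, state))
			return true;
	return false;
}

/* dm_swap_table(): suspend_lock held, then synchronize_srcu(io_barrier). */
static const enum lock_id swap_table_path[] = { SUSPEND_LOCK, IO_BARRIER };

static bool can_deadlock(const enum lock_id *destroy_path, int n)
{
	struct lock_graph g;

	memset(&g, 0, sizeof(g));
	record_path(&g, swap_table_path, ARRAY_LEN(swap_table_path));
	record_path(&g, destroy_path, n);
	return has_cycle(&g);
}

/* __dm_destroy() before the patch: dm_get_live_table(), then suspend_lock. */
bool old_order_can_deadlock(void)
{
	static const enum lock_id old_destroy[] = { IO_BARRIER, SUSPEND_LOCK };

	return can_deadlock(old_destroy, ARRAY_LEN(old_destroy));
}

/* __dm_destroy() after the patch: suspend_lock, then dm_get_live_table(). */
bool new_order_can_deadlock(void)
{
	static const enum lock_id new_destroy[] = { SUSPEND_LOCK, IO_BARRIER };

	return can_deadlock(new_destroy, ARRAY_LEN(new_destroy));
}
```

With the old __dm_destroy() order, the two paths contribute the edges io_barrier -> suspend_lock and suspend_lock -> io_barrier, so `old_order_can_deadlock()` returns true. With the patched order, both paths contribute only suspend_lock -> io_barrier, so `new_order_can_deadlock()` returns false.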