dm: fix AB-BA deadlock in __dm_destroy()

Message ID 20151001083149.GA13075@xzibit.linux.bs1.fc.nec.co.jp (mailing list archive)
State Accepted, archived
Delegated to: Mike Snitzer

Commit Message

Junichi Nomura Oct. 1, 2015, 8:31 a.m. UTC
__dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) and
suspend_lock in reverse order. That can cause AB-BA deadlock:

Example:

  __dm_destroy                    dm_swap_table
  ---------------------------------------------------
                                  mutex_lock(suspend_lock)
  dm_get_live_table()
    srcu_read_lock(io_barrier)
                                  dm_sync_table()
                                    synchronize_srcu(io_barrier)
                                      .. waiting for dm_put_live_table()
  mutex_lock(suspend_lock)
    .. waiting for suspend_lock

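This patch fixes the lock ordering: __dm_destroy() now takes suspend_lock
before calling dm_get_live_table() and drops it only after
dm_put_live_table().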

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Fixes: ab7c7bb6f4ab ("dm: hold suspend_lock while suspending device during device deletion")
Cc: Mikulas Patocka <mpatocka@redhat.com>
---
The problem can be reproduced with the script below, though it may take a
long time (more than 10 minutes in my environment).

-- cut here --
#!/bin/bash

t0="0 1024 zero"
t1="0 1024 error"
mapname=testmap

work1()
{
	while true; do
		dmsetup create --notable $mapname
		echo "$t0" | dmsetup load $mapname
		dmsetup resume $mapname
		dmsetup remove_all
	done
}

work2()
{
	while true; do
		echo "$t1" | dmsetup load $mapname
		dmsetup resume $mapname
		echo "$t0" | dmsetup load $mapname
		dmsetup resume $mapname
	done
}

work1 &
work2 &
wait
-- cut here --

Once started, the script emits many errors such as "No such device or
address" and stalls when the deadlock occurs.
The backtraces of the blocked dmsetup processes look like this:

  # ps auxw|grep dmsetup
  root     32209  0.0  0.0 130024  3060 pts/0    D+   03:26   0:00 dmsetup resume testmap
  root     32210  0.0  0.0 130024  3048 pts/0    D+   03:26   0:00 dmsetup remove_all

  # cat /proc/32210/stack 
  [<ffffffffa00029ea>] __dm_destroy+0xba/0x280 [dm_mod]
  [<ffffffffa0003ec3>] dm_destroy+0x13/0x20 [dm_mod]
  [<ffffffffa0007edd>] dm_hash_remove_all+0x6d/0x130 [dm_mod]
  [<ffffffffa0007fc2>] remove_all+0x22/0x30 [dm_mod]
  [<ffffffffa0009a65>] ctl_ioctl+0x255/0x4d0 [dm_mod]
  [<ffffffffa0009cf3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
  [<ffffffff81210c82>] do_vfs_ioctl+0x2d2/0x4b0
  [<ffffffff81210ed9>] SyS_ioctl+0x79/0x90
  [<ffffffff816859ee>] entry_SYSCALL_64_fastpath+0x12/0x71
  [<ffffffffffffffff>] 0xffffffffffffffff

  # cat /proc/32209/stack 
  [<ffffffff810e1d34>] __synchronize_srcu+0xf4/0x130
  [<ffffffff810e1d94>] synchronize_srcu+0x24/0x30
  [<ffffffffa000406d>] dm_swap_table+0x17d/0x2e0 [dm_mod]
  [<ffffffffa00090fa>] dev_suspend+0x9a/0x240 [dm_mod]
  [<ffffffffa0009a65>] ctl_ioctl+0x255/0x4d0 [dm_mod]
  [<ffffffffa0009cf3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
  [<ffffffff81210c82>] do_vfs_ioctl+0x2d2/0x4b0
  [<ffffffff81210ed9>] SyS_ioctl+0x79/0x90
  [<ffffffff816859ee>] entry_SYSCALL_64_fastpath+0x12/0x71
  [<ffffffffffffffff>] 0xffffffffffffffff
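
The ps and /proc steps above can be wrapped in a small helper to collect the
same information once the script stalls. This is only a sketch: the function
name dump_blocked_dmsetup is made up for illustration, it assumes the procps
version of ps, and reading /proc/<pid>/stack normally requires root.

```shell
#!/bin/bash

# Print the kernel stack of every dmsetup process that is in
# uninterruptible sleep (state "D"), as seen in the ps output above.
# Assumption: procps "ps" (supports -C and header-less -o fields).
dump_blocked_dmsetup()
{
	local pid stat

	ps -C dmsetup -o pid=,stat= | while read -r pid stat; do
		case "$stat" in
		D*)
			echo "== pid $pid ($stat) =="
			# The stack file is only readable by root on most systems.
			cat "/proc/$pid/stack" 2>/dev/null || echo "(stack unavailable)"
			;;
		esac
	done
}
```

Running dump_blocked_dmsetup as root from another terminal prints nothing
while the reproducer is still making progress, and one stack per blocked
process after it hangs.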


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Mikulas Patocka Oct. 1, 2015, 12:56 p.m. UTC | #1
I think this patch is OK.

It should also be backported to stable kernels starting with 3.11. I think 
older versions are not affected because they don't have srcu.

Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org	# 3.11+

Mikulas



On Thu, 1 Oct 2015, Junichi Nomura wrote:

> __dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) and
> suspend_lock in reverse order. That can cause AB-BA deadlock:
> 
> Example:
> 
>   __dm_destroy                    dm_swap_table
>   ---------------------------------------------------
>                                   mutex_lock(suspend_lock)
>   dm_get_live_table()
>     srcu_read_lock(io_barrier)
>                                   dm_sync_table()
>                                     synchronize_srcu(io_barrier)
>                                       .. waiting for dm_put_live_table()
>   mutex_lock(suspend_lock)
>     .. waiting for suspend_lock
> 
> This patch fixes the lock ordering.
> 
> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> Fixes: ab7c7bb6f4ab ("dm: hold suspend_lock while suspending device during device deletion")
> Cc: Mikulas Patocka <mpatocka@redhat.com>
> ---
> The problem could be reproduced with this script but it might take long.
> (In my environment, it took more than 10 minutes)
> 
> -- cut here --
> #!/bin/bash
> 
> t0="0 1024 zero"
> t1="0 1024 error"
> mapname=testmap
> 
> work1()
> {
> 	while true; do
> 		dmsetup create --notable $mapname
> 		echo "$t0" | dmsetup load $mapname
> 		dmsetup resume $mapname
> 		dmsetup remove_all
> 	done
> }
> 
> work2()
> {
> 	while true; do
> 		echo "$t1" | dmsetup load $mapname
> 		dmsetup resume $mapname
> 		echo "$t0" | dmsetup load $mapname
> 		dmsetup resume $mapname
> 	done
> }
> 
> work1 &
> work2 &
> wait
> -- cut here --
> 
> When starting the script, it will emit a lot of errors such as "No such
> device or address" and stops when the deadlock occurs.
> Backtrace of dmsetup will look like this:
> 
>   # ps auxw|grep dmsetup
>   root     32209  0.0  0.0 130024  3060 pts/0    D+   03:26   0:00 dmsetup resume testmap
>   root     32210  0.0  0.0 130024  3048 pts/0    D+   03:26   0:00 dmsetup remove_all
> 
>   # cat /proc/32210/stack 
>   [<ffffffffa00029ea>] __dm_destroy+0xba/0x280 [dm_mod]
>   [<ffffffffa0003ec3>] dm_destroy+0x13/0x20 [dm_mod]
>   [<ffffffffa0007edd>] dm_hash_remove_all+0x6d/0x130 [dm_mod]
>   [<ffffffffa0007fc2>] remove_all+0x22/0x30 [dm_mod]
>   [<ffffffffa0009a65>] ctl_ioctl+0x255/0x4d0 [dm_mod]
>   [<ffffffffa0009cf3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
>   [<ffffffff81210c82>] do_vfs_ioctl+0x2d2/0x4b0
>   [<ffffffff81210ed9>] SyS_ioctl+0x79/0x90
>   [<ffffffff816859ee>] entry_SYSCALL_64_fastpath+0x12/0x71
>   [<ffffffffffffffff>] 0xffffffffffffffff
> 
>   # cat /proc/32209/stack 
>   [<ffffffff810e1d34>] __synchronize_srcu+0xf4/0x130
>   [<ffffffff810e1d94>] synchronize_srcu+0x24/0x30
>   [<ffffffffa000406d>] dm_swap_table+0x17d/0x2e0 [dm_mod]
>   [<ffffffffa00090fa>] dev_suspend+0x9a/0x240 [dm_mod]
>   [<ffffffffa0009a65>] ctl_ioctl+0x255/0x4d0 [dm_mod]
>   [<ffffffffa0009cf3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
>   [<ffffffff81210c82>] do_vfs_ioctl+0x2d2/0x4b0
>   [<ffffffff81210ed9>] SyS_ioctl+0x79/0x90
>   [<ffffffff816859ee>] entry_SYSCALL_64_fastpath+0x12/0x71
>   [<ffffffffffffffff>] 0xffffffffffffffff
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 6264781..7289ece 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2837,8 +2837,6 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
>  
>  	might_sleep();
>  
> -	map = dm_get_live_table(md, &srcu_idx);
> -
>  	spin_lock(&_minor_lock);
>  	idr_replace(&_minor_idr, MINOR_ALLOCED, MINOR(disk_devt(dm_disk(md))));
>  	set_bit(DMF_FREEING, &md->flags);
> @@ -2852,14 +2850,14 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
>  	 * do not race with internal suspend.
>  	 */
>  	mutex_lock(&md->suspend_lock);
> +	map = dm_get_live_table(md, &srcu_idx);
>  	if (!dm_suspended_md(md)) {
>  		dm_table_presuspend_targets(map);
>  		dm_table_postsuspend_targets(map);
>  	}
> -	mutex_unlock(&md->suspend_lock);
> -
>  	/* dm_put_live_table must be before msleep, otherwise deadlock is possible */
>  	dm_put_live_table(md, srcu_idx);
> +	mutex_unlock(&md->suspend_lock);
>  
>  	/*
>  	 * Rare, but there may be I/O requests still going to complete,

Mike Snitzer Oct. 1, 2015, 8:45 p.m. UTC | #2
On Thu, Oct 01 2015 at  4:31am -0400,
Junichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> __dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) and
> suspend_lock in reverse order. That can cause AB-BA deadlock:
> 
> Example:
> 
>   __dm_destroy                    dm_swap_table
>   ---------------------------------------------------
>                                   mutex_lock(suspend_lock)
>   dm_get_live_table()
>     srcu_read_lock(io_barrier)
>                                   dm_sync_table()
>                                     synchronize_srcu(io_barrier)
>                                       .. waiting for dm_put_live_table()
>   mutex_lock(suspend_lock)
>     .. waiting for suspend_lock
> 
> This patch fixes the lock ordering.
> 
> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> Fixes: ab7c7bb6f4ab ("dm: hold suspend_lock while suspending device during device deletion")
> Cc: Mikulas Patocka <mpatocka@redhat.com>

> ---
> The problem could be reproduced with this script but it might take long.
> (In my environment, it took more than 10 minutes)

Hi,

Thanks for fixing this.  What prompted you to chase this down?  Was it
the work you were doing to reproduce Bart's blk-mq mpath failure that
exposed this issue?

FYI, interestingly, your fix looks to be applicable to this issue too:
https://bugzilla.redhat.com/show_bug.cgi?id=1267650

Thanks again,
Mike

Junichi Nomura Oct. 2, 2015, 2:45 a.m. UTC | #3
On 10/02/15 05:45, Mike Snitzer wrote:
>> The problem could be reproduced with this script but it might take long.
>> (In my environment, it took more than 10 minutes)
> 
> Hi,
> 
> Thanks for fixing this.  What prompted you to chase this down?  Was it
> the work you were doing to reproduce Bart's blk-mq mpath failure that
> exposed this issue?

Yes. 

> FYI, interestingly, your fix looks to be applicable to this issue too:
> https://bugzilla.redhat.com/show_bug.cgi?id=1267650

Oh, good coincidence.

Patch

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 6264781..7289ece 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2837,8 +2837,6 @@  static void __dm_destroy(struct mapped_device *md, bool wait)
 
 	might_sleep();
 
-	map = dm_get_live_table(md, &srcu_idx);
-
 	spin_lock(&_minor_lock);
 	idr_replace(&_minor_idr, MINOR_ALLOCED, MINOR(disk_devt(dm_disk(md))));
 	set_bit(DMF_FREEING, &md->flags);
@@ -2852,14 +2850,14 @@  static void __dm_destroy(struct mapped_device *md, bool wait)
 	 * do not race with internal suspend.
 	 */
 	mutex_lock(&md->suspend_lock);
+	map = dm_get_live_table(md, &srcu_idx);
 	if (!dm_suspended_md(md)) {
 		dm_table_presuspend_targets(map);
 		dm_table_postsuspend_targets(map);
 	}
-	mutex_unlock(&md->suspend_lock);
-
 	/* dm_put_live_table must be before msleep, otherwise deadlock is possible */
 	dm_put_live_table(md, srcu_idx);
+	mutex_unlock(&md->suspend_lock);
 
 	/*
 	 * Rare, but there may be I/O requests still going to complete,