diff mbox series

loop: drop loop_ctl_mutex around del_gendisk() in loop_remove()

Message ID d15e9392-44d0-f42c-cbac-859459a99395@i-love.sakura.ne.jp (mailing list archive)
State New, archived
Headers show
Series loop: drop loop_ctl_mutex around del_gendisk() in loop_remove() | expand

Commit Message

Tetsuo Handa June 11, 2021, 3:14 p.m. UTC
syzbot is reporting circular locking dependency between loop_ctl_mutex and
bdev->bd_mutex [1] due to commit c76f48eb5c084b1e ("block: take bd_mutex
around delete_partitions in del_gendisk").

But calling del_gendisk() from loop_remove() without loop_ctl_mutex held
triggers a different race problem regarding sysfs entry management. We
somehow need to serialize "add_disk() from loop_add()" and "del_gendisk()
 from loop_remove()". Fortunately, since loop_control_ioctl() is called
with no locks held, we can use "sleep and retry" approach without risking
deadlock.

Since "struct loop_device"->lo_disk->private_data is set to non-NULL at
loop_add() and is reset to NULL before calling loop_remove(), we can use
it as a flag for taking appropriate action ("sleep and retry" or "skip")
when loop_remove() is in progress.

Link: https://syzkaller.appspot.com/bug?extid=61e04e51b7ac86930589 [1]
Reported-by: syzbot <syzbot+61e04e51b7ac86930589@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+61e04e51b7ac86930589@syzkaller.appspotmail.com>
Fixes: c76f48eb5c084b1e ("block: take bd_mutex around delete_partitions in del_gendisk")
---
 drivers/block/loop.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

Comments

Tetsuo Handa June 15, 2021, 5:30 a.m. UTC | #1
Found that commit 990e78116d38059c ("block: loop: fix deadlock between open and remove") went to 5.13-rc6.

#syz fix: block: loop: fix deadlock between open and remove

Now syzbot is reporting

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(nbd_index_mutex);
                               lock(&bdev->bd_mutex);
                               lock(nbd_index_mutex);
  lock(&bdev->bd_mutex);

at https://syzkaller.appspot.com/text?tag=CrashReport&x=11b8a5bbd00000 .
syzbot June 15, 2021, 5:31 a.m. UTC | #2
> Found that commit 990e78116d38059c ("block: loop: fix deadlock between open and remove") went to 5.13-rc6.
>
> #syz fix: block: loop: fix deadlock between open and remove

Your 'fix:' command is accepted, but please keep syzkaller-bugs@googlegroups.com mailing list in CC next time. It serves as a history of what happened with each bug report. Thank you.

>
> Now syzbot is reporting
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(nbd_index_mutex);
>                                lock(&bdev->bd_mutex);
>                                lock(nbd_index_mutex);
>   lock(&bdev->bd_mutex);
>
> at https://syzkaller.appspot.com/text?tag=CrashReport&x=11b8a5bbd00000 .
>
Jan Kara June 15, 2021, 9:04 a.m. UTC | #3
On Sat 12-06-21 00:14:20, Tetsuo Handa wrote:
> syzbot is reporting circular locking dependency between loop_ctl_mutex and
> bdev->bd_mutex [1] due to commit c76f48eb5c084b1e ("block: take bd_mutex
> around delete_partitions in del_gendisk").
> 
> But calling del_gendisk() from loop_remove() without loop_ctl_mutex held
> triggers a different race problem regarding sysfs entry management. We
> somehow need to serialize "add_disk() from loop_add()" and "del_gendisk()
>  from loop_remove()". Fortunately, since loop_control_ioctl() is called
> with no locks held, we can use "sleep and retry" approach without risking
> deadlock.
> 
> Since "struct loop_device"->lo_disk->private_data is set to non-NULL at
> loop_add() and is reset to NULL before calling loop_remove(), we can use
> it as a flag for taking appropriate action ("sleep and retry" or "skip")
> when loop_remove() is in progress.
> 
> Link: https://syzkaller.appspot.com/bug?extid=61e04e51b7ac86930589 [1]
> Reported-by: syzbot <syzbot+61e04e51b7ac86930589@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Tested-by: syzbot <syzbot+61e04e51b7ac86930589@syzkaller.appspotmail.com>
> Fixes: c76f48eb5c084b1e ("block: take bd_mutex around delete_partitions in del_gendisk")

Christoph seems to have already fixed this by 990e78116d380 ("block: loop:
fix deadlock between open and remove").

								Honza
diff mbox series

Patch

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index d58d68f3c7cd..d4c9567b2904 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -2188,7 +2188,9 @@  static int loop_add(struct loop_device **l, int i)
 
 static void loop_remove(struct loop_device *lo)
 {
+	mutex_unlock(&loop_ctl_mutex);
 	del_gendisk(lo->lo_disk);
+	mutex_lock(&loop_ctl_mutex);
 	blk_cleanup_queue(lo->lo_queue);
 	blk_mq_free_tag_set(&lo->tag_set);
 	put_disk(lo->lo_disk);
@@ -2201,7 +2203,7 @@  static int find_free_cb(int id, void *ptr, void *data)
 	struct loop_device *lo = ptr;
 	struct loop_device **l = data;
 
-	if (lo->lo_state == Lo_unbound) {
+	if (lo->lo_state == Lo_unbound && lo->lo_disk->private_data) {
 		*l = lo;
 		return 1;
 	}
@@ -2254,6 +2256,13 @@  static long loop_control_ioctl(struct file *file, unsigned int cmd,
 	struct loop_device *lo;
 	int ret;
 
+	if (0) {
+unlock_and_retry:
+		mutex_unlock(&loop_ctl_mutex);
+		if (schedule_timeout_killable(HZ / 10))
+			return -EINTR;
+	}
+	debug_check_no_locks_held();
 	ret = mutex_lock_killable(&loop_ctl_mutex);
 	if (ret)
 		return ret;
@@ -2263,6 +2272,8 @@  static long loop_control_ioctl(struct file *file, unsigned int cmd,
 	case LOOP_CTL_ADD:
 		ret = loop_lookup(&lo, parm);
 		if (ret >= 0) {
+			if (!lo->lo_disk->private_data)
+				goto unlock_and_retry;
 			ret = -EEXIST;
 			break;
 		}
@@ -2272,6 +2283,8 @@  static long loop_control_ioctl(struct file *file, unsigned int cmd,
 		ret = loop_lookup(&lo, parm);
 		if (ret < 0)
 			break;
+		if (!lo->lo_disk->private_data)
+			goto unlock_and_retry;
 		ret = mutex_lock_killable(&lo->lo_mutex);
 		if (ret)
 			break;
@@ -2286,9 +2299,10 @@  static long loop_control_ioctl(struct file *file, unsigned int cmd,
 			break;
 		}
 		lo->lo_disk->private_data = NULL;
+		parm = lo->lo_number;
 		mutex_unlock(&lo->lo_mutex);
-		idr_remove(&loop_index_idr, lo->lo_number);
 		loop_remove(lo);
+		idr_remove(&loop_index_idr, parm);
 		break;
 	case LOOP_CTL_GET_FREE:
 		ret = loop_lookup(&lo, -1);