
rfkill: fix deadlock in rfkill_send_events

Message ID 20231010010814.1799012-2-twuufnxlz@gmail.com (mailing list archive)
State Superseded
Delegated to: Johannes Berg
Series rfkill: fix deadlock in rfkill_send_events

Commit Message

Edward AD Oct. 10, 2023, 1:08 a.m. UTC
syzbot report:
syz-executor675/5132 is trying to acquire lock:
ffff8880297ee088 (&data->mtx){+.+.}-{3:3}, at: rfkill_send_events+0x226/0x3f0 net/rfkill/core.c:286

but task is already holding lock:
ffff88801bfc0088 (&data->mtx){+.+.}-{3:3}, at: rfkill_fop_open+0x146/0x750 net/rfkill/core.c:1183

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&data->mtx);
  lock(&data->mtx);

 *** DEADLOCK ***

Commit 2c3dfba4cf84 added a call to rfkill_sync() in rfkill_fop_open(); that
path can reach rfkill_send_events(), which takes data->mtx while
rfkill_fop_open() already holds it, triggering the report above.
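
The path lockdep flags, reconstructed from the report above, is roughly:

  rfkill_fop_open()            core.c:1183, holds the new fd's data->mtx
    rfkill_sync()
      ...
        rfkill_send_events()   core.c:286, takes each open fd's data->mtx

The mutex taken in rfkill_send_events() belongs to a different fd (the new
fd is not on rfkill_fds yet at that point), but it is the same lock class,
so lockdep reports the recursion.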

Fixes: 2c3dfba4cf84 ("rfkill: sync before userspace visibility/changes")
Reported-and-tested-by: syzbot+509238e523e032442b80@syzkaller.appspotmail.com
Signed-off-by: Edward AD <twuufnxlz@gmail.com>
---
 net/rfkill/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Simon Horman Oct. 13, 2023, 11:06 a.m. UTC | #1
On Tue, Oct 10, 2023 at 09:08:15AM +0800, Edward AD wrote:
> syzbot report:
> syz-executor675/5132 is trying to acquire lock:
> ffff8880297ee088 (&data->mtx){+.+.}-{3:3}, at: rfkill_send_events+0x226/0x3f0 net/rfkill/core.c:286
> 
> but task is already holding lock:
> ffff88801bfc0088 (&data->mtx){+.+.}-{3:3}, at: rfkill_fop_open+0x146/0x750 net/rfkill/core.c:1183
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(&data->mtx);
>   lock(&data->mtx);
> 
>  *** DEADLOCK ***
> 
> Commit 2c3dfba4cf84 added a call to rfkill_sync() in rfkill_fop_open(); that
> path can reach rfkill_send_events(), which takes data->mtx while
> rfkill_fop_open() already holds it, triggering the report above.
> 
> Fixes: 2c3dfba4cf84 ("rfkill: sync before userspace visibility/changes")
> Reported-and-tested-by: syzbot+509238e523e032442b80@syzkaller.appspotmail.com
> Signed-off-by: Edward AD <twuufnxlz@gmail.com>

Hi Edward,

I am wondering if you considered moving the rfkill_sync() calls
to before &data->mtx is taken, to avoid the need to drop and
retake it?

Perhaps it doesn't work for some reason (compile tested only!).
But this does seem somehow cleaner to me.
Edward AD Oct. 14, 2023, 2:43 a.m. UTC | #2
Hi Simon Horman,
On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> I am wondering if you considered moving the rfkill_sync() calls
> to before &data->mtx is taken, to avoid the need to drop and
> retake it?
If rfkill_sync() is moved to before &data->mtx is taken, more code is needed,
because rfkill_sync() is currently called inside the loop body.
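
Roughly, it would need a separate sync pass before the lock is taken, e.g.
(untested sketch; the allocation and loop details are assumed to match the
existing rfkill_fop_open() code):

	mutex_lock(&rfkill_global_mutex);

	/* extra pass: sync every rfkill before data->mtx is taken */
	list_for_each_entry(rfkill, &rfkill_list, node)
		rfkill_sync(rfkill);

	mutex_lock(&data->mtx);
	list_for_each_entry(rfkill, &rfkill_list, node) {
		ev = kzalloc(sizeof(*ev), GFP_KERNEL);
		if (!ev) {
			mutex_unlock(&data->mtx);
			goto free;
		}
		rfkill_fill_event(&ev->ev, rfkill, RFKILL_OP_ADD);
		list_add_tail(&ev->list, &data->events);
	}
	list_add(&data->list, &rfkill_fds);
	mutex_unlock(&data->mtx);
	mutex_unlock(&rfkill_global_mutex);

That holds data->mtx only once, but it walks rfkill_list twice and needs the
extra unlock on the error path.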
> 
> Perhaps it doesn't work for some reason (compile tested only!).
> But this does seem somehow cleaner to me.
BR,
edward
Simon Horman Oct. 14, 2023, 7:29 a.m. UTC | #3
On Sat, Oct 14, 2023 at 10:43:22AM +0800, Edward AD wrote:
> Hi Simon Horman,
> On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> > I am wondering if you considered moving the rfkill_sync() calls
> > to before &data->mtx is taken, to avoid the need to drop and
> > retake it?
> If rfkill_sync() is moved to before &data->mtx is taken, more code is needed,
> because rfkill_sync() is currently called inside the loop body.

Maybe that is true. And maybe that is a good argument for
not taking the approach that I suggested. But I do think it
is simpler from a locking perspective, and that has some merit.

> > 
> > Perhaps it doesn't work for some reason (compile tested only!).
> > But this does seem somehow cleaner to me.
> BR,
> edward
>
Johannes Berg Oct. 14, 2023, 8:01 p.m. UTC | #4
On Sat, 2023-10-14 at 09:29 +0200, Simon Horman wrote:
> On Sat, Oct 14, 2023 at 10:43:22AM +0800, Edward AD wrote:
> > Hi Simon Horman,
> > On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> > > I am wondering if you considered moving the rfkill_sync() calls
> > > to before &data->mtx is taken, to avoid the need to drop and
> > > retake it?
> > If rfkill_sync() is moved to before &data->mtx is taken, more code is needed,
> > because rfkill_sync() is currently called inside the loop body.
> 
> Maybe that is true. And maybe that is a good argument for
> not taking the approach that I suggested. But I do think it
> is simpler from a locking perspective, and that has some merit.
> 

FWIW, I missed this patch and discussion until now, but I already fixed
the issue differently:

https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless.git/commit/?id=f2ac54ebf85615a6d78f5eb213a8bbeeb17ebe5d

There was never any need to hold the data->mtx for anything but the list
manipulation, and even that isn't _really_ needed since the 'data' is
completely fresh and not seen anywhere else yet.
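
In rough shape that means something like this (the linked commit is the
authoritative version; allocation and loop details assumed from the existing
code):

	mutex_lock(&rfkill_global_mutex);
	list_for_each_entry(rfkill, &rfkill_list, node) {
		ev = kzalloc(sizeof(*ev), GFP_KERNEL);
		if (!ev)
			goto free;
		rfkill_sync(rfkill);
		rfkill_fill_event(&ev->ev, rfkill, RFKILL_OP_ADD);
		/* 'data' is not reachable by anyone else yet */
		list_add_tail(&ev->list, &data->events);
	}

	mutex_lock(&data->mtx);
	list_add(&data->list, &rfkill_fds);
	mutex_unlock(&data->mtx);
	mutex_unlock(&rfkill_global_mutex);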

(I'll also note that the subject of this thread is wrong since this was
never an *actual* deadlock, just a *possible* one reported by lockdep.)

johannes

Patch

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 08630896b6c8..a14e0d4a0b00 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -1180,7 +1180,6 @@ static int rfkill_fop_open(struct inode *inode, struct file *file)
 	init_waitqueue_head(&data->read_wait);
 
 	mutex_lock(&rfkill_global_mutex);
-	mutex_lock(&data->mtx);
 	/*
 	 * start getting events from elsewhere but hold mtx to get
 	 * startup events added first
@@ -1191,9 +1190,12 @@ static int rfkill_fop_open(struct inode *inode, struct file *file)
 		if (!ev)
 			goto free;
 		rfkill_sync(rfkill);
+		mutex_lock(&data->mtx);
 		rfkill_fill_event(&ev->ev, rfkill, RFKILL_OP_ADD);
 		list_add_tail(&ev->list, &data->events);
+		mutex_unlock(&data->mtx);
 	}
+	mutex_lock(&data->mtx);
 	list_add(&data->list, &rfkill_fds);
 	mutex_unlock(&data->mtx);
 	mutex_unlock(&rfkill_global_mutex);