Message ID | 20231010010814.1799012-2-twuufnxlz@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Johannes Berg |
Series | rfkill: fix deadlock in rfkill_send_events |
On Tue, Oct 10, 2023 at 09:08:15AM +0800, Edward AD wrote:
> syzbot report:
> syz-executor675/5132 is trying to acquire lock:
> ffff8880297ee088 (&data->mtx){+.+.}-{3:3}, at: rfkill_send_events+0x226/0x3f0 net/rfkill/core.c:286
>
> but task is already holding lock:
> ffff88801bfc0088 (&data->mtx){+.+.}-{3:3}, at: rfkill_fop_open+0x146/0x750 net/rfkill/core.c:1183
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&data->mtx);
>   lock(&data->mtx);
>
>  *** DEADLOCK ***
>
> In 2c3dfba4cf84 insert rfkill_sync() to rfkill_fop_open(), it will call
> rfkill_send_events() and then trigger this issue.
>
> Fixes: 2c3dfba4cf84 ("rfkill: sync before userspace visibility/changes")
> Reported-and-tested-by: syzbot+509238e523e032442b80@syzkaller.appspotmail.com
> Signed-off-by: Edward AD <twuufnxlz@gmail.com>

Hi Edward,

I am wondering if you considered moving the rfkill_sync() calls
to before &data->mtx is taken, to avoid the need to drop and
retake it?

Perhaps it doesn't work for some reason (compile tested only!).
But this does seem somehow cleaner for me.
Hi Simon Horman,

On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> I am wondering if you considered moving the rfkill_sync() calls
> to before &data->mtx is taken, to avoid the need to drop and
> retake it?
If you move rfkill_sync() before taking &data->mtx, more code will be added
because rfkill_sync() is in the loop body.
>
> Perhaps it doesn't work for some reason (compile tested only!).
> But this does seem somehow cleaner for me.
BR,
edward
On Sat, Oct 14, 2023 at 10:43:22AM +0800, Edward AD wrote:
> Hi Simon Horman,
> On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> > I am wondering if you considered moving the rfkill_sync() calls
> > to before &data->mtx is taken, to avoid the need to drop and
> > retake it?
> If you move rfkill_sync() before taking &data->mtx, more code will be added
> because rfkill_sync() is in the loop body.

Maybe that is true. And maybe that is a good argument for
not taking the approach that I suggested. But I do think it
is simpler from a locking perspective, and that has some merit.

> >
> > Perhaps it doesn't work for some reason (compile tested only!).
> > But this does seem somehow cleaner for me.
> BR,
> edward
On Sat, 2023-10-14 at 09:29 +0200, Simon Horman wrote:
> On Sat, Oct 14, 2023 at 10:43:22AM +0800, Edward AD wrote:
> > Hi Simon Horman,
> > On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> > > I am wondering if you considered moving the rfkill_sync() calls
> > > to before &data->mtx is taken, to avoid the need to drop and
> > > retake it?
> > If you move rfkill_sync() before taking &data->mtx, more code will be added
> > because rfkill_sync() is in the loop body.
>
> Maybe that is true. And maybe that is a good argument for
> not taking the approach that I suggested. But I do think it
> is simpler from a locking perspective, and that has some merit.
>

FWIW, I missed this patch and discussion until now, but I already fixed
the issue differently:

https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless.git/commit/?id=f2ac54ebf85615a6d78f5eb213a8bbeeb17ebe5d

There was never any need to hold the data->mtx for anything but the list
manipulation, and even that isn't _really_ needed since the 'data' is
completely fresh and not seen anywhere else yet.

(I'll also note that the subject of this thread is wrong since this was
never an *actual* deadlock, just a *possible* one reported by lockdep.)

johannes
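To illustrate the point that a freshly allocated object needs no lock of its own until it is published, here is a minimal userspace sketch using pthreads. It is not the code from the linked commit, only an analogy: `fd_data`, `fd_open()`, `send_events()`, `push_event()`, `global_mutex` and `open_fds` are hypothetical stand-ins for the rfkill structures and functions, and `send_events()` is assumed to play the role of a sync step that notifies already-open fds.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from net/rfkill/core.c. */
struct event {
	int id;
	struct event *next;
};

struct fd_data {
	pthread_mutex_t mtx;     /* protects events once the object is published */
	struct event *events;
	size_t n_events;
	struct fd_data *next;    /* link in open_fds */
};

static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct fd_data *open_fds; /* protected by global_mutex */

/* Prepend one event to d; the caller is responsible for any locking. */
static int push_event(struct fd_data *d, int id)
{
	struct event *ev = malloc(sizeof(*ev));

	if (!ev)
		return -1;
	ev->id = id;
	ev->next = d->events;
	d->events = ev;
	d->n_events++;
	return 0;
}

/* Notify every published fd. Caller holds global_mutex. */
static void send_events(int id)
{
	for (struct fd_data *d = open_fds; d; d = d->next) {
		pthread_mutex_lock(&d->mtx);
		(void)push_event(d, id); /* this sketch drops the event on OOM */
		pthread_mutex_unlock(&d->mtx);
	}
}

static struct fd_data *fd_open(int n_devices)
{
	struct fd_data *d = calloc(1, sizeof(*d));

	assert(n_devices >= 0);
	if (!d)
		return NULL;
	pthread_mutex_init(&d->mtx, NULL);

	pthread_mutex_lock(&global_mutex);
	for (int i = 0; i < n_devices; i++) {
		send_events(i);            /* may lock other fds' mtx */
		/* d is not on open_fds yet, so nobody else can see it:
		 * its event list can be filled without taking d->mtx. */
		if (push_event(d, i) < 0)
			goto fail;
	}
	/* Publish: from here on, send_events() will find d. */
	d->next = open_fds;
	open_fds = d;
	pthread_mutex_unlock(&global_mutex);
	return d;

fail:
	pthread_mutex_unlock(&global_mutex);
	while (d->events) {
		struct event *ev = d->events;

		d->events = ev->next;
		free(ev);
	}
	pthread_mutex_destroy(&d->mtx);
	free(d);
	return NULL;
}
```

In this sketch the opener never holds its own `mtx` while `send_events()` takes the `mtx` of other objects, so the nested acquisition of two locks of the same kind does not occur at all.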
diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 08630896b6c8..a14e0d4a0b00 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -1180,7 +1180,6 @@ static int rfkill_fop_open(struct inode *inode, struct file *file)
 	init_waitqueue_head(&data->read_wait);
 
 	mutex_lock(&rfkill_global_mutex);
-	mutex_lock(&data->mtx);
 	/*
 	 * start getting events from elsewhere but hold mtx to get
 	 * startup events added first
@@ -1191,9 +1190,12 @@ static int rfkill_fop_open(struct inode *inode, struct file *file)
 		if (!ev)
 			goto free;
 		rfkill_sync(rfkill);
+		mutex_lock(&data->mtx);
 		rfkill_fill_event(&ev->ev, rfkill, RFKILL_OP_ADD);
 		list_add_tail(&ev->list, &data->events);
+		mutex_unlock(&data->mtx);
 	}
+	mutex_lock(&data->mtx);
 	list_add(&data->list, &rfkill_fds);
 	mutex_unlock(&data->mtx);
 	mutex_unlock(&rfkill_global_mutex);
syzbot report:
syz-executor675/5132 is trying to acquire lock:
ffff8880297ee088 (&data->mtx){+.+.}-{3:3}, at: rfkill_send_events+0x226/0x3f0 net/rfkill/core.c:286

but task is already holding lock:
ffff88801bfc0088 (&data->mtx){+.+.}-{3:3}, at: rfkill_fop_open+0x146/0x750 net/rfkill/core.c:1183

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&data->mtx);
  lock(&data->mtx);

 *** DEADLOCK ***

In 2c3dfba4cf84 insert rfkill_sync() to rfkill_fop_open(), it will call
rfkill_send_events() and then trigger this issue.

Fixes: 2c3dfba4cf84 ("rfkill: sync before userspace visibility/changes")
Reported-and-tested-by: syzbot+509238e523e032442b80@syzkaller.appspotmail.com
Signed-off-by: Edward AD <twuufnxlz@gmail.com>
---
 net/rfkill/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
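
The report lists two different lock addresses (ffff8880297ee088 and ffff88801bfc0088) under the same name, &data->mtx. lockdep validates locking by lock class rather than by individual instance, so holding one lock of a class while acquiring another of the same class is flagged as possible recursion even when the two objects are distinct. The toy model below is a rough sketch of that one rule only; real lockdep tracks far more (dependency chains, IRQ state, nesting annotations), and `toy_lock`, `held_set`, `toy_acquire()` and `toy_release()` are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD 8

/* Toy lock: class_id stands in for a lock class (one per init site). */
struct toy_lock {
	int class_id;
};

/* Locks currently held by one task. */
struct held_set {
	const struct toy_lock *held[MAX_HELD];
	size_t n;
};

/*
 * Try to acquire l. Refuse (return false) if a lock of the same class
 * is already held, regardless of whether it is the same instance.
 */
static bool toy_acquire(struct held_set *hs, const struct toy_lock *l)
{
	assert(hs->n <= MAX_HELD);
	for (size_t i = 0; i < hs->n; i++)
		if (hs->held[i]->class_id == l->class_id)
			return false; /* same class already held: flagged */
	if (hs->n == MAX_HELD)
		return false;     /* toy limit reached */
	hs->held[hs->n++] = l;
	return true;
}

/* Release l if it is held; order within the set does not matter here. */
static void toy_release(struct held_set *hs, const struct toy_lock *l)
{
	for (size_t i = 0; i < hs->n; i++) {
		if (hs->held[i] == l) {
			hs->held[i] = hs->held[--hs->n];
			return;
		}
	}
}
```

With two distinct `toy_lock` objects sharing one `class_id`, acquiring the second while the first is held is rejected, and succeeds once the first has been released. That mirrors why the splat appears even though the newly opened fd's mutex and the one taken in rfkill_send_events() belong to different objects.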