rfkill: fix deadlock in rfkill_send_events

Message ID	20231010010814.1799012-2-twuufnxlz@gmail.com (mailing list archive)
State	Superseded
Delegated to:	Johannes Berg
Headers	show Return-Path: <linux-wireless-owner@vger.kernel.org> From: Edward AD <twuufnxlz@gmail.com> To: syzbot+509238e523e032442b80@syzkaller.appspotmail.com Cc: davem@davemloft.net, edumazet@google.com, johannes.berg@intel.com, johannes@sipsolutions.net, kuba@kernel.org, linux-kernel@vger.kernel.org, linux-wireless@vger.kernel.org, netdev@vger.kernel.org, pabeni@redhat.com, syzkaller-bugs@googlegroups.com Subject: [PATCH] rfkill: fix deadlock in rfkill_send_events Date: Tue, 10 Oct 2023 09:08:15 +0800 Message-ID: <20231010010814.1799012-2-twuufnxlz@gmail.com> In-Reply-To: <000000000000e3788c06074e2b84@google.com> References: <000000000000e3788c06074e2b84@google.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	rfkill: fix deadlock in rfkill_send_events \| expand rfkill: fix deadlock in rfkill_send_events

Message ID

20231010010814.1799012-2-twuufnxlz@gmail.com (mailing list archive)

State

Superseded

Delegated to:

Johannes Berg

Headers

From: Edward AD <twuufnxlz@gmail.com>
To: syzbot+509238e523e032442b80@syzkaller.appspotmail.com
Cc: davem@davemloft.net, edumazet@google.com, johannes.berg@intel.com,
        johannes@sipsolutions.net, kuba@kernel.org,
        linux-kernel@vger.kernel.org, linux-wireless@vger.kernel.org,
        netdev@vger.kernel.org, pabeni@redhat.com,
        syzkaller-bugs@googlegroups.com
Subject: [PATCH] rfkill: fix deadlock in rfkill_send_events
Date: Tue, 10 Oct 2023 09:08:15 +0800
Message-ID: <20231010010814.1799012-2-twuufnxlz@gmail.com>
In-Reply-To: <000000000000e3788c06074e2b84@google.com>
References: <000000000000e3788c06074e2b84@google.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

rfkill: fix deadlock in rfkill_send_events | expand

Commit Message

Edward AD Oct. 10, 2023, 1:08 a.m. UTC

syzbot report:
syz-executor675/5132 is trying to acquire lock:
ffff8880297ee088 (&data->mtx){+.+.}-{3:3}, at: rfkill_send_events+0x226/0x3f0 net/rfkill/core.c:286

but task is already holding lock:
ffff88801bfc0088 (&data->mtx){+.+.}-{3:3}, at: rfkill_fop_open+0x146/0x750 net/rfkill/core.c:1183

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&data->mtx);
  lock(&data->mtx);

 *** DEADLOCK ***

In 2c3dfba4cf84 insert rfkill_sync() to rfkill_fop_open(), it will call
rfkill_send_events() and then triger this issue.

Fixes: 2c3dfba4cf84 ("rfkill: sync before userspace visibility/changes")
Reported-and-tested-by: syzbot+509238e523e032442b80@syzkaller.appspotmail.com
Signed-off-by: Edward AD <twuufnxlz@gmail.com>
---
 net/rfkill/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Simon Horman Oct. 13, 2023, 11:06 a.m. UTC | #1

On Tue, Oct 10, 2023 at 09:08:15AM +0800, Edward AD wrote:
> syzbot report:
> syz-executor675/5132 is trying to acquire lock:
> ffff8880297ee088 (&data->mtx){+.+.}-{3:3}, at: rfkill_send_events+0x226/0x3f0 net/rfkill/core.c:286
> 
> but task is already holding lock:
> ffff88801bfc0088 (&data->mtx){+.+.}-{3:3}, at: rfkill_fop_open+0x146/0x750 net/rfkill/core.c:1183
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(&data->mtx);
>   lock(&data->mtx);
> 
>  *** DEADLOCK ***
> 
> In 2c3dfba4cf84 insert rfkill_sync() to rfkill_fop_open(), it will call
> rfkill_send_events() and then triger this issue.
> 
> Fixes: 2c3dfba4cf84 ("rfkill: sync before userspace visibility/changes")
> Reported-and-tested-by: syzbot+509238e523e032442b80@syzkaller.appspotmail.com
> Signed-off-by: Edward AD <twuufnxlz@gmail.com>

Hi Edward,

I am wondering if you considered moving the rfkill_sync() calls
to before &data->mtx is taken, to avoid the need to drop and
retake it?

Perhaps it doesn't work for some reason (compile tested only!).
But this does seem somehow cleaner for me.

Edward AD Oct. 14, 2023, 2:43 a.m. UTC | #2

Hi Simon Horman,
On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> I am wondering if you considered moving the rfkill_sync() calls
> to before &data->mtx is taken, to avoid the need to drop and
> retake it?
If you move rfkill_sync() before calling &data->mtx, more code will be added 
because rfkill_sync() is in the loop body.
> 
> Perhaps it doesn't work for some reason (compile tested only!).
> But this does seem somehow cleaner for me.
BR,
edward

Simon Horman Oct. 14, 2023, 7:29 a.m. UTC | #3

On Sat, Oct 14, 2023 at 10:43:22AM +0800, Edward AD wrote:
> Hi Simon Horman,
> On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> > I am wondering if you considered moving the rfkill_sync() calls
> > to before &data->mtx is taken, to avoid the need to drop and
> > retake it?
> If you move rfkill_sync() before calling &data->mtx, more code will be added 
> because rfkill_sync() is in the loop body.

Maybe that is true. And maybe that is a good argument for
not taking the approach that I suggested. But I do think it
is simpler from a locking perspective, and that has some merit.

> > 
> > Perhaps it doesn't work for some reason (compile tested only!).
> > But this does seem somehow cleaner for me.
> BR,
> edward
>

Johannes Berg Oct. 14, 2023, 8:01 p.m. UTC | #4

On Sat, 2023-10-14 at 09:29 +0200, Simon Horman wrote:
> On Sat, Oct 14, 2023 at 10:43:22AM +0800, Edward AD wrote:
> > Hi Simon Horman,
> > On Fri, 13 Oct 2023 13:06:38 +0200, Simon Horman wrote:
> > > I am wondering if you considered moving the rfkill_sync() calls
> > > to before &data->mtx is taken, to avoid the need to drop and
> > > retake it?
> > If you move rfkill_sync() before calling &data->mtx, more code will be added 
> > because rfkill_sync() is in the loop body.
> 
> Maybe that is true. And maybe that is a good argument for
> not taking the approach that I suggested. But I do think it
> is simpler from a locking perspective, and that has some merit.
> 

FWIW, I missed this patch and discussion until now, but I already fixed
the issue differently:

https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless.git/commit/?id=f2ac54ebf85615a6d78f5eb213a8bbeeb17ebe5d

There was never any need to hold the data->mtx for anything but the list
manipulation, and even that isn't _really_ needed since the 'data' is
completely fresh and not seen anywhere else yet.

(I'll also note that the subject of this thread is wrong since this was
never an *actual* deadlock, just a *possible* one reported by lockdep.)

johannes

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 08630896b6c8..a14e0d4a0b00 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -1180,7 +1180,6 @@  static int rfkill_fop_open(struct inode *inode, struct file *file)
 	init_waitqueue_head(&data->read_wait);
 
 	mutex_lock(&rfkill_global_mutex);
-	mutex_lock(&data->mtx);
 	/*
 	 * start getting events from elsewhere but hold mtx to get
 	 * startup events added first
@@ -1191,9 +1190,12 @@  static int rfkill_fop_open(struct inode *inode, struct file *file)
 		if (!ev)
 			goto free;
 		rfkill_sync(rfkill);
+		mutex_lock(&data->mtx);
 		rfkill_fill_event(&ev->ev, rfkill, RFKILL_OP_ADD);
 		list_add_tail(&ev->list, &data->events);
+		mutex_unlock(&data->mtx);
 	}
+	mutex_lock(&data->mtx);
 	list_add(&data->list, &rfkill_fds);
 	mutex_unlock(&data->mtx);
 	mutex_unlock(&rfkill_global_mutex);

rfkill: fix deadlock in rfkill_send_events

Commit Message

Comments

Patch