[v2,net-next,9/9] l2tp: flush workqueue before draining it

Message ID 2bdc4b63a4caea153f614c1f041f2ac3492044ed.1723011569.git.jchapman@katalix.com (mailing list archive)
State Accepted
Commit c1b2e36b8776a20a1fbdeb9d3be0637c00d671b0
Delegated to: Netdev Maintainers
Series: l2tp: misc improvements

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 29 this patch: 29
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 29 this patch: 29
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 29 this patch: 29
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 17 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-08-07--09-00 (tests: 707)

Commit Message

James Chapman Aug. 7, 2024, 6:54 a.m. UTC
syzbot exposes a race where a net used by l2tp is removed while an
existing pppol2tp socket is closed. In l2tp_pre_exit_net, l2tp queues
TUNNEL_DELETE work items to close each tunnel in the net. When these
are run, new SESSION_DELETE work items are queued to delete each
session in the tunnel. This all happens in drain_workqueue. However,
drain_workqueue allows new work items to be queued only by work items
that are already in the queue. If pppol2tp_release runs after
drain_workqueue has started, it may queue a SESSION_DELETE work item
from outside the queue, which triggers the warning below from
__queue_work.
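
For context, __queue_work() rejects, and warns about, any work queued
while the workqueue is being drained unless it is "chained" work,
i.e. work queued by a work item already running on that same queue.
Roughly (a paraphrase of the check in kernel/workqueue.c, not a
verbatim copy):

  /* in __queue_work(): while the queue is draining, only chained
   * work is accepted; anything else triggers the WARN seen below
   */
  if (unlikely(wq->flags & __WQ_DRAINING) &&
      WARN_ON_ONCE(!is_chained_work(wq)))
          return;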

Address this by flushing the workqueue before draining it, so that all
queued TUNNEL_DELETE work items run before drain_workqueue starts.
Running them queues SESSION_DELETE work items for each session in the
tunnel and marks each session dead, hence pppol2tp_release or other API
requests won't queue further SESSION_DELETE work once drain_workqueue
has started.

  WARNING: CPU: 1 PID: 5467 at kernel/workqueue.c:2259 __queue_work+0xcd3/0xf50 kernel/workqueue.c:2258
  Modules linked in:
  CPU: 1 UID: 0 PID: 5467 Comm: syz.3.43 Not tainted 6.11.0-rc1-syzkaller-00247-g3608d6aca5e7 #0
  Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
  RIP: 0010:__queue_work+0xcd3/0xf50 kernel/workqueue.c:2258
  Code: ff e8 11 84 36 00 90 0f 0b 90 e9 1e fd ff ff e8 03 84 36 00 eb 13 e8 fc 83 36 00 eb 0c e8 f5 83 36 00 eb 05 e8 ee 83 36 00 90 <0f> 0b 90 48 83 c4 60 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc
  RSP: 0018:ffffc90004607b48 EFLAGS: 00010093
  RAX: ffffffff815ce274 RBX: ffff8880661fda00 RCX: ffff8880661fda00
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: 0000000000000000 R08: ffffffff815cd6d4 R09: 0000000000000000
  R10: ffffc90004607c20 R11: fffff520008c0f85 R12: ffff88802ac33800
  R13: ffff88802ac339c0 R14: dffffc0000000000 R15: 0000000000000008
  FS:  00005555713eb500(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000008 CR3: 000000001eda6000 CR4: 00000000003506f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   queue_work_on+0x1c2/0x380 kernel/workqueue.c:2392
   pppol2tp_release+0x163/0x230 net/l2tp/l2tp_ppp.c:445
   __sock_release net/socket.c:659 [inline]
   sock_close+0xbc/0x240 net/socket.c:1421
   __fput+0x24a/0x8a0 fs/file_table.c:422
   task_work_run+0x24f/0x310 kernel/task_work.c:228
   resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
   exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
   syscall_exit_to_user_mode+0x168/0x370 kernel/entry/common.c:218
   do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7f061e9779f9
  Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007ffff1c1fce8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
  RAX: 0000000000000000 RBX: 000000000001017d RCX: 00007f061e9779f9
  RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
  RBP: 00007ffff1c1fdc0 R08: 0000000000000001 R09: 00007ffff1c1ffcf
  R10: 00007f061e800000 R11: 0000000000000246 R12: 0000000000000032
  R13: 00007ffff1c1fde0 R14: 00007ffff1c1fe00 R15: ffffffffffffffff
  </TASK>

Fixes: fc7ec7f554d7 ("l2tp: delete sessions using work queue")
Reported-by: syzbot+0e85b10481d2f5478053@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0e85b10481d2f5478053
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Tom Parkin <tparkin@katalix.com>
---
 net/l2tp/l2tp_core.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Cong Wang Aug. 11, 2024, 5:54 p.m. UTC | #1
On Wed, Aug 07, 2024 at 07:54:52AM +0100, James Chapman wrote:
> syzbot exposes a race where a net used by l2tp is removed while an
> existing pppol2tp socket is closed. In l2tp_pre_exit_net, l2tp queues
> TUNNEL_DELETE work items to close each tunnel in the net. When these
> are run, new SESSION_DELETE work items are queued to delete each
> session in the tunnel. This all happens in drain_workqueue. However,
> drain_workqueue allows only new work items if they are queued by other
> work items which are already in the queue. If pppol2tp_release runs
> after drain_workqueue has started, it may queue a SESSION_DELETE work
> item, which results in the warning below in drain_workqueue.
> 
> Address this by flushing the workqueue before drain_workqueue such
> that all queued TUNNEL_DELETE work items run before drain_workqueue is
> started. This will queue SESSION_DELETE work items for each session in
> the tunnel, hence pppol2tp_release or other API requests won't queue
> SESSION_DELETE requests once drain_workqueue is started.
> 

I am not convinced here.

1) There is a __flush_workqueue() inside drain_workqueue() too. Why
call it again?

2) What prevents new work items from being queued right between your
__flush_workqueue() and drain_workqueue()?

Thanks.
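
For reference, drain_workqueue() does begin with a flush of its own,
but only after marking the queue as draining; a rough sketch
(paraphrased from kernel/workqueue.c, not a verbatim copy):

  void drain_workqueue(struct workqueue_struct *wq)
  {
          /* mark the queue as draining: from this point on,
           * __queue_work() warns about work queued from outside
           * this workqueue
           */
          mutex_lock(&wq->mutex);
          if (!wq->nr_drainers++)
                  wq->flags |= __WQ_DRAINING;
          mutex_unlock(&wq->mutex);

          /* flush until no work is left (loop elided here); the real
           * function clears __WQ_DRAINING when the last drainer is done
           */
          __flush_workqueue(wq);
  }
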
James Chapman Aug. 12, 2024, 9:24 a.m. UTC | #2
On 11/08/2024 18:54, Cong Wang wrote:
> On Wed, Aug 07, 2024 at 07:54:52AM +0100, James Chapman wrote:
>> syzbot exposes a race where a net used by l2tp is removed while an
>> existing pppol2tp socket is closed. In l2tp_pre_exit_net, l2tp queues
>> TUNNEL_DELETE work items to close each tunnel in the net. When these
>> are run, new SESSION_DELETE work items are queued to delete each
>> session in the tunnel. This all happens in drain_workqueue. However,
>> drain_workqueue allows only new work items if they are queued by other
>> work items which are already in the queue. If pppol2tp_release runs
>> after drain_workqueue has started, it may queue a SESSION_DELETE work
>> item, which results in the warning below in drain_workqueue.
>>
>> Address this by flushing the workqueue before drain_workqueue such
>> that all queued TUNNEL_DELETE work items run before drain_workqueue is
>> started. This will queue SESSION_DELETE work items for each session in
>> the tunnel, hence pppol2tp_release or other API requests won't queue
>> SESSION_DELETE requests once drain_workqueue is started.
>>
> 
> I am not convinced here.
> 
> 1) There is a __flush_workqueue() inside drain_workqueue() too. Why
> calling it again?

Once drain_workqueue starts, it considers any new work item queued 
outside of its processing as an unexpected condition. By doing 
__flush_workqueue first, we ensure that all TUNNEL_DELETE items are 
processed, which means that SESSION_DELETE work items are queued for all 
existing sessions before drain_workqueue starts. Now if an API request 
is processed to delete a session, l2tp_session_delete will do nothing 
because session->dead has already been set.
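
The session->dead guard referred to here looks roughly like the
following (a sketch, not the exact net/l2tp/l2tp_core.c source):

  void l2tp_session_delete(struct l2tp_session *session)
  {
          /* an earlier delete request already marked the session
           * dead and queued its teardown, so do nothing
           */
          if (test_and_set_bit(0, &session->dead))
                  return;

          /* ... queue the SESSION_DELETE work for this session ... */
  }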

> 2) What prevents new work items be queued right between your
> __flush_workqueue() and drain_workqueue()?

Hmm, to do so, a new tunnel would need to be created and then deleted by 
API request once drain_workqueue starts. Is it possible to create new 
sockets in a net once a net's pre_exit starts? I didn't think so, but 
I'll recheck. Thanks!
Cong Wang Aug. 14, 2024, 6:01 p.m. UTC | #3
On Mon, Aug 12, 2024 at 10:24:11AM +0100, James Chapman wrote:
> On 11/08/2024 18:54, Cong Wang wrote:
> > On Wed, Aug 07, 2024 at 07:54:52AM +0100, James Chapman wrote:
> > > syzbot exposes a race where a net used by l2tp is removed while an
> > > existing pppol2tp socket is closed. In l2tp_pre_exit_net, l2tp queues
> > > TUNNEL_DELETE work items to close each tunnel in the net. When these
> > > are run, new SESSION_DELETE work items are queued to delete each
> > > session in the tunnel. This all happens in drain_workqueue. However,
> > > drain_workqueue allows only new work items if they are queued by other
> > > work items which are already in the queue. If pppol2tp_release runs
> > > after drain_workqueue has started, it may queue a SESSION_DELETE work
> > > item, which results in the warning below in drain_workqueue.
> > > 
> > > Address this by flushing the workqueue before drain_workqueue such
> > > that all queued TUNNEL_DELETE work items run before drain_workqueue is
> > > started. This will queue SESSION_DELETE work items for each session in
> > > the tunnel, hence pppol2tp_release or other API requests won't queue
> > > SESSION_DELETE requests once drain_workqueue is started.
> > > 
> > 
> > I am not convinced here.
> > 
> > 1) There is a __flush_workqueue() inside drain_workqueue() too. Why
> > calling it again?
> 
> Once drain_workqueue starts, it considers any new work item queued outside
> of its processing as an unexpected condition. By doing __flush_workqueue
> first, we ensure that all TUNNEL_DELETE items are processed, which means
> that SESSION_DELETE work items are queued for all existing sessions before
> drain_workqueue starts. Now if an API request is processed to delete a
> session, l2tp_session_delete will do nothing because session->dead has
> already been set.

What about the following case?

CPU 0			CPU 1
test_and_set_bit()
			__flush_workqueue()
  queue_work()
  			drain_workqueue()

> 
> > 2) What prevents new work items be queued right between your
> > __flush_workqueue() and drain_workqueue()?
> 
> Hmm, to do so, a new tunnel would need to be created and then deleted by API
> request once drain_workqueue starts. Is it possible to create new sockets in
> a net once a net's pre_exit starts? I didn't think so, but I'll recheck.

I don't quite understand why you need to create a new one while tearing
down the old one(s). I think what you need here is some way to properly
wait for the socket teardown to finish.

Thanks.
Patch

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 6ef1c80bb3bf..ddcf07d85877 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1844,8 +1844,16 @@  static __net_exit void l2tp_pre_exit_net(struct net *net)
 	}
 	rcu_read_unlock_bh();
 
-	if (l2tp_wq)
+	if (l2tp_wq) {
+		/* ensure that all TUNNEL_DELETE work items are run before
+		 * draining the work queue since TUNNEL_DELETE requests may
+		 * queue SESSION_DELETE work items for each session in the
+		 * tunnel. drain_workqueue may otherwise warn if SESSION_DELETE
+		 * requests are queued while the work queue is being drained.
+		 */
+		__flush_workqueue(l2tp_wq);
 		drain_workqueue(l2tp_wq);
+	}
 }
 
 static __net_exit void l2tp_exit_net(struct net *net)