[net] netfilter: nfnetlink_queue: acquire rcu_read_lock() in instance_destroy_rcu()

Message ID 20240515132339.3346267-1-edumazet@google.com (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 925 this patch: 925
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 8 of 9 maintainers
netdev/build_clang              success  Errors and warnings before: 936 this patch: 936
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 936 this patch: 936
netdev/checkpatch               warning  WARNING: Possible repeated word: 'Google'
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-05-17--09-00 (tests: 1035)

Commit Message

Eric Dumazet May 15, 2024, 1:23 p.m. UTC
syzbot reported that nf_reinject() could be called without rcu_read_lock() :

WARNING: suspicious RCU usage
6.9.0-rc7-syzkaller-02060-g5c1672705a1a #0 Not tainted

net/netfilter/nfnetlink_queue.c:263 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
2 locks held by syz-executor.4/13427:
  #0: ffffffff8e334f60 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
  #0: ffffffff8e334f60 (rcu_callback){....}-{0:0}, at: rcu_do_batch kernel/rcu/tree.c:2190 [inline]
  #0: ffffffff8e334f60 (rcu_callback){....}-{0:0}, at: rcu_core+0xa86/0x1830 kernel/rcu/tree.c:2471
  #1: ffff88801ca92958 (&inst->lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
  #1: ffff88801ca92958 (&inst->lock){+.-.}-{2:2}, at: nfqnl_flush net/netfilter/nfnetlink_queue.c:405 [inline]
  #1: ffff88801ca92958 (&inst->lock){+.-.}-{2:2}, at: instance_destroy_rcu+0x30/0x220 net/netfilter/nfnetlink_queue.c:172

stack backtrace:
CPU: 0 PID: 13427 Comm: syz-executor.4 Not tainted 6.9.0-rc7-syzkaller-02060-g5c1672705a1a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
Call Trace:
 <IRQ>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
  lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712
  nf_reinject net/netfilter/nfnetlink_queue.c:323 [inline]
  nfqnl_reinject+0x6ec/0x1120 net/netfilter/nfnetlink_queue.c:397
  nfqnl_flush net/netfilter/nfnetlink_queue.c:410 [inline]
  instance_destroy_rcu+0x1ae/0x220 net/netfilter/nfnetlink_queue.c:172
  rcu_do_batch kernel/rcu/tree.c:2196 [inline]
  rcu_core+0xafd/0x1830 kernel/rcu/tree.c:2471
  handle_softirqs+0x2d6/0x990 kernel/softirq.c:554
  __do_softirq kernel/softirq.c:588 [inline]
  invoke_softirq kernel/softirq.c:428 [inline]
  __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
  irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
  instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
  sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
 </IRQ>
 <TASK>

Fixes: 9872bec773c2 ("[NETFILTER]: nfnetlink: use RCU for queue instances hash")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/netfilter/nfnetlink_queue.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Florian Westphal May 15, 2024, 1:27 p.m. UTC | #1
Eric Dumazet <edumazet@google.com> wrote:
> diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
> index 00f4bd21c59b419e96794127693c21ccb05e45b0..f1c31757e4969e8f975c7a1ebbc3b96148ec9724 100644
> --- a/net/netfilter/nfnetlink_queue.c
> +++ b/net/netfilter/nfnetlink_queue.c
> @@ -169,7 +169,9 @@ instance_destroy_rcu(struct rcu_head *head)
>  	struct nfqnl_instance *inst = container_of(head, struct nfqnl_instance,
>  						   rcu);
>  
> +	rcu_read_lock();
>  	nfqnl_flush(inst, NULL, 0);
> +	rcu_read_unlock();

That works too.  I sent a different patch for the same issue yesterday:

https://patchwork.ozlabs.org/project/netfilter-devel/patch/20240514103133.2784-1-fw@strlen.de/

If you prefer Eric's patch, that's absolutely fine with me; I'll rebase in
that case to keep the selftest around.
Eric Dumazet May 15, 2024, 1:39 p.m. UTC | #2
On Wed, May 15, 2024 at 3:27 PM Florian Westphal <fw@strlen.de> wrote:
>
> Eric Dumazet <edumazet@google.com> wrote:
> > diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
> > index 00f4bd21c59b419e96794127693c21ccb05e45b0..f1c31757e4969e8f975c7a1ebbc3b96148ec9724 100644
> > --- a/net/netfilter/nfnetlink_queue.c
> > +++ b/net/netfilter/nfnetlink_queue.c
> > @@ -169,7 +169,9 @@ instance_destroy_rcu(struct rcu_head *head)
> >       struct nfqnl_instance *inst = container_of(head, struct nfqnl_instance,
> >                                                  rcu);
> >
> > +     rcu_read_lock();
> >       nfqnl_flush(inst, NULL, 0);
> > +     rcu_read_unlock();
>
> That works too.  I sent a different patch for the same issue yesterday:
>
> https://patchwork.ozlabs.org/project/netfilter-devel/patch/20240514103133.2784-1-fw@strlen.de/
>
> If you prefer Eric's patch, that's absolutely fine with me; I'll rebase in
> that case to keep the selftest around.

I missed your patch, otherwise I would have done nothing ;)

I saw the recent changes about nf_reinject() and tried to have a patch
that would be easily backported without conflicts.

Do you think the splat is caused by recent changes, or is it simply
syzbot getting smarter ?

Thanks !
Eric Dumazet May 15, 2024, 1:42 p.m. UTC | #3
On Wed, May 15, 2024 at 3:39 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, May 15, 2024 at 3:27 PM Florian Westphal <fw@strlen.de> wrote:
> >
> > Eric Dumazet <edumazet@google.com> wrote:
> > > diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
> > > index 00f4bd21c59b419e96794127693c21ccb05e45b0..f1c31757e4969e8f975c7a1ebbc3b96148ec9724 100644
> > > --- a/net/netfilter/nfnetlink_queue.c
> > > +++ b/net/netfilter/nfnetlink_queue.c
> > > @@ -169,7 +169,9 @@ instance_destroy_rcu(struct rcu_head *head)
> > >       struct nfqnl_instance *inst = container_of(head, struct nfqnl_instance,
> > >                                                  rcu);
> > >
> > > +     rcu_read_lock();
> > >       nfqnl_flush(inst, NULL, 0);
> > > +     rcu_read_unlock();
> >
> > That works too.  I sent a different patch for the same issue yesterday:
> >
> > https://patchwork.ozlabs.org/project/netfilter-devel/patch/20240514103133.2784-1-fw@strlen.de/
> >
> > If you prefer Eric's patch, that's absolutely fine with me; I'll rebase in
> > that case to keep the selftest around.
>
> I missed your patch, otherwise I would have done nothing ;)
>
> I saw the recent changes about nf_reinject() and tried to have a patch
> that would be easily backported without conflicts.
>
> Do you think the splat is caused by recent changes, or is it simply
> syzbot getting smarter ?

(It took me a fair amount of time to find a Fixes: tag, this is why I am asking)
Florian Westphal May 15, 2024, 2:10 p.m. UTC | #4
Eric Dumazet <edumazet@google.com> wrote:
> > If you prefer Eric's patch, that's absolutely fine with me; I'll rebase in
> > that case to keep the selftest around.
> 
> I missed your patch, otherwise I would have done nothing ;)
> 
> I saw the recent changes about nf_reinject() and tried to have a patch
> that would be easily backported without conflicts.

Right, that makes sense from that point of view.
I think it's fine to apply the patch in this case; I'll follow up later.

Thus:
Acked-by: Florian Westphal <fw@strlen.de>

> Do you think the splat is caused by recent changes, or is it simply
> syzbot getting smarter ?

It's an old bug; AFAICS your Fixes tag is correct.

Triggering it requires all of the following:

1. A userspace program needs to subscribe to queue x
2. An iptables/nftables rule needs to send packets to queue x
3. Actual packets that match that rule have to be sent
4. The userspace program needs to exit while at least one packet
   is queued

Amazing that syzbot managed to hit all 4 checkboxes :)
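
The four conditions above can be modeled with a small userspace C sketch. None of this is kernel code: `model_queue`, `model_enqueue`, `model_owner_exit`, `model_run`, and the `read_side_depth` counter are hypothetical stand-ins, and the counter only imitates the kind of check that fired under nf_reinject() in the trace. The point is simply that the unprotected path is reached only when a packet is still queued at exit, which may be why the bug stayed hidden for so long.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_queue {
	bool subscribed;   /* step 1: a program is bound to the queue */
	bool rule_present; /* step 2: a rule sends packets to the queue */
	int queued;        /* steps 3/4: packets still awaiting a verdict */
};

static int read_side_depth; /* stands in for "rcu_read_lock() is held" */
static int warnings;        /* counts simulated "suspicious RCU usage" reports */

/* Stands in for the reinject path, which expects a read-side section. */
static void model_reinject(void)
{
	if (read_side_depth == 0)
		warnings++;
}

/* Step 3: a packet is queued only if steps 1 and 2 already hold. */
static void model_enqueue(struct model_queue *q)
{
	if (q->subscribed && q->rule_present)
		q->queued++;
}

/* Step 4: the owner exits; the deferred destroy flushes what is left. */
static void model_owner_exit(struct model_queue *q)
{
	q->subscribed = false;
	while (q->queued > 0) {
		q->queued--;
		model_reinject();
	}
	assert(q->queued == 0);
}

/* Runs one scenario and returns how many warnings it produced. */
int model_run(bool subscribed, bool rule_present, int packets)
{
	struct model_queue q = { subscribed, rule_present, 0 };

	warnings = 0;
	for (int i = 0; i < packets; i++)
		model_enqueue(&q);
	model_owner_exit(&q);
	return warnings;
}
```

In this model, `model_run(true, true, 1)` returns 1, while `model_run(true, false, 3)` and `model_run(true, true, 0)` both return 0: drop any one of the four conditions and the flush loop has nothing to reinject.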
Patch

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 00f4bd21c59b419e96794127693c21ccb05e45b0..f1c31757e4969e8f975c7a1ebbc3b96148ec9724 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -169,7 +169,9 @@  instance_destroy_rcu(struct rcu_head *head)
 	struct nfqnl_instance *inst = container_of(head, struct nfqnl_instance,
 						   rcu);
 
+	rcu_read_lock();
 	nfqnl_flush(inst, NULL, 0);
+	rcu_read_unlock();
 	kfree(inst);
 	module_put(THIS_MODULE);
 }
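
A note on why the splat lists `rcu_callback` as held and still complains: as far as I can tell from the trace, the callback loop is annotated with its own lockdep map, which is not the one a read-lock-held check consults. The userspace sketch below models that distinction and the effect of the two added lines. Every name in it (`model_map`, `model_flush`, `model_destroy_before`, `model_destroy_after`, `model_invoke_callback`) is made up for illustration, and the counters only approximate what lockdep tracks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of two lockdep maps; not kernel code. */
struct model_map {
	int depth;
};

static struct model_map model_callback_map; /* held while a callback runs */
static struct model_map model_reader_map;   /* held inside a read-side section */
static int model_splats;                    /* simulated warnings */

static void model_acquire(struct model_map *m)
{
	m->depth++;
}

static void model_release(struct model_map *m)
{
	assert(m->depth > 0);
	m->depth--;
}

/* Only the reader map counts; the callback map is ignored on purpose. */
static bool model_read_lock_held(void)
{
	return model_reader_map.depth > 0;
}

/* Stands in for the flush path that ends up dereferencing protected data. */
static void model_flush(void)
{
	if (!model_read_lock_held())
		model_splats++;
}

/* Shape of the destroy callback before the patch. */
void model_destroy_before(void)
{
	model_flush();
}

/* Shape of the destroy callback after the patch: flush inside a reader section. */
void model_destroy_after(void)
{
	model_acquire(&model_reader_map);
	model_flush();
	model_release(&model_reader_map);
}

/* Invokes a callback the way the model's callback loop would. */
int model_invoke_callback(void (*cb)(void))
{
	model_splats = 0;
	model_acquire(&model_callback_map);
	cb();
	model_release(&model_callback_map);
	return model_splats;
}
```

Here `model_invoke_callback(model_destroy_before)` returns 1 and `model_invoke_callback(model_destroy_after)` returns 0, mirroring the two-line change in the diff above.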