diff mbox series

bridge: Fix a deadlock when enabling multicast snooping

Message ID 20201201214047.128948-1-Joseph.Huang@garmin.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series bridge: Fix a deadlock when enabling multicast snooping | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Guessed tree name to be net-next
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 4 this patch: 4
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/verify_fixes fail Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 52 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Joseph Huang Dec. 1, 2020, 9:40 p.m. UTC
When enabling multicast snooping, bridge module deadlocks on multicast_lock
if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
network.

The deadlock was caused by the following sequence: While holding the lock,
br_multicast_open calls br_multicast_join_snoopers, which eventually causes
IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
Since the destination Ethernet address is a multicast address, br_dev_xmit
feeds the packet back to the bridge via br_multicast_rcv, which in turn
calls br_multicast_add_group, which then deadlocks on multicast_lock.

The fix is to move the call br_multicast_join_snoopers outside of the
critical section. This works since br_multicast_join_snoopers only deals
with IP and does not modify any multicast data structures of the bridge,
so there's no need to hold the lock.

Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")

Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
---
 net/bridge/br_device.c    | 3 +++
 net/bridge/br_multicast.c | 8 ++++----
 net/bridge/br_private.h   | 5 +++++
 3 files changed, 12 insertions(+), 4 deletions(-)

Comments

Jakub Kicinski Dec. 3, 2020, 6:28 p.m. UTC | #1
On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote:
> When enabling multicast snooping, bridge module deadlocks on multicast_lock
> if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
> network.
> 
> The deadlock was caused by the following sequence: While holding the lock,
> br_multicast_open calls br_multicast_join_snoopers, which eventually causes
> IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
> Since the destination Ethernet address is a multicast address, br_dev_xmit
> feeds the packet back to the bridge via br_multicast_rcv, which in turn
> calls br_multicast_add_group, which then deadlocks on multicast_lock.
> 
> The fix is to move the call br_multicast_join_snoopers outside of the
> critical section. This works since br_multicast_join_snoopers only deals
> with IP and does not modify any multicast data structures of the bridge,
> so there's no need to hold the lock.
> 
> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
> 
> Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>

Nik, Linus - how does this one look?
Nikolay Aleksandrov Dec. 3, 2020, 8:46 p.m. UTC | #2
On 03/12/2020 20:28, Jakub Kicinski wrote:
> On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote:
>> When enabling multicast snooping, bridge module deadlocks on multicast_lock
>> if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
>> network.
>>
>> The deadlock was caused by the following sequence: While holding the lock,
>> br_multicast_open calls br_multicast_join_snoopers, which eventually causes
>> IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
>> Since the destination Ethernet address is a multicast address, br_dev_xmit
>> feeds the packet back to the bridge via br_multicast_rcv, which in turn
>> calls br_multicast_add_group, which then deadlocks on multicast_lock.
>>
>> The fix is to move the call br_multicast_join_snoopers outside of the
>> critical section. This works since br_multicast_join_snoopers only deals
>> with IP and does not modify any multicast data structures of the bridge,
>> so there's no need to hold the lock.
>>
>> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
>>
>> Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
> 
> Nik, Linus - how does this one look?
> 

Hi,
Thanks, somehow I missed this one too. Need to check my email config. :)
I believe I see how it can happen, although it's not straight-forward to follow.
A selftest for this case would be great, and any traces (e.g. hung task) would
help a lot as well.
Correct me if I'm wrong but the sequence is something like:
br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc -> igmp6_group_added
-> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode igmp6_join_group() -> igmp6_join_group
-> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge mcast processing code
which uses the multicast_lock spinlock. Right?

One question - shouldn't leaving have the same problem? I.e. br_multicast_toggle -> br_multicast_leave_snoopers
-> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec -> igmp6_group_dropped -> igmp6_leave_group ->
MLDv1 mode && last reporter -> igmp6_send() ?

I think it was saved by the fact that !br_opt_get(br, BROPT_MULTICAST_ENABLED) would be true and the
multicast lock won't be acquired in the br_dev_xmit path? If so, I'd appreciate a comment about that
because it's not really trivial to find out. :)

Anyhow, the patch is fine as-is too:
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>

Thanks,
 Nik
Joseph Huang Dec. 3, 2020, 9:53 p.m. UTC | #3
> From: Nikolay Aleksandrov <nikolay@nvidia.com>
> Sent: Thursday, December 3, 2020 3:47 PM
> To: Jakub Kicinski <kuba@kernel.org>; Huang, Joseph
> <Joseph.Huang@garmin.com>
> Cc: Roopa Prabhu <roopa@nvidia.com>; David S. Miller
> <davem@davemloft.net>; bridge@lists.linux-foundation.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing
> <linus.luessing@c0d3.blue>
> Subject: Re: [PATCH] bridge: Fix a deadlock when enabling multicast
> snooping
>
> On 03/12/2020 20:28, Jakub Kicinski wrote:
> > On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote:
> >> When enabling multicast snooping, bridge module deadlocks on
> >> multicast_lock if 1) IPv6 is enabled, and 2) there is an existing
> >> querier on the same L2 network.
> >>
> >> The deadlock was caused by the following sequence: While holding the
> >> lock, br_multicast_open calls br_multicast_join_snoopers, which
> >> eventually causes IP stack to (attempt to) send out a Listener Report (in
> igmp6_join_group).
> >> Since the destination Ethernet address is a multicast address,
> >> br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv,
> >> which in turn calls br_multicast_add_group, which then deadlocks on
> multicast_lock.
> >>
> >> The fix is to move the call br_multicast_join_snoopers outside of the
> >> critical section. This works since br_multicast_join_snoopers only
> >> deals with IP and does not modify any multicast data structures of
> >> the bridge, so there's no need to hold the lock.
> >>
> >> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
> >>
> >> Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
> >
> > Nik, Linus - how does this one look?
> >
>
> Hi,
> Thanks, somehow I missed this one too. Need to check my email config. :) I
> believe I see how it can happen, although it's not straight-forward to follow.
> A selftest for this case would be great, and any traces (e.g. hung task) would
> help a lot as well.
> Correct me if I'm wrong but the sequence is something like:
> br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc ->
> igmp6_group_added
> -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode
> -> igmp6_join_group() -> igmp6_join_group
> -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge
> -> mcast processing code
> which uses the multicast_lock spinlock. Right?

That is correct.

Here's a stack trace from a typical run:

echo -n 1 > /sys/devices/virtual/net/gmn0/bridge/multicast_snooping
[  936.146754] rcu: INFO: rcu_preempt self-detected stall on CPU
[  936.152534] rcu: 0-....: (5594 ticks this GP) idle=75a/1/0x4000000000000002 softirq=2787/2789 fqs=2625
[  936.162026] (t=5253 jiffies g=4205 q=12)
[  936.166041] Task dump for CPU 0:
[  936.169272] sh              R  running task        0  1315   1295 0x00000002
[  936.176332] Call trace:
[  936.178797]  dump_backtrace+0x0/0x140
[  936.182469]  show_stack+0x14/0x20
[  936.185793]  sched_show_task+0x108/0x138
[  936.189727]  dump_cpu_task+0x40/0x50
[  936.193313]  rcu_dump_cpu_stacks+0x94/0xd0
[  936.197420]  rcu_sched_clock_irq+0x75c/0x9c0
[  936.201698]  update_process_times+0x2c/0x68
[  936.205893]  tick_sched_handle.isra.0+0x30/0x50
[  936.210432]  tick_sched_timer+0x48/0x98
[  936.214272]  __hrtimer_run_queues+0x110/0x1b0
[  936.218635]  hrtimer_interrupt+0xe4/0x240
[  936.222656]  arch_timer_handler_phys+0x30/0x40
[  936.227106]  handle_percpu_devid_irq+0x80/0x140
[  936.231654]  generic_handle_irq+0x24/0x38
[  936.235669]  __handle_domain_irq+0x60/0xb8
[  936.239774]  gic_handle_irq+0x5c/0x148
[  936.243535]  el1_irq+0xb8/0x180
[  936.246689]  queued_spin_lock_slowpath+0x118/0x3b0
[  936.251495]  _raw_spin_lock+0x5c/0x68
[  936.255221]  br_multicast_add_group+0x40/0x170 [bridge]
[  936.260491]  br_multicast_rcv+0x7ac/0xe30 [bridge]
[  936.265322]  br_dev_xmit+0x140/0x368 [bridge]
[  936.269689]  dev_hard_start_xmit+0x94/0x158
[  936.273876]  __dev_queue_xmit+0x5ac/0x7f8
[  936.277890]  dev_queue_xmit+0x10/0x18
[  936.281563]  neigh_resolve_output+0xec/0x198
[  936.285845]  ip6_finish_output2+0x240/0x710
[  936.290039]  __ip6_finish_output+0x130/0x170
[  936.294318]  ip6_output+0x6c/0x1c8
[  936.297731]  NF_HOOK.constprop.0+0xd8/0xe8
[  936.301834]  igmp6_send+0x358/0x558
[  936.305326]  igmp6_join_group.part.0+0x30/0xf0
[  936.309774]  igmp6_group_added+0xfc/0x110
[  936.313787]  __ipv6_dev_mc_inc+0x1a4/0x290
[  936.317885]  ipv6_dev_mc_inc+0x10/0x18
[  936.321677]  br_multicast_open+0xbc/0x110 [bridge]
[  936.326506]  br_multicast_toggle+0xec/0x140 [bridge]


>
> One question - shouldn't leaving have the same problem? I.e.
> br_multicast_toggle -> br_multicast_leave_snoopers
> -> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec ->
> -> igmp6_group_dropped -> igmp6_leave_group ->
> MLDv1 mode && last reporter -> igmp6_send() ?
>
> I think it was saved by the fact that !br_opt_get(br,
> BROPT_MULTICAST_ENABLED) would be true and the multicast lock won't be
> acquired in the br_dev_xmit path? If so, I'd appreciate a comment about that
> because it's not really trivial to find out. :)

That's a really good point. Leave should have deadlocked as well, but when I
tested the patch, I was able to turn on/off multicast snooping multiple times
without any problem.

Is it because this line in igmp6_leave_group?

if (ma->mca_flags & MAF_LAST_REPORTER)
igmp6_send(&ma->mca_addr, ma->idev->dev,
ICMPV6_MGM_REDUCTION);

Perhaps MAF_LAST_REPORTER was not set, so igmp6_send was not called?

>
> Anyhow, the patch is fine as-is too:
> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
>
> Thanks,
>  Nik

Thanks,
Joseph
Joseph Huang Dec. 3, 2020, 10:42 p.m. UTC | #4
> From: Huang, Joseph
> Sent: Thursday, December 3, 2020 4:53 PM
> To: Nikolay Aleksandrov <nikolay@nvidia.com>; Jakub Kicinski
> <kuba@kernel.org>
> Cc: Roopa Prabhu <roopa@nvidia.com>; David S. Miller
> <davem@davemloft.net>; bridge@lists.linux-foundation.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing
> <linus.luessing@c0d3.blue>
> Subject: RE: [PATCH] bridge: Fix a deadlock when enabling multicast snooping
>
> > From: Nikolay Aleksandrov <nikolay@nvidia.com>
> > Sent: Thursday, December 3, 2020 3:47 PM
> > To: Jakub Kicinski <kuba@kernel.org>; Huang, Joseph
> > <Joseph.Huang@garmin.com>
> > Cc: Roopa Prabhu <roopa@nvidia.com>; David S. Miller
> > <davem@davemloft.net>; bridge@lists.linux-foundation.org;
> > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing
> > <linus.luessing@c0d3.blue>
> > Subject: Re: [PATCH] bridge: Fix a deadlock when enabling multicast
> > snooping
> >
> > On 03/12/2020 20:28, Jakub Kicinski wrote:
> > > On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote:
> > >> When enabling multicast snooping, bridge module deadlocks on
> > >> multicast_lock if 1) IPv6 is enabled, and 2) there is an existing
> > >> querier on the same L2 network.
> > >>
> > >> The deadlock was caused by the following sequence: While holding the
> > >> lock, br_multicast_open calls br_multicast_join_snoopers, which
> > >> eventually causes IP stack to (attempt to) send out a Listener Report (in
> > igmp6_join_group).
> > >> Since the destination Ethernet address is a multicast address,
> > >> br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv,
> > >> which in turn calls br_multicast_add_group, which then deadlocks on
> > multicast_lock.
> > >>
> > >> The fix is to move the call br_multicast_join_snoopers outside of the
> > >> critical section. This works since br_multicast_join_snoopers only
> > >> deals with IP and does not modify any multicast data structures of
> > >> the bridge, so there's no need to hold the lock.
> > >>
> > >> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
> > >>
> > >> Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
> > >
> > > Nik, Linus - how does this one look?
> > >
> >
> > Hi,
> > Thanks, somehow I missed this one too. Need to check my email config. :) I
> > believe I see how it can happen, although it's not straight-forward to follow.
> > A selftest for this case would be great, and any traces (e.g. hung task)
> would
> > help a lot as well.
> > Correct me if I'm wrong but the sequence is something like:
> > br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc ->
> > igmp6_group_added
> > -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode
> > -> igmp6_join_group() -> igmp6_join_group
> > -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge
> > -> mcast processing code
> > which uses the multicast_lock spinlock. Right?
>
> That is correct.
>
> Here's a stack trace from a typical run:
>
> echo -n 1 > /sys/devices/virtual/net/gmn0/bridge/multicast_snooping
> [  936.146754] rcu: INFO: rcu_preempt self-detected stall on CPU
> [  936.152534] rcu:   0-....: (5594 ticks this GP)
> idle=75a/1/0x4000000000000002 softirq=2787/2789 fqs=2625
> [  936.162026]        (t=5253 jiffies g=4205 q=12)
> [  936.166041] Task dump for CPU 0:
> [  936.169272] sh              R  running task        0  1315   1295 0x00000002
> [  936.176332] Call trace:
> [  936.178797]  dump_backtrace+0x0/0x140
> [  936.182469]  show_stack+0x14/0x20
> [  936.185793]  sched_show_task+0x108/0x138
> [  936.189727]  dump_cpu_task+0x40/0x50
> [  936.193313]  rcu_dump_cpu_stacks+0x94/0xd0
> [  936.197420]  rcu_sched_clock_irq+0x75c/0x9c0
> [  936.201698]  update_process_times+0x2c/0x68
> [  936.205893]  tick_sched_handle.isra.0+0x30/0x50
> [  936.210432]  tick_sched_timer+0x48/0x98
> [  936.214272]  __hrtimer_run_queues+0x110/0x1b0
> [  936.218635]  hrtimer_interrupt+0xe4/0x240
> [  936.222656]  arch_timer_handler_phys+0x30/0x40
> [  936.227106]  handle_percpu_devid_irq+0x80/0x140
> [  936.231654]  generic_handle_irq+0x24/0x38
> [  936.235669]  __handle_domain_irq+0x60/0xb8
> [  936.239774]  gic_handle_irq+0x5c/0x148
> [  936.243535]  el1_irq+0xb8/0x180
> [  936.246689]  queued_spin_lock_slowpath+0x118/0x3b0
> [  936.251495]  _raw_spin_lock+0x5c/0x68
> [  936.255221]  br_multicast_add_group+0x40/0x170 [bridge]
> [  936.260491]  br_multicast_rcv+0x7ac/0xe30 [bridge]
> [  936.265322]  br_dev_xmit+0x140/0x368 [bridge]
> [  936.269689]  dev_hard_start_xmit+0x94/0x158
> [  936.273876]  __dev_queue_xmit+0x5ac/0x7f8
> [  936.277890]  dev_queue_xmit+0x10/0x18
> [  936.281563]  neigh_resolve_output+0xec/0x198
> [  936.285845]  ip6_finish_output2+0x240/0x710
> [  936.290039]  __ip6_finish_output+0x130/0x170
> [  936.294318]  ip6_output+0x6c/0x1c8
> [  936.297731]  NF_HOOK.constprop.0+0xd8/0xe8
> [  936.301834]  igmp6_send+0x358/0x558
> [  936.305326]  igmp6_join_group.part.0+0x30/0xf0
> [  936.309774]  igmp6_group_added+0xfc/0x110
> [  936.313787]  __ipv6_dev_mc_inc+0x1a4/0x290
> [  936.317885]  ipv6_dev_mc_inc+0x10/0x18
> [  936.321677]  br_multicast_open+0xbc/0x110 [bridge]
> [  936.326506]  br_multicast_toggle+0xec/0x140 [bridge]
>
>
> >
> > One question - shouldn't leaving have the same problem? I.e.
> > br_multicast_toggle -> br_multicast_leave_snoopers
> > -> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec ->
> > -> igmp6_group_dropped -> igmp6_leave_group ->
> > MLDv1 mode && last reporter -> igmp6_send() ?
> >
> > I think it was saved by the fact that !br_opt_get(br,
> > BROPT_MULTICAST_ENABLED) would be true and the multicast lock won't
> be
> > acquired in the br_dev_xmit path? If so, I'd appreciate a comment about
> that
> > because it's not really trivial to find out. :)
>
> That's a really good point. Leave should have deadlocked as well, but when I
> tested the patch, I was able to turn on/off multicast snooping multiple times
> without any problem.
>
> Is it because this line in igmp6_leave_group?
>
>               if (ma->mca_flags & MAF_LAST_REPORTER)
>                       igmp6_send(&ma->mca_addr, ma->idev->dev,
>                               ICMPV6_MGM_REDUCTION);
>
> Perhaps MAF_LAST_REPORTER was not set, so igmp6_send was not called?
>
> >
> > Anyhow, the patch is fine as-is too:
> > Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
> >
> > Thanks,
> >  Nik
>
> Thanks,
> Joseph

Would it be advisable if we move br_multicast_leave_snoopers out of the critical
section as well? Even though I can't really verify that if this is helpful since I haven't
seen it deadlock when disabling multicast snooping.

Thanks,
Joseph
Nikolay Aleksandrov Dec. 3, 2020, 11:34 p.m. UTC | #5
On 04/12/2020 00:42, Huang, Joseph wrote:
>> From: Huang, Joseph
>> Sent: Thursday, December 3, 2020 4:53 PM
>> To: Nikolay Aleksandrov <nikolay@nvidia.com>; Jakub Kicinski
>> <kuba@kernel.org>
>> Cc: Roopa Prabhu <roopa@nvidia.com>; David S. Miller
>> <davem@davemloft.net>; bridge@lists.linux-foundation.org;
>> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing
>> <linus.luessing@c0d3.blue>
>> Subject: RE: [PATCH] bridge: Fix a deadlock when enabling multicast snooping
>>
>>> From: Nikolay Aleksandrov <nikolay@nvidia.com>
>>> Sent: Thursday, December 3, 2020 3:47 PM
>>> To: Jakub Kicinski <kuba@kernel.org>; Huang, Joseph
>>> <Joseph.Huang@garmin.com>
>>> Cc: Roopa Prabhu <roopa@nvidia.com>; David S. Miller
>>> <davem@davemloft.net>; bridge@lists.linux-foundation.org;
>>> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing
>>> <linus.luessing@c0d3.blue>
>>> Subject: Re: [PATCH] bridge: Fix a deadlock when enabling multicast
>>> snooping
>>>
>>> On 03/12/2020 20:28, Jakub Kicinski wrote:
>>>> On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote:
>>>>> When enabling multicast snooping, bridge module deadlocks on
>>>>> multicast_lock if 1) IPv6 is enabled, and 2) there is an existing
>>>>> querier on the same L2 network.
>>>>>
>>>>> The deadlock was caused by the following sequence: While holding the
>>>>> lock, br_multicast_open calls br_multicast_join_snoopers, which
>>>>> eventually causes IP stack to (attempt to) send out a Listener Report (in
>>> igmp6_join_group).
>>>>> Since the destination Ethernet address is a multicast address,
>>>>> br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv,
>>>>> which in turn calls br_multicast_add_group, which then deadlocks on
>>> multicast_lock.
>>>>>
>>>>> The fix is to move the call br_multicast_join_snoopers outside of the
>>>>> critical section. This works since br_multicast_join_snoopers only
>>>>> deals with IP and does not modify any multicast data structures of
>>>>> the bridge, so there's no need to hold the lock.
>>>>>
>>>>> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
>>>>>
>>>>> Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
>>>>
>>>> Nik, Linus - how does this one look?
>>>>
>>>
>>> Hi,
>>> Thanks, somehow I missed this one too. Need to check my email config. :) I
>>> believe I see how it can happen, although it's not straight-forward to follow.
>>> A selftest for this case would be great, and any traces (e.g. hung task)
>> would
>>> help a lot as well.
>>> Correct me if I'm wrong but the sequence is something like:
>>> br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc ->
>>> igmp6_group_added
>>> -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode
>>> -> igmp6_join_group() -> igmp6_join_group
>>> -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge
>>> -> mcast processing code
>>> which uses the multicast_lock spinlock. Right?
>>
>> That is correct.
>>
>> Here's a stack trace from a typical run:
>>
>> echo -n 1 > /sys/devices/virtual/net/gmn0/bridge/multicast_snooping
>> [  936.146754] rcu: INFO: rcu_preempt self-detected stall on CPU
>> [  936.152534] rcu:   0-....: (5594 ticks this GP)
>> idle=75a/1/0x4000000000000002 softirq=2787/2789 fqs=2625
>> [  936.162026]        (t=5253 jiffies g=4205 q=12)
>> [  936.166041] Task dump for CPU 0:
>> [  936.169272] sh              R  running task        0  1315   1295 0x00000002
>> [  936.176332] Call trace:
>> [  936.178797]  dump_backtrace+0x0/0x140
>> [  936.182469]  show_stack+0x14/0x20
>> [  936.185793]  sched_show_task+0x108/0x138
>> [  936.189727]  dump_cpu_task+0x40/0x50
>> [  936.193313]  rcu_dump_cpu_stacks+0x94/0xd0
>> [  936.197420]  rcu_sched_clock_irq+0x75c/0x9c0
>> [  936.201698]  update_process_times+0x2c/0x68
>> [  936.205893]  tick_sched_handle.isra.0+0x30/0x50
>> [  936.210432]  tick_sched_timer+0x48/0x98
>> [  936.214272]  __hrtimer_run_queues+0x110/0x1b0
>> [  936.218635]  hrtimer_interrupt+0xe4/0x240
>> [  936.222656]  arch_timer_handler_phys+0x30/0x40
>> [  936.227106]  handle_percpu_devid_irq+0x80/0x140
>> [  936.231654]  generic_handle_irq+0x24/0x38
>> [  936.235669]  __handle_domain_irq+0x60/0xb8
>> [  936.239774]  gic_handle_irq+0x5c/0x148
>> [  936.243535]  el1_irq+0xb8/0x180
>> [  936.246689]  queued_spin_lock_slowpath+0x118/0x3b0
>> [  936.251495]  _raw_spin_lock+0x5c/0x68
>> [  936.255221]  br_multicast_add_group+0x40/0x170 [bridge]
>> [  936.260491]  br_multicast_rcv+0x7ac/0xe30 [bridge]
>> [  936.265322]  br_dev_xmit+0x140/0x368 [bridge]
>> [  936.269689]  dev_hard_start_xmit+0x94/0x158
>> [  936.273876]  __dev_queue_xmit+0x5ac/0x7f8
>> [  936.277890]  dev_queue_xmit+0x10/0x18
>> [  936.281563]  neigh_resolve_output+0xec/0x198
>> [  936.285845]  ip6_finish_output2+0x240/0x710
>> [  936.290039]  __ip6_finish_output+0x130/0x170
>> [  936.294318]  ip6_output+0x6c/0x1c8
>> [  936.297731]  NF_HOOK.constprop.0+0xd8/0xe8
>> [  936.301834]  igmp6_send+0x358/0x558
>> [  936.305326]  igmp6_join_group.part.0+0x30/0xf0
>> [  936.309774]  igmp6_group_added+0xfc/0x110
>> [  936.313787]  __ipv6_dev_mc_inc+0x1a4/0x290
>> [  936.317885]  ipv6_dev_mc_inc+0x10/0x18
>> [  936.321677]  br_multicast_open+0xbc/0x110 [bridge]
>> [  936.326506]  br_multicast_toggle+0xec/0x140 [bridge]
>>
>>
>>>
>>> One question - shouldn't leaving have the same problem? I.e.
>>> br_multicast_toggle -> br_multicast_leave_snoopers
>>> -> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec ->
>>> -> igmp6_group_dropped -> igmp6_leave_group ->
>>> MLDv1 mode && last reporter -> igmp6_send() ?
>>>
>>> I think it was saved by the fact that !br_opt_get(br,
>>> BROPT_MULTICAST_ENABLED) would be true and the multicast lock won't
>> be
>>> acquired in the br_dev_xmit path? If so, I'd appreciate a comment about
>> that
>>> because it's not really trivial to find out. :)
>>
>> That's a really good point. Leave should have deadlocked as well, but when I
>> tested the patch, I was able to turn on/off multicast snooping multiple times
>> without any problem.
>>
>> Is it because this line in igmp6_leave_group?
>>
>>               if (ma->mca_flags & MAF_LAST_REPORTER)
>>                       igmp6_send(&ma->mca_addr, ma->idev->dev,
>>                               ICMPV6_MGM_REDUCTION);
>>
>> Perhaps MAF_LAST_REPORTER was not set, so igmp6_send was not called?
>>
>>>
>>> Anyhow, the patch is fine as-is too:
>>> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
>>>
>>> Thanks,
>>>  Nik
>>
>> Thanks,
>> Joseph
> 
> Would it be advisable if we move br_multicast_leave_snoopers out of the critical
> section as well? Even though I can't really verify that if this is helpful since I haven't
> seen it deadlock when disabling multicast snooping.
> 
> Thanks,
> Joseph
> 
 The reason we're not seeing a deadlock is because the multicast snooping bit
is not set and the lock is never acquired when sending the packet through br_multicast_rcv().
Anyway I'd move it out for symmetry and it would be less confusing as you'd just have a standard
if/else in the end. By the way now that I looked again the patch is not entirely correct as you'll
do multiple joins/leaves on every br_multicast_toggle(), i.e. with the patch you bypass the check
for the same value and actually try to change state each time (the
if (!!br_opt_get(br, BROPT_MULTICAST_ENABLED) == !!val) check).

We also lose the symmetry between br_dev_open and stop, and expose otherwise
private multicast code, so I'd pull out the snoopers leave for br_dev_stop() as well.

Please add a comment why it's needed, so we won't wonder about it later. And also include the
trace in the commit message so we'd have it.

Edit: I just tried it and it's very easy to reproduce, steps:
1. sysctl net.ipv6.conf.all.force_mld_version=1
2. have another querier
3. ip link set dev bridge type bridge mcast_snooping 0 && ip link set dev bridge type bridge mcast_snooping 1
< deadlock >

Thanks,
 Nik
diff mbox series

Patch

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 7730c8f3cb53..8d3ec29d3875 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -177,6 +177,9 @@  static int br_dev_open(struct net_device *dev)
 	br_stp_enable_bridge(br);
 	br_multicast_open(br);
 
+	if (br_opt_get(br, BROPT_MULTICAST_ENABLED))
+		br_multicast_join_snoopers(br);
+
 	return 0;
 }
 
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index eae898c3cff7..6c8d6b2eed91 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3286,7 +3286,7 @@  static inline void br_ip6_multicast_join_snoopers(struct net_bridge *br)
 }
 #endif
 
-static void br_multicast_join_snoopers(struct net_bridge *br)
+void br_multicast_join_snoopers(struct net_bridge *br)
 {
 	br_ip4_multicast_join_snoopers(br);
 	br_ip6_multicast_join_snoopers(br);
@@ -3336,9 +3336,6 @@  static void __br_multicast_open(struct net_bridge *br,
 
 void br_multicast_open(struct net_bridge *br)
 {
-	if (br_opt_get(br, BROPT_MULTICAST_ENABLED))
-		br_multicast_join_snoopers(br);
-
 	__br_multicast_open(br, &br->ip4_own_query);
 #if IS_ENABLED(CONFIG_IPV6)
 	__br_multicast_open(br, &br->ip6_own_query);
@@ -3509,6 +3506,9 @@  int br_multicast_toggle(struct net_bridge *br, unsigned long val)
 unlock:
 	spin_unlock_bh(&br->multicast_lock);
 
+	if (br_opt_get(br, BROPT_MULTICAST_ENABLED))
+		br_multicast_join_snoopers(br);
+
 	return 0;
 }
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 345118e35c42..7085dd747130 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -792,6 +792,7 @@  void br_multicast_del_port(struct net_bridge_port *port);
 void br_multicast_enable_port(struct net_bridge_port *port);
 void br_multicast_disable_port(struct net_bridge_port *port);
 void br_multicast_init(struct net_bridge *br);
+void br_multicast_join_snoopers(struct net_bridge *br);
 void br_multicast_open(struct net_bridge *br);
 void br_multicast_stop(struct net_bridge *br);
 void br_multicast_dev_del(struct net_bridge *br);
@@ -969,6 +970,10 @@  static inline void br_multicast_init(struct net_bridge *br)
 {
 }
 
+static inline void br_multicast_join_snoopers(struct net_bridge *br)
+{
+}
+
 static inline void br_multicast_open(struct net_bridge *br)
 {
 }