diff mbox series

[net] net: fix stack overflow when LRO is disabled for virtual interfaces

Message ID 20230515053740.3065735-1-ap420073@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] net: fix stack overflow when LRO is disabled for virtual interfaces | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net, async
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4193 this patch: 4193
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 921 this patch: 921
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4414 this patch: 4414
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 39 lines checked
netdev/kdoc fail Errors and warnings before: 0 this patch: 1
netdev/source_inline success Was 0 now: 0

Commit Message

Taehee Yoo May 15, 2023, 5:37 a.m. UTC
When the virtual interface's feature is updated, it synchronizes the
updated feature for its own lower interface.
This propagation logic should be worked as the iteration, not recursively.
But it works recursively due to the netdev notification unexpectedly.
This problem occurs when it disables LRO only for the team and bonding
interface type.

       team0
         |
  +------+------+-----+-----+
  |      |      |     |     |
team1  team2  team3  ...  team200

If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
event to its own lower interfaces(team1 ~ team200).
It is worked by netdev_sync_lower_features().
So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
work iteratively.
But generated NETDEV_FEAT_CHANGE event is also sent to the upper
interface too.
upper interface(team0) generates the NETDEV_FEAT_CHANGE event for its own
lower interfaces again.
lower and upper interfaces receive this event and generate this
event again and again.
So, the stack overflow occurs.

But it is not the infinite loop issue.
Because the netdev_sync_lower_features() updates features before
generating the NETDEV_FEAT_CHANGE event.
Already synchronized lower interfaces skip notification logic.
So, it is just the problem that iteration logic is changed to the
recursive unexpectedly due to the notification mechanism.

Reproducer:

ip link add team0 type team
ethtool -K team0 lro on
for i in {1..200}
do
        ip link add team$i master team0 type team
        ethtool -K team$i lro on
done

ethtool -K team0 lro off

In order to fix it, the priv_notifier_ctx net_device member is introduced.
This variable can be used by each interface in its own way in the
notification context. The bonding and team interface is going to use it
to avoid duplicated NETDEV_FEAT_CHANGE event handling.

Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
 drivers/net/bonding/bond_main.c | 6 +++++-
 drivers/net/team/team.c         | 6 +++++-
 include/linux/netdevice.h       | 1 +
 net/core/dev.c                  | 2 ++
 4 files changed, 13 insertions(+), 2 deletions(-)

Comments

Nikolay Aleksandrov May 15, 2023, 6:24 a.m. UTC | #1
On 15/05/2023 08:37, Taehee Yoo wrote:
> When the virtual interface's feature is updated, it synchronizes the
> updated feature for its own lower interface.
> This propagation logic should be worked as the iteration, not recursively.
> But it works recursively due to the netdev notification unexpectedly.
> This problem occurs when it disables LRO only for the team and bonding
> interface type.
> 
>        team0
>          |
>   +------+------+-----+-----+
>   |      |      |     |     |
> team1  team2  team3  ...  team200
> 
> If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
> event to its own lower interfaces(team1 ~ team200).
> It is worked by netdev_sync_lower_features().
> So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
> work iteratively.
> But generated NETDEV_FEAT_CHANGE event is also sent to the upper
> interface too.
> upper interface(team0) generates the NETDEV_FEAT_CHANGE event for its own
> lower interfaces again.
> lower and upper interfaces receive this event and generate this
> event again and again.
> So, the stack overflow occurs.
> 
> But it is not the infinite loop issue.
> Because the netdev_sync_lower_features() updates features before
> generating the NETDEV_FEAT_CHANGE event.
> Already synchronized lower interfaces skip notification logic.
> So, it is just the problem that iteration logic is changed to the
> recursive unexpectedly due to the notification mechanism.
> 
> Reproducer:
> 
> ip link add team0 type team
> ethtool -K team0 lro on
> for i in {1..200}
> do
>         ip link add team$i master team0 type team
>         ethtool -K team$i lro on
> done
> 
> ethtool -K team0 lro off
> 
> In order to fix it, the priv_notifier_ctx net_device member is introduced.
> This variable can be used by each interface in its own way in the
> notification context. The bonding and team interface is going to use it
> to avoid duplicated NETDEV_FEAT_CHANGE event handling.
> 
> Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
> Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack")
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
>  drivers/net/bonding/bond_main.c | 6 +++++-
>  drivers/net/team/team.c         | 6 +++++-
>  include/linux/netdevice.h       | 1 +
>  net/core/dev.c                  | 2 ++
>  4 files changed, 13 insertions(+), 2 deletions(-)
> 

Since you're syncing to lower devices, can't you check if the event source device
is lower to the current one (i.e. reverse propagation has happened) in the affected
drivers ? Adding a new struct netdevice member just for this seems unnecessary to me.
Especially for a setup like a bond of bonds or a team of teams, these are corner case
setups that shouldn't exist in general. :)

Cheers,
 Nik
Taehee Yoo May 15, 2023, 9:12 a.m. UTC | #2
On 5/15/23 15:24, Nikolay Aleksandrov wrote:

Hi Nikolay,
Thank you so much for the review!

 > On 15/05/2023 08:37, Taehee Yoo wrote:
 >> When the virtual interface's feature is updated, it synchronizes the
 >> updated feature for its own lower interface.
 >> This propagation logic should be worked as the iteration, not 
recursively.
 >> But it works recursively due to the netdev notification unexpectedly.
 >> This problem occurs when it disables LRO only for the team and bonding
 >> interface type.
 >>
 >>         team0
 >>           |
 >>    +------+------+-----+-----+
 >>    |      |      |     |     |
 >> team1  team2  team3  ...  team200
 >>
 >> If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
 >> event to its own lower interfaces(team1 ~ team200).
 >> It is worked by netdev_sync_lower_features().
 >> So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
 >> work iteratively.
 >> But generated NETDEV_FEAT_CHANGE event is also sent to the upper
 >> interface too.
 >> upper interface(team0) generates the NETDEV_FEAT_CHANGE event for 
its own
 >> lower interfaces again.
 >> lower and upper interfaces receive this event and generate this
 >> event again and again.
 >> So, the stack overflow occurs.
 >>
 >> But it is not the infinite loop issue.
 >> Because the netdev_sync_lower_features() updates features before
 >> generating the NETDEV_FEAT_CHANGE event.
 >> Already synchronized lower interfaces skip notification logic.
 >> So, it is just the problem that iteration logic is changed to the
 >> recursive unexpectedly due to the notification mechanism.
 >>
 >> Reproducer:
 >>
 >> ip link add team0 type team
 >> ethtool -K team0 lro on
 >> for i in {1..200}
 >> do
 >>          ip link add team$i master team0 type team
 >>          ethtool -K team$i lro on
 >> done
 >>
 >> ethtool -K team0 lro off
 >>
 >> In order to fix it, the priv_notifier_ctx net_device member is 
introduced.
 >> This variable can be used by each interface in its own way in the
 >> notification context. The bonding and team interface is going to use it
 >> to avoid duplicated NETDEV_FEAT_CHANGE event handling.
 >>
 >> Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
 >> Fixes: fd867d51f889 ("net/core: generic support for disabling netdev 
features down stack")
 >> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
 >> ---
 >>   drivers/net/bonding/bond_main.c | 6 +++++-
 >>   drivers/net/team/team.c         | 6 +++++-
 >>   include/linux/netdevice.h       | 1 +
 >>   net/core/dev.c                  | 2 ++
 >>   4 files changed, 13 insertions(+), 2 deletions(-)
 >>
 >
 > Since you're syncing to lower devices, can't you check if the event 
source device
 > is lower to the current one (i.e. reverse propagation has happened) 
in the affected
 > drivers ? Adding a new struct netdevice member just for this seems 
unnecessary to me.
 > Especially for a setup like a bond of bonds or a team of teams, these 
are corner case
 > setups that shouldn't exist in general. :)
 >

I agree that this new variable is unnecessary right now.
I tried to avoid introducing new variables, but unfortunately, I 
couldn't find a solution to detect duplicated notification events.

The reason why I introduced the new member of the net_device is that I 
thought there might be similar problems in the future such as mtu.
so, I hoped that it can be used as a general variable to avoid similar 
problems.
But I really agree that this new variable is over-spec.
So, adding a new boolean variable into the struct bonding and team, not 
net_device would be reasonable if I can't find a proper solution.

Yes, the above interface graph is not a real-world case.
The purpose of the above is just to trigger stack overflow problems for 
anyone with just copy-and-paste to make it easy for testing.
It can't reproduce this problem with LRO non-support virtual interfaces 
such as dummy, VLAN, and others.
we can reproduce this problem with a team and bonding interface, so I 
used team over team as a reproducer.

I will send a v2 patch after trying to find better solution for days, 
which would not introduce the new member of net_device.
If I can't find it, v2 would introduce a new member into struct bonding 
and struct team.
Of course, any ideas are welcome!

Thank you so much!
Taehee Yoo

 > Cheers,
 >   Nik
 >
Simon Horman May 15, 2023, 1:11 p.m. UTC | #3
On Mon, May 15, 2023 at 05:37:40AM +0000, Taehee Yoo wrote:
> When the virtual interface's feature is updated, it synchronizes the
> updated feature for its own lower interface.
> This propagation logic should be worked as the iteration, not recursively.
> But it works recursively due to the netdev notification unexpectedly.
> This problem occurs when it disables LRO only for the team and bonding
> interface type.
> 
>        team0
>          |
>   +------+------+-----+-----+
>   |      |      |     |     |
> team1  team2  team3  ...  team200
> 
> If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
> event to its own lower interfaces(team1 ~ team200).
> It is worked by netdev_sync_lower_features().
> So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
> work iteratively.
> But generated NETDEV_FEAT_CHANGE event is also sent to the upper
> interface too.
> upper interface(team0) generates the NETDEV_FEAT_CHANGE event for its own
> lower interfaces again.
> lower and upper interfaces receive this event and generate this
> event again and again.
> So, the stack overflow occurs.
> 
> But it is not the infinite loop issue.
> Because the netdev_sync_lower_features() updates features before
> generating the NETDEV_FEAT_CHANGE event.
> Already synchronized lower interfaces skip notification logic.
> So, it is just the problem that iteration logic is changed to the
> recursive unexpectedly due to the notification mechanism.
> 
> Reproducer:
> 
> ip link add team0 type team
> ethtool -K team0 lro on
> for i in {1..200}
> do
>         ip link add team$i master team0 type team
>         ethtool -K team$i lro on
> done
> 
> ethtool -K team0 lro off
> 
> In order to fix it, the priv_notifier_ctx net_device member is introduced.
> This variable can be used by each interface in its own way in the
> notification context. The bonding and team interface is going to use it
> to avoid duplicated NETDEV_FEAT_CHANGE event handling.
> 
> Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
> Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack")
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>

...

> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 08fbd4622ccf..ebd49a54f0d5 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2393,6 +2393,7 @@ struct net_device {
>  	unsigned		threaded:1;
>  
>  	struct list_head	net_notifier_list;
> +	u32			priv_notifier_ctx;

Hi Taehee,

Please add this new field to the kdoc for struct net_device.

>  
>  #if IS_ENABLED(CONFIG_MACSEC)
>  	/* MACsec management functions */

...

---
pw-bot: cr
Taehee Yoo May 15, 2023, 4:21 p.m. UTC | #4
On 5/15/23 22:11, Simon Horman wrote:
Hi Simon,
Thank you so much for the review!

 > On Mon, May 15, 2023 at 05:37:40AM +0000, Taehee Yoo wrote:
 >> When the virtual interface's feature is updated, it synchronizes the
 >> updated feature for its own lower interface.
 >> This propagation logic should be worked as the iteration, not 
recursively.
 >> But it works recursively due to the netdev notification unexpectedly.
 >> This problem occurs when it disables LRO only for the team and bonding
 >> interface type.
 >>
 >>         team0
 >>           |
 >>    +------+------+-----+-----+
 >>    |      |      |     |     |
 >> team1  team2  team3  ...  team200
 >>
 >> If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
 >> event to its own lower interfaces(team1 ~ team200).
 >> It is worked by netdev_sync_lower_features().
 >> So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
 >> work iteratively.
 >> But generated NETDEV_FEAT_CHANGE event is also sent to the upper
 >> interface too.
 >> upper interface(team0) generates the NETDEV_FEAT_CHANGE event for 
its own
 >> lower interfaces again.
 >> lower and upper interfaces receive this event and generate this
 >> event again and again.
 >> So, the stack overflow occurs.
 >>
 >> But it is not the infinite loop issue.
 >> Because the netdev_sync_lower_features() updates features before
 >> generating the NETDEV_FEAT_CHANGE event.
 >> Already synchronized lower interfaces skip notification logic.
 >> So, it is just the problem that iteration logic is changed to the
 >> recursive unexpectedly due to the notification mechanism.
 >>
 >> Reproducer:
 >>
 >> ip link add team0 type team
 >> ethtool -K team0 lro on
 >> for i in {1..200}
 >> do
 >>          ip link add team$i master team0 type team
 >>          ethtool -K team$i lro on
 >> done
 >>
 >> ethtool -K team0 lro off
 >>
 >> In order to fix it, the priv_notifier_ctx net_device member is 
introduced.
 >> This variable can be used by each interface in its own way in the
 >> notification context. The bonding and team interface is going to use it
 >> to avoid duplicated NETDEV_FEAT_CHANGE event handling.
 >>
 >> Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
 >> Fixes: fd867d51f889 ("net/core: generic support for disabling netdev 
features down stack")
 >> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
 >
 > ...
 >
 >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
 >> index 08fbd4622ccf..ebd49a54f0d5 100644
 >> --- a/include/linux/netdevice.h
 >> +++ b/include/linux/netdevice.h
 >> @@ -2393,6 +2393,7 @@ struct net_device {
 >>   	unsigned		threaded:1;
 >>
 >>   	struct list_head	net_notifier_list;
 >> +	u32			priv_notifier_ctx;
 >
 > Hi Taehee,
 >
 > Please add this new field to the kdoc for struct net_device.
 >

Thanks! I will check this before submitting the v2 patch.

Thank you so much,
Taehee Yoo


 >>
 >>   #if IS_ENABLED(CONFIG_MACSEC)
 >>   	/* MACsec management functions */
 >
 > ...
 >
 > ---
 > pw-bot: cr
Paolo Abeni May 16, 2023, 8:34 a.m. UTC | #5
On Mon, 2023-05-15 at 18:12 +0900, Taehee Yoo wrote:
> On 5/15/23 15:24, Nikolay Aleksandrov wrote:
>  > On 15/05/2023 08:37, Taehee Yoo wrote:
>  >> When the virtual interface's feature is updated, it synchronizes the
>  >> updated feature for its own lower interface.
>  >> This propagation logic should be worked as the iteration, not 
> recursively.
>  >> But it works recursively due to the netdev notification unexpectedly.
>  >> This problem occurs when it disables LRO only for the team and bonding
>  >> interface type.
>  >>
>  >>         team0
>  >>           |
>  >>    +------+------+-----+-----+
>  >>    |      |      |     |     |
>  >> team1  team2  team3  ...  team200
>  >>
>  >> If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
>  >> event to its own lower interfaces(team1 ~ team200).
>  >> It is worked by netdev_sync_lower_features().
>  >> So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
>  >> work iteratively.
>  >> But generated NETDEV_FEAT_CHANGE event is also sent to the upper
>  >> interface too.
>  >> upper interface(team0) generates the NETDEV_FEAT_CHANGE event for 
> its own
>  >> lower interfaces again.
>  >> lower and upper interfaces receive this event and generate this
>  >> event again and again.
>  >> So, the stack overflow occurs.
>  >>
>  >> But it is not the infinite loop issue.
>  >> Because the netdev_sync_lower_features() updates features before
>  >> generating the NETDEV_FEAT_CHANGE event.
>  >> Already synchronized lower interfaces skip notification logic.
>  >> So, it is just the problem that iteration logic is changed to the
>  >> recursive unexpectedly due to the notification mechanism.
>  >>
>  >> Reproducer:
>  >>
>  >> ip link add team0 type team
>  >> ethtool -K team0 lro on
>  >> for i in {1..200}
>  >> do
>  >>          ip link add team$i master team0 type team
>  >>          ethtool -K team$i lro on
>  >> done
>  >>
>  >> ethtool -K team0 lro off
>  >>
>  >> In order to fix it, the priv_notifier_ctx net_device member is 
> introduced.
>  >> This variable can be used by each interface in its own way in the
>  >> notification context. The bonding and team interface is going to use it
>  >> to avoid duplicated NETDEV_FEAT_CHANGE event handling.
>  >>
>  >> Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
>  >> Fixes: fd867d51f889 ("net/core: generic support for disabling netdev 
> features down stack")
>  >> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
>  >> ---
>  >>   drivers/net/bonding/bond_main.c | 6 +++++-
>  >>   drivers/net/team/team.c         | 6 +++++-
>  >>   include/linux/netdevice.h       | 1 +
>  >>   net/core/dev.c                  | 2 ++
>  >>   4 files changed, 13 insertions(+), 2 deletions(-)
>  >>
>  >
>  > Since you're syncing to lower devices, can't you check if the event 
> source device
>  > is lower to the current one (i.e. reverse propagation has happened) 
> in the affected
>  > drivers ? Adding a new struct netdevice member just for this seems 
> unnecessary to me.
>  > Especially for a setup like a bond of bonds or a team of teams, these 
> are corner case
>  > setups that shouldn't exist in general. :)
>  >
> 
> I agree that this new variable is unnecessary right now.
> I tried to avoid introducing new variables, but unfortunately, I 
> couldn't find a solution to detect duplicated notification events.
> 
> The reason why I introduced the new member of the net_device is that I 
> thought there might be similar problems in the future such as mtu.
> so, I hoped that it can be used as a general variable to avoid similar 
> problems.
> But I really agree that this new variable is over-spec.
> So, adding a new boolean variable into the struct bonding and team, not 
> net_device would be reasonable if I can't find a proper solution.

I think adding a bool variable to bonding/team priv would be better, as
it looks like the issues is specific to such kind of devices.

Thanks!

Paolo
Taehee Yoo May 16, 2023, 11:29 a.m. UTC | #6
On 5/16/23 17:34, Paolo Abeni wrote:

Hi Paolo,
Thank you so much for the review!

 > On Mon, 2023-05-15 at 18:12 +0900, Taehee Yoo wrote:
 >> On 5/15/23 15:24, Nikolay Aleksandrov wrote:
 >>   > On 15/05/2023 08:37, Taehee Yoo wrote:
 >>   >> When the virtual interface's feature is updated, it 
synchronizes the
 >>   >> updated feature for its own lower interface.
 >>   >> This propagation logic should be worked as the iteration, not
 >> recursively.
 >>   >> But it works recursively due to the netdev notification 
unexpectedly.
 >>   >> This problem occurs when it disables LRO only for the team and 
bonding
 >>   >> interface type.
 >>   >>
 >>   >>         team0
 >>   >>           |
 >>   >>    +------+------+-----+-----+
 >>   >>    |      |      |     |     |
 >>   >> team1  team2  team3  ...  team200
 >>   >>
 >>   >> If team0's LRO feature is updated, it generates the 
NETDEV_FEAT_CHANGE
 >>   >> event to its own lower interfaces(team1 ~ team200).
 >>   >> It is worked by netdev_sync_lower_features().
 >>   >> So, the NETDEV_FEAT_CHANGE notification logic of each lower 
interface
 >>   >> work iteratively.
 >>   >> But generated NETDEV_FEAT_CHANGE event is also sent to the upper
 >>   >> interface too.
 >>   >> upper interface(team0) generates the NETDEV_FEAT_CHANGE event for
 >> its own
 >>   >> lower interfaces again.
 >>   >> lower and upper interfaces receive this event and generate this
 >>   >> event again and again.
 >>   >> So, the stack overflow occurs.
 >>   >>
 >>   >> But it is not the infinite loop issue.
 >>   >> Because the netdev_sync_lower_features() updates features before
 >>   >> generating the NETDEV_FEAT_CHANGE event.
 >>   >> Already synchronized lower interfaces skip notification logic.
 >>   >> So, it is just the problem that iteration logic is changed to the
 >>   >> recursive unexpectedly due to the notification mechanism.
 >>   >>
 >>   >> Reproducer:
 >>   >>
 >>   >> ip link add team0 type team
 >>   >> ethtool -K team0 lro on
 >>   >> for i in {1..200}
 >>   >> do
 >>   >>          ip link add team$i master team0 type team
 >>   >>          ethtool -K team$i lro on
 >>   >> done
 >>   >>
 >>   >> ethtool -K team0 lro off
 >>   >>
 >>   >> In order to fix it, the priv_notifier_ctx net_device member is
 >> introduced.
 >>   >> This variable can be used by each interface in its own way in the
 >>   >> notification context. The bonding and team interface is going 
to use it
 >>   >> to avoid duplicated NETDEV_FEAT_CHANGE event handling.
 >>   >>
 >>   >> Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
 >>   >> Fixes: fd867d51f889 ("net/core: generic support for disabling 
netdev
 >> features down stack")
 >>   >> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
 >>   >> ---
 >>   >>   drivers/net/bonding/bond_main.c | 6 +++++-
 >>   >>   drivers/net/team/team.c         | 6 +++++-
 >>   >>   include/linux/netdevice.h       | 1 +
 >>   >>   net/core/dev.c                  | 2 ++
 >>   >>   4 files changed, 13 insertions(+), 2 deletions(-)
 >>   >>
 >>   >
 >>   > Since you're syncing to lower devices, can't you check if the event
 >> source device
 >>   > is lower to the current one (i.e. reverse propagation has happened)
 >> in the affected
 >>   > drivers ? Adding a new struct netdevice member just for this seems
 >> unnecessary to me.
 >>   > Especially for a setup like a bond of bonds or a team of teams, 
these
 >> are corner case
 >>   > setups that shouldn't exist in general. :)
 >>   >
 >>
 >> I agree that this new variable is unnecessary right now.
 >> I tried to avoid introducing new variables, but unfortunately, I
 >> couldn't find a solution to detect duplicated notification events.
 >>
 >> The reason why I introduced the new member of the net_device is that I
 >> thought there might be similar problems in the future such as mtu.
 >> so, I hoped that it can be used as a general variable to avoid similar
 >> problems.
 >> But I really agree that this new variable is over-spec.
 >> So, adding a new boolean variable into the struct bonding and team, not
 >> net_device would be reasonable if I can't find a proper solution.
 >
 > I think adding a bool variable to bonding/team priv would be better, as
 > it looks like the issues is specific to such kind of devices.
 >

Thanks, I will add a bool variable to the bonding and team struct in the v2.

Thank you so much!
Taehee Yoo

 > Thanks!
 >
 > Paolo
 >
diff mbox series

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 3fed888629f7..dc6d5172475c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3947,7 +3947,11 @@  static int bond_slave_netdev_event(unsigned long event,
 		unblock_netpoll_tx();
 		break;
 	case NETDEV_FEAT_CHANGE:
-		bond_compute_features(bond);
+		if (!bond_dev->priv_notifier_ctx) {
+			bond_dev->priv_notifier_ctx = NETDEV_FEAT_CHANGE;
+			bond_compute_features(bond);
+			bond_dev->priv_notifier_ctx = 0;
+		}
 		break;
 	case NETDEV_RESEND_IGMP:
 		/* Propagate to master device */
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index d10606f257c4..c7af29d3cda0 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -3022,7 +3022,11 @@  static int team_device_event(struct notifier_block *unused,
 		team_del_slave(port->team->dev, dev);
 		break;
 	case NETDEV_FEAT_CHANGE:
-		team_compute_features(port->team);
+		if (!port->team->dev->priv_notifier_ctx) {
+			port->team->dev->priv_notifier_ctx = NETDEV_FEAT_CHANGE;
+			team_compute_features(port->team);
+			port->team->dev->priv_notifier_ctx = 0;
+		}
 		break;
 	case NETDEV_PRECHANGEMTU:
 		/* Forbid to change mtu of underlaying device */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 08fbd4622ccf..ebd49a54f0d5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2393,6 +2393,7 @@  struct net_device {
 	unsigned		threaded:1;
 
 	struct list_head	net_notifier_list;
+	u32			priv_notifier_ctx;
 
 #if IS_ENABLED(CONFIG_MACSEC)
 	/* MACsec management functions */
diff --git a/net/core/dev.c b/net/core/dev.c
index b3c13e041935..cc6b5f626054 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10677,6 +10677,8 @@  struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
 	setup(dev);
 
+	dev->priv_notifier_ctx = 0;
+
 	if (!dev->tx_queue_len) {
 		dev->priv_flags |= IFF_NO_QUEUE;
 		dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;