[net,v2] bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves

Message ID 20230802114320.4156068-1-william.xuanziyang@huawei.com (mailing list archive)
State Accepted
Commit 01f4fd27087078c90a0e22860d1dfa2cd0510791
Delegated to: Netdev Maintainers

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1328 this patch: 1328
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 1351 this patch: 1351
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1351 this patch: 1351
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Ziyang Xuan (William) Aug. 2, 2023, 11:43 a.m. UTC
BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with the
following test case:

  # ip netns add ns1
  # ip netns exec ns1 ip link add bond0 type bond mode 0
  # ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2
  # ip netns exec ns1 ip link set bond_slave_1 master bond0
  # ip netns exec ns1 ip link add link bond_slave_1 name vlan10 type vlan id 10 protocol 802.1ad
  # ip netns exec ns1 ip link add link bond0 name bond0_vlan10 type vlan id 10 protocol 802.1ad
  # ip netns exec ns1 ip link set bond_slave_1 nomaster
  # ip netns del ns1

The logical analysis of the problem is as follows:

1. create the ETH_P_8021AD VLAN device vlan10 on bond_slave_1:
register_vlan_dev()
  vlan_vid_add()
    vlan_info_alloc()
    __vlan_vid_add() // add [ETH_P_8021AD, 10] vid to bond_slave_1

2. create the ETH_P_8021AD VLAN device bond0_vlan10 on bond0:
register_vlan_dev()
  vlan_vid_add()
    __vlan_vid_add()
      vlan_add_rx_filter_info()
          if (!vlan_hw_filter_capable(dev, proto)) // true because bond0 lacks NETIF_F_HW_VLAN_STAG_FILTER
              return 0;

          if (netif_device_present(dev))
              return dev->netdev_ops->ndo_vlan_rx_add_vid(dev, proto, vid); // never reached
              // So the [ETH_P_8021AD, 10] vid is never added to the slaves of bond0.

3. detach bond_slave_1 from bond0:
__bond_release_one()
  vlan_vids_del_by_dev()
    list_for_each_entry(vid_info, &vlan_info->vid_list, list)
        vlan_vid_del(dev, vid_info->proto, vid_info->vid);
        // The [ETH_P_8021AD, 10] vid of bond_slave_1 is deleted, although vlan10 still uses it.
        // bond_slave_1->vlan_info is set to NULL.

4. delete vlan10 while deleting ns1:
default_device_exit_batch()
  dev->rtnl_link_ops->dellink() // unregister_vlan_dev() for vlan10
    vlan_info = rtnl_dereference(real_dev->vlan_info); // real_dev of vlan10 is bond_slave_1
    BUG_ON(!vlan_info); // bond_slave_1->vlan_info is NULL now, so the BUG_ON fires

Add support for the S-VLAN tag related features to the bond driver, so that
the bond driver always propagates the VLAN info to its slaves.

Fixes: 8ad227ff89a7 ("net: vlan: add 802.1ad support")
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
---
v2:
  - Do not add vlan_hw_filter_capable() check in vlan_vids_del_by_dev().
  - Add S-VLAN tag related features support to bond driver to fix the bug.
---
 drivers/net/bonding/bond_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
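
The four steps of the analysis can be modeled in userspace as a reference
count on the slave's [ETH_P_8021AD, 10] vid. This is only a rough sketch:
struct slave_model, model_vid_add(), model_vid_del() and model_scenario()
are hypothetical names made up for illustration, not kernel APIs, and the
model tracks a single vid instead of the kernel's per-device vid list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one slave device; not kernel code. */
struct slave_model {
	int vid10_refs;     /* references held on the [ETH_P_8021AD, 10] vid */
	bool has_vlan_info; /* models slave->vlan_info != NULL */
};

/* Take one reference on the vid; the first one allocates vlan_info. */
static void model_vid_add(struct slave_model *s)
{
	assert(s != NULL);
	s->vid10_refs++;
	s->has_vlan_info = true;
}

/* Drop one reference; dropping the last one frees vlan_info. */
static void model_vid_del(struct slave_model *s)
{
	assert(s != NULL);
	if (s->vid10_refs == 0)
		return; /* vid not present, nothing to delete */
	if (--s->vid10_refs == 0)
		s->has_vlan_info = false;
}

/*
 * Replay steps 1-4. Returns true if vlan_info is still present when
 * vlan10 is unregistered, i.e. the BUG_ON() would not fire.
 */
static bool model_scenario(bool bond_has_stag_filter)
{
	struct slave_model slave = { 0, false };

	/* Step 1: vlan10 is created directly on the slave. */
	model_vid_add(&slave);

	/* Step 2: bond0_vlan10 is created; the vid is propagated to the
	 * slave only if the bond advertises the S-VLAN filter feature. */
	if (bond_has_stag_filter)
		model_vid_add(&slave);

	/* Step 3: releasing the slave deletes every vid known to the bond,
	 * whether or not it was propagated in step 2. */
	model_vid_del(&slave);

	/* Step 4: unregistering vlan10 expects vlan_info to exist. */
	return slave.has_vlan_info;
}
```

In this model, model_scenario(false) stands for the bond without
NETIF_F_HW_VLAN_STAG_FILTER: the delete in step 3 is unbalanced and
vlan_info is already gone in step 4. model_scenario(true) stands for the
patched bond, where the add in step 2 balances the delete in step 3.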

Comments

Ido Schimmel Aug. 4, 2023, 12:40 p.m. UTC | #1
On Wed, Aug 02, 2023 at 07:43:20PM +0800, Ziyang Xuan wrote:
> BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with
> following testcase:
> 
> [...]

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
patchwork-bot+netdevbpf@kernel.org Aug. 7, 2023, 7:30 p.m. UTC | #2
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 2 Aug 2023 19:43:20 +0800 you wrote:
> BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with
> following testcase:
> 
>   # ip netns add ns1
>   # ip netns exec ns1 ip link add bond0 type bond mode 0
>   # ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2
>   # ip netns exec ns1 ip link set bond_slave_1 master bond0
>   # ip netns exec ns1 ip link add link bond_slave_1 name vlan10 type vlan id 10 protocol 802.1ad
>   # ip netns exec ns1 ip link add link bond0 name bond0_vlan10 type vlan id 10 protocol 802.1ad
>   # ip netns exec ns1 ip link set bond_slave_1 nomaster
>   # ip netns del ns1
> 
> [...]

Here is the summary with links:
  - [net,v2] bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
    https://git.kernel.org/netdev/net/c/01f4fd270870

You are awesome, thank you!

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 484c9e3e5e82..447b06ea4fc9 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5901,7 +5901,9 @@  void bond_setup(struct net_device *bond_dev)
 
 	bond_dev->hw_features = BOND_VLAN_FEATURES |
 				NETIF_F_HW_VLAN_CTAG_RX |
-				NETIF_F_HW_VLAN_CTAG_FILTER;
+				NETIF_F_HW_VLAN_CTAG_FILTER |
+				NETIF_F_HW_VLAN_STAG_RX |
+				NETIF_F_HW_VLAN_STAG_FILTER;
 
 	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
 	bond_dev->features |= bond_dev->hw_features;
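
A minimal userspace sketch of why the two added feature bits matter for the
check described in step 2 of the commit message. The assumptions are
labeled: struct model_dev, model_bond_setup() and model_filter_capable() are
hypothetical names, the MODEL_F_* bit positions are arbitrary placeholders
rather than the kernel's values, and BOND_VLAN_FEATURES is left out. Only
the two EtherType constants are the real 802.1Q and 802.1ad values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Real EtherType values for C-VLAN (802.1Q) and S-VLAN (802.1ad). */
#define MODEL_ETH_P_8021Q  0x8100u
#define MODEL_ETH_P_8021AD 0x88A8u

/* Placeholder feature bits; positions are NOT the kernel's. */
#define MODEL_F_CTAG_RX     (UINT64_C(1) << 0)
#define MODEL_F_CTAG_FILTER (UINT64_C(1) << 1)
#define MODEL_F_STAG_RX     (UINT64_C(1) << 2)
#define MODEL_F_STAG_FILTER (UINT64_C(1) << 3)

/* Hypothetical stand-in for the feature fields of a net_device. */
struct model_dev {
	uint64_t hw_features;
	uint64_t features;
};

/* Models the feature setup in bond_setup(), before and after the patch. */
static void model_bond_setup(struct model_dev *dev, bool with_patch)
{
	assert(dev != NULL);
	dev->hw_features = MODEL_F_CTAG_RX | MODEL_F_CTAG_FILTER;
	if (with_patch)
		dev->hw_features |= MODEL_F_STAG_RX | MODEL_F_STAG_FILTER;
	/* Features start out empty in this model, so copying hw_features
	 * is equivalent to OR-ing them in. */
	dev->features = dev->hw_features;
}

/* Per-protocol filter check: each VLAN protocol needs its own bit. */
static bool model_filter_capable(const struct model_dev *dev, uint16_t proto)
{
	assert(dev != NULL);
	if (proto == MODEL_ETH_P_8021Q &&
	    (dev->features & MODEL_F_CTAG_FILTER))
		return true;
	if (proto == MODEL_ETH_P_8021AD &&
	    (dev->features & MODEL_F_STAG_FILTER))
		return true;
	return false;
}
```

With with_patch set to false the model reports the bond as capable for
802.1Q only, which matches the early "return 0" in step 2. With with_patch
set to true it is capable for both protocols, so the vid add would reach
the slaves.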