
Re: [Discuss]Questions about active slave select in bonding 8023ad

Message ID b2785db6fbe9421ca6510ca92ddfa650@huawei.com
State New
Series Re: [Discuss]Questions about active slave select in bonding 8023ad

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

chengyechun Sept. 19, 2024, 8:42 a.m. UTC
Here is the patch:

Subject: [PATCH] bonding: enable best slave after switch under condition 3a
---
drivers/net/bonding/bond_3ad.c | 2 ++
1 file changed, 2 insertions(+)

--

-----Original Message-----
From: chengyechun
Sent: 2024-09-19 15:22
To: 'netdev@vger.kernel.org' <netdev@vger.kernel.org>
Cc: 'Jay Vosburgh' <j.vosburgh@gmail.com>; 'Andy Gospodarek' <andy@greyhouse.net>
Subject: [Discuss]Questions about active slave select in bonding 8023ad

Hi all,
Recently, I have been hitting an intermittent problem when bringing up a bond.
After the bond and its slaves are configured, restarting the network fails with the following error:
“/etc/sysconfig/network-scripts/ifup-eth[2747129]: Error, some other host () already uses address 1.1.1.39.”
The network scripts use arping to check for an IP address conflict, and that check fails even though no other host is actually using the address. This is very strange.
The kernel version is 5.10. The bond configuration is as follows:

BONDING_OPTS='mode=4 miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4'
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=static
NM_CONTROLLED=no
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=bond0
DEVICE=bond0
ONBOOT=yes
IPADDR=1.1.1.38
NETMASK=255.255.0.0
IPV6ADDR=1:1:1::39/64

The slave configuration is as follows (there are four slaves; enp13s0, enp14s0 and enp15s0 are configured the same way):

NAME=enp12s0
DEVICE=enp12s0
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no

After discovering this, I restarted the network many times, and the failure reliably reproduced once or twice.
After some debugging, I found that the bond interface has no usable slave at the moment the arping packet is sent, so the packet cannot be transmitted.
When the problem occurs, the active slave switches from enp12s0 to enp13s0. However, the backup flag of enp13s0 is not cleared (changed from 1 to 0) immediately after the switchover completes. Is this intended behavior or a bug?
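One plausible link between the stale backup flag and the failed arping: my reading of bond_update_slave_arr() in the 5.10 source (worth re-checking) is that in 802.3ad mode a slave is placed in the transmit array only if its port belongs to the active aggregator and bond_slave_can_tx() is true, which requires the backup flag to be 0. The sketch below is a hypothetical, simplified model of that filter; struct model_slave and the model_* functions are invented and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slave: field names echo the kernel's, but this is not struct slave. */
struct model_slave {
	const char *name;
	bool link_up;
	bool backup; /* true = backup (1), false = active (0) */
	int agg_id;  /* aggregator this slave's port is attached to */
};

/* Loosely mirrors bond_slave_can_tx(): link up and not marked backup. */
static bool model_can_tx(const struct model_slave *s)
{
	return s->link_up && !s->backup;
}

/* Collects slaves of the active aggregator that can transmit into out[]
 * and returns how many were found; 0 means nothing to send on. */
static size_t model_build_tx_array(const struct model_slave *slaves, size_t n,
				   int active_agg_id,
				   const struct model_slave **out)
{
	size_t count = 0;

	assert(slaves != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (slaves[i].agg_id != active_agg_id)
			continue; /* port belongs to another aggregator */
		if (!model_can_tx(&slaves[i]))
			continue; /* link down, or still flagged backup */
		out[count++] = &slaves[i];
	}
	return count;
}
```

With enp13s0 in the new active aggregator but still flagged backup, the model returns an empty array; clearing the flag makes enp13s0 the only usable slave.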

Thinking about it further, I have a question about how the active slave is selected. In ad_agg_selection_test(), condition 3a is if (__agg_has_partner(curr) && !__agg_has_partner(best)). When that condition is met and the active slave switch succeeds, why doesn't ad_agg_selection_logic() enable the best slave with __enable_port()?
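For reference, the comment above ad_agg_selection_test() orders its rules as: 0, no best yet, select current; 1 and 2, prefer a non-individual aggregator over an individual one; 3a, prefer current if its partner replied and best's did not; 3b, keep best if only best's partner replied; 4, fall back to the selection policy. The C sketch below models only rules 0 to 3b on a hypothetical struct model_agg, with rule 4 stubbed out as "keep best"; it illustrates the ordering and is not a replacement for the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical aggregator carrying only the properties rules 1-3b use. */
struct model_agg {
	int id;
	bool individual;  /* made of individual (non-aggregatable) ports */
	bool has_partner; /* LACP partner has replied */
};

/* Returns the preferred aggregator of best and curr under rules 0-3b. */
static const struct model_agg *
model_selection_test(const struct model_agg *best, const struct model_agg *curr)
{
	assert(curr != NULL);

	if (!best)                                   /* 0: no best yet */
		return curr;
	if (!curr->individual && best->individual)   /* 1 */
		return curr;
	if (curr->individual && !best->individual)   /* 2 */
		return best;
	if (curr->has_partner && !best->has_partner) /* 3a */
		return curr;
	if (!curr->has_partner && best->has_partner) /* 3b */
		return best;
	return best; /* 4: selection-policy tie-break, not modeled */
}
```

If best is an aggregator whose partner has gone quiet and curr is one whose partner replied, rule 3a returns curr; that is the switch the question is about.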

Patch

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index ae0393dff..8494420ed 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1819,6 +1819,8 @@  static void ad_agg_selection_logic(struct aggregator *agg,
                                __disable_port(port);
                        }
                }
+               port = best->lag_ports;
+               __enable_port(port);
                /* Slave array needs update. */
                *update_slave_arr = true;
        }
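The diff enables only best->lag_ports, the first port on the new aggregator's list, while the loop in the context lines (only partly visible here) disables the ports of the former active aggregator. The hypothetical model below compares that with walking the whole next_port_in_aggregator list. struct model_port, struct model_aggregator and model_switch() are invented for illustration, and model_enable_port() only loosely imitates __enable_port(), which as I read it also requires the slave's link to be up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ports chained through an aggregator, loosely following
 * lag_ports / next_port_in_aggregator. */
struct model_port {
	bool link_up;
	bool backup;
	struct model_port *next_port_in_aggregator;
};

struct model_aggregator {
	struct model_port *lag_ports;
};

/* Only a port whose link is up is made active. */
static void model_enable_port(struct model_port *p)
{
	if (p->link_up)
		p->backup = false;
}

static void model_disable_port(struct model_port *p)
{
	p->backup = true;
}

/* Disables every port of the old active aggregator, then enables either
 * the first port of best (enable_all == false, as in the patch) or all
 * of its ports (enable_all == true). */
static void model_switch(struct model_aggregator *active,
			 struct model_aggregator *best, bool enable_all)
{
	struct model_port *p;

	assert(best != NULL);
	if (active)
		for (p = active->lag_ports; p; p = p->next_port_in_aggregator)
			model_disable_port(p);

	for (p = best->lag_ports; p; p = p->next_port_in_aggregator) {
		model_enable_port(p);
		if (!enable_all)
			break;
	}
}
```

With a two-port best aggregator, the patch-like call leaves the second port flagged backup, whereas the full walk clears both.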