From patchwork Thu Sep 19 08:42:39 2024
X-Patchwork-Submitter: chengyechun
X-Patchwork-Id: 13807512
From: chengyechun
To: "netdev@vger.kernel.org"
CC: Jay Vosburgh, Andy Gospodarek
Subject: Re: [Discuss]Questions about active slave select in bonding 8023ad
Date: Thu, 19 Sep 2024 08:42:39 +0000
X-Mailing-List: netdev@vger.kernel.org

Here is the patch:

Subject: [PATCH] bonding: enable best slave after switch under condition 3a

---
 drivers/net/bonding/bond_3ad.c | 2 ++
 1 file changed, 2 insertions(+)

--

-----Original Message-----
From: chengyechun
Sent: 19 September 2024 15:22
To: 'netdev@vger.kernel.org'
Cc: 'Jay Vosburgh'; 'Andy Gospodarek'
Subject: [Discuss]Questions about active slave select in bonding 8023ad

Hi all,

Recently I have been having a problem bringing up a bond. It occurs only occasionally. After the slaves and the bond are configured, restarting the network fails with the following error:

"/etc/sysconfig/network-scripts/ifup-eth[2747129]: Error, some other host () already uses address 1.1.1.39."

The network scripts use arping to check for an IP address conflict. The check reports an error even though no address is actually in conflict, which is very strange. The kernel version is 5.10.
The bond configuration is as follows:

BONDING_OPTS='mode=4 miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4'
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=static
NM_CONTROLLED=no
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=bond0
DEVICE=bond0
ONBOOT=yes
IPADDR=1.1.1.38
NETMASK=255.255.0.0
IPV6ADDR=1:1:1::39/64

The slave configuration is as follows. I have four slaves configured the same way (enp12s0 shown here, plus enp13s0, enp14s0 and enp15s0):

NAME=enp12s0
DEVICE=enp12s0
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no

After I discovered this problem, I restarted the network multiple times and it always happened once or twice. After some debugging, I found that the bond interface has no available slave at the moment the arping packet is sent, so the packet cannot be sent. When the problem occurs, the active slave is switched from enp12s0 to enp13s0. However, the backup flag of enp13s0 is not changed from 1 to 0 immediately after the switchover completes. Is this by design, or is it a bug?

This leaves me with a question about how the active slave is selected. In ad_agg_selection_test(), if condition 3a is met, that is, if (__agg_has_partner(curr) && !__agg_has_partner(best)), and the active slave is then switched successfully, why does ad_agg_selection_logic() not call __enable_port() on the best slave?

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index ae0393dff..8494420ed 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1819,6 +1819,8 @@ static void ad_agg_selection_logic(struct aggregator *agg,
 				__disable_port(port);
 			}
 		}
+		port = best->lag_ports;
+		__enable_port(port);
 		/* Slave array needs update. */
 		*update_slave_arr = true;
 	}
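
To make the question concrete, here is a minimal, self-contained C sketch of the situation described above. It is not the kernel code: struct toy_port, struct toy_agg and the toy_* functions are hypothetical stand-ins that model only the partner check of condition 3a and the enabled/backup flag. It assumes that nothing else enables the new ports between the switchover and the arping transmission.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the bonding driver's port and
 * aggregator structures; only the fields needed for this illustration. */
struct toy_port {
	bool enabled;			/* false corresponds to "backup = 1" */
	struct toy_port *next;		/* next port in the same aggregator */
};

struct toy_agg {
	bool has_partner;		/* a LACP partner has responded */
	struct toy_port *lag_ports;	/* head of the aggregator's port list */
};

/* Models condition 3a only: an aggregator with a partner is preferred
 * over one without. All other selection criteria are left out. */
static struct toy_agg *toy_selection_test(struct toy_agg *best,
					  struct toy_agg *curr)
{
	if (!best)
		return curr;
	if (curr->has_partner && !best->has_partner)
		return curr;
	return best;
}

/* Models the switchover: the ports of the former active aggregator are
 * disabled. With enable_best set, the first port of the new aggregator
 * is enabled as well, which is what the proposed two-line change does. */
static void toy_switch_active(struct toy_agg *active, struct toy_agg *best,
			      bool enable_best)
{
	struct toy_port *port;

	assert(best != NULL);

	if (active) {
		for (port = active->lag_ports; port; port = port->next)
			port->enabled = false;
	}
	if (enable_best && best->lag_ports)
		best->lag_ports->enabled = true;
}

/* Counts the ports of an aggregator that can currently carry traffic. */
static int toy_count_enabled(const struct toy_agg *agg)
{
	const struct toy_port *port;
	int n = 0;

	for (port = agg->lag_ports; port; port = port->next)
		if (port->enabled)
			n++;
	return n;
}
```

In this model, calling toy_switch_active() with enable_best set to false leaves neither aggregator with an enabled port, which corresponds to the window in which arping finds no usable slave. With enable_best set to true, the new aggregator has one enabled port immediately after the switch.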