Message ID | c2f698e6f73e6e78232ab4ded065c3828d245dbd.1660065706.git.jtoppins@redhat.com (mailing list archive) |
---|---|
State | New |
Series | [RFC,net] bonding: 802.3ad: fix no transmission of LACPDUs |

Jonathan Toppins <jtoppins@redhat.com> wrote:

>Running the script in
>`tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh` puts
>bonding into a state where it never transmits LACPDUs.
>
>line 53: echo 65535 > /sys/class/net/fbond/bonding/ad_actor_sys_prio
>setting bond param: ad_actor_sys_prio
>given:
>    params.ad_actor_system = 0
>call stack:
>    bond_option_ad_actor_sys_prio()
>    -> bond_3ad_update_ad_actor_settings()
>       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>            params.ad_actor_system == 0
>results:
>     ad.system.sys_mac_addr = bond->dev->dev_addr
>
>line 59: ip link set fbond address 52:54:00:3B:7C:A6
>setting bond MAC addr
>call stack:
>    bond->dev->dev_addr = new_mac
>
>line 63: echo 65535 > /sys/class/net/fbond/bonding/ad_actor_sys_prio
>setting bond param: ad_actor_sys_prio
>given:
>    params.ad_actor_system = 0
>call stack:
>    bond_option_ad_actor_sys_prio()
>    -> bond_3ad_update_ad_actor_settings()
>       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>            params.ad_actor_system == 0
>results:
>     ad.system.sys_mac_addr = bond->dev->dev_addr
>
>line 71: ip link set veth1-bond down master fbond
>given:
>    params.ad_actor_system = 0
>    params.mode = BOND_MODE_8023AD
>    ad.system.sys_mac_addr == bond->dev->dev_addr
>call stack:
>    bond_enslave
>    -> bond_3ad_initialize(); because first slave
>       -> if ad.system.sys_mac_addr != bond->dev->dev_addr
>          return
>results:
>     Nothing is run in bond_3ad_initialize() because dev_add equals
>     sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
>     never initialized anywhere else.
>
>Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
>---
> MAINTAINERS                                   |  1 +
> drivers/net/bonding/bond_3ad.c                |  2 +-
> .../net/bonding/bond-break-lacpdu-tx.sh       | 88 +++++++++++++++++++
> 3 files changed, 90 insertions(+), 1 deletion(-)
> create mode 100644 tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh
>
>diff --git a/MAINTAINERS b/MAINTAINERS
>index 386178699ae7..6e7cebc1bca3 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -3636,6 +3636,7 @@ F:	Documentation/networking/bonding.rst
> F:	drivers/net/bonding/
> F:	include/net/bond*
> F:	include/uapi/linux/if_bonding.h
>+F:	tools/testing/selftests/net/bonding/
> 
> BOSCH SENSORTEC BMA400 ACCELEROMETER IIO DRIVER
> M:	Dan Robertson <dan@dlrobertson.com>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index d7fb33c078e8..e357bc6b8e05 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -84,7 +84,7 @@ enum ad_link_speed_type {
> static const u8 null_mac_addr[ETH_ALEN + 2] __long_aligned = {
> 	0, 0, 0, 0, 0, 0
> };
>-static u16 ad_ticks_per_sec;
>+static u16 ad_ticks_per_sec = 1000/AD_TIMER_INTERVAL;

	How does this resolve the problem?  Does bond_3ad_initialize
actually run, or is this change sort of jump-starting things?

> static const int ad_delta_in_ticks = (AD_TIMER_INTERVAL * HZ) / 1000;
> 
> static const u8 lacpdu_mcast_addr[ETH_ALEN + 2] __long_aligned =
>diff --git a/tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh b/tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh
>new file mode 100644
>index 000000000000..be9f1b64e89e
>--- /dev/null
>+++ b/tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh
>@@ -0,0 +1,88 @@
>+#!/bin/sh
>+
>+# Regression Test:
>+#  Verify LACPDUs get transmitted after setting the MAC address of
>+#  the bond.
>+#
>+# https://bugzilla.redhat.com/show_bug.cgi?id=2020773
>+#
>+#       +---------+
>+#       | fab-br0 |
>+#       +---------+
>+#            |
>+#       +---------+
>+#       |  fbond  |
>+#       +---------+
>+#        |       |
>+#    +------+ +------+
>+#    |veth1 | |veth2 |
>+#    +------+ +------+
>+#
>+# We use veths instead of physical interfaces
>+
>+set -e
>+#set -x
>+tmp=$(mktemp -q dump.XXXXXX)
>+cleanup() {
>+	ip link del fab-br0 >/dev/null 2>&1 || :
>+	ip link del fbond  >/dev/null 2>&1 || :
>+	ip link del veth1-bond  >/dev/null 2>&1 || :
>+	ip link del veth2-bond  >/dev/null 2>&1 || :
>+	modprobe -r bonding  >/dev/null 2>&1 || :
>+	rm -f -- ${tmp}
>+}
>+
>+trap cleanup 0 1 2
>+cleanup
>+sleep 1
>+
>+# create the bridge
>+ip link add fab-br0 address 52:54:00:3B:7C:A6 mtu 1500 type bridge \
>+	forward_delay 15
>+
>+# create the bond
>+ip link add fbond type bond
>+ip link set fbond up
>+
>+# set bond sysfs parameters
>+ip link set fbond down
>+echo 802.3ad           > /sys/class/net/fbond/bonding/mode
>+echo 200               > /sys/class/net/fbond/bonding/miimon
>+echo 1                 > /sys/class/net/fbond/bonding/xmit_hash_policy
>+echo 65535             > /sys/class/net/fbond/bonding/ad_actor_sys_prio
>+echo stable            > /sys/class/net/fbond/bonding/ad_select
>+echo slow              > /sys/class/net/fbond/bonding/lacp_rate
>+echo any               > /sys/class/net/fbond/bonding/arp_all_targets

	Having a test case is very nice; would it be possible to avoid
using sysfs, though?  I believe all of these parameters are available
via /sbin/ip.

	Also, is setting "arp_all_targets" necessary for the test?

	-J

>+
>+# set bond address
>+ip link set fbond address 52:54:00:3B:7C:A6
>+ip link set fbond up
>+
>+# set again bond sysfs parameters
>+echo 65535             > /sys/class/net/fbond/bonding/ad_actor_sys_prio
>+
>+# create veths
>+ip link add name veth1-bond type veth peer name veth1-end
>+ip link add name veth2-bond type veth peer name veth2-end
>+
>+# add ports
>+ip link set fbond master fab-br0
>+ip link set veth1-bond down master fbond
>+ip link set veth2-bond down master fbond
>+
>+# bring up
>+ip link set veth1-end up
>+ip link set veth2-end up
>+ip link set fab-br0 up
>+ip link set fbond up
>+ip addr add dev fab-br0 10.0.0.3
>+
>+tcpdump -n -i veth1-end -e ether proto 0x8809 >${tmp} 2>&1 &
>+sleep 60
>+pkill tcpdump >/dev/null 2>&1
>+num=$(grep "packets captured" ${tmp} | awk '{print $1}')
>+if test "$num" -gt 0; then
>+	echo "PASS, captured ${num}"
>+else
>+	echo "FAIL"
>+fi
>-- 
>2.31.1
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

On 8/9/22 13:36, Jay Vosburgh wrote:
> Jonathan Toppins <jtoppins@redhat.com> wrote:
>
>> Running the script in
>> `tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh` puts
>> bonding into a state where it never transmits LACPDUs.
>>
>> line 53: echo 65535 > /sys/class/net/fbond/bonding/ad_actor_sys_prio
>> setting bond param: ad_actor_sys_prio
>> given:
>>     params.ad_actor_system = 0
>> call stack:
>>     bond_option_ad_actor_sys_prio()
>>     -> bond_3ad_update_ad_actor_settings()
>>        -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>>        -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>>             params.ad_actor_system == 0
>> results:
>>      ad.system.sys_mac_addr = bond->dev->dev_addr
>>
>> line 59: ip link set fbond address 52:54:00:3B:7C:A6
>> setting bond MAC addr
>> call stack:
>>     bond->dev->dev_addr = new_mac
>>
>> line 63: echo 65535 > /sys/class/net/fbond/bonding/ad_actor_sys_prio
>> setting bond param: ad_actor_sys_prio
>> given:
>>     params.ad_actor_system = 0
>> call stack:
>>     bond_option_ad_actor_sys_prio()
>>     -> bond_3ad_update_ad_actor_settings()
>>        -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
>>        -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
>>             params.ad_actor_system == 0
>> results:
>>      ad.system.sys_mac_addr = bond->dev->dev_addr
>>
>> line 71: ip link set veth1-bond down master fbond
>> given:
>>     params.ad_actor_system = 0
>>     params.mode = BOND_MODE_8023AD
>>     ad.system.sys_mac_addr == bond->dev->dev_addr
>> call stack:
>>     bond_enslave
>>     -> bond_3ad_initialize(); because first slave
>>        -> if ad.system.sys_mac_addr != bond->dev->dev_addr
>>           return
>> results:
>>      Nothing is run in bond_3ad_initialize() because dev_add equals
>>      sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
>>      never initialized anywhere else.
>>
>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
>> ---
>>  MAINTAINERS                                   |  1 +
>>  drivers/net/bonding/bond_3ad.c                |  2 +-
>>  .../net/bonding/bond-break-lacpdu-tx.sh       | 88 +++++++++++++++++++
>>  3 files changed, 90 insertions(+), 1 deletion(-)
>>  create mode 100644 tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 386178699ae7..6e7cebc1bca3 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3636,6 +3636,7 @@ F:	Documentation/networking/bonding.rst
>>  F:	drivers/net/bonding/
>>  F:	include/net/bond*
>>  F:	include/uapi/linux/if_bonding.h
>> +F:	tools/testing/selftests/net/bonding/
>>  
>>  BOSCH SENSORTEC BMA400 ACCELEROMETER IIO DRIVER
>>  M:	Dan Robertson <dan@dlrobertson.com>
>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>> index d7fb33c078e8..e357bc6b8e05 100644
>> --- a/drivers/net/bonding/bond_3ad.c
>> +++ b/drivers/net/bonding/bond_3ad.c
>> @@ -84,7 +84,7 @@ enum ad_link_speed_type {
>>  static const u8 null_mac_addr[ETH_ALEN + 2] __long_aligned = {
>>  	0, 0, 0, 0, 0, 0
>>  };
>> -static u16 ad_ticks_per_sec;
>> +static u16 ad_ticks_per_sec = 1000/AD_TIMER_INTERVAL;
>
> 	How does this resolve the problem?  Does bond_3ad_initialize
> actually run, or is this change sort of jump-starting things?

It is jump-starting things. Really bond_3ad_initialize() should be
fixed, but it seemed this change would be easier from a backporting
perspective. The real issue seems to be that bond_3ad_initialize()
checks to make sure the "bond is not initialized yet", and if this check
fails no initialization is done, which seems incorrect. It seems some
minimal amount of initialization needs to be done.

This is also an order of execution bug, as I tried to lay out in the
commit message. Basically, setting fbond's MAC and then resetting the
option ad_actor_sys_prio causes the if check in bond_3ad_initialize() to
not execute anything. We first saw this when using NetworkManager, as
for some reason NetworkManager was setting options twice; this is being
looked at as well.

I am open to other possible fixes. I just chose the one that appeared to
be the easiest to backport, hence the RFC.

>
>>  static const int ad_delta_in_ticks = (AD_TIMER_INTERVAL * HZ) / 1000;
>>  
>>  static const u8 lacpdu_mcast_addr[ETH_ALEN + 2] __long_aligned =
>> diff --git a/tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh b/tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh
>> new file mode 100644
>> index 000000000000..be9f1b64e89e
>> --- /dev/null
>> +++ b/tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh
>> @@ -0,0 +1,88 @@
>> +#!/bin/sh
>> +
>> +# Regression Test:
>> +#  Verify LACPDUs get transmitted after setting the MAC address of
>> +#  the bond.
>> +#
>> +# https://bugzilla.redhat.com/show_bug.cgi?id=2020773
>> +#
>> +#       +---------+
>> +#       | fab-br0 |
>> +#       +---------+
>> +#            |
>> +#       +---------+
>> +#       |  fbond  |
>> +#       +---------+
>> +#        |       |
>> +#    +------+ +------+
>> +#    |veth1 | |veth2 |
>> +#    +------+ +------+
>> +#
>> +# We use veths instead of physical interfaces
>> +
>> +set -e
>> +#set -x
>> +tmp=$(mktemp -q dump.XXXXXX)
>> +cleanup() {
>> +	ip link del fab-br0 >/dev/null 2>&1 || :
>> +	ip link del fbond  >/dev/null 2>&1 || :
>> +	ip link del veth1-bond  >/dev/null 2>&1 || :
>> +	ip link del veth2-bond  >/dev/null 2>&1 || :
>> +	modprobe -r bonding  >/dev/null 2>&1 || :
>> +	rm -f -- ${tmp}
>> +}
>> +
>> +trap cleanup 0 1 2
>> +cleanup
>> +sleep 1
>> +
>> +# create the bridge
>> +ip link add fab-br0 address 52:54:00:3B:7C:A6 mtu 1500 type bridge \
>> +	forward_delay 15
>> +
>> +# create the bond
>> +ip link add fbond type bond
>> +ip link set fbond up
>> +
>> +# set bond sysfs parameters
>> +ip link set fbond down
>> +echo 802.3ad           > /sys/class/net/fbond/bonding/mode
>> +echo 200               > /sys/class/net/fbond/bonding/miimon
>> +echo 1                 > /sys/class/net/fbond/bonding/xmit_hash_policy
>> +echo 65535             > /sys/class/net/fbond/bonding/ad_actor_sys_prio
>> +echo stable            > /sys/class/net/fbond/bonding/ad_select
>> +echo slow              > /sys/class/net/fbond/bonding/lacp_rate
>> +echo any               > /sys/class/net/fbond/bonding/arp_all_targets
>
> 	Having a test case is very nice; would it be possible to avoid
> using sysfs, though?  I believe all of these parameters are available
> via /sbin/ip.

I can convert the test case to `ip link`; it doesn't appear that the
method of configuration would cause a difference in the result.

>
> 	Also, is setting "arp_all_targets" necessary for the test?

It's probably not. I probably do not need to configure most of these
options, because most are default values. I can work on trimming it down
even more.

>
> 	-J
>
>> +
>> +# set bond address
>> +ip link set fbond address 52:54:00:3B:7C:A6
>> +ip link set fbond up
>> +
>> +# set again bond sysfs parameters
>> +echo 65535             > /sys/class/net/fbond/bonding/ad_actor_sys_prio
>> +
>> +# create veths
>> +ip link add name veth1-bond type veth peer name veth1-end
>> +ip link add name veth2-bond type veth peer name veth2-end
>> +
>> +# add ports
>> +ip link set fbond master fab-br0
>> +ip link set veth1-bond down master fbond
>> +ip link set veth2-bond down master fbond
>> +
>> +# bring up
>> +ip link set veth1-end up
>> +ip link set veth2-end up
>> +ip link set fab-br0 up
>> +ip link set fbond up
>> +ip addr add dev fab-br0 10.0.0.3
>> +
>> +tcpdump -n -i veth1-end -e ether proto 0x8809 >${tmp} 2>&1 &
>> +sleep 60
>> +pkill tcpdump >/dev/null 2>&1
>> +num=$(grep "packets captured" ${tmp} | awk '{print $1}')
>> +if test "$num" -gt 0; then
>> +	echo "PASS, captured ${num}"
>> +else
>> +	echo "FAIL"
>> +fi
>> -- 
>> 2.31.1
>>
>
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com
>

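For reference, the conversion discussed above might look roughly like the following. This is only a sketch, not the v2 test: the `run` and `configure_bond` helpers and the `DRY_RUN` switch are hypothetical names introduced here, `layer3+4` is assumed to be the named form of `xmit_hash_policy` value 1, and `arp_all_targets` is left out on the assumption that it is not needed. The option keywords are the ones `ip link ... type bond` accepts in iproute2.

```shell
#!/bin/sh
# Sketch: the sysfs writes from the test expressed as `ip link` bond options.
# DRY_RUN defaults to 1, so the commands are only printed; run as root with
# DRY_RUN=0 to actually apply them.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
	if [ "$DRY_RUN" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

configure_bond() {
	# Values mirror the sysfs settings in the original script
	# (xmit_hash_policy 1 is assumed to be layer3+4).
	run ip link add fbond type bond mode 802.3ad miimon 200 \
		xmit_hash_policy layer3+4 ad_actor_sys_prio 65535 \
		ad_select stable lacp_rate slow
	run ip link set fbond address 52:54:00:3B:7C:A6
	run ip link set fbond up
	# Second write of the same option after the MAC change; this is the
	# step that triggers the problem described in the commit message.
	run ip link set fbond type bond ad_actor_sys_prio 65535
}

configure_bond
```

Keeping the second `ad_actor_sys_prio` write after the address change preserves the ordering the commit message identifies as the trigger, which is the part of the test that matters.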
On Tue, Aug 09, 2022 at 01:21:46PM -0400, Jonathan Toppins wrote:
> ---
>  MAINTAINERS                                   |  1 +
>  drivers/net/bonding/bond_3ad.c                |  2 +-
>  .../net/bonding/bond-break-lacpdu-tx.sh       | 88 +++++++++++++++++++

Hi Jon,

You need a Makefile in this folder and set TEST_PROGS so we can generate the
test in kselftest-list.txt.

Thanks
Hangbin

On 8/9/22 21:26, Hangbin Liu wrote:
> On Tue, Aug 09, 2022 at 01:21:46PM -0400, Jonathan Toppins wrote:
>> ---
>>  MAINTAINERS                                   |  1 +
>>  drivers/net/bonding/bond_3ad.c                |  2 +-
>>  .../net/bonding/bond-break-lacpdu-tx.sh       | 88 +++++++++++++++++++
>
> Hi Jon,
>
> You need a Makefile in this folder and set TEST_PROGS so we can generate the
> test in kselftest-list.txt.
>

Thank you. I have a v2 coming. I also split the fix from the test so
that it is simpler to backport.

-Jon

diff --git a/MAINTAINERS b/MAINTAINERS
index 386178699ae7..6e7cebc1bca3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3636,6 +3636,7 @@ F:	Documentation/networking/bonding.rst
 F:	drivers/net/bonding/
 F:	include/net/bond*
 F:	include/uapi/linux/if_bonding.h
+F:	tools/testing/selftests/net/bonding/
 
 BOSCH SENSORTEC BMA400 ACCELEROMETER IIO DRIVER
 M:	Dan Robertson <dan@dlrobertson.com>
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index d7fb33c078e8..e357bc6b8e05 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -84,7 +84,7 @@ enum ad_link_speed_type {
 static const u8 null_mac_addr[ETH_ALEN + 2] __long_aligned = {
 	0, 0, 0, 0, 0, 0
 };
-static u16 ad_ticks_per_sec;
+static u16 ad_ticks_per_sec = 1000/AD_TIMER_INTERVAL;
 static const int ad_delta_in_ticks = (AD_TIMER_INTERVAL * HZ) / 1000;
 
 static const u8 lacpdu_mcast_addr[ETH_ALEN + 2] __long_aligned =
diff --git a/tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh b/tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh
new file mode 100644
index 000000000000..be9f1b64e89e
--- /dev/null
+++ b/tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh
@@ -0,0 +1,88 @@
+#!/bin/sh
+
+# Regression Test:
+#  Verify LACPDUs get transmitted after setting the MAC address of
+#  the bond.
+#
+# https://bugzilla.redhat.com/show_bug.cgi?id=2020773
+#
+#       +---------+
+#       | fab-br0 |
+#       +---------+
+#            |
+#       +---------+
+#       |  fbond  |
+#       +---------+
+#        |       |
+#    +------+ +------+
+#    |veth1 | |veth2 |
+#    +------+ +------+
+#
+# We use veths instead of physical interfaces
+
+set -e
+#set -x
+tmp=$(mktemp -q dump.XXXXXX)
+cleanup() {
+	ip link del fab-br0 >/dev/null 2>&1 || :
+	ip link del fbond  >/dev/null 2>&1 || :
+	ip link del veth1-bond  >/dev/null 2>&1 || :
+	ip link del veth2-bond  >/dev/null 2>&1 || :
+	modprobe -r bonding  >/dev/null 2>&1 || :
+	rm -f -- ${tmp}
+}
+
+trap cleanup 0 1 2
+cleanup
+sleep 1
+
+# create the bridge
+ip link add fab-br0 address 52:54:00:3B:7C:A6 mtu 1500 type bridge \
+	forward_delay 15
+
+# create the bond
+ip link add fbond type bond
+ip link set fbond up
+
+# set bond sysfs parameters
+ip link set fbond down
+echo 802.3ad           > /sys/class/net/fbond/bonding/mode
+echo 200               > /sys/class/net/fbond/bonding/miimon
+echo 1                 > /sys/class/net/fbond/bonding/xmit_hash_policy
+echo 65535             > /sys/class/net/fbond/bonding/ad_actor_sys_prio
+echo stable            > /sys/class/net/fbond/bonding/ad_select
+echo slow              > /sys/class/net/fbond/bonding/lacp_rate
+echo any               > /sys/class/net/fbond/bonding/arp_all_targets
+
+# set bond address
+ip link set fbond address 52:54:00:3B:7C:A6
+ip link set fbond up
+
+# set again bond sysfs parameters
+echo 65535             > /sys/class/net/fbond/bonding/ad_actor_sys_prio
+
+# create veths
+ip link add name veth1-bond type veth peer name veth1-end
+ip link add name veth2-bond type veth peer name veth2-end
+
+# add ports
+ip link set fbond master fab-br0
+ip link set veth1-bond down master fbond
+ip link set veth2-bond down master fbond
+
+# bring up
+ip link set veth1-end up
+ip link set veth2-end up
+ip link set fab-br0 up
+ip link set fbond up
+ip addr add dev fab-br0 10.0.0.3
+
+tcpdump -n -i veth1-end -e ether proto 0x8809 >${tmp} 2>&1 &
+sleep 60
+pkill tcpdump >/dev/null 2>&1
+num=$(grep "packets captured" ${tmp} | awk '{print $1}')
+if test "$num" -gt 0; then
+	echo "PASS, captured ${num}"
+else
+	echo "FAIL"
+fi

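A side note on the final check in the script above: as far as I can tell it prints FAIL but still exits 0, while a kselftest runner keys off the exit status. A minimal sketch of one way the check could report failure through its return code follows; `captured_count` and `check_capture` are hypothetical helper names, and the sample file only imitates tcpdump's "N packets captured" summary line.

```shell
#!/bin/sh
# Print the count from a tcpdump "N packets captured" summary line found in
# file $1, or 0 if no such line exists.
captured_count() {
	awk '/packets captured/ { n = $1 } END { print n + 0 }' "$1"
}

# Report PASS/FAIL for the capture file $1 and return a matching status.
check_capture() {
	num=$(captured_count "$1")
	if [ "$num" -gt 0 ]; then
		echo "PASS, captured ${num}"
		return 0
	fi
	echo "FAIL"
	return 1
}

# Demo against an imitation of tcpdump's closing summary.
sample=$(mktemp)
printf '%s\n' '3 packets captured' '3 packets received by filter' > "$sample"
check_capture "$sample"
rm -f -- "$sample"
```

In the test itself the last line would then be something like `check_capture "${tmp}"`, so that a run with no LACPDUs captured ends with a non-zero status.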
Running the script in
`tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh` puts
bonding into a state where it never transmits LACPDUs.

line 53: echo 65535 > /sys/class/net/fbond/bonding/ad_actor_sys_prio
setting bond param: ad_actor_sys_prio
given:
    params.ad_actor_system = 0
call stack:
    bond_option_ad_actor_sys_prio()
    -> bond_3ad_update_ad_actor_settings()
       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
            params.ad_actor_system == 0
results:
     ad.system.sys_mac_addr = bond->dev->dev_addr

line 59: ip link set fbond address 52:54:00:3B:7C:A6
setting bond MAC addr
call stack:
    bond->dev->dev_addr = new_mac

line 63: echo 65535 > /sys/class/net/fbond/bonding/ad_actor_sys_prio
setting bond param: ad_actor_sys_prio
given:
    params.ad_actor_system = 0
call stack:
    bond_option_ad_actor_sys_prio()
    -> bond_3ad_update_ad_actor_settings()
       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
            params.ad_actor_system == 0
results:
     ad.system.sys_mac_addr = bond->dev->dev_addr

line 71: ip link set veth1-bond down master fbond
given:
    params.ad_actor_system = 0
    params.mode = BOND_MODE_8023AD
    ad.system.sys_mac_addr == bond->dev->dev_addr
call stack:
    bond_enslave
    -> bond_3ad_initialize(); because first slave
       -> if ad.system.sys_mac_addr != bond->dev->dev_addr
          return
results:
     Nothing is run in bond_3ad_initialize() because dev_add equals
     sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
     never initialized anywhere else.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
---
 MAINTAINERS                                   |  1 +
 drivers/net/bonding/bond_3ad.c                |  2 +-
 .../net/bonding/bond-break-lacpdu-tx.sh       | 88 +++++++++++++++++++
 3 files changed, 90 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/bonding/bond-break-lacpdu-tx.sh
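
The ordering problem walked through in the commit message can be replayed with a toy model in plain shell. This is an illustration only: nothing here touches the kernel, the `replay` function is a hypothetical name, the variables merely mirror the fields named above, the guard is modeled as "initialize only when the two addresses differ", and `AD_TIMER_INTERVAL` is assumed to be 100 msec.

```shell
#!/bin/sh
# Toy model of the sequence in the commit message; no kernel interaction.
AD_TIMER_INTERVAL=100	# msec; assumed value of the kernel constant

# Replay the steps. Pass "twice" to include the second option write that
# follows the MAC change; any other argument skips it.
replay() {
	dev_addr="00:00:00:00:00:00"	# placeholder for the bond's initial MAC
	sys_mac_addr="unset"
	ad_ticks_per_sec=0

	sys_mac_addr=$dev_addr		# first ad_actor_sys_prio write
	dev_addr="52:54:00:3B:7C:A6"	# ip link set fbond address ...
	if [ "$1" = "twice" ]; then
		sys_mac_addr=$dev_addr	# second ad_actor_sys_prio write
	fi

	# First slave is added: the modeled guard initializes only when the
	# stored system MAC differs from the device MAC.
	if [ "$sys_mac_addr" != "$dev_addr" ]; then
		sys_mac_addr=$dev_addr
		ad_ticks_per_sec=$((1000 / AD_TIMER_INTERVAL))
	fi
	echo "$ad_ticks_per_sec"
}

echo "option set once:  ad_ticks_per_sec=$(replay once)"
echo "option set twice: ad_ticks_per_sec=$(replay twice)"
```

In this model the "once" case ends with 10 ticks per second and the "twice" case stays at 0, which matches the state the commit message describes. The static initializer in the patch would give the second case the same starting value without changing the guard.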