diff mbox series

[v2,net] net/sched: make dev_trans_start() have a better chance of working with stacked interfaces

Message ID 20220727152000.3616086-1-vladimir.oltean@nxp.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [v2,net] net/sched: make dev_trans_start() have a better chance of working with stacked interfaces | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers success CCed 10 of 10 maintainers
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 53 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Vladimir Oltean July 27, 2022, 3:20 p.m. UTC
Documentation/networking/bonding.rst points out that for ARP monitoring
to work, dev_trans_start() must be able to verify the latest trans_start
update of any slave_dev TX queue. However, with NETIF_F_LLTX,
dev_trans_start() simply doesn't make much sense.

DSA has declared NETIF_F_LLTX to be in line with other stackable
interfaces, and this has introduced a regression in the form of breaking
ARP monitoring with bonding.

There is a workaround already in place in dev_trans_start() to fix just
this kind of breakage for non-stacked cases of vlan and macvlan. Since
DSA doesn't export any flag which says "this interface is DSA", or "this
interface's master is this device", we need to generalize this logic by
traversing the netdev adjacency lists, so that DSA is also covered.

Link to the discussion on a previous approach:
https://patchwork.kernel.org/project/netdevbpf/patch/20220715232641.952532-1-vladimir.oltean@nxp.com/

Fixes: 2b86cb829976 ("net: dsa: declare lockless TX feature for slave ports")
Reported-by: Brian Hutchinson <b.hutchman@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/sched/sch_generic.c | 37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

Comments

Stephen Hemminger July 27, 2022, 3:44 p.m. UTC | #1
On Wed, 27 Jul 2022 18:20:00 +0300
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> +	do {
> +		have_lowers = false;
> +
> +		netdev_for_each_lower_dev(dev, lower, iter) {
> +			have_lowers = true;
> +			dev = lower;
> +			break;
> +		}
> +	} while (have_lower

Would be clearer if this was a helper function.
Something like dev_leaf_device?
Vladimir Oltean July 27, 2022, 4:17 p.m. UTC | #2
On Wed, Jul 27, 2022 at 08:44:04AM -0700, Stephen Hemminger wrote:
> On Wed, 27 Jul 2022 18:20:00 +0300
> Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> 
> > +	do {
> > +		have_lowers = false;
> > +
> > +		netdev_for_each_lower_dev(dev, lower, iter) {
> > +			have_lowers = true;
> > +			dev = lower;
> > +			break;
> > +		}
> > +	} while (have_lower
> 
> Would be clearer if this was a helper function.
> Something like dev_leaf_device?

Probably dev_leaf_device_rcu() I presume, so that the caller takes and
keeps the RCU critical section for as long as the leaf device is
actually being used, right?
diff mbox series

Patch

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cc6eabee2830..fb964e2b0436 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -427,20 +427,49 @@  void __qdisc_run(struct Qdisc *q)
 
 unsigned long dev_trans_start(struct net_device *dev)
 {
+	struct net_device *lower;
+	struct list_head *iter;
 	unsigned long val, res;
+	bool have_lowers;
 	unsigned int i;
 
-	if (is_vlan_dev(dev))
-		dev = vlan_dev_real_dev(dev);
-	else if (netif_is_macvlan(dev))
-		dev = macvlan_dev_real_dev(dev);
+	rcu_read_lock();
+
+	/* Stacked network interfaces usually have NETIF_F_LLTX so
+	 * netdev_start_xmit() -> txq_trans_update() fails to do anything,
+	 * because they don't lock the TX queue. Calling dev_trans_start() on a
+	 * virtual device makes little sense, since it is a mechanism intended
+	 * for the TX watchdog. That notwithstanding, layers such as the
+	 * bonding arp monitor may still use dev_trans_start() on slave
+	 * interfaces, probably to see if any transmission took place in the
+	 * last ARP interval. This use is antiquated, however we don't know
+	 * what to replace it with. While we can't solve the general case of
+	 * virtual interfaces, for stackable ones (vlan, macvlan, DSA or
+	 * potentially stacked combinations), we can work around by returning
+	 * the trans_start of the physical, real device backing them. In this
+	 * case, walk the adjacency lists all the way down, hoping that the
+	 * lower-most device won't have NETIF_F_LLTX.
+	 */
+	do {
+		have_lowers = false;
+
+		netdev_for_each_lower_dev(dev, lower, iter) {
+			have_lowers = true;
+			dev = lower;
+			break;
+		}
+	} while (have_lowers);
+
 	res = READ_ONCE(netdev_get_tx_queue(dev, 0)->trans_start);
+
 	for (i = 1; i < dev->num_tx_queues; i++) {
 		val = READ_ONCE(netdev_get_tx_queue(dev, i)->trans_start);
 		if (val && time_after(val, res))
 			res = val;
 	}
 
+	rcu_read_unlock();
+
 	return res;
 }
 EXPORT_SYMBOL(dev_trans_start);