@@ -427,20 +427,49 @@ void __qdisc_run(struct Qdisc *q)
 
 unsigned long dev_trans_start(struct net_device *dev)
 {
+	struct net_device *lower;
+	struct list_head *iter;
 	unsigned long val, res;
+	bool have_lowers;
 	unsigned int i;
 
-	if (is_vlan_dev(dev))
-		dev = vlan_dev_real_dev(dev);
-	else if (netif_is_macvlan(dev))
-		dev = macvlan_dev_real_dev(dev);
+	rcu_read_lock();
+
+	/* Stacked network interfaces usually have NETIF_F_LLTX, so
+	 * netdev_start_xmit() -> txq_trans_update() fails to do anything,
+	 * because they don't lock the TX queue. Calling dev_trans_start() on a
+	 * virtual device makes little sense, since it is a mechanism intended
+	 * for the TX watchdog. That notwithstanding, layers such as the
+	 * bonding ARP monitor may still use dev_trans_start() on slave
+	 * interfaces, probably to see if any transmission took place in the
+	 * last ARP interval. This use is antiquated; however, we don't know
+	 * what to replace it with. While we can't solve the general case of
+	 * virtual interfaces, for stackable ones (vlan, macvlan, DSA or
+	 * potentially stacked combinations), we can work around it by
+	 * returning the trans_start of the physical, real device backing
+	 * them. In this case, walk the adjacency lists all the way down,
+	 * hoping that the lower-most device won't have NETIF_F_LLTX.
+	 */
+	do {
+		have_lowers = false;
+
+		netdev_for_each_lower_dev(dev, lower, iter) {
+			have_lowers = true;
+			dev = lower;
+			break;
+		}
+	} while (have_lowers);
+
 	res = READ_ONCE(netdev_get_tx_queue(dev, 0)->trans_start);
+
 	for (i = 1; i < dev->num_tx_queues; i++) {
 		val = READ_ONCE(netdev_get_tx_queue(dev, i)->trans_start);
 		if (val && time_after(val, res))
 			res = val;
 	}
 
+	rcu_read_unlock();
+
 	return res;
 }
 EXPORT_SYMBOL(dev_trans_start);
Documentation/networking/bonding.rst points out that for ARP monitoring to work, dev_trans_start() must be able to verify the latest trans_start update of any slave_dev TX queue. However, with NETIF_F_LLTX, dev_trans_start() simply doesn't make much sense.

DSA has declared NETIF_F_LLTX to be in line with other stackable interfaces, and this has introduced a regression in the form of breaking ARP monitoring with bonding.

There is a workaround already in place in dev_trans_start() to fix just this kind of breakage for non-stacked cases of vlan and macvlan. Since DSA doesn't export any flag which says "this interface is DSA", or "this interface's master is this device", we need to generalize this logic by traversing the netdev adjacency lists, so that DSA is also covered.

Link to the discussion on a previous approach:
https://patchwork.kernel.org/project/netdevbpf/patch/20220715232641.952532-1-vladimir.oltean@nxp.com/

Fixes: 2b86cb829976 ("net: dsa: declare lockless TX feature for slave ports")
Reported-by: Brian Hutchinson <b.hutchman@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/sched/sch_generic.c | 37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)