Message ID | 9088.1655407590@famine (mailing list archive) |
---|---|
State | Accepted |
Commit | e66e257a5d8368d9c0ba13d4630f474436533e8b |
Delegated to: | Netdev Maintainers |
Series | [net] veth: Add updating of trans_start |
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Thu, 16 Jun 2022 12:26:30 -0700 you wrote:
> Since commit 21a75f0915dd ("bonding: Fix ARP monitor validation"),
> the bonding ARP / ND link monitors depend on the trans_start time to
> determine link availability. NETIF_F_LLTX drivers must update trans_start
> directly, which veth does not do. This prevents use of the ARP or ND link
> monitors with veth interfaces in a bond.
>
> Resolve this by having veth_xmit update the trans_start time.
>
> [...]

Here is the summary with links:
  - [net] veth: Add updating of trans_start
    https://git.kernel.org/netdev/net/c/e66e257a5d83

You are awesome, thank you!
On Thu, 16 Jun 2022 12:26:30 -0700 Jay Vosburgh wrote:
> Since commit 21a75f0915dd ("bonding: Fix ARP monitor validation"),
> the bonding ARP / ND link monitors depend on the trans_start time to
> determine link availability. NETIF_F_LLTX drivers must update trans_start
> directly, which veth does not do. This prevents use of the ARP or ND link
> monitors with veth interfaces in a bond.

Why is a SW device required to update its trans_start? trans_start is
for the Tx hang watchdog, AFAIK, not a general use attribute. There's
plenty of NETIF_F_LLTX devices, are they all broken?
Jakub Kicinski <kuba@kernel.org> wrote:

>On Thu, 16 Jun 2022 12:26:30 -0700 Jay Vosburgh wrote:
>> Since commit 21a75f0915dd ("bonding: Fix ARP monitor validation"),
>> the bonding ARP / ND link monitors depend on the trans_start time to
>> determine link availability. NETIF_F_LLTX drivers must update trans_start
>> directly, which veth does not do. This prevents use of the ARP or ND link
>> monitors with veth interfaces in a bond.
>
>Why is a SW device required to update its trans_start? trans_start is
>for the Tx hang watchdog, AFAIK, not a general use attribute. There's
>plenty of NETIF_F_LLTX devices, are they all broken?

In this case, it's to permit the bonding ARP / ND monitor to
function if that software device (veth in this case) is added to a bond
using the ARP / ND monitor (which relies on trans_start, and has done so
since at least 2.6.0). I'll agree it's a niche case; this was broken
for veth for quite some time, but veth + netns is handy for software
only test cases, so it seems worth doing.

I didn't exhaustively check all LLTX drivers, but, e.g., tun
does update trans_start:

drivers/net/tun.c:

    /* NETIF_F_LLTX requires to do our own update of trans_start */
    queue = netdev_get_tx_queue(dev, txq);
    txq_trans_cond_update(queue);

    -J

---
    -Jay Vosburgh, jay.vosburgh@canonical.com
On Fri, 17 Jun 2022 09:42:55 -0700 Jay Vosburgh wrote:
> In this case, it's to permit the bonding ARP / ND monitor to
> function if that software device (veth in this case) is added to a bond
> using the ARP / ND monitor (which relies on trans_start, and has done so
> since at least 2.6.0). I'll agree it's a niche case; this was broken
> for veth for quite some time, but veth + netns is handy for software
> only test cases, so it seems worth doing.

I presume it needs it to check if the device has transmitted anything
in the last unit of time, can we look at the device stats for LLTX for
example?

> I didn't exhaustively check all LLTX drivers, but, e.g., tun
> does update trans_start:
>
> drivers/net/tun.c:
>
> /* NETIF_F_LLTX requires to do our own update of trans_start */
> queue = netdev_get_tx_queue(dev, txq);
> txq_trans_cond_update(queue);

Well, it is _an_ example, but the only one I can find. And the
justification is the same as yours now -- make bonding work a31d27fb.
Because of that I don't think we can use tun as a proof that trans
start should be updated on LLTX devices as a general, stack-wide rule.
There's a lot more LLTX devices than veth and tun.
Jakub Kicinski <kuba@kernel.org> wrote:

>On Fri, 17 Jun 2022 09:42:55 -0700 Jay Vosburgh wrote:
>> In this case, it's to permit the bonding ARP / ND monitor to
>> function if that software device (veth in this case) is added to a bond
>> using the ARP / ND monitor (which relies on trans_start, and has done so
>> since at least 2.6.0). I'll agree it's a niche case; this was broken
>> for veth for quite some time, but veth + netns is handy for software
>> only test cases, so it seems worth doing.
>
>I presume it needs it to check if the device has transmitted anything
>in the last unit of time, can we look at the device stats for LLTX for
>example?

Yes, that's the use case.

Hmm. Polling the device stats would likely work for software
devices, although the unit of time varies (some checks are fixed at one
unit, but others can be N units depending on the missed_max option
setting).

Polling hardware devices might not work; as I recall, some
devices only update the statistics on timespans on the order of seconds,
e.g., bnx2 and tg3 appear to update once per second. But those do
update trans_start.

The question then becomes how to distinguish a software LLTX
device from a hardware LLTX device.

>> I didn't exhaustively check all LLTX drivers, but, e.g., tun
>> does update trans_start:
>>
>> drivers/net/tun.c:
>>
>> /* NETIF_F_LLTX requires to do our own update of trans_start */
>> queue = netdev_get_tx_queue(dev, txq);
>> txq_trans_cond_update(queue);
>
>Well, it is _an_ example, but the only one I can find. And the
>justification is the same as yours now -- make bonding work a31d27fb.
>Because of that I don't think we can use tun as a proof that trans
>start should be updated on LLTX devices as a general, stack-wide rule.
>There's a lot more LLTX devices than veth and tun.

I'm not suggesting that all (software) LLTX software devices be
updated.

    -J

---
    -Jay Vosburgh, jay.vosburgh@canonical.com
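The "transmitted anything in the last N units of time" check discussed in this message can be sketched in user space. This is only an illustration under stated assumptions: `tx_seen_within` and its parameters are hypothetical names, not code from the bonding driver, and the timestamps are assumed to come from a free-running unsigned tick counter similar to jiffies.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from the bonding driver.
 *
 * Returns true if the last transmit timestamp is no older than
 * n_units monitoring intervals. All values are ticks of a
 * free-running unsigned counter, so the subtraction below stays
 * correct when the counter wraps around.
 */
static bool tx_seen_within(unsigned long now, unsigned long last_tx,
                           unsigned long interval, unsigned long n_units)
{
    unsigned long age = now - last_tx; /* modular arithmetic handles wrap */

    return age <= interval * n_units;
}
```

A check fixed at one unit would pass `n_units = 1`, while a check governed by a setting such as missed_max would pass a larger value. Note that the result is only as fresh as `last_tx`, which is why a statistics counter that is refreshed once per second is a poor source for sub-second intervals.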
On Fri, 17 Jun 2022 17:27:43 -0700 Jay Vosburgh wrote:
> Jakub Kicinski <kuba@kernel.org> wrote:
> >On Fri, 17 Jun 2022 09:42:55 -0700 Jay Vosburgh wrote:
> >> In this case, it's to permit the bonding ARP / ND monitor to
> >> function if that software device (veth in this case) is added to a bond
> >> using the ARP / ND monitor (which relies on trans_start, and has done so
> >> since at least 2.6.0). I'll agree it's a niche case; this was broken
> >> for veth for quite some time, but veth + netns is handy for software
> >> only test cases, so it seems worth doing.
> >
> >I presume it needs it to check if the device has transmitted anything
> >in the last unit of time, can we look at the device stats for LLTX for
> >example?
>
> Yes, that's the use case.
>
> Hmm. Polling the device stats would likely work for software
> devices, although the unit of time varies (some checks are fixed at one
> unit, but others can be N units depending on the missed_max option
> setting).
>
> Polling hardware devices might not work; as I recall, some
> devices only update the statistics on timespans on the order of seconds,
> e.g., bnx2 and tg3 appear to update once per second. But those do
> update trans_start.

Right, unfortunately.

> The question then becomes how to distinguish a software LLTX
> device from a hardware LLTX device.

If my way of thinking about trans_start is correct then we can test
for presence of ndo_tx_timeout. Anything that has the tx_timeout NDO
must be maintaining trans_start.

> >> I didn't exhaustively check all LLTX drivers, but, e.g., tun
> >> does update trans_start:
> >>
> >> drivers/net/tun.c:
> >>
> >> /* NETIF_F_LLTX requires to do our own update of trans_start */
> >> queue = netdev_get_tx_queue(dev, txq);
> >> txq_trans_cond_update(queue);
> >
> >Well, it is _an_ example, but the only one I can find. And the
> >justification is the same as yours now -- make bonding work a31d27fb.
> >Because of that I don't think we can use tun as a proof that trans
> >start should be updated on LLTX devices as a general, stack-wide rule.
> >There's a lot more LLTX devices than veth and tun.
>
> I'm not suggesting that all (software) LLTX software devices be
> updated.

The ones which are not updated would remain broken then, no?
Waiting for someone to try to bond them and discover it doesn't work.
On Fri, 17 Jun 2022 17:55:50 -0700 Jakub Kicinski wrote:
> > >I presume it needs it to check if the device has transmitted anything
> > >in the last unit of time, can we look at the device stats for LLTX for
> > >example?
> >
> > Yes, that's the use case.
> >
> > Hmm. Polling the device stats would likely work for software
> > devices, although the unit of time varies (some checks are fixed at one
> > unit, but others can be N units depending on the missed_max option
> > setting).
> >
> > Polling hardware devices might not work; as I recall, some
> > devices only update the statistics on timespans on the order of seconds,
> > e.g., bnx2 and tg3 appear to update once per second. But those do
> > update trans_start.
>
> Right, unfortunately.
>
> > The question then becomes how to distinguish a software LLTX
> > device from a hardware LLTX device.
>
> If my way of thinking about trans_start is correct then we can test
> for presence of ndo_tx_timeout. Anything that has the tx_timeout NDO
> must be maintaining trans_start.

So what's your thinking Jay? Keep this as an immediate small fix
for net but work on using a different approach in net-next?
Jakub Kicinski <kuba@kernel.org> wrote:

>On Fri, 17 Jun 2022 17:55:50 -0700 Jakub Kicinski wrote:
>> > >I presume it needs it to check if the device has transmitted anything
>> > >in the last unit of time, can we look at the device stats for LLTX for
>> > >example?
>> >
>> > Yes, that's the use case.
>> >
>> > Hmm. Polling the device stats would likely work for software
>> > devices, although the unit of time varies (some checks are fixed at one
>> > unit, but others can be N units depending on the missed_max option
>> > setting).
>> >
>> > Polling hardware devices might not work; as I recall, some
>> > devices only update the statistics on timespans on the order of seconds,
>> > e.g., bnx2 and tg3 appear to update once per second. But those do
>> > update trans_start.
>>
>> Right, unfortunately.
>>
>> > The question then becomes how to distinguish a software LLTX
>> > device from a hardware LLTX device.
>>
>> If my way of thinking about trans_start is correct then we can test
>> for presence of ndo_tx_timeout. Anything that has the tx_timeout NDO
>> must be maintaining trans_start.
>
>So what's your thinking Jay? Keep this as an immediate small fix
>for net but work on using a different approach in net-next?

Sorry, was out for the three day weekend.

I had a quick look and I think you're probably right that
anything with a ndo_tx_timeout will deal with trans_start, and anything
without ndo_tx_timeout will be a software device not subject to delayed
batching of stats updates.

And, yes, if there are no objections, what I'd like to do now is
apply the veth change to get things working and work up the bifurcated
approach separately (which would ultimately include removing the
trans_start updates from veth and tun).

    -J

---
    -Jay Vosburgh, jay.vosburgh@canonical.com
On Tue, 21 Jun 2022 18:42:19 -0700 Jay Vosburgh wrote:
> Sorry, was out for the three day weekend.
>
> I had a quick look and I think you're probably right that
> anything with a ndo_tx_timeout will deal with trans_start, and anything
> without ndo_tx_timeout will be a software device not subject to delayed
> batching of stats updates.
>
> And, yes, if there are no objections, what I'd like to do now is
> apply the veth change to get things working and work up the bifurcated
> approach separately (which would ultimately include removing the
> trans_start updates from veth and tun).

Works for me, thanks!
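The bifurcated approach agreed on in this exchange was left as follow-up work, so no implementation appears in the thread. The following user-space model is only a guess at its shape: every type, field and function name here (`model_dev`, `model_monitor`, `model_last_tx`, `model_noop_timeout`) is hypothetical and none of it is kernel code. It assumes that a device with a Tx timeout handler maintains `trans_start`, and that a device without one updates its transmit counter promptly.

```c
#include <stddef.h>

/* Hypothetical stand-in for a net device; not the kernel's struct. */
struct model_dev {
    void (*tx_timeout)(struct model_dev *dev); /* NULL for software devices */
    unsigned long trans_start;                 /* last transmit time, in ticks */
    unsigned long tx_packets;                  /* running transmit counter */
};

/* Hypothetical per-slave state kept by the link monitor. */
struct model_monitor {
    unsigned long last_tx_packets; /* counter value seen at the previous poll */
    unsigned long last_tx_seen;    /* tick at which the counter last advanced */
};

/* Example handler so a "hardware" device can be modelled. */
static void model_noop_timeout(struct model_dev *dev)
{
    (void)dev;
}

/*
 * Return the monitor's best estimate of the last transmit time.
 * Devices with a timeout handler are trusted to maintain trans_start;
 * for the rest, the transmit counter is polled and the poll time is
 * recorded whenever the counter has moved.
 */
static unsigned long model_last_tx(const struct model_dev *dev,
                                   struct model_monitor *mon,
                                   unsigned long now)
{
    if (dev->tx_timeout != NULL)
        return dev->trans_start;

    if (dev->tx_packets != mon->last_tx_packets) {
        mon->last_tx_packets = dev->tx_packets;
        mon->last_tx_seen = now;
    }
    return mon->last_tx_seen;
}
```

In this model the counter path can only resolve activity to the polling period, which matches the concern raised earlier about devices that batch their statistics updates.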
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 466da01ba2e3..2cb833b3006a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -312,6 +312,7 @@ static bool veth_skb_is_eligible_for_gro(const struct net_device *dev,
 static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
+	struct netdev_queue *queue = NULL;
 	struct veth_rq *rq = NULL;
 	struct net_device *rcv;
 	int length = skb->len;
@@ -329,6 +330,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	rxq = skb_get_queue_mapping(skb);
 	if (rxq < rcv->real_num_rx_queues) {
 		rq = &rcv_priv->rq[rxq];
+		queue = netdev_get_tx_queue(dev, rxq);
 
 		/* The napi pointer is available when an XDP program is
 		 * attached or when GRO is enabled
@@ -340,6 +342,8 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	skb_tx_timestamp(skb);
 	if (likely(veth_forward_skb(rcv, skb, rq, use_napi) == NET_RX_SUCCESS)) {
+		if (queue)
+			txq_trans_cond_update(queue);
 		if (!use_napi)
 			dev_lstats_add(dev, length);
 	} else {
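The control flow this patch adds can be modelled in user space as a rough sketch. The names `model_txq`, `model_trans_cond_update` and `model_xmit` are hypothetical, and the conditional store is an assumption about how a helper like `txq_trans_cond_update` behaves (write the timestamp only when it differs from the current tick), not a copy of the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transmit queue; not the kernel's struct. */
struct model_txq {
    unsigned long trans_start; /* last transmit time, in ticks */
};

/*
 * Store the timestamp only when it changes, so repeated transmits
 * within the same tick do not rewrite the field.
 */
static void model_trans_cond_update(struct model_txq *q, unsigned long now)
{
    if (q->trans_start != now)
        q->trans_start = now;
}

/*
 * Mirror of the patched transmit path: the timestamp is refreshed only
 * when a queue was selected and the packet was forwarded successfully.
 * Returns true when the packet counts as transmitted.
 */
static bool model_xmit(struct model_txq *q, bool forward_ok, unsigned long now)
{
    if (!forward_ok)
        return false; /* dropped: trans_start is left untouched */

    if (q != NULL)
        model_trans_cond_update(q, now);
    return true;
}
```

The point of tying the update to a successful forward is that a monitor reading the timestamp sees only traffic that actually left the device, not attempts that were dropped.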