[net] veth: Add updating of trans_start

Message ID 9088.1655407590@famine (mailing list archive)
State Accepted
Commit e66e257a5d8368d9c0ba13d4630f474436533e8b
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for net
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 22 this patch: 22
netdev/cc_maintainers           success  CCed 6 of 6 maintainers
netdev/build_clang              success  Errors and warnings before: 9 this patch: 9
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 19 this patch: 19
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 22 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Jay Vosburgh June 16, 2022, 7:26 p.m. UTC
Since commit 21a75f0915dd ("bonding: Fix ARP monitor validation"),
the bonding ARP / ND link monitors depend on the trans_start time to
determine link availability.  NETIF_F_LLTX drivers must update trans_start
directly, which veth does not do.  This prevents use of the ARP or ND link
monitors with veth interfaces in a bond.

	Resolve this by having veth_xmit update the trans_start time.

Reported-by: Jonathan Toppins <jtoppins@redhat.com>
Tested-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Fixes: 21a75f0915dd ("bonding: Fix ARP monitor validation")
Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/

---
 drivers/net/veth.c | 4 ++++
 1 file changed, 4 insertions(+)
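
Editorial sketch, not part of the patch: the commit message says the bonding
ARP / ND monitors use trans_start to decide whether a link is available. The
userspace model below shows one way such a check can work, treating a device
as alive only if its last recorded transmit falls within a recent window of
ticks. The names model_dev, model_tx_recent and model_monitor_demo, and the
exact window arithmetic, are assumptions made for illustration; this is not
the bonding driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device's transmit timestamp. */
struct model_dev {
	unsigned long trans_start;	/* tick of the last recorded transmit */
};

/*
 * True if the device recorded a transmit no more than `interval` ticks
 * before `now`.  The subtraction is done in unsigned arithmetic and then
 * interpreted as signed, so the comparison survives counter wrap-around.
 */
static bool model_tx_recent(const struct model_dev *dev,
			    unsigned long now, unsigned long interval)
{
	return (long)(dev->trans_start + interval - now) >= 0;
}

/* Usage: a device that never updates trans_start always looks dead. */
void model_monitor_demo(void)
{
	struct model_dev dev = { .trans_start = 0 };

	/* No update since tick 0: the monitor sees no transmit activity. */
	assert(!model_tx_recent(&dev, 1000, 100));

	/* The driver records a transmit at tick 950: the link looks alive. */
	dev.trans_start = 950;
	assert(model_tx_recent(&dev, 1000, 100));
}
```

In this model, a driver that transmits but never writes trans_start is
indistinguishable from a dead link, which is the failure the commit message
describes for veth.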

Comments

patchwork-bot+netdevbpf@kernel.org June 17, 2022, 10:40 a.m. UTC | #1
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Thu, 16 Jun 2022 12:26:30 -0700 you wrote:
> Since commit 21a75f0915dd ("bonding: Fix ARP monitor validation"),
> the bonding ARP / ND link monitors depend on the trans_start time to
> determine link availability.  NETIF_F_LLTX drivers must update trans_start
> directly, which veth does not do.  This prevents use of the ARP or ND link
> monitors with veth interfaces in a bond.
> 
> 	Resolve this by having veth_xmit update the trans_start time.
> 
> [...]

Here is the summary with links:
  - [net] veth: Add updating of trans_start
    https://git.kernel.org/netdev/net/c/e66e257a5d83

You are awesome, thank you!

Jakub Kicinski June 17, 2022, 3:45 p.m. UTC | #2
On Thu, 16 Jun 2022 12:26:30 -0700 Jay Vosburgh wrote:
> 	Since commit 21a75f0915dd ("bonding: Fix ARP monitor validation"),
> the bonding ARP / ND link monitors depend on the trans_start time to
> determine link availability.  NETIF_F_LLTX drivers must update trans_start
> directly, which veth does not do.  This prevents use of the ARP or ND link
> monitors with veth interfaces in a bond.

Why is a SW device required to update its trans_start? trans_start is
for the Tx hang watchdog, AFAIK, not a general use attribute. There's
plenty of NETIF_F_LLTX devices, are they all broken?

Jay Vosburgh June 17, 2022, 4:42 p.m. UTC | #3
Jakub Kicinski <kuba@kernel.org> wrote:

>On Thu, 16 Jun 2022 12:26:30 -0700 Jay Vosburgh wrote:
>> 	Since commit 21a75f0915dd ("bonding: Fix ARP monitor validation"),
>> the bonding ARP / ND link monitors depend on the trans_start time to
>> determine link availability.  NETIF_F_LLTX drivers must update trans_start
>> directly, which veth does not do.  This prevents use of the ARP or ND link
>> monitors with veth interfaces in a bond.
>
>Why is a SW device required to update its trans_start? trans_start is
>for the Tx hang watchdog, AFAIK, not a general use attribute. There's
>plenty of NETIF_F_LLTX devices, are they all broken? 

	In this case, it's to permit the bonding ARP / ND monitor to
function if that software device (veth in this case) is added to a bond
using the ARP / ND monitor (which relies on trans_start, and has done so
since at least 2.6.0).  I'll agree it's a niche case; this was broken
for veth for quite some time, but veth + netns is handy for software
only test cases, so it seems worth doing.

	I didn't exhaustively check all LLTX drivers, but, e.g., tun
does update trans_start:

drivers/net/tun.c:

        /* NETIF_F_LLTX requires to do our own update of trans_start */
        queue = netdev_get_tx_queue(dev, txq);
        txq_trans_cond_update(queue);

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

Jakub Kicinski June 17, 2022, 7:44 p.m. UTC | #4
On Fri, 17 Jun 2022 09:42:55 -0700 Jay Vosburgh wrote:
> 	In this case, it's to permit the bonding ARP / ND monitor to
> function if that software device (veth in this case) is added to a bond
> using the ARP / ND monitor (which relies on trans_start, and has done so
> since at least 2.6.0).  I'll agree it's a niche case; this was broken
> for veth for quite some time, but veth + netns is handy for software
> only test cases, so it seems worth doing.

I presume it needs it to check if the device has transmitted anything
in the last unit of time, can we look at the device stats for LLTX for
example?

> 	I didn't exhaustively check all LLTX drivers, but, e.g., tun
> does update trans_start:
> 
> drivers/net/tun.c:
> 
>        /* NETIF_F_LLTX requires to do our own update of trans_start */
>         queue = netdev_get_tx_queue(dev, txq);
>         txq_trans_cond_update(queue);

Well, it is _an_ example, but the only one I can find. And the
justification is the same as yours now -- make bonding work a31d27fb.
Because of that I don't think we can use tun as a proof that trans 
start should be updated on LLTX devices as a general, stack-wide rule.
There's a lot more LLTX devices than veth and tun.

Jay Vosburgh June 18, 2022, 12:27 a.m. UTC | #5
Jakub Kicinski <kuba@kernel.org> wrote:

>On Fri, 17 Jun 2022 09:42:55 -0700 Jay Vosburgh wrote:
>> 	In this case, it's to permit the bonding ARP / ND monitor to
>> function if that software device (veth in this case) is added to a bond
>> using the ARP / ND monitor (which relies on trans_start, and has done so
>> since at least 2.6.0).  I'll agree it's a niche case; this was broken
>> for veth for quite some time, but veth + netns is handy for software
>> only test cases, so it seems worth doing.
>
>I presume it needs it to check if the device has transmitted anything
>in the last unit of time, can we look at the device stats for LLTX for
>example?

	Yes, that's the use case.  

	Hmm.  Polling the device stats would likely work for software
devices, although the unit of time varies (some checks are fixed at one
unit, but others can be N units depending on the missed_max option
setting).

	Polling hardware devices might not work; as I recall, some
devices only update the statistics on timespans on the order of seconds,
e.g., bnx2 and tg3 appear to update once per second.  But those do
update trans_start.

	The question then becomes how to distinguish a software LLTX
device from a hardware LLTX device.

>> 	I didn't exhaustively check all LLTX drivers, but, e.g., tun
>> does update trans_start:
>> 
>> drivers/net/tun.c:
>> 
>>        /* NETIF_F_LLTX requires to do our own update of trans_start */
>>         queue = netdev_get_tx_queue(dev, txq);
>>         txq_trans_cond_update(queue);
>
>Well, it is _an_ example, but the only one I can find. And the
>justification is the same as yours now -- make bonding work a31d27fb.
>Because of that I don't think we can use tun as a proof that trans 
>start should be updated on LLTX devices as a general, stack-wide rule.
>There's a lot more LLTX devices than veth and tun.

	I'm not suggesting that all (software) LLTX software devices be
updated.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

Jakub Kicinski June 18, 2022, 12:55 a.m. UTC | #6
On Fri, 17 Jun 2022 17:27:43 -0700 Jay Vosburgh wrote:
> Jakub Kicinski <kuba@kernel.org> wrote:
> >On Fri, 17 Jun 2022 09:42:55 -0700 Jay Vosburgh wrote:  
> >> 	In this case, it's to permit the bonding ARP / ND monitor to
> >> function if that software device (veth in this case) is added to a bond
> >> using the ARP / ND monitor (which relies on trans_start, and has done so
> >> since at least 2.6.0).  I'll agree it's a niche case; this was broken
> >> for veth for quite some time, but veth + netns is handy for software
> >> only test cases, so it seems worth doing.  
> >
> >I presume it needs it to check if the device has transmitted anything
> >in the last unit of time, can we look at the device stats for LLTX for
> >example?  
> 
> 	Yes, that's the use case.  
> 
> 	Hmm.  Polling the device stats would likely work for software
> devices, although the unit of time varies (some checks are fixed at one
> unit, but others can be N units depending on the missed_max option
> setting).
> 
> 	Polling hardware devices might not work; as I recall, some
> devices only update the statistics on timespans on the order of seconds,
> e.g., bnx2 and tg3 appear to update once per second.  But those do
> update trans_start.

Right, unfortunately.

> 	The question then becomes how to distinguish a software LLTX
> device from a hardware LLTX device.

If my way of thinking about trans_start is correct then we can test 
for presence of ndo_tx_timeout. Anything that has the tx_timeout NDO
must be maintaining trans_start.

> >> 	I didn't exhaustively check all LLTX drivers, but, e.g., tun
> >> does update trans_start:
> >> 
> >> drivers/net/tun.c:
> >> 
> >>        /* NETIF_F_LLTX requires to do our own update of trans_start */
> >>         queue = netdev_get_tx_queue(dev, txq);
> >>         txq_trans_cond_update(queue);  
> >
> >Well, it is _an_ example, but the only one I can find. And the
> >justification is the same as yours now -- make bonding work a31d27fb.
> >Because of that I don't think we can use tun as a proof that trans 
> >start should be updated on LLTX devices as a general, stack-wide rule.
> >There's a lot more LLTX devices than veth and tun.  
> 
> 	I'm not suggesting that all (software) LLTX software devices be
> updated.

The ones which are not updated would remain broken then, no?
Waiting for someone to try to bond them and discover it doesn't work.

Jakub Kicinski June 21, 2022, 7:52 p.m. UTC | #7
On Fri, 17 Jun 2022 17:55:50 -0700 Jakub Kicinski wrote:
> > >I presume it needs it to check if the device has transmitted anything
> > >in the last unit of time, can we look at the device stats for LLTX for
> > >example?    
> > 
> > 	Yes, that's the use case.  
> > 
> > 	Hmm.  Polling the device stats would likely work for software
> > devices, although the unit of time varies (some checks are fixed at one
> > unit, but others can be N units depending on the missed_max option
> > setting).
> > 
> > 	Polling hardware devices might not work; as I recall, some
> > devices only update the statistics on timespans on the order of seconds,
> > e.g., bnx2 and tg3 appear to update once per second.  But those do
> > update trans_start.  
> 
> Right, unfortunately.
> 
> > 	The question then becomes how to distinguish a software LLTX
> > device from a hardware LLTX device.  
> 
> If my way of thinking about trans_start is correct then we can test 
> for presence of ndo_tx_timeout. Anything that has the tx_timeout NDO
> must be maintaining trans_start.

So what's your thinking Jay? Keep this as an immediate small fix 
for net but work on using a different approach in net-next?

Jay Vosburgh June 22, 2022, 1:42 a.m. UTC | #8
Jakub Kicinski <kuba@kernel.org> wrote:

>On Fri, 17 Jun 2022 17:55:50 -0700 Jakub Kicinski wrote:
>> > >I presume it needs it to check if the device has transmitted anything
>> > >in the last unit of time, can we look at the device stats for LLTX for
>> > >example?    
>> > 
>> > 	Yes, that's the use case.  
>> > 
>> > 	Hmm.  Polling the device stats would likely work for software
>> > devices, although the unit of time varies (some checks are fixed at one
>> > unit, but others can be N units depending on the missed_max option
>> > setting).
>> > 
>> > 	Polling hardware devices might not work; as I recall, some
>> > devices only update the statistics on timespans on the order of seconds,
>> > e.g., bnx2 and tg3 appear to update once per second.  But those do
>> > update trans_start.  
>> 
>> Right, unfortunately.
>> 
>> > 	The question then becomes how to distinguish a software LLTX
>> > device from a hardware LLTX device.  
>> 
>> If my way of thinking about trans_start is correct then we can test 
>> for presence of ndo_tx_timeout. Anything that has the tx_timeout NDO
>> must be maintaining trans_start.
>
>So what's your thinking Jay? Keep this as an immediate small fix 
>for net but work on using a different approach in net-next?

	Sorry, was out for the three day weekend.

	I had a quick look and I think you're probably right that
anything with a ndo_tx_timeout will deal with trans_start, and anything
without ndo_tx_timeout will be a software device not subject to delayed
batching of stats updates.

	And, yes, if there are no objections, what I'd like to do now is
apply the veth change to get things working and work up the bifurcated
approach separately (which would ultimately include removing the
trans_start updates from veth and tun).

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

Jakub Kicinski June 22, 2022, 4:38 a.m. UTC | #9
On Tue, 21 Jun 2022 18:42:19 -0700 Jay Vosburgh wrote:
> 	Sorry, was out for the three day weekend.
> 
> 	I had a quick look and I think you're probably right that
> anything with a ndo_tx_timeout will deal with trans_start, and anything
> without ndo_tx_timeout will be a software device not subject to delayed
> batching of stats updates.
> 
> 	And, yes, if there are no objections, what I'd like to do now is
> apply the veth change to get things working and work up the bifurcated
> approach separately (which would ultimately include removing the
> trans_start updates from veth and tun).

Works for me, thanks!
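
Editorial sketch, not from the thread participants: the "bifurcated approach"
agreed on above could look roughly like the userspace model below. A device
that provides ndo_tx_timeout is assumed to maintain trans_start, so the
monitor reads that; a device without it is treated as a software device, and
the monitor compares its transmit counter against the value seen at the
previous poll. Every struct and function name here (model_netdev,
model_monitor, model_tx_seen, and so on) is hypothetical, and the follow-up
work in the kernel may take a different shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_netdev;

/* Hypothetical subset of a driver's callback table. */
struct model_netdev_ops {
	void (*ndo_tx_timeout)(struct model_netdev *dev);
};

struct model_netdev {
	const struct model_netdev_ops *ops;
	unsigned long trans_start;	/* kept current by watchdog-capable drivers */
	unsigned long long tx_packets;	/* transmit counter from device stats */
};

/* Monitor-side state: the counter value seen at the previous poll. */
struct model_monitor {
	unsigned long long last_tx_packets;
};

/* Decide whether `dev` transmitted recently, picking the source per device. */
static bool model_tx_seen(const struct model_netdev *dev,
			  struct model_monitor *mon,
			  unsigned long now, unsigned long interval)
{
	bool active;

	/* Watchdog-capable device: trust trans_start (wrap-safe compare). */
	if (dev->ops && dev->ops->ndo_tx_timeout)
		return (long)(dev->trans_start + interval - now) >= 0;

	/* Software device: any change in the counter since the last poll. */
	active = dev->tx_packets != mon->last_tx_packets;
	mon->last_tx_packets = dev->tx_packets;
	return active;
}

static void model_dummy_timeout(struct model_netdev *dev)
{
	(void)dev;
}

/* Usage: one device of each kind, polled by the same monitor routine. */
void model_bifurcated_demo(void)
{
	static const struct model_netdev_ops hw_ops = {
		.ndo_tx_timeout = model_dummy_timeout,
	};
	static const struct model_netdev_ops sw_ops = {
		.ndo_tx_timeout = NULL,
	};
	struct model_netdev hw = { .ops = &hw_ops, .trans_start = 990 };
	struct model_netdev sw = { .ops = &sw_ops };
	struct model_monitor hw_mon = { 0 }, sw_mon = { 0 };

	assert(model_tx_seen(&hw, &hw_mon, 1000, 100));		/* recent stamp */
	assert(!model_tx_seen(&hw, &hw_mon, 1200, 100));	/* stale stamp */

	assert(!model_tx_seen(&sw, &sw_mon, 1000, 100));	/* counter idle */
	sw.tx_packets = 5;
	assert(model_tx_seen(&sw, &sw_mon, 1100, 100));		/* counter moved */
	assert(!model_tx_seen(&sw, &sw_mon, 1200, 100));	/* idle again */
}
```

With a split like this, a software device would not need to write trans_start
at all, which is why the plan above includes eventually removing those updates
from veth and tun.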

Patch

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 466da01ba2e3..2cb833b3006a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -312,6 +312,7 @@  static bool veth_skb_is_eligible_for_gro(const struct net_device *dev,
 static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
+	struct netdev_queue *queue = NULL;
 	struct veth_rq *rq = NULL;
 	struct net_device *rcv;
 	int length = skb->len;
@@ -329,6 +330,7 @@  static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	rxq = skb_get_queue_mapping(skb);
 	if (rxq < rcv->real_num_rx_queues) {
 		rq = &rcv_priv->rq[rxq];
+		queue = netdev_get_tx_queue(dev, rxq);
 
 		/* The napi pointer is available when an XDP program is
 		 * attached or when GRO is enabled
@@ -340,6 +342,8 @@  static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	skb_tx_timestamp(skb);
 	if (likely(veth_forward_skb(rcv, skb, rq, use_napi) == NET_RX_SUCCESS)) {
+		if (queue)
+			txq_trans_cond_update(queue);
 		if (!use_napi)
 			dev_lstats_add(dev, length);
 	} else {
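
Editorial sketch, not part of the patch: the hunks above look up the transmit
queue only when the queue index is valid, then refresh its timestamp only
after the skb was forwarded successfully. The userspace model below mirrors
that control flow. It assumes the kernel helper behaves like a store that is
skipped when the timestamp is already current; model_txq,
model_trans_cond_update and model_xmit are hypothetical names, and the
`forwarded` flag stands in for the veth_forward_skb() result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transmit queue. */
struct model_txq {
	unsigned long trans_start;	/* tick of the last recorded transmit */
};

/* Write the timestamp only when it would actually change. */
static void model_trans_cond_update(struct model_txq *txq, unsigned long now)
{
	if (txq->trans_start != now)
		txq->trans_start = now;
}

/*
 * Model of the patched transmit path: `txq` may be NULL when no valid queue
 * was found, and the timestamp is touched only on successful forwarding.
 */
static bool model_xmit(struct model_txq *txq, bool forwarded,
		       unsigned long now)
{
	if (!forwarded)
		return false;	/* drop path: trans_start stays untouched */

	if (txq)
		model_trans_cond_update(txq, now);
	return true;
}

/* Usage: success refreshes the stamp, a drop or a NULL queue does not. */
void model_xmit_demo(void)
{
	struct model_txq txq = { .trans_start = 10 };

	assert(model_xmit(&txq, true, 42));
	assert(txq.trans_start == 42);

	assert(!model_xmit(&txq, false, 50));
	assert(txq.trans_start == 42);

	assert(model_xmit(NULL, true, 60));	/* no queue: nothing to update */
}
```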