mbox series

[v3,net-next,00/23] net: add preliminary netdev refcount tracking

Message ID 20211205042217.982127-1-eric.dumazet@gmail.com (mailing list archive)
Headers show
Series net: add preliminary netdev refcount tracking | expand

Message

Eric Dumazet Dec. 5, 2021, 4:21 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

Two first patches add a generic infrastructure, that will be used
to get tracking of refcount increments/decrements.

The general idea is to be able to precisely pair each decrement with
a corresponding prior increment. Both share a cookie, basically
a pointer to private data storing stack traces.

The third patch adds dev_hold_track() and dev_put_track() helpers
(CONFIG_NET_DEV_REFCNT_TRACKER)

Then a series of 20 patches converts some dev_hold()/dev_put()
pairs to new hepers : dev_hold_track() and dev_put_track().

Hopefully this will be used by developpers and syzbot to
root cause bugs that cause netdevice dismantles freezes.

With CONFIG_PCPU_DEV_REFCNT=n option, we were able to detect
some class of bugs, but too late (when too many dev_put()
were happening).

Another series will be sent after this one is merged.

v3: moved NET_DEV_REFCNT_TRACKER to net/Kconfig.debug
    added "depends on DEBUG_KERNEL && STACKTRACE_SUPPORT"
    to hopefully get rid of kbuild reports for ARCH=nios2
    Reworded patch 3 changelog.
    Added missing htmldocs (Jakub)

v2: added four additional patches,
    added netdev_tracker_alloc() and netdev_tracker_free()
    addressed build error (kernel bots),
    use GFP_ATOMIC in test_ref_tracker_timer_func()

Eric Dumazet (23):
  lib: add reference counting tracking infrastructure
  lib: add tests for reference tracker
  net: add net device refcount tracker infrastructure
  net: add net device refcount tracker to struct netdev_rx_queue
  net: add net device refcount tracker to struct netdev_queue
  net: add net device refcount tracker to ethtool_phys_id()
  net: add net device refcount tracker to dev_ifsioc()
  drop_monitor: add net device refcount tracker
  net: dst: add net device refcount tracking to dst_entry
  ipv6: add net device refcount tracker to rt6_probe_deferred()
  sit: add net device refcount tracking to ip_tunnel
  ipv6: add net device refcount tracker to struct ip6_tnl
  net: add net device refcount tracker to struct neighbour
  net: add net device refcount tracker to struct pneigh_entry
  net: add net device refcount tracker to struct neigh_parms
  net: add net device refcount tracker to struct netdev_adjacent
  ipv6: add net device refcount tracker to struct inet6_dev
  ipv4: add net device refcount tracker to struct in_device
  net/sched: add net device refcount tracker to struct Qdisc
  net: linkwatch: add net device refcount tracker
  net: failover: add net device refcount tracker
  ipmr, ip6mr: add net device refcount tracker to struct vif_device
  netpoll: add net device refcount tracker to struct netpoll

 drivers/net/netconsole.c    |   2 +-
 include/linux/inetdevice.h  |   2 +
 include/linux/mroute_base.h |   2 +
 include/linux/netdevice.h   |  68 ++++++++++++++++++
 include/linux/netpoll.h     |   1 +
 include/linux/ref_tracker.h |  73 +++++++++++++++++++
 include/net/devlink.h       |   4 ++
 include/net/dst.h           |   1 +
 include/net/failover.h      |   1 +
 include/net/if_inet6.h      |   1 +
 include/net/ip6_tunnel.h    |   1 +
 include/net/ip_tunnels.h    |   3 +
 include/net/neighbour.h     |   3 +
 include/net/sch_generic.h   |   2 +-
 lib/Kconfig                 |   5 ++
 lib/Kconfig.debug           |  15 ++++
 lib/Makefile                |   4 +-
 lib/ref_tracker.c           | 140 ++++++++++++++++++++++++++++++++++++
 lib/test_ref_tracker.c      | 115 +++++++++++++++++++++++++++++
 net/Kconfig.debug           |  11 +++
 net/core/dev.c              |  10 ++-
 net/core/dev_ioctl.c        |   5 +-
 net/core/drop_monitor.c     |   6 +-
 net/core/dst.c              |   8 +--
 net/core/failover.c         |   4 +-
 net/core/link_watch.c       |   4 +-
 net/core/neighbour.c        |  18 ++---
 net/core/net-sysfs.c        |   8 +--
 net/core/netpoll.c          |   4 +-
 net/ethtool/ioctl.c         |   5 +-
 net/ipv4/devinet.c          |   4 +-
 net/ipv4/ipmr.c             |   3 +-
 net/ipv4/route.c            |   7 +-
 net/ipv6/addrconf.c         |   4 +-
 net/ipv6/addrconf_core.c    |   2 +-
 net/ipv6/ip6_gre.c          |   8 +--
 net/ipv6/ip6_tunnel.c       |   4 +-
 net/ipv6/ip6_vti.c          |   4 +-
 net/ipv6/ip6mr.c            |   3 +-
 net/ipv6/route.c            |  10 +--
 net/ipv6/sit.c              |   4 +-
 net/sched/sch_generic.c     |   4 +-
 42 files changed, 521 insertions(+), 62 deletions(-)
 create mode 100644 include/linux/ref_tracker.h
 create mode 100644 lib/ref_tracker.c
 create mode 100644 lib/test_ref_tracker.c
 create mode 100644 net/Kconfig.debug

Comments

Andrew Lunn Dec. 6, 2021, 11:23 p.m. UTC | #1
On Sat, Dec 04, 2021 at 08:21:54PM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Two first patches add a generic infrastructure, that will be used
> to get tracking of refcount increments/decrements.

Hi Eric

Using this i found:

[  774.108901] unregister_netdevice: waiting for eth0 to become free. Usage count = 4
[  774.110864] leaked reference.
[  774.110874]  dst_alloc+0x7a/0x180
[  774.110887]  ip6_dst_alloc+0x27/0x90
[  774.110894]  ip6_pol_route+0x257/0x430
[  774.110900]  ip6_pol_route_output+0x19/0x20
[  774.110905]  fib6_rule_lookup+0x18b/0x270
[  774.110914]  ip6_route_output_flags_noref+0xaa/0x110
[  774.110918]  ip6_route_output_flags+0x32/0xa0
[  774.110922]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
[  774.110929]  ip6_dst_lookup_flow+0x43/0xa0
[  774.110934]  inet6_csk_route_socket+0x166/0x200
[  774.110943]  inet6_csk_xmit+0x56/0x130
[  774.110946]  __tcp_transmit_skb+0x53b/0xc30
[  774.110953]  __tcp_send_ack.part.0+0xc6/0x1a0
[  774.110958]  tcp_send_ack+0x1c/0x20
[  774.110964]  __tcp_ack_snd_check+0x42/0x200
[  774.110968]  tcp_rcv_established+0x27a/0x6f0
[  774.110973] leaked reference.
[  774.110975]  ipv6_add_dev+0x13e/0x4f0
[  774.110982]  addrconf_notify+0x2ca/0x950
[  774.110989]  raw_notifier_call_chain+0x49/0x60
[  774.111000]  call_netdevice_notifiers_info+0x50/0x90
[  774.111007]  __dev_change_net_namespace+0x30d/0x6c0
[  774.111016]  do_setlink+0xdc/0x10b0
[  774.111024]  __rtnl_newlink+0x608/0xa10
[  774.111031]  rtnl_newlink+0x49/0x70
[  774.111038]  rtnetlink_rcv_msg+0x14f/0x380
[  774.111046]  netlink_rcv_skb+0x55/0x100
[  774.111053]  rtnetlink_rcv+0x15/0x20
[  774.111059]  netlink_unicast+0x230/0x340
[  774.111064]  netlink_sendmsg+0x252/0x4b0
[  774.111075]  sock_sendmsg+0x65/0x70
[  774.111080]  ____sys_sendmsg+0x24e/0x290
[  774.111084]  ___sys_sendmsg+0x81/0xc0

I'm using GNS3 to simulate a network topology. So a collection of veth
pairs, bridges and tap interfaces spread over a few namespaces. The
network being simulated uses Segment Routing. And traceroute might also
involved in this somehow. I have 3 patches applied, to make traceroute
actually work when SRv6 is being used. You can find v3 here:

https://lore.kernel.org/netdev/20211203162926.3680281-3-andrew@lunn.ch/T/

I'm not sure if these patches are part of the problem or not. None of
the traces i've seen are directly on the ICMP path. traceroute is
using udp, and one of the traces above is for tcp, and the other looks
like it is moving an interface into a different namespace?

This is net-next from today.

Any ideas?

Thanks
	Andrew
Eric Dumazet Dec. 6, 2021, 11:44 p.m. UTC | #2
On Mon, Dec 6, 2021 at 3:24 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Sat, Dec 04, 2021 at 08:21:54PM -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Two first patches add a generic infrastructure, that will be used
> > to get tracking of refcount increments/decrements.
>
> Hi Eric
>
> Using this i found:
>
> [  774.108901] unregister_netdevice: waiting for eth0 to become free. Usage count = 4
> [  774.110864] leaked reference.
> [  774.110874]  dst_alloc+0x7a/0x180
> [  774.110887]  ip6_dst_alloc+0x27/0x90
> [  774.110894]  ip6_pol_route+0x257/0x430
> [  774.110900]  ip6_pol_route_output+0x19/0x20
> [  774.110905]  fib6_rule_lookup+0x18b/0x270
> [  774.110914]  ip6_route_output_flags_noref+0xaa/0x110
> [  774.110918]  ip6_route_output_flags+0x32/0xa0
> [  774.110922]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
> [  774.110929]  ip6_dst_lookup_flow+0x43/0xa0
> [  774.110934]  inet6_csk_route_socket+0x166/0x200
> [  774.110943]  inet6_csk_xmit+0x56/0x130
> [  774.110946]  __tcp_transmit_skb+0x53b/0xc30
> [  774.110953]  __tcp_send_ack.part.0+0xc6/0x1a0
> [  774.110958]  tcp_send_ack+0x1c/0x20
> [  774.110964]  __tcp_ack_snd_check+0x42/0x200
> [  774.110968]  tcp_rcv_established+0x27a/0x6f0
> [  774.110973] leaked reference.
> [  774.110975]  ipv6_add_dev+0x13e/0x4f0
> [  774.110982]  addrconf_notify+0x2ca/0x950
> [  774.110989]  raw_notifier_call_chain+0x49/0x60
> [  774.111000]  call_netdevice_notifiers_info+0x50/0x90
> [  774.111007]  __dev_change_net_namespace+0x30d/0x6c0
> [  774.111016]  do_setlink+0xdc/0x10b0
> [  774.111024]  __rtnl_newlink+0x608/0xa10
> [  774.111031]  rtnl_newlink+0x49/0x70
> [  774.111038]  rtnetlink_rcv_msg+0x14f/0x380
> [  774.111046]  netlink_rcv_skb+0x55/0x100
> [  774.111053]  rtnetlink_rcv+0x15/0x20
> [  774.111059]  netlink_unicast+0x230/0x340
> [  774.111064]  netlink_sendmsg+0x252/0x4b0
> [  774.111075]  sock_sendmsg+0x65/0x70
> [  774.111080]  ____sys_sendmsg+0x24e/0x290
> [  774.111084]  ___sys_sendmsg+0x81/0xc0
>
> I'm using GNS3 to simulate a network topology. So a collection of veth
> pairs, bridges and tap interfaces spread over a few namespaces. The
> network being simulated uses Segment Routing. And traceroute might also
> involved in this somehow. I have 3 patches applied, to make traceroute
> actually work when SRv6 is being used. You can find v3 here:
>
> https://lore.kernel.org/netdev/20211203162926.3680281-3-andrew@lunn.ch/T/
>
> I'm not sure if these patches are part of the problem or not. None of
> the traces i've seen are directly on the ICMP path. traceroute is
> using udp, and one of the traces above is for tcp, and the other looks
> like it is moving an interface into a different namespace?
>
> This is net-next from today.

I do not understand, net-next does not contain this stuff yet ?

I have other patches, this work is still in progress.
Eric Dumazet Dec. 6, 2021, 11:48 p.m. UTC | #3
On Mon, Dec 6, 2021 at 3:44 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Dec 6, 2021 at 3:24 PM Andrew Lunn <andrew@lunn.ch> wrote:

> > I'm not sure if these patches are part of the problem or not. None of
> > the traces i've seen are directly on the ICMP path. traceroute is
> > using udp, and one of the traces above is for tcp, and the other looks
> > like it is moving an interface into a different namespace?
> >
> > This is net-next from today.
>
> I do not understand, net-next does not contain this stuff yet ?
>
> I have other patches, this work is still in progress.

(Total of 55 patches, but I do not want to send them and flood the mailing list)

I will send next series when first round is merged.
Andrew Lunn Dec. 7, 2021, midnight UTC | #4
On Mon, Dec 06, 2021 at 03:44:57PM -0800, Eric Dumazet wrote:
> On Mon, Dec 6, 2021 at 3:24 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > On Sat, Dec 04, 2021 at 08:21:54PM -0800, Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > >
> > > Two first patches add a generic infrastructure, that will be used
> > > to get tracking of refcount increments/decrements.
> >
> > Hi Eric
> >
> > Using this i found:
> >
> > [  774.108901] unregister_netdevice: waiting for eth0 to become free. Usage count = 4
> > [  774.110864] leaked reference.
> > [  774.110874]  dst_alloc+0x7a/0x180
> > [  774.110887]  ip6_dst_alloc+0x27/0x90
> > [  774.110894]  ip6_pol_route+0x257/0x430
> > [  774.110900]  ip6_pol_route_output+0x19/0x20
> > [  774.110905]  fib6_rule_lookup+0x18b/0x270
> > [  774.110914]  ip6_route_output_flags_noref+0xaa/0x110
> > [  774.110918]  ip6_route_output_flags+0x32/0xa0
> > [  774.110922]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
> > [  774.110929]  ip6_dst_lookup_flow+0x43/0xa0
> > [  774.110934]  inet6_csk_route_socket+0x166/0x200
> > [  774.110943]  inet6_csk_xmit+0x56/0x130
> > [  774.110946]  __tcp_transmit_skb+0x53b/0xc30
> > [  774.110953]  __tcp_send_ack.part.0+0xc6/0x1a0
> > [  774.110958]  tcp_send_ack+0x1c/0x20
> > [  774.110964]  __tcp_ack_snd_check+0x42/0x200
> > [  774.110968]  tcp_rcv_established+0x27a/0x6f0
> > [  774.110973] leaked reference.
> > [  774.110975]  ipv6_add_dev+0x13e/0x4f0
> > [  774.110982]  addrconf_notify+0x2ca/0x950
> > [  774.110989]  raw_notifier_call_chain+0x49/0x60
> > [  774.111000]  call_netdevice_notifiers_info+0x50/0x90
> > [  774.111007]  __dev_change_net_namespace+0x30d/0x6c0
> > [  774.111016]  do_setlink+0xdc/0x10b0
> > [  774.111024]  __rtnl_newlink+0x608/0xa10
> > [  774.111031]  rtnl_newlink+0x49/0x70
> > [  774.111038]  rtnetlink_rcv_msg+0x14f/0x380
> > [  774.111046]  netlink_rcv_skb+0x55/0x100
> > [  774.111053]  rtnetlink_rcv+0x15/0x20
> > [  774.111059]  netlink_unicast+0x230/0x340
> > [  774.111064]  netlink_sendmsg+0x252/0x4b0
> > [  774.111075]  sock_sendmsg+0x65/0x70
> > [  774.111080]  ____sys_sendmsg+0x24e/0x290
> > [  774.111084]  ___sys_sendmsg+0x81/0xc0
> >
> > I'm using GNS3 to simulate a network topology. So a collection of veth
> > pairs, bridges and tap interfaces spread over a few namespaces. The
> > network being simulated uses Segment Routing. And traceroute might also
> > involved in this somehow. I have 3 patches applied, to make traceroute
> > actually work when SRv6 is being used. You can find v3 here:
> >
> > https://lore.kernel.org/netdev/20211203162926.3680281-3-andrew@lunn.ch/T/
> >
> > I'm not sure if these patches are part of the problem or not. None of
> > the traces i've seen are directly on the ICMP path. traceroute is
> > using udp, and one of the traces above is for tcp, and the other looks
> > like it is moving an interface into a different namespace?
> >
> > This is net-next from today.
> 
> I do not understand, net-next does not contain this stuff yet ?

Hi Eric

I'm getting warnings like:

unregister_netdevice: waiting for eth0 to become free. Usage count = 4

which is what your patchset is supposed to help fix. So i applied what
has been posted so far, in the hope it would find the issue. It is
reporting something...

> I have other patches, this work is still in progress.

Is what is currently posted usable? Do these traces above point at the
real problem i have, or because there are more patches, i should not
trust the output?

      Andrew
Eric Dumazet Dec. 7, 2021, 12:04 a.m. UTC | #5
On Mon, Dec 6, 2021 at 4:00 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Mon, Dec 06, 2021 at 03:44:57PM -0800, Eric Dumazet wrote:
> > On Mon, Dec 6, 2021 at 3:24 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > On Sat, Dec 04, 2021 at 08:21:54PM -0800, Eric Dumazet wrote:
> > > > From: Eric Dumazet <edumazet@google.com>
> > > >
> > > > Two first patches add a generic infrastructure, that will be used
> > > > to get tracking of refcount increments/decrements.
> > >
> > > Hi Eric
> > >
> > > Using this i found:
> > >
> > > [  774.108901] unregister_netdevice: waiting for eth0 to become free. Usage count = 4
> > > [  774.110864] leaked reference.
> > > [  774.110874]  dst_alloc+0x7a/0x180
> > > [  774.110887]  ip6_dst_alloc+0x27/0x90
> > > [  774.110894]  ip6_pol_route+0x257/0x430
> > > [  774.110900]  ip6_pol_route_output+0x19/0x20
> > > [  774.110905]  fib6_rule_lookup+0x18b/0x270
> > > [  774.110914]  ip6_route_output_flags_noref+0xaa/0x110
> > > [  774.110918]  ip6_route_output_flags+0x32/0xa0
> > > [  774.110922]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
> > > [  774.110929]  ip6_dst_lookup_flow+0x43/0xa0
> > > [  774.110934]  inet6_csk_route_socket+0x166/0x200
> > > [  774.110943]  inet6_csk_xmit+0x56/0x130
> > > [  774.110946]  __tcp_transmit_skb+0x53b/0xc30
> > > [  774.110953]  __tcp_send_ack.part.0+0xc6/0x1a0
> > > [  774.110958]  tcp_send_ack+0x1c/0x20
> > > [  774.110964]  __tcp_ack_snd_check+0x42/0x200
> > > [  774.110968]  tcp_rcv_established+0x27a/0x6f0
> > > [  774.110973] leaked reference.
> > > [  774.110975]  ipv6_add_dev+0x13e/0x4f0
> > > [  774.110982]  addrconf_notify+0x2ca/0x950
> > > [  774.110989]  raw_notifier_call_chain+0x49/0x60
> > > [  774.111000]  call_netdevice_notifiers_info+0x50/0x90
> > > [  774.111007]  __dev_change_net_namespace+0x30d/0x6c0
> > > [  774.111016]  do_setlink+0xdc/0x10b0
> > > [  774.111024]  __rtnl_newlink+0x608/0xa10
> > > [  774.111031]  rtnl_newlink+0x49/0x70
> > > [  774.111038]  rtnetlink_rcv_msg+0x14f/0x380
> > > [  774.111046]  netlink_rcv_skb+0x55/0x100
> > > [  774.111053]  rtnetlink_rcv+0x15/0x20
> > > [  774.111059]  netlink_unicast+0x230/0x340
> > > [  774.111064]  netlink_sendmsg+0x252/0x4b0
> > > [  774.111075]  sock_sendmsg+0x65/0x70
> > > [  774.111080]  ____sys_sendmsg+0x24e/0x290
> > > [  774.111084]  ___sys_sendmsg+0x81/0xc0
> > >
> > > I'm using GNS3 to simulate a network topology. So a collection of veth
> > > pairs, bridges and tap interfaces spread over a few namespaces. The
> > > network being simulated uses Segment Routing. And traceroute might also
> > > involved in this somehow. I have 3 patches applied, to make traceroute
> > > actually work when SRv6 is being used. You can find v3 here:
> > >
> > > https://lore.kernel.org/netdev/20211203162926.3680281-3-andrew@lunn.ch/T/
> > >
> > > I'm not sure if these patches are part of the problem or not. None of
> > > the traces i've seen are directly on the ICMP path. traceroute is
> > > using udp, and one of the traces above is for tcp, and the other looks
> > > like it is moving an interface into a different namespace?
> > >
> > > This is net-next from today.
> >
> > I do not understand, net-next does not contain this stuff yet ?
>
> Hi Eric
>
> I'm getting warnings like:
>
> unregister_netdevice: waiting for eth0 to become free. Usage count = 4
>
> which is what your patchset is supposed to help fix. So i applied what
> has been posted so far, in the hope it would find the issue. It is
> reporting something...

I thought you were telling me that you got these new reports after the
patch set being applied ?

Or were they happening because of your other changes ?

>
> > I have other patches, this work is still in progress.
>
> Is what is currently posted usable? Do these traces above point at the
> real problem i have, or because there are more patches, i should not
> trust the output?

I think I have not worked yet on the XFRM side in patch set 1.
Are you using XFRM ?
Andrew Lunn Dec. 7, 2021, 12:12 a.m. UTC | #6
On Mon, Dec 06, 2021 at 04:04:19PM -0800, Eric Dumazet wrote:
> On Mon, Dec 6, 2021 at 4:00 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > On Mon, Dec 06, 2021 at 03:44:57PM -0800, Eric Dumazet wrote:
> > > On Mon, Dec 6, 2021 at 3:24 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > > >
> > > > On Sat, Dec 04, 2021 at 08:21:54PM -0800, Eric Dumazet wrote:
> > > > > From: Eric Dumazet <edumazet@google.com>
> > > > >
> > > > > Two first patches add a generic infrastructure, that will be used
> > > > > to get tracking of refcount increments/decrements.
> > > >
> > > > Hi Eric
> > > >
> > > > Using this i found:
> > > >
> > > > [  774.108901] unregister_netdevice: waiting for eth0 to become free. Usage count = 4
> > > > [  774.110864] leaked reference.
> > > > [  774.110874]  dst_alloc+0x7a/0x180
> > > > [  774.110887]  ip6_dst_alloc+0x27/0x90
> > > > [  774.110894]  ip6_pol_route+0x257/0x430
> > > > [  774.110900]  ip6_pol_route_output+0x19/0x20
> > > > [  774.110905]  fib6_rule_lookup+0x18b/0x270
> > > > [  774.110914]  ip6_route_output_flags_noref+0xaa/0x110
> > > > [  774.110918]  ip6_route_output_flags+0x32/0xa0
> > > > [  774.110922]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
> > > > [  774.110929]  ip6_dst_lookup_flow+0x43/0xa0
> > > > [  774.110934]  inet6_csk_route_socket+0x166/0x200
> > > > [  774.110943]  inet6_csk_xmit+0x56/0x130
> > > > [  774.110946]  __tcp_transmit_skb+0x53b/0xc30
> > > > [  774.110953]  __tcp_send_ack.part.0+0xc6/0x1a0
> > > > [  774.110958]  tcp_send_ack+0x1c/0x20
> > > > [  774.110964]  __tcp_ack_snd_check+0x42/0x200
> > > > [  774.110968]  tcp_rcv_established+0x27a/0x6f0
> > > > [  774.110973] leaked reference.
> > > > [  774.110975]  ipv6_add_dev+0x13e/0x4f0
> > > > [  774.110982]  addrconf_notify+0x2ca/0x950
> > > > [  774.110989]  raw_notifier_call_chain+0x49/0x60
> > > > [  774.111000]  call_netdevice_notifiers_info+0x50/0x90
> > > > [  774.111007]  __dev_change_net_namespace+0x30d/0x6c0
> > > > [  774.111016]  do_setlink+0xdc/0x10b0
> > > > [  774.111024]  __rtnl_newlink+0x608/0xa10
> > > > [  774.111031]  rtnl_newlink+0x49/0x70
> > > > [  774.111038]  rtnetlink_rcv_msg+0x14f/0x380
> > > > [  774.111046]  netlink_rcv_skb+0x55/0x100
> > > > [  774.111053]  rtnetlink_rcv+0x15/0x20
> > > > [  774.111059]  netlink_unicast+0x230/0x340
> > > > [  774.111064]  netlink_sendmsg+0x252/0x4b0
> > > > [  774.111075]  sock_sendmsg+0x65/0x70
> > > > [  774.111080]  ____sys_sendmsg+0x24e/0x290
> > > > [  774.111084]  ___sys_sendmsg+0x81/0xc0
> > > >
> > > > I'm using GNS3 to simulate a network topology. So a collection of veth
> > > > pairs, bridges and tap interfaces spread over a few namespaces. The
> > > > network being simulated uses Segment Routing. And traceroute might also
> > > > involved in this somehow. I have 3 patches applied, to make traceroute
> > > > actually work when SRv6 is being used. You can find v3 here:
> > > >
> > > > https://lore.kernel.org/netdev/20211203162926.3680281-3-andrew@lunn.ch/T/
> > > >
> > > > I'm not sure if these patches are part of the problem or not. None of
> > > > the traces i've seen are directly on the ICMP path. traceroute is
> > > > using udp, and one of the traces above is for tcp, and the other looks
> > > > like it is moving an interface into a different namespace?
> > > >
> > > > This is net-next from today.
> > >
> > > I do not understand, net-next does not contain this stuff yet ?
> >
> > Hi Eric
> >
> > I'm getting warnings like:
> >
> > unregister_netdevice: waiting for eth0 to become free. Usage count = 4
> >
> > which is what your patchset is supposed to help fix. So i applied what
> > has been posted so far, in the hope it would find the issue. It is
> > reporting something...
> 
> I thought you were telling me that you got these new reports after the
> patch set being applied ?

Hi Eric

No. I applied them.

> Or were they happening because of your other changes ?

Hard to say. It looks like some sort of race condition. Sometimes when
i shut down the GNS3 simulation, i get the issues, sometimes not. I
don't have a good enough feeling to say either way, is it an existing
problem, or it is my code which is triggering it.

> > > I have other patches, this work is still in progress.
> >
> > Is what is currently posted usable? Do these traces above point at the
> > real problem i have, or because there are more patches, i should not
> > trust the output?
> 
> I think I have not worked yet on the XFRM side in patch set 1.
> Are you using XFRM ?

I don't think SRv6 uses XFRM.

  Andrew
Eric Dumazet Dec. 7, 2021, 12:17 a.m. UTC | #7
On Mon, Dec 6, 2021 at 4:12 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
>
> Hard to say. It looks like some sort of race condition. Sometimes when
> i shut down the GNS3 simulation, i get the issues, sometimes not. I
> don't have a good enough feeling to say either way, is it an existing
> problem, or it is my code which is triggering it.

OK got it.

I think it might be premature to use ref_tracker yet, until we also
have the netns one.
(Seeing the netns change path from your report, this might be relevant)

Path series adding netns tracking:

1fe7f3e6bf91 net: add networking namespace refcount tracker
14d34ec0eaad net: add netns refcount tracker to struct sock
648e1c8128a1 net: add netns refcount tracker to struct seq_net_private
fa5ec9628f3e net: sched: add netns refcount tracker to struct tcf_exts
fa9f11a0a627 netfilter: nfnetlink: add netns refcount tracker to
struct nfulnl_instance
8e3bbdc619d0 l2tp: add netns refcount tracker to l2tp_dfs_seq_data
323fd18ce64c ppp: add netns refcount tracker
d01d6c0df780 netfilter: nf_nat_masquerade: add netns refcount tracker
to masq_dev_work
1b7051234a99 SUNRPC: add netns refcount tracker to struct svc_xprt
44721a730a24 SUNRPC: add netns refcount tracker to struct gss_auth
648e8fd765b7 SUNRPC: add netns refcount tracker to struct rpc_xprt
c1d5973f3af0 net: initialize init_net earlier
75285dbd40cd net: add netns refcount tracker to struct nsproxy
0fbde1282785 vfs: add netns refcount tracker to struct fs_context
5a0c6bd0445f audit: add netns refcount tracker to struct audit_net
145f70501bfb audit: add netns refcount tracker to struct audit_reply
b5af80d1c341 audit: add netns refcount tracker to struct audit_netlink_list
Eric Dumazet Dec. 7, 2021, 12:21 a.m. UTC | #8
On Mon, Dec 6, 2021 at 4:17 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Dec 6, 2021 at 4:12 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> >
> > Hard to say. It looks like some sort of race condition. Sometimes when
> > i shut down the GNS3 simulation, i get the issues, sometimes not. I
> > don't have a good enough feeling to say either way, is it an existing
> > problem, or it is my code which is triggering it.
>
> OK got it.
>
> I think it might be premature to use ref_tracker yet, until we also
> have the netns one.
> (Seeing the netns change path from your report, this might be relevant)
>
> Path series adding netns tracking:
>
> 1fe7f3e6bf91 net: add networking namespace refcount tracker
> 14d34ec0eaad net: add netns refcount tracker to struct sock
> 648e1c8128a1 net: add netns refcount tracker to struct seq_net_private
> fa5ec9628f3e net: sched: add netns refcount tracker to struct tcf_exts
> fa9f11a0a627 netfilter: nfnetlink: add netns refcount tracker to
> struct nfulnl_instance
> 8e3bbdc619d0 l2tp: add netns refcount tracker to l2tp_dfs_seq_data
> 323fd18ce64c ppp: add netns refcount tracker
> d01d6c0df780 netfilter: nf_nat_masquerade: add netns refcount tracker
> to masq_dev_work
> 1b7051234a99 SUNRPC: add netns refcount tracker to struct svc_xprt
> 44721a730a24 SUNRPC: add netns refcount tracker to struct gss_auth
> 648e8fd765b7 SUNRPC: add netns refcount tracker to struct rpc_xprt
> c1d5973f3af0 net: initialize init_net earlier
> 75285dbd40cd net: add netns refcount tracker to struct nsproxy
> 0fbde1282785 vfs: add netns refcount tracker to struct fs_context
> 5a0c6bd0445f audit: add netns refcount tracker to struct audit_net
> 145f70501bfb audit: add netns refcount tracker to struct audit_reply
> b5af80d1c341 audit: add netns refcount tracker to struct audit_netlink_list

And remaining of netdev tracking patches would be

b445498bdb7a net: eql: add net device refcount tracker
2702cdbf4a6d vlan: add net device refcount tracker
1f1ef25dabe9 net: bridge: add net device refcount tracker
e418e7268655 net: watchdog: add net device refcount tracker
b08577d21d47 net: switchdev: add net device refcount tracker
57f014dc36db inet: add net device refcount tracker to struct fib_nh_common
a51ab6e951ea ax25: add net device refcount tracker
cd494c182dcb llc: add net device refcount tracker
9572141ddb29 pktgen add net device refcount tracker
d0b171bbc275 net/smc: add net device tracker to struct smc_pnetentry
c335545a38c0 netlink: add net device refcount tracker to struct ethnl_req_info
4bb89c4fef19 openvswitch: add net device refcount tracker to struct vport
86957122aab2 net: sched: act_mirred: add net device refcount tracker

I think the most important one for the leaks would be the " inet: add
net device refcount tracker to struct fib_nh_common"
Jakub Kicinski Dec. 7, 2021, 12:26 a.m. UTC | #9
On Sat,  4 Dec 2021 20:21:54 -0800 Eric Dumazet wrote:
> Two first patches add a generic infrastructure, that will be used
> to get tracking of refcount increments/decrements.
> 
> The general idea is to be able to precisely pair each decrement with
> a corresponding prior increment. Both share a cookie, basically
> a pointer to private data storing stack traces.
> 
> The third patch adds dev_hold_track() and dev_put_track() helpers
> (CONFIG_NET_DEV_REFCNT_TRACKER)
> 
> Then a series of 20 patches converts some dev_hold()/dev_put()
> pairs to new hepers : dev_hold_track() and dev_put_track().
> 
> Hopefully this will be used by developpers and syzbot to
> root cause bugs that cause netdevice dismantles freezes.
> 
> With CONFIG_PCPU_DEV_REFCNT=n option, we were able to detect
> some class of bugs, but too late (when too many dev_put()
> were happening).

Applied with minor fixes to patches 3 and 20. Thanks!
Andrew Lunn Dec. 7, 2021, 12:27 a.m. UTC | #10
On Mon, Dec 06, 2021 at 04:17:11PM -0800, Eric Dumazet wrote:
> On Mon, Dec 6, 2021 at 4:12 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> >
> > Hard to say. It looks like some sort of race condition. Sometimes when
> > i shut down the GNS3 simulation, i get the issues, sometimes not. I
> > don't have a good enough feeling to say either way, is it an existing
> > problem, or it is my code which is triggering it.
> 
> OK got it.
> 
> I think it might be premature to use ref_tracker yet, until we also
> have the netns one.

There is a lot of netns going on with GNS3. So it does sound too
early.

Could i get access to the full set of patches and try them out?

Thanks
	Andrew
Eric Dumazet Dec. 7, 2021, 12:53 a.m. UTC | #11
On Mon, Dec 6, 2021 at 4:27 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Mon, Dec 06, 2021 at 04:17:11PM -0800, Eric Dumazet wrote:
> > On Mon, Dec 6, 2021 at 4:12 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > >
> > > Hard to say. It looks like some sort of race condition. Sometimes when
> > > i shut down the GNS3 simulation, i get the issues, sometimes not. I
> > > don't have a good enough feeling to say either way, is it an existing
> > > problem, or it is my code which is triggering it.
> >
> > OK got it.
> >
> > I think it might be premature to use ref_tracker yet, until we also
> > have the netns one.
>
> There is a lot of netns going on with GNS3. So it does sound too
> early.
>
> Could i get access to the full set of patches and try them out?
>
> Thanks
>         Andrew

I just sent the netns tracker series.

I will shortly send the remainder of netdev tracking patches.
Andrew Lunn Dec. 7, 2021, 7:52 p.m. UTC | #12
> I just sent the netns tracker series.
> 
> I will shortly send the remainder of netdev tracking patches.

Hi Eric

Thanks for sending the full set of patches.

I applied them on top of net-next. And i can still reproduce my
issue. This is without any changes of mine, just net-next plus your
two new patchsets.

It is not easy to reproduce. I needed to start/stop the GNS3
simulation around 8 times before i triggered it. I don't know how GNS3
does it tairdown. It could be there are active daemons running in the
name spaces as it removes the veth and tap devices? That might explain
why some of the leaks seem to be from TCP connection setup attempts?

Given i now have your full patchset, do you think these traces are
valid? Are they pointing at real leaks?

[ 1423.515246] unregister_netdevice: waiting for eth0 to become free. Usage count = 9
[ 1423.515747] leaked reference.
[ 1423.515755]  dst_alloc+0x7a/0x180
[ 1423.515765]  ip6_dst_alloc+0x27/0x90
[ 1423.515771]  ip6_pol_route+0x257/0x430
[ 1423.515777]  ip6_pol_route_output+0x19/0x20
[ 1423.515781]  fib6_rule_lookup+0x18b/0x270
[ 1423.515789]  ip6_route_output_flags_noref+0xaa/0x110
[ 1423.515793]  ip6_route_output_flags+0x32/0xa0
[ 1423.515797]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
[ 1423.515803]  ip6_dst_lookup_flow+0x43/0xa0
[ 1423.515808]  inet6_csk_route_socket+0x166/0x200
[ 1423.515815]  inet6_csk_xmit+0x56/0x130
[ 1423.515818]  __tcp_transmit_skb+0x53b/0xc30
[ 1423.515825]  __tcp_send_ack.part.0+0xc6/0x1a0
[ 1423.515829]  tcp_send_ack+0x1c/0x20
[ 1423.515834]  __tcp_ack_snd_check+0x42/0x200
[ 1423.515838]  tcp_rcv_established+0x27a/0x6f0
[ 1423.515842] leaked reference.
[ 1423.515843]  dst_alloc+0x7a/0x180
[ 1423.515848]  ip6_dst_alloc+0x27/0x90
[ 1423.515851]  ip6_pol_route+0x257/0x430
[ 1423.515855]  ip6_pol_route_output+0x19/0x20
[ 1423.515860]  fib6_rule_lookup+0x18b/0x270
[ 1423.515865]  ip6_route_output_flags_noref+0xaa/0x110
[ 1423.515869]  ip6_route_output_flags+0x32/0xa0
[ 1423.515872]  seg6_output_core+0x28d/0x320
[ 1423.515878]  seg6_output+0x33/0x120
[ 1423.515884]  lwtunnel_output+0x72/0xc0
[ 1423.515890]  ip6_local_out+0x61/0x70
[ 1423.515894]  ip6_send_skb+0x23/0x70
[ 1423.515898]  udp_v6_send_skb+0x207/0x460
[ 1423.515905]  udpv6_sendmsg+0xb13/0xdb0
[ 1423.515908]  inet6_sendmsg+0x65/0x70
[ 1423.515916]  sock_sendmsg+0x48/0x70
[ 1423.515920] leaked reference.
[ 1423.515922]  dst_alloc+0x7a/0x180
[ 1423.515926]  ip6_dst_alloc+0x27/0x90
[ 1423.515929]  ip6_pol_route+0x257/0x430
[ 1423.515933]  ip6_pol_route_output+0x19/0x20
[ 1423.515938]  fib6_rule_lookup+0x18b/0x270
[ 1423.515943]  ip6_route_output_flags_noref+0xaa/0x110
[ 1423.515946]  ip6_route_output_flags+0x32/0xa0
[ 1423.515949]  seg6_output_core+0x28d/0x320
[ 1423.515955]  seg6_output+0x33/0x120
[ 1423.515961]  lwtunnel_output+0x72/0xc0
[ 1423.515965]  ip6_local_out+0x61/0x70
[ 1423.515968]  ip6_send_skb+0x23/0x70
[ 1423.515973]  udp_v6_send_skb+0x207/0x460
[ 1423.515979]  udpv6_sendmsg+0xb13/0xdb0
[ 1423.515982]  inet6_sendmsg+0x65/0x70
[ 1423.515985]  sock_sendmsg+0x48/0x70
[ 1423.515988] leaked reference.
[ 1423.515990]  dst_alloc+0x7a/0x180
[ 1423.515993]  ip6_dst_alloc+0x27/0x90
[ 1423.515997]  ip6_pol_route+0x257/0x430
[ 1423.516001]  ip6_pol_route_output+0x19/0x20
[ 1423.516005]  fib6_rule_lookup+0x18b/0x270
[ 1423.516010]  ip6_route_output_flags_noref+0xaa/0x110
[ 1423.516014]  ip6_route_output_flags+0x32/0xa0
[ 1423.516017]  seg6_output_core+0x28d/0x320
[ 1423.516022]  seg6_output+0x33/0x120
[ 1423.516028]  lwtunnel_output+0x72/0xc0
[ 1423.516032]  ip6_local_out+0x61/0x70
[ 1423.516036]  ip6_send_skb+0x23/0x70
[ 1423.516040]  udp_v6_send_skb+0x207/0x460
[ 1423.516046]  udpv6_sendmsg+0xb13/0xdb0
[ 1423.516049]  inet6_sendmsg+0x65/0x70
[ 1423.516052]  sock_sendmsg+0x48/0x70
[ 1423.516055] leaked reference.
[ 1423.516057]  dst_alloc+0x7a/0x180
[ 1423.516061]  ip6_dst_alloc+0x27/0x90
[ 1423.516064]  ip6_pol_route+0x257/0x430
[ 1423.516068]  ip6_pol_route_output+0x19/0x20
[ 1423.516072]  fib6_rule_lookup+0x18b/0x270
[ 1423.516078]  ip6_route_output_flags_noref+0xaa/0x110
[ 1423.516081]  ip6_route_output_flags+0x32/0xa0
[ 1423.516084]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
[ 1423.516088]  ip6_dst_lookup_flow+0x43/0xa0
[ 1423.516092]  inet6_csk_route_socket+0x166/0x200
[ 1423.516099]  inet6_csk_xmit+0x56/0x130
[ 1423.516102]  __tcp_transmit_skb+0x53b/0xc30
[ 1423.516106]  tcp_write_xmit+0x3d4/0x13f0
[ 1423.516110]  __tcp_push_pending_frames+0x37/0x100
[ 1423.516114]  tcp_push+0xd2/0x100
[ 1423.516122]  tcp_sendmsg_locked+0x2a7/0xdb0
[ 1423.516128] leaked reference.
[ 1423.516130]  dst_alloc+0x7a/0x180
[ 1423.516134]  ip6_dst_alloc+0x27/0x90
[ 1423.516137]  ip6_pol_route+0x257/0x430
[ 1423.516141]  ip6_pol_route_output+0x19/0x20
[ 1423.516146]  fib6_rule_lookup+0x18b/0x270
[ 1423.516151]  ip6_route_output_flags_noref+0xaa/0x110
[ 1423.516154]  ip6_route_output_flags+0x32/0xa0
[ 1423.516157]  ip6_dst_lookup_tail.constprop.0+0xde/0x240
[ 1423.516161]  ip6_dst_lookup_flow+0x43/0xa0
[ 1423.516166]  tcp_v6_connect+0x2a7/0x670
[ 1423.516171]  __inet_stream_connect+0xd1/0x3b0
[ 1423.516175]  inet_stream_connect+0x3b/0x60
[ 1423.516178]  __sys_connect_file+0x5f/0x70
[ 1423.516182]  __sys_connect+0xa2/0xd0
[ 1423.516186]  __x64_sys_connect+0x18/0x20
[ 1423.516190]  do_syscall_64+0x3b/0xc0
[ 1423.516197] leaked reference.
[ 1423.516198]  fib6_nh_init+0x30d/0x9c0
[ 1423.516203]  rtm_new_nexthop+0x68a/0x13a0
[ 1423.516208]  rtnetlink_rcv_msg+0x14f/0x380
[ 1423.516216]  netlink_rcv_skb+0x55/0x100
[ 1423.516222]  rtnetlink_rcv+0x15/0x20
[ 1423.516228]  netlink_unicast+0x230/0x340
[ 1423.516232]  netlink_sendmsg+0x252/0x4b0
[ 1423.516236]  sock_sendmsg+0x65/0x70
[ 1423.516240]  ____sys_sendmsg+0x24e/0x290
[ 1423.516243]  ___sys_sendmsg+0x81/0xc0
[ 1423.516247]  __sys_sendmsg+0x62/0xb0
[ 1423.516252]  __x64_sys_sendmsg+0x1d/0x20
[ 1423.516256]  do_syscall_64+0x3b/0xc0
[ 1423.516260]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1423.516266] leaked reference.
[ 1423.516267]  ipv6_add_dev+0x13e/0x4f0
[ 1423.516273]  addrconf_notify+0x2ca/0x950
[ 1423.516280]  raw_notifier_call_chain+0x49/0x60
[ 1423.516288]  call_netdevice_notifiers_info+0x50/0x90
[ 1423.516293]  __dev_change_net_namespace+0x30d/0x6c0
[ 1423.516299]  do_setlink+0xdc/0x10b0
[ 1423.516306]  __rtnl_newlink+0x608/0xa10
[ 1423.516312]  rtnl_newlink+0x49/0x70
[ 1423.516318]  rtnetlink_rcv_msg+0x14f/0x380
[ 1423.516323]  netlink_rcv_skb+0x55/0x100
[ 1423.516328]  rtnetlink_rcv+0x15/0x20
[ 1423.516333]  netlink_unicast+0x230/0x340
[ 1423.516337]  netlink_sendmsg+0x252/0x4b0
[ 1423.516345]  sock_sendmsg+0x65/0x70
[ 1423.516348]  ____sys_sendmsg+0x24e/0x290
[ 1423.516352]  ___sys_sendmsg+0x81/0xc0
Eric Dumazet Dec. 7, 2021, 8 p.m. UTC | #13
On Tue, Dec 7, 2021 at 11:52 AM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > I just sent the netns tracker series.
> >
> > I will shortly send the remainder of netdev tracking patches.
>
> Hi Eric
>
> Thanks for sending the full set of patches.
>
> I applied them on top of net-next. And i can still reproduce my
> issue. This is without any changes of mine, just net-next plus your
> two new patchsets.
>
> It is not easy to reproduce. I needed to start/stop the GNS3
> simulation around 8 times before i triggered it. I don't know how GNS3
> does it tairdown. It could be there are active daemons running in the
> name spaces as it removes the veth and tap devices? That might explain
> why some of the leaks seem to be from TCP connection setup attempts?
>
> Given i now have your full patchset, do you think these traces are
> valid? Are they pointing at real leaks?
>
> [ 1423.515246] unregister_netdevice: waiting for eth0 to become free. Usage count = 9
> [ 1423.515747] leaked reference.
> [ 1423.515755]  dst_alloc+0x7a/0x180
> [ 1423.515765]  ip6_dst_alloc+0x27/0x90
> [ 1423.515771]  ip6_pol_route+0x257/0x430
> [ 1423.515777]  ip6_pol_route_output+0x19/0x20
> [ 1423.515781]  fib6_rule_lookup+0x18b/0x270
> [ 1423.515789]  ip6_route_output_flags_noref+0xaa/0x110
> [ 1423.515793]  ip6_route_output_flags+0x32/0xa0
> [ 1423.515797]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
> [ 1423.515803]  ip6_dst_lookup_flow+0x43/0xa0
> [ 1423.515808]  inet6_csk_route_socket+0x166/0x200
> [ 1423.515815]  inet6_csk_xmit+0x56/0x130
> [ 1423.515818]  __tcp_transmit_skb+0x53b/0xc30
> [ 1423.515825]  __tcp_send_ack.part.0+0xc6/0x1a0
> [ 1423.515829]  tcp_send_ack+0x1c/0x20
> [ 1423.515834]  __tcp_ack_snd_check+0x42/0x200
> [ 1423.515838]  tcp_rcv_established+0x27a/0x6f0
> [ 1423.515842] leaked reference.
> [ 1423.515843]  dst_alloc+0x7a/0x180
> [ 1423.515848]  ip6_dst_alloc+0x27/0x90
> [ 1423.515851]  ip6_pol_route+0x257/0x430
> [ 1423.515855]  ip6_pol_route_output+0x19/0x20
> [ 1423.515860]  fib6_rule_lookup+0x18b/0x270
> [ 1423.515865]  ip6_route_output_flags_noref+0xaa/0x110
> [ 1423.515869]  ip6_route_output_flags+0x32/0xa0
> [ 1423.515872]  seg6_output_core+0x28d/0x320
> [ 1423.515878]  seg6_output+0x33/0x120
> [ 1423.515884]  lwtunnel_output+0x72/0xc0
> [ 1423.515890]  ip6_local_out+0x61/0x70
> [ 1423.515894]  ip6_send_skb+0x23/0x70
> [ 1423.515898]  udp_v6_send_skb+0x207/0x460
> [ 1423.515905]  udpv6_sendmsg+0xb13/0xdb0
> [ 1423.515908]  inet6_sendmsg+0x65/0x70
> [ 1423.515916]  sock_sendmsg+0x48/0x70
> [ 1423.515920] leaked reference.
> [ 1423.515922]  dst_alloc+0x7a/0x180
> [ 1423.515926]  ip6_dst_alloc+0x27/0x90
> [ 1423.515929]  ip6_pol_route+0x257/0x430
> [ 1423.515933]  ip6_pol_route_output+0x19/0x20
> [ 1423.515938]  fib6_rule_lookup+0x18b/0x270
> [ 1423.515943]  ip6_route_output_flags_noref+0xaa/0x110
> [ 1423.515946]  ip6_route_output_flags+0x32/0xa0
> [ 1423.515949]  seg6_output_core+0x28d/0x320
> [ 1423.515955]  seg6_output+0x33/0x120
> [ 1423.515961]  lwtunnel_output+0x72/0xc0
> [ 1423.515965]  ip6_local_out+0x61/0x70
> [ 1423.515968]  ip6_send_skb+0x23/0x70
> [ 1423.515973]  udp_v6_send_skb+0x207/0x460
> [ 1423.515979]  udpv6_sendmsg+0xb13/0xdb0
> [ 1423.515982]  inet6_sendmsg+0x65/0x70
> [ 1423.515985]  sock_sendmsg+0x48/0x70
> [ 1423.515988] leaked reference.
> [ 1423.515990]  dst_alloc+0x7a/0x180
> [ 1423.515993]  ip6_dst_alloc+0x27/0x90
> [ 1423.515997]  ip6_pol_route+0x257/0x430
> [ 1423.516001]  ip6_pol_route_output+0x19/0x20
> [ 1423.516005]  fib6_rule_lookup+0x18b/0x270
> [ 1423.516010]  ip6_route_output_flags_noref+0xaa/0x110
> [ 1423.516014]  ip6_route_output_flags+0x32/0xa0
> [ 1423.516017]  seg6_output_core+0x28d/0x320
> [ 1423.516022]  seg6_output+0x33/0x120
> [ 1423.516028]  lwtunnel_output+0x72/0xc0
> [ 1423.516032]  ip6_local_out+0x61/0x70
> [ 1423.516036]  ip6_send_skb+0x23/0x70
> [ 1423.516040]  udp_v6_send_skb+0x207/0x460
> [ 1423.516046]  udpv6_sendmsg+0xb13/0xdb0
> [ 1423.516049]  inet6_sendmsg+0x65/0x70
> [ 1423.516052]  sock_sendmsg+0x48/0x70
> [ 1423.516055] leaked reference.
> [ 1423.516057]  dst_alloc+0x7a/0x180
> [ 1423.516061]  ip6_dst_alloc+0x27/0x90
> [ 1423.516064]  ip6_pol_route+0x257/0x430
> [ 1423.516068]  ip6_pol_route_output+0x19/0x20
> [ 1423.516072]  fib6_rule_lookup+0x18b/0x270
> [ 1423.516078]  ip6_route_output_flags_noref+0xaa/0x110
> [ 1423.516081]  ip6_route_output_flags+0x32/0xa0
> [ 1423.516084]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
> [ 1423.516088]  ip6_dst_lookup_flow+0x43/0xa0
> [ 1423.516092]  inet6_csk_route_socket+0x166/0x200
> [ 1423.516099]  inet6_csk_xmit+0x56/0x130
> [ 1423.516102]  __tcp_transmit_skb+0x53b/0xc30
> [ 1423.516106]  tcp_write_xmit+0x3d4/0x13f0
> [ 1423.516110]  __tcp_push_pending_frames+0x37/0x100
> [ 1423.516114]  tcp_push+0xd2/0x100
> [ 1423.516122]  tcp_sendmsg_locked+0x2a7/0xdb0
> [ 1423.516128] leaked reference.
> [ 1423.516130]  dst_alloc+0x7a/0x180
> [ 1423.516134]  ip6_dst_alloc+0x27/0x90
> [ 1423.516137]  ip6_pol_route+0x257/0x430
> [ 1423.516141]  ip6_pol_route_output+0x19/0x20
> [ 1423.516146]  fib6_rule_lookup+0x18b/0x270
> [ 1423.516151]  ip6_route_output_flags_noref+0xaa/0x110
> [ 1423.516154]  ip6_route_output_flags+0x32/0xa0
> [ 1423.516157]  ip6_dst_lookup_tail.constprop.0+0xde/0x240
> [ 1423.516161]  ip6_dst_lookup_flow+0x43/0xa0
> [ 1423.516166]  tcp_v6_connect+0x2a7/0x670
> [ 1423.516171]  __inet_stream_connect+0xd1/0x3b0
> [ 1423.516175]  inet_stream_connect+0x3b/0x60
> [ 1423.516178]  __sys_connect_file+0x5f/0x70
> [ 1423.516182]  __sys_connect+0xa2/0xd0
> [ 1423.516186]  __x64_sys_connect+0x18/0x20
> [ 1423.516190]  do_syscall_64+0x3b/0xc0
> [ 1423.516197] leaked reference.
> [ 1423.516198]  fib6_nh_init+0x30d/0x9c0
> [ 1423.516203]  rtm_new_nexthop+0x68a/0x13a0
> [ 1423.516208]  rtnetlink_rcv_msg+0x14f/0x380
> [ 1423.516216]  netlink_rcv_skb+0x55/0x100
> [ 1423.516222]  rtnetlink_rcv+0x15/0x20
> [ 1423.516228]  netlink_unicast+0x230/0x340
> [ 1423.516232]  netlink_sendmsg+0x252/0x4b0
> [ 1423.516236]  sock_sendmsg+0x65/0x70
> [ 1423.516240]  ____sys_sendmsg+0x24e/0x290
> [ 1423.516243]  ___sys_sendmsg+0x81/0xc0
> [ 1423.516247]  __sys_sendmsg+0x62/0xb0
> [ 1423.516252]  __x64_sys_sendmsg+0x1d/0x20
> [ 1423.516256]  do_syscall_64+0x3b/0xc0
> [ 1423.516260]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 1423.516266] leaked reference.
> [ 1423.516267]  ipv6_add_dev+0x13e/0x4f0
> [ 1423.516273]  addrconf_notify+0x2ca/0x950
> [ 1423.516280]  raw_notifier_call_chain+0x49/0x60
> [ 1423.516288]  call_netdevice_notifiers_info+0x50/0x90
> [ 1423.516293]  __dev_change_net_namespace+0x30d/0x6c0
> [ 1423.516299]  do_setlink+0xdc/0x10b0
> [ 1423.516306]  __rtnl_newlink+0x608/0xa10
> [ 1423.516312]  rtnl_newlink+0x49/0x70
> [ 1423.516318]  rtnetlink_rcv_msg+0x14f/0x380
> [ 1423.516323]  netlink_rcv_skb+0x55/0x100
> [ 1423.516328]  rtnetlink_rcv+0x15/0x20
> [ 1423.516333]  netlink_unicast+0x230/0x340
> [ 1423.516337]  netlink_sendmsg+0x252/0x4b0
> [ 1423.516345]  sock_sendmsg+0x65/0x70
> [ 1423.516348]  ____sys_sendmsg+0x24e/0x290
> [ 1423.516352]  ___sys_sendmsg+0x81/0xc0

Thanks for the report.

This all involves IPv6, and this might point to something I hinted
about last week.

Can you try :

Note that I am about to travel, and will be unable to respond to
emails for about 20 hours.
Thanks !

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8d834f901b48..043d270b65d6 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -163,9 +163,6 @@ static void rt6_uncached_list_flush_dev(struct net
*net, struct net_device *dev)
        struct net_device *loopback_dev = net->loopback_dev;
        int cpu;

-       if (dev == loopback_dev)
-               return;
-
        for_each_possible_cpu(cpu) {
                struct uncached_list *ul = per_cpu_ptr(&rt6_uncached_list, cpu);
                struct rt6_info *rt;
@@ -175,12 +172,13 @@ static void rt6_uncached_list_flush_dev(struct
net *net, struct net_device *dev)
                        struct inet6_dev *rt_idev = rt->rt6i_idev;
                        struct net_device *rt_dev = rt->dst.dev;

-                       if (rt_idev->dev == dev) {
+                       if (rt_idev->dev == dev && dev != loopback_dev) {
                                rt->rt6i_idev = in6_dev_get(loopback_dev);
                                in6_dev_put(rt_idev);
                        }

                        if (rt_dev == dev) {
+                               WARN_ON_ONCE(dev == loopback_dev);
                                rt->dst.dev = blackhole_netdev;
                                dev_replace_track(rt_dev, blackhole_netdev,
                                                  &rt->dst.dev_tracker,
Andrew Lunn Dec. 8, 2021, 5:29 p.m. UTC | #14
> This all involves IPv6, and this might point to something I hinted
> about last week.
> 
> Can you try :
> 
> Note that I am about to travel, and will be unable to respond to
> emails for about 20 hours.

Hi Eric

Hope you had a good trip.

I tried your patch. No difference, it still reports leaks, and the
WARN_ON_ONCE did not fire.

I extended the tracker a little, adding timestamps into the struct
ref_tracker and printing the age of the leaked reference as part of
the dump. The patch is at the end.

There is possibly an interesting pattern here:

[  454.787491] unregister_netdevice: waiting for eth0 to become free. Usage count = 9
[  454.787696] leaked reference. Age    15.138884
[  454.787699]  dst_alloc+0x7a/0x180
[  454.787706]  ip6_dst_alloc+0x27/0x90
[  454.787709]  ip6_pol_route+0x257/0x430
[  454.787711]  ip6_pol_route_output+0x19/0x20
[  454.787713]  fib6_rule_lookup+0x18b/0x270
[  454.787717]  ip6_route_output_flags_noref+0xaa/0x110
[  454.787719]  ip6_route_output_flags+0x32/0xa0
[  454.787720]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
[  454.787724]  ip6_dst_lookup_flow+0x43/0xa0
[  454.787725]  inet6_csk_route_socket+0x166/0x200
[  454.787729]  inet6_csk_xmit+0x56/0x130
[  454.787731]  __tcp_transmit_skb+0x53b/0xc30
[  454.787734]  __tcp_send_ack.part.0+0xc6/0x1a0
[  454.787735]  tcp_send_ack+0x1c/0x20
[  454.787737]  __tcp_ack_snd_check+0x42/0x200
[  454.787739]  tcp_rcv_established+0x27a/0x6f0
[  454.787740] leaked reference. Age    30.240300
[  454.787741]  dst_alloc+0x7a/0x180
[  454.787743]  ip6_dst_alloc+0x27/0x90
[  454.787744]  ip6_pol_route+0x257/0x430
[  454.787746]  ip6_pol_route_output+0x19/0x20
[  454.787748]  fib6_rule_lookup+0x18b/0x270
[  454.787750]  ip6_route_output_flags_noref+0xaa/0x110
[  454.787751]  ip6_route_output_flags+0x32/0xa0
[  454.787752]  seg6_output_core+0x28d/0x320
[  454.787755]  seg6_output+0x33/0x120
[  454.787757]  lwtunnel_output+0x72/0xc0
[  454.787760]  ip6_local_out+0x61/0x70
[  454.787762]  ip6_send_skb+0x23/0x70
[  454.787764]  udp_v6_send_skb+0x207/0x460
[  454.787767]  udpv6_sendmsg+0xb13/0xdb0
[  454.787768]  inet6_sendmsg+0x65/0x70
[  454.787769]  sock_sendmsg+0x48/0x70
[  454.787772] leaked reference. Age    31.594229
[  454.787773]  dst_alloc+0x7a/0x180
[  454.787775]  ip6_dst_alloc+0x27/0x90
[  454.787776]  ip6_pol_route+0x257/0x430
[  454.787778]  ip6_pol_route_output+0x19/0x20
[  454.787780]  fib6_rule_lookup+0x18b/0x270
[  454.787782]  ip6_route_output_flags_noref+0xaa/0x110
[  454.787783]  ip6_route_output_flags+0x32/0xa0
[  454.787784]  seg6_output_core+0x28d/0x320
[  454.787786]  seg6_output+0x33/0x120
[  454.787789]  lwtunnel_output+0x72/0xc0
[  454.787790]  ip6_local_out+0x61/0x70
[  454.787792]  ip6_send_skb+0x23/0x70
[  454.787793]  udp_v6_send_skb+0x207/0x460
[  454.787796]  udpv6_sendmsg+0xb13/0xdb0
[  454.787797]  inet6_sendmsg+0x65/0x70
[  454.787798]  sock_sendmsg+0x48/0x70
[  454.787800] leaked reference. Age    35.147534
[  454.787800]  dst_alloc+0x7a/0x180
[  454.787802]  ip6_dst_alloc+0x27/0x90
[  454.787803]  ip6_pol_route+0x257/0x430
[  454.787805]  ip6_pol_route_output+0x19/0x20
[  454.787806]  fib6_rule_lookup+0x18b/0x270
[  454.787808]  ip6_route_output_flags_noref+0xaa/0x110
[  454.787810]  ip6_route_output_flags+0x32/0xa0
[  454.787811]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
[  454.787813]  ip6_dst_lookup_flow+0x43/0xa0
[  454.787814]  inet6_csk_route_socket+0x166/0x200
[  454.787817]  inet6_csk_xmit+0x56/0x130
[  454.787818]  __tcp_transmit_skb+0x53b/0xc30
[  454.787819]  __tcp_send_ack.part.0+0xc6/0x1a0
[  454.787821]  tcp_send_ack+0x1c/0x20
[  454.787823]  __tcp_ack_snd_check+0x42/0x200
[  454.787824]  tcp_rcv_established+0x27a/0x6f0
[  454.787826] leaked reference. Age    37.269626
[  454.787826]  dst_alloc+0x7a/0x180
[  454.787828]  ip6_dst_alloc+0x27/0x90
[  454.787829]  ip6_pol_route+0x257/0x430
[  454.787831]  ip6_pol_route_output+0x19/0x20
[  454.787833]  fib6_rule_lookup+0x18b/0x270
[  454.787835]  ip6_route_output_flags_noref+0xaa/0x110
[  454.787836]  ip6_route_output_flags+0x32/0xa0
[  454.787837]  ip6_dst_lookup_tail.constprop.0+0xde/0x240
[  454.787839]  ip6_dst_lookup_flow+0x43/0xa0
[  454.787841]  tcp_v6_connect+0x2a7/0x670
[  454.787843]  __inet_stream_connect+0xd1/0x3b0
[  454.787845]  inet_stream_connect+0x3b/0x60
[  454.787846]  __sys_connect_file+0x5f/0x70
[  454.787848]  __sys_connect+0xa2/0xd0
[  454.787849]  __x64_sys_connect+0x18/0x20
[  454.787850]  do_syscall_64+0x3b/0xc0

Ages of 15 to 37 seconds. So these leaks look related to applications
using tcpv6.

[  454.787854] leaked reference. Age   385.273596
[  454.787855]  dst_alloc+0x7a/0x180
[  454.787857]  ip6_dst_alloc+0x27/0x90
[  454.787858]  ip6_pol_route+0x257/0x430
[  454.787859]  ip6_pol_route_output+0x19/0x20
[  454.787861]  fib6_rule_lookup+0x18b/0x270
[  454.787863]  ip6_route_output_flags_noref+0xaa/0x110
[  454.787864]  ip6_route_output_flags+0x32/0xa0
[  454.787866]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
[  454.787867]  ip6_dst_lookup_flow+0x43/0xa0
[  454.787869]  inet6_csk_route_req+0x11b/0x150
[  454.787871]  tcp_v6_route_req+0xa8/0x140
[  454.787873]  tcp_conn_request+0x349/0xcd0
[  454.787875]  tcp_v6_conn_request+0x64/0xd0
[  454.787877]  tcp_rcv_state_process+0x25b/0x1070
[  454.787878]  tcp_v6_do_rcv+0x1c4/0x4a0
[  454.787881]  tcp_v6_rcv+0xea3/0xee0
[  454.787883] leaked reference. Age   389.378759
[  454.787884]  fib6_nh_init+0x30d/0x9c0
[  454.787885]  rtm_new_nexthop+0x68a/0x13a0
[  454.787888]  rtnetlink_rcv_msg+0x14f/0x380
[  454.787891]  netlink_rcv_skb+0x55/0x100
[  454.787893]  rtnetlink_rcv+0x15/0x20
[  454.787895]  netlink_unicast+0x230/0x340
[  454.787897]  netlink_sendmsg+0x252/0x4b0
[  454.787899]  sock_sendmsg+0x65/0x70
[  454.787900]  ____sys_sendmsg+0x24e/0x290
[  454.787902]  ___sys_sendmsg+0x81/0xc0
[  454.787903]  __sys_sendmsg+0x62/0xb0
[  454.787905]  __x64_sys_sendmsg+0x1d/0x20
[  454.787907]  do_syscall_64+0x3b/0xc0
[  454.787908]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  454.787911] leaked reference. Age   402.739110
[  454.787911]  ipv6_add_dev+0x13e/0x4f0
[  454.787914]  addrconf_notify+0x2ca/0x950
[  454.787917]  raw_notifier_call_chain+0x49/0x60
[  454.787920]  call_netdevice_notifiers_info+0x50/0x90
[  454.787923]  __dev_change_net_namespace+0x30d/0x6c0
[  454.787926]  do_setlink+0xdc/0x10b0
[  454.787928]  __rtnl_newlink+0x608/0xa10
[  454.787931]  rtnl_newlink+0x49/0x70
[  454.787933]  rtnetlink_rcv_msg+0x14f/0x380
[  454.787935]  netlink_rcv_skb+0x55/0x100
[  454.787937]  rtnetlink_rcv+0x15/0x20
[  454.787939]  netlink_unicast+0x230/0x340
[  454.787941]  netlink_sendmsg+0x252/0x4b0
[  454.787942]  sock_sendmsg+0x65/0x70
[  454.787944]  ____sys_sendmsg+0x24e/0x290
[  454.787945]  ___sys_sendmsg+0x81/0xc0

Here the age is much longer, more like the age of the container and
early setup of the container. I took a quick look at this last trace,
and it is the netdev being moved from one namespace to another.

So it could be there are two leaks here?

   Andrew

rom d3a3149c4e8f5020942ac00fe3bce6a1303f10b7 Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Wed, 8 Dec 2021 10:25:05 -0600
Subject: [PATCH] lib: ref_tracker: Add timestamp for reference

It can be useful to know if the leaked reference is old, or recent.
Is the bug in interface create and release, or more dynamic like a
connection.

Add a timestamp to each reference record, and print the age when
dumping the records.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 lib/ref_tracker.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/ref_tracker.c b/lib/ref_tracker.c
index 0ae2e66dcf0f..82a1c3681969 100644
--- a/lib/ref_tracker.c
+++ b/lib/ref_tracker.c
@@ -12,6 +12,7 @@ struct ref_tracker {
        bool                    dead;
        depot_stack_handle_t    alloc_stack_handle;
        depot_stack_handle_t    free_stack_handle;
+       ktime_t                 alloc_ktime;
 };
 
 void ref_tracker_dir_exit(struct ref_tracker_dir *dir)
@@ -44,13 +45,17 @@ void ref_tracker_dir_print(struct ref_tracker_dir *dir,
                           unsigned int display_limit)
 {
        struct ref_tracker *tracker;
+       ktime_t now = ktime_get();
        unsigned long flags;
        unsigned int i = 0;
-
        spin_lock_irqsave(&dir->lock, flags);
+
        list_for_each_entry(tracker, &dir->list, head) {
                if (i < display_limit) {
-                       pr_err("leaked reference.\n");
+                       ktime_t age = ktime_sub(now, tracker->alloc_ktime);
+                       unsigned long rem_nsec = do_div(age, 1000000000);
+                       pr_err("leaked reference. Age %5lu.%06lu\n",
+                              (unsigned long)age, rem_nsec / 1000);
                        if (tracker->alloc_stack_handle)
                                stack_depot_print(tracker->alloc_stack_handle);
                        i++;
@@ -80,6 +85,7 @@ int ref_tracker_alloc(struct ref_tracker_dir *dir,
        nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 1);
        nr_entries = filter_irq_stacks(entries, nr_entries);
        tracker->alloc_stack_handle = stack_depot_save(entries, nr_entries, gfp);
+       tracker->alloc_ktime = ktime_get();
 
        spin_lock_irqsave(&dir->lock, flags);
        list_add(&tracker->head, &dir->list);
Eric Dumazet Dec. 8, 2021, 6:21 p.m. UTC | #15
On Wed, Dec 8, 2021 at 9:29 AM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > This all involves IPv6, and this might point to something I hinted
> > about last week.
> >
> > Can you try :
> >
> > Note that I am about to travel, and will be unable to respond to
> > emails for about 20 hours.
>
> Hi Eric
>
> Hope you had a good trip.
>
> I tried your patch. No difference, it still reports leaks, and the
> WARN_ON_ONCE did not fire.
>
> I extended the tracker a little, adding timestamps into the struct
> ref_tracker and printing the age of the leaked reference as part of
> the dump. The patch is at the end.
>
> There is possibly an interesting pattern here:
>
> [  454.787491] unregister_netdevice: waiting for eth0 to become free. Usage count = 9
> [  454.787696] leaked reference. Age    15.138884
> [  454.787699]  dst_alloc+0x7a/0x180
> [  454.787706]  ip6_dst_alloc+0x27/0x90
> [  454.787709]  ip6_pol_route+0x257/0x430
> [  454.787711]  ip6_pol_route_output+0x19/0x20
> [  454.787713]  fib6_rule_lookup+0x18b/0x270
> [  454.787717]  ip6_route_output_flags_noref+0xaa/0x110
> [  454.787719]  ip6_route_output_flags+0x32/0xa0
> [  454.787720]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
> [  454.787724]  ip6_dst_lookup_flow+0x43/0xa0
> [  454.787725]  inet6_csk_route_socket+0x166/0x200
> [  454.787729]  inet6_csk_xmit+0x56/0x130
> [  454.787731]  __tcp_transmit_skb+0x53b/0xc30
> [  454.787734]  __tcp_send_ack.part.0+0xc6/0x1a0
> [  454.787735]  tcp_send_ack+0x1c/0x20
> [  454.787737]  __tcp_ack_snd_check+0x42/0x200
> [  454.787739]  tcp_rcv_established+0x27a/0x6f0
> [  454.787740] leaked reference. Age    30.240300
> [  454.787741]  dst_alloc+0x7a/0x180
> [  454.787743]  ip6_dst_alloc+0x27/0x90
> [  454.787744]  ip6_pol_route+0x257/0x430
> [  454.787746]  ip6_pol_route_output+0x19/0x20
> [  454.787748]  fib6_rule_lookup+0x18b/0x270
> [  454.787750]  ip6_route_output_flags_noref+0xaa/0x110
> [  454.787751]  ip6_route_output_flags+0x32/0xa0
> [  454.787752]  seg6_output_core+0x28d/0x320
> [  454.787755]  seg6_output+0x33/0x120
> [  454.787757]  lwtunnel_output+0x72/0xc0
> [  454.787760]  ip6_local_out+0x61/0x70
> [  454.787762]  ip6_send_skb+0x23/0x70
> [  454.787764]  udp_v6_send_skb+0x207/0x460
> [  454.787767]  udpv6_sendmsg+0xb13/0xdb0
> [  454.787768]  inet6_sendmsg+0x65/0x70
> [  454.787769]  sock_sendmsg+0x48/0x70
> [  454.787772] leaked reference. Age    31.594229
> [  454.787773]  dst_alloc+0x7a/0x180
> [  454.787775]  ip6_dst_alloc+0x27/0x90
> [  454.787776]  ip6_pol_route+0x257/0x430
> [  454.787778]  ip6_pol_route_output+0x19/0x20
> [  454.787780]  fib6_rule_lookup+0x18b/0x270
> [  454.787782]  ip6_route_output_flags_noref+0xaa/0x110
> [  454.787783]  ip6_route_output_flags+0x32/0xa0
> [  454.787784]  seg6_output_core+0x28d/0x320
> [  454.787786]  seg6_output+0x33/0x120
> [  454.787789]  lwtunnel_output+0x72/0xc0
> [  454.787790]  ip6_local_out+0x61/0x70
> [  454.787792]  ip6_send_skb+0x23/0x70
> [  454.787793]  udp_v6_send_skb+0x207/0x460
> [  454.787796]  udpv6_sendmsg+0xb13/0xdb0
> [  454.787797]  inet6_sendmsg+0x65/0x70
> [  454.787798]  sock_sendmsg+0x48/0x70
> [  454.787800] leaked reference. Age    35.147534
> [  454.787800]  dst_alloc+0x7a/0x180
> [  454.787802]  ip6_dst_alloc+0x27/0x90
> [  454.787803]  ip6_pol_route+0x257/0x430
> [  454.787805]  ip6_pol_route_output+0x19/0x20
> [  454.787806]  fib6_rule_lookup+0x18b/0x270
> [  454.787808]  ip6_route_output_flags_noref+0xaa/0x110
> [  454.787810]  ip6_route_output_flags+0x32/0xa0
> [  454.787811]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
> [  454.787813]  ip6_dst_lookup_flow+0x43/0xa0
> [  454.787814]  inet6_csk_route_socket+0x166/0x200
> [  454.787817]  inet6_csk_xmit+0x56/0x130
> [  454.787818]  __tcp_transmit_skb+0x53b/0xc30
> [  454.787819]  __tcp_send_ack.part.0+0xc6/0x1a0
> [  454.787821]  tcp_send_ack+0x1c/0x20
> [  454.787823]  __tcp_ack_snd_check+0x42/0x200
> [  454.787824]  tcp_rcv_established+0x27a/0x6f0
> [  454.787826] leaked reference. Age    37.269626
> [  454.787826]  dst_alloc+0x7a/0x180
> [  454.787828]  ip6_dst_alloc+0x27/0x90
> [  454.787829]  ip6_pol_route+0x257/0x430
> [  454.787831]  ip6_pol_route_output+0x19/0x20
> [  454.787833]  fib6_rule_lookup+0x18b/0x270
> [  454.787835]  ip6_route_output_flags_noref+0xaa/0x110
> [  454.787836]  ip6_route_output_flags+0x32/0xa0
> [  454.787837]  ip6_dst_lookup_tail.constprop.0+0xde/0x240
> [  454.787839]  ip6_dst_lookup_flow+0x43/0xa0
> [  454.787841]  tcp_v6_connect+0x2a7/0x670
> [  454.787843]  __inet_stream_connect+0xd1/0x3b0
> [  454.787845]  inet_stream_connect+0x3b/0x60
> [  454.787846]  __sys_connect_file+0x5f/0x70
> [  454.787848]  __sys_connect+0xa2/0xd0
> [  454.787849]  __x64_sys_connect+0x18/0x20
> [  454.787850]  do_syscall_64+0x3b/0xc0
>
> Ages of 15 to 37 seconds. So these leaks look related to applications
> using tcpv6.
>
> [  454.787854] leaked reference. Age   385.273596
> [  454.787855]  dst_alloc+0x7a/0x180
> [  454.787857]  ip6_dst_alloc+0x27/0x90
> [  454.787858]  ip6_pol_route+0x257/0x430
> [  454.787859]  ip6_pol_route_output+0x19/0x20
> [  454.787861]  fib6_rule_lookup+0x18b/0x270
> [  454.787863]  ip6_route_output_flags_noref+0xaa/0x110
> [  454.787864]  ip6_route_output_flags+0x32/0xa0
> [  454.787866]  ip6_dst_lookup_tail.constprop.0+0x181/0x240
> [  454.787867]  ip6_dst_lookup_flow+0x43/0xa0
> [  454.787869]  inet6_csk_route_req+0x11b/0x150
> [  454.787871]  tcp_v6_route_req+0xa8/0x140
> [  454.787873]  tcp_conn_request+0x349/0xcd0
> [  454.787875]  tcp_v6_conn_request+0x64/0xd0
> [  454.787877]  tcp_rcv_state_process+0x25b/0x1070
> [  454.787878]  tcp_v6_do_rcv+0x1c4/0x4a0
> [  454.787881]  tcp_v6_rcv+0xea3/0xee0
> [  454.787883] leaked reference. Age   389.378759
> [  454.787884]  fib6_nh_init+0x30d/0x9c0
> [  454.787885]  rtm_new_nexthop+0x68a/0x13a0
> [  454.787888]  rtnetlink_rcv_msg+0x14f/0x380
> [  454.787891]  netlink_rcv_skb+0x55/0x100
> [  454.787893]  rtnetlink_rcv+0x15/0x20
> [  454.787895]  netlink_unicast+0x230/0x340
> [  454.787897]  netlink_sendmsg+0x252/0x4b0
> [  454.787899]  sock_sendmsg+0x65/0x70
> [  454.787900]  ____sys_sendmsg+0x24e/0x290
> [  454.787902]  ___sys_sendmsg+0x81/0xc0
> [  454.787903]  __sys_sendmsg+0x62/0xb0
> [  454.787905]  __x64_sys_sendmsg+0x1d/0x20
> [  454.787907]  do_syscall_64+0x3b/0xc0
> [  454.787908]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  454.787911] leaked reference. Age   402.739110
> [  454.787911]  ipv6_add_dev+0x13e/0x4f0
> [  454.787914]  addrconf_notify+0x2ca/0x950
> [  454.787917]  raw_notifier_call_chain+0x49/0x60
> [  454.787920]  call_netdevice_notifiers_info+0x50/0x90
> [  454.787923]  __dev_change_net_namespace+0x30d/0x6c0
> [  454.787926]  do_setlink+0xdc/0x10b0
> [  454.787928]  __rtnl_newlink+0x608/0xa10
> [  454.787931]  rtnl_newlink+0x49/0x70
> [  454.787933]  rtnetlink_rcv_msg+0x14f/0x380
> [  454.787935]  netlink_rcv_skb+0x55/0x100
> [  454.787937]  rtnetlink_rcv+0x15/0x20
> [  454.787939]  netlink_unicast+0x230/0x340
> [  454.787941]  netlink_sendmsg+0x252/0x4b0
> [  454.787942]  sock_sendmsg+0x65/0x70
> [  454.787944]  ____sys_sendmsg+0x24e/0x290
> [  454.787945]  ___sys_sendmsg+0x81/0xc0
>
> Here the age is much longer, more like the age of the container and
> early setup of the container. I took a quick look at this last trace,
> and it is the netdev being moved from one namespace to another.
>
> So it could be there are two leaks here?

Possibly something is wrong when a netdevice is going through
__dev_change_net_namespace(),
there might be a missing reparenting of some objects.

The ipv6_add_dev() presence might give us a hint.

>
>    Andrew
>
> rom d3a3149c4e8f5020942ac00fe3bce6a1303f10b7 Mon Sep 17 00:00:00 2001
> From: Andrew Lunn <andrew@lunn.ch>
> Date: Wed, 8 Dec 2021 10:25:05 -0600
> Subject: [PATCH] lib: ref_tracker: Add timestamp for reference
>
> It can be useful to know if the leaked reference is old, or recent.
> Is the bug in interface create and release, or more dynamic like a
> connection.
>
> Add a timestamp to each reference record, and print the age when
> dumping the records.
>
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
>  lib/ref_tracker.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/lib/ref_tracker.c b/lib/ref_tracker.c
> index 0ae2e66dcf0f..82a1c3681969 100644
> --- a/lib/ref_tracker.c
> +++ b/lib/ref_tracker.c
> @@ -12,6 +12,7 @@ struct ref_tracker {
>         bool                    dead;
>         depot_stack_handle_t    alloc_stack_handle;
>         depot_stack_handle_t    free_stack_handle;
> +       ktime_t                 alloc_ktime;

I do not think ns precision of a ktime_t is really needed.

jiffies should be enough, saving potential high cost of ktime_get_ns()

>  };
>
>  void ref_tracker_dir_exit(struct ref_tracker_dir *dir)
> @@ -44,13 +45,17 @@ void ref_tracker_dir_print(struct ref_tracker_dir *dir,
>                            unsigned int display_limit)
>  {
>         struct ref_tracker *tracker;
> +       ktime_t now = ktime_get();
>         unsigned long flags;
>         unsigned int i = 0;
> -
>         spin_lock_irqsave(&dir->lock, flags);
> +
>         list_for_each_entry(tracker, &dir->list, head) {
>                 if (i < display_limit) {
> -                       pr_err("leaked reference.\n");
> +                       ktime_t age = ktime_sub(now, tracker->alloc_ktime);
> +                       unsigned long rem_nsec = do_div(age, 1000000000);
> +                       pr_err("leaked reference. Age %5lu.%06lu\n",
> +                              (unsigned long)age, rem_nsec / 1000);
>                         if (tracker->alloc_stack_handle)
>                                 stack_depot_print(tracker->alloc_stack_handle);
>                         i++;
> @@ -80,6 +85,7 @@ int ref_tracker_alloc(struct ref_tracker_dir *dir,
>         nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 1);
>         nr_entries = filter_irq_stacks(entries, nr_entries);
>         tracker->alloc_stack_handle = stack_depot_save(entries, nr_entries, gfp);
> +       tracker->alloc_ktime = ktime_get();
>
>         spin_lock_irqsave(&dir->lock, flags);
>         list_add(&tracker->head, &dir->list);
> --
> 2.33.1
>
>
Eric Dumazet Dec. 8, 2021, 6:53 p.m. UTC | #16
On Wed, Dec 8, 2021 at 10:21 AM Eric Dumazet <edumazet@google.com> wrote:
>

> I do not think ns precision of a ktime_t is really needed.
>
> jiffies should be enough, saving potential high cost of ktime_get_ns()
>

Also worth mentioning this is doubling size of ref_tracker.
(going from 32-byte to 64-bytes standard kmem cache)

I would advise creating a dedicated kmem_cache to avoid wasting memory.

current layout being:

struct ref_tracker {
 struct list_head           head;                 /*     0  0x10 */
 bool                       dead;                 /*  0x10   0x1 */

 /* XXX 3 bytes hole, try to pack */

 depot_stack_handle_t       alloc_stack_handle;   /*  0x14   0x4 */
 depot_stack_handle_t       free_stack_handle;    /*  0x18   0x4 */

 /* size: 32, cachelines: 1, members: 4 */
 /* sum members: 25, holes: 1, sum holes: 3 */
 /* padding: 4 */
 /* last cacheline: 32 bytes */
};