mbox series

[net-next,0/2] DSA trace events

Message ID 20230407141451.133048-1-vladimir.oltean@nxp.com (mailing list archive)
Headers show
Series DSA trace events | expand

Message

Vladimir Oltean April 7, 2023, 2:14 p.m. UTC
This series introduces the "dsa" trace event class, with the following
events:

$ trace-cmd list | grep dsa
dsa
dsa:dsa_fdb_add_hw
dsa:dsa_mdb_add_hw
dsa:dsa_fdb_del_hw
dsa:dsa_mdb_del_hw
dsa:dsa_fdb_add_bump
dsa:dsa_mdb_add_bump
dsa:dsa_fdb_del_drop
dsa:dsa_mdb_del_drop
dsa:dsa_fdb_del_not_found
dsa:dsa_mdb_del_not_found
dsa:dsa_lag_fdb_add_hw
dsa:dsa_lag_fdb_add_bump
dsa:dsa_lag_fdb_del_hw
dsa:dsa_lag_fdb_del_drop
dsa:dsa_lag_fdb_del_not_found
dsa:dsa_vlan_add_hw
dsa:dsa_vlan_del_hw
dsa:dsa_vlan_add_bump
dsa:dsa_vlan_del_drop
dsa:dsa_vlan_del_not_found

These are useful to debug refcounting issues on CPU and DSA ports, where
entries may remain lingering, or may be removed too soon, depending on
bugs in higher layers of the network stack.

Vladimir Oltean (2):
  net: dsa: add trace points for FDB/MDB operations
  net: dsa: add trace points for VLAN operations

 net/dsa/Makefile |   6 +-
 net/dsa/switch.c |  85 +++++++--
 net/dsa/trace.c  |  39 +++++
 net/dsa/trace.h  | 447 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 560 insertions(+), 17 deletions(-)
 create mode 100644 net/dsa/trace.c
 create mode 100644 net/dsa/trace.h

Comments

Andrew Lunn April 12, 2023, 12:48 a.m. UTC | #1
On Fri, Apr 07, 2023 at 05:14:49PM +0300, Vladimir Oltean wrote:
> This series introduces the "dsa" trace event class, with the following
> events:
> 
> $ trace-cmd list | grep dsa
> dsa
> dsa:dsa_fdb_add_hw
> dsa:dsa_mdb_add_hw
> dsa:dsa_fdb_del_hw
> dsa:dsa_mdb_del_hw
> dsa:dsa_fdb_add_bump
> dsa:dsa_mdb_add_bump
> dsa:dsa_fdb_del_drop
> dsa:dsa_mdb_del_drop
> dsa:dsa_fdb_del_not_found
> dsa:dsa_mdb_del_not_found
> dsa:dsa_lag_fdb_add_hw
> dsa:dsa_lag_fdb_add_bump
> dsa:dsa_lag_fdb_del_hw
> dsa:dsa_lag_fdb_del_drop
> dsa:dsa_lag_fdb_del_not_found
> dsa:dsa_vlan_add_hw
> dsa:dsa_vlan_del_hw
> dsa:dsa_vlan_add_bump
> dsa:dsa_vlan_del_drop
> dsa:dsa_vlan_del_not_found
> 
> These are useful to debug refcounting issues on CPU and DSA ports, where
> entries may remain lingering, or may be removed too soon, depending on
> bugs in higher layers of the network stack.

Hi Vladimir

I don't know anything about trace points. Should you Cc: 

Steven Rostedt <rostedt@goodmis.org> (maintainer:TRACING)
Masami Hiramatsu <mhiramat@kernel.org> (maintainer:TRACING)

to get some feedback from people who do?

   Andrew
patchwork-bot+netdevbpf@kernel.org April 12, 2023, 8:30 a.m. UTC | #2
Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Fri,  7 Apr 2023 17:14:49 +0300 you wrote:
> This series introduces the "dsa" trace event class, with the following
> events:
> 
> $ trace-cmd list | grep dsa
> dsa
> dsa:dsa_fdb_add_hw
> dsa:dsa_mdb_add_hw
> dsa:dsa_fdb_del_hw
> dsa:dsa_mdb_del_hw
> dsa:dsa_fdb_add_bump
> dsa:dsa_mdb_add_bump
> dsa:dsa_fdb_del_drop
> dsa:dsa_mdb_del_drop
> dsa:dsa_fdb_del_not_found
> dsa:dsa_mdb_del_not_found
> dsa:dsa_lag_fdb_add_hw
> dsa:dsa_lag_fdb_add_bump
> dsa:dsa_lag_fdb_del_hw
> dsa:dsa_lag_fdb_del_drop
> dsa:dsa_lag_fdb_del_not_found
> dsa:dsa_vlan_add_hw
> dsa:dsa_vlan_del_hw
> dsa:dsa_vlan_add_bump
> dsa:dsa_vlan_del_drop
> dsa:dsa_vlan_del_not_found
> 
> [...]

Here is the summary with links:
  - [net-next,1/2] net: dsa: add trace points for FDB/MDB operations
    https://git.kernel.org/netdev/net-next/c/9538ebce88ff
  - [net-next,2/2] net: dsa: add trace points for VLAN operations
    https://git.kernel.org/netdev/net-next/c/02020bd70fa6

You are awesome, thank you!
Vladimir Oltean April 12, 2023, 9:55 a.m. UTC | #3
On Wed, Apr 12, 2023 at 02:48:35AM +0200, Andrew Lunn wrote:
> On Fri, Apr 07, 2023 at 05:14:49PM +0300, Vladimir Oltean wrote:
> > This series introduces the "dsa" trace event class, with the following
> > events:
> > 
> > $ trace-cmd list | grep dsa
> > dsa
> > dsa:dsa_fdb_add_hw
> > dsa:dsa_mdb_add_hw
> > dsa:dsa_fdb_del_hw
> > dsa:dsa_mdb_del_hw
> > dsa:dsa_fdb_add_bump
> > dsa:dsa_mdb_add_bump
> > dsa:dsa_fdb_del_drop
> > dsa:dsa_mdb_del_drop
> > dsa:dsa_fdb_del_not_found
> > dsa:dsa_mdb_del_not_found
> > dsa:dsa_lag_fdb_add_hw
> > dsa:dsa_lag_fdb_add_bump
> > dsa:dsa_lag_fdb_del_hw
> > dsa:dsa_lag_fdb_del_drop
> > dsa:dsa_lag_fdb_del_not_found
> > dsa:dsa_vlan_add_hw
> > dsa:dsa_vlan_del_hw
> > dsa:dsa_vlan_add_bump
> > dsa:dsa_vlan_del_drop
> > dsa:dsa_vlan_del_not_found
> > 
> > These are useful to debug refcounting issues on CPU and DSA ports, where
> > entries may remain lingering, or may be removed too soon, depending on
> > bugs in higher layers of the network stack.
> 
> Hi Vladimir
> 
> I don't know anything about trace points. Should you Cc: 
> 
> Steven Rostedt <rostedt@goodmis.org> (maintainer:TRACING)
> Masami Hiramatsu <mhiramat@kernel.org> (maintainer:TRACING)
> 
> to get some feedback from people who do?
> 
>    Andrew

I suppose I could.

Hi Steven, Masami, would you mind taking a look and letting me know
if the trace API was used reasonably? The patches were merged as:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=9538ebce88ffa074202d592d468521995cb1e714
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=02020bd70fa6abcb1c2a8525ce7c1500dd4f44a8
but I can make incremental changes if necessary.
Masami Hiramatsu (Google) April 21, 2023, 12:38 p.m. UTC | #4
On Wed, 12 Apr 2023 12:55:34 +0300
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Wed, Apr 12, 2023 at 02:48:35AM +0200, Andrew Lunn wrote:
> > On Fri, Apr 07, 2023 at 05:14:49PM +0300, Vladimir Oltean wrote:
> > > This series introduces the "dsa" trace event class, with the following
> > > events:
> > > 
> > > $ trace-cmd list | grep dsa
> > > dsa
> > > dsa:dsa_fdb_add_hw
> > > dsa:dsa_mdb_add_hw
> > > dsa:dsa_fdb_del_hw
> > > dsa:dsa_mdb_del_hw
> > > dsa:dsa_fdb_add_bump
> > > dsa:dsa_mdb_add_bump
> > > dsa:dsa_fdb_del_drop
> > > dsa:dsa_mdb_del_drop
> > > dsa:dsa_fdb_del_not_found
> > > dsa:dsa_mdb_del_not_found
> > > dsa:dsa_lag_fdb_add_hw
> > > dsa:dsa_lag_fdb_add_bump
> > > dsa:dsa_lag_fdb_del_hw
> > > dsa:dsa_lag_fdb_del_drop
> > > dsa:dsa_lag_fdb_del_not_found
> > > dsa:dsa_vlan_add_hw
> > > dsa:dsa_vlan_del_hw
> > > dsa:dsa_vlan_add_bump
> > > dsa:dsa_vlan_del_drop
> > > dsa:dsa_vlan_del_not_found
> > > 
> > > These are useful to debug refcounting issues on CPU and DSA ports, where
> > > entries may remain lingering, or may be removed too soon, depending on
> > > bugs in higher layers of the network stack.
> > 
> > Hi Vladimir
> > 
> > I don't know anything about trace points. Should you Cc: 
> > 
> > Steven Rostedt <rostedt@goodmis.org> (maintainer:TRACING)
> > Masami Hiramatsu <mhiramat@kernel.org> (maintainer:TRACING)
> > 
> > to get some feedback from people who do?
> > 
> >    Andrew
> 
> I suppose I could.
> 
> Hi Steven, Masami, would you mind taking a look and letting me know
> if the trace API was used reasonably? The patches were merged as:
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=9538ebce88ffa074202d592d468521995cb1e714
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=02020bd70fa6abcb1c2a8525ce7c1500dd4f44a8
> but I can make incremental changes if necessary.

If the subsystem maintainers are OK for including this, it is OK.
But basically, since the event is exposed to userland and you may keep
these events maintained, you should carefully add the events.
If those are only for debugging (after debug, it will not be used
frequently), can you consider to use kprobe events?
'perf probe' command will also help you to trace local variables and
structure members as like gdb does.


Thank you,
Vladimir Oltean April 21, 2023, 12:47 p.m. UTC | #5
On Fri, Apr 21, 2023 at 09:38:50PM +0900, Masami Hiramatsu wrote:
> If the subsystem maintainers are OK for including this, it is OK.
> But basically, since the event is exposed to userland and you may keep
> these events maintained, you should carefully add the events.
> If those are only for debugging (after debug, it will not be used
> frequently), can you consider to use kprobe events?
> 'perf probe' command will also help you to trace local variables and
> structure members as like gdb does.

Thanks for taking a look. I haven't looked at kprobe events. I also
wasn't planning on maintaining these assuming stable UABI terms, just
for debugging. What are some user space consumers that expect the UABI
to be stable, and what is it about the trace events that can/can't change?
Steven Rostedt April 24, 2023, 10:25 p.m. UTC | #6
On Fri, 21 Apr 2023 15:47:08 +0300
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> On Fri, Apr 21, 2023 at 09:38:50PM +0900, Masami Hiramatsu wrote:
> > If the subsystem maintainers are OK for including this, it is OK.
> > But basically, since the event is exposed to userland and you may keep
> > these events maintained, you should carefully add the events.
> > If those are only for debugging (after debug, it will not be used
> > frequently), can you consider to use kprobe events?
> > 'perf probe' command will also help you to trace local variables and
> > structure members as like gdb does.  
> 
> Thanks for taking a look. I haven't looked at kprobe events. I also
> wasn't planning on maintaining these assuming stable UABI terms, just
> for debugging. What are some user space consumers that expect the UABI
> to be stable, and what is it about the trace events that can/can't change?

Ideally, tooling will use the libtraceevent library[1] to parse the
events. In that case, if an event is used by tooling, you'll need to
keep around the fields that are used by the tooling.

-- Steve

[1] https://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git/
Vladimir Oltean April 26, 2023, 7:13 p.m. UTC | #7
On Mon, Apr 24, 2023 at 06:25:54PM -0400, Steven Rostedt wrote:
> On Fri, 21 Apr 2023 15:47:08 +0300
> Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
> 
> > On Fri, Apr 21, 2023 at 09:38:50PM +0900, Masami Hiramatsu wrote:
> > > If the subsystem maintainers are OK for including this, it is OK.
> > > But basically, since the event is exposed to userland and you may keep
> > > these events maintained, you should carefully add the events.
> > > If those are only for debugging (after debug, it will not be used
> > > frequently), can you consider to use kprobe events?
> > > 'perf probe' command will also help you to trace local variables and
> > > structure members as like gdb does.  
> > 
> > Thanks for taking a look. I haven't looked at kprobe events. I also
> > wasn't planning on maintaining these assuming stable UABI terms, just
> > for debugging. What are some user space consumers that expect the UABI
> > to be stable, and what is it about the trace events that can/can't change?
> 
> Ideally, tooling will use the libtraceevent library[1] to parse the
> events. In that case, if an event is used by tooling, you'll need to
> keep around the fields that are used by the tooling.

Ok, I did not plan for user space to treat these events as something
stable to pick up on. The Linux bridge already notifies VLANs, FDBs,
MDBs through the rtnetlink socket, and that's what I would consider to
be the stable ABI. What can be seen here (DSA) is essentially a
framework used by multiple hardware drivers, but ultimately still device
driver-level code.

What would you recommend here? A revert?
Steven Rostedt April 26, 2023, 7:23 p.m. UTC | #8
On Wed, 26 Apr 2023 22:13:36 +0300
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> Ok, I did not plan for user space to treat these events as something
> stable to pick up on. The Linux bridge already notifies VLANs, FDBs,
> MDBs through the rtnetlink socket, and that's what I would consider to
> be the stable ABI. What can be seen here (DSA) is essentially a
> framework used by multiple hardware drivers, but ultimately still device
> driver-level code.
> 
> What would you recommend here? A revert?

There's lots of events in the kernel that no tools use. Do you expect
anyone to create a tool that uses these events?

We break user space API all the time. As long as nothing notices, it's OK.
We take the "tree in the forest" approach. If user space API breaks, but no
tool uses it, did it break? The answer according to Linus, is "no".

Al Viro refuses to have trace events in VFS, because there's lots of places
that could become useful for tooling, and he doesn't want to support it.
But if the events are not useful for user space tooling, they should be
generally safe to keep.

There's tons of events in the wifi code, because they are very useful for
debugging remote applications out in the world, that the wifi maintainers
have tooling for. But those are not considered "stable", because the only
tools are the ones that the maintainer of the trace events, created.

If you don't see anything using these events for useful tooling outside
your own use, then I'd just keep them. There's a thousand other events in
the kernel that are not used by tools, I doubt these will be any different.

If you think that a tool that will end up in a distribution will start
using them, then you need to take care.

-- Steve
Vladimir Oltean April 26, 2023, 7:43 p.m. UTC | #9
On Wed, Apr 26, 2023 at 03:23:45PM -0400, Steven Rostedt wrote:
> There's lots of events in the kernel that no tools use. Do you expect
> anyone to create a tool that uses these events?
> 
> We break user space API all the time. As long as nothing notices, it's OK.
> We take the "tree in the forest" approach. If user space API breaks, but no
> tool uses it, did it break? The answer according to Linus, is "no".
> 
> Al Viro refuses to have trace events in VFS, because there's lots of places
> that could become useful for tooling, and he doesn't want to support it.
> But if the events are not useful for user space tooling, they should be
> generally safe to keep.
> 
> There's tons of events in the wifi code, because they are very useful for
> debugging remote applications out in the world, that the wifi maintainers
> have tooling for. But those are not considered "stable", because the only
> tools are the ones that the maintainer of the trace events, created.
> 
> If you don't see anything using these events for useful tooling outside
> your own use, then I'd just keep them. There's a thousand other events in
> the kernel that are not used by tools, I doubt these will be any different.
> 
> If you think that a tool that will end up in a distribution will start
> using them, then you need to take care.

Well, there's one thing that could become useful for tooling, and
that's determining the resource utilization of the hardware (number of
dsa_fdb_add_hw events minus dsa_fdb_del_hw, number of dsa_vlan_add_hw
minus dsa_vlan_del_hw, etc) relative to some hardcoded maximum capacity
which would be somehow determined by userspace for each driver. There
have been requests for this in the not so distant past.

Instead of living in fear that this might happen, I think what would be
the most productive thing to do would be to just create a proper API in
the next kernel development cycle to expose just that information, and
point other people to that other API, and keep the trace events just for
debugging.

Andrew, Florian, what do you think?
Steven Rostedt April 26, 2023, 7:47 p.m. UTC | #10
On Wed, 26 Apr 2023 22:43:01 +0300
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> Instead of living in fear that this might happen, I think what would be
> the most productive thing to do would be to just create a proper API in
> the next kernel development cycle to expose just that information, and
> point other people to that other API, and keep the trace events just for
> debugging.

I also want to add that if a tool does use a trace event that was not your
intention, you can then fix the tool to do it properly.

I had this with powertop. It had hardcoded events (did not use
libtraceevent) and when I modified the layout, it broke, and Linus reverted
my changes. After fixing powertop to do things properly, I was able to get
my changes back in.

So even if you do break user space, you can still fix it later.

-- Steve
Vladimir Oltean April 26, 2023, 9:07 p.m. UTC | #11
On Wed, Apr 26, 2023 at 03:47:13PM -0400, Steven Rostedt wrote:
> I also want to add that if a tool does use a trace event that was not your
> intention, you can then fix the tool to do it properly.
> 
> I had this with powertop. It had hardcoded events (did not use
> libtraceevent) and when I modified the layout, it broke, and Linus reverted
> my changes. After fixing powertop to do things properly, I was able to get
> my changes back in.
> 
> So even if you do break user space, you can still fix it later.

Thanks, this is helpful.
Andrew Lunn April 27, 2023, 12:33 a.m. UTC | #12
> Well, there's one thing that could become useful for tooling, and
> that's determining the resource utilization of the hardware (number of
> dsa_fdb_add_hw events minus dsa_fdb_del_hw, number of dsa_vlan_add_hw
> minus dsa_vlan_del_hw, etc) relative to some hardcoded maximum capacity
> which would be somehow determined by userspace for each driver. There
> have been requests for this in the not so distant past.

That sounds similar to devlink resources. The mv88e6xxx driver already
returns the number of entries in the ATU, and the size of the
ATU. Maybe that just needs generalising?

     Andrew