Message ID | 20220407084050.184989-1-michaelgur@nvidia.com (mailing list archive) |
---|---|
Headers | show |
Series | devlink: Add port stats | expand |
On Thu, 7 Apr 2022 11:40:48 +0300 Michael Guralnik wrote: > This patch set adds port statistics to the devlink port object. > It allows device drivers to dynamically attach and detach counters from a > devlink port object. The challenge in defining APIs for stats is not in how to wrap a free form string in a netlink message but how do define values that have clear semantics and are of value to the user. Start from that, discuss what you have with the customer who requested the feature. Then think about the API. I have said this multiple times to multiple people on your team. > The approach of adding object-attached statistics is already supported for trap > with traffic statistics and for the dev object with reload statistics. That's an entirely false comparison. > For the port object, this will allow the device driver to expose and dynamicly > control a set of metrics related to the port. > Currently we add support only for counters, but later API extensions can be made > to support histograms or configurable counters. > > The statistics are exposed to the user with the port get command. > > Example: > # devlink -s port show > pci/0000:00:0b.0/65535: type eth netdev eth1 flavour physical port 0 splittable false > stats: > counter1 235 > counter2 18
Fri, Apr 08, 2022 at 05:16:38AM CEST, kuba@kernel.org wrote: >On Thu, 7 Apr 2022 11:40:48 +0300 Michael Guralnik wrote: >> This patch set adds port statistics to the devlink port object. >> It allows device drivers to dynamically attach and detach counters from a >> devlink port object. > >The challenge in defining APIs for stats is not in how to wrap a free >form string in a netlink message but how do define values that have >clear semantics and are of value to the user. > >Start from that, discuss what you have with the customer who requested >the feature. Then think about the API. > >I have said this multiple times to multiple people on your team. > >> The approach of adding object-attached statistics is already supported for trap >> with traffic statistics and for the dev object with reload statistics. > >That's an entirely false comparison. The trap stats are there already, why do you thing it is a "false comparison" to that? They use DEVLINK_ATTR_STATS attribute to carry the stats nest and then: DEVLINK_ATTR_STATS_RX_PACKETS, /* u64 */ DEVLINK_ATTR_STATS_RX_BYTES, /* u64 */ DEVLINK_ATTR_STATS_RX_DROPPED, /* u64 */ I think that the semantics of these are quite clear. > >> For the port object, this will allow the device driver to expose and dynamicly >> control a set of metrics related to the port. >> Currently we add support only for counters, but later API extensions can be made >> to support histograms or configurable counters. >> >> The statistics are exposed to the user with the port get command. >> >> Example: >> # devlink -s port show >> pci/0000:00:0b.0/65535: type eth netdev eth1 flavour physical port 0 splittable false >> stats: >> counter1 235 >> counter2 18
Fri, Apr 08, 2022 at 05:16:38AM CEST, kuba@kernel.org wrote: >On Thu, 7 Apr 2022 11:40:48 +0300 Michael Guralnik wrote: >> This patch set adds port statistics to the devlink port object. >> It allows device drivers to dynamically attach and detach counters from a >> devlink port object. > >The challenge in defining APIs for stats is not in how to wrap a free >form string in a netlink message but how do define values that have >clear semantics and are of value to the user. Wait, does all stats have to be well-defined? I mean, look at the ethtool stats. They are free-form strings too. Do you mean that in devlink, we can only have well-defines enum-based stats? > >Start from that, discuss what you have with the customer who requested >the feature. Then think about the API. > >I have said this multiple times to multiple people on your team. > >> The approach of adding object-attached statistics is already supported for trap >> with traffic statistics and for the dev object with reload statistics. > >That's an entirely false comparison. > >> For the port object, this will allow the device driver to expose and dynamicly >> control a set of metrics related to the port. >> Currently we add support only for counters, but later API extensions can be made >> to support histograms or configurable counters. >> >> The statistics are exposed to the user with the port get command. >> >> Example: >> # devlink -s port show >> pci/0000:00:0b.0/65535: type eth netdev eth1 flavour physical port 0 splittable false >> stats: >> counter1 235 >> counter2 18
On Mon, 11 Apr 2022 12:28:40 +0200 Jiri Pirko wrote: > >> The approach of adding object-attached statistics is already supported for trap > >> with traffic statistics and for the dev object with reload statistics. > > > >That's an entirely false comparison. > > The trap stats are there already, why do you thing it is a "false > comparison" to that? > > They use DEVLINK_ATTR_STATS attribute to carry the stats nest and then: > DEVLINK_ATTR_STATS_RX_PACKETS, /* u64 */ > DEVLINK_ATTR_STATS_RX_BYTES, /* u64 */ > DEVLINK_ATTR_STATS_RX_DROPPED, /* u64 */ > I think that the semantics of these are quite clear. Exactly, (1) the semantics of statistics on traps and health are clearly dictated by the object, there is no ambiguity what rx packets for a trap means; (2) the statistics are enumerated by the core... > >> For the port object, this will allow the device driver to expose and dynamicly > >> control a set of metrics related to the port. ..this is like saying "it was 14C in Prague and San Francisco today, so the weather was the same. It was sunny in Prague and raining in SF".
On Mon, 11 Apr 2022 12:31:53 +0200 Jiri Pirko wrote: > Fri, Apr 08, 2022 at 05:16:38AM CEST, kuba@kernel.org wrote: > >On Thu, 7 Apr 2022 11:40:48 +0300 Michael Guralnik wrote: > >> This patch set adds port statistics to the devlink port object. > >> It allows device drivers to dynamically attach and detach counters from a > >> devlink port object. > > > >The challenge in defining APIs for stats is not in how to wrap a free > >form string in a netlink message but how do define values that have > >clear semantics and are of value to the user. > > Wait, does all stats have to be well-defined? I mean, look at the > ethtool stats. They are free-form strings too. Do you mean that in > devlink, we can only have well-defines enum-based stats? That's my strong preference, yes. First, and obvious argument is that it make lazy coding less likely (see devlink params). More importantly, tho, if your stats are not well defined - users don't need to seem them. Really! If I can't draw a line between a statistic and device behavior then keep that stat in the register dump, debugfs or /dev/null. That's why it's important that we talk about _what_ you're trying to expose.
Mon, Apr 11, 2022 at 08:01:57PM CEST, kuba@kernel.org wrote: >On Mon, 11 Apr 2022 12:31:53 +0200 Jiri Pirko wrote: >> Fri, Apr 08, 2022 at 05:16:38AM CEST, kuba@kernel.org wrote: >> >On Thu, 7 Apr 2022 11:40:48 +0300 Michael Guralnik wrote: >> >> This patch set adds port statistics to the devlink port object. >> >> It allows device drivers to dynamically attach and detach counters from a >> >> devlink port object. >> > >> >The challenge in defining APIs for stats is not in how to wrap a free >> >form string in a netlink message but how do define values that have >> >clear semantics and are of value to the user. >> >> Wait, does all stats have to be well-defined? I mean, look at the >> ethtool stats. They are free-form strings too. Do you mean that in >> devlink, we can only have well-defines enum-based stats? > >That's my strong preference, yes. > >First, and obvious argument is that it make lazy coding less likely >(see devlink params). > >More importantly, tho, if your stats are not well defined - users don't >need to seem them. Really! If I can't draw a line between a statistic >and device behavior then keep that stat in the register dump, debugfs During the DaveM's-only era, there was quite strict policy against any debugfs usage. As far as I remember the claim was, find of define the proper api or do your debug things out-of-tree. Does that changed? I just want to make sure that we are now free to use debugfs for exposuse of debugging info as "odd vendor stats". Personally, I think it is good idea. I think that the rest of the kernel actually uses debugfs like that. Thanks! >or /dev/null. > >That's why it's important that we talk about _what_ you're trying to >expose. Basically a mixture of quite generic things and very obscure device specific items.
On Tue, 12 Apr 2022 10:16:26 +0200 Jiri Pirko wrote: > Mon, Apr 11, 2022 at 08:01:57PM CEST, kuba@kernel.org wrote: > >> Wait, does all stats have to be well-defined? I mean, look at the > >> ethtool stats. They are free-form strings too. Do you mean that in > >> devlink, we can only have well-defines enum-based stats? > > > >That's my strong preference, yes. > > > >First, and obvious argument is that it make lazy coding less likely > >(see devlink params). > > > >More importantly, tho, if your stats are not well defined - users don't > >need to seem them. Really! If I can't draw a line between a statistic > >and device behavior then keep that stat in the register dump, debugfs > > During the DaveM's-only era, there was quite strict policy against any > debugfs usage. As far as I remember the claim was, find of define the > proper api or do your debug things out-of-tree. > > Does that changed? I just want to make sure that we are now free to use > debugfs for exposuse of debugging info as "odd vendor stats". > Personally, I think it is good idea. I think that the rest of the kernel > actually uses debugfs like that. I think the policy is "it's fine as long as it's read-only". Which could be a problem if you want to parametrize the counters. But then again based on this RFC IDK if you do. > >or /dev/null. > > > >That's why it's important that we talk about _what_ you're trying to > >expose. > > Basically a mixture of quite generic things and very obscure device > specific items.