Message ID | 20210803163641.3743-4-alexandr.lobakin@intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | ethtool, stats: introduce and use standard XDP stats |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | fail | Series longer than 15 patches |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 1 maintainers not CCed: johannes.berg@intel.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 2069 this patch: 2069 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 341 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 2061 this patch: 2061 |
netdev/header_inline | success | Link |
On Tue, 3 Aug 2021 18:36:23 +0200 Alexander Lobakin wrote:
> Most of the driver-side XDP enabled drivers provide some statistics
> on XDP programs runs and different actions taken (number of passes,
> drops, redirects etc.).

Could you please share the statistics to back that statement up? Having uAPI for XDP stats is pretty much making the recommendation that drivers should implement such stats. The recommendation from Alexei and others back in the day (IIRC) was that XDP programs should implement stats, not the drivers, to avoid duplication.

> Regarding that it's almost pretty the same across all the drivers
> (which is obvious), we can implement some sort of "standardized"
> statistics using Ethtool standard stats infra to eliminate a lot
> of code and stringsets duplication, different approaches to count
> these stats and so on.

I'm not 100% sold on the fact that these should be ethtool stats. Why not rtnl_fill_statsinfo() stats? Current ethtool std stats are all pretty Ethernet specific, and all HW stats. Mixing HW and SW stats is what we're trying to get away from.

> These new 12 fields provided by the standard XDP stats should cover
> most, if not all, stats that might be interesting for collecting and
> tracking.
> Note that most NIC drivers keep XDP statistics on a per-channel
> basis, so this also introduces a new callback for getting a number
> of channels which a driver will provide stats for. If it's not
> implemented or returns 0, it means stats are global/device-wide.

Per-channel stats via std ethtool stats are not a good idea. Per-queue stats must be via the queue netlink interface we keep talking about forever but which doesn't seem to materialize. When stats are reported via a different interface than the objects they pertain to, matching stats, objects and their lifetime becomes very murky.
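[For context, a minimal sketch of the program-side accounting mentioned above: the XDP program itself bumps a per-CPU array keyed by the action it returns, and userspace sums the per-CPU copies when it wants a report. Map, program and section names are illustrative only, not anything prescribed by the kernel.]

```c
// SPDX-License-Identifier: GPL-2.0
/* Per-CPU, per-action counters maintained by the XDP program itself. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 5);		/* XDP_ABORTED..XDP_REDIRECT */
	__type(key, __u32);
	__type(value, __u64);
} xdp_action_count SEC(".maps");

SEC("xdp")
int xdp_stats_prog(struct xdp_md *ctx)
{
	__u32 act = XDP_PASS;		/* a real program decides the verdict here */
	__u64 *cnt;

	cnt = bpf_map_lookup_elem(&xdp_action_count, &act);
	if (cnt)
		(*cnt)++;		/* per-CPU map, a plain increment is fine */

	return act;
}

char _license[] SEC("license") = "GPL";
```

Userspace then reads the map (one value per CPU per slot) and sums the copies to get device-agnostic, program-defined counters.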
On Tue, 2021-08-03 at 13:49 -0700, Jakub Kicinski wrote: > On Tue, 3 Aug 2021 18:36:23 +0200 Alexander Lobakin wrote: > > Most of the driver-side XDP enabled drivers provide some statistics > > on XDP programs runs and different actions taken (number of passes, > > drops, redirects etc.). > > Could you please share the statistics to back that statement up? > Having uAPI for XDP stats is pretty much making the recommendation > that drivers should implement such stats. The recommendation from > Alexei and others back in the day (IIRC) was that XDP programs should > implement stats, not the drivers, to avoid duplication. > There are stats "mainly errors*" that are not even visible or reported to the user prog, for that i had an idea in the past to attach an exception_bpf_prog provided by the user, where driver/stack will report errors to this special exception_prog. > > Regarding that it's almost pretty the same across all the drivers > > (which is obvious), we can implement some sort of "standardized" > > statistics using Ethtool standard stats infra to eliminate a lot > > of code and stringsets duplication, different approaches to count > > these stats and so on. > > I'm not 100% sold on the fact that these should be ethtool stats. > Why not rtnl_fill_statsinfo() stats? Current ethtool std stats are > all pretty Ethernet specific, and all HW stats. Mixing HW and SW > stats > is what we're trying to get away from. > XDP is going to always be eBPF based ! why not just report such stats to a special BPF_MAP ? BPF stack can collect the stats from the driver and report them to this special MAP upon user request. > > These new 12 fields provided by the standard XDP stats should cover > > most, if not all, stats that might be interesting for collecting > > and > > tracking. > > Note that most NIC drivers keep XDP statistics on a per-channel > > basis, so this also introduces a new callback for getting a number > > of channels which a driver will provide stats for. If it's not > > implemented or returns 0, it means stats are global/device-wide. > > Per-channel stats via std ethtool stats are not a good idea. Per > queue > stats must be via the queue netlink interface we keep talking about > for > ever but which doesn't seem to materialize. When stats are reported > via > a different interface than objects they pertain to matching stats, > objects and their lifetime becomes very murky.
On Tue, 03 Aug 2021 16:57:22 -0700 Saeed Mahameed wrote: > On Tue, 2021-08-03 at 13:49 -0700, Jakub Kicinski wrote: > > On Tue, 3 Aug 2021 18:36:23 +0200 Alexander Lobakin wrote: > > > Most of the driver-side XDP enabled drivers provide some statistics > > > on XDP programs runs and different actions taken (number of passes, > > > drops, redirects etc.). > > > > Could you please share the statistics to back that statement up? > > Having uAPI for XDP stats is pretty much making the recommendation > > that drivers should implement such stats. The recommendation from > > Alexei and others back in the day (IIRC) was that XDP programs should > > implement stats, not the drivers, to avoid duplication. > > There are stats "mainly errors*" that are not even visible or reported > to the user prog, Fair point, exceptions should not be performance critical. > for that i had an idea in the past to attach an > exception_bpf_prog provided by the user, where driver/stack will report > errors to this special exception_prog. Or maybe we should turn trace_xdp_exception() into a call which unconditionally collects exception stats? I think we can reasonably expect the exception_bpf_prog to always be attached, right? > > > Regarding that it's almost pretty the same across all the drivers > > > (which is obvious), we can implement some sort of "standardized" > > > statistics using Ethtool standard stats infra to eliminate a lot > > > of code and stringsets duplication, different approaches to count > > > these stats and so on. > > > > I'm not 100% sold on the fact that these should be ethtool stats. > > Why not rtnl_fill_statsinfo() stats? Current ethtool std stats are > > all pretty Ethernet specific, and all HW stats. Mixing HW and SW > > stats > > is what we're trying to get away from. > > XDP is going to always be eBPF based ! why not just report such stats > to a special BPF_MAP ? BPF stack can collect the stats from the driver > and report them to this special MAP upon user request. Do you mean replacing the ethtool-netlink / rtnetlink etc. with a new BPF_MAP? I don't think adding another category of uAPI thru which netdevice stats are exposed would do much good :( Plus it doesn't address the "yet another cacheline" concern. To my understanding the need for stats recognizes the fact that (in large organizations) fleet monitoring is done by different teams than XDP development. So XDP team may have all the stats they need, but the team doing fleet monitoring has no idea how to get to them. To bridge the two worlds we need a way for the infra team to ask the XDP for well-defined stats. Maybe we should take a page from the BPF iterators book and create a program type for bridging the two worlds? Called by networking core when duping stats to extract from the existing BPF maps all the relevant stats and render them into a well known struct? Users' XDP design can still use a single per-cpu map with all the stats if they so choose, but there's a way to implement more optimal designs and still expose well-defined stats. Maybe that's too complex, IDK.
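[A sketch of what counting exceptions from the existing xdp:xdp_exception tracepoint (i.e. what drivers already emit via trace_xdp_exception()) looks like today. The context struct follows the layout used by samples/bpf/xdp_monitor; map and program names are illustrative.]

```c
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Layout of the xdp:xdp_exception tracepoint arguments. */
struct xdp_exception_ctx {
	__u64 __pad;		/* first 8 bytes are not accessible to BPF */
	int prog_id;
	__u32 act;
	int ifindex;
};

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
	__uint(max_entries, 64);
	__type(key, int);	/* ifindex */
	__type(value, __u64);	/* exception count */
} exception_cnt SEC(".maps");

SEC("tracepoint/xdp/xdp_exception")
int tp_xdp_exception(struct xdp_exception_ctx *ctx)
{
	__u64 one = 1, *cnt;
	int ifindex = ctx->ifindex;	/* copy to the stack for the map key */

	cnt = bpf_map_lookup_elem(&exception_cnt, &ifindex);
	if (cnt)
		(*cnt)++;
	else
		bpf_map_update_elem(&exception_cnt, &ifindex, &one, BPF_NOEXIST);

	return 0;
}

char _license[] SEC("license") = "GPL";
```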
From: Jakub Kicinski <kuba@kernel.org> Date: Wed, 4 Aug 2021 05:36:50 -0700 > On Tue, 03 Aug 2021 16:57:22 -0700 Saeed Mahameed wrote: > > On Tue, 2021-08-03 at 13:49 -0700, Jakub Kicinski wrote: > > > On Tue, 3 Aug 2021 18:36:23 +0200 Alexander Lobakin wrote: > > > > Most of the driver-side XDP enabled drivers provide some statistics > > > > on XDP programs runs and different actions taken (number of passes, > > > > drops, redirects etc.). > > > > > > Could you please share the statistics to back that statement up? > > > Having uAPI for XDP stats is pretty much making the recommendation > > > that drivers should implement such stats. The recommendation from > > > Alexei and others back in the day (IIRC) was that XDP programs should > > > implement stats, not the drivers, to avoid duplication. Well, 20+ patches in the series with at least half of them is drivers conversion. Plus mlx5. Plus we'll about to land XDP statistics for all Intel drivers, just firstly need to get a common infra for them (the purpose of this series). Also, introducing IEEE and rmon stats didn't make a statement that all drivers should really expose them, right? > > There are stats "mainly errors*" that are not even visible or reported > > to the user prog, Not really. Many drivers like to count the number of redirects, xdp_xmits and stuff (incl. mlx5). Nevertheless, these stats aren't the same as something you can get from inside an XDP prog, right. > Fair point, exceptions should not be performance critical. > > > for that i had an idea in the past to attach an > > exception_bpf_prog provided by the user, where driver/stack will report > > errors to this special exception_prog. > > Or maybe we should turn trace_xdp_exception() into a call which > unconditionally collects exception stats? I think we can reasonably > expect the exception_bpf_prog to always be attached, right? trace_xdp_exception() is again a error path, and would restrict us to have only "bad" statistics. > > > > Regarding that it's almost pretty the same across all the drivers > > > > (which is obvious), we can implement some sort of "standardized" > > > > statistics using Ethtool standard stats infra to eliminate a lot > > > > of code and stringsets duplication, different approaches to count > > > > these stats and so on. > > > > > > I'm not 100% sold on the fact that these should be ethtool stats. > > > Why not rtnl_fill_statsinfo() stats? Current ethtool std stats are > > > all pretty Ethernet specific, and all HW stats. Mixing HW and SW > > > stats > > > is what we're trying to get away from. I was trying to introduce as few functional changes as possible, including that all the current drivers expose XDP stats through Ethtool. I don't say it's a 100% optimal way, but lots of different scripts and monitoring tools are already based on this fact and there can be some negative impact. There'll be for sure due to that std stats is a bit different thing and different drivers count and name XDP stats differently (breh). BTW, I'm fine with rtnl xstats. A nice reminder, thanks. If there won't be much cons like "don't touch our Ethtool stats", I would prefer this one instead of Ethtool standard stats way. > > XDP is going to always be eBPF based ! why not just report such stats > > to a special BPF_MAP ? BPF stack can collect the stats from the driver > > and report them to this special MAP upon user request. > > Do you mean replacing the ethtool-netlink / rtnetlink etc. with > a new BPF_MAP? 
I don't think adding another category of uAPI thru > which netdevice stats are exposed would do much good :( Plus it > doesn't address the "yet another cacheline" concern. + this makes obtaining/tracking the statistics much harder. For now, all you need is `ethtool -S devname` (mainline) or `ethtool -S devname --groups xdp` (this series), and obtaining rtnl xstats is just a different command to invoke. BPF_MAP-based stats are a completely different story then. > To my understanding the need for stats recognizes the fact that (in > large organizations) fleet monitoring is done by different teams than > XDP development. So XDP team may have all the stats they need, but the > team doing fleet monitoring has no idea how to get to them. > > To bridge the two worlds we need a way for the infra team to ask the > XDP for well-defined stats. Maybe we should take a page from the BPF > iterators book and create a program type for bridging the two worlds? > Called by networking core when duping stats to extract from the > existing BPF maps all the relevant stats and render them into a well > known struct? Users' XDP design can still use a single per-cpu map with > all the stats if they so choose, but there's a way to implement more > optimal designs and still expose well-defined stats. > > Maybe that's too complex, IDK. Thanks, Al
On 8/4/21 6:36 AM, Jakub Kicinski wrote:
>> XDP is going to always be eBPF based ! why not just report such stats
>> to a special BPF_MAP ? BPF stack can collect the stats from the driver
>> and report them to this special MAP upon user request.
> Do you mean replacing the ethtool-netlink / rtnetlink etc. with
> a new BPF_MAP? I don't think adding another category of uAPI thru
> which netdevice stats are exposed would do much good :( Plus it
> doesn't address the "yet another cacheline" concern.
>
> To my understanding the need for stats recognizes the fact that (in
> large organizations) fleet monitoring is done by different teams than
> XDP development. So XDP team may have all the stats they need, but the
> team doing fleet monitoring has no idea how to get to them.
>
> To bridge the two worlds we need a way for the infra team to ask the
> XDP for well-defined stats. Maybe we should take a page from the BPF
> iterators book and create a program type for bridging the two worlds?
> Called by networking core when duping stats to extract from the
> existing BPF maps all the relevant stats and render them into a well
> known struct? Users' XDP design can still use a single per-cpu map with
> all the stats if they so choose, but there's a way to implement more
> optimal designs and still expose well-defined stats.
>
> Maybe that's too complex, IDK.

I was just explaining to someone internally how to get stats at all of the different points in the stack to track down reasons for dropped packets:

  ethtool -S              for h/w and driver
  tc -s                   for drops by the qdisc
  /proc/net/softnet_stat  for drops at the backlog layer
  netstat -s              for network and transport layer

yet another command and API just adds to the nightmare of explaining and understanding these stats.

There is real value in continuing to use ethtool API for XDP stats. Not saying this reorg of the XDP stats is the right thing to do, only that the existing API has real user benefits.

Does anyone have data that shows bumping a properly implemented counter causes a noticeable performance degradation and if so by how much? You mention 'yet another cacheline' but collecting stats on stack and incrementing the driver structs at the end of the napi loop should not have a huge impact versus the value the stats provide.
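[A sketch of the "collect on the stack, flush once per NAPI poll" pattern described above. struct my_ring and the my_xdp_* types are hypothetical driver structures; only the u64_stats_* helpers are existing kernel API.]

```c
/* Plain counters accumulated on the stack during one poll. */
struct my_xdp_counters {
	u64 pass;
	u64 drop;
	u64 redirect;
	u64 aborted;
};

/* Per-ring copy published to readers (ethtool/rtnetlink/...). */
struct my_xdp_sw_stats {
	struct my_xdp_counters cnt;
	struct u64_stats_sync syncp;
};

static int my_napi_poll(struct napi_struct *napi, int budget)
{
	struct my_ring *ring = container_of(napi, struct my_ring, napi);
	struct my_xdp_counters local = { };
	int done = 0;

	while (done < budget) {
		/* ... receive a frame, run the XDP program, and bump
		 * local.pass/local.drop/local.redirect/local.aborted
		 * depending on the verdict ...
		 */
		done++;
	}

	/* One shared-counter update per poll instead of one per packet. */
	u64_stats_update_begin(&ring->xdp_stats.syncp);
	ring->xdp_stats.cnt.pass     += local.pass;
	ring->xdp_stats.cnt.drop     += local.drop;
	ring->xdp_stats.cnt.redirect += local.redirect;
	ring->xdp_stats.cnt.aborted  += local.aborted;
	u64_stats_update_end(&ring->xdp_stats.syncp);

	return done;
}
```

Readers would wrap their copies of the counters in u64_stats_fetch_begin()/u64_stats_fetch_retry() to get consistent 64-bit values on 32-bit systems.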
On Wed, 4 Aug 2021 10:17:56 -0600 David Ahern wrote: > On 8/4/21 6:36 AM, Jakub Kicinski wrote: > >> XDP is going to always be eBPF based ! why not just report such stats > >> to a special BPF_MAP ? BPF stack can collect the stats from the driver > >> and report them to this special MAP upon user request. > > Do you mean replacing the ethtool-netlink / rtnetlink etc. with > > a new BPF_MAP? I don't think adding another category of uAPI thru > > which netdevice stats are exposed would do much good :( Plus it > > doesn't address the "yet another cacheline" concern. > > > > To my understanding the need for stats recognizes the fact that (in > > large organizations) fleet monitoring is done by different teams than > > XDP development. So XDP team may have all the stats they need, but the > > team doing fleet monitoring has no idea how to get to them. > > > > To bridge the two worlds we need a way for the infra team to ask the > > XDP for well-defined stats. Maybe we should take a page from the BPF > > iterators book and create a program type for bridging the two worlds? > > Called by networking core when duping stats to extract from the > > existing BPF maps all the relevant stats and render them into a well > > known struct? Users' XDP design can still use a single per-cpu map with > > all the stats if they so choose, but there's a way to implement more > > optimal designs and still expose well-defined stats. > > > > Maybe that's too complex, IDK. > > I was just explaining to someone internally how to get stats at all of > the different points in the stack to track down reasons for dropped packets: > > ethtool -S for h/w and driver > tc -s for drops by the qdisc > /proc/net/softnet_stat for drops at the backlog layer > netstat -s for network and transport layer > > yet another command and API just adds to the nightmare of explaining and > understanding these stats. Are you referring to RTM_GETSTATS when you say "yet another command"? RTM_GETSTATS exists and is used by offloads today. I'd expect ip -s (-s) to be extended to run GETSTATS and display the xdp stats. (Not sure why ip -s was left out of your list :)) > There is real value in continuing to use ethtool API for XDP stats. Not > saying this reorg of the XDP stats is the right thing to do, only that > the existing API has real user benefits. RTM_GETSTATS is an existing API. New ethtool stats are intended to be HW stats. I don't want to go back to ethtool being a dumping ground for all stats because that's what the old interface encouraged. > Does anyone have data that shows bumping a properly implemented counter > causes a noticeable performance degradation and if so by how much? You > mention 'yet another cacheline' but collecting stats on stack and > incrementing the driver structs at the end of the napi loop should not > have a huge impact versus the value the stats provide. Not sure, maybe Jesper has some numbers. Maybe Intel folks do? I'm just allergic to situations when there is a decision made and then months later patches are posted disregarding the decision, without analysis on why that decision was wrong. And while the maintainer who made the decision is on vacation.
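[For reference, RTM_GETSTATS as it exists today can be exercised from a few dozen lines of userspace C. This minimal sketch fetches struct rtnl_link_stats64 for one interface over NETLINK_ROUTE; error handling is omitted for brevity.]

```c
// Build: gcc -o getstats getstats.c ; run: ./getstats eth0
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>
#include <net/if.h>

int main(int argc, char **argv)
{
	struct {
		struct nlmsghdr nlh;
		struct if_stats_msg ifsm;
	} req;
	char buf[8192];
	struct nlmsghdr *nlh;
	int fd, len;

	if (argc < 2)
		return 1;

	memset(&req, 0, sizeof(req));
	req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req.ifsm));
	req.nlh.nlmsg_type = RTM_GETSTATS;
	req.nlh.nlmsg_flags = NLM_F_REQUEST;
	req.ifsm.ifindex = if_nametoindex(argv[1]);
	req.ifsm.filter_mask = IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64);

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	send(fd, &req, req.nlh.nlmsg_len, 0);
	len = recv(fd, buf, sizeof(buf), 0);

	for (nlh = (struct nlmsghdr *)buf; NLMSG_OK(nlh, len);
	     nlh = NLMSG_NEXT(nlh, len)) {
		struct if_stats_msg *ifsm = NLMSG_DATA(nlh);
		struct rtattr *rta = (struct rtattr *)((char *)ifsm +
				     NLMSG_ALIGN(sizeof(*ifsm)));
		int alen = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifsm));

		if (nlh->nlmsg_type != RTM_NEWSTATS)
			continue;

		for (; RTA_OK(rta, alen); rta = RTA_NEXT(rta, alen)) {
			if (rta->rta_type == IFLA_STATS_LINK_64) {
				struct rtnl_link_stats64 s;

				memcpy(&s, RTA_DATA(rta), sizeof(s));
				printf("rx_packets: %llu tx_packets: %llu\n",
				       (unsigned long long)s.rx_packets,
				       (unsigned long long)s.tx_packets);
			}
		}
	}
	close(fd);
	return 0;
}
```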
On Wed, 4 Aug 2021 17:53:27 +0200 Alexander Lobakin wrote: > From: Jakub Kicinski <kuba@kernel.org> > Date: Wed, 4 Aug 2021 05:36:50 -0700 > > > On Tue, 03 Aug 2021 16:57:22 -0700 Saeed Mahameed wrote: > > > On Tue, 2021-08-03 at 13:49 -0700, Jakub Kicinski wrote: > > > > On Tue, 3 Aug 2021 18:36:23 +0200 Alexander Lobakin wrote: > > > > > Most of the driver-side XDP enabled drivers provide some statistics > > > > > on XDP programs runs and different actions taken (number of passes, > > > > > drops, redirects etc.). > > > > > > > > Could you please share the statistics to back that statement up? > > > > Having uAPI for XDP stats is pretty much making the recommendation > > > > that drivers should implement such stats. The recommendation from > > > > Alexei and others back in the day (IIRC) was that XDP programs should > > > > implement stats, not the drivers, to avoid duplication. > > Well, 20+ patches in the series with at least half of them is > drivers conversion. Plus mlx5. Plus we'll about to land XDP > statistics for all Intel drivers, just firstly need to get a > common infra for them (the purpose of this series). Great, do you have impact of the stats on Intel drivers? (Preferably from realistic scenarios where CPU cache is actually under pressure, not { return XDP_PASS; }). Numbers win arguments. > Also, introducing IEEE and rmon stats didn't make a statement that > all drivers should really expose them, right? That's not relevant. IEEE and RMON stats are read from HW, they have no impact on the SW fast path. > > > There are stats "mainly errors*" that are not even visible or reported > > > to the user prog, > > Not really. Many drivers like to count the number of redirects, > xdp_xmits and stuff (incl. mlx5). Nevertheless, these stats aren't > the same as something you can get from inside an XDP prog, right. > > > Fair point, exceptions should not be performance critical. > > > > > for that i had an idea in the past to attach an > > > exception_bpf_prog provided by the user, where driver/stack will report > > > errors to this special exception_prog. > > > > Or maybe we should turn trace_xdp_exception() into a call which > > unconditionally collects exception stats? I think we can reasonably > > expect the exception_bpf_prog to always be attached, right? > > trace_xdp_exception() is again a error path, and would restrict us > to have only "bad" statistics. > > > > > > Regarding that it's almost pretty the same across all the drivers > > > > > (which is obvious), we can implement some sort of "standardized" > > > > > statistics using Ethtool standard stats infra to eliminate a lot > > > > > of code and stringsets duplication, different approaches to count > > > > > these stats and so on. > > > > > > > > I'm not 100% sold on the fact that these should be ethtool stats. > > > > Why not rtnl_fill_statsinfo() stats? Current ethtool std stats are > > > > all pretty Ethernet specific, and all HW stats. Mixing HW and SW > > > > stats > > > > is what we're trying to get away from. > > I was trying to introduce as few functional changes as possible, > including that all the current drivers expose XDP stats through > Ethtool. You know this, but for the benefit of others - ethtool -S does not dump standard stats from the netlink API, and ethtool -S --goups does not dump "old" stats. So users will need to use different commands to get to the two, anyway. 
> I don't say it's a 100% optimal way, but lots of different scripts > and monitoring tools are already based on this fact and there can > be some negative impact. There'll be for sure due to that std stats > is a bit different thing and different drivers count and name XDP > stats differently (breh). That's concerning. I'd much rather you didn't convert all the drivers than convert them before someone makes 100% sure the meaning of the stats is equivalent. > BTW, I'm fine with rtnl xstats. A nice reminder, thanks. If there > won't be much cons like "don't touch our Ethtool stats", I would > prefer this one instead of Ethtool standard stats way. You'll have to leave the ethtool -S ones in place anyway, right? New drivers would not include them but I don't think there's much we can (or should) do for the existing ones. > > > XDP is going to always be eBPF based ! why not just report such stats > > > to a special BPF_MAP ? BPF stack can collect the stats from the driver > > > and report them to this special MAP upon user request. > > > > Do you mean replacing the ethtool-netlink / rtnetlink etc. with > > a new BPF_MAP? I don't think adding another category of uAPI thru > > which netdevice stats are exposed would do much good :( Plus it > > doesn't address the "yet another cacheline" concern. > > + this makes obtaining/tracking the statistics much harder. For now, > all you need is `ethtool -S devname` (mainline) or > `ethtool -S devname --groups xdp` (this series), and obtaining rtnl > xstats is just a different command to invoke. BPF_MAP-based stats > are a completely different story then. > > > To my understanding the need for stats recognizes the fact that (in > > large organizations) fleet monitoring is done by different teams than > > XDP development. So XDP team may have all the stats they need, but the > > team doing fleet monitoring has no idea how to get to them. > > > > To bridge the two worlds we need a way for the infra team to ask the > > XDP for well-defined stats. Maybe we should take a page from the BPF > > iterators book and create a program type for bridging the two worlds? > > Called by networking core when duping stats to extract from the > > existing BPF maps all the relevant stats and render them into a well > > known struct? Users' XDP design can still use a single per-cpu map with > > all the stats if they so choose, but there's a way to implement more > > optimal designs and still expose well-defined stats. > > > > Maybe that's too complex, IDK. > > Thanks, > Al
On 8/4/21 10:44 AM, Jakub Kicinski wrote: > On Wed, 4 Aug 2021 10:17:56 -0600 David Ahern wrote: >> On 8/4/21 6:36 AM, Jakub Kicinski wrote: >>>> XDP is going to always be eBPF based ! why not just report such stats >>>> to a special BPF_MAP ? BPF stack can collect the stats from the driver >>>> and report them to this special MAP upon user request. >>> Do you mean replacing the ethtool-netlink / rtnetlink etc. with >>> a new BPF_MAP? I don't think adding another category of uAPI thru >>> which netdevice stats are exposed would do much good :( Plus it >>> doesn't address the "yet another cacheline" concern. >>> >>> To my understanding the need for stats recognizes the fact that (in >>> large organizations) fleet monitoring is done by different teams than >>> XDP development. So XDP team may have all the stats they need, but the >>> team doing fleet monitoring has no idea how to get to them. >>> >>> To bridge the two worlds we need a way for the infra team to ask the >>> XDP for well-defined stats. Maybe we should take a page from the BPF >>> iterators book and create a program type for bridging the two worlds? >>> Called by networking core when duping stats to extract from the >>> existing BPF maps all the relevant stats and render them into a well >>> known struct? Users' XDP design can still use a single per-cpu map with >>> all the stats if they so choose, but there's a way to implement more >>> optimal designs and still expose well-defined stats. >>> >>> Maybe that's too complex, IDK. >> >> I was just explaining to someone internally how to get stats at all of >> the different points in the stack to track down reasons for dropped packets: >> >> ethtool -S for h/w and driver >> tc -s for drops by the qdisc >> /proc/net/softnet_stat for drops at the backlog layer >> netstat -s for network and transport layer >> >> yet another command and API just adds to the nightmare of explaining and >> understanding these stats. > > Are you referring to RTM_GETSTATS when you say "yet another command"? > RTM_GETSTATS exists and is used by offloads today. > > I'd expect ip -s (-s) to be extended to run GETSTATS and display the xdp > stats. (Not sure why ip -s was left out of your list :)) It's on my diagram, and yes, forgot to add it here. > >> There is real value in continuing to use ethtool API for XDP stats. Not >> saying this reorg of the XDP stats is the right thing to do, only that >> the existing API has real user benefits. > > RTM_GETSTATS is an existing API. New ethtool stats are intended to be HW > stats. I don't want to go back to ethtool being a dumping ground for all > stats because that's what the old interface encouraged. driver stats are important too. e.g., mlx5's cache stats and per-queue stats. > >> Does anyone have data that shows bumping a properly implemented counter >> causes a noticeable performance degradation and if so by how much? You >> mention 'yet another cacheline' but collecting stats on stack and >> incrementing the driver structs at the end of the napi loop should not >> have a huge impact versus the value the stats provide. > > Not sure, maybe Jesper has some numbers. Maybe Intel folks do? I just ran some quick tests with my setup and measured about 1.2% worst case. Certainly not exhaustive. Perhaps Intel or Mellanox can provide numbers for their high speed nics - e.g. ConnectX-6 and a saturated host. 
> > I'm just allergic to situations when there is a decision made and > then months later patches are posted disregarding the decision, > without analysis on why that decision was wrong. And while the > maintainer who made the decision is on vacation. > stats is one of the many sensitive topics. I have been consistent in defending the need to use existing APIs and tooling and not relying on XDP program writers to add the relevant stats and then provide whatever tool is needed to extract and print them. Standardization for fundamental analysis tools.
On Wed, 2021-08-04 at 11:28 -0600, David Ahern wrote: > On 8/4/21 10:44 AM, Jakub Kicinski wrote: > > On Wed, 4 Aug 2021 10:17:56 -0600 David Ahern wrote: > > > On 8/4/21 6:36 AM, Jakub Kicinski wrote: > > > > > XDP is going to always be eBPF based ! why not just report > > > > > such stats > > > > > to a special BPF_MAP ? BPF stack can collect the stats from > > > > > the driver > > > > > and report them to this special MAP upon user request. > > > > Do you mean replacing the ethtool-netlink / rtnetlink etc. with > > > > a new BPF_MAP? I don't think adding another category of uAPI > > > > thru > > > > which netdevice stats are exposed would do much good :( Plus it > > > > doesn't address the "yet another cacheline" concern. > > > > > > > > To my understanding the need for stats recognizes the fact that > > > > (in > > > > large organizations) fleet monitoring is done by different > > > > teams than > > > > XDP development. So XDP team may have all the stats they need, > > > > but the > > > > team doing fleet monitoring has no idea how to get to them. > > > > > > > > To bridge the two worlds we need a way for the infra team to > > > > ask the > > > > XDP for well-defined stats. Maybe we should take a page from > > > > the BPF > > > > iterators book and create a program type for bridging the two > > > > worlds? > > > > Called by networking core when duping stats to extract from the > > > > existing BPF maps all the relevant stats and render them into a > > > > well > > > > known struct? Users' XDP design can still use a single per-cpu > > > > map with > > > > all the stats if they so choose, but there's a way to implement > > > > more > > > > optimal designs and still expose well-defined stats. > > > > > > > > Maybe that's too complex, IDK. > > > The main question here, do we want the prog to count or driver ? and the answer will lead to more questions :) : 1) will the prog/user need to access driver for driver only stats ? or driver shall report to a special program and all the collection and reporting is done in XDP/BPF internally .. 2) stats per prog/queue/cpu/interface ? 3) how to eventually report to user ethtool/ip -s/bpftool ? too complex, IDK too .. :D > > > I was just explaining to someone internally how to get stats at > > > all of > > > the different points in the stack to track down reasons for > > > dropped packets: > > > > > > ethtool -S for h/w and driver > > > tc -s for drops by the qdisc > > > /proc/net/softnet_stat for drops at the backlog layer > > > netstat -s for network and transport layer > > > > > > yet another command and API just adds to the nightmare of > > > explaining and > > > understanding these stats. > > > > Are you referring to RTM_GETSTATS when you say "yet another > > command"? > > RTM_GETSTATS exists and is used by offloads today. > > > > I'd expect ip -s (-s) to be extended to run GETSTATS and display > > the xdp > > stats. (Not sure why ip -s was left out of your list :)) > > It's on my diagram, and yes, forgot to add it here. > i think ip -s is a good place for "standard" driver based xdp stats. but as Jakub already explained, adding such driver mechanism is like making a statement that drivers must implement this. > > > > > There is real value in continuing to use ethtool API for XDP > > > stats. Not > > > saying this reorg of the XDP stats is the right thing to do, only > > > that > > > the existing API has real user benefits. > > > > RTM_GETSTATS is an existing API. New ethtool stats are intended to > > be HW > > stats. 
I don't want to go back to ethtool being a dumping ground > > for all > > stats because that's what the old interface encouraged. > > driver stats are important too. e.g., mlx5's cache stats and per- > queue > stats. > one could claim that mlx5 cache stats should move to page_pool and per_queue stats should move to the stack. > > > > > Does anyone have data that shows bumping a properly implemented > > > counter > > > causes a noticeable performance degradation and if so by how > > > much? You > > > mention 'yet another cacheline' but collecting stats on stack and > > > incrementing the driver structs at the end of the napi loop > > > should not > > > have a huge impact versus the value the stats provide. > > > > Not sure, maybe Jesper has some numbers. Maybe Intel folks do? > A properly implemented counter that doesn't introduce new cache misses, will hardly show any measurable difference, the only way to measure is via instructions per packet. usually the way we implement counters in mlx5 is that if this is the fastest flow that we expect then we only increment the good counters "packet++/drop++/redirect++" any slower path should include counters to indicate the slower path and the effect of the new "slower" counters will still be negligible as we already are at a higher instructions per packet hence the slower path .. the only time you measure a difference is when you introduce new counting on a counter-free flow, e.g page_pool ;) > I just ran some quick tests with my setup and measured about 1.2% > worst 1.2% is a lot ! what was the test ? what is the change ? > case. Certainly not exhaustive. Perhaps Intel or Mellanox can provide > numbers for their high speed nics - e.g. ConnectX-6 and a saturated > host. > let's define what are we testing first, there are multiple places we need to check, Tariq will be exploring transitioning mlx5 cache to page_pool with all the counters, maybe it is a good place to measure.. > > > > I'm just allergic to situations when there is a decision made and > > then months later patches are posted disregarding the decision, > > without analysis on why that decision was wrong. And while the > > maintainer who made the decision is on vacation. > > > > stats is one of the many sensitive topics. I have been consistent in > defending the need to use existing APIs and tooling and not relying > on > XDP program writers to add the relevant stats and then provide > whatever > tool is needed to extract and print them. Standardization for > fundamental analysis tools.
On 8/4/21 12:27 PM, Saeed Mahameed wrote: > >> I just ran some quick tests with my setup and measured about 1.2% >> worst > > 1.2% is a lot ! what was the test ? what is the change ? I did say "quick test ... not exhaustive" and it was definitely eyeballing a pps change over a small time window. If multiple counters are bumped 20-25 million times a second (e.g. XDP drop case), how measurable is it? I was just trying to ballpark the overhead - 1%, 5%, more? If it is <~ 1% then there is no performance argument in which case let's do the right thing for users - export via existing APIs.
On Wed, 2021-08-04 at 18:43 -0600, David Ahern wrote:
> On 8/4/21 12:27 PM, Saeed Mahameed wrote:
> >
> > > I just ran some quick tests with my setup and measured about 1.2%
> > > worst
> >
> > 1.2% is a lot ! what was the test ? what is the change ?
>
> I did say "quick test ... not exhaustive" and it was definitely
> eyeballing a pps change over a small time window.
>
> If multiple counters are bumped 20-25 million times a second (e.g. XDP
> drop case), how measurable is it? I was just trying to ballpark the
> overhead - 1%, 5%, more? If it is <~ 1% then there is no performance
> argument in which case let's do the right thing for users - export via
> existing APIs.

From experience I don't believe it can be more than 1%, and yes, on existing APIs!
From: Jakub Kicinski <kuba@kernel.org> Date: Wed, 5 Aug 2021 09:57:16 -0700 > On Wed, 4 Aug 2021 17:53:27 +0200 Alexander Lobakin wrote: > > From: Jakub Kicinski <kuba@kernel.org> > > Date: Wed, 4 Aug 2021 05:36:50 -0700 > > > > > On Tue, 03 Aug 2021 16:57:22 -0700 Saeed Mahameed wrote: > > > > On Tue, 2021-08-03 at 13:49 -0700, Jakub Kicinski wrote: > > > > > On Tue, 3 Aug 2021 18:36:23 +0200 Alexander Lobakin wrote: > > > > > > Most of the driver-side XDP enabled drivers provide some statistics > > > > > > on XDP programs runs and different actions taken (number of passes, > > > > > > drops, redirects etc.). > > > > > > > > > > Could you please share the statistics to back that statement up? > > > > > Having uAPI for XDP stats is pretty much making the recommendation > > > > > that drivers should implement such stats. The recommendation from > > > > > Alexei and others back in the day (IIRC) was that XDP programs should > > > > > implement stats, not the drivers, to avoid duplication. > > > > Well, 20+ patches in the series with at least half of them is > > drivers conversion. Plus mlx5. Plus we'll about to land XDP > > statistics for all Intel drivers, just firstly need to get a > > common infra for them (the purpose of this series). > > Great, do you have impact of the stats on Intel drivers? > (Preferably from realistic scenarios where CPU cache is actually > under pressure, not { return XDP_PASS; }). Numbers win arguments. > > > Also, introducing IEEE and rmon stats didn't make a statement that > > all drivers should really expose them, right? > > That's not relevant. IEEE and RMON stats are read from HW, they have > no impact on the SW fast path. True, I thought about this, but after the mail was sent, sorry. > > > > There are stats "mainly errors*" that are not even visible or reported > > > > to the user prog, > > > > Not really. Many drivers like to count the number of redirects, > > xdp_xmits and stuff (incl. mlx5). Nevertheless, these stats aren't > > the same as something you can get from inside an XDP prog, right. > > > > > Fair point, exceptions should not be performance critical. > > > > > > > for that i had an idea in the past to attach an > > > > exception_bpf_prog provided by the user, where driver/stack will report > > > > errors to this special exception_prog. > > > > > > Or maybe we should turn trace_xdp_exception() into a call which > > > unconditionally collects exception stats? I think we can reasonably > > > expect the exception_bpf_prog to always be attached, right? > > > > trace_xdp_exception() is again a error path, and would restrict us > > to have only "bad" statistics. > > > > > > > > Regarding that it's almost pretty the same across all the drivers > > > > > > (which is obvious), we can implement some sort of "standardized" > > > > > > statistics using Ethtool standard stats infra to eliminate a lot > > > > > > of code and stringsets duplication, different approaches to count > > > > > > these stats and so on. > > > > > > > > > > I'm not 100% sold on the fact that these should be ethtool stats. > > > > > Why not rtnl_fill_statsinfo() stats? Current ethtool std stats are > > > > > all pretty Ethernet specific, and all HW stats. Mixing HW and SW > > > > > stats > > > > > is what we're trying to get away from. > > > > I was trying to introduce as few functional changes as possible, > > including that all the current drivers expose XDP stats through > > Ethtool. 
> > You know this, but for the benefit of others - ethtool -S does not > dump standard stats from the netlink API, and ethtool -S --goups does > not dump "old" stats. So users will need to use different commands > to get to the two, anyway. Ditto. > > I don't say it's a 100% optimal way, but lots of different scripts > > and monitoring tools are already based on this fact and there can > > be some negative impact. There'll be for sure due to that std stats > > is a bit different thing and different drivers count and name XDP > > stats differently (breh). > > That's concerning. I'd much rather you didn't convert all the drivers > than convert them before someone makes 100% sure the meaning of the > stats is equivalent. > > > BTW, I'm fine with rtnl xstats. A nice reminder, thanks. If there > > won't be much cons like "don't touch our Ethtool stats", I would > > prefer this one instead of Ethtool standard stats way. > > You'll have to leave the ethtool -S ones in place anyway, right? > New drivers would not include them but I don't think there's much > we can (or should) do for the existing ones. That's also a nice and fair one. So I leave a little conclusion at the end of this message to clarify things. > > > > XDP is going to always be eBPF based ! why not just report such stats > > > > to a special BPF_MAP ? BPF stack can collect the stats from the driver > > > > and report them to this special MAP upon user request. > > > > > > Do you mean replacing the ethtool-netlink / rtnetlink etc. with > > > a new BPF_MAP? I don't think adding another category of uAPI thru > > > which netdevice stats are exposed would do much good :( Plus it > > > doesn't address the "yet another cacheline" concern. > > > > + this makes obtaining/tracking the statistics much harder. For now, > > all you need is `ethtool -S devname` (mainline) or > > `ethtool -S devname --groups xdp` (this series), and obtaining rtnl > > xstats is just a different command to invoke. BPF_MAP-based stats > > are a completely different story then. > > > > > To my understanding the need for stats recognizes the fact that (in > > > large organizations) fleet monitoring is done by different teams than > > > XDP development. So XDP team may have all the stats they need, but the > > > team doing fleet monitoring has no idea how to get to them. > > > > > > To bridge the two worlds we need a way for the infra team to ask the > > > XDP for well-defined stats. Maybe we should take a page from the BPF > > > iterators book and create a program type for bridging the two worlds? > > > Called by networking core when duping stats to extract from the > > > existing BPF maps all the relevant stats and render them into a well > > > known struct? Users' XDP design can still use a single per-cpu map with > > > all the stats if they so choose, but there's a way to implement more > > > optimal designs and still expose well-defined stats. > > > > > > Maybe that's too complex, IDK. 
[ the conclusion I talk about above ]

From the concerns, ideas and thoughts that have come up throughout this thread, I see the next steps as:

 - don't use the Ethtool standard stats infra, as it's designed for HW stats only;
 - expose SW XDP stats via rtnl xstats, which is currently used to query SW/CPU-port stats from switchdev-based drivers (a rough driver-side sketch follows below);
 - don't remove the existing Ethtool XDP stats provided by drivers, just wire them up with the new API too;
 - encourage driver developers to use this new API when they want to provide any XDP stats, not the Ethtool stats which are already overburdened and mix HW, SW and whatnot in most of the complex drivers.

That really makes a lot more sense to me than the v1 approach.

Anyway, I don't think I'll queue v2 before going on a vacation which starts in two days, so we have two more weeks to discuss all this before I start the rework. I'll still be on the line here and keep tracking the thread and replying when needed.

> > Thanks,
> > Al

Thanks everyone for the feedback, lots of precious stuff,
Al
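[A rough sketch of the "rtnl xstats" direction on the driver side (the second bullet above), reusing the ndo_has_offload_stats()/ndo_get_offload_stats() hooks that rtnl_fill_statsinfo() already uses for switchdev CPU-port stats. IFLA_OFFLOAD_XSTATS_XDP, struct my_priv and my_sum_xdp_stats() are hypothetical; no such attribute existed at the time of this thread.]

```c
static bool my_has_offload_stats(const struct net_device *dev, int attr_id)
{
	return attr_id == IFLA_OFFLOAD_XSTATS_XDP;	/* hypothetical attribute */
}

static int my_get_offload_stats(int attr_id, const struct net_device *dev,
				void *attr_data)
{
	struct my_priv *priv = netdev_priv(dev);

	if (attr_id != IFLA_OFFLOAD_XSTATS_XDP)
		return -EINVAL;

	/* Fill a well-defined, uAPI-visible struct with the SW counters
	 * the driver accumulated (e.g. summed over all channels).
	 */
	my_sum_xdp_stats(priv, attr_data);
	return 0;
}

static const struct net_device_ops my_netdev_ops = {
	/* ... */
	.ndo_has_offload_stats	= my_has_offload_stats,
	.ndo_get_offload_stats	= my_get_offload_stats,
};
```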
On Thu, 5 Aug 2021 13:18:26 +0200 Alexander Lobakin wrote: > - encourage driver developers to use this new API when they want to > provide any XDP stats, not the Ethtool stats which are already > overburdened and mix HW, SW and whatnot in most of the complex > drivers. On the question of adding the stats in general I'd still ask for (a) performance analysis or (b) to start with just the exception stats which I'd think should be uncontroversial.
On 04/08/2021 18.44, Jakub Kicinski wrote:
> On Wed, 4 Aug 2021 10:17:56 -0600 David Ahern wrote:
>> On 8/4/21 6:36 AM, Jakub Kicinski wrote:
>
>> Does anyone have data that shows bumping a properly implemented counter
>> causes a noticeable performance degradation and if so by how much? You
>> mention 'yet another cacheline' but collecting stats on stack and
>> incrementing the driver structs at the end of the napi loop should not
>> have a huge impact versus the value the stats provide.
>
> Not sure, maybe Jesper has some numbers. Maybe Intel folks do?

(Sorry, behind on emails after vacation ... just partly answering inside this thread, not checking if you did a smart counter impl.)

I don't have exact numbers, but I hope Magnus (Intel) would be motivated to validate the performance degradation from this patchset. As far as I know, Intel is hunting the DPDK numbers with AF_XDP-zc, where every last cycle *does* count.

My experience is that counters can easily hurt performance without the developers noticing the small degradations. As Ahern sketched out above (stats on stack + end-of-napi-loop update), I do believe a smart counter implementation can hide this overhead (hopefully completely in the CPU's pipeline slots).

I do highly appreciate the effort to standardize the XDP stats! So, I do hope this can somehow move forward.

--Jesper
From: Saeed Mahameed <saeed@kernel.org> Date: Tue, 03 Aug 2021 16:57:22 -0700 [ snip ] > XDP is going to always be eBPF based ! why not just report such stats > to a special BPF_MAP ? BPF stack can collect the stats from the driver > and report them to this special MAP upon user request. I really dig this idea now. How do you see it? <ifindex:channel:stat_id> as a key and its value as a value or ...? [ snip ] Thanks, Al
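[Purely to make the <ifindex:channel:stat_id> question concrete — the idea is not pursued further in the thread — one possible layout for such a map. All names are hypothetical.]

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct xdp_stat_key {
	__u32 ifindex;
	__u32 channel;		/* 0 could mean "device-wide" */
	__u32 stat_id;		/* e.g. a pass/drop/redirect/... enum */
};

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
	__uint(max_entries, 16384);
	__type(key, struct xdp_stat_key);
	__type(value, __u64);
} xdp_std_stats SEC(".maps");
```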
From: Alexander Lobakin <alexandr.lobakin@intel.com> Date: Tue, 26 Oct 2021 11:23:23 +0200 > From: Saeed Mahameed <saeed@kernel.org> > Date: Tue, 03 Aug 2021 16:57:22 -0700 > > [ snip ] > > > XDP is going to always be eBPF based ! why not just report such stats > > to a special BPF_MAP ? BPF stack can collect the stats from the driver > > and report them to this special MAP upon user request. > > I really dig this idea now. How do you see it? > <ifindex:channel:stat_id> as a key and its value as a value or ...? Ideas, suggestions, anyone? > [ snip ] > > Thanks, > Al Thanks, Al
Alexander Lobakin <alexandr.lobakin@intel.com> writes: > From: Alexander Lobakin <alexandr.lobakin@intel.com> > Date: Tue, 26 Oct 2021 11:23:23 +0200 > >> From: Saeed Mahameed <saeed@kernel.org> >> Date: Tue, 03 Aug 2021 16:57:22 -0700 >> >> [ snip ] >> >> > XDP is going to always be eBPF based ! why not just report such stats >> > to a special BPF_MAP ? BPF stack can collect the stats from the driver >> > and report them to this special MAP upon user request. >> >> I really dig this idea now. How do you see it? >> <ifindex:channel:stat_id> as a key and its value as a value or ...? > > Ideas, suggestions, anyone? I don't like the idea of putting statistics in a map instead of the regular statistics counters. Sure, for bespoke things people want to put into their XDP programs, use a map, but for regular packet/byte counters, update the regular counters so XDP isn't "invisible". As Jesper pointed out, batching the updates so the global counters are only updated once per NAPI cycle is the way to avoid a huge performance overhead of this... -Toke
From: Toke Høiland-Jørgensen <toke@redhat.com> Date: Mon, 08 Nov 2021 12:37:54 +0100 > Alexander Lobakin <alexandr.lobakin@intel.com> writes: > > > From: Alexander Lobakin <alexandr.lobakin@intel.com> > > Date: Tue, 26 Oct 2021 11:23:23 +0200 > > > >> From: Saeed Mahameed <saeed@kernel.org> > >> Date: Tue, 03 Aug 2021 16:57:22 -0700 > >> > >> [ snip ] > >> > >> > XDP is going to always be eBPF based ! why not just report such stats > >> > to a special BPF_MAP ? BPF stack can collect the stats from the driver > >> > and report them to this special MAP upon user request. > >> > >> I really dig this idea now. How do you see it? > >> <ifindex:channel:stat_id> as a key and its value as a value or ...? > > > > Ideas, suggestions, anyone? > > I don't like the idea of putting statistics in a map instead of the > regular statistics counters. Sure, for bespoke things people want to put > into their XDP programs, use a map, but for regular packet/byte > counters, update the regular counters so XDP isn't "invisible". I wanted to provide an `ip link` command for getting these stats from maps and printing them in a usual format as well, but seems like that's an unneeded overcomplication of things since using maps for "regular"/"generic" XDP stats really has no reason except for "XDP means eBPF means maps". > As Jesper pointed out, batching the updates so the global counters are > only updated once per NAPI cycle is the way to avoid a huge performance > overhead of this... That's how I do things currently, seems to work just fine. > -Toke Thanks, Al
Alexander Lobakin <alexandr.lobakin@intel.com> writes: > From: Toke Høiland-Jørgensen <toke@redhat.com> > Date: Mon, 08 Nov 2021 12:37:54 +0100 > >> Alexander Lobakin <alexandr.lobakin@intel.com> writes: >> >> > From: Alexander Lobakin <alexandr.lobakin@intel.com> >> > Date: Tue, 26 Oct 2021 11:23:23 +0200 >> > >> >> From: Saeed Mahameed <saeed@kernel.org> >> >> Date: Tue, 03 Aug 2021 16:57:22 -0700 >> >> >> >> [ snip ] >> >> >> >> > XDP is going to always be eBPF based ! why not just report such stats >> >> > to a special BPF_MAP ? BPF stack can collect the stats from the driver >> >> > and report them to this special MAP upon user request. >> >> >> >> I really dig this idea now. How do you see it? >> >> <ifindex:channel:stat_id> as a key and its value as a value or ...? >> > >> > Ideas, suggestions, anyone? >> >> I don't like the idea of putting statistics in a map instead of the >> regular statistics counters. Sure, for bespoke things people want to put >> into their XDP programs, use a map, but for regular packet/byte >> counters, update the regular counters so XDP isn't "invisible". > > I wanted to provide an `ip link` command for getting these stats > from maps and printing them in a usual format as well, but seems > like that's an unneeded overcomplication of things since using > maps for "regular"/"generic" XDP stats really has no reason except > for "XDP means eBPF means maps". Yeah, don't really see why it would have to: to me, one of the benefits of XDP is being integrated closely with the kernel so we can have a "fast path" *without* reinventing everything... >> As Jesper pointed out, batching the updates so the global counters are >> only updated once per NAPI cycle is the way to avoid a huge performance >> overhead of this... > > That's how I do things currently, seems to work just fine. Awesome! -Toke
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 4711b96dae0c..62d617ad9f50 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -380,6 +380,36 @@ struct ethtool_rmon_stats { u64 hist_tx[ETHTOOL_RMON_HIST_MAX]; }; +/** + * struct ethtool_xdp_stats - standard driver-side XDP statistics + * @packets: number of frames passed to bpf_prog_run_xdp(). + * @errors: number of general XDP errors, if driver has one unified counter. + * @aborted: number of %XDP_ABORTED returns. + * @drop: number of %XDP_DROP returns. + * @invalid: number of returns of unallowed values (i.e. not XDP_*). + * @pass: number of %XDP_PASS returns. + * @redirect: number of successfully performed %XDP_REDIRECT requests. + * @redirect_errors: number of failed %XDP_REDIRECT requests. + * @tx: number of successfully performed %XDP_TX requests. + * @tx_errors: number of failed %XDP_TX requests. + * @xmit: number of xdp_frames successfully transmitted via .ndo_xdp_xmit(). + * @xmit_drops: number of frames dropped from .ndo_xdp_xmit(). + */ +struct ethtool_xdp_stats { + u64 packets; + u64 errors; + u64 aborted; + u64 drop; + u64 invalid; + u64 pass; + u64 redirect; + u64 redirect_errors; + u64 tx; + u64 tx_errors; + u64 xmit; + u64 xmit_drops; +}; + #define ETH_MODULE_EEPROM_PAGE_LEN 128 #define ETH_MODULE_MAX_I2C_ADDRESS 0x7f @@ -570,6 +600,9 @@ struct ethtool_module_eeprom { * @get_eth_ctrl_stats: Query some of the IEEE 802.3 MAC Ctrl statistics. * @get_rmon_stats: Query some of the RMON (RFC 2819) statistics. * Set %ranges to a pointer to zero-terminated array of byte ranges. + * @get_std_stats_channels: Get the number of channels which get_*_stats will + * return statistics for. + * @get_xdp_stats: Query some XDP statistics. * * All operations are optional (i.e. the function pointer may be set * to %NULL) and callers must take this into account. 
Callers must @@ -689,6 +722,9 @@ struct ethtool_ops { void (*get_rmon_stats)(struct net_device *dev, struct ethtool_rmon_stats *rmon_stats, const struct ethtool_rmon_hist_range **ranges); + int (*get_std_stats_channels)(struct net_device *dev, u32 sset); + void (*get_xdp_stats)(struct net_device *dev, + struct ethtool_xdp_stats *xdp_stats); }; int ethtool_check_ops(const struct ethtool_ops *ops); diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index 67aa7134b301..c3f1f3cde739 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -674,6 +674,7 @@ enum ethtool_link_ext_substate_cable_issue { * @ETH_SS_STATS_ETH_MAC: names of IEEE 802.3 MAC statistics * @ETH_SS_STATS_ETH_CTRL: names of IEEE 802.3 MAC Control statistics * @ETH_SS_STATS_RMON: names of RMON statistics + * @ETH_SS_STATS_XDP: names of XDP statistics * * @ETH_SS_COUNT: number of defined string sets */ @@ -699,6 +700,7 @@ enum ethtool_stringset { ETH_SS_STATS_ETH_MAC, ETH_SS_STATS_ETH_CTRL, ETH_SS_STATS_RMON, + ETH_SS_STATS_XDP, /* add new constants above here */ ETH_SS_COUNT diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index b3b93710eff7..f9d19cfa9397 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -716,6 +716,7 @@ enum { ETHTOOL_STATS_ETH_MAC, ETHTOOL_STATS_ETH_CTRL, ETHTOOL_STATS_RMON, + ETHTOOL_STATS_XDP, /* add new constants above here */ __ETHTOOL_STATS_CNT @@ -737,6 +738,8 @@ enum { ETHTOOL_A_STATS_GRP_HIST_BKT_HI, /* u32 */ ETHTOOL_A_STATS_GRP_HIST_VAL, /* u64 */ + ETHTOOL_A_STATS_GRP_STAT_BLOCK, /* nest */ + /* add new constants above here */ __ETHTOOL_A_STATS_GRP_CNT, ETHTOOL_A_STATS_GRP_MAX = (__ETHTOOL_A_STATS_CNT - 1) @@ -831,6 +834,37 @@ enum { ETHTOOL_A_STATS_RMON_MAX = (__ETHTOOL_A_STATS_RMON_CNT - 1) }; +enum { + /* Number of frames passed to bpf_prog_run_xdp() */ + ETHTOOL_A_STATS_XDP_PACKETS, + /* Number o general XDP errors if driver counts them together */ + ETHTOOL_A_STATS_XDP_ERRORS, + /* Number of %XDP_ABORTED returns */ + ETHTOOL_A_STATS_XDP_ABORTED, + /* Number of %XDP_DROP returns */ + ETHTOOL_A_STATS_XDP_DROP, + /* Number of returns of unallowed values (i.e. 
not XDP_*) */ + ETHTOOL_A_STATS_XDP_INVALID, + /* Number of %XDP_PASS returns */ + ETHTOOL_A_STATS_XDP_PASS, + /* Number of successfully performed %XDP_REDIRECT requests */ + ETHTOOL_A_STATS_XDP_REDIRECT, + /* Number of failed %XDP_REDIRECT requests */ + ETHTOOL_A_STATS_XDP_REDIRECT_ERRORS, + /* Number of successfully performed %XDP_TX requests */ + ETHTOOL_A_STATS_XDP_TX, + /* Number of failed %XDP_TX requests */ + ETHTOOL_A_STATS_XDP_TX_ERRORS, + /* Number of xdp_frames successfully transmitted via .ndo_xdp_xmit() */ + ETHTOOL_A_STATS_XDP_XMIT, + /* Number of frames dropped from .ndo_xdp_xmit() */ + ETHTOOL_A_STATS_XDP_XMIT_DROPS, + + /* Add new constants above here */ + __ETHTOOL_A_STATS_XDP_CNT, + ETHTOOL_A_STATS_XDP_MAX = (__ETHTOOL_A_STATS_XDP_CNT - 1) +}; + /* generic netlink info */ #define ETHTOOL_GENL_NAME "ethtool" #define ETHTOOL_GENL_VERSION 1 diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h index 077aac3929a8..c7983bba0b43 100644 --- a/net/ethtool/netlink.h +++ b/net/ethtool/netlink.h @@ -397,5 +397,6 @@ extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING extern const char stats_eth_mac_names[__ETHTOOL_A_STATS_ETH_MAC_CNT][ETH_GSTRING_LEN]; extern const char stats_eth_ctrl_names[__ETHTOOL_A_STATS_ETH_CTRL_CNT][ETH_GSTRING_LEN]; extern const char stats_rmon_names[__ETHTOOL_A_STATS_RMON_CNT][ETH_GSTRING_LEN]; +extern const char stats_xdp_names[__ETHTOOL_A_STATS_XDP_CNT][ETH_GSTRING_LEN]; #endif /* _NET_ETHTOOL_NETLINK_H */ diff --git a/net/ethtool/stats.c b/net/ethtool/stats.c index 8b5c27e034f9..76f2a78e8d02 100644 --- a/net/ethtool/stats.c +++ b/net/ethtool/stats.c @@ -19,6 +19,8 @@ struct stats_reply_data { struct ethtool_eth_ctrl_stats ctrl_stats; struct ethtool_rmon_stats rmon_stats; const struct ethtool_rmon_hist_range *rmon_ranges; + u32 xdp_stats_channels; + struct ethtool_xdp_stats *xdp_stats; }; #define STATS_REPDATA(__reply_base) \ @@ -29,6 +31,7 @@ const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN] = { [ETHTOOL_STATS_ETH_MAC] = "eth-mac", [ETHTOOL_STATS_ETH_CTRL] = "eth-ctrl", [ETHTOOL_STATS_RMON] = "rmon", + [ETHTOOL_STATS_XDP] = "xdp", }; const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN] = { @@ -73,6 +76,21 @@ const char stats_rmon_names[__ETHTOOL_A_STATS_RMON_CNT][ETH_GSTRING_LEN] = { [ETHTOOL_A_STATS_RMON_JABBER] = "etherStatsJabbers", }; +const char stats_xdp_names[__ETHTOOL_A_STATS_XDP_CNT][ETH_GSTRING_LEN] = { + [ETHTOOL_A_STATS_XDP_PACKETS] = "packets", + [ETHTOOL_A_STATS_XDP_ERRORS] = "errors", + [ETHTOOL_A_STATS_XDP_ABORTED] = "aborted", + [ETHTOOL_A_STATS_XDP_DROP] = "drop", + [ETHTOOL_A_STATS_XDP_INVALID] = "invalid", + [ETHTOOL_A_STATS_XDP_PASS] = "pass", + [ETHTOOL_A_STATS_XDP_REDIRECT] = "redirect", + [ETHTOOL_A_STATS_XDP_REDIRECT_ERRORS] = "redirect-errors", + [ETHTOOL_A_STATS_XDP_TX] = "tx", + [ETHTOOL_A_STATS_XDP_TX_ERRORS] = "tx-errors", + [ETHTOOL_A_STATS_XDP_XMIT] = "xmit", + [ETHTOOL_A_STATS_XDP_XMIT_DROPS] = "xmit-drops", +}; + const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1] = { [ETHTOOL_A_STATS_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy), @@ -101,6 +119,37 @@ static int stats_parse_request(struct ethnl_req_info *req_base, return 0; } +static int stats_prepare_data_xdp(struct net_device *dev, + struct stats_reply_data *data) +{ + const struct ethtool_ops *ops = dev->ethtool_ops; + size_t size; + int ret; + + /* Zero means stats are global/device-wide */ + data->xdp_stats_channels = 0; + + if (ops->get_std_stats_channels) { + ret = 
ops->get_std_stats_channels(dev, ETH_SS_STATS_XDP); + if (ret > 0) + data->xdp_stats_channels = ret; + } + + size = array_size(min_not_zero(data->xdp_stats_channels, 1U), + sizeof(*data->xdp_stats)); + if (unlikely(size == SIZE_MAX)) + return -EOVERFLOW; + + data->xdp_stats = kvmalloc(size, GFP_KERNEL); + if (!data->xdp_stats) + return -ENOMEM; + + memset(data->xdp_stats, 0xff, size); + ops->get_xdp_stats(dev, data->xdp_stats); + + return 0; +} + static int stats_prepare_data(const struct ethnl_req_info *req_base, struct ethnl_reply_data *reply_base, struct genl_info *info) @@ -125,6 +174,8 @@ static int stats_prepare_data(const struct ethnl_req_info *req_base, __ETHTOOL_A_STATS_ETH_CTRL_CNT); BUILD_BUG_ON(offsetof(typeof(data->rmon_stats), hist) / sizeof(u64) != __ETHTOOL_A_STATS_RMON_CNT); + BUILD_BUG_ON(sizeof(*data->xdp_stats) / sizeof(u64) != + __ETHTOOL_A_STATS_XDP_CNT); /* Mark all stats as unset (see ETHTOOL_STAT_NOT_SET) to prevent them * from being reported to user space in case driver did not set them. @@ -146,15 +197,19 @@ static int stats_prepare_data(const struct ethnl_req_info *req_base, if (test_bit(ETHTOOL_STATS_RMON, req_info->stat_mask) && ops->get_rmon_stats) ops->get_rmon_stats(dev, &data->rmon_stats, &data->rmon_ranges); + if (test_bit(ETHTOOL_STATS_XDP, req_info->stat_mask) && + ops->get_xdp_stats) + ret = stats_prepare_data_xdp(dev, data); ethnl_ops_complete(dev); - return 0; + return ret; } static int stats_reply_size(const struct ethnl_req_info *req_base, const struct ethnl_reply_data *reply_base) { const struct stats_req_info *req_info = STATS_REQINFO(req_base); + const struct stats_reply_data *data = STATS_REPDATA(reply_base); unsigned int n_grps = 0, n_stats = 0; int len = 0; @@ -180,6 +235,14 @@ static int stats_reply_size(const struct ethnl_req_info *req_base, nla_total_size(4)) * /* _A_STATS_GRP_HIST_BKT_HI */ ETHTOOL_RMON_HIST_MAX * 2; } + if (test_bit(ETHTOOL_STATS_XDP, req_info->stat_mask)) { + n_stats += min_not_zero(data->xdp_stats_channels, 1U) * + sizeof(*data->xdp_stats) / sizeof(u64); + n_grps++; + + len += (nla_total_size(0) * /* _A_STATS_GRP_STAT_BLOCK */ + data->xdp_stats_channels); + } len += n_grps * (nla_total_size(0) + /* _A_STATS_GRP */ nla_total_size(4) + /* _A_STATS_GRP_ID */ @@ -356,6 +419,64 @@ static int stats_put_rmon_stats(struct sk_buff *skb, return 0; } +static int stats_put_xdp_stats_one(struct sk_buff *skb, + const struct ethtool_xdp_stats *xdp_stats) +{ + if (stat_put(skb, ETHTOOL_A_STATS_XDP_PACKETS, + xdp_stats->packets) || + stat_put(skb, ETHTOOL_A_STATS_XDP_ABORTED, + xdp_stats->aborted) || + stat_put(skb, ETHTOOL_A_STATS_XDP_DROP, + xdp_stats->drop) || + stat_put(skb, ETHTOOL_A_STATS_XDP_INVALID, + xdp_stats->invalid) || + stat_put(skb, ETHTOOL_A_STATS_XDP_PASS, + xdp_stats->pass) || + stat_put(skb, ETHTOOL_A_STATS_XDP_REDIRECT, + xdp_stats->redirect) || + stat_put(skb, ETHTOOL_A_STATS_XDP_REDIRECT_ERRORS, + xdp_stats->redirect_errors) || + stat_put(skb, ETHTOOL_A_STATS_XDP_TX, + xdp_stats->tx) || + stat_put(skb, ETHTOOL_A_STATS_XDP_TX_ERRORS, + xdp_stats->tx_errors) || + stat_put(skb, ETHTOOL_A_STATS_XDP_XMIT, + xdp_stats->xmit) || + stat_put(skb, ETHTOOL_A_STATS_XDP_XMIT_DROPS, + xdp_stats->xmit_drops)) + return -EMSGSIZE; + + return 0; +} + +static int stats_put_xdp_stats(struct sk_buff *skb, + const struct stats_reply_data *data) +{ + struct nlattr *nest; + int i; + + if (!data->xdp_stats_channels) + return stats_put_xdp_stats_one(skb, data->xdp_stats); + + for (i = 0; i < data->xdp_stats_channels; i++) { + nest = 
nla_nest_start(skb, ETHTOOL_A_STATS_GRP_STAT_BLOCK); + if (!nest) + return -EMSGSIZE; + + if (stats_put_xdp_stats_one(skb, data->xdp_stats + i)) + goto err_cancel_xdp; + + nla_nest_end(skb, nest); + } + + return 0; + +err_cancel_xdp: + nla_nest_cancel(skb, nest); + + return -EMSGSIZE; +} + static int stats_put_stats(struct sk_buff *skb, const struct stats_reply_data *data, u32 id, u32 ss_id, @@ -406,10 +527,20 @@ static int stats_fill_reply(struct sk_buff *skb, if (!ret && test_bit(ETHTOOL_STATS_RMON, req_info->stat_mask)) ret = stats_put_stats(skb, data, ETHTOOL_STATS_RMON, ETH_SS_STATS_RMON, stats_put_rmon_stats); + if (!ret && test_bit(ETHTOOL_STATS_XDP, req_info->stat_mask)) + ret = stats_put_stats(skb, data, ETHTOOL_STATS_XDP, + ETH_SS_STATS_XDP, stats_put_xdp_stats); return ret; } +static void stats_cleanup_data(struct ethnl_reply_data *reply_data) +{ + struct stats_reply_data *data = STATS_REPDATA(reply_data); + + kvfree(data->xdp_stats); +} + const struct ethnl_request_ops ethnl_stats_request_ops = { .request_cmd = ETHTOOL_MSG_STATS_GET, .reply_cmd = ETHTOOL_MSG_STATS_GET_REPLY, @@ -421,4 +552,5 @@ const struct ethnl_request_ops ethnl_stats_request_ops = { .prepare_data = stats_prepare_data, .reply_size = stats_reply_size, .fill_reply = stats_fill_reply, + .cleanup_data = stats_cleanup_data, }; diff --git a/net/ethtool/strset.c b/net/ethtool/strset.c index 2d51b7ab4dc5..a149048f9c34 100644 --- a/net/ethtool/strset.c +++ b/net/ethtool/strset.c @@ -105,6 +105,11 @@ static const struct strset_info info_template[] = { .count = __ETHTOOL_A_STATS_RMON_CNT, .strings = stats_rmon_names, }, + [ETH_SS_STATS_XDP] = { + .per_dev = false, + .count = __ETHTOOL_A_STATS_XDP_CNT, + .strings = stats_xdp_names, + }, }; struct strset_req_info {
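[A sketch of how a driver might implement the two callbacks proposed above. Only .get_std_stats_channels, .get_xdp_stats, ETH_SS_STATS_XDP and struct ethtool_xdp_stats come from this patch; struct my_priv / my_ring and their fields are hypothetical, and a real driver would guard the counter reads with u64_stats_fetch_begin()/u64_stats_fetch_retry().]

```c
static int my_get_std_stats_channels(struct net_device *dev, u32 sset)
{
	struct my_priv *priv = netdev_priv(dev);

	switch (sset) {
	case ETH_SS_STATS_XDP:
		return priv->num_channels;	/* XDP stats kept per channel */
	default:
		return -EOPNOTSUPP;
	}
}

static void my_get_xdp_stats(struct net_device *dev,
			     struct ethtool_xdp_stats *xdp_stats)
{
	struct my_priv *priv = netdev_priv(dev);
	int i;

	/* The core pre-fills the array with ETHTOOL_STAT_NOT_SET, so only
	 * the counters this driver actually maintains need to be written.
	 */
	for (i = 0; i < priv->num_channels; i++) {
		const struct my_ring *ring = &priv->rx_rings[i];

		xdp_stats[i].packets  = ring->xdp.packets;
		xdp_stats[i].pass     = ring->xdp.pass;
		xdp_stats[i].drop     = ring->xdp.drop;
		xdp_stats[i].redirect = ring->xdp.redirect;
		xdp_stats[i].tx       = ring->xdp.tx;
		xdp_stats[i].xmit     = ring->xdp.xmit;
	}
}

static const struct ethtool_ops my_ethtool_ops = {
	/* ... */
	.get_std_stats_channels	= my_get_std_stats_channels,
	.get_xdp_stats		= my_get_xdp_stats,
};
```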