[v2,net-next,00/26] net: introduce and use generic XDP stats

Message ID	20211123163955.154512-1-alexandr.lobakin@intel.com (mailing list archive)
Headers	show Return-Path: <netdev-owner@kernel.org> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 98935C433FE for <netdev@archiver.kernel.org>; Tue, 23 Nov 2021 16:41:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233715AbhKWQo0 (ORCPT <rfc822;netdev@archiver.kernel.org>); Tue, 23 Nov 2021 11:44:26 -0500 Received: from mga14.intel.com ([192.55.52.115]:62210 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229992AbhKWQoZ (ORCPT <rfc822;netdev@vger.kernel.org>); Tue, 23 Nov 2021 11:44:25 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10177"; a="235294599" X-IronPort-AV: E=Sophos;i="5.87,258,1631602800"; d="scan'208";a="235294599" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Nov 2021 08:41:17 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,258,1631602800"; d="scan'208";a="591251752" Received: from irvmail001.ir.intel.com ([10.43.11.63]) by FMSMGA003.fm.intel.com with ESMTP; 23 Nov 2021 08:41:07 -0800 Received: from newjersey.igk.intel.com (newjersey.igk.intel.com [10.102.20.203]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 1ANGf4Wc016784; Tue, 23 Nov 2021 16:41:04 GMT From: Alexander Lobakin <alexandr.lobakin@intel.com> To: "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org> Cc: Alexander Lobakin <alexandr.lobakin@intel.com>, Jesse Brandeburg <jesse.brandeburg@intel.com>, Michal Swiatkowski <michal.swiatkowski@linux.intel.com>, Maciej Fijalkowski <maciej.fijalkowski@intel.com>, Jonathan Corbet <corbet@lwn.net>, Shay Agroskin <shayagr@amazon.com>, Arthur Kiyanovski <akiyano@amazon.com>, David Arinzon <darinzon@amazon.com>, Noam Dagan <ndagan@amazon.com>, Saeed Bishara <saeedb@amazon.com>, Ioana Ciornei <ioana.ciornei@nxp.com>, Claudiu Manoil <claudiu.manoil@nxp.com>, Tony Nguyen <anthony.l.nguyen@intel.com>, Thomas Petazzoni <thomas.petazzoni@bootlin.com>, Marcin Wojtas <mw@semihalf.com>, Russell King <linux@armlinux.org.uk>, Saeed Mahameed <saeedm@nvidia.com>, Leon Romanovsky <leon@kernel.org>, Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Jesper Dangaard Brouer <hawk@kernel.org>, =?utf-8?q?Toke_H=C3=B8iland-J?= =?utf-8?q?=C3=B8rgensen?= <toke@redhat.com>, John Fastabend <john.fastabend@gmail.com>, Edward Cree <ecree.xilinx@gmail.com>, Martin Habets <habetsm.xilinx@gmail.com>, "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>, KP Singh <kpsingh@kernel.org>, Lorenzo Bianconi <lorenzo@kernel.org>, Yajun Deng <yajun.deng@linux.dev>, Sergey Ryazanov <ryazanov.s.a@gmail.com>, David Ahern <dsahern@kernel.org>, Andrei Vagin <avagin@gmail.com>, Johannes Berg <johannes.berg@intel.com>, Vladimir Oltean <vladimir.oltean@nxp.com>, Cong Wang <cong.wang@bytedance.com>, netdev@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org, bpf@vger.kernel.org, virtualization@lists.linux-foundation.org Subject: [PATCH v2 net-next 00/26] net: introduce and use generic XDP stats Date: Tue, 23 Nov 2021 17:39:29 +0100 Message-Id: <20211123163955.154512-1-alexandr.lobakin@intel.com> X-Mailer: git-send-email 2.33.1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: <netdev.vger.kernel.org> X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org
Series	net: introduce and use generic XDP stats \| expand [v2,net-next,00/26] net: introduce and use generic XDP stats [v2,net-next,01/26] rtnetlink: introduce generic XDP statistics [v2,net-next,02/26] xdp: provide common driver helpers for implementing XDP stats [v2,net-next,03/26] ena: implement generic XDP statistics callbacks [v2,net-next,04/26] dpaa2: implement generic XDP stats callbacks [v2,net-next,05/26] enetc: implement generic XDP stats callbacks [v2,net-next,06/26] mvneta: reformat mvneta_netdev_ops [v2,net-next,07/26] mvneta: add .ndo_get_xdp_stats() callback [v2,net-next,08/26] mvpp2: provide .ndo_get_xdp_stats() callback [v2,net-next,09/26] mlx5: don't mix XDP_DROP and Rx XDP error cases [v2,net-next,10/26] mlx5: provide generic XDP stats callbacks [v2,net-next,11/26] sf100, sfx: implement generic XDP stats callbacks [v2,net-next,12/26] veth: don't mix XDP_DROP counter with Rx XDP errors [v2,net-next,13/26] veth: drop 'xdp_' suffix from packets and bytes stats [v2,net-next,14/26] veth: reformat veth_netdev_ops [v2,net-next,15/26] veth: add generic XDP stats callbacks [v2,net-next,16/26] virtio_net: don't mix XDP_DROP counter with Rx XDP errors [v2,net-next,17/26] virtio_net: rename xdp_tx{,_drops} SQ stats to xdp_xmit{,_errors} [v2,net-next,18/26] virtio_net: reformat virtnet_netdev [v2,net-next,19/26] virtio_net: add callbacks for generic XDP stats [v2,net-next,20/26] i40e: add XDP and XSK generic per-channel statistics [v2,net-next,21/26] ice: add XDP and XSK generic per-channel statistics [v2,net-next,22/26] igb: add XDP generic per-channel statistics [v2,net-next,23/26] igc: bail out early on XSK xmit if no descs are available [v2,net-next,24/26] igc: add XDP and XSK generic per-channel statistics [v2,net-next,25/26] ixgbe: add XDP and XSK generic per-channel statistics [v2,net-next,26/26] Documentation: reflect generic XDP statistics

Alexander Lobakin Nov. 23, 2021, 4:39 p.m. UTC

This is an almost complete rework of [0].

This series introduces generic XDP statistics infra based on rtnl
xstats (Ethtool standard stats previously), and wires up the drivers
which collect appropriate statistics to this new interface. Finally,
it introduces XDP/XSK statistics to all XDP-capable Intel drivers.

Those counters are:
* packets: number of frames passed to bpf_prog_run_xdp().
* bytes: number of bytes went through bpf_prog_run_xdp().
* errors: number of general XDP errors, if driver has one unified
  counter.
* aborted: number of XDP_ABORTED returns.
* drop: number of XDP_DROP returns.
* invalid: number of returns of unallowed values (i.e. not XDP_*).
* pass: number of XDP_PASS returns.
* redirect: number of successfully performed XDP_REDIRECT requests.
* redirect_errors: number of failed XDP_REDIRECT requests.
* tx: number of successfully performed XDP_TX requests.
* tx_errors: number of failed XDP_TX requests.
* xmit_packets: number of successfully transmitted XDP/XSK frames.
* xmit_bytes: number of successfully transmitted XDP/XSK frames.
* xmit_errors: of XDP/XSK frames failed to transmit.
* xmit_full: number of XDP/XSK queue being full at the moment of
  transmission.

To provide them, developers need to implement .ndo_get_xdp_stats()
and, if they want to expose stats on a per-channel basis,
.ndo_get_xdp_stats_nch(). include/net/xdp.h contains some helper
structs and functions which might be useful for doing this.
It is up to developers to decide whether to implement XDP stats in
their drivers or not, depending on the needs and so on, but if so,
it is implied that they will be implemented using this new infra
rather than custom Ethtool entries. XDP stats {,type} list can be
expanded if needed as counters are being provided as nested NL attrs.
There's an option to provide XDP and XSK counters separately, and I
used it in mlx5 (has separate RQs and SQs) and Intel (have separate
NAPI poll routines) drivers.

Example output of iproute2's new command:

$ ip link xdpstats dev enp178s0
16: enp178s0:
xdp-channel0-rx_xdp_packets: 0
xdp-channel0-rx_xdp_bytes: 1
xdp-channel0-rx_xdp_errors: 2
xdp-channel0-rx_xdp_aborted: 3
xdp-channel0-rx_xdp_drop: 4
xdp-channel0-rx_xdp_invalid: 5
xdp-channel0-rx_xdp_pass: 6
xdp-channel0-rx_xdp_redirect: 7
xdp-channel0-rx_xdp_redirect_errors: 8
xdp-channel0-rx_xdp_tx: 9
xdp-channel0-rx_xdp_tx_errors: 10
xdp-channel0-tx_xdp_xmit_packets: 11
xdp-channel0-tx_xdp_xmit_bytes: 12
xdp-channel0-tx_xdp_xmit_errors: 13
xdp-channel0-tx_xdp_xmit_full: 14
[ ... ]

This series doesn't touch existing Ethtool-exposed XDP stats due
to the potential breakage of custom{,er} scripts and other stuff
that might be hardwired on their presence. Developers are free to
drop them on their own if possible. In ideal case we would see
Ethtool stats free from anything related to XDP, but it's unlikely
to happen (:

XDP_PASS kpps on an ice NIC with ~50 Gbps line rate:

Frame size     64    | 128   | 256   | 512   | 1024 | 1532
----------------------------------------------------------
net-next       23557 | 23750 | 20731 | 11723 | 6270 | 4377
This series    23484 | 23812 | 20679 | 11720 | 6270 | 4377

The same situation with XDP_DROP and several more common cases:
nothing past stddev (which is a bit surprising, but not complaining
at all).

A brief series breakdown:
 * 1-2: introduce new infra and APIs, for rtnetlink and drivers
   respectively;
 * 3-19: add needed callback to the existing drivers to export
   their stats using new infra. Some of the patches are cosmetic
   prereqs;
 * 20-25: add XDP/XSK stats to all Intel drivers;
 * 26: mention generic XDP stats in Documentation.

This set is also available here: [1]
A separate iproute2-next patch will be published in parallel,
for now you can find it here: [2]

From v1 [0]:
 - use rtnl xstats instead of Ethtool standard -- XDP stats are
   purely software while Ethtool infra was designed for HW stats
   (Jakub);
 - split xmit into xmit_packets and xmit_bytes, add xmit_full;
 - don't touch existing drivers custom XDP stats exposed via
   Ethtool (Jakub);
 - add a bunch of helper structs and functions to reduce boilerplates
   and code duplication in new drivers;
 - merge with the series which adds XDP stats to Intel drivers;
 - add some perf numbers per Jakub's request.

[0] https://lore.kernel.org/all/20210803163641.3743-1-alexandr.lobakin@intel.com
[1] https://github.com/alobakin/linux/pull/11
[2] https://github.com/alobakin/iproute2/pull/1

Alexander Lobakin (26):
  rtnetlink: introduce generic XDP statistics
  xdp: provide common driver helpers for implementing XDP stats
  ena: implement generic XDP statistics callbacks
  dpaa2: implement generic XDP stats callbacks
  enetc: implement generic XDP stats callbacks
  mvneta: reformat mvneta_netdev_ops
  mvneta: add .ndo_get_xdp_stats() callback
  mvpp2: provide .ndo_get_xdp_stats() callback
  mlx5: don't mix XDP_DROP and Rx XDP error cases
  mlx5: provide generic XDP stats callbacks
  sf100, sfx: implement generic XDP stats callbacks
  veth: don't mix XDP_DROP counter with Rx XDP errors
  veth: drop 'xdp_' suffix from packets and bytes stats
  veth: reformat veth_netdev_ops
  veth: add generic XDP stats callbacks
  virtio_net: don't mix XDP_DROP counter with Rx XDP errors
  virtio_net: rename xdp_tx{,_drops} SQ stats to xdp_xmit{,_errors}
  virtio_net: reformat virtnet_netdev
  virtio_net: add callbacks for generic XDP stats
  i40e: add XDP and XSK generic per-channel statistics
  ice: add XDP and XSK generic per-channel statistics
  igb: add XDP generic per-channel statistics
  igc: bail out early on XSK xmit if no descs are available
  igc: add XDP and XSK generic per-channel statistics
  ixgbe: add XDP and XSK generic per-channel statistics
  Documentation: reflect generic XDP statistics

 Documentation/networking/statistics.rst       |  33 +++
 drivers/net/ethernet/amazon/ena/ena_netdev.c  |  53 ++++
 .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  |  45 +++
 drivers/net/ethernet/freescale/enetc/enetc.c  |  48 ++++
 drivers/net/ethernet/freescale/enetc/enetc.h  |   3 +
 .../net/ethernet/freescale/enetc/enetc_pf.c   |   2 +
 drivers/net/ethernet/intel/i40e/i40e.h        |   1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  38 ++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  40 ++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |   1 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    |  33 ++-
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_lib.c      |  21 ++
 drivers/net/ethernet/intel/ice/ice_main.c     |  17 ++
 drivers/net/ethernet/intel/ice/ice_txrx.c     |  33 ++-
 drivers/net/ethernet/intel/ice/ice_txrx.h     |  12 +-
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c |   3 +
 drivers/net/ethernet/intel/ice/ice_xsk.c      |  51 +++-
 drivers/net/ethernet/intel/igb/igb.h          |  14 +-
 drivers/net/ethernet/intel/igb/igb_main.c     | 102 ++++++-
 drivers/net/ethernet/intel/igc/igc.h          |   3 +
 drivers/net/ethernet/intel/igc/igc_main.c     |  89 +++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |   3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  69 ++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  |  56 +++-
 drivers/net/ethernet/marvell/mvneta.c         |  78 +++++-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  51 ++++
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   5 +
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |   3 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   2 +
 .../ethernet/mellanox/mlx5/core/en_stats.c    |  76 +++++
 .../ethernet/mellanox/mlx5/core/en_stats.h    |   3 +
 drivers/net/ethernet/sfc/ef100_netdev.c       |   2 +
 drivers/net/ethernet/sfc/efx.c                |   2 +
 drivers/net/ethernet/sfc/efx_common.c         |  42 +++
 drivers/net/ethernet/sfc/efx_common.h         |   3 +
 drivers/net/veth.c                            | 128 +++++++--
 drivers/net/virtio_net.c                      | 104 +++++--
 include/linux/if_link.h                       |  39 ++-
 include/linux/netdevice.h                     |  13 +
 include/net/xdp.h                             | 162 +++++++++++
 include/uapi/linux/if_link.h                  |  67 +++++
 net/core/rtnetlink.c                          | 264 ++++++++++++++++++
 net/core/xdp.c                                | 124 ++++++++
 45 files changed, 1798 insertions(+), 143 deletions(-)

--
2.33.1

David Ahern Nov. 28, 2021, 10:23 p.m. UTC | #1

On 11/23/21 9:39 AM, Alexander Lobakin wrote:
> This is an almost complete rework of [0].
> 
> This series introduces generic XDP statistics infra based on rtnl
> xstats (Ethtool standard stats previously), and wires up the drivers
> which collect appropriate statistics to this new interface. Finally,
> it introduces XDP/XSK statistics to all XDP-capable Intel drivers.
> 
> Those counters are:
> * packets: number of frames passed to bpf_prog_run_xdp().
> * bytes: number of bytes went through bpf_prog_run_xdp().
> * errors: number of general XDP errors, if driver has one unified
>   counter.
> * aborted: number of XDP_ABORTED returns.
> * drop: number of XDP_DROP returns.
> * invalid: number of returns of unallowed values (i.e. not XDP_*).
> * pass: number of XDP_PASS returns.
> * redirect: number of successfully performed XDP_REDIRECT requests.
> * redirect_errors: number of failed XDP_REDIRECT requests.
> * tx: number of successfully performed XDP_TX requests.
> * tx_errors: number of failed XDP_TX requests.
> * xmit_packets: number of successfully transmitted XDP/XSK frames.
> * xmit_bytes: number of successfully transmitted XDP/XSK frames.
> * xmit_errors: of XDP/XSK frames failed to transmit.
> * xmit_full: number of XDP/XSK queue being full at the moment of
>   transmission.
> 
> To provide them, developers need to implement .ndo_get_xdp_stats()
> and, if they want to expose stats on a per-channel basis,
> .ndo_get_xdp_stats_nch(). include/net/xdp.h contains some helper

Why the tie to a channel? There are Rx queues and Tx queues and no
requirement to link them into a channel. It would be better (more
flexible) to allow them to be independent. Rather than ask the driver
"how many channels", ask 'how many Rx queues' and 'how many Tx queues'
for which xdp stats are reported.

From there, allow queue numbers or queue id's to be non-consecutive and
add a queue id or number as an attribute. e.g.,

[XDP stats]
	[ Rx queue N]
		counters


	[ Tx queue N]
		counters

This would allow a follow on patch set to do something like "Give me XDP
stats for Rx queue N" instead of doing a full dump.

Alexander Lobakin Nov. 30, 2021, 3:56 p.m. UTC | #2

From: Alexander Lobakin <alexandr.lobakin@intel.com>
Date: Tue, 23 Nov 2021 17:39:29 +0100

Ok, open questions:

1. Channels vs queues vs global.

Jakub: no per-channel.
David (Ahern): it's worth it to separate as Rx/Tx.
Toke is fine with globals at the end I think?

My point was that for most of the systems we have 1:1 Rx:Tx
(usually num_online_cpus()), so asking drivers separately for
the number of RQs and then SQs would end up asking for the same
number twice.
But the main reason TBH was that most of the drivers store stats
on a per-channel basis and I didn't want them to regress in
functionality. I'm fine with reporting only netdev-wide if
everyone are.

In case if we keep per-channel: report per-channel only by request
and cumulative globals by default to not flood the output?

2. Count all errors as "drops" vs separately.

Daniel: account everything as drops, plus errors should be
reported as exceptions for tracing sub.
Jesper: we shouldn't mix drops and errors.

My point: we shouldn't, that's why there are patches for 2 drivers
to give errors a separate counter.
I provided an option either to report all errors together ('errors'
in stats structure) or to provide individual counters for each of
them (sonamed ctrs), but personally prefer detailed errors. However,
they might "go detailed" under trace_xdp_exception() only, sound
fine (OTOH in RTNL stats we have both "general" errors and detailed
error counters).

3. XDP and XSK ctrs separately or not.

My PoV is that those are two quite different worlds.
However, stats for actions on XSK really make a little sense since
99% of time we have xskmap redirect. So I think it'd be fine to just
expand stats structure with xsk_{rx,tx}_{packets,bytes} and count
the rest (actions, errors) together with XDP.

Rest:
 - don't create a separate `ip` command and report under `-s`;
 - save some RTNL skb space by skipping zeroed counters.

Also, regarding that I count all on the stack and then add to the
storage once in a polling cycle -- most drivers don't do that and
just increment the values in the storage directly, but this can be
less performant for frequently updated stats (or it's just my
embedded past).
Re u64 vs u64_stats_t -- the latter is more universal and
architecture-friendly, the former is used directly in most of the
drivers primarily because those drivers and the corresponding HW
are being run on 64-bit systems in the vast majority of cases, and
Ethtools stats themselves are not so critical to guard them with
anti-tearing. Anyways, local64_t is cheap on ARM64/x86_64 I guess?

Thanks,
Al

Jakub Kicinski Nov. 30, 2021, 4:12 p.m. UTC | #3

On Tue, 30 Nov 2021 16:56:12 +0100 Alexander Lobakin wrote:
> 3. XDP and XSK ctrs separately or not.
> 
> My PoV is that those are two quite different worlds.
> However, stats for actions on XSK really make a little sense since
> 99% of time we have xskmap redirect. So I think it'd be fine to just
> expand stats structure with xsk_{rx,tx}_{packets,bytes} and count
> the rest (actions, errors) together with XDP.
> 
> 
> Rest:
>  - don't create a separate `ip` command and report under `-s`;
>  - save some RTNL skb space by skipping zeroed counters.

Let me ruin this point of clarity for you. I think that stats should 
be skipped when they are not collected (see ETHTOOL_STAT_NOT_SET).
If messages get large user should use the GETSTATS call and avoid 
the problem more effectively.

> Also, regarding that I count all on the stack and then add to the
> storage once in a polling cycle -- most drivers don't do that and
> just increment the values in the storage directly, but this can be
> less performant for frequently updated stats (or it's just my
> embedded past).
> Re u64 vs u64_stats_t -- the latter is more universal and
> architecture-friendly, the former is used directly in most of the
> drivers primarily because those drivers and the corresponding HW
> are being run on 64-bit systems in the vast majority of cases, and
> Ethtools stats themselves are not so critical to guard them with
> anti-tearing. Anyways, local64_t is cheap on ARM64/x86_64 I guess?

Toke Høiland-Jørgensen Nov. 30, 2021, 4:17 p.m. UTC | #4

Alexander Lobakin <alexandr.lobakin@intel.com> writes:

> From: Alexander Lobakin <alexandr.lobakin@intel.com>
> Date: Tue, 23 Nov 2021 17:39:29 +0100
>
> Ok, open questions:
>
> 1. Channels vs queues vs global.
>
> Jakub: no per-channel.
> David (Ahern): it's worth it to separate as Rx/Tx.
> Toke is fine with globals at the end I think?

Well, I don't like throwing data away, so in that sense I do like
per-queue stats, but it's not a very strong preference (i.e., I can live
with either)...

> My point was that for most of the systems we have 1:1 Rx:Tx
> (usually num_online_cpus()), so asking drivers separately for
> the number of RQs and then SQs would end up asking for the same
> number twice.
> But the main reason TBH was that most of the drivers store stats
> on a per-channel basis and I didn't want them to regress in
> functionality. I'm fine with reporting only netdev-wide if
> everyone are.
>
> In case if we keep per-channel: report per-channel only by request
> and cumulative globals by default to not flood the output?

... however if we do go with per-channel stats I do agree that they
shouldn't be in the default output. I guess netlink could still split
them out and iproute2 could just sum them before display?

> 2. Count all errors as "drops" vs separately.
>
> Daniel: account everything as drops, plus errors should be
> reported as exceptions for tracing sub.
> Jesper: we shouldn't mix drops and errors.
>
> My point: we shouldn't, that's why there are patches for 2 drivers
> to give errors a separate counter.
> I provided an option either to report all errors together ('errors'
> in stats structure) or to provide individual counters for each of
> them (sonamed ctrs), but personally prefer detailed errors. However,
> they might "go detailed" under trace_xdp_exception() only, sound
> fine (OTOH in RTNL stats we have both "general" errors and detailed
> error counters).

I agree it would be nice to have a separate error counter, but a single
counter is enough when combined with the tracepoints.

> 3. XDP and XSK ctrs separately or not.
>
> My PoV is that those are two quite different worlds.
> However, stats for actions on XSK really make a little sense since
> 99% of time we have xskmap redirect. So I think it'd be fine to just
> expand stats structure with xsk_{rx,tx}_{packets,bytes} and count
> the rest (actions, errors) together with XDP.

A whole set of separate counters for XSK is certainly overkill. No
strong preference as to whether they need a separate counter at all...

> Rest:
>  - don't create a separate `ip` command and report under `-s`;
>  - save some RTNL skb space by skipping zeroed counters.
>
> Also, regarding that I count all on the stack and then add to the
> storage once in a polling cycle -- most drivers don't do that and
> just increment the values in the storage directly, but this can be
> less performant for frequently updated stats (or it's just my
> embedded past).
> Re u64 vs u64_stats_t -- the latter is more universal and
> architecture-friendly, the former is used directly in most of the
> drivers primarily because those drivers and the corresponding HW
> are being run on 64-bit systems in the vast majority of cases, and
> Ethtools stats themselves are not so critical to guard them with
> anti-tearing. Anyways, local64_t is cheap on ARM64/x86_64 I guess?

I'm generally a fan of correctness first, so since you're touching all
the drivers anyway why I'd say go for u64_stats_t :)

-Toke

Alexander Lobakin Nov. 30, 2021, 4:34 p.m. UTC | #5

From: Jakub Kicinski <kuba@kernel.org>
Date: Tue, 30 Nov 2021 08:12:07 -0800

> On Tue, 30 Nov 2021 16:56:12 +0100 Alexander Lobakin wrote:
> > 3. XDP and XSK ctrs separately or not.
> > 
> > My PoV is that those are two quite different worlds.
> > However, stats for actions on XSK really make a little sense since
> > 99% of time we have xskmap redirect. So I think it'd be fine to just
> > expand stats structure with xsk_{rx,tx}_{packets,bytes} and count
> > the rest (actions, errors) together with XDP.
> > 
> > 
> > Rest:
> >  - don't create a separate `ip` command and report under `-s`;
> >  - save some RTNL skb space by skipping zeroed counters.
> 
> Let me ruin this point of clarity for you. I think that stats should 
> be skipped when they are not collected (see ETHTOOL_STAT_NOT_SET).
> If messages get large user should use the GETSTATS call and avoid 
> the problem more effectively.

Well, it was Dave's thought here: [0]

> Another thought on this patch: with individual attributes you could save
> some overhead by not sending 0 counters to userspace. e.g., define a
> helper that does:

I know about ETHTOOL_STAT_NOT_SET, but RTNL xstats doesn't use this,
does it?
GETSTATS is another thing, and I'll use it, thanks.

> 
> > Also, regarding that I count all on the stack and then add to the
> > storage once in a polling cycle -- most drivers don't do that and
> > just increment the values in the storage directly, but this can be
> > less performant for frequently updated stats (or it's just my
> > embedded past).
> > Re u64 vs u64_stats_t -- the latter is more universal and
> > architecture-friendly, the former is used directly in most of the
> > drivers primarily because those drivers and the corresponding HW
> > are being run on 64-bit systems in the vast majority of cases, and
> > Ethtools stats themselves are not so critical to guard them with
> > anti-tearing. Anyways, local64_t is cheap on ARM64/x86_64 I guess?

[0] https://lore.kernel.org/netdev/a4602b15-25b1-c388-73b4-1f97f6f0e555@gmail.com/

Al

Jakub Kicinski Nov. 30, 2021, 5:04 p.m. UTC | #6

On Tue, 30 Nov 2021 17:34:54 +0100 Alexander Lobakin wrote:
> > Another thought on this patch: with individual attributes you could save
> > some overhead by not sending 0 counters to userspace. e.g., define a
> > helper that does:  
> 
> I know about ETHTOOL_STAT_NOT_SET, but RTNL xstats doesn't use this,
> does it?

Not sure if you're asking me or Dave but no, to my knowledge RTNL does
not use such semantics today. But the reason is mostly because there
weren't many driver stats added there. Knowing if an error counter is
not supported or supporter and 0 is important for monitoring. Even if
XDP stats don't have a counter which may not be supported today it's
not a good precedent to make IMO.

Jakub Kicinski Nov. 30, 2021, 5:07 p.m. UTC | #7

On Tue, 30 Nov 2021 17:17:24 +0100 Toke Høiland-Jørgensen wrote:
> > 1. Channels vs queues vs global.
> >
> > Jakub: no per-channel.
> > David (Ahern): it's worth it to separate as Rx/Tx.
> > Toke is fine with globals at the end I think?  
> 
> Well, I don't like throwing data away, so in that sense I do like
> per-queue stats, but it's not a very strong preference (i.e., I can live
> with either)...

We don't even have a clear definition of a queue in Linux.

As I said, adding this API today without a strong user and letting
drivers diverge in behavior would be a mistake.

David Ahern Nov. 30, 2021, 5:38 p.m. UTC | #8

On 11/30/21 10:04 AM, Jakub Kicinski wrote:
> On Tue, 30 Nov 2021 17:34:54 +0100 Alexander Lobakin wrote:
>>> Another thought on this patch: with individual attributes you could save
>>> some overhead by not sending 0 counters to userspace. e.g., define a
>>> helper that does:  
>>
>> I know about ETHTOOL_STAT_NOT_SET, but RTNL xstats doesn't use this,
>> does it?
> 
> Not sure if you're asking me or Dave but no, to my knowledge RTNL does
> not use such semantics today. But the reason is mostly because there
> weren't many driver stats added there. Knowing if an error counter is
> not supported or supporter and 0 is important for monitoring. Even if
> XDP stats don't have a counter which may not be supported today it's
> not a good precedent to make IMO.
> 

Today, stats are sent as a struct so skipping stats whose value is 0 is
not an option. When using individual attributes for the counters this
becomes an option. Given there is no value in sending '0' why do it?

Is your pushback that there should be a uapi to opt-in to this behavior?

David Ahern Nov. 30, 2021, 5:45 p.m. UTC | #9

On 11/30/21 8:56 AM, Alexander Lobakin wrote:
> Rest:
>  - don't create a separate `ip` command and report under `-s`;

Reporting XDP stats under 'ip -s' is not going to be scalable from a
readability perspective.

ifstat (misc/ifstat.c) has support for extended stats which is where you
are adding these.

David Ahern Nov. 30, 2021, 5:56 p.m. UTC | #10

On 11/30/21 10:07 AM, Jakub Kicinski wrote:
> On Tue, 30 Nov 2021 17:17:24 +0100 Toke Høiland-Jørgensen wrote:
>>> 1. Channels vs queues vs global.
>>>
>>> Jakub: no per-channel.
>>> David (Ahern): it's worth it to separate as Rx/Tx.
>>> Toke is fine with globals at the end I think?  
>>
>> Well, I don't like throwing data away, so in that sense I do like
>> per-queue stats, but it's not a very strong preference (i.e., I can live
>> with either)...
> 
> We don't even have a clear definition of a queue in Linux.
> 

The summary above says "Jakub: no per-channel", and then you have this
comment about a clear definition of a queue. What is your preference
here, Jakub? I think I have gotten lost in all of the coments.

My request was just to not lump Rx and Tx together under a 'channel'
definition as a new API. Proposals like zctap and 'queues as a first
class citizen' are examples of intentions / desires to move towards Rx
and Tx queues beyond what exists today.

Jakub Kicinski Nov. 30, 2021, 7:46 p.m. UTC | #11

On Tue, 30 Nov 2021 10:38:14 -0700 David Ahern wrote:
> On 11/30/21 10:04 AM, Jakub Kicinski wrote:
> > On Tue, 30 Nov 2021 17:34:54 +0100 Alexander Lobakin wrote:  
> >> I know about ETHTOOL_STAT_NOT_SET, but RTNL xstats doesn't use this,
> >> does it?  
> > 
> > Not sure if you're asking me or Dave but no, to my knowledge RTNL does
> > not use such semantics today. But the reason is mostly because there
> > weren't many driver stats added there. Knowing if an error counter is
> > not supported or supporter and 0 is important for monitoring. Even if
> > XDP stats don't have a counter which may not be supported today it's
> > not a good precedent to make IMO.
> 
> Today, stats are sent as a struct so skipping stats whose value is 0 is
> not an option. When using individual attributes for the counters this
> becomes an option. Given there is no value in sending '0' why do it?

To establish semantics of what it means that the statistic is not
reported. If we need to save space we'd need an extra attr with 
a bitmap of "these stats were skipped because they were zero".
Or conversely some way of querying supported stats.

> Is your pushback that there should be a uapi to opt-in to this behavior?

Not where I was going with it, but it is an option. If skipping 0s was
controlled by a flag a dump without such flag set would basically serve
as a way to query supported stats.

Jakub Kicinski Nov. 30, 2021, 7:53 p.m. UTC | #12

On Tue, 30 Nov 2021 10:56:26 -0700 David Ahern wrote:
> On 11/30/21 10:07 AM, Jakub Kicinski wrote:
> > On Tue, 30 Nov 2021 17:17:24 +0100 Toke Høiland-Jørgensen wrote:  
> >> Well, I don't like throwing data away, so in that sense I do like
> >> per-queue stats, but it's not a very strong preference (i.e., I can live
> >> with either)...  
> > 
> > We don't even have a clear definition of a queue in Linux.
> >   
> 
> The summary above says "Jakub: no per-channel", and then you have this
> comment about a clear definition of a queue. What is your preference
> here, Jakub? I think I have gotten lost in all of the coments.

I'm against per-channel and against per-queue stats. I'm not saying "do
one instead of the other". Hope that makes it clear.

> My request was just to not lump Rx and Tx together under a 'channel'
> definition as a new API. Proposals like zctap and 'queues as a first
> class citizen' are examples of intentions / desires to move towards Rx
> and Tx queues beyond what exists today.

Right, and when we have the objects to control those we'll hang the
stats off them. Right now half of the NICs will destroy queue stats
on random reconfiguration requests, others will mix the stats between
queue instantiations.. mlx5 does it's shadow queue thing. It's a mess.
uAPI which is not portable and not usable in production is pure burden.

Jamal Hadi Salim Dec. 1, 2021, 3:21 p.m. UTC | #13

On 2021-11-30 12:38, David Ahern wrote:
> Today, stats are sent as a struct so skipping stats whose value is 0 is
> not an option. When using individual attributes for the counters this
> becomes an option. Given there is no value in sending '0' why do it?
> 
> Is your pushback that there should be a uapi to opt-in to this behavior?

A filter in the netlink request should help pick what is user-preferred.
You can default to not sending zeros.

cheers,
jamal

[v2,net-next,00/26] net: introduce and use generic XDP stats

Message

Comments