diff mbox series

[v2,net-next,6/6] sfc: add per-queue RX and TX bytes stats

Message ID fe0d5819436883d3ba74a5103325de741d6c3005.1725550155.git.ecree.xilinx@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series sfc: per-queue stats

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 17 this patch: 17
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 17 this patch: 17
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 22 this patch: 22
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 144 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 60 this patch: 60
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-09-06--21-00 (tests: 721)

Commit Message

edward.cree@amd.com Sept. 5, 2024, 3:41 p.m. UTC
From: Edward Cree <ecree.xilinx@gmail.com>

While this does add overhead to the fast path, it should be minimal
 as the cacheline should already be held for write from updating the
 queue's [tr]x_packets stat.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
---
 drivers/net/ethernet/sfc/ef100_rx.c   |  1 +
 drivers/net/ethernet/sfc/ef100_tx.c   |  1 +
 drivers/net/ethernet/sfc/efx.c        | 10 ++++++++++
 drivers/net/ethernet/sfc/net_driver.h | 10 ++++++++++
 drivers/net/ethernet/sfc/rx.c         |  1 +
 drivers/net/ethernet/sfc/rx_common.c  |  1 +
 drivers/net/ethernet/sfc/tx.c         |  2 ++
 drivers/net/ethernet/sfc/tx_common.c  |  1 +
 8 files changed, 27 insertions(+)

Comments

Jakub Kicinski Sept. 7, 2024, 2:03 a.m. UTC | #1
On Thu, 5 Sep 2024 16:41:35 +0100 edward.cree@amd.com wrote:
>   * @tx_packets: Number of packets sent since this struct was created

I think it's number of packets "enqueued", but the doc says:

        name: tx-packets
        doc: |
          Number of wire packets successfully sent. Packet is considered to be
          successfully sent once it is in device memory (usually this means
          the device has issued a DMA completion for the packet).

Not the end of the world if you prefer to keep as is, but if so maybe
just acknowledge in commit message or a code comment that this is not
100% in line with the definition?

> + * @tx_bytes: Number of bytes sent since this struct was created.  For TSO,
> + *	counts the superframe size, not the sizes of generated frames on the
> + *	wire (i.e. the headers are only counted once)

Hm. Hm. This is technically not documented but my intuition is that
tx_bytes should count wire bytes. tx_packets counts segments / wire
packets, looking at ef100_tx.c 
qstats "bytes" should be the same kind of bytes as counted by the MAC.
That way we can hopefully see how many packets "enter" the device from
queues, and how many "leave" via the MAC. Helping to calculate drops 
at various stages. That matters more for packets than bytes, but still..

Edward Cree Sept. 10, 2024, 3:03 p.m. UTC | #2
On 07/09/2024 03:03, Jakub Kicinski wrote:
> On Thu, 5 Sep 2024 16:41:35 +0100 edward.cree@amd.com wrote:
>>   * @tx_packets: Number of packets sent since this struct was created
> 
> I think it's number of packets "enqueued",

You're correct.

> but the doc says:
> 
>         name: tx-packets
>         doc: |
>           Number of wire packets successfully sent. Packet is considered to be
>           successfully sent once it is in device memory (usually this means
>           the device has issued a DMA completion for the packet).

Fair point.  We *do* have tx_queue->pkts_compl but that's reset every
 NAPI poll — it exists for BQL's sake.  That said, if it's the
 completions you want to count why isn't there just a hook in BQL to
 provide those stats automatically without driver involvement?

> Not the end of the world if you prefer to keep as is, but if so maybe
> just acknowledge in commit message or a code comment that this is not
> 100% in line with the definition?

I think it's probably better if I change the code to match the doc.

>> + * @tx_bytes: Number of bytes sent since this struct was created.  For TSO,
>> + *	counts the superframe size, not the sizes of generated frames on the
>> + *	wire (i.e. the headers are only counted once)
> 
> Hm. Hm. This is technically not documented but my intuition is that
> tx_bytes should count wire bytes. tx_packets counts segments / wire
> packets, looking at ef100_tx.c 
> qstats "bytes" should be the same kind of bytes as counted by the MAC.

Well, even if we calculated the wire bytes, the figures still wouldn't
 match entirely because the MAC counts the FCS, which isn't included
 here.  We can add that in too, but then one would expect the same
 thing on RX, which would require an extra branch in the datapath
 checking NETIF_F_RXFCS and I didn't want to take that performance hit.
So my preference here would be to keep this as skb bytes rather than
 wire bytes, since as you say it's the packet count that really
 matters here.

Jakub Kicinski Sept. 10, 2024, 3:37 p.m. UTC | #3
On Tue, 10 Sep 2024 16:03:04 +0100 Edward Cree wrote:
> >> + * @tx_bytes: Number of bytes sent since this struct was created.  For TSO,
> >> + *	counts the superframe size, not the sizes of generated frames on the
> >> + *	wire (i.e. the headers are only counted once)  
> > 
> > Hm. Hm. This is technically not documented but my intuition is that
> > tx_bytes should count wire bytes. tx_packets counts segments / wire
> > packets, looking at ef100_tx.c 
> > qstats "bytes" should be the same kind of bytes as counted by the MAC.  
> 
> Well, even if we calculated the wire bytes, the figures still wouldn't
>  match entirely because the MAC counts the FCS, which isn't included
>  here.  We can add that in too, but then one would expect the same
>  thing on RX, which would require an extra branch in the datapath
>  checking NETIF_F_RXFCS and I didn't want to take that performance hit.
> So my preference here would be to keep this as skb bytes rather than
>  wire bytes, since as you say it's the packet count that really
>  matters here.

Right, that's fine. But just to state the obvious - adding / subtracting
FCS bytes is relatively easy for user space to do (assuming RXFCS
handling is correct, as you mention). Converting from LSO bytes to wire
bytes is impossible, FCS is fixed size while header length varies.
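
The asymmetry between the two corrections can be shown with a little arithmetic. The following is a minimal sketch, not code from the sfc driver: the helper names `tso_wire_bytes()` and `add_fcs()` and the example figures are assumptions made for illustration. The FCS correction needs only the packet counter, while the TSO correction needs the header length and segment count of each individual superframe, which a summed byte counter no longer carries.

```c
#define ETH_FCS_LEN 4	/* the Ethernet FCS is always 4 bytes */

/*
 * Hypothetical helper: wire bytes (excluding FCS) for one TSO superframe.
 * skb_len counts the headers once, but each of the nsegs wire packets
 * carries its own copy, so (nsegs - 1) extra header copies are sent.
 * Assumes nsegs >= 1.
 */
static unsigned long tso_wire_bytes(unsigned long skb_len,
				    unsigned int hdr_len,
				    unsigned int nsegs)
{
	return skb_len + (unsigned long)(nsegs - 1) * hdr_len;
}

/*
 * Hypothetical helper: the FCS adjustment is a fixed amount per packet,
 * so it can be applied after the fact from the two aggregate counters.
 */
static unsigned long add_fcs(unsigned long bytes, unsigned long packets)
{
	return bytes + packets * ETH_FCS_LEN;
}
```

For example, a superframe with 54 bytes of headers and three 1448-byte payload segments has an skb length of 4398 bytes; `tso_wire_bytes(4398, 54, 3)` gives 4506, and `add_fcs(4506, 3)` gives 4518. A different header length (say, with VLAN tags or TCP options) changes the first result but not the per-packet FCS term.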

Patch

diff --git a/drivers/net/ethernet/sfc/ef100_rx.c b/drivers/net/ethernet/sfc/ef100_rx.c
index 992151775cb8..44dc75feb162 100644
--- a/drivers/net/ethernet/sfc/ef100_rx.c
+++ b/drivers/net/ethernet/sfc/ef100_rx.c
@@ -135,6 +135,7 @@  void __ef100_rx_packet(struct efx_channel *channel)
 	}
 
 	++rx_queue->rx_packets;
+	rx_queue->rx_bytes += rx_buf->len;
 
 	efx_rx_packet_gro(channel, rx_buf, channel->rx_pkt_n_frags, eh, csum);
 	goto out;
diff --git a/drivers/net/ethernet/sfc/ef100_tx.c b/drivers/net/ethernet/sfc/ef100_tx.c
index e6b6be549581..a7e30289e231 100644
--- a/drivers/net/ethernet/sfc/ef100_tx.c
+++ b/drivers/net/ethernet/sfc/ef100_tx.c
@@ -493,6 +493,7 @@  int __ef100_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb,
 	} else {
 		tx_queue->tx_packets++;
 	}
+	tx_queue->tx_bytes += skb->len;
 	return 0;
 
 err:
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 4b546f61dfaf..6c709d92e299 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -638,6 +638,7 @@  static void efx_get_queue_stats_rx(struct net_device *net_dev, int idx,
 	rx_queue = efx_channel_get_rx_queue(channel);
 	/* Count only packets since last time datapath was started */
 	stats->packets = rx_queue->rx_packets - rx_queue->old_rx_packets;
+	stats->bytes = rx_queue->rx_bytes - rx_queue->old_rx_bytes;
 	stats->hw_drops = efx_get_queue_stat_rx_hw_drops(channel) -
 			  channel->old_n_rx_hw_drops;
 	stats->hw_drop_overruns = channel->n_rx_nodesc_trunc -
@@ -653,6 +654,7 @@  static void efx_get_queue_stats_tx(struct net_device *net_dev, int idx,
 
 	channel = efx_get_tx_channel(efx, idx);
 	stats->packets = 0;
+	stats->bytes = 0;
 	stats->hw_gso_packets = 0;
 	stats->hw_gso_wire_packets = 0;
 	/* If a TX channel has XDP TXQs, the stats for these will be counted
@@ -664,6 +666,8 @@  static void efx_get_queue_stats_tx(struct net_device *net_dev, int idx,
 		if (!tx_queue->xdp_tx) {
 			stats->packets += tx_queue->tx_packets -
 					  tx_queue->old_tx_packets;
+			stats->bytes += tx_queue->tx_bytes -
+					tx_queue->old_tx_bytes;
 			stats->hw_gso_packets += tx_queue->tso_bursts -
 						 tx_queue->old_tso_bursts;
 			stats->hw_gso_wire_packets += tx_queue->tso_packets -
@@ -681,9 +685,11 @@  static void efx_get_base_stats(struct net_device *net_dev,
 	struct efx_channel *channel;
 
 	rx->packets = 0;
+	rx->bytes = 0;
 	rx->hw_drops = 0;
 	rx->hw_drop_overruns = 0;
 	tx->packets = 0;
+	tx->bytes = 0;
 	tx->hw_gso_packets = 0;
 	tx->hw_gso_wire_packets = 0;
 
@@ -694,10 +700,12 @@  static void efx_get_base_stats(struct net_device *net_dev,
 		rx_queue = efx_channel_get_rx_queue(channel);
 		if (channel->channel >= net_dev->real_num_rx_queues) {
 			rx->packets += rx_queue->rx_packets;
+			rx->bytes += rx_queue->rx_bytes;
 			rx->hw_drops += efx_get_queue_stat_rx_hw_drops(channel);
 			rx->hw_drop_overruns += channel->n_rx_nodesc_trunc;
 		} else {
 			rx->packets += rx_queue->old_rx_packets;
+			rx->bytes += rx_queue->old_rx_bytes;
 			rx->hw_drops += channel->old_n_rx_hw_drops;
 			rx->hw_drop_overruns += channel->old_n_rx_hw_drop_overruns;
 		}
@@ -707,10 +715,12 @@  static void efx_get_base_stats(struct net_device *net_dev,
 						net_dev->real_num_tx_queues ||
 			    tx_queue->xdp_tx) {
 				tx->packets += tx_queue->tx_packets;
+				tx->bytes += tx_queue->tx_bytes;
 				tx->hw_gso_packets += tx_queue->tso_bursts;
 				tx->hw_gso_wire_packets += tx_queue->tso_packets;
 			} else {
 				tx->packets += tx_queue->old_tx_packets;
+				tx->bytes += tx_queue->old_tx_bytes;
 				tx->hw_gso_packets += tx_queue->old_tso_bursts;
 				tx->hw_gso_wire_packets += tx_queue->old_tso_packets;
 			}
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 2cf2935a713c..147052c1e25a 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -233,7 +233,11 @@  struct efx_tx_buffer {
  * @cb_packets: Number of times the TX copybreak feature has been used
  * @notify_count: Count of notified descriptors to the NIC
  * @tx_packets: Number of packets sent since this struct was created
+ * @tx_bytes: Number of bytes sent since this struct was created.  For TSO,
+ *	counts the superframe size, not the sizes of generated frames on the
+ *	wire (i.e. the headers are only counted once)
  * @old_tx_packets: Value of @tx_packets as of last efx_init_tx_queue()
+ * @old_tx_bytes: Value of @tx_bytes as of last efx_init_tx_queue()
  * @old_tso_bursts: Value of @tso_bursts as of last efx_init_tx_queue()
  * @old_tso_packets: Value of @tso_packets as of last efx_init_tx_queue()
  * @empty_read_count: If the completion path has seen the queue as empty
@@ -285,7 +289,9 @@  struct efx_tx_queue {
 	unsigned int notify_count;
 	/* Statistics to supplement MAC stats */
 	unsigned long tx_packets;
+	unsigned long tx_bytes;
 	unsigned long old_tx_packets;
+	unsigned long old_tx_bytes;
 	unsigned int old_tso_bursts;
 	unsigned int old_tso_packets;
 
@@ -378,7 +384,9 @@  struct efx_rx_page_state {
  * @slow_fill: Timer used to defer efx_nic_generate_fill_event().
  * @grant_work: workitem used to grant credits to the MAE if @grant_credits
  * @rx_packets: Number of packets received since this struct was created
+ * @rx_bytes: Number of bytes received since this struct was created
  * @old_rx_packets: Value of @rx_packets as of last efx_init_rx_queue()
+ * @old_rx_bytes: Value of @rx_bytes as of last efx_init_rx_queue()
  * @xdp_rxq_info: XDP specific RX queue information.
  * @xdp_rxq_info_valid: Is xdp_rxq_info valid data?.
  */
@@ -415,7 +423,9 @@  struct efx_rx_queue {
 	struct work_struct grant_work;
 	/* Statistics to supplement MAC stats */
 	unsigned long rx_packets;
+	unsigned long rx_bytes;
 	unsigned long old_rx_packets;
+	unsigned long old_rx_bytes;
 	struct xdp_rxq_info xdp_rxq_info;
 	bool xdp_rxq_info_valid;
 };
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index f07495582125..ffca82207e47 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -393,6 +393,7 @@  void __efx_rx_packet(struct efx_channel *channel)
 	}
 
 	rx_queue->rx_packets++;
+	rx_queue->rx_bytes += rx_buf->len;
 
 	if (!efx_do_xdp(efx, channel, rx_buf, &eh))
 		goto out;
diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index bdb4325a7c2c..ab358fe13e1d 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -242,6 +242,7 @@  void efx_init_rx_queue(struct efx_rx_queue *rx_queue)
 	rx_queue->page_recycle_full = 0;
 
 	rx_queue->old_rx_packets = rx_queue->rx_packets;
+	rx_queue->old_rx_bytes = rx_queue->rx_bytes;
 
 	/* Initialise limit fields */
 	max_fill = efx->rxq_entries - EFX_RXD_HEAD_ROOM;
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index fe2d476028e7..1aea19488a56 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -394,6 +394,7 @@  netdev_tx_t __efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb
 	} else {
 		tx_queue->tx_packets++;
 	}
+	tx_queue->tx_bytes += skb_len;
 
 	return NETDEV_TX_OK;
 
@@ -490,6 +491,7 @@  int efx_xdp_tx_buffers(struct efx_nic *efx, int n, struct xdp_frame **xdpfs,
 		tx_buffer->dma_offset = 0;
 		tx_buffer->unmap_len = len;
 		tx_queue->tx_packets++;
+		tx_queue->tx_bytes += len;
 	}
 
 	/* Pass mapped frames to hardware. */
diff --git a/drivers/net/ethernet/sfc/tx_common.c b/drivers/net/ethernet/sfc/tx_common.c
index cd0857131aa8..7ef2baa3439a 100644
--- a/drivers/net/ethernet/sfc/tx_common.c
+++ b/drivers/net/ethernet/sfc/tx_common.c
@@ -87,6 +87,7 @@  void efx_init_tx_queue(struct efx_tx_queue *tx_queue)
 	tx_queue->completed_timestamp_minor = 0;
 
 	tx_queue->old_tx_packets = tx_queue->tx_packets;
+	tx_queue->old_tx_bytes = tx_queue->tx_bytes;
 	tx_queue->old_tso_bursts = tx_queue->tso_bursts;
 	tx_queue->old_tso_packets = tx_queue->tso_packets;
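
The snapshot scheme the patch extends to byte counters can be summarised in a standalone sketch. This is a simplified illustration under assumptions, not the driver's code: `struct queue_counters` and the helper names are hypothetical, and the real driver keeps these fields in `struct efx_rx_queue` and `struct efx_tx_queue`. The running counter is never reset; queue init records a snapshot, per-queue stats report the delta since that snapshot, and the base stats report the remainder.

```c
/* Hypothetical stand-in for the per-queue counters in the patch. */
struct queue_counters {
	unsigned long bytes;		/* running total, never reset */
	unsigned long old_bytes;	/* snapshot taken at last queue init */
};

/* Datapath: account one packet of the given length. */
static void queue_count(struct queue_counters *q, unsigned long len)
{
	q->bytes += len;
}

/* Queue (re)initialisation: remember the total so far. */
static void queue_reinit(struct queue_counters *q)
{
	q->old_bytes = q->bytes;
}

/* Per-queue stat: only traffic since the datapath was last started. */
static unsigned long queue_bytes_since_init(const struct queue_counters *q)
{
	return q->bytes - q->old_bytes;
}

/*
 * Base stat contribution: everything not reported per queue.  An active
 * queue contributes its pre-init history; an inactive queue (one that is
 * not exposed through per-queue stats) contributes its full total.
 */
static unsigned long queue_base_bytes(const struct queue_counters *q,
				      int active)
{
	return active ? q->old_bytes : q->bytes;
}
```

With this split, the per-queue value plus the base contribution of an active queue always equals its running total, so traffic is neither lost nor counted twice across a datapath restart.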