Message ID | 20241206090328.4758-1-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [v3,net-next] net/mlx5e: Report rx_discards_phy via rx_fifo_errors |
On Fri, 6 Dec 2024 17:03:28 +0800 Yafang Shao wrote:
> We observed a high number of rx_discards_phy events on some servers when
> running `ethtool -S`. However, this important counter is not currently
> reflected in the /proc/net/dev statistics file, making it challenging to
> monitor effectively.
>
> Since rx_fifo_errors represents receive FIFO errors on this network
> device, it makes sense to include rx_discards_phy in this counter to
> enhance monitoring visibility. This change will help administrators track
> these events more effectively through standard interfaces.

It's not a standard if there is no definition applicable across vendors.
Count it as generic rx_dropped.

If you disagree with me please carry this tag on future versions:

Nacked-by: Jakub Kicinski <kuba@kernel.org>
On Sun, Dec 8, 2024 at 9:38 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 6 Dec 2024 17:03:28 +0800 Yafang Shao wrote:
> > We observed a high number of rx_discards_phy events on some servers when
> > running `ethtool -S`. However, this important counter is not currently
> > reflected in the /proc/net/dev statistics file, making it challenging to
> > monitor effectively.
> >
> > Since rx_fifo_errors represents receive FIFO errors on this network
> > device, it makes sense to include rx_discards_phy in this counter to
> > enhance monitoring visibility. This change will help administrators track
> > these events more effectively through standard interfaces.
>
> It's not a standard if there is no definition applicable across vendors.
> Count it as generic rx_dropped.

Thank you for your suggestion. I'm okay with counting it as generic
rx_dropped as long as we have a metric to track it.

I will send a new version.

--
Regards
Yafang
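To make the outcome of this exchange concrete, here is a minimal, self-contained sketch of folding a physical-port discard counter into rx_dropped rather than rx_fifo_errors. It is not the follow-up patch: `mock_pport_counters`, `mock_link_stats`, and `mock_fill_rx_dropped` are hypothetical stand-ins for the driver's port counters and for `struct rtnl_link_stats64`, and adding to (rather than overwriting) rx_dropped is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the device's physical-port counters. */
struct mock_pport_counters {
	uint64_t if_in_discards; /* the value behind rx_discards_phy */
};

/* Hypothetical stand-in for the two rtnl_link_stats64 fields discussed. */
struct mock_link_stats {
	uint64_t rx_dropped;
	uint64_t rx_fifo_errors;
};

/*
 * Account PHY-level discards under the generic rx_dropped counter.
 * Assumption: the discards are added on top of whatever drops were
 * already accumulated; rx_fifo_errors is deliberately left untouched.
 */
static void mock_fill_rx_dropped(const struct mock_pport_counters *pstats,
				 struct mock_link_stats *stats)
{
	assert(pstats && stats);
	stats->rx_dropped += pstats->if_in_discards;
}
```

With this shape, a monitoring tool that already watches the generic drop counter would pick up the PHY discards without needing a vendor-specific definition of rx_fifo_errors.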
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e601324a690a..15b1a3e6e641 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3916,6 +3916,7 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 	}
 
 	stats->rx_missed_errors = priv->stats.qcnt.rx_out_of_buffer;
+	stats->rx_fifo_errors = PPORT_2863_GET(pstats, if_in_discards);
 
 	stats->rx_length_errors =
 		PPORT_802_3_GET(pstats, a_in_range_length_errors) +
We observed a high number of rx_discards_phy events on some servers when
running `ethtool -S`. However, this important counter is not currently
reflected in the /proc/net/dev statistics file, making it challenging to
monitor effectively.

Since rx_fifo_errors represents receive FIFO errors on this network
device, it makes sense to include rx_discards_phy in this counter to
enhance monitoring visibility. This change will help administrators track
these events more effectively through standard interfaces.

I have also checked the manual of ethtool counters on mlx5 [0]; it seems
that rx_discards_phy and rx_fifo_errors have the same meaning:

  rx_discards_phy: The number of received packets dropped due to lack of
  buffers on a physical port. If this counter is increasing, it implies
  that the adapter is congested and cannot absorb the traffic coming from
  the network.

  ConnectX-3 naming : rx_fifo_errors

Link: https://enterprise-support.nvidia.com/s/article/understanding-mlx5-ethtool-counters [0]
Suggested-by: Tariq Toukan <ttoukan.linux@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Tariq Toukan <ttoukan.linux@gmail.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Gal Pressman <gal@nvidia.com>
Cc: Jakub Kicinski <kuba@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
 1 file changed, 1 insertion(+)

Changes:
v2->v3:
- Drop the changes on the Doc

v1->v2: https://lore.kernel.org/netdev/20241114021711.5691-1-laoar.shao@gmail.com/
- Use rx_fifo_errors instead (Tariq)
- Update the if_link.h accordingly

v1: https://lore.kernel.org/netdev/20241106064015.4118-1-laoar.shao@gmail.com/
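Since the motivation is monitoring through /proc/net/dev, the following sketch shows how a userspace tool might pull the receive-side drop and fifo columns out of one data row of that file. The helper name `parse_proc_net_dev_line` and the `rx_columns` struct are hypothetical; the sketch assumes the usual row layout, in which the interface name and a colon are followed by the receive columns bytes, packets, errs, drop, fifo.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* First five receive-side columns of a /proc/net/dev data row. */
struct rx_columns {
	char ifname[32];
	uint64_t bytes, packets, errs, drop, fifo;
};

/*
 * Parse one line of /proc/net/dev.
 * Returns 0 on success, -1 for header lines or malformed rows.
 */
static int parse_proc_net_dev_line(const char *line, struct rx_columns *out)
{
	assert(line && out);

	/* Data rows look like "  eth0: <counters...>"; headers have no colon. */
	const char *colon = strchr(line, ':');
	if (!colon)
		return -1;

	while (*line == ' ')
		line++;

	size_t len = (size_t)(colon - line);
	if (len == 0 || len >= sizeof(out->ifname))
		return -1;
	memcpy(out->ifname, line, len);
	out->ifname[len] = '\0';

	int n = sscanf(colon + 1,
		       "%" SCNu64 " %" SCNu64 " %" SCNu64 " %" SCNu64 " %" SCNu64,
		       &out->bytes, &out->packets, &out->errs,
		       &out->drop, &out->fifo);
	return n == 5 ? 0 : -1;
}
```

A monitoring loop could call this on each line read with `fgets()` from /proc/net/dev and alert when `drop` or `fifo` grows between samples; which of the two columns reflects rx_discards_phy depends on which counter the driver ends up reporting it through.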