[iwl-net] ice: fix stats being updated by way too large values

Message ID 20240227143124.21015-1-przemyslaw.kitszel@intel.com (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 956 this patch: 956
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           fail     1 blamed authors not CCed: benjamin.mikailenko@intel.com; 5 maintainers not CCed: pabeni@redhat.com jesse.brandeburg@intel.com kuba@kernel.org benjamin.mikailenko@intel.com edumazet@google.com
netdev/build_clang              success  Errors and warnings before: 973 this patch: 973
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 973 this patch: 973
netdev/checkpatch               warning  WARNING: line length of 82 exceeds 80 columns; WARNING: line length of 88 exceeds 80 columns
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Przemek Kitszel Feb. 27, 2024, 2:31 p.m. UTC
Simplify the stats accumulation logic to fix the case where we don't take
the previous stat value into account; we should always respect it.

Main netdev stats of our PF (Tx/Rx packets/bytes) were reported orders of
magnitude too big during OpenStack reconfiguration events, and possibly in
other reconfiguration cases too.

The regression was reported to be between 6.1 and 6.2, so I was almost
certain that one of the two "preserve stats over reset" commits was the
culprit. While reading the code, it was found that in some cases we will
increase the stats by an arbitrarily large number (thanks to ignoring the
"-prev" part of the condition, after zeroing it).
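
To make the failure mode concrete, here is a minimal userspace sketch of
the two accumulation schemes, reduced to a single counter. The struct and
function names are hypothetical and not taken from the driver; the
prev_loaded argument stands in for pf->stat_prev_loaded from the patch,
and the "counter dips below prev" trigger is an assumed scenario, not a
reproduction of the reported setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-counter stand-in for net_stats / net_stats_prev. */
struct counter_state {
	uint64_t reported; /* value exposed to userspace */
	uint64_t prev;     /* last sampled ring total */
};

/* Old scheme: zero prev when the sample goes backwards, then add the diff. */
static void accumulate_old(struct counter_state *s, uint64_t cur)
{
	if (cur < s->prev)
		s->prev = 0;              /* "- prev" is now effectively ignored */
	s->reported += cur - s->prev;     /* adds the whole of cur again */
	s->prev = cur;
}

/* New scheme: always respect prev; skip the round if prev is not valid. */
static void accumulate_new(struct counter_state *s, uint64_t cur,
			   bool prev_loaded)
{
	if (prev_loaded)
		s->reported += cur - s->prev; /* modular u64 difference */
	s->prev = cur;
}

/* Assumed scenario: the sampled total drops from 1000 to 900. */
static void example_dip_scenario(void)
{
	struct counter_state old_s = { 0, 0 }, new_s = { 0, 0 };

	accumulate_old(&old_s, 1000);
	accumulate_old(&old_s, 900);
	assert(old_s.reported == 1900); /* jumped by the full sample */

	accumulate_new(&new_s, 1000, true);
	accumulate_new(&new_s, 900, true);
	assert(new_s.reported == 900);  /* follows the sample instead */
}
```

In this sketch every dip makes the old scheme re-add the entire running
total, so repeated reconfiguration events would compound the error, while
the new scheme only ever applies the difference between two samples.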

Note that this also fixes the case where we were around the limits of u64,
but that was not the regression reported.
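
The u64 case works because unsigned subtraction in C is modular, so the
difference between two samples stays correct even when the counter wraps
between them. A small sketch, with a hypothetical helper name that does
not exist in the driver:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packets counted between two u64 samples. */
static uint64_t wrapping_delta(uint64_t prev, uint64_t cur)
{
	return cur - prev; /* well defined modulo 2^64 for unsigned types */
}

/* The counter wraps from near UINT64_MAX to a small value. */
static void example_wrap(void)
{
	uint64_t prev = UINT64_MAX - 4; /* 5 increments short of wrapping to 0 */
	uint64_t cur = 5;

	assert(wrapping_delta(prev, cur) == 10);
}
```

Zeroing prev whenever cur < prev, as the removed code did, would report 5
here instead of 10.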

Full disclosure: I remember suggesting this particular piece of code to
Ben a few years ago, so the blame is on me.

Fixes: 2fd5e433cd26 ("ice: Accumulate HW and Netdev statistics over reset")
Reported-by: Nebojsa Stevanovic <nebojsa.stevanovic@gcore.com>
Link: https://lore.kernel.org/intel-wired-lan/VI1PR02MB439744DEDAA7B59B9A2833FE912EA@VI1PR02MB4397.eurprd02.prod.outlook.com
Reported-by: Christian Rohmann <christian.rohmann@inovex.de>
Link: https://lore.kernel.org/intel-wired-lan/f38a6ca4-af05-48b1-a3e6-17ef2054e525@inovex.de
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 24 +++++++++++------------
 1 file changed, 11 insertions(+), 13 deletions(-)


base-commit: 9b23fceb4158a3636ce4a2bda28ab03dcfa6a26f

Comments

Simon Horman Feb. 28, 2024, 10:12 a.m. UTC | #1
On Tue, Feb 27, 2024 at 03:31:06PM +0100, Przemek Kitszel wrote:
> Simplify stats accumulation logic to fix the case where we don't take
> previous stat value into account, we should always respect it.
> 
> Main netdev stats of our PF (Tx/Rx packets/bytes) were reported orders of
> magnitude too big during OpenStack reconfiguration events, possibly other
> reconfiguration cases too.
> 
> The regression was reported to be between 6.1 and 6.2, so I was almost
> certain that one of the two "preserve stats over reset" commits was the
> culprit. While reading the code, it was found that in some cases we will
> increase the stats by arbitrarily large number (thanks to ignoring "-prev"
> part of condition, after zeroing it).
> 
> Note that this fixes also the case where we were around limits of u64, but
> that was not the regression reported.
> 
> Full disclosure: I remember suggesting this particular piece of code to
> Ben a few years ago, so blame on me.
> 
> Fixes: 2fd5e433cd26 ("ice: Accumulate HW and Netdev statistics over reset")
> Reported-by: Nebojsa Stevanovic <nebojsa.stevanovic@gcore.com>
> Link: https://lore.kernel.org/intel-wired-lan/VI1PR02MB439744DEDAA7B59B9A2833FE912EA@VI1PR02MB4397.eurprd02.prod.outlook.com
> Reported-by: Christian Rohmann <christian.rohmann@inovex.de>
> Link: https://lore.kernel.org/intel-wired-lan/f38a6ca4-af05-48b1-a3e6-17ef2054e525@inovex.de
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>

Pucha, HimasekharX Reddy March 6, 2024, 12:42 p.m. UTC | #2
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Przemek Kitszel
> Sent: Tuesday, February 27, 2024 8:01 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: Nebojsa Stevanovic <nebojsa.stevanovic@gcore.com>; netdev@vger.kernel.org; Czapnik, Lukasz <lukasz.czapnik@intel.com>; Lobakin, Aleksander <aleksander.lobakin@intel.com>; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>; Christian Rohmann <christian.rohmann@inovex.de>
> Subject: [Intel-wired-lan] [PATCH iwl-net] ice: fix stats being updated by way too large values
>
> Simplify stats accumulation logic to fix the case where we don't take
> previous stat value into account, we should always respect it.
>
> Main netdev stats of our PF (Tx/Rx packets/bytes) were reported orders of
> magnitude too big during OpenStack reconfiguration events, possibly other
> reconfiguration cases too.
>
> The regression was reported to be between 6.1 and 6.2, so I was almost
> certain that one of the two "preserve stats over reset" commits was the
> culprit. While reading the code, it was found that in some cases we will
> increase the stats by arbitrarily large number (thanks to ignoring "-prev"
> part of condition, after zeroing it).
>
> Note that this fixes also the case where we were around limits of u64, but
> that was not the regression reported.
>
> Full disclosure: I remember suggesting this particular piece of code to
> Ben a few years ago, so blame on me.
>
> Fixes: 2fd5e433cd26 ("ice: Accumulate HW and Netdev statistics over reset")
> Reported-by: Nebojsa Stevanovic <nebojsa.stevanovic@gcore.com>
> Link: https://lore.kernel.org/intel-wired-lan/VI1PR02MB439744DEDAA7B59B9A2833FE912EA@VI1PR02MB4397.eurprd02.prod.outlook.com
> Reported-by: Christian Rohmann <christian.rohmann@inovex.de>
> Link: https://lore.kernel.org/intel-wired-lan/f38a6ca4-af05-48b1-a3e6-17ef2054e525@inovex.de
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_main.c | 24 +++++++++++------------
>  1 file changed, 11 insertions(+), 13 deletions(-)
>

Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)

Patch

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index dd4a9bc0dfdc..a7c7b1b633a5 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6736,6 +6736,7 @@  static void ice_update_vsi_ring_stats(struct ice_vsi *vsi)
 {
 	struct rtnl_link_stats64 *net_stats, *stats_prev;
 	struct rtnl_link_stats64 *vsi_stats;
+	struct ice_pf *pf = vsi->back;
 	u64 pkts, bytes;
 	int i;
 
@@ -6781,21 +6782,18 @@  static void ice_update_vsi_ring_stats(struct ice_vsi *vsi)
 	net_stats = &vsi->net_stats;
 	stats_prev = &vsi->net_stats_prev;
 
-	/* clear prev counters after reset */
-	if (vsi_stats->tx_packets < stats_prev->tx_packets ||
-	    vsi_stats->rx_packets < stats_prev->rx_packets) {
-		stats_prev->tx_packets = 0;
-		stats_prev->tx_bytes = 0;
-		stats_prev->rx_packets = 0;
-		stats_prev->rx_bytes = 0;
+	/* Update netdev counters, but keep in mind that values could start at
+	 * random value after PF reset. And as we increase the reported stat by
+	 * diff of Prev-Cur, we need to be sure that Prev is valid. If it's not,
+	 * let's skip this round.
+	 */
+	if (likely(pf->stat_prev_loaded)) {
+		net_stats->tx_packets += vsi_stats->tx_packets - stats_prev->tx_packets;
+		net_stats->tx_bytes += vsi_stats->tx_bytes - stats_prev->tx_bytes;
+		net_stats->rx_packets += vsi_stats->rx_packets - stats_prev->rx_packets;
+		net_stats->rx_bytes += vsi_stats->rx_bytes - stats_prev->rx_bytes;
 	}
 
-	/* update netdev counters */
-	net_stats->tx_packets += vsi_stats->tx_packets - stats_prev->tx_packets;
-	net_stats->tx_bytes += vsi_stats->tx_bytes - stats_prev->tx_bytes;
-	net_stats->rx_packets += vsi_stats->rx_packets - stats_prev->rx_packets;
-	net_stats->rx_bytes += vsi_stats->rx_bytes - stats_prev->rx_bytes;
-
 	stats_prev->tx_packets = vsi_stats->tx_packets;
 	stats_prev->tx_bytes = vsi_stats->tx_bytes;
 	stats_prev->rx_packets = vsi_stats->rx_packets;