diff mbox series

[v4,net] qede: Fix scheduling while atomic

Message ID 20230518145214.570101-1-manishc@marvell.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [v4,net] qede: Fix scheduling while atomic

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers fail 2 blamed authors not CCed: Yuval.Mintz@qlogic.com Sudarsana.Kalluru@qlogic.com; 4 maintainers not CCed: pabeni@redhat.com Yuval.Mintz@qlogic.com Sudarsana.Kalluru@qlogic.com edumazet@google.com
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch warning WARNING: line length of 84 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Manish Chopra May 18, 2023, 2:52 p.m. UTC
The bonding module collects statistics while holding a spinlock;
beneath that, the qede->qed driver statistics flow gets scheduled
out due to the usleep_range() used in the PTT acquire logic, which
results in the bug and traces below -

[ 3673.988874] Hardware name: HPE ProLiant DL365 Gen10 Plus/ProLiant DL365 Gen10 Plus, BIOS A42 10/29/2021
[ 3673.988878] Call Trace:
[ 3673.988891]  dump_stack_lvl+0x34/0x44
[ 3673.988908]  __schedule_bug.cold+0x47/0x53
[ 3673.988918]  __schedule+0x3fb/0x560
[ 3673.988929]  schedule+0x43/0xb0
[ 3673.988932]  schedule_hrtimeout_range_clock+0xbf/0x1b0
[ 3673.988937]  ? __hrtimer_init+0xc0/0xc0
[ 3673.988950]  usleep_range+0x5e/0x80
[ 3673.988955]  qed_ptt_acquire+0x2b/0xd0 [qed]
[ 3673.988981]  _qed_get_vport_stats+0x141/0x240 [qed]
[ 3673.989001]  qed_get_vport_stats+0x18/0x80 [qed]
[ 3673.989016]  qede_fill_by_demand_stats+0x37/0x400 [qede]
[ 3673.989028]  qede_get_stats64+0x19/0xe0 [qede]
[ 3673.989034]  dev_get_stats+0x5c/0xc0
[ 3673.989045]  netstat_show.constprop.0+0x52/0xb0
[ 3673.989055]  dev_attr_show+0x19/0x40
[ 3673.989065]  sysfs_kf_seq_show+0x9b/0xf0
[ 3673.989076]  seq_read_iter+0x120/0x4b0
[ 3673.989087]  new_sync_read+0x118/0x1a0
[ 3673.989095]  vfs_read+0xf3/0x180
[ 3673.989099]  ksys_read+0x5f/0xe0
[ 3673.989102]  do_syscall_64+0x3b/0x90
[ 3673.989109]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3673.989115] RIP: 0033:0x7f8467d0b082
[ 3673.989119] Code: c0 e9 b2 fe ff ff 50 48 8d 3d ca 05 08 00 e8 35 e7 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 3673.989121] RSP: 002b:00007ffffb21fd08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 3673.989127] RAX: ffffffffffffffda RBX: 000000000100eca0 RCX: 00007f8467d0b082
[ 3673.989128] RDX: 00000000000003ff RSI: 00007ffffb21fdc0 RDI: 0000000000000003
[ 3673.989130] RBP: 00007f8467b96028 R08: 0000000000000010 R09: 00007ffffb21ec00
[ 3673.989132] R10: 00007ffffb27b170 R11: 0000000000000246 R12: 00000000000000f0
[ 3673.989134] R13: 0000000000000003 R14: 00007f8467b92000 R15: 0000000000045a05
[ 3673.989139] CPU: 30 PID: 285188 Comm: read_all Kdump: loaded Tainted: G        W  OE

Fix this by collecting the statistics asynchronously from a periodic
delayed work scheduled at the default stats coalescing interval, and
return the most recent copy of the statistics from .ndo_get_stats64().
Also add the ability to configure/retrieve the stats coalescing
interval using the commands below -

ethtool -C ethx stats-block-usecs <val>
ethtool -c ethx

Fixes: 133fac0eedc3 ("qede: Add basic ethtool support")
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Manish Chopra <manishc@marvell.com>
---
v1->v2:
 - Fixed checkpatch and kdoc warnings.
v2->v3:
 - Moving the changelog after tags.
v3->v4:
 - Changes to collect stats periodically using delayed work
   and add ability to configure/retrieve stats coalescing
   interval using ethtool
 - Modified commit description to reflect the changes
---
 drivers/net/ethernet/qlogic/qede/qede.h       |  9 +++++
 .../net/ethernet/qlogic/qede/qede_ethtool.c   | 38 ++++++++++++++++++-
 drivers/net/ethernet/qlogic/qede/qede_main.c  | 34 ++++++++++++++++-
 3 files changed, 78 insertions(+), 3 deletions(-)

Comments

Jakub Kicinski May 20, 2023, 4:30 a.m. UTC | #1
On Thu, 18 May 2023 20:22:14 +0530 Manish Chopra wrote:
> Bonding module collects the statistics while holding
> the spinlock, beneath that qede->qed driver statistics
> flow gets scheduled out due to usleep_range() used in PTT
> acquire logic which results into below bug and traces -

>  	struct qede_dump_info		dump_info;
> +	struct delayed_work		periodic_task;
> +	unsigned long			stats_coal_interval;
> +	u32				stats_coal_ticks;

It's a bit odd to make _interval ulong and ticks u32 when _ticks will
obviously be much larger..

Also - s/ticks/usecs/ ? I'd have guessed interval == usecs, ticks ==
jiffies without reading the code, and the inverse is true.

> +	spinlock_t			stats_lock; /* lock for vport stats access */
>  };

> +	if (edev->stats_coal_ticks != coal->stats_block_coalesce_usecs) {
> +		u32 stats_coal_ticks, prev_stats_coal_ticks;
> +
> +		stats_coal_ticks = coal->stats_block_coalesce_usecs;
> +		prev_stats_coal_ticks = edev->stats_coal_ticks;
> +
> +		/* zero coal ticks to disable periodic stats */
> +		if (stats_coal_ticks)
> +			stats_coal_ticks = clamp_t(u32, stats_coal_ticks,
> +						   QEDE_MIN_STATS_COAL_TICKS,
> +						   QEDE_MAX_STATS_COAL_TICKS);
> +
> +		stats_coal_ticks = rounddown(stats_coal_ticks, QEDE_MIN_STATS_COAL_TICKS);
> +		edev->stats_coal_ticks = stats_coal_ticks;

Why round down the usecs?  Don't you want to return to the user on get
exactly what set specified?  Otherwise I wouldn't bother saving the
usecs at all, just convert back from jiffies.

> +		if (edev->stats_coal_ticks) {
> +			edev->stats_coal_interval = (unsigned long)edev->stats_coal_ticks *
> +							HZ / 1000000;

usecs_to_jiffies()

> +			if (prev_stats_coal_ticks == 0)
> +				schedule_delayed_work(&edev->periodic_task, 0);
> +		}
> +
> +		DP_VERBOSE(edev, QED_MSG_DEBUG, "stats coal interval=%lu jiffies\n",
> +			   edev->stats_coal_interval);
> +	}
> +
>  	if (!netif_running(dev)) {
>  		DP_INFO(edev, "Interface is down\n");
>  		return -EINVAL;
Manish Chopra May 22, 2023, 1:09 p.m. UTC | #2
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Saturday, May 20, 2023 10:01 AM
> To: Manish Chopra <manishc@marvell.com>
> Cc: netdev@vger.kernel.org; Ariel Elior <aelior@marvell.com>; Alok Prasad
> <palok@marvell.com>; Sudarsana Reddy Kalluru <skalluru@marvell.com>;
> David Miller <davem@davemloft.net>
> Subject: [EXT] Re: [PATCH v4 net] qede: Fix scheduling while atomic
> 
> On Thu, 18 May 2023 20:22:14 +0530 Manish Chopra wrote:
> > Bonding module collects the statistics while holding the spinlock,
> > beneath that qede->qed driver statistics flow gets scheduled out due
> > to usleep_range() used in PTT acquire logic which results into below
> > bug and traces -
> 
> >  	struct qede_dump_info		dump_info;
> > +	struct delayed_work		periodic_task;
> > +	unsigned long			stats_coal_interval;
> > +	u32				stats_coal_ticks;
> 
> It's a bit odd to make _interval ulong and ticks u32 when _ticks will obviously
> be much larger..
> 
> Also - s/ticks/usecs/ ? I'd have guessed interval == usecs, ticks == jiffies
> without reading the code, and the inverse is true.

I will rename to avoid the confusion (ticks to usecs and interval to ticks).

> 
> > +	spinlock_t			stats_lock; /* lock for vport stats access */
> >  };
> 
> > +	if (edev->stats_coal_ticks != coal->stats_block_coalesce_usecs) {
> > +		u32 stats_coal_ticks, prev_stats_coal_ticks;
> > +
> > +		stats_coal_ticks = coal->stats_block_coalesce_usecs;
> > +		prev_stats_coal_ticks = edev->stats_coal_ticks;
> > +
> > +		/* zero coal ticks to disable periodic stats */
> > +		if (stats_coal_ticks)
> > +			stats_coal_ticks = clamp_t(u32, stats_coal_ticks,
> > +						   QEDE_MIN_STATS_COAL_TICKS,
> > +						   QEDE_MAX_STATS_COAL_TICKS);
> > +
> > +		stats_coal_ticks = rounddown(stats_coal_ticks, QEDE_MIN_STATS_COAL_TICKS);
> > +		edev->stats_coal_ticks = stats_coal_ticks;
> 
> Why round down the usecs?  Don't you want to return to the user on get
> exactly what set specified?  Otherwise I wouldn't bother saving the usecs at
> all, just convert back from jiffies.

Maybe I should not add this restriction and instead allow the user to
configure any usecs value (not limiting it to any min/max) as long as it
results in a different/unique ticks/jiffies value. But I should still save
usecs rather than ticks/jiffies (or preferably both, to avoid a runtime
conversion every time the work is scheduled), because converting back with
jiffies_to_usecs() on get could be misleading: for example, if the user
configured 10 usecs, that results in a single jiffy, and reading it back
via jiffies_to_usecs() would report 1000 usecs, which is not what the user
configured.

> 
> > +		if (edev->stats_coal_ticks) {
> > +			edev->stats_coal_interval = (unsigned long)edev->stats_coal_ticks *
> > +							HZ / 1000000;
> 
> usecs_to_jiffies()
> 
> > +			if (prev_stats_coal_ticks == 0)
> > +				schedule_delayed_work(&edev->periodic_task, 0);
> > +		}
> > +
> > +		DP_VERBOSE(edev, QED_MSG_DEBUG, "stats coal interval=%lu jiffies\n",
> > +			   edev->stats_coal_interval);
> > +	}
> > +
> >  	if (!netif_running(dev)) {
> >  		DP_INFO(edev, "Interface is down\n");
> >  		return -EINVAL;
> --
> pw-bot: cr

Patch

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index f90dcfe9ee68..7106e29e2c2c 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -32,6 +32,11 @@ 
 
 #define DRV_MODULE_SYM		qede
 
+#define QEDE_DEF_STATS_COAL_INTERVAL	HZ
+#define QEDE_DEF_STATS_COAL_TICKS	1000000
+#define QEDE_MIN_STATS_COAL_TICKS	250000
+#define QEDE_MAX_STATS_COAL_TICKS	5000000
+
 struct qede_stats_common {
 	u64 no_buff_discards;
 	u64 packet_too_big_discard;
@@ -271,6 +276,10 @@  struct qede_dev {
 #define QEDE_ERR_WARN			3
 
 	struct qede_dump_info		dump_info;
+	struct delayed_work		periodic_task;
+	unsigned long			stats_coal_interval;
+	u32				stats_coal_ticks;
+	spinlock_t			stats_lock; /* lock for vport stats access */
 };
 
 enum QEDE_STATE {
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index 8284c4c1528f..3457f2af4bde 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -426,6 +426,8 @@  static void qede_get_ethtool_stats(struct net_device *dev,
 		}
 	}
 
+	spin_lock(&edev->stats_lock);
+
 	for (i = 0; i < QEDE_NUM_STATS; i++) {
 		if (qede_is_irrelevant_stat(edev, i))
 			continue;
@@ -435,6 +437,8 @@  static void qede_get_ethtool_stats(struct net_device *dev,
 		buf++;
 	}
 
+	spin_unlock(&edev->stats_lock);
+
 	__qede_unlock(edev);
 }
 
@@ -817,6 +821,7 @@  static int qede_get_coalesce(struct net_device *dev,
 
 	coal->rx_coalesce_usecs = rx_coal;
 	coal->tx_coalesce_usecs = tx_coal;
+	coal->stats_block_coalesce_usecs = edev->stats_coal_ticks;
 
 	return rc;
 }
@@ -830,6 +835,33 @@  int qede_set_coalesce(struct net_device *dev, struct ethtool_coalesce *coal,
 	int i, rc = 0;
 	u16 rxc, txc;
 
+	if (edev->stats_coal_ticks != coal->stats_block_coalesce_usecs) {
+		u32 stats_coal_ticks, prev_stats_coal_ticks;
+
+		stats_coal_ticks = coal->stats_block_coalesce_usecs;
+		prev_stats_coal_ticks = edev->stats_coal_ticks;
+
+		/* zero coal ticks to disable periodic stats */
+		if (stats_coal_ticks)
+			stats_coal_ticks = clamp_t(u32, stats_coal_ticks,
+						   QEDE_MIN_STATS_COAL_TICKS,
+						   QEDE_MAX_STATS_COAL_TICKS);
+
+		stats_coal_ticks = rounddown(stats_coal_ticks, QEDE_MIN_STATS_COAL_TICKS);
+		edev->stats_coal_ticks = stats_coal_ticks;
+
+		if (edev->stats_coal_ticks) {
+			edev->stats_coal_interval = (unsigned long)edev->stats_coal_ticks *
+							HZ / 1000000;
+
+			if (prev_stats_coal_ticks == 0)
+				schedule_delayed_work(&edev->periodic_task, 0);
+		}
+
+		DP_VERBOSE(edev, QED_MSG_DEBUG, "stats coal interval=%lu jiffies\n",
+			   edev->stats_coal_interval);
+	}
+
 	if (!netif_running(dev)) {
 		DP_INFO(edev, "Interface is down\n");
 		return -EINVAL;
@@ -2236,7 +2268,8 @@  static int qede_get_per_coalesce(struct net_device *dev,
 }
 
 static const struct ethtool_ops qede_ethtool_ops = {
-	.supported_coalesce_params	= ETHTOOL_COALESCE_USECS,
+	.supported_coalesce_params	= ETHTOOL_COALESCE_USECS |
+					  ETHTOOL_COALESCE_STATS_BLOCK_USECS,
 	.get_link_ksettings		= qede_get_link_ksettings,
 	.set_link_ksettings		= qede_set_link_ksettings,
 	.get_drvinfo			= qede_get_drvinfo,
@@ -2287,7 +2320,8 @@  static const struct ethtool_ops qede_ethtool_ops = {
 };
 
 static const struct ethtool_ops qede_vf_ethtool_ops = {
-	.supported_coalesce_params	= ETHTOOL_COALESCE_USECS,
+	.supported_coalesce_params	= ETHTOOL_COALESCE_USECS |
+					  ETHTOOL_COALESCE_STATS_BLOCK_USECS,
 	.get_link_ksettings		= qede_get_link_ksettings,
 	.get_drvinfo			= qede_get_drvinfo,
 	.get_msglevel			= qede_get_msglevel,
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 06c6a5813606..5aba9c4a759d 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -308,6 +308,8 @@  void qede_fill_by_demand_stats(struct qede_dev *edev)
 
 	edev->ops->get_vport_stats(edev->cdev, &stats);
 
+	spin_lock(&edev->stats_lock);
+
 	p_common->no_buff_discards = stats.common.no_buff_discards;
 	p_common->packet_too_big_discard = stats.common.packet_too_big_discard;
 	p_common->ttl0_discard = stats.common.ttl0_discard;
@@ -405,6 +407,8 @@  void qede_fill_by_demand_stats(struct qede_dev *edev)
 		p_ah->tx_1519_to_max_byte_packets =
 		    stats.ah.tx_1519_to_max_byte_packets;
 	}
+
+	spin_unlock(&edev->stats_lock);
 }
 
 static void qede_get_stats64(struct net_device *dev,
@@ -413,9 +417,10 @@  static void qede_get_stats64(struct net_device *dev,
 	struct qede_dev *edev = netdev_priv(dev);
 	struct qede_stats_common *p_common;
 
-	qede_fill_by_demand_stats(edev);
 	p_common = &edev->stats.common;
 
+	spin_lock(&edev->stats_lock);
+
 	stats->rx_packets = p_common->rx_ucast_pkts + p_common->rx_mcast_pkts +
 			    p_common->rx_bcast_pkts;
 	stats->tx_packets = p_common->tx_ucast_pkts + p_common->tx_mcast_pkts +
@@ -435,6 +440,8 @@  static void qede_get_stats64(struct net_device *dev,
 		stats->collisions = edev->stats.bb.tx_total_collisions;
 	stats->rx_crc_errors = p_common->rx_crc_errors;
 	stats->rx_frame_errors = p_common->rx_align_errors;
+
+	spin_unlock(&edev->stats_lock);
 }
 
 #ifdef CONFIG_QED_SRIOV
@@ -1000,6 +1007,21 @@  static void qede_unlock(struct qede_dev *edev)
 	rtnl_unlock();
 }
 
+static void qede_periodic_task(struct work_struct *work)
+{
+	struct qede_dev *edev = container_of(work, struct qede_dev,
+					     periodic_task.work);
+
+	if (test_bit(QEDE_SP_DISABLE, &edev->sp_flags))
+		return;
+
+	if (edev->stats_coal_ticks) {
+		qede_fill_by_demand_stats(edev);
+		schedule_delayed_work(&edev->periodic_task,
+				      edev->stats_coal_interval);
+	}
+}
+
 static void qede_sp_task(struct work_struct *work)
 {
 	struct qede_dev *edev = container_of(work, struct qede_dev,
@@ -1208,7 +1230,9 @@  static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
 		 * from there, although it's unlikely].
 		 */
 		INIT_DELAYED_WORK(&edev->sp_task, qede_sp_task);
+		INIT_DELAYED_WORK(&edev->periodic_task, qede_periodic_task);
 		mutex_init(&edev->qede_lock);
+		spin_lock_init(&edev->stats_lock);
 
 		rc = register_netdev(edev->ndev);
 		if (rc) {
@@ -1233,6 +1257,11 @@  static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
 	edev->rx_copybreak = QEDE_RX_HDR_SIZE;
 
 	qede_log_probe(edev);
+
+	edev->stats_coal_interval = QEDE_DEF_STATS_COAL_INTERVAL;
+	edev->stats_coal_ticks = QEDE_DEF_STATS_COAL_TICKS;
+	schedule_delayed_work(&edev->periodic_task, 0);
+
 	return 0;
 
 err4:
@@ -1301,6 +1330,7 @@  static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode)
 		unregister_netdev(ndev);
 
 		cancel_delayed_work_sync(&edev->sp_task);
+		cancel_delayed_work_sync(&edev->periodic_task);
 
 		edev->ops->common->set_power_state(cdev, PCI_D0);
 
@@ -2571,6 +2601,8 @@  static void qede_recovery_handler(struct qede_dev *edev)
 
 	DP_NOTICE(edev, "Starting a recovery process\n");
 
+	edev->stats_coal_ticks = 0;
+
 	/* No need to acquire first the qede_lock since is done by qede_sp_task
 	 * before calling this function.
 	 */