diff mbox series

[v5,5/8] PCI/AER: Rename struct aer_stats to aer_report

Message ID 20250321015806.954866-6-pandoh@google.com (mailing list archive)
State New
Delegated to: Bjorn Helgaas
Headers show
Series Rate limit AER logs | expand

Commit Message

Jon Pan-Doh March 21, 2025, 1:58 a.m. UTC
Update name to reflect the broader definition of structs/variables that
are stored (e.g. ratelimits). This is a preparatory patch for adding rate
limit support.

Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Reported-by: Sargun Dhillon <sargun@meta.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Karolina Stolarek <karolina.stolarek@oracle.com>
---
 drivers/pci/pcie/aer.c | 50 +++++++++++++++++++++---------------------
 include/linux/pci.h    |  2 +-
 2 files changed, 26 insertions(+), 26 deletions(-)

Comments

Bjorn Helgaas March 21, 2025, 10:01 p.m. UTC | #1
On Thu, Mar 20, 2025 at 06:58:03PM -0700, Jon Pan-Doh wrote:
> Update name to reflect the broader definition of structs/variables that
> are stored (e.g. ratelimits). This is a preparatory patch for adding rate
> limit support.
> 
> Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
> Signed-off-by: Jon Pan-Doh <pandoh@google.com>
> Reported-by: Sargun Dhillon <sargun@meta.com>

What did Sargun report?  Is there a bug fix in here?  Can we include a
URL to whatever Sargun reported?
Jon Pan-Doh March 21, 2025, 10:15 p.m. UTC | #2
On Fri, Mar 21, 2025 at 3:01 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> What did Sargun report?  Is there a bug fix in here?  Can we include a
> URL to whatever Sargun reported?

Paul added Reported-By and Acked-By tags[1] for the series which I
applied to each patch. Checkpatch mildly complained about not having a
Closes tag for Reported-By.

Sargun/Paul, do you have a Closes/URL tag?

Thanks,
Jon
Paul E. McKenney March 21, 2025, 10:16 p.m. UTC | #3
On Fri, Mar 21, 2025 at 05:01:15PM -0500, Bjorn Helgaas wrote:
> On Thu, Mar 20, 2025 at 06:58:03PM -0700, Jon Pan-Doh wrote:
> > Update name to reflect the broader definition of structs/variables that
> > are stored (e.g. ratelimits). This is a preparatory patch for adding rate
> > limit support.
> > 
> > Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
> > Signed-off-by: Jon Pan-Doh <pandoh@google.com>
> > Reported-by: Sargun Dhillon <sargun@meta.com>
> 
> What did Sargun report?  Is there a bug fix in here?  Can we include a
> URL to whatever Sargun reported?

He reported RCU CPU stall warnings and CSD-lock warnings internally
within Meta, so sorry, no useful URL.

I did put together a series of hacks that fix the problem: (1) Disabling
__aer_print_error() entirely, (2) Disabling __aer_print_error() printk()
and sysfs, (3) Disabling only __aer_print_error() printk(), and finally
(4) Throttling __aer_print_error() printk().  Jon's patch looks to
cover my #4 plus it looks to allow run-time control of the throttling.
So my patch is strictly a stop-the-bleeding measure for Meta's fleet
while this patch series makes its way upstream.

I do plan to look at Jon's patch in more detail when he posts the next
version.

Fair enough?

							Thanx, Paul
Paul E. McKenney March 21, 2025, 10:30 p.m. UTC | #4
On Fri, Mar 21, 2025 at 03:15:26PM -0700, Jon Pan-Doh wrote:
> On Fri, Mar 21, 2025 at 3:01 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > What did Sargun report?  Is there a bug fix in here?  Can we include a
> > URL to whatever Sargun reported?
> 
> Paul added Reported-By and Acked-By tags[1] for the series which I
> applied to each patch. Checkpatch mildly complained about not having a
> Closes tag for Reported-By.
> 
> Sargun/Paul, do you have a Closes/URL tag?

The URL of this email?

Perhaps more productively, I hope that we can run tests on this series,
but it will take some time.

							Thanx, Paul
Bjorn Helgaas March 21, 2025, 10:39 p.m. UTC | #5
On Fri, Mar 21, 2025 at 03:16:48PM -0700, Paul E. McKenney wrote:
> On Fri, Mar 21, 2025 at 05:01:15PM -0500, Bjorn Helgaas wrote:
> > On Thu, Mar 20, 2025 at 06:58:03PM -0700, Jon Pan-Doh wrote:
> > > Update name to reflect the broader definition of structs/variables that
> > > are stored (e.g. ratelimits). This is a preparatory patch for adding rate
> > > limit support.
> > > 
> > > Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
> > > Signed-off-by: Jon Pan-Doh <pandoh@google.com>
> > > Reported-by: Sargun Dhillon <sargun@meta.com>
> > 
> > What did Sargun report?  Is there a bug fix in here?  Can we include a
> > URL to whatever Sargun reported?
> 
> He reported RCU CPU stall warnings and CSD-lock warnings internally
> within Meta, so sorry, no useful URL.

Oh, I see now how this happened via your ack email, Paul.  So I think
it would make sense for Jon to add Sargun's reported-by to "[PATCH v5
6/8] PCI/AER: Introduce ratelimit for error logs", along with a line
in the commit log connecting Sargun with the RCU CPU stall warnings,
etc., because that's the patch that actually addresses those warnings.

I wouldn't add it to the other patches because it's just confusing.

Bjorn
Paul E. McKenney March 21, 2025, 10:47 p.m. UTC | #6
On Fri, Mar 21, 2025 at 05:39:30PM -0500, Bjorn Helgaas wrote:
> On Fri, Mar 21, 2025 at 03:16:48PM -0700, Paul E. McKenney wrote:
> > On Fri, Mar 21, 2025 at 05:01:15PM -0500, Bjorn Helgaas wrote:
> > > On Thu, Mar 20, 2025 at 06:58:03PM -0700, Jon Pan-Doh wrote:
> > > > Update name to reflect the broader definition of structs/variables that
> > > > are stored (e.g. ratelimits). This is a preparatory patch for adding rate
> > > > limit support.
> > > > 
> > > > Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
> > > > Signed-off-by: Jon Pan-Doh <pandoh@google.com>
> > > > Reported-by: Sargun Dhillon <sargun@meta.com>
> > > 
> > > What did Sargun report?  Is there a bug fix in here?  Can we include a
> > > URL to whatever Sargun reported?
> > 
> > He reported RCU CPU stall warnings and CSD-lock warnings internally
> > within Meta, so sorry, no useful URL.
> 
> Oh, I see now how this happened via your ack email, Paul.  So I think
> it would make sense for Jon to add Sargun's reported-by to "[PATCH v5
> 6/8] PCI/AER: Introduce ratelimit for error logs", along with a line
> in the commit log connecting Sargun with the RCU CPU stall warnings,
> etc., because that's the patch that actually addresses those warnings.
> 
> I wouldn't add it to the other patches because it's just confusing.

Works for me!

							Thanx, Paul
diff mbox series

Patch

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 3c63a6963608..f657edca8769 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -54,11 +54,11 @@  struct aer_rpc {
 	DECLARE_KFIFO(aer_fifo, struct aer_err_source, AER_ERROR_SOURCES_MAX);
 };
 
-/* AER stats for the device */
-struct aer_stats {
+/* AER report for the device */
+struct aer_report {
 
 	/*
-	 * Fields for all AER capable devices. They indicate the errors
+	 * Stats for all AER capable devices. They indicate the errors
 	 * "as seen by this device". Note that this may mean that if an
 	 * end point is causing problems, the AER counters may increment
 	 * at its link partner (e.g. root port) because the errors will be
@@ -80,7 +80,7 @@  struct aer_stats {
 	u64 dev_total_nonfatal_errs;
 
 	/*
-	 * Fields for Root ports & root complex event collectors only, these
+	 * Stats for Root ports & root complex event collectors only, these
 	 * indicate the total number of ERR_COR, ERR_FATAL, and ERR_NONFATAL
 	 * messages received by the root port / event collector, INCLUDING the
 	 * ones that are generated internally (by the rootport itself)
@@ -377,7 +377,7 @@  void pci_aer_init(struct pci_dev *dev)
 	if (!dev->aer_cap)
 		return;
 
-	dev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
+	dev->aer_report = kzalloc(sizeof(*dev->aer_report), GFP_KERNEL);
 
 	/*
 	 * We save/restore PCI_ERR_UNCOR_MASK, PCI_ERR_UNCOR_SEVER,
@@ -398,8 +398,8 @@  void pci_aer_init(struct pci_dev *dev)
 
 void pci_aer_exit(struct pci_dev *dev)
 {
-	kfree(dev->aer_stats);
-	dev->aer_stats = NULL;
+	kfree(dev->aer_report);
+	dev->aer_report = NULL;
 }
 
 #define AER_AGENT_RECEIVER		0
@@ -537,10 +537,10 @@  static const char *aer_agent_string[] = {
 {									\
 	unsigned int i;							\
 	struct pci_dev *pdev = to_pci_dev(dev);				\
-	u64 *stats = pdev->aer_stats->stats_array;			\
+	u64 *stats = pdev->aer_report->stats_array;			\
 	size_t len = 0;							\
 									\
-	for (i = 0; i < ARRAY_SIZE(pdev->aer_stats->stats_array); i++) {\
+	for (i = 0; i < ARRAY_SIZE(pdev->aer_report->stats_array); i++) {\
 		if (strings_array[i])					\
 			len += sysfs_emit_at(buf, len, "%s %llu\n",	\
 					     strings_array[i],		\
@@ -551,7 +551,7 @@  static const char *aer_agent_string[] = {
 					     i, stats[i]);		\
 	}								\
 	len += sysfs_emit_at(buf, len, "TOTAL_%s %llu\n", total_string,	\
-			     pdev->aer_stats->total_field);		\
+			     pdev->aer_report->total_field);		\
 	return len;							\
 }									\
 static DEVICE_ATTR_RO(name)
@@ -572,7 +572,7 @@  aer_stats_dev_attr(aer_dev_nonfatal, dev_nonfatal_errs,
 		     char *buf)						\
 {									\
 	struct pci_dev *pdev = to_pci_dev(dev);				\
-	return sysfs_emit(buf, "%llu\n", pdev->aer_stats->field);	\
+	return sysfs_emit(buf, "%llu\n", pdev->aer_report->field);	\
 }									\
 static DEVICE_ATTR_RO(name)
 
@@ -599,7 +599,7 @@  static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
 	struct device *dev = kobj_to_dev(kobj);
 	struct pci_dev *pdev = to_pci_dev(dev);
 
-	if (!pdev->aer_stats)
+	if (!pdev->aer_report)
 		return 0;
 
 	if ((a == &dev_attr_aer_rootport_total_err_cor.attr ||
@@ -622,25 +622,25 @@  void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info)
 	unsigned long status = info->status & ~info->mask;
 	int i, max = -1;
 	u64 *counter = NULL;
-	struct aer_stats *aer_stats = pdev->aer_stats;
+	struct aer_report *aer_report = pdev->aer_report;
 
-	if (!aer_stats)
+	if (!aer_report)
 		return;
 
 	switch (info->severity) {
 	case AER_CORRECTABLE:
-		aer_stats->dev_total_cor_errs++;
-		counter = &aer_stats->dev_cor_errs[0];
+		aer_report->dev_total_cor_errs++;
+		counter = &aer_report->dev_cor_errs[0];
 		max = AER_MAX_TYPEOF_COR_ERRS;
 		break;
 	case AER_NONFATAL:
-		aer_stats->dev_total_nonfatal_errs++;
-		counter = &aer_stats->dev_nonfatal_errs[0];
+		aer_report->dev_total_nonfatal_errs++;
+		counter = &aer_report->dev_nonfatal_errs[0];
 		max = AER_MAX_TYPEOF_UNCOR_ERRS;
 		break;
 	case AER_FATAL:
-		aer_stats->dev_total_fatal_errs++;
-		counter = &aer_stats->dev_fatal_errs[0];
+		aer_report->dev_total_fatal_errs++;
+		counter = &aer_report->dev_fatal_errs[0];
 		max = AER_MAX_TYPEOF_UNCOR_ERRS;
 		break;
 	}
@@ -652,19 +652,19 @@  void pci_dev_aer_stats_incr(struct pci_dev *pdev, struct aer_err_info *info)
 static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
 				 struct aer_err_source *e_src)
 {
-	struct aer_stats *aer_stats = pdev->aer_stats;
+	struct aer_report *aer_report = pdev->aer_report;
 
-	if (!aer_stats)
+	if (!aer_report)
 		return;
 
 	if (e_src->status & PCI_ERR_ROOT_COR_RCV)
-		aer_stats->rootport_total_cor_errs++;
+		aer_report->rootport_total_cor_errs++;
 
 	if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
 		if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
-			aer_stats->rootport_total_fatal_errs++;
+			aer_report->rootport_total_fatal_errs++;
 		else
-			aer_stats->rootport_total_nonfatal_errs++;
+			aer_report->rootport_total_nonfatal_errs++;
 	}
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e4bf67bf8172..900edb6f8f62 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -346,7 +346,7 @@  struct pci_dev {
 	u8		hdr_type;	/* PCI header type (`multi' flag masked out) */
 #ifdef CONFIG_PCIEAER
 	u16		aer_cap;	/* AER capability offset */
-	struct aer_stats *aer_stats;	/* AER stats for this device */
+	struct aer_report *aer_report;	/* AER report for this device */
 #endif
 #ifdef CONFIG_PCIEPORTBUS
 	struct rcec_ea	*rcec_ea;	/* RCEC cached endpoint association */