Message ID | 20250115074301.3514927-9-pandoh@google.com |
---|---|
State | New |
Delegated to: | Bjorn Helgaas |
Series | Rate limit AER logs/IRQs |
On 15/01/2025 08:43, Jon Pan-Doh wrote:
> Prepare for the addition of new AER sysfs attributes (e.g. ratelimits)
> by moving the existing AER attributes into their own directory. Update
> the naming to reflect the broader definition and for consistency.
>
> /sys/bus/pci/devices/<dev>/aer_dev_correctable
> /sys/bus/pci/devices/<dev>/aer_dev_fatal
> /sys/bus/pci/devices/<dev>/aer_dev_nonfatal
> /sys/bus/pci/devices/<dev>/aer_rootport_total_err_cor
> /sys/bus/pci/devices/<dev>/aer_rootport_total_err_fatal
> /sys/bus/pci/devices/<dev>/aer_rootport_total_err_nonfatal
> ->
> /sys/bus/pci/devices/<dev>/aer/err_cor
> /sys/bus/pci/devices/<dev>/aer/err_fatal
> /sys/bus/pci/devices/<dev>/aer/err_nonfatal
> /sys/bus/pci/devices/<dev>/aer/rootport_total_err_cor
> /sys/bus/pci/devices/<dev>/aer/rootport_total_err_fatal
> /sys/bus/pci/devices/<dev>/aer/rootport_total_err_nonfatal
>
> Tested using the aer-inject[1] tool. Injected one AER error and observed
> the AER stats correctly logged (cat /sys/bus/pci/devices/<dev>/aer/err_cor).

I'm not a sysfs expert, but my understanding is that we shouldn't make
major changes to the existing hierarchies.

On one hand, I think it would be nice to extract AER-specific info
and knobs into a subdirectory (e.g., using an attribute_group with the
name "aer"), but on the other, this would be disruptive to userspace. I
can imagine that there are tools that watch these values that would
break after this change.
All the best,
Karolina
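To make the userspace concern above concrete, here is a minimal sketch of how a monitoring tool might tolerate the rename during a transition: try the proposed `aer/` location first and fall back to the legacy flat name. The table, `aer_moved_name()` and `aer_resolve_stat()` are hypothetical helpers written for illustration; only the file names come from the patch.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical table pairing the legacy flat attribute names with the
 * aer/ subdirectory names proposed in this patch. */
struct aer_name_map {
	const char *legacy;
	const char *moved;
};

static const struct aer_name_map aer_names[] = {
	{ "aer_dev_correctable",             "aer/err_cor" },
	{ "aer_dev_fatal",                   "aer/err_fatal" },
	{ "aer_dev_nonfatal",                "aer/err_nonfatal" },
	{ "aer_rootport_total_err_cor",      "aer/rootport_total_err_cor" },
	{ "aer_rootport_total_err_fatal",    "aer/rootport_total_err_fatal" },
	{ "aer_rootport_total_err_nonfatal", "aer/rootport_total_err_nonfatal" },
};

/* Return the proposed aer/ path for a legacy name, or NULL if unknown. */
static const char *aer_moved_name(const char *legacy)
{
	for (size_t i = 0; i < sizeof(aer_names) / sizeof(aer_names[0]); i++)
		if (strcmp(aer_names[i].legacy, legacy) == 0)
			return aer_names[i].moved;
	return NULL;
}

/* Write the first readable path for the statistic into buf, trying the
 * new aer/ location before the legacy flat name. Returns 0 on success,
 * -1 if the name is unknown, a path would be truncated, or neither file
 * is readable. */
static int aer_resolve_stat(char *buf, size_t len, const char *devdir,
			    const char *legacy)
{
	const char *moved = aer_moved_name(legacy);
	const char *candidates[2] = { moved, legacy };

	if (!moved)
		return -1;
	for (int i = 0; i < 2; i++) {
		int n = snprintf(buf, len, "%s/%s", devdir, candidates[i]);

		if (n < 0 || (size_t)n >= len)
			return -1;	/* path did not fit in buf */
		if (access(buf, R_OK) == 0)
			return 0;
	}
	return -1;
}
```

A tool would call `aer_resolve_stat(path, sizeof(path), "/sys/bus/pci/devices/<dev>", "aer_dev_correctable")` and then open `path`. Keeping the legacy name as the lookup key means existing configuration keeps working whichever layout the running kernel exposes.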
Hello,

On Thu, Jan 16, 2025 at 2:26 AM Karolina Stolarek
<karolina.stolarek@oracle.com> wrote:
>
> On 15/01/2025 08:43, Jon Pan-Doh wrote:
> > Prepare for the addition of new AER sysfs attributes (e.g. ratelimits)
> > by moving the existing AER attributes into their own directory. Update
> > the naming to reflect the broader definition and for consistency.
> >
> > /sys/bus/pci/devices/<dev>/aer_dev_correctable
> > /sys/bus/pci/devices/<dev>/aer_dev_fatal
> > /sys/bus/pci/devices/<dev>/aer_dev_nonfatal
> > /sys/bus/pci/devices/<dev>/aer_rootport_total_err_cor
> > /sys/bus/pci/devices/<dev>/aer_rootport_total_err_fatal
> > /sys/bus/pci/devices/<dev>/aer_rootport_total_err_nonfatal
> > ->
> > /sys/bus/pci/devices/<dev>/aer/err_cor
> > /sys/bus/pci/devices/<dev>/aer/err_fatal
> > /sys/bus/pci/devices/<dev>/aer/err_nonfatal
> > /sys/bus/pci/devices/<dev>/aer/rootport_total_err_cor
> > /sys/bus/pci/devices/<dev>/aer/rootport_total_err_fatal
> > /sys/bus/pci/devices/<dev>/aer/rootport_total_err_nonfatal
> >
> > Tested using the aer-inject[1] tool. Injected one AER error and observed
> > the AER stats correctly logged (cat /sys/bus/pci/devices/<dev>/aer/err_cor).
>
> I'm not a sysfs expert, but my understanding is that we shouldn't make
> major changes to the existing hierarchies.
>
> On one hand, I think it would be nice to extract AER-specific info
> and knobs into a subdirectory (e.g., using an attribute_group with the
> name "aer"), but on the other, this would be disruptive to userspace. I
> can imagine that there are tools that watch these values that would
> break after this change.

Thank you. This is the right guidance.

As the original author of these attributes, I just wanted to chime in
from the ChromeOS team's perspective. We have mostly used these
attributes for manual debugging and do not yet have tools with
hardcoded hierarchies/paths.
So we wouldn't be opposed to it, if changes to the hierarchy have wider
acceptance and it seems better in general.

Thanks & Best Regards,
Rajat
diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer
index c680a53af0f4..e1472583207b 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer
@@ -9,7 +9,7 @@ errors may be "seen" / reported by the link partner and not the
 problematic endpoint itself (which may report all counters as 0 as it never
 saw any problems).
 
-What:		/sys/bus/pci/devices/<dev>/aer_dev_correctable
+What:		/sys/bus/pci/devices/<dev>/aer/err_cor
 Date:		July 2018
 KernelVersion:	4.19.0
 Contact:	linux-pci@vger.kernel.org, rajatja@google.com
@@ -19,7 +19,7 @@ Description:	List of correctable errors seen and reported by this
 		TOTAL_ERR_COR at the end of the file may not match the actual
 		total of all the errors in the file. Sample output::
 
-		    localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_correctable
+		    localhost /sys/devices/pci0000:00/0000:00:1c.0/aer # cat err_cor
 		    Receiver Error 2
 		    Bad TLP 0
 		    Bad DLLP 0
@@ -30,7 +30,7 @@ Description:	List of correctable errors seen and reported by this
 		    Header Log Overflow 0
 		    TOTAL_ERR_COR 2
 
-What:		/sys/bus/pci/devices/<dev>/aer_dev_fatal
+What:		/sys/bus/pci/devices/<dev>/aer/err_fatal
 Date:		July 2018
 KernelVersion:	4.19.0
 Contact:	linux-pci@vger.kernel.org, rajatja@google.com
@@ -40,7 +40,7 @@ Description:	List of uncorrectable fatal errors seen and reported by this
 		TOTAL_ERR_FATAL at the end of the file may not match the actual
 		total of all the errors in the file. Sample output::
 
-		    localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_fatal
+		    localhost /sys/devices/pci0000:00/0000:00:1c.0/aer # cat err_fatal
 		    Undefined 0
 		    Data Link Protocol 0
 		    Surprise Down Error 0
@@ -60,7 +60,7 @@ Description:	List of uncorrectable fatal errors seen and reported by this
 		    TLP Prefix Blocked Error 0
 		    TOTAL_ERR_FATAL 0
 
-What:		/sys/bus/pci/devices/<dev>/aer_dev_nonfatal
+What:		/sys/bus/pci/devices/<dev>/aer/err_nonfatal
 Date:		July 2018
 KernelVersion:	4.19.0
 Contact:	linux-pci@vger.kernel.org, rajatja@google.com
@@ -70,7 +70,7 @@ Description:	List of uncorrectable nonfatal errors seen and reported by this
 		TOTAL_ERR_NONFATAL at the end of the file may not match the
 		actual total of all the errors in the file. Sample output::
 
-		    localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_nonfatal
+		    localhost /sys/devices/pci0000:00/0000:00:1c.0/aer # cat err_nonfatal
 		    Undefined 0
 		    Data Link Protocol 0
 		    Surprise Down Error 0
@@ -100,19 +100,19 @@ collectors) that are AER capable. These indicate the number of error messages as
 device, so these counters include them and are thus cumulative of all the error
 messages on the PCI hierarchy originating at that root port.
 
-What:		/sys/bus/pci/devices/<dev>/aer_rootport_total_err_cor
+What:		/sys/bus/pci/devices/<dev>/aer/rootport_total_err_cor
 Date:		July 2018
 KernelVersion:	4.19.0
 Contact:	linux-pci@vger.kernel.org, rajatja@google.com
 Description:	Total number of ERR_COR messages reported to rootport.
 
-What:		/sys/bus/pci/devices/<dev>/aer_rootport_total_err_fatal
+What:		/sys/bus/pci/devices/<dev>/aer/rootport_total_err_fatal
 Date:		July 2018
 KernelVersion:	4.19.0
 Contact:	linux-pci@vger.kernel.org, rajatja@google.com
 Description:	Total number of ERR_FATAL messages reported to rootport.
 
-What:		/sys/bus/pci/devices/<dev>/aer_rootport_total_err_nonfatal
+What:		/sys/bus/pci/devices/<dev>/aer/rootport_total_err_nonfatal
 Date:		July 2018
 KernelVersion:	4.19.0
 Contact:	linux-pci@vger.kernel.org, rajatja@google.com
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 41acb6713e2d..e16b92edf3bd 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1692,7 +1692,6 @@ const struct attribute_group *pci_dev_attr_groups[] = {
 	&pci_bridge_attr_group,
 	&pcie_dev_attr_group,
 #ifdef CONFIG_PCIEAER
-	&aer_stats_attr_group,
 	&aer_attr_group,
 #endif
 #ifdef CONFIG_PCIEASPM
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9d0272a890ef..a80cfc08f634 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -880,7 +880,6 @@ static inline void of_pci_remove_node(struct pci_dev *pdev) { }
 void pci_no_aer(void);
 void pci_aer_init(struct pci_dev *dev);
 void pci_aer_exit(struct pci_dev *dev);
-extern const struct attribute_group aer_stats_attr_group;
 extern const struct attribute_group aer_attr_group;
 void pci_aer_clear_fatal_status(struct pci_dev *dev);
 int pci_aer_clear_status(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e48e2951baae..68850525cc8d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -569,13 +569,13 @@ static const char *aer_agent_string[] = {
 	}								\
 	static DEVICE_ATTR_RO(name)
 
-aer_stats_dev_attr(aer_dev_correctable, dev_cor_errs,
+aer_stats_dev_attr(err_cor, dev_cor_errs,
 		   aer_correctable_error_string, "ERR_COR",
 		   dev_total_cor_errs);
-aer_stats_dev_attr(aer_dev_fatal, dev_fatal_errs,
+aer_stats_dev_attr(err_fatal, dev_fatal_errs,
 		   aer_uncorrectable_error_string, "ERR_FATAL",
 		   dev_total_fatal_errs);
-aer_stats_dev_attr(aer_dev_nonfatal, dev_nonfatal_errs,
+aer_stats_dev_attr(err_nonfatal, dev_nonfatal_errs,
 		   aer_uncorrectable_error_string, "ERR_NONFATAL",
 		   dev_total_nonfatal_errs);
 
@@ -589,47 +589,13 @@ aer_stats_dev_attr(aer_dev_nonfatal, dev_nonfatal_errs,
 	}								\
 	static DEVICE_ATTR_RO(name)
 
-aer_stats_rootport_attr(aer_rootport_total_err_cor,
+aer_stats_rootport_attr(rootport_total_err_cor,
 			 rootport_total_cor_errs);
-aer_stats_rootport_attr(aer_rootport_total_err_fatal,
+aer_stats_rootport_attr(rootport_total_err_fatal,
 			 rootport_total_fatal_errs);
-aer_stats_rootport_attr(aer_rootport_total_err_nonfatal,
+aer_stats_rootport_attr(rootport_total_err_nonfatal,
 			 rootport_total_nonfatal_errs);
 
-static struct attribute *aer_stats_attrs[] __ro_after_init = {
-	&dev_attr_aer_dev_correctable.attr,
-	&dev_attr_aer_dev_fatal.attr,
-	&dev_attr_aer_dev_nonfatal.attr,
-	&dev_attr_aer_rootport_total_err_cor.attr,
-	&dev_attr_aer_rootport_total_err_fatal.attr,
-	&dev_attr_aer_rootport_total_err_nonfatal.attr,
-	NULL
-};
-
-static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
-					   struct attribute *a, int n)
-{
-	struct device *dev = kobj_to_dev(kobj);
-	struct pci_dev *pdev = to_pci_dev(dev);
-
-	if (!pdev->aer_info)
-		return 0;
-
-	if ((a == &dev_attr_aer_rootport_total_err_cor.attr ||
-	     a == &dev_attr_aer_rootport_total_err_fatal.attr ||
-	     a == &dev_attr_aer_rootport_total_err_nonfatal.attr) &&
-	    ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
-	     (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_EC)))
-		return 0;
-
-	return a->mode;
-}
-
-const struct attribute_group aer_stats_attr_group = {
-	.attrs  = aer_stats_attrs,
-	.is_visible = aer_stats_attrs_are_visible,
-};
-
 #define aer_ratelimit_attr(name, ratelimit)				\
 	static ssize_t							\
 	name##_show(struct device *dev, struct device_attribute *attr,	\
@@ -662,6 +628,14 @@ aer_ratelimit_attr(ratelimit_cor_log, cor_log_ratelimit);
 aer_ratelimit_attr(ratelimit_uncor_log, uncor_log_ratelimit);
 
 static struct attribute *aer_attrs[] __ro_after_init = {
+	/* Stats */
+	&dev_attr_err_cor.attr,
+	&dev_attr_err_fatal.attr,
+	&dev_attr_err_nonfatal.attr,
+	&dev_attr_rootport_total_err_cor.attr,
+	&dev_attr_rootport_total_err_fatal.attr,
+	&dev_attr_rootport_total_err_nonfatal.attr,
+	/* Ratelimits */
 	&dev_attr_ratelimit_cor_irq.attr,
 	&dev_attr_ratelimit_uncor_irq.attr,
 	&dev_attr_ratelimit_cor_log.attr,
@@ -670,13 +644,21 @@ static struct attribute *aer_attrs[] __ro_after_init = {
 };
 
 static umode_t aer_attrs_are_visible(struct kobject *kobj,
-		struct attribute *a, int n)
+				     struct attribute *a, int n)
 {
 	struct device *dev = kobj_to_dev(kobj);
 	struct pci_dev *pdev = to_pci_dev(dev);
 
 	if (!pdev->aer_info)
 		return 0;
+
+	if ((a == &dev_attr_rootport_total_err_cor.attr ||
+	     a == &dev_attr_rootport_total_err_fatal.attr ||
+	     a == &dev_attr_rootport_total_err_nonfatal.attr) &&
+	    ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
+	     (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_EC)))
+		return 0;
+
 	return a->mode;
 }
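After this change, aer_attrs_are_visible() decides visibility for the stats and the ratelimit knobs together. The rule can be modelled in plain userspace C to make it easy to reason about and test. This is a sketch only: the enum and `model_aer_attr_visible()` are hypothetical, and where the kernel compares attribute pointers and calls pci_pcie_type(), the model matches on the attribute name prefix instead.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for the PCIe port types relevant here; the kernel
 * uses pci_pcie_type() and the PCI_EXP_TYPE_* constants instead. */
enum model_pcie_type {
	MODEL_ENDPOINT,
	MODEL_ROOT_PORT,
	MODEL_RC_EC,		/* Root Complex Event Collector */
	MODEL_SWITCH_PORT,
};

/* Model of the decision in aer_attrs_are_visible(): nothing is shown
 * without AER info, and the rootport_total_err_* counters are shown
 * only on Root Ports and RCECs. All other attributes (per-device stats
 * and ratelimits) are visible whenever AER info exists. */
static bool model_aer_attr_visible(bool has_aer_info,
				   enum model_pcie_type type,
				   const char *attr)
{
	static const char prefix[] = "rootport_total_err_";

	if (!has_aer_info)
		return false;
	if (strncmp(attr, prefix, sizeof(prefix) - 1) == 0 &&
	    type != MODEL_ROOT_PORT && type != MODEL_RC_EC)
		return false;
	return true;
}
```

For example, `model_aer_attr_visible(true, MODEL_ENDPOINT, "rootport_total_err_cor")` is false, while the same attribute on `MODEL_RC_EC` is true. Merging the two groups keeps this rule in one callback instead of duplicating the `aer_info` check.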
Prepare for the addition of new AER sysfs attributes (e.g. ratelimits)
by moving the existing AER attributes into their own directory. Update
the naming to reflect the broader definition and for consistency.

/sys/bus/pci/devices/<dev>/aer_dev_correctable
/sys/bus/pci/devices/<dev>/aer_dev_fatal
/sys/bus/pci/devices/<dev>/aer_dev_nonfatal
/sys/bus/pci/devices/<dev>/aer_rootport_total_err_cor
/sys/bus/pci/devices/<dev>/aer_rootport_total_err_fatal
/sys/bus/pci/devices/<dev>/aer_rootport_total_err_nonfatal
->
/sys/bus/pci/devices/<dev>/aer/err_cor
/sys/bus/pci/devices/<dev>/aer/err_fatal
/sys/bus/pci/devices/<dev>/aer/err_nonfatal
/sys/bus/pci/devices/<dev>/aer/rootport_total_err_cor
/sys/bus/pci/devices/<dev>/aer/rootport_total_err_fatal
/sys/bus/pci/devices/<dev>/aer/rootport_total_err_nonfatal

Tested using the aer-inject[1] tool. Injected one AER error and observed
the AER stats correctly logged (cat /sys/bus/pci/devices/<dev>/aer/err_cor).

[1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git

Signed-off-by: Jon Pan-Doh <pandoh@google.com>
---
 .../ABI/testing/sysfs-bus-pci-devices-aer | 18 +++---
 drivers/pci/pci-sysfs.c                   |  1 -
 drivers/pci/pci.h                         |  1 -
 drivers/pci/pcie/aer.c                    | 64 +++++++------------
 4 files changed, 32 insertions(+), 52 deletions(-)
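The manual check described above (`cat .../aer/err_cor`) could also be scripted. The ABI documentation in this patch shows each line as an error name followed by a count, ending with a TOTAL_* line, and warns that the total may not match the sum of the individual lines. The sketch below parses that format and reports both numbers so a test can compare them; `aer_scan_stats()` and its wrappers are hypothetical, and the one-space-before-the-count layout is an assumption taken from the documented sample output.

```c
#include <stdlib.h>
#include <string.h>

/* Walk an AER stats buffer line by line. Each line is assumed to end in
 * a decimal count after its last space, as in the documented sample
 * output ("Receiver Error 2", ..., "TOTAL_ERR_COR 2").
 * want_total != 0: return the value on the TOTAL_* line, or -1 if absent.
 * want_total == 0: return the sum of all the other per-error lines. */
static long long aer_scan_stats(const char *text, int want_total)
{
	long long sum = 0, total = -1;

	while (*text) {
		size_t n = strcspn(text, "\n");	/* length of this line */
		const char *sp = NULL;

		for (size_t i = 0; i < n; i++)	/* find the last space */
			if (text[i] == ' ')
				sp = text + i;
		if (sp) {
			long long v = strtoll(sp + 1, NULL, 10);

			if (strncmp(text, "TOTAL_", 6) == 0)
				total = v;
			else
				sum += v;
		}
		text += n;
		if (*text == '\n')
			text++;
	}
	return want_total ? total : sum;
}

static long long aer_stats_total(const char *text)
{
	return aer_scan_stats(text, 1);
}

static long long aer_stats_sum(const char *text)
{
	return aer_scan_stats(text, 0);
}
```

Fed the documented correctable sample ("Receiver Error 2" through "TOTAL_ERR_COR 2"), both `aer_stats_sum()` and `aer_stats_total()` return 2; a mismatch between the two is not necessarily a bug, given the caveat in the ABI text.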