Message ID | 20250115074301.3514927-2-pandoh@google.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | Rate limit AER logs/IRQs | expand |
On 15/01/2025 08:42, Jon Pan-Doh wrote: > Info logged is duplicated when either the source device is processed. If > no source device is found, than an error is logged. Nit: s/than/then/ > > Code flow: > aer_isr_one_error() > -> aer_print_port_info() > -> find_source_device() > -> return/pci_info() if no device found else continue > -> aer_process_err_devices() > -> aer_print_error() > > aer_print_port_info(): > [ 21.596150] pcieport 0000:00:04.0: Correctable error message received > from 0000:01:00.0 I agree that the bus, device and function info is repeated later, but isn't this line also about the fact we deal with one or multiple errors in a message? The question is how valuable this information, in itself, is. All the best, Karolina > > aer_print_error(): > [ 21.596163] e1000e 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID) > [ 21.600575] e1000e 0000:01:00.0: device [8086:10d3] error status/mask=00000040/0000e000 > [ 21.604707] e1000e 0000:01:00.0: [ 6] BadTLP > > Tested using aer-inject[1] tool. No more root port log on dmesg. > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git > > Signed-off-by: Jon Pan-Doh <pandoh@google.com>
On Thu, Jan 16, 2025 at 6:27 AM Karolina Stolarek <karolina.stolarek@oracle.com> wrote: > On 15/01/2025 08:42, Jon Pan-Doh wrote: > > aer_print_port_info(): > > [ 21.596150] pcieport 0000:00:04.0: Correctable error message received > > from 0000:01:00.0 > > I agree that the bus, device and function info is repeated later, but > isn't this line also about the fact we deal with one or multiple errors > in a message? The question is how valuable this information, in itself, is. I think whether or not there are multiple errors can be gleaned from __aer_print_error(). It prints out all fields of the status register (as well as denotes the first error). Thanks, Jon
On 18/01/2025 02:57, Jon Pan-Doh wrote: > On Thu, Jan 16, 2025 at 6:27 AM Karolina Stolarek > <karolina.stolarek@oracle.com> wrote: >> >> I agree that the bus, device and function info is repeated later, but >> isn't this line also about the fact we deal with one or multiple errors >> in a message? The question is how valuable this information, in itself, is. > > I think whether or not there are multiple errors can be gleaned from > __aer_print_error(). It prints out all fields of the status register > (as well as denotes the first error). That's true, so I think it's reasonable to remove it. For the code changes: Reviewed-by: Karolina Stolarek <karolina.stolarek@oracle.com> As for the commit message, I'd drop the before-after dmesg messages and explain that the information presented by aer_print_port_info() can be easily inferred from the actual error messages. I had to read the code to remind myself what information is duplicated. All the best, Karolina > > Thanks, > Jon
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 34ce9f834d0c..ba40800b5494 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -735,18 +735,6 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) info->severity, info->tlp_header_valid, &info->tlp); } -static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info) -{ - u8 bus = info->id >> 8; - u8 devfn = info->id & 0xff; - - pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n", - info->multi_error_valid ? "Multiple " : "", - aer_error_severity_string[info->severity], - pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn), - PCI_FUNC(devfn)); -} - #ifdef CONFIG_ACPI_APEI_PCIEAER int cper_severity_to_aer(int cper_severity) { @@ -1295,7 +1283,6 @@ static void aer_isr_one_error(struct aer_rpc *rpc, e_info.multi_error_valid = 1; else e_info.multi_error_valid = 0; - aer_print_port_info(pdev, &e_info); if (find_source_device(pdev, &e_info)) aer_process_err_devices(&e_info); @@ -1314,8 +1301,6 @@ static void aer_isr_one_error(struct aer_rpc *rpc, else e_info.multi_error_valid = 0; - aer_print_port_info(pdev, &e_info); - if (find_source_device(pdev, &e_info)) aer_process_err_devices(&e_info); }
Info logged is duplicated when either the source device is processed. If no source device is found, than an error is logged. Code flow: aer_isr_one_error() -> aer_print_port_info() -> find_source_device() -> return/pci_info() if no device found else continue -> aer_process_err_devices() -> aer_print_error() aer_print_port_info(): [ 21.596150] pcieport 0000:00:04.0: Correctable error message received from 0000:01:00.0 aer_print_error(): [ 21.596163] e1000e 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID) [ 21.600575] e1000e 0000:01:00.0: device [8086:10d3] error status/mask=00000040/0000e000 [ 21.604707] e1000e 0000:01:00.0: [ 6] BadTLP Tested using aer-inject[1] tool. No more root port log on dmesg. [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git Signed-off-by: Jon Pan-Doh <pandoh@google.com> --- drivers/pci/pcie/aer.c | 15 --------------- 1 file changed, 15 deletions(-)