Message ID | 20250115074301.3514927-2-pandoh@google.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | Rate limit AER logs/IRQs | expand |
On 15/01/2025 08:42, Jon Pan-Doh wrote: > Info logged is duplicated when either the source device is processed. If > no source device is found, than an error is logged. Nit: s/than/then/ > > Code flow: > aer_isr_one_error() > -> aer_print_port_info() > -> find_source_device() > -> return/pci_info() if no device found else continue > -> aer_process_err_devices() > -> aer_print_error() > > aer_print_port_info(): > [ 21.596150] pcieport 0000:00:04.0: Correctable error message received > from 0000:01:00.0 I agree that the bus, device and function info is repeated later, but isn't this line also about the fact we deal with one or multiple errors in a message? The question is how valuable this information, in itself, is. All the best, Karolina > > aer_print_error(): > [ 21.596163] e1000e 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID) > [ 21.600575] e1000e 0000:01:00.0: device [8086:10d3] error status/mask=00000040/0000e000 > [ 21.604707] e1000e 0000:01:00.0: [ 6] BadTLP > > Tested using aer-inject[1] tool. No more root port log on dmesg. > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git > > Signed-off-by: Jon Pan-Doh <pandoh@google.com>
On Thu, Jan 16, 2025 at 6:27 AM Karolina Stolarek <karolina.stolarek@oracle.com> wrote: > On 15/01/2025 08:42, Jon Pan-Doh wrote: > > aer_print_port_info(): > > [ 21.596150] pcieport 0000:00:04.0: Correctable error message received > > from 0000:01:00.0 > > I agree that the bus, device and function info is repeated later, but > isn't this line also about the fact we deal with one or multiple errors > in a message? The question is how valuable this information, in itself, is. I think whether or not there are multiple errors can be gleaned from __aer_print_error(). It prints out all fields of the status register (as well as denotes the first error). Thanks, Jon
On 18/01/2025 02:57, Jon Pan-Doh wrote: > On Thu, Jan 16, 2025 at 6:27 AM Karolina Stolarek > <karolina.stolarek@oracle.com> wrote: >> >> I agree that the bus, device and function info is repeated later, but >> isn't this line also about the fact we deal with one or multiple errors >> in a message? The question is how valuable this information, in itself, is. > > I think whether or not there are multiple errors can be gleaned from > __aer_print_error(). It prints out all fields of the status register > (as well as denotes the first error). That's true, so I think it's reasonable to remove it. For the code changes: Reviewed-by: Karolina Stolarek <karolina.stolarek@oracle.com> As for the commit message, I'd drop the before-after dmesg messages and explain that the information presented by aer_print_port_info() can be easily inferred from the actual error messages. I had to read the code to remind myself what information is duplicated. All the best, Karolina > > Thanks, > Jon
On Tue, 14 Jan 2025, Jon Pan-Doh wrote: > Info logged is duplicated when either the source device is processed. If Is the first sentence lacking something, as "either" feels to require another option/alternative that is not present in the sentence? (I'm non-native myself) > no source device is found, than an error is logged. > > Code flow: > aer_isr_one_error() > -> aer_print_port_info() > -> find_source_device() > -> return/pci_info() if no device found else continue > -> aer_process_err_devices() > -> aer_print_error() > > aer_print_port_info(): > [ 21.596150] pcieport 0000:00:04.0: Correctable error message received > from 0000:01:00.0 > > aer_print_error(): > [ 21.596163] e1000e 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID) > [ 21.600575] e1000e 0000:01:00.0: device [8086:10d3] error status/mask=00000040/0000e000 > [ 21.604707] e1000e 0000:01:00.0: [ 6] BadTLP > > Tested using aer-inject[1] tool. No more root port log on dmesg. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git > > Signed-off-by: Jon Pan-Doh <pandoh@google.com> > --- > drivers/pci/pcie/aer.c | 15 --------------- > 1 file changed, 15 deletions(-) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 34ce9f834d0c..ba40800b5494 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -735,18 +735,6 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) > info->severity, info->tlp_header_valid, &info->tlp); > } > > -static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info) > -{ > - u8 bus = info->id >> 8; > - u8 devfn = info->id & 0xff; > - > - pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n", > - info->multi_error_valid ? "Multiple " : "", > - aer_error_severity_string[info->severity], > - pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn), > - PCI_FUNC(devfn)); > -} > - > #ifdef CONFIG_ACPI_APEI_PCIEAER > int cper_severity_to_aer(int cper_severity) > { > @@ -1295,7 +1283,6 @@ static void aer_isr_one_error(struct aer_rpc *rpc, > e_info.multi_error_valid = 1; > else > e_info.multi_error_valid = 0; > - aer_print_port_info(pdev, &e_info); > > if (find_source_device(pdev, &e_info)) > aer_process_err_devices(&e_info); > @@ -1314,8 +1301,6 @@ static void aer_isr_one_error(struct aer_rpc *rpc, > else > e_info.multi_error_valid = 0; > > - aer_print_port_info(pdev, &e_info); > - > if (find_source_device(pdev, &e_info)) > aer_process_err_devices(&e_info); > } >
On 1/14/25 11:42 PM, Jon Pan-Doh wrote: > Info logged is duplicated when either the source device is processed. If > no source device is found, than an error is logged. Any issue with logging when no source device is found in device hierarchy? > > Code flow: > aer_isr_one_error() > -> aer_print_port_info() > -> find_source_device() > -> return/pci_info() if no device found else continue > -> aer_process_err_devices() > -> aer_print_error() > > aer_print_port_info(): > [ 21.596150] pcieport 0000:00:04.0: Correctable error message received > from 0000:01:00.0 > > aer_print_error(): > [ 21.596163] e1000e 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID) > [ 21.600575] e1000e 0000:01:00.0: device [8086:10d3] error status/mask=00000040/0000e000 > [ 21.604707] e1000e 0000:01:00.0: [ 6] BadTLP Please remove time stamp from dmesg log. > > Tested using aer-inject[1] tool. No more root port log on dmesg. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git > > Signed-off-by: Jon Pan-Doh <pandoh@google.com> > --- > drivers/pci/pcie/aer.c | 15 --------------- > 1 file changed, 15 deletions(-) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 34ce9f834d0c..ba40800b5494 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -735,18 +735,6 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) > info->severity, info->tlp_header_valid, &info->tlp); > } > > -static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info) > -{ > - u8 bus = info->id >> 8; > - u8 devfn = info->id & 0xff; > - > - pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n", > - info->multi_error_valid ? "Multiple " : "", > - aer_error_severity_string[info->severity], > - pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn), > - PCI_FUNC(devfn)); > -} > - > #ifdef CONFIG_ACPI_APEI_PCIEAER > int cper_severity_to_aer(int cper_severity) > { > @@ -1295,7 +1283,6 @@ static void aer_isr_one_error(struct aer_rpc *rpc, > e_info.multi_error_valid = 1; > else > e_info.multi_error_valid = 0; > - aer_print_port_info(pdev, &e_info); > > if (find_source_device(pdev, &e_info)) > aer_process_err_devices(&e_info); > @@ -1314,8 +1301,6 @@ static void aer_isr_one_error(struct aer_rpc *rpc, > else > e_info.multi_error_valid = 0; > > - aer_print_port_info(pdev, &e_info); > - > if (find_source_device(pdev, &e_info)) > aer_process_err_devices(&e_info); > }
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 34ce9f834d0c..ba40800b5494 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -735,18 +735,6 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) info->severity, info->tlp_header_valid, &info->tlp); } -static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info) -{ - u8 bus = info->id >> 8; - u8 devfn = info->id & 0xff; - - pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n", - info->multi_error_valid ? "Multiple " : "", - aer_error_severity_string[info->severity], - pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn), - PCI_FUNC(devfn)); -} - #ifdef CONFIG_ACPI_APEI_PCIEAER int cper_severity_to_aer(int cper_severity) { @@ -1295,7 +1283,6 @@ static void aer_isr_one_error(struct aer_rpc *rpc, e_info.multi_error_valid = 1; else e_info.multi_error_valid = 0; - aer_print_port_info(pdev, &e_info); if (find_source_device(pdev, &e_info)) aer_process_err_devices(&e_info); @@ -1314,8 +1301,6 @@ static void aer_isr_one_error(struct aer_rpc *rpc, else e_info.multi_error_valid = 0; - aer_print_port_info(pdev, &e_info); - if (find_source_device(pdev, &e_info)) aer_process_err_devices(&e_info); }
Info logged is duplicated when either the source device is processed. If no source device is found, than an error is logged. Code flow: aer_isr_one_error() -> aer_print_port_info() -> find_source_device() -> return/pci_info() if no device found else continue -> aer_process_err_devices() -> aer_print_error() aer_print_port_info(): [ 21.596150] pcieport 0000:00:04.0: Correctable error message received from 0000:01:00.0 aer_print_error(): [ 21.596163] e1000e 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID) [ 21.600575] e1000e 0000:01:00.0: device [8086:10d3] error status/mask=00000040/0000e000 [ 21.604707] e1000e 0000:01:00.0: [ 6] BadTLP Tested using aer-inject[1] tool. No more root port log on dmesg. [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git Signed-off-by: Jon Pan-Doh <pandoh@google.com> --- drivers/pci/pcie/aer.c | 15 --------------- 1 file changed, 15 deletions(-)