diff mbox series

[1/8] PCI/AER: Remove aer_print_port_info

Message ID 20250115074301.3514927-2-pandoh@google.com (mailing list archive)
State New
Delegated to: Bjorn Helgaas
Headers show
Series Rate limit AER logs/IRQs | expand

Commit Message

Jon Pan-Doh Jan. 15, 2025, 7:42 a.m. UTC
Info logged is duplicated when either the source device is processed. If
no source device is found, than an error is logged.

Code flow:
aer_isr_one_error()
-> aer_print_port_info()
-> find_source_device()
   -> return/pci_info() if no device found else continue
-> aer_process_err_devices()
   -> aer_print_error()

aer_print_port_info():
[   21.596150] pcieport 0000:00:04.0: Correctable error message received
from 0000:01:00.0

aer_print_error():
[   21.596163] e1000e 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
[   21.600575] e1000e 0000:01:00.0:   device [8086:10d3] error status/mask=00000040/0000e000
[   21.604707] e1000e 0000:01:00.0:    [ 6] BadTLP

Tested using aer-inject[1] tool. No more root port log on dmesg.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git

Signed-off-by: Jon Pan-Doh <pandoh@google.com>
---
 drivers/pci/pcie/aer.c | 15 ---------------
 1 file changed, 15 deletions(-)

Comments

Karolina Stolarek Jan. 16, 2025, 2:27 p.m. UTC | #1
On 15/01/2025 08:42, Jon Pan-Doh wrote:
> Info logged is duplicated when either the source device is processed. If
> no source device is found, than an error is logged.

Nit: s/than/then/

> 
> Code flow:
> aer_isr_one_error()
> -> aer_print_port_info()
> -> find_source_device()
>     -> return/pci_info() if no device found else continue
> -> aer_process_err_devices()
>     -> aer_print_error()
> 
> aer_print_port_info():
> [   21.596150] pcieport 0000:00:04.0: Correctable error message received
> from 0000:01:00.0

I agree that the bus, device and function info is repeated later, but 
isn't this line also about the fact we deal with one or multiple errors 
in a message? The question is how valuable this information, in itself, is.

All the best,
Karolina

> 
> aer_print_error():
> [   21.596163] e1000e 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
> [   21.600575] e1000e 0000:01:00.0:   device [8086:10d3] error status/mask=00000040/0000e000
> [   21.604707] e1000e 0000:01:00.0:    [ 6] BadTLP
> 
> Tested using aer-inject[1] tool. No more root port log on dmesg.
 >
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git
> 
> Signed-off-by: Jon Pan-Doh <pandoh@google.com>
Jon Pan-Doh Jan. 18, 2025, 1:57 a.m. UTC | #2
On Thu, Jan 16, 2025 at 6:27 AM Karolina Stolarek
<karolina.stolarek@oracle.com> wrote:
> On 15/01/2025 08:42, Jon Pan-Doh wrote:
> > aer_print_port_info():
> > [   21.596150] pcieport 0000:00:04.0: Correctable error message received
> > from 0000:01:00.0
>
> I agree that the bus, device and function info is repeated later, but
> isn't this line also about the fact we deal with one or multiple errors
> in a message? The question is how valuable this information, in itself, is.

I think whether or not there are multiple errors can be gleaned from
__aer_print_error(). It prints out all fields of the status register
(as well as denotes the first error).

Thanks,
Jon
Karolina Stolarek Jan. 20, 2025, 9:25 a.m. UTC | #3
On 18/01/2025 02:57, Jon Pan-Doh wrote:
> On Thu, Jan 16, 2025 at 6:27 AM Karolina Stolarek
> <karolina.stolarek@oracle.com> wrote:
>>
>> I agree that the bus, device and function info is repeated later, but
>> isn't this line also about the fact we deal with one or multiple errors
>> in a message? The question is how valuable this information, in itself, is.
> 
> I think whether or not there are multiple errors can be gleaned from
> __aer_print_error(). It prints out all fields of the status register
> (as well as denotes the first error).

That's true, so I think it's reasonable to remove it.

For the code changes:
Reviewed-by: Karolina Stolarek <karolina.stolarek@oracle.com>

As for the commit message, I'd drop the before-after dmesg messages and 
explain that the information presented by aer_print_port_info() can be 
easily inferred from the actual error messages. I had to read the code 
to remind myself what information is duplicated.

All the best,
Karolina

> 
> Thanks,
> Jon
diff mbox series

Patch

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 34ce9f834d0c..ba40800b5494 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -735,18 +735,6 @@  void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 			info->severity, info->tlp_header_valid, &info->tlp);
 }
 
-static void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
-{
-	u8 bus = info->id >> 8;
-	u8 devfn = info->id & 0xff;
-
-	pci_info(dev, "%s%s error message received from %04x:%02x:%02x.%d\n",
-		 info->multi_error_valid ? "Multiple " : "",
-		 aer_error_severity_string[info->severity],
-		 pci_domain_nr(dev->bus), bus, PCI_SLOT(devfn),
-		 PCI_FUNC(devfn));
-}
-
 #ifdef CONFIG_ACPI_APEI_PCIEAER
 int cper_severity_to_aer(int cper_severity)
 {
@@ -1295,7 +1283,6 @@  static void aer_isr_one_error(struct aer_rpc *rpc,
 			e_info.multi_error_valid = 1;
 		else
 			e_info.multi_error_valid = 0;
-		aer_print_port_info(pdev, &e_info);
 
 		if (find_source_device(pdev, &e_info))
 			aer_process_err_devices(&e_info);
@@ -1314,8 +1301,6 @@  static void aer_isr_one_error(struct aer_rpc *rpc,
 		else
 			e_info.multi_error_valid = 0;
 
-		aer_print_port_info(pdev, &e_info);
-
 		if (find_source_device(pdev, &e_info))
 			aer_process_err_devices(&e_info);
 	}