[v1,2/2] PCI/AER: Stop printing vendor/device ID

Message ID 152770286159.80701.8079550179741454699.stgit@bhelgaas-glaptop.roam.corp.google.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Bjorn Helgaas May 30, 2018, 5:54 p.m. UTC
From: Bjorn Helgaas <bhelgaas@google.com>

The Vendor and Device ID of the root port that raised an AER interrupt is
irrelevant and already available via normal enumeration dmesg logging or
lspci.

Remove the Vendor and Device ID from AER logging.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pcie/aer/aerdrv_errprint.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Rajat Jain May 30, 2018, 6:18 p.m. UTC | #1
On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote:

> From: Bjorn Helgaas <bhelgaas@google.com>

> The Vendor and Device ID of the root port that raised an AER interrupt is
> irrelevant and already available via normal enumeration dmesg logging or
> lspci.

Er, what is getting printed is not the vendor/device id of the root port
but that of the AER source device (the one that root port got an ERR_*
message from). In case of fatal AERs, the end point device may become
inaccessible so lspci will not be available, and enumeration logs (from
boot) may have gotten rolled over. So I think it is still better to print
this information here.

Just my opinion :-)

Thanks,

Rajat


Bjorn Helgaas May 31, 2018, 12:28 a.m. UTC | #2
On Wed, May 30, 2018 at 11:18:35AM -0700, Rajat Jain wrote:
> On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> > From: Bjorn Helgaas <bhelgaas@google.com>
> 
> > The Vendor and Device ID of the root port that raised an AER interrupt is
> > irrelevant and already available via normal enumeration dmesg logging or
> > lspci.
> 
> Er, what is getting printed is not the vendor/device id of the root port
> but that of the AER source device (the one that root port got an ERR_*
> message from). In case of fatal AERs, the end point device may become
> inaccessible so lspci will not be available, and enumeration logs (from
> boot) may have gotten rolled over. So I think it is still better to print
> this information here.

Thanks for looking this over!

You're right, "dev" here is not necessarily the Root Port, so this
changelog is bogus.  "dev" came from e_info->dev[] from
aer_process_err_devices().

I think to be more precise, aer_irq() reads the Root Port's
PCI_ERR_ROOT_ERR_SRC register, which gives us the Requester ID from
the ERR_* message.  Then find_source_device() walks the tree starting
with the Root Port, looking for:

  - a device that matches the Requester ID, or
  - a device that doesn't match the Requester ID (e.g., because a VMD
    port clears the source ID) but has AER enabled and has logged an
    error of the same type (ERR_COR vs ERR_FATAL/NONFATAL) we're
    currently decoding
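The two matching rules above can be sketched in plain C. This is only a simplified model, not the kernel's code: struct model_dev, its fields, and model_is_error_source() are hypothetical names made up for illustration, and the real tree walk handles more cases than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-device state the walk
 * would consult. None of these names come from the kernel. */
struct model_dev {
	uint16_t requester_id;      /* bus << 8 | device << 3 | function */
	bool aer_enabled;           /* error reporting enabled on the device */
	bool logged_correctable;    /* device has an ERR_COR-class error logged */
	bool logged_uncorrectable;  /* device has a fatal/nonfatal error logged */
};

/* Return true if "dev" should be treated as a source of the error whose
 * Requester ID ("source_id") was latched by the Root Port. */
static bool model_is_error_source(const struct model_dev *dev,
				  uint16_t source_id, bool correctable)
{
	/* Rule 1: the device matches the Requester ID. */
	if (dev->requester_id == source_id)
		return true;

	/* Rule 2: no ID match (e.g., the source ID was cleared on the way
	 * up), so fall back to what the device itself has logged. */
	if (!dev->aer_enabled)
		return false;

	return correctable ? dev->logged_correctable
			   : dev->logged_uncorrectable;
}
```

Calling this for every device below the Root Port and collecting the ones that return true would be the model's analogue of filling e_info->dev[]; rule 2 is why more than one device can end up in that array.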

So there might be multiple "dev" pointers in e_info->dev[] because
several devices could have logged errors.

I'm not convinced the vendor/device ID is that useful because there
might be several devices with the same ID, so it doesn't really tell
you which one.  The Requester ID (bus/device/function) is the
important thing.
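For reference, a minimal sketch of how a 16-bit Requester ID splits into bus/device/function, and how the 32-bit value read from PCI_ERR_ROOT_ERR_SRC splits into its two source IDs (low half for ERR_COR, high half for ERR_FATAL/NONFATAL). The struct and helper names here are hypothetical; only the bit layout is taken from the PCIe spec.

```c
#include <stdint.h>

/* Hypothetical container for a decoded Requester ID. */
struct requester_id {
	uint8_t bus;       /* bits 15:8 */
	uint8_t device;    /* bits 7:3  */
	uint8_t function;  /* bits 2:0  */
};

static struct requester_id decode_requester_id(uint16_t id)
{
	struct requester_id r;

	r.bus = (uint8_t)(id >> 8);
	r.device = (uint8_t)((id >> 3) & 0x1f);
	r.function = (uint8_t)(id & 0x07);
	return r;
}

/* Low 16 bits of the Error Source Identification register:
 * Requester ID of the ERR_COR source. */
static uint16_t err_src_correctable(uint32_t err_src)
{
	return (uint16_t)(err_src & 0xffff);
}

/* High 16 bits: Requester ID of the ERR_FATAL/NONFATAL source. */
static uint16_t err_src_uncorrectable(uint32_t err_src)
{
	return (uint16_t)(err_src >> 16);
}
```

Two identical cards in different slots would report the same vendor/device ID but decode to different bus/device/function values, which is the point made above.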

The current code is not ideal because the find_source_device() path
depends on the pci_dev still being present and even accessible (so we
can read DEVCTL, ERR_COR_STATUS, etc), which might not be the case.

If find_source_device() fails, i.e., it can't find a matching pci_dev
and prints the "can't find device of ID%04x" message, we're in real
trouble because we don't call aer_process_err_devices(), which means
we don't clear PCI_ERR_COR_STATUS.

Anyway, I'll abandon this change for now since it's not a clear
improvement.

Patch

diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index d7fde8368d81..16116844531c 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -175,9 +175,8 @@  void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 		aer_error_severity_string[info->severity],
 		aer_error_layer[layer], aer_agent_string[agent]);
 
-	pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
-		dev->vendor, dev->device,
-		info->status, info->mask);
+	pci_err(dev, "  error status/mask=%08x/%08x\n", info->status,
+		info->mask);
 
 	__aer_print_error(dev, info);
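
To make the effect of the hunk concrete, here is a userspace sketch that formats the line both ways with snprintf(). It uses the two format strings from the patch, minus the trailing newline; the helper names are hypothetical, and the device-name prefix that pci_err() adds in the kernel is not modeled.

```c
#include <stdint.h>
#include <stdio.h>

/* Format string being removed: includes the vendor/device ID. */
static int format_before(char *buf, size_t len, uint16_t vendor,
			 uint16_t device, uint32_t status, uint32_t mask)
{
	return snprintf(buf, len,
			"  device [%04x:%04x] error status/mask=%08x/%08x",
			(unsigned int)vendor, (unsigned int)device,
			(unsigned int)status, (unsigned int)mask);
}

/* Format string being added: status and mask only. */
static int format_after(char *buf, size_t len, uint32_t status,
			uint32_t mask)
{
	return snprintf(buf, len, "  error status/mask=%08x/%08x",
			(unsigned int)status, (unsigned int)mask);
}
```

With made-up values vendor=0x8086, device=0x2030, status=0x1, mask=0x2000, format_before() yields "  device [8086:2030] error status/mask=00000001/00002000" and format_after() yields "  error status/mask=00000001/00002000".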