diff mbox

[v2,1/3] PCI/AER: Fix incorrect return from aer_hest_parse()

Message ID 1369924769-17183-2-git-send-email-betty.dall@hp.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Betty Dall May 30, 2013, 2:39 p.m. UTC
The function aer_hest_parse() is called to determine if the given
PCI device is firmware first or not. The code loops through each
section of the HEST table to look for a match. The bug is that
the function always returns whether the last HEST section is firmware
first. The fix stops the iteration once the info.firmware_first
variable is set.  This is similar to how the function aer_hest_parse_aff()
stops the iteration.

Signed-off-by: Betty Dall <betty.dall@hp.com>
---

 drivers/pci/pcie/aer/aerdrv_acpi.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Bjorn Helgaas June 2, 2013, 12:38 a.m. UTC | #1
[+cc Bob for ACPI HEST spec questions]

On Thu, May 30, 2013 at 8:39 AM, Betty Dall <betty.dall@hp.com> wrote:
> The function aer_hest_parse() is called to determine if the given
> PCI device is firmware first or not. The code loops through each
> section of the HEST table to look for a match. The bug is that
> the function always returns whether the last HEST section is firmware
> first. The fix stops the iteration once the info.firmware_first
> variable is set.  This is similar to how the function aer_hest_parse_aff()
> stops the iteration.
>
> Signed-off-by: Betty Dall <betty.dall@hp.com>
> ---
>
>  drivers/pci/pcie/aer/aerdrv_acpi.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
>
> diff --git a/drivers/pci/pcie/aer/aerdrv_acpi.c b/drivers/pci/pcie/aer/aerdrv_acpi.c
> index 5194a7d..39b8671 100644
> --- a/drivers/pci/pcie/aer/aerdrv_acpi.c
> +++ b/drivers/pci/pcie/aer/aerdrv_acpi.c
> @@ -42,6 +42,9 @@ static int aer_hest_parse(struct acpi_hest_header *hest_hdr, void *data)
>         u8 bridge = 0;
>         int ff = 0;
>
> +       if (info->firmware_first)
> +               return 0;
> +
>         switch (hest_hdr->type) {
>         case ACPI_HEST_TYPE_AER_ROOT_PORT:
>                 pcie_type = PCI_EXP_TYPE_ROOT_PORT;

Not related directly to your patch, Betty, but I can't figure out why
the ACPI spec defines the HEST structures for PCIe as it does.  I'm
looking at ACPI 5.0, sec 18.3.2.3 - 18.3.2.5.

1) The PCIe Root Port, PCIe Device, and PCIe/PCI-X Bridge structures
all include Bus, Device, and Function fields.  But there's no Segment.
 The current Linux code (hest_match_pci()) assumes HEST records can
only apply to PCI domain 0.  Is Linux missing something, or is the
HEST really this limited?

2) Doesn't the fact that the HEST structures include a bus number mean
that the OS cannot reassign bus numbers?  I guess maybe we could still
reassign them if we kept track of the mapping back to the original
values.

3) It's interesting that the only choices seem to be "global" or "list
every device."  There's no "apply to this device and the subtree under
it," so I guess if you want HEST to apply to hot-added devices,
"global" is the only thing that makes sense?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Betty Dall June 3, 2013, 9:18 p.m. UTC | #2
On Sat, 2013-06-01 at 18:38 -0600, Bjorn Helgaas wrote:
> [+cc Bob for ACPI HEST spec questions]
> 
> On Thu, May 30, 2013 at 8:39 AM, Betty Dall <betty.dall@hp.com> wrote:
> > The function aer_hest_parse() is called to determine if the given
> > PCI device is firmware first or not. The code loops through each
> > section of the HEST table to look for a match. The bug is that
> > the function always returns whether the last HEST section is firmware
> > first. The fix stops the iteration once the info.firmware_first
> > variable is set.  This is similar to how the function aer_hest_parse_aff()
> > stops the iteration.
> >
> > Signed-off-by: Betty Dall <betty.dall@hp.com>
> > ---
> >
> >  drivers/pci/pcie/aer/aerdrv_acpi.c |    3 +++
> >  1 files changed, 3 insertions(+), 0 deletions(-)
> >
> >
> > diff --git a/drivers/pci/pcie/aer/aerdrv_acpi.c b/drivers/pci/pcie/aer/aerdrv_acpi.c
> > index 5194a7d..39b8671 100644
> > --- a/drivers/pci/pcie/aer/aerdrv_acpi.c
> > +++ b/drivers/pci/pcie/aer/aerdrv_acpi.c
> > @@ -42,6 +42,9 @@ static int aer_hest_parse(struct acpi_hest_header *hest_hdr, void *data)
> >         u8 bridge = 0;
> >         int ff = 0;
> >
> > +       if (info->firmware_first)
> > +               return 0;
> > +
> >         switch (hest_hdr->type) {
> >         case ACPI_HEST_TYPE_AER_ROOT_PORT:
> >                 pcie_type = PCI_EXP_TYPE_ROOT_PORT;
> 
> Not related directly to your patch, Betty, but I can't figure out why
> the ACPI spec defines the HEST structures for PCIe as it does.  I'm
> looking at ACPI 5.0, sec 18.3.2.3 - 18.3.2.5. 

> 1) The PCIe Root Port, PCIe Device, and PCIe/PCI-X Bridge structures
> all include Bus, Device, and Function fields.  But there's no Segment.
>  The current Linux code (hest_match_pci()) assumes HEST records can
> only apply to PCI domain 0.  Is Linux missing something, or is the
> HEST really this limited?
You are right that the HEST table does not have the Segment for the PCIe
sources. The Linux code uses the Generic Source type that points to a
UEFI CPER record. Those do have the Segment. The code in
acpi/apei/ghes.c that parses the HEST and invokes the
aer_recover_queue() is using the segment from the CPER record. 

The hest_match_pci() code is only looking at the PCIe error source types
because that is where the flags field is defined to indicate "firmware
first". Flags is not defined in the Generic error source.  Firmware
first platforms today must be using GLOBAL error sources because the
code is written in such a way that any specific PCIe match will cause
the entire platform to be firmware first. Linux only supports either AER
or firmware first right now, and it would be an enhancement to do both. 

> 2) Doesn't the fact that the HEST structures include a bus number mean
> that the OS cannot reassign bus numbers?  I guess maybe we could still
> reassign them if we kept track of the mapping back to the original
> values.
Yes, I think there needs to be a mapping between the HEST/CPER bus
number and the OS assigned bus numbers, if the OS is reassigning bus
numbers.  

> 3) It's interesting that the only choices seem to be "global" or "list
> every device."  There's no "apply to this device and the subtree under
> it," so I guess if you want HEST to apply to hot-added devices,
> "global" is the only thing that makes sense?
"Global" is the one one that makes sense to me too. The size of the HEST
table is fixed at boot and firmware strives to keep it as small as
possible because firmware memory is a scarce resource. 

Betty

> Bjorn


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chen Gong June 4, 2013, 7:42 a.m. UTC | #3
On Thu, May 30, 2013 at 08:39:27AM -0600, Betty Dall wrote:
> Date:	Thu, 30 May 2013 08:39:27 -0600
> From: Betty Dall <betty.dall@hp.com>
> To: rjw@sisk.pl, bhelgaas@google.com
> Cc: ying.huang@intel.com, linux-acpi@vger.kernel.org,
>  linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, Betty Dall
>  <betty.dall@hp.com>
> Subject: [PATCH v2 1/3] PCI/AER: Fix incorrect return from aer_hest_parse()
> X-Mailer: git-send-email 1.7.7.6
> 
> The function aer_hest_parse() is called to determine if the given
> PCI device is firmware first or not. The code loops through each
> section of the HEST table to look for a match. The bug is that
> the function always returns whether the last HEST section is firmware
> first. The fix stops the iteration once the info.firmware_first
> variable is set.  This is similar to how the function aer_hest_parse_aff()
> stops the iteration.
> 
> Signed-off-by: Betty Dall <betty.dall@hp.com>

The patch is good. But I have further concern based on your patch.
1) aer_hest_parse never checks the 2nd input parameter (void *data),
which means once it is NULL, it will crash the kernel.

2) both aer_hest_parse and aer_hest_parse_aff utilize some flag
as shortcut, if so, why not adding similar logic in apei_hest_parse
to stop meaningless loops once FFM is confirmed as enabled.

> ---
> 
>  drivers/pci/pcie/aer/aerdrv_acpi.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv_acpi.c b/drivers/pci/pcie/aer/aerdrv_acpi.c
> index 5194a7d..39b8671 100644
> --- a/drivers/pci/pcie/aer/aerdrv_acpi.c
> +++ b/drivers/pci/pcie/aer/aerdrv_acpi.c
> @@ -42,6 +42,9 @@ static int aer_hest_parse(struct acpi_hest_header *hest_hdr, void *data)
>  	u8 bridge = 0;
>  	int ff = 0;
>  
> +	if (info->firmware_first)
> +		return 0;
> +
>  	switch (hest_hdr->type) {
>  	case ACPI_HEST_TYPE_AER_ROOT_PORT:
>  		pcie_type = PCI_EXP_TYPE_ROOT_PORT;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas June 4, 2013, 5:39 p.m. UTC | #4
On Mon, Jun 3, 2013 at 3:18 PM, Betty Dall <betty.dall@hp.com> wrote:
> On Sat, 2013-06-01 at 18:38 -0600, Bjorn Helgaas wrote:
>> [+cc Bob for ACPI HEST spec questions]
>>
>> On Thu, May 30, 2013 at 8:39 AM, Betty Dall <betty.dall@hp.com> wrote:
>> > The function aer_hest_parse() is called to determine if the given
>> > PCI device is firmware first or not. The code loops through each
>> > section of the HEST table to look for a match. The bug is that
>> > the function always returns whether the last HEST section is firmware
>> > first. The fix stops the iteration once the info.firmware_first
>> > variable is set.  This is similar to how the function aer_hest_parse_aff()
>> > stops the iteration.
>> >
>> > Signed-off-by: Betty Dall <betty.dall@hp.com>
>> > ---
>> >
>> >  drivers/pci/pcie/aer/aerdrv_acpi.c |    3 +++
>> >  1 files changed, 3 insertions(+), 0 deletions(-)
>> >
>> >
>> > diff --git a/drivers/pci/pcie/aer/aerdrv_acpi.c b/drivers/pci/pcie/aer/aerdrv_acpi.c
>> > index 5194a7d..39b8671 100644
>> > --- a/drivers/pci/pcie/aer/aerdrv_acpi.c
>> > +++ b/drivers/pci/pcie/aer/aerdrv_acpi.c
>> > @@ -42,6 +42,9 @@ static int aer_hest_parse(struct acpi_hest_header *hest_hdr, void *data)
>> >         u8 bridge = 0;
>> >         int ff = 0;
>> >
>> > +       if (info->firmware_first)
>> > +               return 0;
>> > +
>> >         switch (hest_hdr->type) {
>> >         case ACPI_HEST_TYPE_AER_ROOT_PORT:
>> >                 pcie_type = PCI_EXP_TYPE_ROOT_PORT;
>>
>> Not related directly to your patch, Betty, but I can't figure out why
>> the ACPI spec defines the HEST structures for PCIe as it does.  I'm
>> looking at ACPI 5.0, sec 18.3.2.3 - 18.3.2.5.
>
>> 1) The PCIe Root Port, PCIe Device, and PCIe/PCI-X Bridge structures
>> all include Bus, Device, and Function fields.  But there's no Segment.
>>  The current Linux code (hest_match_pci()) assumes HEST records can
>> only apply to PCI domain 0.  Is Linux missing something, or is the
>> HEST really this limited?
> You are right that the HEST table does not have the Segment for the PCIe
> sources. The Linux code uses the Generic Source type that points to a
> UEFI CPER record. Those do have the Segment. The code in
> acpi/apei/ghes.c that parses the HEST and invokes the
> aer_recover_queue() is using the segment from the CPER record.

My question has nothing to do with CPER.  Drivers can enable error
reporting for their devices with pci_enable_pcie_error_reporting().
If the device is marked "firmware-first" in the HEST, this call fails
without enabling reporting.  The HEST can only mark devices in domain
0 as "firmware-first."  Therefore pci_enable_pcie_error_reporting()
will enable reporting for:

  - domain 0 devices not marked "firmware-first" and
  - all devices in other domains.

This inconsistency seems like a hole in the spec, but maybe I'm
missing something.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/pcie/aer/aerdrv_acpi.c b/drivers/pci/pcie/aer/aerdrv_acpi.c
index 5194a7d..39b8671 100644
--- a/drivers/pci/pcie/aer/aerdrv_acpi.c
+++ b/drivers/pci/pcie/aer/aerdrv_acpi.c
@@ -42,6 +42,9 @@  static int aer_hest_parse(struct acpi_hest_header *hest_hdr, void *data)
 	u8 bridge = 0;
 	int ff = 0;
 
+	if (info->firmware_first)
+		return 0;
+
 	switch (hest_hdr->type) {
 	case ACPI_HEST_TYPE_AER_ROOT_PORT:
 		pcie_type = PCI_EXP_TYPE_ROOT_PORT;