[PATCHv2,1/7] PCI/DPC: Enable ERR_COR

Message ID 20180402162203.3370-2-keith.busch@intel.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Keith Busch April 2, 2018, 4:21 p.m. UTC
A DPC port may be configured to send an ERR_COR message when DPC is
triggered. This patch enables this feature so additional notification
of the event is possible.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/pcie-dpc.c   | 5 ++++-
 include/uapi/linux/pci_regs.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas April 2, 2018, 9:23 p.m. UTC | #1
On Mon, Apr 02, 2018 at 10:21:57AM -0600, Keith Busch wrote:
> A DPC port may be configured to send an ERR_COR message when DPC is
> triggered. This patch enables this feature so additional notification
> of the event is possible.

Who is the intended consumer of the ERR_COR message?  I guess if the
root port supports AER, we'll see AER logging when we didn't before?
Is that what you mean by "additional notification"?

Or, since the DPC ERR_COR implementation note (sec 6.2.10.2) suggests
that ERR_COR is primarily intended for use by platform firmware, does
this change enable notification via the firmware?  If so, how does
that work?  Does it require the platform to retain control of AER so
it can see these events?  But if the platform retains AER control, I
don't think we'd be enabling DPC in Linux.  I'm confused.

The similar implementation note in 6.2.10.5 suggests that
DL_ACTIVE_ERR_COR is also intended for use by the platform.  Should we
be setting that for the same reason we're setting
PCI_EXP_DPC_CTL_ERR_COR?

> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/pcie/pcie-dpc.c   | 5 ++++-
>  include/uapi/linux/pci_regs.h | 1 +
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
> index 38e40c6c576f..a9be6938417f 100644
> --- a/drivers/pci/pcie/pcie-dpc.c
> +++ b/drivers/pci/pcie/pcie-dpc.c
> @@ -269,7 +269,10 @@ static int dpc_probe(struct pcie_device *dev)
>  		}
>  	}
>  
> -	ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_NONFATAL | PCI_EXP_DPC_CTL_INT_EN;
> +	ctl = (ctl & 0xfff4) |
> +		PCI_EXP_DPC_CTL_EN_NONFATAL |
> +		PCI_EXP_DPC_CTL_INT_EN |
> +		PCI_EXP_DPC_CTL_ERR_COR;

The 0xfff4 is a little confusing because it:

  - Clears DPC Trigger Enable (0x0003), then sets
    PCI_EXP_DPC_CTL_EN_NONFATAL.  That makes good sense.  Maybe we
    should have a mask for the 0x0003 value.

  - Preserves whatever the firmware set for DPC Completion Control
    (0x0004).  Seems like maybe we should decide and set this
    explicitly, unless there's a reason we should respect the
    platform's choice.  But that's a separate issue.

  - Clears DPC Interrupt Enable (0x0008), then sets it again.
    Clearing seems pointless.

  - Preserves DPC ERR_COR Enable (0x0010), then sets it.

I think the most readable thing would be:

  ctl = (ctl & 0xfffc) | PCI_EXP_DPC_CTL_EN_NONFATAL;
  ctl |= PCI_EXP_DPC_CTL_INT_EN | PCI_EXP_DPC_CTL_ERR_COR;

or maybe:

  ctl = (ctl & 0xfffc) | PCI_EXP_DPC_CTL_EN_NONFATAL;
  ctl |= PCI_EXP_DPC_CTL_INT_EN;

  /* Platform firmware may rely on these ERR_COR messages */
  ctl |= PCI_EXP_DPC_CTL_ERR_COR | PCI_EXP_DPC_CTL_DL_ACT_ERR_COR;

(If that comment is accurate.)
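
To make the mask analysis above concrete, here is a minimal sketch (not code from the patch or the kernel tree) that compares the posted expression with the first suggested form over every 16-bit register value. The bit values are the ones quoted in the review; the names DPC_CTL_TRIGGER_EN_MASK and the helper functions are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Bit values as quoted in the review. The macro and function names below
 * are hypothetical, chosen only for this illustration. */
#define DPC_CTL_TRIGGER_EN_MASK	0x0003u	/* DPC Trigger Enable field */
#define DPC_CTL_EN_NONFATAL	0x0002u	/* trigger on ERR_NONFATAL */
#define DPC_CTL_INT_EN		0x0008u	/* DPC Interrupt Enable */
#define DPC_CTL_ERR_COR		0x0010u	/* DPC ERR_COR Enable */

/* The expression as posted: 0xfff4 clears bits 0, 1 and 3. */
static uint16_t dpc_ctl_as_posted(uint16_t ctl)
{
	return (uint16_t)((ctl & 0xfff4u) | DPC_CTL_EN_NONFATAL |
			  DPC_CTL_INT_EN | DPC_CTL_ERR_COR);
}

/* The suggested form: clear only the trigger field, then set bits. */
static uint16_t dpc_ctl_suggested(uint16_t ctl)
{
	ctl = (uint16_t)((ctl & ~DPC_CTL_TRIGGER_EN_MASK) | DPC_CTL_EN_NONFATAL);
	ctl |= DPC_CTL_INT_EN | DPC_CTL_ERR_COR;
	return ctl;
}

/* Count the 16-bit inputs for which the two forms disagree. */
static unsigned int dpc_ctl_count_mismatches(void)
{
	unsigned int mismatches = 0;

	for (uint32_t v = 0; v <= 0xffffu; v++)
		if (dpc_ctl_as_posted((uint16_t)v) != dpc_ctl_suggested((uint16_t)v))
			mismatches++;
	return mismatches;
}
```

Because bit 3 is set unconditionally afterwards, clearing it first has no effect on the result, so the two forms should agree for every input. The difference is readability only: the 0xfffc form makes it obvious that just the trigger field is being rewritten.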

>  	pci_write_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL, ctl);
>  
>  	dev_info(device, "DPC error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n",
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 0c79eac5e9b8..9cfcecdc3ec7 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -980,6 +980,7 @@
>  #define PCI_EXP_DPC_CTL			6	/* DPC control */
>  #define  PCI_EXP_DPC_CTL_EN_NONFATAL 	0x0002	/* Enable trigger on ERR_NONFATAL message */
>  #define  PCI_EXP_DPC_CTL_INT_EN 	0x0008	/* DPC Interrupt Enable */
> +#define  PCI_EXP_DPC_CTL_ERR_COR 	0x0010	/* DPC ERR_COR Enable */
>  
>  #define PCI_EXP_DPC_STATUS		8	/* DPC Status */
>  #define  PCI_EXP_DPC_STATUS_TRIGGER	    0x0001 /* Trigger Status */
> -- 
> 2.14.3
>

Keith Busch April 2, 2018, 11:09 p.m. UTC | #2
On Mon, Apr 02, 2018 at 04:23:02PM -0500, Bjorn Helgaas wrote:
> On Mon, Apr 02, 2018 at 10:21:57AM -0600, Keith Busch wrote:
> > A DPC port may be configured to send an ERR_COR message when DPC is
> > triggered. This patch enables this feature so additional notification
> > of the event is possible.
> 
> Who is the intended consumer of the ERR_COR message?  I guess if the
> root port supports AER, we'll see AER logging when we didn't before?
> Is that what you mean by "additional notification"?
> 
> Or, since the DPC ERR_COR implementation note (sec 6.2.10.2) suggests
> that ERR_COR is primarily intended for use by platform firmware, does
> this change enable notification via the firmware?  If so, how does
> that work?  Does it require the platform to retain control of AER so
> it can see these events?  But if the platform retains AER control, I
> don't think we'd be enabling DPC in Linux.  I'm confused.
> 
> The similar implementation note in 6.2.10.5 suggests that
> DL_ACTIVE_ERR_COR is also intended for use by the platform.  Should we
> be setting that for the same reason we're setting
> PCI_EXP_DPC_CTL_ERR_COR?

I think there are various ways this could play out. I added this to
the series per a request for future use, but I actually don't have an
environment where I could test as intended. As I understand, it was
for platform firmware that may own the root port, but not switches in
an external enclosure, so the OS would own the DPC control. The ERR_COR
is to notify firmware so it may note an event in a platform log.

I think you bring up a good point on DL_Active.

This should be harmless even if there are no consumers for the message,
or if the OS owns the root port AER control. But I'm totally fine
with dropping this one in the series, and it's not related to the rest
anyway. I think I at least need to circle back with platform makers and
really test the feature on capable hardware.

Bjorn Helgaas April 3, 2018, 2:16 p.m. UTC | #3
[+cc Rafael, linux-acpi]

On Mon, Apr 02, 2018 at 05:09:49PM -0600, Keith Busch wrote:
> On Mon, Apr 02, 2018 at 04:23:02PM -0500, Bjorn Helgaas wrote:
> > On Mon, Apr 02, 2018 at 10:21:57AM -0600, Keith Busch wrote:
> > > A DPC port may be configured to send an ERR_COR message when DPC is
> > > triggered. This patch enables this feature so additional notification
> > > of the event is possible.
> > 
> > Who is the intended consumer of the ERR_COR message?  I guess if the
> > root port supports AER, we'll see AER logging when we didn't before?
> > Is that what you mean by "additional notification"?
> > 
> > Or, since the DPC ERR_COR implementation note (sec 6.2.10.2) suggests
> > that ERR_COR is primarily intended for use by platform firmware, does
> > this change enable notification via the firmware?  If so, how does
> > that work?  Does it require the platform to retain control of AER so
> > it can see these events?  But if the platform retains AER control, I
> > don't think we'd be enabling DPC in Linux.  I'm confused.
> > 
> > The similar implementation note in 6.2.10.5 suggests that
> > DL_ACTIVE_ERR_COR is also intended for use by the platform.  Should we
> > be setting that for the same reason we're setting
> > PCI_EXP_DPC_CTL_ERR_COR?
> 
> I think there are various ways this could play out. I added this to
> the series per a request for future use, but I actually don't have an
> environment where I could test as intended. As I understand, it was
> for platform firmware that may own the root port, but not switches in
> an external enclosure, so the OS would own the DPC control. The ERR_COR
> is to notify firmware so it may note an event in a platform log.

This picture doesn't make sense to me yet.  Per the PCI Firmware spec,
r3.2, sec 4.5.2, we use _OSC to negotiate control over AER (and now
DPC).  I think _OSC is only allowed directly under a host bridge
device (PNP0A08 or PNP0A03), and it applies to the entire hierarchy
under the host bridge.

I don't know how firmware could claim to own AER on a root port, but
not on switches (external or otherwise) below that root port.

If _OSC says firmware owns AER, we won't touch AER on the root port,
but we also won't touch DPC on switches below the root port.  So I
don't see how this change will help.
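
The ownership argument above can be sketched as a toy model. This is an illustration of the reasoning in this message only, under the assumption stated there that _OSC is negotiated once per host bridge and that DPC control follows AER ownership; the structures and function names are hypothetical and are not kernel data structures or APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, for illustration only. */
struct host_bridge {
	bool os_owns_aer;	/* outcome of _OSC negotiation for this bridge */
};

struct pcie_port {
	const struct pcie_port *parent;		/* NULL for a root port */
	const struct host_bridge *bridge;	/* set on root ports only */
};

/* Walk up to the root port and return the host bridge above it. */
static const struct host_bridge *port_host_bridge(const struct pcie_port *port)
{
	while (port->parent)
		port = port->parent;
	return port->bridge;
}

/* The _OSC result applies to the whole hierarchy, so a switch port
 * gets the same answer as the root port it sits below. */
static bool os_may_configure_dpc(const struct pcie_port *port)
{
	return port_host_bridge(port)->os_owns_aer;
}
```

In this model there is no way to express "firmware owns the root port but the OS owns DPC on a switch below it", which is the configuration the patch was meant to serve.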

> I think you bring up a good point on DL_Active.
> 
> This should be harmless even if there are no consumers for the message,
> or if the OS owns the root port AER control. But I'm totally fine
> with dropping this one in the series, and it's not related to the rest
> anyway. I think I at least need to circle back with platform makers and
> really test the feature on capable hardware.

Rafael J. Wysocki April 3, 2018, 2:31 p.m. UTC | #4
On Tue, Apr 3, 2018 at 4:16 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> [+cc Rafael, linux-acpi]

Thanks.

> On Mon, Apr 02, 2018 at 05:09:49PM -0600, Keith Busch wrote:
>> On Mon, Apr 02, 2018 at 04:23:02PM -0500, Bjorn Helgaas wrote:
>> > On Mon, Apr 02, 2018 at 10:21:57AM -0600, Keith Busch wrote:
>> > > A DPC port may be configured to send an ERR_COR message when DPC is
>> > > triggered. This patch enables this feature so additional notification
>> > > of the event is possible.
>> >
>> > Who is the intended consumer of the ERR_COR message?  I guess if the
>> > root port supports AER, we'll see AER logging when we didn't before?
>> > Is that what you mean by "additional notification"?
>> >
>> > Or, since the DPC ERR_COR implementation note (sec 6.2.10.2) suggests
>> > that ERR_COR is primarily intended for use by platform firmware, does
>> > this change enable notification via the firmware?  If so, how does
>> > that work?  Does it require the platform to retain control of AER so
>> > it can see these events?  But if the platform retains AER control, I
>> > don't think we'd be enabling DPC in Linux.  I'm confused.
>> >
>> > The similar implementation note in 6.2.10.5 suggests that
>> > DL_ACTIVE_ERR_COR is also intended for use by the platform.  Should we
>> > be setting that for the same reason we're setting
>> > PCI_EXP_DPC_CTL_ERR_COR?
>>
>> I think there are various ways this could play out. I added this to
>> the series per a request for future use, but I actually don't have an
>> environment where I could test as intended. As I understand, it was
>> for platform firmware that may own the root port, but not switches in
>> an external enclosure, so the OS would own the DPC control. The ERR_COR
>> is to notify firmware so it may note an event in a platform log.
>
> This picture doesn't make sense to me yet.  Per the PCI Firmware spec,
> r3.2, sec 4.5.2, we use _OSC to negotiate control over AER (and now
> DPC).  I think _OSC is only allowed directly under a host bridge
> device (PNP0A08 or PNP0A03), and it applies to the entire hierarchy
> under the host bridge.
>
> I don't know how firmware could claim to own AER on a root port, but
> not on switches (external or otherwise) below that root port.
>
> If _OSC says firmware owns AER, we won't touch AER on the root port,
> but we also won't touch DPC on switches below the root port.  So I
> don't see how this change will help.

This matches my understanding of that part of the spec.

Keith Busch April 3, 2018, 2:37 p.m. UTC | #5
On Tue, Apr 03, 2018 at 04:31:15PM +0200, Rafael J. Wysocki wrote:
> On Tue, Apr 3, 2018 at 4:16 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > This picture doesn't make sense to me yet.  Per the PCI Firmware spec,
> > r3.2, sec 4.5.2, we use _OSC to negotiate control over AER (and now
> > DPC).  I think _OSC is only allowed directly under a host bridge
> > device (PNP0A08 or PNP0A03), and it applies to the entire hierarchy
> > under the host bridge.
> >
> > I don't know how firmware could claim to own AER on a root port, but
> > not on switches (external or otherwise) below that root port.
> >
> > If _OSC says firmware owns AER, we won't touch AER on the root port,
> > but we also won't touch DPC on switches below the root port.  So I
> > don't see how this change will help.
> 
> This matches my understanding of that part of the spec.

Thanks, guys. It seems this patch doesn't accomplish anything.

Patch

diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
index 38e40c6c576f..a9be6938417f 100644
--- a/drivers/pci/pcie/pcie-dpc.c
+++ b/drivers/pci/pcie/pcie-dpc.c
@@ -269,7 +269,10 @@  static int dpc_probe(struct pcie_device *dev)
 		}
 	}
 
-	ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_NONFATAL | PCI_EXP_DPC_CTL_INT_EN;
+	ctl = (ctl & 0xfff4) |
+		PCI_EXP_DPC_CTL_EN_NONFATAL |
+		PCI_EXP_DPC_CTL_INT_EN |
+		PCI_EXP_DPC_CTL_ERR_COR;
 	pci_write_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL, ctl);
 
 	dev_info(device, "DPC error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n",
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 0c79eac5e9b8..9cfcecdc3ec7 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -980,6 +980,7 @@ 
 #define PCI_EXP_DPC_CTL			6	/* DPC control */
 #define  PCI_EXP_DPC_CTL_EN_NONFATAL 	0x0002	/* Enable trigger on ERR_NONFATAL message */
 #define  PCI_EXP_DPC_CTL_INT_EN 	0x0008	/* DPC Interrupt Enable */
+#define  PCI_EXP_DPC_CTL_ERR_COR 	0x0010	/* DPC ERR_COR Enable */
 
 #define PCI_EXP_DPC_STATUS		8	/* DPC Status */
 #define  PCI_EXP_DPC_STATUS_TRIGGER	    0x0001 /* Trigger Status */