diff mbox series

[16/22] PCI: pciehp: Use RESPONSE_IS_PCI_ERROR() to check read from hardware

Message ID 36c7c3005c4d86a6884b270807d84433a86c0953.1633972263.git.naveennaidu479@gmail.com (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series PCI: Unify PCI error response checking | expand

Commit Message

Naveen Naidu Oct. 11, 2021, 6:07 p.m. UTC
An MMIO read from a PCI device that doesn't exist or doesn't respond
causes a PCI error.  There's no real data to return to satisfy the
CPU read, so most hardware fabricates ~0 data.

Use RESPONSE_IS_PCI_ERROR() to check the response we get when we read
data from hardware.

This helps unify PCI error response checking and make error checks
consistent and easier to find.

Compile tested only.

Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
 drivers/pci/hotplug/pciehp_hpc.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Lukas Wunner Oct. 11, 2021, 7:47 p.m. UTC | #1
On Mon, Oct 11, 2021 at 11:37:33PM +0530, Naveen Naidu wrote:
> An MMIO read from a PCI device that doesn't exist or doesn't respond
> causes a PCI error.  There's no real data to return to satisfy the
> CPU read, so most hardware fabricates ~0 data.
> 
> Use RESPONSE_IS_PCI_ERROR() to check the response we get when we read
> data from hardware.

Actually what happens is that PCI read transactions *time out*,
so the host controller fabricates a response.

By contrast, a PCI *error* usually denotes an Uncorrectable or
Correctable Error as specified in section 6.2.2 of the PCIe Base Spec.

Thus something like RESPONSE_IS_PCI_TIMEOUT() or IS_PCI_TIMEOUT() would
probably be more appropriate.  I'll leave the exact bikeshed color for
others to decide. :-)


> Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
> ---
>  drivers/pci/hotplug/pciehp_hpc.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)

Acked-by: Lukas Wunner <lukas@wunner.de>
Naveen Naidu Oct. 12, 2021, 4:05 p.m. UTC | #2
On 11/10, Lukas Wunner wrote:
> On Mon, Oct 11, 2021 at 11:37:33PM +0530, Naveen Naidu wrote:
> > An MMIO read from a PCI device that doesn't exist or doesn't respond
> > causes a PCI error.  There's no real data to return to satisfy the
> > CPU read, so most hardware fabricates ~0 data.
> > 
> > Use RESPONSE_IS_PCI_ERROR() to check the response we get when we read
> > data from hardware.
> 
> Actually what happens is that PCI read transactions *time out*,
> so the host controller fabricates a response.
>

Ah! yes. Now that I look at it, RESPONSE_IS_PCI_TIMEOUT() does indeed
seem like a better option to RESPONSE_IS_PCI_ERROR(), since it's more
specfic and depicts the actual condition. 

I'll wait for sometime and see if others have any objection/a better
name for the macro and then redo the patch with that.

Thank you very much for the review ^^ 

> By contrast, a PCI *error* usually denotes an Uncorrectable or
> Correctable Error as specified in section 6.2.2 of the PCIe Base Spec.
> 
> Thus something like RESPONSE_IS_PCI_TIMEOUT() or IS_PCI_TIMEOUT() would
> probably be more appropriate.  I'll leave the exact bikeshed color for
> others to decide. :-)
> 
> 
> > Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
> > ---
> >  drivers/pci/hotplug/pciehp_hpc.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> Acked-by: Lukas Wunner <lukas@wunner.de>
Pali Rohár Oct. 12, 2021, 11:12 p.m. UTC | #3
On Tuesday 12 October 2021 21:35:13 Naveen Naidu wrote:
> On 11/10, Lukas Wunner wrote:
> > On Mon, Oct 11, 2021 at 11:37:33PM +0530, Naveen Naidu wrote:
> > > An MMIO read from a PCI device that doesn't exist or doesn't respond
> > > causes a PCI error.  There's no real data to return to satisfy the
> > > CPU read, so most hardware fabricates ~0 data.
> > > 
> > > Use RESPONSE_IS_PCI_ERROR() to check the response we get when we read
> > > data from hardware.
> > 
> > Actually what happens is that PCI read transactions *time out*,
> > so the host controller fabricates a response.
> >
> 
> Ah! yes. Now that I look at it, RESPONSE_IS_PCI_TIMEOUT() does indeed
> seem like a better option to RESPONSE_IS_PCI_ERROR(), since it's more
> specfic and depicts the actual condition. 

This is not fully correct. 0xffffffff is returned when some error
happens. It does not have to be timeout error. Errors like Unsupported
Request, Completer Abort or Configuration Request Retry Status (when
CRSSVE bit is disabled) are also reported as 0xffffffff and they do not
represent timeout. For example Unsupported Request is returned when you
try to read from non-existent device behind some PCIe switch.

Also pci-aardvark.c fabricates value 0xffffffff when trying to read from
config space below the PCIe Root Port when PCIe link is not up.

And I have seen that Completer Abort was returned by PCIe switch when
switch itself did not received reply from device below switch. So it
means that controller can receive some reply from other device even when
no real reply was sent. Which means that timeout can be reported by some
other message.

So I think that generic PCI_ERROR is the best name. You do not know what
really happened (only some controller drivers can provide additional
information, it does not have any standard HW<-->OS API) and application
logic must decide how to process error.

> I'll wait for sometime and see if others have any objection/a better
> name for the macro and then redo the patch with that.
> 
> Thank you very much for the review ^^ 
> 
> > By contrast, a PCI *error* usually denotes an Uncorrectable or
> > Correctable Error as specified in section 6.2.2 of the PCIe Base Spec.
> > 
> > Thus something like RESPONSE_IS_PCI_TIMEOUT() or IS_PCI_TIMEOUT() would
> > probably be more appropriate.  I'll leave the exact bikeshed color for
> > others to decide. :-)
> > 
> > 
> > > Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
> > > ---
> > >  drivers/pci/hotplug/pciehp_hpc.c | 10 +++++-----
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > Acked-by: Lukas Wunner <lukas@wunner.de>
Lukas Wunner Oct. 13, 2021, 12:20 p.m. UTC | #4
On Wed, Oct 13, 2021 at 01:12:01AM +0200, Pali Rohár wrote:
> > On 11/10, Lukas Wunner wrote:
> > > On Mon, Oct 11, 2021 at 11:37:33PM +0530, Naveen Naidu wrote:
> > > > An MMIO read from a PCI device that doesn't exist or doesn't respond
> > > > causes a PCI error.  There's no real data to return to satisfy the
> > > > CPU read, so most hardware fabricates ~0 data.
> > > > 
> > > > Use RESPONSE_IS_PCI_ERROR() to check the response we get when we read
> > > > data from hardware.
> > > 
> > > Actually what happens is that PCI read transactions *time out*,
> > > so the host controller fabricates a response.
> 
> This is not fully correct. 0xffffffff is returned when some error
> happens. It does not have to be timeout error. Errors like Unsupported
> Request, Completer Abort or Configuration Request Retry Status (when
> CRSSVE bit is disabled) are also reported as 0xffffffff and they do not
> represent timeout. For example Unsupported Request is returned when you
> try to read from non-existent device behind some PCIe switch.

This particular patch concerns pciehp and in that context,
"all ones" responses are predominantly timeouts caused by
hot-removed devices.

Thanks,

Lukas
diff mbox series

Patch

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3024d7e85e6a..8a2f6bb643b5 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -89,7 +89,7 @@  static int pcie_poll_cmd(struct controller *ctrl, int timeout)
 
 	do {
 		pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
-		if (slot_status == (u16) ~0) {
+		if (RESPONSE_IS_PCI_ERROR(&slot_status)) {
 			ctrl_info(ctrl, "%s: no response from device\n",
 				  __func__);
 			return 0;
@@ -165,7 +165,7 @@  static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd,
 	pcie_wait_cmd(ctrl);
 
 	pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
-	if (slot_ctrl == (u16) ~0) {
+	if (RESPONSE_IS_PCI_ERROR(&slot_ctrl)) {
 		ctrl_info(ctrl, "%s: no response from device\n", __func__);
 		goto out;
 	}
@@ -236,7 +236,7 @@  int pciehp_check_link_active(struct controller *ctrl)
 	int ret;
 
 	ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
-	if (ret == PCIBIOS_DEVICE_NOT_FOUND || lnk_status == (u16)~0)
+	if (ret == PCIBIOS_DEVICE_NOT_FOUND || RESPONSE_IS_PCI_ERROR(&lnk_status))
 		return -ENODEV;
 
 	ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
@@ -443,7 +443,7 @@  int pciehp_card_present(struct controller *ctrl)
 	int ret;
 
 	ret = pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
-	if (ret == PCIBIOS_DEVICE_NOT_FOUND || slot_status == (u16)~0)
+	if (ret == PCIBIOS_DEVICE_NOT_FOUND || RESPONSE_IS_PCI_ERROR(&slot_status))
 		return -ENODEV;
 
 	return !!(slot_status & PCI_EXP_SLTSTA_PDS);
@@ -621,7 +621,7 @@  static irqreturn_t pciehp_isr(int irq, void *dev_id)
 
 read_status:
 	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
-	if (status == (u16) ~0) {
+	if (RESPONSE_IS_PCI_ERROR(&status)) {
 		ctrl_info(ctrl, "%s: no response from device\n", __func__);
 		if (parent)
 			pm_runtime_put(parent);