Message ID | 20201121001036.8560-11-sean.v.kelley@intel.com (mailing list archive) |
---|---|
State | Accepted, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | Add RCEC handling to PCI/AER | expand |
On Fri, Nov 20, 2020 at 04:10:31PM -0800, Sean V Kelley wrote: > In some cases a bridge may not exist as the hardware controlling may be > handled only by firmware and so is not visible to the OS. This scenario is > also possible in future use cases involving non-native use of RCECs by > firmware. > > Explicitly apply conditional logic around these resets by limiting them to > Root Ports and Downstream Ports. Can you help me understand this? The subject says "Limit AER resets" and here you say "limit them to RPs and DPs", but it's not completely obvious how the resets are being limited, i.e., the patch doesn't add anything like: + if (type == PCI_EXP_TYPE_ROOT_PORT || + type == PCI_EXP_TYPE_DOWNSTREAM) reset_subordinates(bridge); It *does* add checks around pcie_clear_device_status(), but that also includes RC_EC. And that's not a reset, so I don't think that's explicitly mentioned in the commit log. Also see the question below. > Link: https://lore.kernel.org/r/20201002184735.1229220-8-seanvk.dev@oregontracks.org > Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > drivers/pci/pcie/err.c | 31 +++++++++++++++++++++++++------ > 1 file changed, 25 insertions(+), 6 deletions(-) > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index 8b53aecdb43d..7883c9791562 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -148,13 +148,17 @@ static int report_resume(struct pci_dev *dev, void *data) > > /** > * pci_walk_bridge - walk bridges potentially AER affected > - * @bridge: bridge which may be a Port > + * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs, > + * or an RCiEP associated with an RCEC > * @cb: callback to be called for each device found > * @userdata: arbitrary pointer to be passed to callback > * > * If the device provided is a bridge, walk the subordinate bus, including > * any bridged devices on buses under this bus. Call the provided callback > * on each device found. > + * > + * If the device provided has no subordinate bus, call the callback on the > + * device itself. > */ > static void pci_walk_bridge(struct pci_dev *bridge, > int (*cb)(struct pci_dev *, void *), > @@ -162,6 +166,8 @@ static void pci_walk_bridge(struct pci_dev *bridge, > { > if (bridge->subordinate) > pci_walk_bus(bridge->subordinate, cb, userdata); > + else > + cb(bridge, userdata); > } > > pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > @@ -174,10 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > > /* > * Error recovery runs on all subordinates of the bridge. If the > - * bridge detected the error, it is cleared at the end. > + * bridge detected the error, it is cleared at the end. For RCiEPs > + * we should reset just the RCiEP itself. > */ > if (type == PCI_EXP_TYPE_ROOT_PORT || > - type == PCI_EXP_TYPE_DOWNSTREAM) > + type == PCI_EXP_TYPE_DOWNSTREAM || > + type == PCI_EXP_TYPE_RC_EC || > + type == PCI_EXP_TYPE_RC_END) > bridge = dev; > else > bridge = pci_upstream_bridge(dev); > @@ -185,6 +194,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_dbg(bridge, "broadcast error_detected message\n"); > if (state == pci_channel_io_frozen) { > pci_walk_bridge(bridge, report_frozen_detected, &status); > + if (type == PCI_EXP_TYPE_RC_END) { > + pci_warn(dev, "subordinate device reset not possible for RCiEP\n"); > + status = PCI_ERS_RESULT_NONE; > + goto failed; > + } > + > status = reset_subordinates(bridge); > if (status != PCI_ERS_RESULT_RECOVERED) { > pci_warn(bridge, "subordinate device reset failed\n"); > @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_dbg(bridge, "broadcast resume message\n"); > pci_walk_bridge(bridge, report_resume, &status); > > - if (pcie_aer_is_native(bridge)) > - pcie_clear_device_status(bridge); > - pci_aer_clear_nonfatal_status(bridge); > + if (type == PCI_EXP_TYPE_ROOT_PORT || > + type == PCI_EXP_TYPE_DOWNSTREAM || > + type == PCI_EXP_TYPE_RC_EC) { > + if (pcie_aer_is_native(bridge)) > + pcie_clear_device_status(bridge); > + pci_aer_clear_nonfatal_status(bridge); This is hard to understand because "type" is from "dev", but "bridge" is not necessarily the same device. Should it be this? type = pci_pcie_type(bridge); if (type == PCI_EXP_TYPE_ROOT_PORT || ...) > + } > pci_info(bridge, "device recovery successful\n"); > return status; > > -- > 2.29.2 >
Hi Bjorn, > On Nov 23, 2020, at 3:28 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Fri, Nov 20, 2020 at 04:10:31PM -0800, Sean V Kelley wrote: >> In some cases a bridge may not exist as the hardware controlling may be >> handled only by firmware and so is not visible to the OS. This scenario is >> also possible in future use cases involving non-native use of RCECs by >> firmware. >> >> Explicitly apply conditional logic around these resets by limiting them to >> Root Ports and Downstream Ports. > > Can you help me understand this? The subject says "Limit AER resets" > and here you say "limit them to RPs and DPs", but it's not completely > obvious how the resets are being limited, i.e., the patch doesn't add > anything like: > > + if (type == PCI_EXP_TYPE_ROOT_PORT || > + type == PCI_EXP_TYPE_DOWNSTREAM) > reset_subordinates(bridge); > > It *does* add checks around pcie_clear_device_status(), but that also > includes RC_EC. And that's not a reset, so I don't think that's > explicitly mentioned in the commit log. The subject should have referred to the clearing of the device status rather than resets. It originally came from this simpler patch in which I made use of reset instead of clear: https://lore.kernel.org/linux-pci/20201002184735.1229220-8-seanvk.dev@oregontracks.org/ So a rephrase of clearing in place of resets would be more appropriate. Then we added the notion of bridges…below > > Also see the question below. > >> Link: https://lore.kernel.org/r/20201002184735.1229220-8-seanvk.dev@oregontracks.org >> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> >> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> >> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >> --- >> drivers/pci/pcie/err.c | 31 +++++++++++++++++++++++++------ >> 1 file changed, 25 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >> index 8b53aecdb43d..7883c9791562 100644 >> --- a/drivers/pci/pcie/err.c >> +++ b/drivers/pci/pcie/err.c >> @@ -148,13 +148,17 @@ static int report_resume(struct pci_dev *dev, void *data) >> >> /** >> * pci_walk_bridge - walk bridges potentially AER affected >> - * @bridge: bridge which may be a Port >> + * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs, >> + * or an RCiEP associated with an RCEC >> * @cb: callback to be called for each device found >> * @userdata: arbitrary pointer to be passed to callback >> * >> * If the device provided is a bridge, walk the subordinate bus, including >> * any bridged devices on buses under this bus. Call the provided callback >> * on each device found. >> + * >> + * If the device provided has no subordinate bus, call the callback on the >> + * device itself. >> */ >> static void pci_walk_bridge(struct pci_dev *bridge, >> int (*cb)(struct pci_dev *, void *), >> @@ -162,6 +166,8 @@ static void pci_walk_bridge(struct pci_dev *bridge, >> { >> if (bridge->subordinate) >> pci_walk_bus(bridge->subordinate, cb, userdata); >> + else >> + cb(bridge, userdata); >> } >> >> pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> @@ -174,10 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> >> /* >> * Error recovery runs on all subordinates of the bridge. If the >> - * bridge detected the error, it is cleared at the end. >> + * bridge detected the error, it is cleared at the end. For RCiEPs >> + * we should reset just the RCiEP itself. >> */ >> if (type == PCI_EXP_TYPE_ROOT_PORT || >> - type == PCI_EXP_TYPE_DOWNSTREAM) >> + type == PCI_EXP_TYPE_DOWNSTREAM || >> + type == PCI_EXP_TYPE_RC_EC || >> + type == PCI_EXP_TYPE_RC_END) >> bridge = dev; >> else >> bridge = pci_upstream_bridge(dev); >> @@ -185,6 +194,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> pci_dbg(bridge, "broadcast error_detected message\n"); >> if (state == pci_channel_io_frozen) { >> pci_walk_bridge(bridge, report_frozen_detected, &status); >> + if (type == PCI_EXP_TYPE_RC_END) { >> + pci_warn(dev, "subordinate device reset not possible for RCiEP\n"); >> + status = PCI_ERS_RESULT_NONE; >> + goto failed; >> + } >> + >> status = reset_subordinates(bridge); >> if (status != PCI_ERS_RESULT_RECOVERED) { >> pci_warn(bridge, "subordinate device reset failed\n"); >> @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> pci_dbg(bridge, "broadcast resume message\n"); >> pci_walk_bridge(bridge, report_resume, &status); >> >> - if (pcie_aer_is_native(bridge)) >> - pcie_clear_device_status(bridge); >> - pci_aer_clear_nonfatal_status(bridge); >> + if (type == PCI_EXP_TYPE_ROOT_PORT || >> + type == PCI_EXP_TYPE_DOWNSTREAM || >> + type == PCI_EXP_TYPE_RC_EC) { >> + if (pcie_aer_is_native(bridge)) >> + pcie_clear_device_status(bridge); >> + pci_aer_clear_nonfatal_status(bridge); > > This is hard to understand because "type" is from "dev", but "bridge" > is not necessarily the same device. Should it be this? > > type = pci_pcie_type(bridge); > if (type == PCI_EXP_TYPE_ROOT_PORT || > ...) Correct, it would be better if the type was based on the ‘bridge’. Thanks, Sean > >> + } >> pci_info(bridge, "device recovery successful\n"); >> return status; >> >> -- >> 2.29.2
On Mon, Nov 23, 2020 at 11:57:35PM +0000, Kelley, Sean V wrote: > > On Nov 23, 2020, at 3:28 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Fri, Nov 20, 2020 at 04:10:31PM -0800, Sean V Kelley wrote: > >> In some cases a bridge may not exist as the hardware controlling may be > >> handled only by firmware and so is not visible to the OS. This scenario is > >> also possible in future use cases involving non-native use of RCECs by > >> firmware. > >> > >> Explicitly apply conditional logic around these resets by limiting them to > >> Root Ports and Downstream Ports. > > > > Can you help me understand this? The subject says "Limit AER resets" > > and here you say "limit them to RPs and DPs", but it's not completely > > obvious how the resets are being limited, i.e., the patch doesn't add > > anything like: > > > > + if (type == PCI_EXP_TYPE_ROOT_PORT || > > + type == PCI_EXP_TYPE_DOWNSTREAM) > > reset_subordinates(bridge); > > > > It *does* add checks around pcie_clear_device_status(), but that also > > includes RC_EC. And that's not a reset, so I don't think that's > > explicitly mentioned in the commit log. > > The subject should have referred to the clearing of the device status rather than resets. > It originally came from this simpler patch in which I made use of reset instead of clear: > > https://lore.kernel.org/linux-pci/20201002184735.1229220-8-seanvk.dev@oregontracks.org/ > > So a rephrase of clearing in place of resets would be more appropriate. > > Then we added the notion of bridges…below > > > > > Also see the question below. > > > >> Link: https://lore.kernel.org/r/20201002184735.1229220-8-seanvk.dev@oregontracks.org > >> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> > >> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > >> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > >> --- > >> drivers/pci/pcie/err.c | 31 +++++++++++++++++++++++++------ > >> 1 file changed, 25 insertions(+), 6 deletions(-) > >> > >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > >> index 8b53aecdb43d..7883c9791562 100644 > >> --- a/drivers/pci/pcie/err.c > >> +++ b/drivers/pci/pcie/err.c > >> @@ -148,13 +148,17 @@ static int report_resume(struct pci_dev *dev, void *data) > >> > >> /** > >> * pci_walk_bridge - walk bridges potentially AER affected > >> - * @bridge: bridge which may be a Port > >> + * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs, > >> + * or an RCiEP associated with an RCEC > >> * @cb: callback to be called for each device found > >> * @userdata: arbitrary pointer to be passed to callback > >> * > >> * If the device provided is a bridge, walk the subordinate bus, including > >> * any bridged devices on buses under this bus. Call the provided callback > >> * on each device found. > >> + * > >> + * If the device provided has no subordinate bus, call the callback on the > >> + * device itself. > >> */ > >> static void pci_walk_bridge(struct pci_dev *bridge, > >> int (*cb)(struct pci_dev *, void *), > >> @@ -162,6 +166,8 @@ static void pci_walk_bridge(struct pci_dev *bridge, > >> { > >> if (bridge->subordinate) > >> pci_walk_bus(bridge->subordinate, cb, userdata); > >> + else > >> + cb(bridge, userdata); > >> } > >> > >> pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >> @@ -174,10 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >> > >> /* > >> * Error recovery runs on all subordinates of the bridge. If the > >> - * bridge detected the error, it is cleared at the end. > >> + * bridge detected the error, it is cleared at the end. For RCiEPs > >> + * we should reset just the RCiEP itself. > >> */ > >> if (type == PCI_EXP_TYPE_ROOT_PORT || > >> - type == PCI_EXP_TYPE_DOWNSTREAM) > >> + type == PCI_EXP_TYPE_DOWNSTREAM || > >> + type == PCI_EXP_TYPE_RC_EC || > >> + type == PCI_EXP_TYPE_RC_END) > >> bridge = dev; > >> else > >> bridge = pci_upstream_bridge(dev); > >> @@ -185,6 +194,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >> pci_dbg(bridge, "broadcast error_detected message\n"); > >> if (state == pci_channel_io_frozen) { > >> pci_walk_bridge(bridge, report_frozen_detected, &status); > >> + if (type == PCI_EXP_TYPE_RC_END) { > >> + pci_warn(dev, "subordinate device reset not possible for RCiEP\n"); > >> + status = PCI_ERS_RESULT_NONE; > >> + goto failed; > >> + } > >> + > >> status = reset_subordinates(bridge); > >> if (status != PCI_ERS_RESULT_RECOVERED) { > >> pci_warn(bridge, "subordinate device reset failed\n"); > >> @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >> pci_dbg(bridge, "broadcast resume message\n"); > >> pci_walk_bridge(bridge, report_resume, &status); > >> > >> - if (pcie_aer_is_native(bridge)) > >> - pcie_clear_device_status(bridge); > >> - pci_aer_clear_nonfatal_status(bridge); > >> + if (type == PCI_EXP_TYPE_ROOT_PORT || > >> + type == PCI_EXP_TYPE_DOWNSTREAM || > >> + type == PCI_EXP_TYPE_RC_EC) { > >> + if (pcie_aer_is_native(bridge)) > >> + pcie_clear_device_status(bridge); > >> + pci_aer_clear_nonfatal_status(bridge); > > > > This is hard to understand because "type" is from "dev", but "bridge" > > is not necessarily the same device. Should it be this? > > > > type = pci_pcie_type(bridge); > > if (type == PCI_EXP_TYPE_ROOT_PORT || > > ...) > > Correct, it would be better if the type was based on the ‘bridge’. OK. This is similar to https://lore.kernel.org/linux-pci/20201002184735.1229220-8-seanvk.dev@oregontracks.org/, which you cited above except for the bridge/dev question and the addition here of RC_EC. I tried to split that back into its own patch and started with the commit message from that patch. But I got stuck on the commit message. I got as far as: In some cases an error may be reported by a device not visible to the OS, e.g., if firmware manages the device and passes error information to the OS via ACPI APEI. But I still can't quite connect that to the patch. "bridge" is clearly a device visible to Linux. I guess we're trying to assert that if "bridge" is not a Root Port, Downstream Port, or RCEC, we shouldn't clear the error status because the error came from a device Linux doesn't know about. But I think "bridge" is *always* either a Root Port or a Downstream Port: if (type == PCI_EXP_TYPE_ROOT_PORT || type == PCI_EXP_TYPE_DOWNSTREAM) bridge = dev; else bridge = pci_upstream_bridge(dev); pci_upstream_bridge() returns either NULL (in which case previous uses dereference a NULL pointer), or dev->bus->self, which is always a Root Port, Switch Downstream Port, or Switch Upstream Port (or NULL for the special case of VFs). > >> + } > >> pci_info(bridge, "device recovery successful\n"); > >> return status; > >> > >> -- > >> 2.29.2 >
Hi Bjorn, Was away briefly for the holidays, comments below: > On Nov 24, 2020, at 9:17 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Mon, Nov 23, 2020 at 11:57:35PM +0000, Kelley, Sean V wrote: >>> On Nov 23, 2020, at 3:28 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: >>> On Fri, Nov 20, 2020 at 04:10:31PM -0800, Sean V Kelley wrote: >>>> In some cases a bridge may not exist as the hardware controlling may be >>>> handled only by firmware and so is not visible to the OS. This scenario is >>>> also possible in future use cases involving non-native use of RCECs by >>>> firmware. >>>> >>>> Explicitly apply conditional logic around these resets by limiting them to >>>> Root Ports and Downstream Ports. >>> >>> Can you help me understand this? The subject says "Limit AER resets" >>> and here you say "limit them to RPs and DPs", but it's not completely >>> obvious how the resets are being limited, i.e., the patch doesn't add >>> anything like: >>> >>> + if (type == PCI_EXP_TYPE_ROOT_PORT || >>> + type == PCI_EXP_TYPE_DOWNSTREAM) >>> reset_subordinates(bridge); >>> >>> It *does* add checks around pcie_clear_device_status(), but that also >>> includes RC_EC. And that's not a reset, so I don't think that's >>> explicitly mentioned in the commit log. >> >> The subject should have referred to the clearing of the device status rather than resets. >> It originally came from this simpler patch in which I made use of reset instead of clear: >> >> https://lore.kernel.org/linux-pci/20201002184735.1229220-8-seanvk.dev@oregontracks.org/ >> >> So a rephrase of clearing in place of resets would be more appropriate. >> >> Then we added the notion of bridges…below >> >>> >>> Also see the question below. >>> >>>> Link: https://lore.kernel.org/r/20201002184735.1229220-8-seanvk.dev@oregontracks.org >>>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> >>>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> >>>> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >>>> --- >>>> drivers/pci/pcie/err.c | 31 +++++++++++++++++++++++++------ >>>> 1 file changed, 25 insertions(+), 6 deletions(-) >>>> >>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >>>> index 8b53aecdb43d..7883c9791562 100644 >>>> --- a/drivers/pci/pcie/err.c >>>> +++ b/drivers/pci/pcie/err.c >>>> @@ -148,13 +148,17 @@ static int report_resume(struct pci_dev *dev, void *data) >>>> >>>> /** >>>> * pci_walk_bridge - walk bridges potentially AER affected >>>> - * @bridge: bridge which may be a Port >>>> + * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs, >>>> + * or an RCiEP associated with an RCEC >>>> * @cb: callback to be called for each device found >>>> * @userdata: arbitrary pointer to be passed to callback >>>> * >>>> * If the device provided is a bridge, walk the subordinate bus, including >>>> * any bridged devices on buses under this bus. Call the provided callback >>>> * on each device found. >>>> + * >>>> + * If the device provided has no subordinate bus, call the callback on the >>>> + * device itself. >>>> */ >>>> static void pci_walk_bridge(struct pci_dev *bridge, >>>> int (*cb)(struct pci_dev *, void *), >>>> @@ -162,6 +166,8 @@ static void pci_walk_bridge(struct pci_dev *bridge, >>>> { >>>> if (bridge->subordinate) >>>> pci_walk_bus(bridge->subordinate, cb, userdata); >>>> + else >>>> + cb(bridge, userdata); >>>> } >>>> >>>> pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>> @@ -174,10 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>> >>>> /* >>>> * Error recovery runs on all subordinates of the bridge. If the >>>> - * bridge detected the error, it is cleared at the end. >>>> + * bridge detected the error, it is cleared at the end. For RCiEPs >>>> + * we should reset just the RCiEP itself. >>>> */ >>>> if (type == PCI_EXP_TYPE_ROOT_PORT || >>>> - type == PCI_EXP_TYPE_DOWNSTREAM) >>>> + type == PCI_EXP_TYPE_DOWNSTREAM || >>>> + type == PCI_EXP_TYPE_RC_EC || >>>> + type == PCI_EXP_TYPE_RC_END) >>>> bridge = dev; >>>> else >>>> bridge = pci_upstream_bridge(dev); >>>> @@ -185,6 +194,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>> pci_dbg(bridge, "broadcast error_detected message\n"); >>>> if (state == pci_channel_io_frozen) { >>>> pci_walk_bridge(bridge, report_frozen_detected, &status); >>>> + if (type == PCI_EXP_TYPE_RC_END) { >>>> + pci_warn(dev, "subordinate device reset not possible for RCiEP\n"); >>>> + status = PCI_ERS_RESULT_NONE; >>>> + goto failed; >>>> + } >>>> + >>>> status = reset_subordinates(bridge); >>>> if (status != PCI_ERS_RESULT_RECOVERED) { >>>> pci_warn(bridge, "subordinate device reset failed\n"); >>>> @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>> pci_dbg(bridge, "broadcast resume message\n"); >>>> pci_walk_bridge(bridge, report_resume, &status); >>>> >>>> - if (pcie_aer_is_native(bridge)) >>>> - pcie_clear_device_status(bridge); >>>> - pci_aer_clear_nonfatal_status(bridge); >>>> + if (type == PCI_EXP_TYPE_ROOT_PORT || >>>> + type == PCI_EXP_TYPE_DOWNSTREAM || >>>> + type == PCI_EXP_TYPE_RC_EC) { >>>> + if (pcie_aer_is_native(bridge)) >>>> + pcie_clear_device_status(bridge); >>>> + pci_aer_clear_nonfatal_status(bridge); >>> >>> This is hard to understand because "type" is from "dev", but "bridge" >>> is not necessarily the same device. Should it be this? >>> >>> type = pci_pcie_type(bridge); >>> if (type == PCI_EXP_TYPE_ROOT_PORT || >>> ...) >> >> Correct, it would be better if the type was based on the ‘bridge’. > > OK. This is similar to > https://lore.kernel.org/linux-pci/20201002184735.1229220-8-seanvk.dev@oregontracks.org/, > which you cited above except for the bridge/dev question and the > addition here of RC_EC. > > I tried to split that back into its own patch and started with the > commit message from that patch. But I got stuck on the commit > message. I got as far as: > > In some cases an error may be reported by a device not visible to > the OS, e.g., if firmware manages the device and passes error > information to the OS via ACPI APEI. > > But I still can't quite connect that to the patch. "bridge" is > clearly a device visible to Linux. > > I guess we're trying to assert that if "bridge" is not a Root Port, > Downstream Port, or RCEC, we shouldn't clear the error status because > the error came from a device Linux doesn't know about. But I think > "bridge" is *always* either a Root Port or a Downstream Port: That’s ultimately what we are trying to do. > > if (type == PCI_EXP_TYPE_ROOT_PORT || > type == PCI_EXP_TYPE_DOWNSTREAM) > bridge = dev; > else > bridge = pci_upstream_bridge(dev); > > pci_upstream_bridge() returns either NULL (in which case previous uses > dereference a NULL pointer), or dev->bus->self, which is always a Root > Port, Switch Downstream Port, or Switch Upstream Port (or NULL for the > special case of VFs). In the past recall we were augmenting it with bridge = dev->rcec for RC_END. But we were able to relocate the handling in aer_root_reset(). So in this patch - we add the conditionals because RC_END is being passed in addition to RC_EC. if (type == PCI_EXP_TYPE_ROOT_PORT || - type == PCI_EXP_TYPE_DOWNSTREAM) + type == PCI_EXP_TYPE_DOWNSTREAM || + type == PCI_EXP_TYPE_RC_EC || + type == PCI_EXP_TYPE_RC_END) bridge = dev; else bridge = pci_upstream_bridge(dev); So we need to check for RP, DS, and RC_EC @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(bridge, "broadcast resume message\n"); pci_walk_bridge(bridge, report_resume, &status); - if (pcie_aer_is_native(bridge)) - pcie_clear_device_status(bridge); - pci_aer_clear_nonfatal_status(bridge); + if (type == PCI_EXP_TYPE_ROOT_PORT || + type == PCI_EXP_TYPE_DOWNSTREAM || + type == PCI_EXP_TYPE_RC_EC) { + if (pcie_aer_is_native(bridge)) + pcie_clear_device_status(bridge); + pci_aer_clear_nonfatal_status(bridge); + } Breaking out a separate patch would be unnecessary as you correctly point out that it’s only going to be an RP or DS before this patch. Thanks, Sean >>>> + } >>>> pci_info(bridge, "device recovery successful\n"); >>>> return status; >>>> >>>> -- >>>> 2.29.2 >>
On Mon, Nov 30, 2020 at 07:54:37PM +0000, Kelley, Sean V wrote: > > On Nov 24, 2020, at 9:17 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Mon, Nov 23, 2020 at 11:57:35PM +0000, Kelley, Sean V wrote: > >>> On Nov 23, 2020, at 3:28 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > >>> On Fri, Nov 20, 2020 at 04:10:31PM -0800, Sean V Kelley wrote: > >>>> In some cases a bridge may not exist as the hardware controlling may be > >>>> handled only by firmware and so is not visible to the OS. This scenario is > >>>> also possible in future use cases involving non-native use of RCECs by > >>>> firmware. > >>>> > >>>> Explicitly apply conditional logic around these resets by limiting them to > >>>> Root Ports and Downstream Ports. > >>> > >>> Can you help me understand this? The subject says "Limit AER resets" > >>> and here you say "limit them to RPs and DPs", but it's not completely > >>> obvious how the resets are being limited, i.e., the patch doesn't add > >>> anything like: > >>> > >>> + if (type == PCI_EXP_TYPE_ROOT_PORT || > >>> + type == PCI_EXP_TYPE_DOWNSTREAM) > >>> reset_subordinates(bridge); > >>> > >>> It *does* add checks around pcie_clear_device_status(), but that also > >>> includes RC_EC. And that's not a reset, so I don't think that's > >>> explicitly mentioned in the commit log. > >> > >> The subject should have referred to the clearing of the device status rather than resets. > >> It originally came from this simpler patch in which I made use of reset instead of clear: > >> > >> https://lore.kernel.org/linux-pci/20201002184735.1229220-8-seanvk.dev@oregontracks.org/ > >> > >> So a rephrase of clearing in place of resets would be more appropriate. > >> > >> Then we added the notion of bridges…below > >> > >>> > >>> Also see the question below. > >>> > >>>> Link: https://lore.kernel.org/r/20201002184735.1229220-8-seanvk.dev@oregontracks.org > >>>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> > >>>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > >>>> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > >>>> --- > >>>> drivers/pci/pcie/err.c | 31 +++++++++++++++++++++++++------ > >>>> 1 file changed, 25 insertions(+), 6 deletions(-) > >>>> > >>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > >>>> index 8b53aecdb43d..7883c9791562 100644 > >>>> --- a/drivers/pci/pcie/err.c > >>>> +++ b/drivers/pci/pcie/err.c > >>>> @@ -148,13 +148,17 @@ static int report_resume(struct pci_dev *dev, void *data) > >>>> > >>>> /** > >>>> * pci_walk_bridge - walk bridges potentially AER affected > >>>> - * @bridge: bridge which may be a Port > >>>> + * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs, > >>>> + * or an RCiEP associated with an RCEC > >>>> * @cb: callback to be called for each device found > >>>> * @userdata: arbitrary pointer to be passed to callback > >>>> * > >>>> * If the device provided is a bridge, walk the subordinate bus, including > >>>> * any bridged devices on buses under this bus. Call the provided callback > >>>> * on each device found. > >>>> + * > >>>> + * If the device provided has no subordinate bus, call the callback on the > >>>> + * device itself. > >>>> */ > >>>> static void pci_walk_bridge(struct pci_dev *bridge, > >>>> int (*cb)(struct pci_dev *, void *), > >>>> @@ -162,6 +166,8 @@ static void pci_walk_bridge(struct pci_dev *bridge, > >>>> { > >>>> if (bridge->subordinate) > >>>> pci_walk_bus(bridge->subordinate, cb, userdata); > >>>> + else > >>>> + cb(bridge, userdata); > >>>> } > >>>> > >>>> pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >>>> @@ -174,10 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >>>> > >>>> /* > >>>> * Error recovery runs on all subordinates of the bridge. If the > >>>> - * bridge detected the error, it is cleared at the end. > >>>> + * bridge detected the error, it is cleared at the end. For RCiEPs > >>>> + * we should reset just the RCiEP itself. > >>>> */ > >>>> if (type == PCI_EXP_TYPE_ROOT_PORT || > >>>> - type == PCI_EXP_TYPE_DOWNSTREAM) > >>>> + type == PCI_EXP_TYPE_DOWNSTREAM || > >>>> + type == PCI_EXP_TYPE_RC_EC || > >>>> + type == PCI_EXP_TYPE_RC_END) > >>>> bridge = dev; > >>>> else > >>>> bridge = pci_upstream_bridge(dev); > >>>> @@ -185,6 +194,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >>>> pci_dbg(bridge, "broadcast error_detected message\n"); > >>>> if (state == pci_channel_io_frozen) { > >>>> pci_walk_bridge(bridge, report_frozen_detected, &status); > >>>> + if (type == PCI_EXP_TYPE_RC_END) { > >>>> + pci_warn(dev, "subordinate device reset not possible for RCiEP\n"); > >>>> + status = PCI_ERS_RESULT_NONE; > >>>> + goto failed; > >>>> + } > >>>> + > >>>> status = reset_subordinates(bridge); > >>>> if (status != PCI_ERS_RESULT_RECOVERED) { > >>>> pci_warn(bridge, "subordinate device reset failed\n"); > >>>> @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >>>> pci_dbg(bridge, "broadcast resume message\n"); > >>>> pci_walk_bridge(bridge, report_resume, &status); > >>>> > >>>> - if (pcie_aer_is_native(bridge)) > >>>> - pcie_clear_device_status(bridge); > >>>> - pci_aer_clear_nonfatal_status(bridge); > >>>> + if (type == PCI_EXP_TYPE_ROOT_PORT || > >>>> + type == PCI_EXP_TYPE_DOWNSTREAM || > >>>> + type == PCI_EXP_TYPE_RC_EC) { > >>>> + if (pcie_aer_is_native(bridge)) > >>>> + pcie_clear_device_status(bridge); > >>>> + pci_aer_clear_nonfatal_status(bridge); > >>> > >>> This is hard to understand because "type" is from "dev", but "bridge" > >>> is not necessarily the same device. Should it be this? > >>> > >>> type = pci_pcie_type(bridge); > >>> if (type == PCI_EXP_TYPE_ROOT_PORT || > >>> ...) > >> > >> Correct, it would be better if the type was based on the ‘bridge’. > > > > OK. This is similar to > > https://lore.kernel.org/linux-pci/20201002184735.1229220-8-seanvk.dev@oregontracks.org/, > > which you cited above except for the bridge/dev question and the > > addition here of RC_EC. > > > > I tried to split that back into its own patch and started with the > > commit message from that patch. But I got stuck on the commit > > message. I got as far as: > > > > In some cases an error may be reported by a device not visible to > > the OS, e.g., if firmware manages the device and passes error > > information to the OS via ACPI APEI. > > > > But I still can't quite connect that to the patch. "bridge" is > > clearly a device visible to Linux. > > > > I guess we're trying to assert that if "bridge" is not a Root Port, > > Downstream Port, or RCEC, we shouldn't clear the error status because > > the error came from a device Linux doesn't know about. But I think > > "bridge" is *always* either a Root Port or a Downstream Port: > > That’s ultimately what we are trying to do. > > > if (type == PCI_EXP_TYPE_ROOT_PORT || > > type == PCI_EXP_TYPE_DOWNSTREAM) > > bridge = dev; > > else > > bridge = pci_upstream_bridge(dev); > > > > pci_upstream_bridge() returns either NULL (in which case previous uses > > dereference a NULL pointer), or dev->bus->self, which is always a Root > > Port, Switch Downstream Port, or Switch Upstream Port (or NULL for the > > special case of VFs). > > In the past recall we were augmenting it with bridge = dev->rcec for > RC_END. But we were able to relocate the handling in > aer_root_reset(). > > So in this patch - we add the conditionals because RC_END is being > passed in addition to RC_EC. > > if (type == PCI_EXP_TYPE_ROOT_PORT || > > - type == PCI_EXP_TYPE_DOWNSTREAM) > > + type == PCI_EXP_TYPE_DOWNSTREAM || > + type == PCI_EXP_TYPE_RC_EC || > + type == PCI_EXP_TYPE_RC_END) > > bridge = dev; > else > bridge = pci_upstream_bridge(dev); > > So we need to check for RP, DS, and RC_EC > > @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > > pci_dbg(bridge, "broadcast resume message\n"); > pci_walk_bridge(bridge, report_resume, &status); > > > - if (pcie_aer_is_native(bridge)) > - pcie_clear_device_status(bridge); > - pci_aer_clear_nonfatal_status(bridge); > > + if (type == PCI_EXP_TYPE_ROOT_PORT || > + type == PCI_EXP_TYPE_DOWNSTREAM || > + type == PCI_EXP_TYPE_RC_EC) { > + if (pcie_aer_is_native(bridge)) > + pcie_clear_device_status(bridge); > + pci_aer_clear_nonfatal_status(bridge); > + } > > Breaking out a separate patch would be unnecessary as you correctly > point out that it’s only going to be an RP or DS before this patch. Still trying to sort this out in my head, so half-baked questions before I quit for the day: We call pcie_do_recovery() in four paths: AER, APEI, DPC, and EDR, and I'm trying to understand what "dev" we pass in all those cases. For DPC, I think "dev" must be a downstream port that triggered DPC, and its secondary link is disabled. The device and any siblings have already been reset by the link going down. We want to clear AER status in downstream device(s) after they come back up (the AER status bits are sticky, so they're not cleared by the reset), and the AER status in the downstream port. I think EDR is the same as DPC? For AER, I think "dev" will typically (maybe always?) be the device that detected the error and logged it in its AER Capability, not the Root Port or RCEC that generated the interrupt. We want to reset "dev" and any siblings, clear AER status in "dev", and clear AER status in the Root Port or RCEC. For APEI, I assume "dev" is typically the device that detected the error, and we want to reset it and any siblings. Firmware has already read the AER status for "dev", and I assume firmware also clears it. I assume firmware is also responsible for clearing AER status in the Root Port, RCEC, or non-architected device that generated the interrupt. It seems like there are basically two devices of interest in pcie_do_recovery(): the endpoint where we have to call the driver error recovery, and the port that generated the interrupt. I wonder if this would make more sense if the caller passed both of them in explicitly instead of having pcie_do_recovery() check the type of "dev" and try to figure things out after the fact. Bjorn
On 11/30/20 4:25 PM, Bjorn Helgaas wrote:
> I think EDR is the same as DPC?
Yes, EDR is same as DPC.
> On Nov 30, 2020, at 4:25 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Mon, Nov 30, 2020 at 07:54:37PM +0000, Kelley, Sean V wrote: >>> On Nov 24, 2020, at 9:17 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: >>> On Mon, Nov 23, 2020 at 11:57:35PM +0000, Kelley, Sean V wrote: >>>>> On Nov 23, 2020, at 3:28 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: >>>>> On Fri, Nov 20, 2020 at 04:10:31PM -0800, Sean V Kelley wrote: >>>>>> In some cases a bridge may not exist as the hardware controlling may be >>>>>> handled only by firmware and so is not visible to the OS. This scenario is >>>>>> also possible in future use cases involving non-native use of RCECs by >>>>>> firmware. >>>>>> >>>>>> Explicitly apply conditional logic around these resets by limiting them to >>>>>> Root Ports and Downstream Ports. >>>>> >>>>> Can you help me understand this? The subject says "Limit AER resets" >>>>> and here you say "limit them to RPs and DPs", but it's not completely >>>>> obvious how the resets are being limited, i.e., the patch doesn't add >>>>> anything like: >>>>> >>>>> + if (type == PCI_EXP_TYPE_ROOT_PORT || >>>>> + type == PCI_EXP_TYPE_DOWNSTREAM) >>>>> reset_subordinates(bridge); >>>>> >>>>> It *does* add checks around pcie_clear_device_status(), but that also >>>>> includes RC_EC. And that's not a reset, so I don't think that's >>>>> explicitly mentioned in the commit log. >>>> >>>> The subject should have referred to the clearing of the device status rather than resets. >>>> It originally came from this simpler patch in which I made use of reset instead of clear: >>>> >>>> https://lore.kernel.org/linux-pci/20201002184735.1229220-8-seanvk.dev@oregontracks.org/ >>>> >>>> So a rephrase of clearing in place of resets would be more appropriate. >>>> >>>> Then we added the notion of bridges…below >>>> >>>>> >>>>> Also see the question below. >>>>> >>>>>> Link: https://lore.kernel.org/r/20201002184735.1229220-8-seanvk.dev@oregontracks.org >>>>>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> >>>>>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> >>>>>> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >>>>>> --- >>>>>> drivers/pci/pcie/err.c | 31 +++++++++++++++++++++++++------ >>>>>> 1 file changed, 25 insertions(+), 6 deletions(-) >>>>>> >>>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >>>>>> index 8b53aecdb43d..7883c9791562 100644 >>>>>> --- a/drivers/pci/pcie/err.c >>>>>> +++ b/drivers/pci/pcie/err.c >>>>>> @@ -148,13 +148,17 @@ static int report_resume(struct pci_dev *dev, void *data) >>>>>> >>>>>> /** >>>>>> * pci_walk_bridge - walk bridges potentially AER affected >>>>>> - * @bridge: bridge which may be a Port >>>>>> + * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs, >>>>>> + * or an RCiEP associated with an RCEC >>>>>> * @cb: callback to be called for each device found >>>>>> * @userdata: arbitrary pointer to be passed to callback >>>>>> * >>>>>> * If the device provided is a bridge, walk the subordinate bus, including >>>>>> * any bridged devices on buses under this bus. Call the provided callback >>>>>> * on each device found. >>>>>> + * >>>>>> + * If the device provided has no subordinate bus, call the callback on the >>>>>> + * device itself. >>>>>> */ >>>>>> static void pci_walk_bridge(struct pci_dev *bridge, >>>>>> int (*cb)(struct pci_dev *, void *), >>>>>> @@ -162,6 +166,8 @@ static void pci_walk_bridge(struct pci_dev *bridge, >>>>>> { >>>>>> if (bridge->subordinate) >>>>>> pci_walk_bus(bridge->subordinate, cb, userdata); >>>>>> + else >>>>>> + cb(bridge, userdata); >>>>>> } >>>>>> >>>>>> pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>>>> @@ -174,10 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>>>> >>>>>> /* >>>>>> * Error recovery runs on all subordinates of the bridge. If the >>>>>> - * bridge detected the error, it is cleared at the end. >>>>>> + * bridge detected the error, it is cleared at the end. For RCiEPs >>>>>> + * we should reset just the RCiEP itself. >>>>>> */ >>>>>> if (type == PCI_EXP_TYPE_ROOT_PORT || >>>>>> - type == PCI_EXP_TYPE_DOWNSTREAM) >>>>>> + type == PCI_EXP_TYPE_DOWNSTREAM || >>>>>> + type == PCI_EXP_TYPE_RC_EC || >>>>>> + type == PCI_EXP_TYPE_RC_END) >>>>>> bridge = dev; >>>>>> else >>>>>> bridge = pci_upstream_bridge(dev); >>>>>> @@ -185,6 +194,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>>>> pci_dbg(bridge, "broadcast error_detected message\n"); >>>>>> if (state == pci_channel_io_frozen) { >>>>>> pci_walk_bridge(bridge, report_frozen_detected, &status); >>>>>> + if (type == PCI_EXP_TYPE_RC_END) { >>>>>> + pci_warn(dev, "subordinate device reset not possible for RCiEP\n"); >>>>>> + status = PCI_ERS_RESULT_NONE; >>>>>> + goto failed; >>>>>> + } >>>>>> + >>>>>> status = reset_subordinates(bridge); >>>>>> if (status != PCI_ERS_RESULT_RECOVERED) { >>>>>> pci_warn(bridge, "subordinate device reset failed\n"); >>>>>> @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>>>> pci_dbg(bridge, "broadcast resume message\n"); >>>>>> pci_walk_bridge(bridge, report_resume, &status); >>>>>> >>>>>> - if (pcie_aer_is_native(bridge)) >>>>>> - pcie_clear_device_status(bridge); >>>>>> - pci_aer_clear_nonfatal_status(bridge); >>>>>> + if (type == PCI_EXP_TYPE_ROOT_PORT || >>>>>> + type == PCI_EXP_TYPE_DOWNSTREAM || >>>>>> + type == PCI_EXP_TYPE_RC_EC) { >>>>>> + if (pcie_aer_is_native(bridge)) >>>>>> + pcie_clear_device_status(bridge); >>>>>> + pci_aer_clear_nonfatal_status(bridge); >>>>> >>>>> This is hard to understand because "type" is from "dev", but "bridge" >>>>> is not necessarily the same device. Should it be this? >>>>> >>>>> type = pci_pcie_type(bridge); >>>>> if (type == PCI_EXP_TYPE_ROOT_PORT || >>>>> ...) >>>> >>>> Correct, it would be better if the type was based on the ‘bridge’. >>> >>> OK. This is similar to >>> https://lore.kernel.org/linux-pci/20201002184735.1229220-8-seanvk.dev@oregontracks.org/, >>> which you cited above except for the bridge/dev question and the >>> addition here of RC_EC. >>> >>> I tried to split that back into its own patch and started with the >>> commit message from that patch. But I got stuck on the commit >>> message. I got as far as: >>> >>> In some cases an error may be reported by a device not visible to >>> the OS, e.g., if firmware manages the device and passes error >>> information to the OS via ACPI APEI. >>> >>> But I still can't quite connect that to the patch. "bridge" is >>> clearly a device visible to Linux. >>> >>> I guess we're trying to assert that if "bridge" is not a Root Port, >>> Downstream Port, or RCEC, we shouldn't clear the error status because >>> the error came from a device Linux doesn't know about. But I think >>> "bridge" is *always* either a Root Port or a Downstream Port: >> >> That’s ultimately what we are trying to do. >> >>> if (type == PCI_EXP_TYPE_ROOT_PORT || >>> type == PCI_EXP_TYPE_DOWNSTREAM) >>> bridge = dev; >>> else >>> bridge = pci_upstream_bridge(dev); >>> >>> pci_upstream_bridge() returns either NULL (in which case previous uses >>> dereference a NULL pointer), or dev->bus->self, which is always a Root >>> Port, Switch Downstream Port, or Switch Upstream Port (or NULL for the >>> special case of VFs). >> >> In the past recall we were augmenting it with bridge = dev->rcec for >> RC_END. But we were able to relocate the handling in >> aer_root_reset(). >> >> So in this patch - we add the conditionals because RC_END is being >> passed in addition to RC_EC. >> >> if (type == PCI_EXP_TYPE_ROOT_PORT || >> >> - type == PCI_EXP_TYPE_DOWNSTREAM) >> >> + type == PCI_EXP_TYPE_DOWNSTREAM || >> + type == PCI_EXP_TYPE_RC_EC || >> + type == PCI_EXP_TYPE_RC_END) >> >> bridge = dev; >> else >> bridge = pci_upstream_bridge(dev); >> >> So we need to check for RP, DS, and RC_EC >> >> @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> >> pci_dbg(bridge, "broadcast resume message\n"); >> pci_walk_bridge(bridge, report_resume, &status); >> >> >> - if (pcie_aer_is_native(bridge)) >> - pcie_clear_device_status(bridge); >> - pci_aer_clear_nonfatal_status(bridge); >> >> + if (type == PCI_EXP_TYPE_ROOT_PORT || >> + type == PCI_EXP_TYPE_DOWNSTREAM || >> + type == PCI_EXP_TYPE_RC_EC) { >> + if (pcie_aer_is_native(bridge)) >> + pcie_clear_device_status(bridge); >> + pci_aer_clear_nonfatal_status(bridge); >> + } >> >> Breaking out a separate patch would be unnecessary as you correctly >> point out that it’s only going to be an RP or DS before this patch. > > Still trying to sort this out in my head, so half-baked questions > before I quit for the day: We call pcie_do_recovery() in four paths: > AER, APEI, DPC, and EDR, and I'm trying to understand what "dev" we > pass in all those cases. > > For DPC, I think "dev" must be a downstream port that triggered DPC, > and its secondary link is disabled. The device and any siblings have > already been reset by the link going down. We want to clear AER > status in downstream device(s) after they come back up (the AER status > bits are sticky, so they're not cleared by the reset), and the AER > status in the downstream port. > > I think EDR is the same as DPC? > > For AER, I think "dev" will typically (maybe always?) be the device > that detected the error and logged it in its AER Capability, not the > Root Port or RCEC that generated the interrupt. We want to reset > "dev" and any siblings, clear AER status in "dev", and clear AER > status in the Root Port or RCEC. It’s also possible for RCEC’s to have errors themselves without a corresponding end point device. Sean > > For APEI, I assume "dev" is typically the device that detected the > error, and we want to reset it and any siblings. Firmware has already > read the AER status for "dev", and I assume firmware also clears it. > I assume firmware is also responsible for clearing AER status in the > Root Port, RCEC, or non-architected device that generated the > interrupt. > > It seems like there are basically two devices of interest in > pcie_do_recovery(): the endpoint where we have to call the driver > error recovery, and the port that generated the interrupt. I wonder > if this would make more sense if the caller passed both of them in > explicitly instead of having pcie_do_recovery() check the type of > "dev" and try to figure things out after the fact. > > Bjorn
Hi Bjorn, > On Nov 30, 2020, at 4:25 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Mon, Nov 30, 2020 at 07:54:37PM +0000, Kelley, Sean V wrote: >>> On Nov 24, 2020, at 9:17 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: >>> On Mon, Nov 23, 2020 at 11:57:35PM +0000, Kelley, Sean V wrote: >>>>> On Nov 23, 2020, at 3:28 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: >>>>> On Fri, Nov 20, 2020 at 04:10:31PM -0800, Sean V Kelley wrote: >>>>>> In some cases a bridge may not exist as the hardware controlling may be >>>>>> handled only by firmware and so is not visible to the OS. This scenario is >>>>>> also possible in future use cases involving non-native use of RCECs by >>>>>> firmware. >>>>>> >>>>>> Explicitly apply conditional logic around these resets by limiting them to >>>>>> Root Ports and Downstream Ports. >>>>> >>>>> Can you help me understand this? The subject says "Limit AER resets" >>>>> and here you say "limit them to RPs and DPs", but it's not completely >>>>> obvious how the resets are being limited, i.e., the patch doesn't add >>>>> anything like: >>>>> >>>>> + if (type == PCI_EXP_TYPE_ROOT_PORT || >>>>> + type == PCI_EXP_TYPE_DOWNSTREAM) >>>>> reset_subordinates(bridge); >>>>> >>>>> It *does* add checks around pcie_clear_device_status(), but that also >>>>> includes RC_EC. And that's not a reset, so I don't think that's >>>>> explicitly mentioned in the commit log. >>>> >>>> The subject should have referred to the clearing of the device status rather than resets. >>>> It originally came from this simpler patch in which I made use of reset instead of clear: >>>> >>>> https://lore.kernel.org/linux-pci/20201002184735.1229220-8-seanvk.dev@oregontracks.org/ >>>> >>>> So a rephrase of clearing in place of resets would be more appropriate. >>>> >>>> Then we added the notion of bridges…below >>>> >>>>> >>>>> Also see the question below. >>>>> >>>>>> Link: https://lore.kernel.org/r/20201002184735.1229220-8-seanvk.dev@oregontracks.org >>>>>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> >>>>>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> >>>>>> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> >>>>>> --- >>>>>> drivers/pci/pcie/err.c | 31 +++++++++++++++++++++++++------ >>>>>> 1 file changed, 25 insertions(+), 6 deletions(-) >>>>>> >>>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >>>>>> index 8b53aecdb43d..7883c9791562 100644 >>>>>> --- a/drivers/pci/pcie/err.c >>>>>> +++ b/drivers/pci/pcie/err.c >>>>>> @@ -148,13 +148,17 @@ static int report_resume(struct pci_dev *dev, void *data) >>>>>> >>>>>> /** >>>>>> * pci_walk_bridge - walk bridges potentially AER affected >>>>>> - * @bridge: bridge which may be a Port >>>>>> + * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs, >>>>>> + * or an RCiEP associated with an RCEC >>>>>> * @cb: callback to be called for each device found >>>>>> * @userdata: arbitrary pointer to be passed to callback >>>>>> * >>>>>> * If the device provided is a bridge, walk the subordinate bus, including >>>>>> * any bridged devices on buses under this bus. Call the provided callback >>>>>> * on each device found. >>>>>> + * >>>>>> + * If the device provided has no subordinate bus, call the callback on the >>>>>> + * device itself. >>>>>> */ >>>>>> static void pci_walk_bridge(struct pci_dev *bridge, >>>>>> int (*cb)(struct pci_dev *, void *), >>>>>> @@ -162,6 +166,8 @@ static void pci_walk_bridge(struct pci_dev *bridge, >>>>>> { >>>>>> if (bridge->subordinate) >>>>>> pci_walk_bus(bridge->subordinate, cb, userdata); >>>>>> + else >>>>>> + cb(bridge, userdata); >>>>>> } >>>>>> >>>>>> pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>>>> @@ -174,10 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>>>> >>>>>> /* >>>>>> * Error recovery runs on all subordinates of the bridge. If the >>>>>> - * bridge detected the error, it is cleared at the end. >>>>>> + * bridge detected the error, it is cleared at the end. For RCiEPs >>>>>> + * we should reset just the RCiEP itself. >>>>>> */ >>>>>> if (type == PCI_EXP_TYPE_ROOT_PORT || >>>>>> - type == PCI_EXP_TYPE_DOWNSTREAM) >>>>>> + type == PCI_EXP_TYPE_DOWNSTREAM || >>>>>> + type == PCI_EXP_TYPE_RC_EC || >>>>>> + type == PCI_EXP_TYPE_RC_END) >>>>>> bridge = dev; >>>>>> else >>>>>> bridge = pci_upstream_bridge(dev); >>>>>> @@ -185,6 +194,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>>>> pci_dbg(bridge, "broadcast error_detected message\n"); >>>>>> if (state == pci_channel_io_frozen) { >>>>>> pci_walk_bridge(bridge, report_frozen_detected, &status); >>>>>> + if (type == PCI_EXP_TYPE_RC_END) { >>>>>> + pci_warn(dev, "subordinate device reset not possible for RCiEP\n"); >>>>>> + status = PCI_ERS_RESULT_NONE; >>>>>> + goto failed; >>>>>> + } >>>>>> + >>>>>> status = reset_subordinates(bridge); >>>>>> if (status != PCI_ERS_RESULT_RECOVERED) { >>>>>> pci_warn(bridge, "subordinate device reset failed\n"); >>>>>> @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>>>> pci_dbg(bridge, "broadcast resume message\n"); >>>>>> pci_walk_bridge(bridge, report_resume, &status); >>>>>> >>>>>> - if (pcie_aer_is_native(bridge)) >>>>>> - pcie_clear_device_status(bridge); >>>>>> - pci_aer_clear_nonfatal_status(bridge); >>>>>> + if (type == PCI_EXP_TYPE_ROOT_PORT || >>>>>> + type == PCI_EXP_TYPE_DOWNSTREAM || >>>>>> + type == PCI_EXP_TYPE_RC_EC) { >>>>>> + if (pcie_aer_is_native(bridge)) >>>>>> + pcie_clear_device_status(bridge); >>>>>> + pci_aer_clear_nonfatal_status(bridge); >>>>> >>>>> This is hard to understand because "type" is from "dev", but "bridge" >>>>> is not necessarily the same device. Should it be this? >>>>> >>>>> type = pci_pcie_type(bridge); >>>>> if (type == PCI_EXP_TYPE_ROOT_PORT || >>>>> ...) >>>> >>>> Correct, it would be better if the type was based on the ‘bridge’. >>> >>> OK. This is similar to >>> https://lore.kernel.org/linux-pci/20201002184735.1229220-8-seanvk.dev@oregontracks.org/, >>> which you cited above except for the bridge/dev question and the >>> addition here of RC_EC. >>> >>> I tried to split that back into its own patch and started with the >>> commit message from that patch. But I got stuck on the commit >>> message. I got as far as: >>> >>> In some cases an error may be reported by a device not visible to >>> the OS, e.g., if firmware manages the device and passes error >>> information to the OS via ACPI APEI. >>> >>> But I still can't quite connect that to the patch. "bridge" is >>> clearly a device visible to Linux. >>> >>> I guess we're trying to assert that if "bridge" is not a Root Port, >>> Downstream Port, or RCEC, we shouldn't clear the error status because >>> the error came from a device Linux doesn't know about. But I think >>> "bridge" is *always* either a Root Port or a Downstream Port: >> >> That’s ultimately what we are trying to do. >> >>> if (type == PCI_EXP_TYPE_ROOT_PORT || >>> type == PCI_EXP_TYPE_DOWNSTREAM) >>> bridge = dev; >>> else >>> bridge = pci_upstream_bridge(dev); >>> >>> pci_upstream_bridge() returns either NULL (in which case previous uses >>> dereference a NULL pointer), or dev->bus->self, which is always a Root >>> Port, Switch Downstream Port, or Switch Upstream Port (or NULL for the >>> special case of VFs). >> >> In the past recall we were augmenting it with bridge = dev->rcec for >> RC_END. But we were able to relocate the handling in >> aer_root_reset(). >> >> So in this patch - we add the conditionals because RC_END is being >> passed in addition to RC_EC. >> >> if (type == PCI_EXP_TYPE_ROOT_PORT || >> >> - type == PCI_EXP_TYPE_DOWNSTREAM) >> >> + type == PCI_EXP_TYPE_DOWNSTREAM || >> + type == PCI_EXP_TYPE_RC_EC || >> + type == PCI_EXP_TYPE_RC_END) >> >> bridge = dev; >> else >> bridge = pci_upstream_bridge(dev); >> >> So we need to check for RP, DS, and RC_EC >> >> @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> >> pci_dbg(bridge, "broadcast resume message\n"); >> pci_walk_bridge(bridge, report_resume, &status); >> >> >> - if (pcie_aer_is_native(bridge)) >> - pcie_clear_device_status(bridge); >> - pci_aer_clear_nonfatal_status(bridge); >> >> + if (type == PCI_EXP_TYPE_ROOT_PORT || >> + type == PCI_EXP_TYPE_DOWNSTREAM || >> + type == PCI_EXP_TYPE_RC_EC) { >> + if (pcie_aer_is_native(bridge)) >> + pcie_clear_device_status(bridge); >> + pci_aer_clear_nonfatal_status(bridge); >> + } >> >> Breaking out a separate patch would be unnecessary as you correctly >> point out that it’s only going to be an RP or DS before this patch. > > Still trying to sort this out in my head, so half-baked questions > before I quit for the day: We call pcie_do_recovery() in four paths: > AER, APEI, DPC, and EDR, and I'm trying to understand what "dev" we > pass in all those cases. > > For DPC, I think "dev" must be a downstream port that triggered DPC, > and its secondary link is disabled. The device and any siblings have > already been reset by the link going down. We want to clear AER > status in downstream device(s) after they come back up (the AER status > bits are sticky, so they're not cleared by the reset), and the AER > status in the downstream port. > > I think EDR is the same as DPC? > > For AER, I think "dev" will typically (maybe always?) be the device > that detected the error and logged it in its AER Capability, not the > Root Port or RCEC that generated the interrupt. We want to reset > "dev" and any siblings, clear AER status in "dev", and clear AER > status in the Root Port or RCEC. > > For APEI, I assume "dev" is typically the device that detected the > error, and we want to reset it and any siblings. Firmware has already > read the AER status for "dev", and I assume firmware also clears it. > I assume firmware is also responsible for clearing AER status in the > Root Port, RCEC, or non-architected device that generated the > interrupt. > > It seems like there are basically two devices of interest in > pcie_do_recovery(): the endpoint where we have to call the driver > error recovery, and the port that generated the interrupt. I wonder > if this would make more sense if the caller passed both of them in > explicitly instead of having pcie_do_recovery() check the type of > "dev" and try to figure things out after the fact. On this last point I wanted to add that this is a possibility that could provide a clearer distinction, especially where actions need to be taken or not taken as a part of pcie_do_recovery(), i.e., bridge versus dev. In this patch series we have taken steps to minimize the need for the distinction by pushing the awareness into the driver’s error recovery routine, i.e., dev->rcec. A future evolution after this series could lead to both devices of interest being passed explicitly for the larger scope EDR/DPC/AER/etc. Thanks, Sean > > Bjorn
On Wed, Dec 02, 2020 at 08:53:54PM +0000, Kelley, Sean V wrote: > > On Nov 30, 2020, at 4:25 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Mon, Nov 30, 2020 at 07:54:37PM +0000, Kelley, Sean V wrote: > >> - if (pcie_aer_is_native(bridge)) > >> - pcie_clear_device_status(bridge); > >> - pci_aer_clear_nonfatal_status(bridge); > >> > >> + if (type == PCI_EXP_TYPE_ROOT_PORT || > >> + type == PCI_EXP_TYPE_DOWNSTREAM || > >> + type == PCI_EXP_TYPE_RC_EC) { > >> + if (pcie_aer_is_native(bridge)) > >> + pcie_clear_device_status(bridge); > >> + pci_aer_clear_nonfatal_status(bridge); > >> + } Back to this specific hunk, what if we made it this? struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); if (host->native_aer || pcie_ports_native) { pcie_clear_device_status(bridge); pci_aer_clear_nonfatal_status(bridge); } Previously, if "bridge" didn't have an AER Capability, we didn't pcie_clear_device_status(). In the case of a DPC bridge without AER, I think we *should* call pcie_clear_device_status(). Otherwise, I think this should work the same and would be a little simpler. > > It seems like there are basically two devices of interest in > > pcie_do_recovery(): the endpoint where we have to call the driver > > error recovery, and the port that generated the interrupt. I wonder > > if this would make more sense if the caller passed both of them in > > explicitly instead of having pcie_do_recovery() check the type of > > "dev" and try to figure things out after the fact. > > On this last point I wanted to add that this is a possibility that > could provide a clearer distinction, especially where actions need > to be taken or not taken as a part of pcie_do_recovery(), i.e., > bridge versus dev. In this patch series we have taken steps to > minimize the need for the distinction by pushing the awareness into > the driver’s error recovery routine, i.e., dev->rcec. A future > evolution after this series could lead to both devices of interest > being passed explicitly for the larger scope EDR/DPC/AER/etc. Yeah, not worth doing in *this* series. Bjorn
Hi Bjorn, > On Dec 2, 2020, at 1:27 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Wed, Dec 02, 2020 at 08:53:54PM +0000, Kelley, Sean V wrote: >>> On Nov 30, 2020, at 4:25 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: >>> On Mon, Nov 30, 2020 at 07:54:37PM +0000, Kelley, Sean V wrote: > >>>> - if (pcie_aer_is_native(bridge)) >>>> - pcie_clear_device_status(bridge); >>>> - pci_aer_clear_nonfatal_status(bridge); >>>> >>>> + if (type == PCI_EXP_TYPE_ROOT_PORT || >>>> + type == PCI_EXP_TYPE_DOWNSTREAM || >>>> + type == PCI_EXP_TYPE_RC_EC) { >>>> + if (pcie_aer_is_native(bridge)) >>>> + pcie_clear_device_status(bridge); >>>> + pci_aer_clear_nonfatal_status(bridge); >>>> + } > > Back to this specific hunk, what if we made it this? > > struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > > if (host->native_aer || pcie_ports_native) { > pcie_clear_device_status(bridge); > pci_aer_clear_nonfatal_status(bridge); > } > > Previously, if "bridge" didn't have an AER Capability, we didn't > pcie_clear_device_status(). In the case of a DPC bridge without AER, > I think we *should* call pcie_clear_device_status(). Agree, I was overlooking DPC here with the AER check. > > Otherwise, I think this should work the same and would be a little > simpler. Looks fine to me. It simplifies it a bit. > >>> It seems like there are basically two devices of interest in >>> pcie_do_recovery(): the endpoint where we have to call the driver >>> error recovery, and the port that generated the interrupt. I wonder >>> if this would make more sense if the caller passed both of them in >>> explicitly instead of having pcie_do_recovery() check the type of >>> "dev" and try to figure things out after the fact. >> >> On this last point I wanted to add that this is a possibility that >> could provide a clearer distinction, especially where actions need >> to be taken or not taken as a part of pcie_do_recovery(), i.e., >> bridge versus dev. In this patch series we have taken steps to >> minimize the need for the distinction by pushing the awareness into >> the driver’s error recovery routine, i.e., dev->rcec. A future >> evolution after this series could lead to both devices of interest >> being passed explicitly for the larger scope EDR/DPC/AER/etc. > > Yeah, not worth doing in *this* series. > > Bjorn Thanks, Sean
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 8b53aecdb43d..7883c9791562 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -148,13 +148,17 @@ static int report_resume(struct pci_dev *dev, void *data) /** * pci_walk_bridge - walk bridges potentially AER affected - * @bridge: bridge which may be a Port + * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs, + * or an RCiEP associated with an RCEC * @cb: callback to be called for each device found * @userdata: arbitrary pointer to be passed to callback * * If the device provided is a bridge, walk the subordinate bus, including * any bridged devices on buses under this bus. Call the provided callback * on each device found. + * + * If the device provided has no subordinate bus, call the callback on the + * device itself. */ static void pci_walk_bridge(struct pci_dev *bridge, int (*cb)(struct pci_dev *, void *), @@ -162,6 +166,8 @@ static void pci_walk_bridge(struct pci_dev *bridge, { if (bridge->subordinate) pci_walk_bus(bridge->subordinate, cb, userdata); + else + cb(bridge, userdata); } pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, @@ -174,10 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, /* * Error recovery runs on all subordinates of the bridge. If the - * bridge detected the error, it is cleared at the end. + * bridge detected the error, it is cleared at the end. For RCiEPs + * we should reset just the RCiEP itself. */ if (type == PCI_EXP_TYPE_ROOT_PORT || - type == PCI_EXP_TYPE_DOWNSTREAM) + type == PCI_EXP_TYPE_DOWNSTREAM || + type == PCI_EXP_TYPE_RC_EC || + type == PCI_EXP_TYPE_RC_END) bridge = dev; else bridge = pci_upstream_bridge(dev); @@ -185,6 +194,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(bridge, "broadcast error_detected message\n"); if (state == pci_channel_io_frozen) { pci_walk_bridge(bridge, report_frozen_detected, &status); + if (type == PCI_EXP_TYPE_RC_END) { + pci_warn(dev, "subordinate device reset not possible for RCiEP\n"); + status = PCI_ERS_RESULT_NONE; + goto failed; + } + status = reset_subordinates(bridge); if (status != PCI_ERS_RESULT_RECOVERED) { pci_warn(bridge, "subordinate device reset failed\n"); @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(bridge, "broadcast resume message\n"); pci_walk_bridge(bridge, report_resume, &status); - if (pcie_aer_is_native(bridge)) - pcie_clear_device_status(bridge); - pci_aer_clear_nonfatal_status(bridge); + if (type == PCI_EXP_TYPE_ROOT_PORT || + type == PCI_EXP_TYPE_DOWNSTREAM || + type == PCI_EXP_TYPE_RC_EC) { + if (pcie_aer_is_native(bridge)) + pcie_clear_device_status(bridge); + pci_aer_clear_nonfatal_status(bridge); + } pci_info(bridge, "device recovery successful\n"); return status;