diff mbox

[V3,9/9] powerpc/eeh: handle VF PE properly

Message ID 1430723258-21299-10-git-send-email-weiyang@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Wei Yang May 4, 2015, 7:07 a.m. UTC
Compared with Bus PE, VF PE just has one single pci function. This
introduces the difference of error handling on a VF PE.

For example in the hotplug case, EEH needs to remove and re-create the VF
properly. In the case when PF's error_detected() disable SRIOV, this patch
introduces a flag to mark the eeh_dev of a VF to avoid the slot_reset() and
resume(). Since the FW is not ware of the VF, this patch handles the VF
restore/reset in kernel directly.

This patch is to handle the VF PE properly in these cases.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/eeh.h   |    1 +
 arch/powerpc/kernel/eeh.c        |    1 +
 arch/powerpc/kernel/eeh_driver.c |  108 ++++++++++++++++++++++++++++++--------
 arch/powerpc/kernel/eeh_pe.c     |    3 +-
 4 files changed, 90 insertions(+), 23 deletions(-)

Comments

Gavin Shan May 13, 2015, 1:16 a.m. UTC | #1
On Mon, May 04, 2015 at 03:07:38PM +0800, Wei Yang wrote:
>Compared with Bus PE, VF PE just has one single pci function. This
                                                 ^^^
>introduces the difference of error handling on a VF PE.
>

Lets have simple example to make the discussion easy: Suppose that one
particular PF has only one IOV BAR and we're going to enable 8 VFs. Also
the VF BAR size exceeds 64MB. For this case, we're going to have 2 VF
groups, which have 4 VFs. First of all, there are 8 PE numbers consumed
in total and lets say they're PE#(x) ... PE#(x+7).

For above case, the M64 runs in "single" mode. In other word, we just
need two M64 BARs here and they're owned by PE#x and PE#(x+1) separately.
Equally speaking, the relationship between PE# and VF index becomes as
below from MMIO's view: PE#x (VF#0/1/2/3), PE#x(VF#4/5/6/7). However,
this mapping isn't updated to PELTM. On the other hand, each VF has
separate PE# number as we update to PELTM. Lastly, we have to chain
PE#x/+1/+2/+3 and PE#x+4/5/6/7 because the VF group is sharing M64
resources. Hopefully, I didn't miss anything.

If above description is complete, I think you need expose VF (PE) group
to EEH. Potentially, I guess you need improve it later for VFIO so that
it can pass VF (PE) group to *same* guest. Generally speaking, the VF (PE)
group should be visible to EEH as single PE, which is similar to what I
did for "huge BAR" support where we have master/slave PE. The master PE
is exposed to EEH while the slave PEs are invisible from EEH. However,
the situation for VF (PE) group is different, but not too much: the
slave PEs (PE#x+1/2/3, PE#x+5/6/7 in this case) do contain PCI device.
So think you can something as below if you can't figure out better
solution:

- When the PE group is finalized in pcibios_fixup_sriov_enable(), you
  mark the master/slave PE accordingly, and put the slave PE to the
  master PE's slave list.
- When doing EEH probe on VF's edev, detach it to EEH with master PE#.

With above mechanism, each VF PE can potentially have multiple EEH devices,
not one single PCI function as you said in the commit log, which needs to
be changed accordingly.

>For example in the hotplug case, EEH needs to remove and re-create the VF
>properly. In the case when PF's error_detected() disable SRIOV, this patch
>introduces a flag to mark the eeh_dev of a VF to avoid the slot_reset() and
>resume(). Since the FW is not ware of the VF, this patch handles the VF
>restore/reset in kernel directly.
>
>This patch is to handle the VF PE properly in these cases.
>
>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
>---
> arch/powerpc/include/asm/eeh.h   |    1 +
> arch/powerpc/kernel/eeh.c        |    1 +
> arch/powerpc/kernel/eeh_driver.c |  108 ++++++++++++++++++++++++++++++--------
> arch/powerpc/kernel/eeh_pe.c     |    3 +-
> 4 files changed, 90 insertions(+), 23 deletions(-)
>
>diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
>index 78c8bec..43e8a24 100644
>--- a/arch/powerpc/include/asm/eeh.h
>+++ b/arch/powerpc/include/asm/eeh.h
>@@ -141,6 +141,7 @@ struct eeh_dev {
> 	struct pci_controller *phb;	/* Associated PHB		*/
> 	struct pci_dn *pdn;		/* Associated PCI device node	*/
> 	struct pci_dev *pdev;		/* Associated PCI device	*/
>+	int    in_error;		/* Error flag for eeh_dev	*/

Begining from "v2" (maybe I'm wrong about the revision), I suggested to
remove "in_error" because we already had similar flag EEH_DEV_NO_HANDLER.
I don't think it's different to merge them and I already told how to do
that. If you still find it's difficult to do it, I'm fine to keep it and
I'll fix it up later by myself.

> #ifdef CONFIG_PCI_IOV
> 	struct pci_dev *physfn;		/* Associated PF PORT		*/
> #endif /* CONFIG_PCI_IOV */
>diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>index 221e280..077c3d1 100644
>--- a/arch/powerpc/kernel/eeh.c
>+++ b/arch/powerpc/kernel/eeh.c
>@@ -1226,6 +1226,7 @@ void eeh_remove_device(struct pci_dev *dev)
> 	 * from the parent PE during the BAR resotre.
> 	 */
> 	edev->pdev = NULL;
>+	edev->in_error = 0;
> 	dev->dev.archdata.edev = NULL;
> 	if (!(edev->pe->state & EEH_PE_KEEP))
> 		eeh_rmv_from_parent_pe(edev);
>diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>index 89eb4bc..6e42cad 100644
>--- a/arch/powerpc/kernel/eeh_driver.c
>+++ b/arch/powerpc/kernel/eeh_driver.c
>@@ -211,6 +211,7 @@ static void *eeh_report_error(void *data, void *userdata)
> 	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
> 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
>
>+	edev->in_error = 1;
> 	eeh_pcid_put(dev);
> 	return NULL;
> }
>@@ -282,7 +283,8 @@ static void *eeh_report_reset(void *data, void *userdata)
>
> 	if (!driver->err_handler ||
> 	    !driver->err_handler->slot_reset ||
>-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
>+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
>+	    (!edev->in_error)) {
> 		eeh_pcid_put(dev);
> 		return NULL;
> 	}
>@@ -339,14 +341,16 @@ static void *eeh_report_resume(void *data, void *userdata)
>
> 	if (!driver->err_handler ||
> 	    !driver->err_handler->resume ||
>-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
>+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
>+	    (!edev->in_error)) {
> 		edev->mode &= ~EEH_DEV_NO_HANDLER;
>-		eeh_pcid_put(dev);
>-		return NULL;
>+		goto out;
> 	}
>
> 	driver->err_handler->resume(dev);
>
>+out:
>+	edev->in_error = 0;
> 	eeh_pcid_put(dev);
> 	return NULL;
> }
>@@ -386,12 +390,42 @@ static void *eeh_report_failure(void *data, void *userdata)
> 	return NULL;
> }
>
>+#ifdef CONFIG_PCI_IOV
>+static void *eeh_add_virt_device(void *data, void *userdata)
>+{
>+	struct pci_driver *driver;
>+	struct eeh_dev *edev = (struct eeh_dev *)data;
>+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
>+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
>+
>+	if (!(edev->mode & EEH_DEV_VF)) {
>+		pr_warn("EEH: eeh_dev(%04x:%02x:%02x:%01x) is not a VF\n",
>+			edev->phb->global_number, pdn->busno,
>+			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
>+		return NULL;
>+	}

Please don't use logs starting with "EEH:" arbitrarily. It's assumed
for principle EEH logs. This one isn't a major one and print the function
name helps for debugging:

	if (!edev->physfn) {
		pr_warn("%s: EEH dev %04x:%02x:%02x:%01x not for VF\n",
			edev->phb->global_number, pdn->busno,
			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
		return NULL;
	}

>+
>+	driver = eeh_pcid_get(dev);
>+	if (driver) {
>+		eeh_pcid_put(dev);
>+		if (driver->err_handler)
>+			return NULL;
>+	}

dev and driver are NULL for those VFs that have been unplugged. For those
VFs weren't unplugged, driver and err_handler should be valid. The code
looks correct. However, for consistence, please use EEH_DEV_DISCONNECTED
that has been marked to those EEH devices which were unplugged. Do you
think it would be better?

	if (!(dev->flags & EEH_DEV_DISCONNECTED))
		return NULL;

>+
>+	pci_iov_virtfn_add(edev->physfn, pdn->vf_index, 0);
>+	return NULL;
>+}

>+#endif /* CONFIG_PCI_IOV */
>+
> static void *eeh_rmv_device(void *data, void *userdata)
> {
> 	struct pci_driver *driver;
> 	struct eeh_dev *edev = (struct eeh_dev *)data;
> 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
> 	int *removed = (int *)userdata;
>+#ifdef CONFIG_PCI_IOV
>+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
>+#endif

You don't have to have CONFIG_PCI_IOV here. I guess you probably remove
all CONFIG_PCI_IOV checks because the flag or conditions introduced
to edev/PE can tell they're VF sensitive or not.

>
> 	/*
> 	 * Actually, we should remove the PCI bridges as well.
>@@ -416,7 +450,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
> 	driver = eeh_pcid_get(dev);
> 	if (driver) {
> 		eeh_pcid_put(dev);
>-		if (driver->err_handler)
>+		if (removed && driver->err_handler)
> 			return NULL;
> 	}
>
>@@ -425,11 +459,21 @@ static void *eeh_rmv_device(void *data, void *userdata)
> 		 pci_name(dev));
> 	edev->bus = dev->bus;
> 	edev->mode |= EEH_DEV_DISCONNECTED;
>-	(*removed)++;
>-
>-	pci_lock_rescan_remove();
>-	pci_stop_and_remove_bus_device(dev);
>-	pci_unlock_rescan_remove();
>+	if (removed)
>+		(*removed)++;
>+
>+#ifdef CONFIG_PCI_IOV
>+	if (edev->mode & EEH_DEV_VF) {
>+		pci_iov_virtfn_remove(edev->physfn, pdn->vf_index, 0);
>+		edev->pdev = NULL;
>+		pdn->pe_number = IODA_INVALID_PE;

Setting the PE number to invalid one seems not correct because the PE
is still consumed by the VF's RID.

>+	} else
>+#endif

You can remove this CONFIG_PCI_IOV here since you already had EEH_DEV_VF
or "edev->physfn" as I said previously.

>+	{
>+		pci_lock_rescan_remove();
>+		pci_stop_and_remove_bus_device(dev);
>+		pci_unlock_rescan_remove();
>+	}
>
> 	return NULL;
> }
>@@ -548,6 +592,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
> 	struct pci_bus *frozen_bus = eeh_pe_bus_get(pe);
> 	struct timeval tstamp;
> 	int cnt, rc, removed = 0;
>+	struct eeh_dev *edev;
>
> 	/* pcibios will clear the counter; save the value */
> 	cnt = pe->freeze_count;
>@@ -561,12 +606,15 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
> 	 */
> 	eeh_pe_state_mark(pe, EEH_PE_KEEP);
> 	if (bus) {
>-		pci_lock_rescan_remove();
>-		pcibios_remove_pci_devices(bus);
>-		pci_unlock_rescan_remove();
>-	} else if (frozen_bus) {
>+		if (pe->type & EEH_PE_VF)
>+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
>+		else {
>+			pci_lock_rescan_remove();
>+			pcibios_remove_pci_devices(bus);
>+			pci_unlock_rescan_remove();
>+		}
>+	} else if (frozen_bus)
> 		eeh_pe_dev_traverse(pe, eeh_rmv_device, &removed);
>-	}
>
> 	/*
> 	 * Reset the pci controller. (Asserts RST#; resets config space).
>@@ -607,14 +655,26 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
> 		 * PE. We should disconnect it so the binding can be
> 		 * rebuilt when adding PCI devices.
> 		 */
>+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);

As above, VF PE can have multiple EEH devices.

> 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
>-		pcibios_add_pci_devices(bus);
>+#ifdef CONFIG_PCI_IOV
>+		if (pe->type & EEH_PE_VF)
>+			eeh_add_virt_device(edev, NULL);
>+		else
>+#endif
>+			pcibios_add_pci_devices(bus);
> 	} else if (frozen_bus && removed) {
> 		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
> 		ssleep(5);
>
>+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
> 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
>-		pcibios_add_pci_devices(frozen_bus);
>+#ifdef CONFIG_PCI_IOV
>+		if (pe->type & EEH_PE_VF)
>+			eeh_add_virt_device(edev, NULL);
>+		else
>+#endif
>+			pcibios_add_pci_devices(frozen_bus);
> 	}
> 	eeh_pe_state_clear(pe, EEH_PE_KEEP);
>
>@@ -792,11 +852,15 @@ perm_error:
> 	 * the their PCI config any more.
> 	 */
> 	if (frozen_bus) {
>-		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>-
>-		pci_lock_rescan_remove();
>-		pcibios_remove_pci_devices(frozen_bus);
>-		pci_unlock_rescan_remove();
>+		if (pe->type & EEH_PE_VF) {
>+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
>+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>+		} else {
>+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>+			pci_lock_rescan_remove();
>+			pcibios_remove_pci_devices(frozen_bus);
>+			pci_unlock_rescan_remove();
>+		}
> 	}
> }
>
>diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>index edfe63a..3b8b32f 100644
>--- a/arch/powerpc/kernel/eeh_pe.c
>+++ b/arch/powerpc/kernel/eeh_pe.c
>@@ -916,7 +916,8 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
> 	if (pe->type & EEH_PE_PHB) {
> 		bus = pe->phb->bus;
> 	} else if (pe->type & EEH_PE_BUS ||
>-		   pe->type & EEH_PE_DEVICE) {
>+		   pe->type & EEH_PE_DEVICE ||
>+		   pe->type & EEH_PE_VF) {

I still have the concern: VF PE isn't supposed to have PE primary bus as the
PE contains single or multiple (virtual) functions. So you can't simply
return the "shared" bus to caller to apply hotplug on it or for other
purpose.

> 		if (pe->bus) {
> 			bus = pe->bus;
> 			goto out;

Thanks,
Gavin

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wei Yang May 14, 2015, 9:35 a.m. UTC | #2
On Wed, May 13, 2015 at 11:16:30AM +1000, Gavin Shan wrote:
>On Mon, May 04, 2015 at 03:07:38PM +0800, Wei Yang wrote:
>
>	if (!edev->physfn) {
>		pr_warn("%s: EEH dev %04x:%02x:%02x:%01x not for VF\n",
>			edev->phb->global_number, pdn->busno,
>			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
>		return NULL;
>	}
>
>>+
>>+	driver = eeh_pcid_get(dev);
>>+	if (driver) {
>>+		eeh_pcid_put(dev);
>>+		if (driver->err_handler)
>>+			return NULL;
>>+	}
>
>dev and driver are NULL for those VFs that have been unplugged. For those
>VFs weren't unplugged, driver and err_handler should be valid. The code
>looks correct. However, for consistence, please use EEH_DEV_DISCONNECTED
>that has been marked to those EEH devices which were unplugged. Do you
>think it would be better?
>
>	if (!(dev->flags & EEH_DEV_DISCONNECTED))
>		return NULL;
>

Hi, Gavin,

I think this is a nice idea, while this may not work.

We mark the DISCONNECTED flag when remove a PCI device, while before we do the
hot plug we will detach it from the tree and remove this flag in
eeh_pe_detach_dev().

This will leads to the VF not be hot plugged.
Wei Yang May 14, 2015, 10:02 a.m. UTC | #3
On Wed, May 13, 2015 at 11:16:30AM +1000, Gavin Shan wrote:
>> 	 * Actually, we should remove the PCI bridges as well.
>>@@ -416,7 +450,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
>> 	driver = eeh_pcid_get(dev);
>> 	if (driver) {
>> 		eeh_pcid_put(dev);
>>-		if (driver->err_handler)
>>+		if (removed && driver->err_handler)
>> 			return NULL;
>> 	}
>>
>>@@ -425,11 +459,21 @@ static void *eeh_rmv_device(void *data, void *userdata)
>> 		 pci_name(dev));
>> 	edev->bus = dev->bus;
>> 	edev->mode |= EEH_DEV_DISCONNECTED;
>>-	(*removed)++;
>>-
>>-	pci_lock_rescan_remove();
>>-	pci_stop_and_remove_bus_device(dev);
>>-	pci_unlock_rescan_remove();
>>+	if (removed)
>>+		(*removed)++;
>>+
>>+#ifdef CONFIG_PCI_IOV
>>+	if (edev->mode & EEH_DEV_VF) {
>>+		pci_iov_virtfn_remove(edev->physfn, pdn->vf_index, 0);
>>+		edev->pdev = NULL;
>>+		pdn->pe_number = IODA_INVALID_PE;
>
>Setting the PE number to invalid one seems not correct because the PE
>is still consumed by the VF's RID.
>

In commit 781a868f3136, we introduce the check of pdn->pe_number in
pnv_pci_dma_dev_setup(). Since VFs are create/released dynamically, we need to
delay the bind between PE and the device. Then to avoid rebind it, we check
the pdn->pe_number. The WARN_ON() is what you suggested.

So if don't clear the pe_number here, we break that rule.
Gavin Shan May 14, 2015, 12:15 p.m. UTC | #4
On Thu, May 14, 2015 at 05:35:31PM +0800, Wei Yang wrote:
>On Wed, May 13, 2015 at 11:16:30AM +1000, Gavin Shan wrote:
>>On Mon, May 04, 2015 at 03:07:38PM +0800, Wei Yang wrote:
>>
>>	if (!edev->physfn) {
>>		pr_warn("%s: EEH dev %04x:%02x:%02x:%01x not for VF\n",
>>			edev->phb->global_number, pdn->busno,
>>			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
>>		return NULL;
>>	}
>>
>>>+
>>>+	driver = eeh_pcid_get(dev);
>>>+	if (driver) {
>>>+		eeh_pcid_put(dev);
>>>+		if (driver->err_handler)
>>>+			return NULL;
>>>+	}
>>
>>dev and driver are NULL for those VFs that have been unplugged. For those
>>VFs weren't unplugged, driver and err_handler should be valid. The code
>>looks correct. However, for consistence, please use EEH_DEV_DISCONNECTED
>>that has been marked to those EEH devices which were unplugged. Do you
>>think it would be better?
>>
>>	if (!(dev->flags & EEH_DEV_DISCONNECTED))
>>		return NULL;
>>
>
>
>I think this is a nice idea, while this may not work.
>
>We mark the DISCONNECTED flag when remove a PCI device, while before we do the
>hot plug we will detach it from the tree and remove this flag in
>eeh_pe_detach_dev().
>
>This will leads to the VF not be hot plugged.
>

Ok, the following way, with the original idea improved for a bit, would
work for you:

- Don't clear DISCONNECTED flag in eeh_pe_detach_dev().
- Use the flag for your case.
- Clear DISCONNECTED flag after the hogplug is done, right before
  eeh_pe_state_clear(pe, EEH_PE_KEEP) in eeh_reset_device().

Thanks,
Gavin


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gavin Shan May 14, 2015, 12:30 p.m. UTC | #5
On Thu, May 14, 2015 at 06:02:50PM +0800, Wei Yang wrote:
>On Wed, May 13, 2015 at 11:16:30AM +1000, Gavin Shan wrote:
>>> 	 * Actually, we should remove the PCI bridges as well.
>>>@@ -416,7 +450,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>> 	driver = eeh_pcid_get(dev);
>>> 	if (driver) {
>>> 		eeh_pcid_put(dev);
>>>-		if (driver->err_handler)
>>>+		if (removed && driver->err_handler)
>>> 			return NULL;
>>> 	}
>>>
>>>@@ -425,11 +459,21 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>> 		 pci_name(dev));
>>> 	edev->bus = dev->bus;
>>> 	edev->mode |= EEH_DEV_DISCONNECTED;
>>>-	(*removed)++;
>>>-
>>>-	pci_lock_rescan_remove();
>>>-	pci_stop_and_remove_bus_device(dev);
>>>-	pci_unlock_rescan_remove();
>>>+	if (removed)
>>>+		(*removed)++;
>>>+
>>>+#ifdef CONFIG_PCI_IOV
>>>+	if (edev->mode & EEH_DEV_VF) {
>>>+		pci_iov_virtfn_remove(edev->physfn, pdn->vf_index, 0);
>>>+		edev->pdev = NULL;
>>>+		pdn->pe_number = IODA_INVALID_PE;
>>
>>Setting the PE number to invalid one seems not correct because the PE
>>is still consumed by the VF's RID.
>>
>
>In commit 781a868f3136, we introduce the check of pdn->pe_number in
>pnv_pci_dma_dev_setup(). Since VFs are create/released dynamically, we need to
>delay the bind between PE and the device. Then to avoid rebind it, we check
>the pdn->pe_number. The WARN_ON() is what you suggested.
>
>So if don't clear the pe_number here, we break that rule.
>

Ok. You choose one of the following two options:

- Remove WARN_ON() in pnv_pci_dma_dev_setup(). Actually, I don't think
  it's quite correct to update PE# to VF's pdn at this function. Instead,
  it would have been done at the SRIOV enablement backend. But it's not
  related to this patchset. You may improve it later.
- Keep the code you had: set the PE# to invalid one, and put some comments
  here as below.

  /*
   * We have to set the VF PE number to invalid one, which is required
   * to plug the VF successfully.
   */

Thanks,
Gavin
 

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 78c8bec..43e8a24 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -141,6 +141,7 @@  struct eeh_dev {
 	struct pci_controller *phb;	/* Associated PHB		*/
 	struct pci_dn *pdn;		/* Associated PCI device node	*/
 	struct pci_dev *pdev;		/* Associated PCI device	*/
+	int    in_error;		/* Error flag for eeh_dev	*/
 #ifdef CONFIG_PCI_IOV
 	struct pci_dev *physfn;		/* Associated PF PORT		*/
 #endif /* CONFIG_PCI_IOV */
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 221e280..077c3d1 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1226,6 +1226,7 @@  void eeh_remove_device(struct pci_dev *dev)
 	 * from the parent PE during the BAR resotre.
 	 */
 	edev->pdev = NULL;
+	edev->in_error = 0;
 	dev->dev.archdata.edev = NULL;
 	if (!(edev->pe->state & EEH_PE_KEEP))
 		eeh_rmv_from_parent_pe(edev);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 89eb4bc..6e42cad 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -211,6 +211,7 @@  static void *eeh_report_error(void *data, void *userdata)
 	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
 	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
 
+	edev->in_error = 1;
 	eeh_pcid_put(dev);
 	return NULL;
 }
@@ -282,7 +283,8 @@  static void *eeh_report_reset(void *data, void *userdata)
 
 	if (!driver->err_handler ||
 	    !driver->err_handler->slot_reset ||
-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
+	    (!edev->in_error)) {
 		eeh_pcid_put(dev);
 		return NULL;
 	}
@@ -339,14 +341,16 @@  static void *eeh_report_resume(void *data, void *userdata)
 
 	if (!driver->err_handler ||
 	    !driver->err_handler->resume ||
-	    (edev->mode & EEH_DEV_NO_HANDLER)) {
+	    (edev->mode & EEH_DEV_NO_HANDLER) ||
+	    (!edev->in_error)) {
 		edev->mode &= ~EEH_DEV_NO_HANDLER;
-		eeh_pcid_put(dev);
-		return NULL;
+		goto out;
 	}
 
 	driver->err_handler->resume(dev);
 
+out:
+	edev->in_error = 0;
 	eeh_pcid_put(dev);
 	return NULL;
 }
@@ -386,12 +390,42 @@  static void *eeh_report_failure(void *data, void *userdata)
 	return NULL;
 }
 
+#ifdef CONFIG_PCI_IOV
+static void *eeh_add_virt_device(void *data, void *userdata)
+{
+	struct pci_driver *driver;
+	struct eeh_dev *edev = (struct eeh_dev *)data;
+	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
+
+	if (!(edev->mode & EEH_DEV_VF)) {
+		pr_warn("EEH: eeh_dev(%04x:%02x:%02x:%01x) is not a VF\n",
+			edev->phb->global_number, pdn->busno,
+			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+		return NULL;
+	}
+
+	driver = eeh_pcid_get(dev);
+	if (driver) {
+		eeh_pcid_put(dev);
+		if (driver->err_handler)
+			return NULL;
+	}
+
+	pci_iov_virtfn_add(edev->physfn, pdn->vf_index, 0);
+	return NULL;
+}
+#endif /* CONFIG_PCI_IOV */
+
 static void *eeh_rmv_device(void *data, void *userdata)
 {
 	struct pci_driver *driver;
 	struct eeh_dev *edev = (struct eeh_dev *)data;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	int *removed = (int *)userdata;
+#ifdef CONFIG_PCI_IOV
+	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
+#endif
 
 	/*
 	 * Actually, we should remove the PCI bridges as well.
@@ -416,7 +450,7 @@  static void *eeh_rmv_device(void *data, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (driver) {
 		eeh_pcid_put(dev);
-		if (driver->err_handler)
+		if (removed && driver->err_handler)
 			return NULL;
 	}
 
@@ -425,11 +459,21 @@  static void *eeh_rmv_device(void *data, void *userdata)
 		 pci_name(dev));
 	edev->bus = dev->bus;
 	edev->mode |= EEH_DEV_DISCONNECTED;
-	(*removed)++;
-
-	pci_lock_rescan_remove();
-	pci_stop_and_remove_bus_device(dev);
-	pci_unlock_rescan_remove();
+	if (removed)
+		(*removed)++;
+
+#ifdef CONFIG_PCI_IOV
+	if (edev->mode & EEH_DEV_VF) {
+		pci_iov_virtfn_remove(edev->physfn, pdn->vf_index, 0);
+		edev->pdev = NULL;
+		pdn->pe_number = IODA_INVALID_PE;
+	} else
+#endif
+	{
+		pci_lock_rescan_remove();
+		pci_stop_and_remove_bus_device(dev);
+		pci_unlock_rescan_remove();
+	}
 
 	return NULL;
 }
@@ -548,6 +592,7 @@  static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	struct pci_bus *frozen_bus = eeh_pe_bus_get(pe);
 	struct timeval tstamp;
 	int cnt, rc, removed = 0;
+	struct eeh_dev *edev;
 
 	/* pcibios will clear the counter; save the value */
 	cnt = pe->freeze_count;
@@ -561,12 +606,15 @@  static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	 */
 	eeh_pe_state_mark(pe, EEH_PE_KEEP);
 	if (bus) {
-		pci_lock_rescan_remove();
-		pcibios_remove_pci_devices(bus);
-		pci_unlock_rescan_remove();
-	} else if (frozen_bus) {
+		if (pe->type & EEH_PE_VF)
+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
+		else {
+			pci_lock_rescan_remove();
+			pcibios_remove_pci_devices(bus);
+			pci_unlock_rescan_remove();
+		}
+	} else if (frozen_bus)
 		eeh_pe_dev_traverse(pe, eeh_rmv_device, &removed);
-	}
 
 	/*
 	 * Reset the pci controller. (Asserts RST#; resets config space).
@@ -607,14 +655,26 @@  static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 		 * PE. We should disconnect it so the binding can be
 		 * rebuilt when adding PCI devices.
 		 */
+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
-		pcibios_add_pci_devices(bus);
+#ifdef CONFIG_PCI_IOV
+		if (pe->type & EEH_PE_VF)
+			eeh_add_virt_device(edev, NULL);
+		else
+#endif
+			pcibios_add_pci_devices(bus);
 	} else if (frozen_bus && removed) {
 		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
 		ssleep(5);
 
+		edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
 		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
-		pcibios_add_pci_devices(frozen_bus);
+#ifdef CONFIG_PCI_IOV
+		if (pe->type & EEH_PE_VF)
+			eeh_add_virt_device(edev, NULL);
+		else
+#endif
+			pcibios_add_pci_devices(frozen_bus);
 	}
 	eeh_pe_state_clear(pe, EEH_PE_KEEP);
 
@@ -792,11 +852,15 @@  perm_error:
 	 * the their PCI config any more.
 	 */
 	if (frozen_bus) {
-		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
-
-		pci_lock_rescan_remove();
-		pcibios_remove_pci_devices(frozen_bus);
-		pci_unlock_rescan_remove();
+		if (pe->type & EEH_PE_VF) {
+			eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
+		} else {
+			eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
+			pci_lock_rescan_remove();
+			pcibios_remove_pci_devices(frozen_bus);
+			pci_unlock_rescan_remove();
+		}
 	}
 }
 
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index edfe63a..3b8b32f 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -916,7 +916,8 @@  struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
 	if (pe->type & EEH_PE_PHB) {
 		bus = pe->phb->bus;
 	} else if (pe->type & EEH_PE_BUS ||
-		   pe->type & EEH_PE_DEVICE) {
+		   pe->type & EEH_PE_DEVICE ||
+		   pe->type & EEH_PE_VF) {
 		if (pe->bus) {
 			bus = pe->bus;
 			goto out;