diff mbox

[4/4] drivers/pci/hotplug: Support surprise hotplug

Message ID 1474460130-8316-5-git-send-email-gwshan@linux.vnet.ibm.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Gavin Shan Sept. 21, 2016, 12:15 p.m. UTC
This supports PCI surprise hotplug. The design is highlighted as
below:

   * The PCI slot's surprise hotplug capability is exposed through
     device node property "ibm,slot-surprise-pluggable", meaning
     PCI surprise hotplug will be disabled if skiboot doesn't support
     it yet.
   * The interrupt because of presence or link state change is raised
     on surprise hotplug event. One event is allocated and queued to
     the PCI slot for workqueue to pick it up and process in serialized
     fashion. The code flow for surprise hotplug is same to that for
     managed hotplug except: the affected PEs are put into frozen state
     to avoid unexpected EEH error reporting in surprise hot remove path.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pnv-pci.h |   9 ++
 drivers/pci/hotplug/pnv_php.c      | 219 +++++++++++++++++++++++++++++++++++++
 2 files changed, 228 insertions(+)

Comments

Bjorn Helgaas Sept. 21, 2016, 4:57 p.m. UTC | #1
Hi Gavin,

You don't need my ack for any of these, and I assume you'll merge them
through the powerpc tree.

Minor comments below, feel free to ignore them.

On Wed, Sep 21, 2016 at 10:15:30PM +1000, Gavin Shan wrote:
> ...
> @@ -536,9 +565,16 @@ static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
>  	if (unlikely(!php_slot))
>  		return NULL;
>  
> +	php_slot->event = kzalloc(sizeof(struct pnv_php_event), GFP_KERNEL);
> +	if (unlikely(!php_slot->event)) {
> +		kfree(php_slot);
> +		return NULL;
> +	}

Since you *always* allocate the event when allocating the php_slot,
making the event a member of php_slot (instead of keeping a pointer to
it) would simplify your memory management a bit.

It seems to be the style in this file to use "unlikely" liberally, but
I really doubt there's any performance consideration in this code.  To
me it adds more clutter than usefulness.

> +static irqreturn_t pnv_php_interrupt(int irq, void *data)
> +{
> +	struct pnv_php_slot *php_slot = data;
> +	struct pci_dev *pchild, *pdev = php_slot->pdev;
> +	struct eeh_dev *edev;
> +	struct eeh_pe *pe;
> +	struct pnv_php_event *event;
> +	u16 sts, lsts;
> +	u8 presence;
> +	bool added;
> +	unsigned long flags;
> +	int ret;
> +
> +	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &sts);
> +	sts &= (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC);
> +	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, sts);

I didn't realize that this is some sort of hybrid of native PCIe
hotplug and PowerNV-specific stuff.  Wonder if there's any opportunity
to combine with or leverage pciehp.  That seems pretty blue-sky
though, since there's so much PowerNV special sauce here.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Ellerman Sept. 22, 2016, 10:51 a.m. UTC | #2
Bjorn Helgaas <helgaas@kernel.org> writes:

> Hi Gavin,
>
> You don't need my ack for any of these, and I assume you'll merge them
> through the powerpc tree.

Thanks Bjorn, I wasn't sure if you wanted to ack it or not. I'll take
the whole series via the powerpc tree.

> Minor comments below, feel free to ignore them.
>
> On Wed, Sep 21, 2016 at 10:15:30PM +1000, Gavin Shan wrote:
>> ...
>> @@ -536,9 +565,16 @@ static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
>>  	if (unlikely(!php_slot))
>>  		return NULL;
>>  
>> +	php_slot->event = kzalloc(sizeof(struct pnv_php_event), GFP_KERNEL);
>> +	if (unlikely(!php_slot->event)) {
>> +		kfree(php_slot);
>> +		return NULL;
>> +	}
>
> Since you *always* allocate the event when allocating the php_slot,
> making the event a member of php_slot (instead of keeping a pointer to
> it) would simplify your memory management a bit.

> It seems to be the style in this file to use "unlikely" liberally, but
> I really doubt there's any performance consideration in this code.  To
> me it adds more clutter than usefulness.

Agreed on both counts. Gavin can you do another spin with those changes,
thanks.

cheers
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gavin Shan Sept. 26, 2016, 1:08 p.m. UTC | #3
On Wed, Sep 21, 2016 at 11:57:03AM -0500, Bjorn Helgaas wrote:
>Hi Gavin,
>
>You don't need my ack for any of these, and I assume you'll merge them
>through the powerpc tree.
>
>Minor comments below, feel free to ignore them.
>
>On Wed, Sep 21, 2016 at 10:15:30PM +1000, Gavin Shan wrote:
>> ...
>> @@ -536,9 +565,16 @@ static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
>>  	if (unlikely(!php_slot))
>>  		return NULL;
>>  
>> +	php_slot->event = kzalloc(sizeof(struct pnv_php_event), GFP_KERNEL);
>> +	if (unlikely(!php_slot->event)) {
>> +		kfree(php_slot);
>> +		return NULL;
>> +	}
>
>Since you *always* allocate the event when allocating the php_slot,
>making the event a member of php_slot (instead of keeping a pointer to
>it) would simplify your memory management a bit.
>
>It seems to be the style in this file to use "unlikely" liberally, but
>I really doubt there's any performance consideration in this code.  To
>me it adds more clutter than usefulness.
>
>> +static irqreturn_t pnv_php_interrupt(int irq, void *data)
>> +{
>> +	struct pnv_php_slot *php_slot = data;
>> +	struct pci_dev *pchild, *pdev = php_slot->pdev;
>> +	struct eeh_dev *edev;
>> +	struct eeh_pe *pe;
>> +	struct pnv_php_event *event;
>> +	u16 sts, lsts;
>> +	u8 presence;
>> +	bool added;
>> +	unsigned long flags;
>> +	int ret;
>> +
>> +	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &sts);
>> +	sts &= (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC);
>> +	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, sts);
>
>I didn't realize that this is some sort of hybrid of native PCIe
>hotplug and PowerNV-specific stuff.  Wonder if there's any opportunity
>to combine with or leverage pciehp.  That seems pretty blue-sky
>though, since there's so much PowerNV special sauce here.
>

Bjorn, thanks a lot for your comments. All comments except last one
(leverage pciehp) are covered in v2 which wasn't copied to linux-pci@
list to avoid unnecessary traffic. Yeah, the driver is too much PowerNV
platform specific things, which makes it hard to be built on top of
pciehp.

Thanks,
Gavin

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Sept. 26, 2016, 1:11 p.m. UTC | #4
On Mon, Sep 26, 2016 at 11:08:02PM +1000, Gavin Shan wrote:
> On Wed, Sep 21, 2016 at 11:57:03AM -0500, Bjorn Helgaas wrote:
> >Hi Gavin,
> >
> >You don't need my ack for any of these, and I assume you'll merge them
> >through the powerpc tree.
> >
> >Minor comments below, feel free to ignore them.
> >
> >On Wed, Sep 21, 2016 at 10:15:30PM +1000, Gavin Shan wrote:
> >> ...
> >> @@ -536,9 +565,16 @@ static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
> >>  	if (unlikely(!php_slot))
> >>  		return NULL;
> >>  
> >> +	php_slot->event = kzalloc(sizeof(struct pnv_php_event), GFP_KERNEL);
> >> +	if (unlikely(!php_slot->event)) {
> >> +		kfree(php_slot);
> >> +		return NULL;
> >> +	}
> >
> >Since you *always* allocate the event when allocating the php_slot,
> >making the event a member of php_slot (instead of keeping a pointer to
> >it) would simplify your memory management a bit.
> >
> >It seems to be the style in this file to use "unlikely" liberally, but
> >I really doubt there's any performance consideration in this code.  To
> >me it adds more clutter than usefulness.
> >
> >> +static irqreturn_t pnv_php_interrupt(int irq, void *data)
> >> +{
> >> +	struct pnv_php_slot *php_slot = data;
> >> +	struct pci_dev *pchild, *pdev = php_slot->pdev;
> >> +	struct eeh_dev *edev;
> >> +	struct eeh_pe *pe;
> >> +	struct pnv_php_event *event;
> >> +	u16 sts, lsts;
> >> +	u8 presence;
> >> +	bool added;
> >> +	unsigned long flags;
> >> +	int ret;
> >> +
> >> +	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &sts);
> >> +	sts &= (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC);
> >> +	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, sts);
> >
> >I didn't realize that this is some sort of hybrid of native PCIe
> >hotplug and PowerNV-specific stuff.  Wonder if there's any opportunity
> >to combine with or leverage pciehp.  That seems pretty blue-sky
> >though, since there's so much PowerNV special sauce here.
> >
> 
> Bjorn, thanks a lot for your comments. All comments except last one
> (leverage pciehp) are covered in v2 which wasn't copied to linux-pci@
> list to avoid unnecessary traffic. Yeah, the driver is too much PowerNV
> platform specific things, which makes it hard to be built on top of
> pciehp.

Sounds good, thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index 0cbd813..4ccd2b4 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -48,6 +48,12 @@  void pnv_cxl_phb_set_peer_afu(struct pci_dev *dev, struct cxl_afu *afu);
 
 #endif
 
+struct pnv_php_event {
+	bool			added;
+	struct pnv_php_slot	*php_slot;
+	struct work_struct	work;
+};
+
 struct pnv_php_slot {
 	struct hotplug_slot		slot;
 	struct hotplug_slot_info	slot_info;
@@ -60,6 +66,9 @@  struct pnv_php_slot {
 #define PNV_PHP_STATE_POPULATED		2
 #define PNV_PHP_STATE_OFFLINE		3
 	int				state;
+	int				irq;
+	struct workqueue_struct		*wq;
+	struct pnv_php_event		*event;
 	struct device_node		*dn;
 	struct pci_dev			*pdev;
 	struct pci_bus			*bus;
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 21f1f9d..0358aa7 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -30,13 +30,42 @@  static void pnv_php_register(struct device_node *dn);
 static void pnv_php_unregister_one(struct device_node *dn);
 static void pnv_php_unregister(struct device_node *dn);
 
+static void pnv_php_disable_irq(struct pnv_php_slot *php_slot)
+{
+	struct pci_dev *pdev = php_slot->pdev;
+	u16 ctrl;
+
+	if (php_slot->irq > 0) {
+		pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &ctrl);
+		ctrl &= ~(PCI_EXP_SLTCTL_HPIE |
+			  PCI_EXP_SLTCTL_PDCE |
+			  PCI_EXP_SLTCTL_DLLSCE);
+		pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, ctrl);
+
+		free_irq(php_slot->irq, php_slot);
+		php_slot->irq = 0;
+	}
+
+	if (php_slot->wq) {
+		destroy_workqueue(php_slot->wq);
+		php_slot->wq = NULL;
+	}
+
+	if (pdev->msix_enabled)
+		pci_disable_msix(pdev);
+	else if (pdev->msi_enabled)
+		pci_disable_msi(pdev);
+}
+
 static void pnv_php_free_slot(struct kref *kref)
 {
 	struct pnv_php_slot *php_slot = container_of(kref,
 					struct pnv_php_slot, kref);
 
 	WARN_ON(!list_empty(&php_slot->children));
+	pnv_php_disable_irq(php_slot);
 	kfree(php_slot->name);
+	kfree(php_slot->event);
 	kfree(php_slot);
 }
 
@@ -536,9 +565,16 @@  static struct pnv_php_slot *pnv_php_alloc_slot(struct device_node *dn)
 	if (unlikely(!php_slot))
 		return NULL;
 
+	php_slot->event = kzalloc(sizeof(struct pnv_php_event), GFP_KERNEL);
+	if (unlikely(!php_slot->event)) {
+		kfree(php_slot);
+		return NULL;
+	}
+
 	php_slot->name = kstrdup(label, GFP_KERNEL);
 	if (unlikely(!php_slot->name)) {
 		kfree(php_slot);
+		kfree(php_slot->event);
 		return NULL;
 	}
 
@@ -616,6 +652,184 @@  static int pnv_php_register_slot(struct pnv_php_slot *php_slot)
 	return 0;
 }
 
+static int pnv_php_enable_msix(struct pnv_php_slot *php_slot)
+{
+	struct pci_dev *pdev = php_slot->pdev;
+	struct msix_entry entry;
+	int nr_entries, ret;
+	u16 pcie_flag;
+
+	/* Get total number of MSIx entries */
+	nr_entries = pci_msix_vec_count(pdev);
+	if (nr_entries < 0)
+		return nr_entries;
+
+	/* Check hotplug MSIx entry is in range */
+	pcie_capability_read_word(pdev, PCI_EXP_FLAGS, &pcie_flag);
+	entry.entry = (pcie_flag & PCI_EXP_FLAGS_IRQ) >> 9;
+	if (entry.entry >= nr_entries)
+		return -ERANGE;
+
+	/* Enable MSIx */
+	ret = pci_enable_msix_exact(pdev, &entry, 1);
+	if (ret) {
+		dev_warn(&pdev->dev, "Error %d enabling MSIx\n", ret);
+		return ret;
+	}
+
+	return entry.vector;
+}
+
+static void pnv_php_event_handler(struct work_struct *work)
+{
+	struct pnv_php_event *event =
+		container_of(work, struct pnv_php_event, work);
+	struct pnv_php_slot *php_slot = event->php_slot;
+
+	if (event->added)
+		pnv_php_enable_slot(&php_slot->slot);
+	else
+		pnv_php_disable_slot(&php_slot->slot);
+
+	kfree(event);
+}
+
+static irqreturn_t pnv_php_interrupt(int irq, void *data)
+{
+	struct pnv_php_slot *php_slot = data;
+	struct pci_dev *pchild, *pdev = php_slot->pdev;
+	struct eeh_dev *edev;
+	struct eeh_pe *pe;
+	struct pnv_php_event *event;
+	u16 sts, lsts;
+	u8 presence;
+	bool added;
+	unsigned long flags;
+	int ret;
+
+	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &sts);
+	sts &= (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC);
+	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, sts);
+	if (sts & PCI_EXP_SLTSTA_DLLSC) {
+		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lsts);
+		added = !!(lsts & PCI_EXP_LNKSTA_DLLLA);
+	} else if (sts & PCI_EXP_SLTSTA_PDC) {
+		ret = pnv_pci_get_presence_state(php_slot->id, &presence);
+		if (!ret)
+			return IRQ_HANDLED;
+		added = !!(presence == OPAL_PCI_SLOT_PRESENT);
+	} else {
+		return IRQ_NONE;
+	}
+
+	/* We need freeze the PE in PCI hot remove case. The event that
+	 * was allocated in advanced is used. The time to allocate the
+	 * memory for the event is saved so that we can freeze the PE
+	 * quikly as possible.
+	 */
+	event = php_slot->event;
+	php_slot->event = NULL;
+	if (unlikely(!event))
+		event = kzalloc(sizeof(*event), GFP_ATOMIC);
+	if (unlikely(!event)) {
+		dev_warn(&pdev->dev, "PCI slot [%s] missed hotplug event 0x%04x\n",
+			 php_slot->name, sts);
+		return IRQ_HANDLED;
+	}
+
+	if (!added) {
+		pchild = list_first_entry_or_null(&php_slot->bus->devices,
+						  struct pci_dev, bus_list);
+		edev = pchild ? pci_dev_to_eeh_dev(pchild) : NULL;
+		pe = edev ? edev->pe : NULL;
+		if (likely(pe)) {
+			eeh_serialize_lock(&flags);
+			eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
+			eeh_serialize_unlock(flags);
+			eeh_pe_set_option(pe, EEH_OPT_FREEZE_PE);
+		}
+	}
+
+	dev_info(&pdev->dev, "PCI slot [%s] %s (IRQ: %d)\n",
+		 php_slot->name, added ? "added" : "removed", irq);
+	php_slot->event = kzalloc(sizeof(*event), GFP_ATOMIC);
+	INIT_WORK(&event->work, pnv_php_event_handler);
+	event->added = added;
+	event->php_slot = php_slot;
+	queue_work(php_slot->wq, &event->work);
+
+	return IRQ_HANDLED;
+}
+
+static void pnv_php_init_irq(struct pnv_php_slot *php_slot, int irq)
+{
+	struct pci_dev *pdev = php_slot->pdev;
+	u16 sts, ctrl;
+	int ret;
+
+	/* Allocate workqueue */
+	php_slot->wq = alloc_workqueue("pciehp-%s", 0, 0, php_slot->name);
+	if (!php_slot->wq) {
+		dev_warn(&pdev->dev, "Cannot alloc workqueue\n");
+		pnv_php_disable_irq(php_slot);
+		return;
+	}
+
+	/* Clear pending interrupts */
+	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &sts);
+	sts |= (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC);
+	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, sts);
+
+	/* Request the interrupt */
+	ret = request_irq(irq, pnv_php_interrupt, IRQF_SHARED,
+			  php_slot->name, php_slot);
+	if (ret) {
+		pnv_php_disable_irq(php_slot);
+		dev_warn(&pdev->dev, "Error %d enabling IRQ %d\n", ret, irq);
+		return;
+	}
+
+	/* Enable the interrupts */
+	pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &ctrl);
+	ctrl |= (PCI_EXP_SLTCTL_HPIE |
+		 PCI_EXP_SLTCTL_PDCE |
+		 PCI_EXP_SLTCTL_DLLSCE);
+	pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, ctrl);
+
+	/* The interrupt is initialized successfully when @irq is valid */
+	php_slot->irq = irq;
+}
+
+static void pnv_php_enable_irq(struct pnv_php_slot *php_slot)
+{
+	struct pci_dev *pdev = php_slot->pdev;
+	int irq, ret;
+
+	ret = pci_enable_device(pdev);
+	if (ret) {
+		dev_warn(&pdev->dev, "Error %d enabling device\n", ret);
+		return;
+	}
+
+	pci_set_master(pdev);
+
+	/* Enable MSIx interrupt */
+	irq = pnv_php_enable_msix(php_slot);
+	if (irq > 0) {
+		pnv_php_init_irq(php_slot, irq);
+		return;
+	}
+
+	/* Use MSI if MSIx doesn't work. Fail back to legacy INTx
+	 * if MSI doesn't work either
+	 */
+	ret = pci_enable_msi(pdev);
+	if (!ret || pdev->irq) {
+		irq = pdev->irq;
+		pnv_php_init_irq(php_slot, irq);
+	}
+}
+
 static int pnv_php_register_one(struct device_node *dn)
 {
 	struct pnv_php_slot *php_slot;
@@ -643,6 +857,11 @@  static int pnv_php_register_one(struct device_node *dn)
 	if (unlikely(ret))
 		goto unregister_slot;
 
+	/* Enable interrupt if the slot supports surprise hotplug */
+	prop32 = of_get_property(dn, "ibm,slot-surprise-pluggable", NULL);
+	if (prop32 && of_read_number(prop32, 1))
+		pnv_php_enable_irq(php_slot);
+
 	return 0;
 
 unregister_slot: