diff mbox

pciehp: Fix race condition handling surprise link-down

Message ID 1481317564-18045-1-git-send-email-ashok.raj@intel.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Ashok Raj Dec. 9, 2016, 9:06 p.m. UTC
Changes from v1:
	Address comments from Bjorn:
		Added p_slot->lock mutex around changes to p_slot->state
		Updated commit message to call out mutex names

A surprise link down may retrain very quickly, causing the same slot to
generate a link up event before handling the link down completes.

Since the link is active, the power off work queued from the first link
down will cause a second down event when the power is disabled. The second
down event should be ignored because the slot is already powering off;
however, the "link up" event sets the slot state to POWERON before the
event to handle this is enqueued, making the second down event believe
it needs to do something. This creates a constant link up and down
event cycle.

This patch fixes that by setting the p_slot->state only when the work to
handle the power event is executing, protected by the p_slot->hotplug_lock.

To: Bjorn Helgass <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
Cc: Keith Busch <keith.busch@intel.com>

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/hotplug/pciehp_ctrl.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Ashok Raj Jan. 11, 2017, 7:04 p.m. UTC | #1
Hi Keith

did you have a chance to look at this?

Cheers,
Ashok

On Fri, Dec 09, 2016 at 01:06:04PM -0800, Ashok Raj wrote:
> Changes from v1:
> 	Address comments from Bjorn:
> 		Added p_slot->lock mutex around changes to p_slot->state
> 		Updated commit message to call out mutex names
> 
> A surprise link down may retrain very quickly, causing the same slot to
> generate a link up event before handling the link down completes.
> 
> Since the link is active, the power off work queued from the first link
> down will cause a second down event when the power is disabled. The second
> down event should be ignored because the slot is already powering off;
> however, the "link up" event sets the slot state to POWERON before the
> event to handle this is enqueued, making the second down event believe
> it needs to do something. This creates a constant link up and down
> event cycle.
> 
> This patch fixes that by setting the p_slot->state only when the work to
> handle the power event is executing, protected by the p_slot->hotplug_lock.
> 
> To: Bjorn Helgass <bhelgaas@google.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: Keith Busch <keith.busch@intel.com>
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Reviewed-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/hotplug/pciehp_ctrl.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index ec0b4c1..4cf4772 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -182,6 +182,9 @@ static void pciehp_power_thread(struct work_struct *work)
>  	switch (info->req) {
>  	case DISABLE_REQ:
>  		mutex_lock(&p_slot->hotplug_lock);
> +		mutex_lock(&p_slot->lock);
> +		p_slot->state = POWEROFF_STATE;
> +		mutex_unlock(&p_slot->lock);
>  		pciehp_disable_slot(p_slot);
>  		mutex_unlock(&p_slot->hotplug_lock);
>  		mutex_lock(&p_slot->lock);
> @@ -190,6 +193,9 @@ static void pciehp_power_thread(struct work_struct *work)
>  		break;
>  	case ENABLE_REQ:
>  		mutex_lock(&p_slot->hotplug_lock);
> +		mutex_lock(&p_slot->lock);
> +		p_slot->state = POWERON_STATE;
> +		mutex_unlock(&p_slot->lock);
>  		ret = pciehp_enable_slot(p_slot);
>  		mutex_unlock(&p_slot->hotplug_lock);
>  		if (ret)
> @@ -209,8 +215,6 @@ static void pciehp_queue_power_work(struct slot *p_slot, int req)
>  {
>  	struct power_work_info *info;
>  
> -	p_slot->state = (req == ENABLE_REQ) ? POWERON_STATE : POWEROFF_STATE;
> -
>  	info = kmalloc(sizeof(*info), GFP_KERNEL);
>  	if (!info) {
>  		ctrl_err(p_slot->ctrl, "no memory to queue %s request\n",
> -- 
> 2.7.4
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ashok Raj Jan. 17, 2017, 7:15 p.m. UTC | #2
Hi Bjorn

Sorry to bug you, didn't hear from you after i added the lock for consistency
to address the feedback.

Let me know if there is anymore changes you like to see.

Cheers,
Ashok

On Fri, Dec 09, 2016 at 01:06:04PM -0800, Ashok Raj wrote:
> Changes from v1:
> 	Address comments from Bjorn:
> 		Added p_slot->lock mutex around changes to p_slot->state
> 		Updated commit message to call out mutex names
> 
> A surprise link down may retrain very quickly, causing the same slot to
> generate a link up event before handling the link down completes.
> 
> Since the link is active, the power off work queued from the first link
> down will cause a second down event when the power is disabled. The second
> down event should be ignored because the slot is already powering off;
> however, the "link up" event sets the slot state to POWERON before the
> event to handle this is enqueued, making the second down event believe
> it needs to do something. This creates a constant link up and down
> event cycle.
> 
> This patch fixes that by setting the p_slot->state only when the work to
> handle the power event is executing, protected by the p_slot->hotplug_lock.
> 
> To: Bjorn Helgass <bhelgaas@google.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: Keith Busch <keith.busch@intel.com>
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Reviewed-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/hotplug/pciehp_ctrl.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index ec0b4c1..4cf4772 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -182,6 +182,9 @@ static void pciehp_power_thread(struct work_struct *work)
>  	switch (info->req) {
>  	case DISABLE_REQ:
>  		mutex_lock(&p_slot->hotplug_lock);
> +		mutex_lock(&p_slot->lock);
> +		p_slot->state = POWEROFF_STATE;
> +		mutex_unlock(&p_slot->lock);
>  		pciehp_disable_slot(p_slot);
>  		mutex_unlock(&p_slot->hotplug_lock);
>  		mutex_lock(&p_slot->lock);
> @@ -190,6 +193,9 @@ static void pciehp_power_thread(struct work_struct *work)
>  		break;
>  	case ENABLE_REQ:
>  		mutex_lock(&p_slot->hotplug_lock);
> +		mutex_lock(&p_slot->lock);
> +		p_slot->state = POWERON_STATE;
> +		mutex_unlock(&p_slot->lock);
>  		ret = pciehp_enable_slot(p_slot);
>  		mutex_unlock(&p_slot->hotplug_lock);
>  		if (ret)
> @@ -209,8 +215,6 @@ static void pciehp_queue_power_work(struct slot *p_slot, int req)
>  {
>  	struct power_work_info *info;
>  
> -	p_slot->state = (req == ENABLE_REQ) ? POWERON_STATE : POWEROFF_STATE;
> -
>  	info = kmalloc(sizeof(*info), GFP_KERNEL);
>  	if (!info) {
>  		ctrl_err(p_slot->ctrl, "no memory to queue %s request\n",
> -- 
> 2.7.4
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Keith Busch Jan. 18, 2017, 6:47 p.m. UTC | #3
Hi Bjorn,

This fix looks good to me as well now. Any other concerns before staging
this one for inclusion?

Thanks,
Keith

On Tue, Jan 17, 2017 at 11:15:40AM -0800, Raj, Ashok wrote:
> Hi Bjorn
> 
> Sorry to bug you, didn't hear from you after i added the lock for consistency
> to address the feedback.
> 
> Let me know if there is anymore changes you like to see.
> 
> Cheers,
> Ashok
> 
> On Fri, Dec 09, 2016 at 01:06:04PM -0800, Ashok Raj wrote:
> > Changes from v1:
> > 	Address comments from Bjorn:
> > 		Added p_slot->lock mutex around changes to p_slot->state
> > 		Updated commit message to call out mutex names
> > 
> > A surprise link down may retrain very quickly, causing the same slot to
> > generate a link up event before handling the link down completes.
> > 
> > Since the link is active, the power off work queued from the first link
> > down will cause a second down event when the power is disabled. The second
> > down event should be ignored because the slot is already powering off;
> > however, the "link up" event sets the slot state to POWERON before the
> > event to handle this is enqueued, making the second down event believe
> > it needs to do something. This creates a constant link up and down
> > event cycle.
> > 
> > This patch fixes that by setting the p_slot->state only when the work to
> > handle the power event is executing, protected by the p_slot->hotplug_lock.
> > 
> > To: Bjorn Helgass <bhelgaas@google.com>
> > Cc: linux-kernel@vger.kernel.org
> > Cc: Keith Busch <keith.busch@intel.com>
> > 
> > Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> > Reviewed-by: Keith Busch <keith.busch@intel.com>
> > ---
> >  drivers/pci/hotplug/pciehp_ctrl.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> > index ec0b4c1..4cf4772 100644
> > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > @@ -182,6 +182,9 @@ static void pciehp_power_thread(struct work_struct *work)
> >  	switch (info->req) {
> >  	case DISABLE_REQ:
> >  		mutex_lock(&p_slot->hotplug_lock);
> > +		mutex_lock(&p_slot->lock);
> > +		p_slot->state = POWEROFF_STATE;
> > +		mutex_unlock(&p_slot->lock);
> >  		pciehp_disable_slot(p_slot);
> >  		mutex_unlock(&p_slot->hotplug_lock);
> >  		mutex_lock(&p_slot->lock);
> > @@ -190,6 +193,9 @@ static void pciehp_power_thread(struct work_struct *work)
> >  		break;
> >  	case ENABLE_REQ:
> >  		mutex_lock(&p_slot->hotplug_lock);
> > +		mutex_lock(&p_slot->lock);
> > +		p_slot->state = POWERON_STATE;
> > +		mutex_unlock(&p_slot->lock);
> >  		ret = pciehp_enable_slot(p_slot);
> >  		mutex_unlock(&p_slot->hotplug_lock);
> >  		if (ret)
> > @@ -209,8 +215,6 @@ static void pciehp_queue_power_work(struct slot *p_slot, int req)
> >  {
> >  	struct power_work_info *info;
> >  
> > -	p_slot->state = (req == ENABLE_REQ) ? POWERON_STATE : POWEROFF_STATE;
> > -
> >  	info = kmalloc(sizeof(*info), GFP_KERNEL);
> >  	if (!info) {
> >  		ctrl_err(p_slot->ctrl, "no memory to queue %s request\n",
> > -- 
> > 2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Jan. 19, 2017, 2:34 p.m. UTC | #4
Sorry for the delay, I'm just trying to clear out some of our defects from
bugzilla and have gotten really behind on the list, so I just haven't
gotten to this yet.

On Wed, Jan 18, 2017 at 01:47:11PM -0500, Keith Busch wrote:
> Hi Bjorn,
> 
> This fix looks good to me as well now. Any other concerns before staging
> this one for inclusion?
> 
> Thanks,
> Keith
> 
> On Tue, Jan 17, 2017 at 11:15:40AM -0800, Raj, Ashok wrote:
> > Hi Bjorn
> > 
> > Sorry to bug you, didn't hear from you after i added the lock for consistency
> > to address the feedback.
> > 
> > Let me know if there is anymore changes you like to see.
> > 
> > Cheers,
> > Ashok
> > 
> > On Fri, Dec 09, 2016 at 01:06:04PM -0800, Ashok Raj wrote:
> > > Changes from v1:
> > > 	Address comments from Bjorn:
> > > 		Added p_slot->lock mutex around changes to p_slot->state
> > > 		Updated commit message to call out mutex names
> > > 
> > > A surprise link down may retrain very quickly, causing the same slot to
> > > generate a link up event before handling the link down completes.
> > > 
> > > Since the link is active, the power off work queued from the first link
> > > down will cause a second down event when the power is disabled. The second
> > > down event should be ignored because the slot is already powering off;
> > > however, the "link up" event sets the slot state to POWERON before the
> > > event to handle this is enqueued, making the second down event believe
> > > it needs to do something. This creates a constant link up and down
> > > event cycle.
> > > 
> > > This patch fixes that by setting the p_slot->state only when the work to
> > > handle the power event is executing, protected by the p_slot->hotplug_lock.
> > > 
> > > To: Bjorn Helgass <bhelgaas@google.com>
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: Keith Busch <keith.busch@intel.com>
> > > 
> > > Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> > > Reviewed-by: Keith Busch <keith.busch@intel.com>
> > > ---
> > >  drivers/pci/hotplug/pciehp_ctrl.c | 8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> > > index ec0b4c1..4cf4772 100644
> > > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > > @@ -182,6 +182,9 @@ static void pciehp_power_thread(struct work_struct *work)
> > >  	switch (info->req) {
> > >  	case DISABLE_REQ:
> > >  		mutex_lock(&p_slot->hotplug_lock);
> > > +		mutex_lock(&p_slot->lock);
> > > +		p_slot->state = POWEROFF_STATE;
> > > +		mutex_unlock(&p_slot->lock);
> > >  		pciehp_disable_slot(p_slot);
> > >  		mutex_unlock(&p_slot->hotplug_lock);
> > >  		mutex_lock(&p_slot->lock);
> > > @@ -190,6 +193,9 @@ static void pciehp_power_thread(struct work_struct *work)
> > >  		break;
> > >  	case ENABLE_REQ:
> > >  		mutex_lock(&p_slot->hotplug_lock);
> > > +		mutex_lock(&p_slot->lock);
> > > +		p_slot->state = POWERON_STATE;
> > > +		mutex_unlock(&p_slot->lock);
> > >  		ret = pciehp_enable_slot(p_slot);
> > >  		mutex_unlock(&p_slot->hotplug_lock);
> > >  		if (ret)
> > > @@ -209,8 +215,6 @@ static void pciehp_queue_power_work(struct slot *p_slot, int req)
> > >  {
> > >  	struct power_work_info *info;
> > >  
> > > -	p_slot->state = (req == ENABLE_REQ) ? POWERON_STATE : POWEROFF_STATE;
> > > -
> > >  	info = kmalloc(sizeof(*info), GFP_KERNEL);
> > >  	if (!info) {
> > >  		ctrl_err(p_slot->ctrl, "no memory to queue %s request\n",
> > > -- 
> > > 2.7.4
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Feb. 3, 2017, 2:59 a.m. UTC | #5
Hi Ashok,

Sorry it took me so long to review this.  I never felt like I really
understood it, and it took me a long time to try to figure out a more
useful response.

On Fri, Dec 09, 2016 at 01:06:04PM -0800, Ashok Raj wrote:
> Changes from v1:
> 	Address comments from Bjorn:
> 		Added p_slot->lock mutex around changes to p_slot->state
> 		Updated commit message to call out mutex names
> 
> A surprise link down may retrain very quickly, causing the same slot to
> generate a link up event before handling the link down completes.
> 
> Since the link is active, the power off work queued from the first link
> down will cause a second down event when the power is disabled. The second
> down event should be ignored because the slot is already powering off;
> however, the "link up" event sets the slot state to POWERON before the
> event to handle this is enqueued, making the second down event believe
> it needs to do something. This creates a constant link up and down
> event cycle.
> 
> This patch fixes that by setting the p_slot->state only when the work to
> handle the power event is executing, protected by the p_slot->hotplug_lock.

What I don't like about this patch is that I can't figure out what
we're fixing from the patch.  That's not your fault; it's just a
symptom of the convoluted logic in pciehp.  But if we can fix the
problem by simplifying that logic, I'd rather do that than patch up
the holes.

So let me first try to understand what's going on with the current
code.  In the normal case where a device is removed or turned off and
pciehp can complete everything before another device appears, I think
the flow is like this:

      p_slot->state == STATIC_STATE (powered on, link up)

                        <-- surprise link down interrupt
      pciehp_isr()
        queue INT_LINK_DOWN work

      interrupt_event_handler(INT_LINK_DOWN)
        set p_slot->state = POWEROFF_STATE
        queue DISABLE_REQ work

      pciehp_power_thread(DISABLE_REQ)
        send PCI_EXP_SLTCTL_PWR_OFF command
        wait for power-off to complete
        set p_slot->state = STATIC_STATE

      p_slot->state == STATIC_STATE (powered off)

In the problem case, the link goes down, and while pciehp is still
dealing with that, the link comes back up.  So I think one possible
sequence is like this:

      p_slot->state == STATIC_STATE (powered on, link up)

                        <-- surprise link down interrupt
  1a  pciehp_isr()
        queue INT_LINK_DOWN work                     # queued: 1-LD

  1b  interrupt_event_handler(INT_LINK_DOWN)         # process 1-LD
        # handle_link_event() sees case STATIC_STATE
        set p_slot->state = POWEROFF_STATE
        queue DISABLE_REQ work                       # queued: 1-DR

                        <-- surprise link up interrupt
  2a  pciehp_isr()
        queue INT_LINK_UP work                       # queued: 1-DR 2-LU

  1c  pciehp_power_thread(DISABLE_REQ)               # process 1-DR
        send PCI_EXP_SLTCTL_PWR_OFF command
        wait for power-off to complete
        set p_slot->state = STATIC_STATE

                        <-- link down interrupt (result of PWR_OFF)
  3a  pciehp_isr()
        queue INT_LINK_DOWN work                     # queued: 2-LU 3-LD

  2b  interrupt_event_handler(INT_LINK_UP)           # process 2-LU
        # handle_link_event() sees case STATIC_STATE
        set p_slot->state = POWERON_STATE
        queue ENABLE_REQ work                        # queued: 3-LD 2-ER

  3b  interrupt_event_handler(INT_LINK_DOWN)         # process 3-LD
        # handle_link_event() sees case POWERON_STATE, so we emit
        # "Link Down event queued; currently getting powered on"
        set p_slot->state = POWEROFF_STATE
        queue DISABLE_REQ work                       # queued: 2-ER 3-DR

  2c  pciehp_power_thread(ENABLE_REQ)                # process 2-ER
        send PCI_EXP_SLTCTL_PWR_ON command
        wait for power-on to complete
        set p_slot->state = STATIC_STATE

                        <-- link up interrupt (result of PWR_ON)
  4a  pciehp_isr()
        queue INT_LINK_UP work                       # queued: 3-DR 4-LU

  3c  pciehp_power_thread(DISABLE_REQ)               # process 3-DR
        send PCI_EXP_SLTCTL_PWR_OFF command
        wait for power-off to complete
        set p_slot->state = STATIC_STATE

                        <-- link down interrupt (result of PWR_OFF)
  5a  pciehp_isr()
        queue INT_LINK_DOWN work                     # queued: 4-LU 5-LD

State 5a is the same as 3a (we're in STATIC_STATE with Link Up and
Link Down work items queued), so the whole cycle can repeat.

Now let's assume we apply this patch and see what changes.  The patch
changes where we set p_slot->state.  Currently we set POWEROFF_STATE
or POWERON_STATE in the interrupt_event_handler() work item.  The
patch moves that to the pciehp_power_thread() work item, where the
power commands are actually sent.

      p_slot->state == STATIC_STATE (powered on, link up)

                        <-- surprise link down interrupt
  1A  pciehp_isr()
        queue INT_LINK_DOWN work                     # queued: 1-LD

  1B  interrupt_event_handler(INT_LINK_DOWN)         # process 1-LD
        # handle_link_event() sees case STATIC_STATE
        # set p_slot->state = POWEROFF_STATE         # (removed by patch)
        queue DISABLE_REQ work                       # queued: 1-DR

                        <-- surprise link up interrupt
  2A  pciehp_isr()
        queue INT_LINK_UP work                       # queued: 1-DR 2-LU

  1C  pciehp_power_thread(DISABLE_REQ)               # process 1-DR
        set p_slot->state = POWEROFF_STATE           # (added by patch)
        send PCI_EXP_SLTCTL_PWR_OFF command
        wait for power-off to complete
        set p_slot->state = STATIC_STATE

                        <-- link down interrupt (result of PWR_OFF)
  3A  pciehp_isr()
        queue INT_LINK_DOWN work                     # queued: 2-LU 3-LD

  2B  interrupt_event_handler(INT_LINK_UP)           # process 2-LU
        # handle_link_event() sees case STATIC_STATE
        # set p_slot->state = POWERON_STATE          # (removed by patch)
        queue ENABLE_REQ work                        # queued: 3-LD 2-ER

  3B  interrupt_event_handler(INT_LINK_DOWN)         # process 3-LD
        # handle_link_event() sees case STATIC_STATE,
        # unlike 3b above, which saw POWERON_STATE;
        # doesn't emit a message, but still queues DISABLE_REQ
        # set p_slot->state = POWEROFF_STATE         # (removed by patch)
        queue DISABLE_REQ work                       # queued: 2-ER 3-DR

  2C  pciehp_power_thread(ENABLE_REQ)                # process 2-ER
        set p_slot->state = POWERON_STATE            # (added by patch)
        send PCI_EXP_SLTCTL_PWR_ON command
        wait for power-on to complete
        set p_slot->state = STATIC_STATE

                        <-- link up interrupt (result of PWR_ON)
  4A  pciehp_isr()
        queue INT_LINK_UP work                       # queued: 3-DR 4-LU

  3C  pciehp_power_thread(DISABLE_REQ)               # process 3-DR
        set p_slot->state = POWEROFF_STATE           # (added by patch)
        send PCI_EXP_SLTCTL_PWR_OFF command
        wait for power-off to complete
        set p_slot->state = STATIC_STATE

                        <-- link down interrupt (result of PWR_OFF)
  5A  pciehp_isr()
        queue INT_LINK_DOWN work                     # queued: 4-LU 5-LD

With this particular ordering, I think we still have the same problem:
5A is the same as 3A, so I think the cycle could repeat.

Obviously many other orderings are possible.  Since the patch fixes
your system, I suspect you're seeing an ordering where at 3B,
handle_link_event() sees POWEROFF_STATE.  In that case, you probably
see the "Link Down event ignored" message, but we don't queue the
DISABLE_REQ work, and there is no cycle.

This all makes me a little bleary-eyed, so maybe I'm missing
something, but it looks to me like we could still hit the same problem
even with this patch.

I really think the only way to make this all intelligible and reliable
is to rework this to use ordered workqueues so we only have one thing
at a time going on.  That wouldn't be a panacea, but I think it would
make things a lot simpler.

Bjorn

> To: Bjorn Helgass <bhelgaas@google.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: Keith Busch <keith.busch@intel.com>
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Reviewed-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/hotplug/pciehp_ctrl.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index ec0b4c1..4cf4772 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -182,6 +182,9 @@ static void pciehp_power_thread(struct work_struct *work)
>  	switch (info->req) {
>  	case DISABLE_REQ:
>  		mutex_lock(&p_slot->hotplug_lock);
> +		mutex_lock(&p_slot->lock);
> +		p_slot->state = POWEROFF_STATE;
> +		mutex_unlock(&p_slot->lock);
>  		pciehp_disable_slot(p_slot);
>  		mutex_unlock(&p_slot->hotplug_lock);
>  		mutex_lock(&p_slot->lock);
> @@ -190,6 +193,9 @@ static void pciehp_power_thread(struct work_struct *work)
>  		break;
>  	case ENABLE_REQ:
>  		mutex_lock(&p_slot->hotplug_lock);
> +		mutex_lock(&p_slot->lock);
> +		p_slot->state = POWERON_STATE;
> +		mutex_unlock(&p_slot->lock);
>  		ret = pciehp_enable_slot(p_slot);
>  		mutex_unlock(&p_slot->hotplug_lock);
>  		if (ret)
> @@ -209,8 +215,6 @@ static void pciehp_queue_power_work(struct slot *p_slot, int req)
>  {
>  	struct power_work_info *info;
>  
> -	p_slot->state = (req == ENABLE_REQ) ? POWERON_STATE : POWEROFF_STATE;
> -
>  	info = kmalloc(sizeof(*info), GFP_KERNEL);
>  	if (!info) {
>  		ctrl_err(p_slot->ctrl, "no memory to queue %s request\n",
> -- 
> 2.7.4
>
Ashok Raj Feb. 3, 2017, 6 a.m. UTC | #6
Hi Bjorn

On Thu, Feb 02, 2017 at 08:59:01PM -0600, Bjorn Helgaas wrote:
> Hi Ashok,
> 
> Sorry it took me so long to review this.  I never felt like I really
> understood it, and it took me a long time to try to figure out a more
> useful response.

No worries. Agree its a litte tricky, and took me several iterations before
doing someting that was simple enough, without a complete overhaul of
state management. 

Thanks a ton for capturing the sequence, I did capture
some debug output along at that time. My apologies for not adding it
along. But this becomes excellant notes and perhaps would be good to 
capture in commit or in the documentation. Going through this isn't fun :-)


Responses below:
> > 
> > This patch fixes that by setting the p_slot->state only when the work to
> > handle the power event is executing, protected by the p_slot->hotplug_lock.
> 
> So let me first try to understand what's going on with the current
> code.  In the normal case where a device is removed or turned off and
> pciehp can complete everything before another device appears, I think
> the flow is like this:

You got this problem part right. Spot on!
> 
>       p_slot->state == STATIC_STATE (powered on, link up)
> 
>                         <-- surprise link down interrupt
>       pciehp_isr()
>         queue INT_LINK_DOWN work
> 
>       interrupt_event_handler(INT_LINK_DOWN)
>         set p_slot->state = POWEROFF_STATE
>         queue DISABLE_REQ work
> 
>       pciehp_power_thread(DISABLE_REQ)
>         send PCI_EXP_SLTCTL_PWR_OFF command
>         wait for power-off to complete
>         set p_slot->state = STATIC_STATE
> 
>       p_slot->state == STATIC_STATE (powered off)
> 
> In the problem case, the link goes down, and while pciehp is still
> dealing with that, the link comes back up.  So I think one possible
> sequence is like this:
> 
>       p_slot->state == STATIC_STATE (powered on, link up)
> 
>                         <-- surprise link down interrupt
>   1a  pciehp_isr()
>         queue INT_LINK_DOWN work                     # queued: 1-LD
> 
>   1b  interrupt_event_handler(INT_LINK_DOWN)         # process 1-LD
>         # handle_link_event() sees case STATIC_STATE
>         set p_slot->state = POWEROFF_STATE
>         queue DISABLE_REQ work                       # queued: 1-DR
> 
>                         <-- surprise link up interrupt
>   2a  pciehp_isr()
>         queue INT_LINK_UP work                       # queued: 1-DR 2-LU
> 
>   1c  pciehp_power_thread(DISABLE_REQ)               # process 1-DR
>         send PCI_EXP_SLTCTL_PWR_OFF command
>         wait for power-off to complete
>         set p_slot->state = STATIC_STATE
> 
>                         <-- link down interrupt (result of PWR_OFF)
>   3a  pciehp_isr()
>         queue INT_LINK_DOWN work                     # queued: 2-LU 3-LD
> 
>   2b  interrupt_event_handler(INT_LINK_UP)           # process 2-LU
>         # handle_link_event() sees case STATIC_STATE
>         set p_slot->state = POWERON_STATE
>         queue ENABLE_REQ work                        # queued: 3-LD 2-ER
> 
>   3b  interrupt_event_handler(INT_LINK_DOWN)         # process 3-LD
>         # handle_link_event() sees case POWERON_STATE, so we emit
>         # "Link Down event queued; currently getting powered on"
>         set p_slot->state = POWEROFF_STATE
>         queue DISABLE_REQ work                       # queued: 2-ER 3-DR
> 
>   2c  pciehp_power_thread(ENABLE_REQ)                # process 2-ER
>         send PCI_EXP_SLTCTL_PWR_ON command
>         wait for power-on to complete
>         set p_slot->state = STATIC_STATE
> 
>                         <-- link up interrupt (result of PWR_ON)
>   4a  pciehp_isr()
>         queue INT_LINK_UP work                       # queued: 3-DR 4-LU
> 
>   3c  pciehp_power_thread(DISABLE_REQ)               # process 3-DR
>         send PCI_EXP_SLTCTL_PWR_OFF command
>         wait for power-off to complete
>         set p_slot->state = STATIC_STATE
> 
>                         <-- link down interrupt (result of PWR_OFF)
>   5a  pciehp_isr()
>         queue INT_LINK_DOWN work                     # queued: 4-LU 5-LD
> 
> State 5a is the same as 3a (we're in STATIC_STATE with Link Up and
> Link Down work items queued), so the whole cycle can repeat.
> 
> Now let's assume we apply this patch and see what changes.  The patch
> changes where we set p_slot->state.  Currently we set POWEROFF_STATE
> or POWERON_STATE in the interrupt_event_handler() work item.  The
> patch moves that to the pciehp_power_thread() work item, where the
> power commands are actually sent.

Right. The difference with this patch is when we set the state to 
POWERON_STATE or POWEROFF_STATE, we only do that when the previous
POWER* operation has entirely completed. Since now its protected with the
hotplug_lock mutex.

In the problem case, since we set the state before the pciehp_power_thread,
we end up changing the state to POWER*_STATE before the previous POWER*
action has completed.
> 
>       p_slot->state == STATIC_STATE (powered on, link up)
> 
>                         <-- surprise link down interrupt
>   1A  pciehp_isr()
>         queue INT_LINK_DOWN work                     # queued: 1-LD
> 
>   1B  interrupt_event_handler(INT_LINK_DOWN)         # process 1-LD
>         # handle_link_event() sees case STATIC_STATE
>         # set p_slot->state = POWEROFF_STATE         # (removed by patch)
>         queue DISABLE_REQ work                       # queued: 1-DR
> 
>                         <-- surprise link up interrupt
>   2A  pciehp_isr()
>         queue INT_LINK_UP work                       # queued: 1-DR 2-LU
> 
>   1C  pciehp_power_thread(DISABLE_REQ)               # process 1-DR

	Also mutex hotplug_lock is held.

>         set p_slot->state = POWEROFF_STATE           # (added by patch)
>         send PCI_EXP_SLTCTL_PWR_OFF command
>         wait for power-off to complete
>         set p_slot->state = STATIC_STATE
> 
>                         <-- link down interrupt (result of PWR_OFF)
>   3A  pciehp_isr()
>         queue INT_LINK_DOWN work                     # queued: 2-LU 3-LD

The above INT_LINK_DOWN will eventually be ignored in handle_link_event()
because we are in POWEROFF_STATE, and a link down while in POWEROFF will
be ignored.
> 
>   2B  interrupt_event_handler(INT_LINK_UP)           # process 2-LU
>         # handle_link_event() sees case STATIC_STATE
>         # set p_slot->state = POWERON_STATE          # (removed by patch)
>         queue ENABLE_REQ work                        # queued: 3-LD 2-ER
> 
>   3B  interrupt_event_handler(INT_LINK_DOWN)         # process 3-LD
>         # handle_link_event() sees case STATIC_STATE,
>         # unlike 3b above, which saw POWERON_STATE;
>         # doesn't emit a message, but still queues DISABLE_REQ
>         # set p_slot->state = POWEROFF_STATE         # (removed by patch)
>         queue DISABLE_REQ work                       # queued: 2-ER 3-DR

3B will be ignored, since handle_link_event() knows we are in process
of POWEROFF.

> 
>   2C  pciehp_power_thread(ENABLE_REQ)                # process 2-ER

We are also protected by mutex hotplug_lock here. So  the following
wont get executed until step 1C has run to completion and the 
mutex is released.

>         set p_slot->state = POWERON_STATE            # (added by patch)
>         send PCI_EXP_SLTCTL_PWR_ON command
>         wait for power-on to complete
>         set p_slot->state = STATIC_STATE
> 
>                         <-- link up interrupt (result of PWR_ON)
>   4A  pciehp_isr()
>         queue INT_LINK_UP work                       # queued: 3-DR 4-LU

handle_link_event() would eventually dismiss the INT_LINK_UP since
it knows we are in process of POWERON.
> 
>   3C  pciehp_power_thread(DISABLE_REQ)               # process 3-DR
>         set p_slot->state = POWEROFF_STATE           # (added by patch)
>         send PCI_EXP_SLTCTL_PWR_OFF command
>         wait for power-off to complete
>         set p_slot->state = STATIC_STATE
> 
>                         <-- link down interrupt (result of PWR_OFF)
>   5A  pciehp_isr()
>         queue INT_LINK_DOWN work                     # queued: 4-LU 5-LD
> 
> With this particular ordering, I think we still have the same problem:
> 5A is the same as 3A, so I think the cycle could repeat.

I think the sequence is almost right, except the fact since we are protected
by hotplug_lock, we don't allow another POWERON or POWEROFF to be processed
until the previous POWER* operation is completed entirely.

We have run through several experiments with managed power on, power off,
using attention button sequence and surprise link down, surprise remove
in the lab and we haven't run into any other issues.

Just to summarize, we only queue the POWEROFF due to surprise link down
and another POWERON due to link becoming back up. The transient link-down 
events are coveniently ignored.

Hope this helps, and sincere thanks for looking into this in great detail!

Cheers,
Ashok
Bjorn Helgaas Feb. 3, 2017, 4:51 p.m. UTC | #7
On Thu, Feb 02, 2017 at 10:00:53PM -0800, Raj, Ashok wrote:
> Hi Bjorn
> 
> On Thu, Feb 02, 2017 at 08:59:01PM -0600, Bjorn Helgaas wrote:
> > Hi Ashok,
> > 
> > Sorry it took me so long to review this.  I never felt like I really
> > understood it, and it took me a long time to try to figure out a more
> > useful response.
> 
> No worries. Agree its a litte tricky, and took me several iterations before
> doing someting that was simple enough, without a complete overhaul of
> state management. 
> 
> Thanks a ton for capturing the sequence, I did capture
> some debug output along at that time. My apologies for not adding it
> along. But this becomes excellant notes and perhaps would be good to 
> capture in commit or in the documentation. Going through this isn't fun :-)

Maybe you could open a kernel.org bugzilla and attach the dmesg log
and "lspci -vv" output.  Then we could capture some of your logs and
this discussion there and include a pointer in the changelog.

> Responses below:
> > > 
> > > This patch fixes that by setting the p_slot->state only when the work to
> > > handle the power event is executing, protected by the p_slot->hotplug_lock.
> > 
> > So let me first try to understand what's going on with the current
> > code.  In the normal case where a device is removed or turned off and
> > pciehp can complete everything before another device appears, I think
> > the flow is like this:
> 
> You got this problem part right. Spot on!
> > 
> >       p_slot->state == STATIC_STATE (powered on, link up)
> > 
> >                         <-- surprise link down interrupt
> >       pciehp_isr()
> >         queue INT_LINK_DOWN work
> > 
> >       interrupt_event_handler(INT_LINK_DOWN)
> >         set p_slot->state = POWEROFF_STATE
> >         queue DISABLE_REQ work
> > 
> >       pciehp_power_thread(DISABLE_REQ)
> >         send PCI_EXP_SLTCTL_PWR_OFF command
> >         wait for power-off to complete
> >         set p_slot->state = STATIC_STATE
> > 
> >       p_slot->state == STATIC_STATE (powered off)
> > 
> > In the problem case, the link goes down, and while pciehp is still
> > dealing with that, the link comes back up.  So I think one possible
> > sequence is like this:
> > 
> >       p_slot->state == STATIC_STATE (powered on, link up)
> > 
> >                         <-- surprise link down interrupt
> >   1a  pciehp_isr()
> >         queue INT_LINK_DOWN work                     # queued: 1-LD
> > 
> >   1b  interrupt_event_handler(INT_LINK_DOWN)         # process 1-LD
> >         # handle_link_event() sees case STATIC_STATE
> >         set p_slot->state = POWEROFF_STATE
> >         queue DISABLE_REQ work                       # queued: 1-DR
> > 
> >                         <-- surprise link up interrupt
> >   2a  pciehp_isr()
> >         queue INT_LINK_UP work                       # queued: 1-DR 2-LU
> > 
> >   1c  pciehp_power_thread(DISABLE_REQ)               # process 1-DR
> >         send PCI_EXP_SLTCTL_PWR_OFF command
> >         wait for power-off to complete
> >         set p_slot->state = STATIC_STATE
> > 
> >                         <-- link down interrupt (result of PWR_OFF)
> >   3a  pciehp_isr()
> >         queue INT_LINK_DOWN work                     # queued: 2-LU 3-LD
> > 
> >   2b  interrupt_event_handler(INT_LINK_UP)           # process 2-LU
> >         # handle_link_event() sees case STATIC_STATE
> >         set p_slot->state = POWERON_STATE
> >         queue ENABLE_REQ work                        # queued: 3-LD 2-ER
> > 
> >   3b  interrupt_event_handler(INT_LINK_DOWN)         # process 3-LD
> >         # handle_link_event() sees case POWERON_STATE, so we emit
> >         # "Link Down event queued; currently getting powered on"
> >         set p_slot->state = POWEROFF_STATE
> >         queue DISABLE_REQ work                       # queued: 2-ER 3-DR
> > 
> >   2c  pciehp_power_thread(ENABLE_REQ)                # process 2-ER
> >         send PCI_EXP_SLTCTL_PWR_ON command
> >         wait for power-on to complete
> >         set p_slot->state = STATIC_STATE
> > 
> >                         <-- link up interrupt (result of PWR_ON)
> >   4a  pciehp_isr()
> >         queue INT_LINK_UP work                       # queued: 3-DR 4-LU
> > 
> >   3c  pciehp_power_thread(DISABLE_REQ)               # process 3-DR
> >         send PCI_EXP_SLTCTL_PWR_OFF command
> >         wait for power-off to complete
> >         set p_slot->state = STATIC_STATE
> > 
> >                         <-- link down interrupt (result of PWR_OFF)
> >   5a  pciehp_isr()
> >         queue INT_LINK_DOWN work                     # queued: 4-LU 5-LD
> > 
> > State 5a is the same as 3a (we're in STATIC_STATE with Link Up and
> > Link Down work items queued), so the whole cycle can repeat.
> > 
> > Now let's assume we apply this patch and see what changes.  The patch
> > changes where we set p_slot->state.  Currently we set POWEROFF_STATE
> > or POWERON_STATE in the interrupt_event_handler() work item.  The
> > patch moves that to the pciehp_power_thread() work item, where the
> > power commands are actually sent.
> 
> Right. The difference with this patch is when we set the state to 
> POWERON_STATE or POWEROFF_STATE, we only do that when the previous
> POWER* operation has entirely completed. Since now its protected with the
> hotplug_lock mutex.
> 
> In the problem case, since we set the state before the pciehp_power_thread,
> we end up changing the state to POWER*_STATE before the previous POWER*
> action has completed.
> > 
> >       p_slot->state == STATIC_STATE (powered on, link up)
> > 
> >                         <-- surprise link down interrupt
> >   1A  pciehp_isr()
> >         queue INT_LINK_DOWN work                     # queued: 1-LD
> > 
> >   1B  interrupt_event_handler(INT_LINK_DOWN)         # process 1-LD
> >         # handle_link_event() sees case STATIC_STATE
> >         # set p_slot->state = POWEROFF_STATE         # (removed by patch)
> >         queue DISABLE_REQ work                       # queued: 1-DR
> > 
> >                         <-- surprise link up interrupt
> >   2A  pciehp_isr()
> >         queue INT_LINK_UP work                       # queued: 1-DR 2-LU
> > 
> >   1C  pciehp_power_thread(DISABLE_REQ)               # process 1-DR
> 
> 	Also mutex hotplug_lock is held.
> 
> >         set p_slot->state = POWEROFF_STATE           # (added by patch)
> >         send PCI_EXP_SLTCTL_PWR_OFF command
> >         wait for power-off to complete
> >         set p_slot->state = STATIC_STATE
> > 
> >                         <-- link down interrupt (result of PWR_OFF)
> >   3A  pciehp_isr()
> >         queue INT_LINK_DOWN work                     # queued: 2-LU 3-LD
> 
> The above INT_LINK_DOWN will eventually be ignored in handle_link_event()
> because we are in POWEROFF_STATE, and a link down while in POWEROFF will
> be ignored.
> > 
> >   2B  interrupt_event_handler(INT_LINK_UP)           # process 2-LU
> >         # handle_link_event() sees case STATIC_STATE
> >         # set p_slot->state = POWERON_STATE          # (removed by patch)
> >         queue ENABLE_REQ work                        # queued: 3-LD 2-ER
> > 
> >   3B  interrupt_event_handler(INT_LINK_DOWN)         # process 3-LD
> >         # handle_link_event() sees case STATIC_STATE,
> >         # unlike 3b above, which saw POWERON_STATE;
> >         # doesn't emit a message, but still queues DISABLE_REQ
> >         # set p_slot->state = POWEROFF_STATE         # (removed by patch)
> >         queue DISABLE_REQ work                       # queued: 2-ER 3-DR
> 
> 3B will be ignored, since handle_link_event() knows we are in process
> of POWEROFF.

What enforces this ordering?  handle_link_event() will only see
POWEROFF_STATE if it happens to read the state after
pciehp_power_thread() sets POWEROFF_STATE and before it
sets it back to STATIC_STATE.  Given our work item concurrency,
I think that's possible, but I don't see how it's guaranteed.

> >   2C  pciehp_power_thread(ENABLE_REQ)                # process 2-ER
> 
> We are also protected by mutex hotplug_lock here. So  the following
> wont get executed until step 1C has run to completion and the 
> mutex is released.
> 
> >         set p_slot->state = POWERON_STATE            # (added by patch)
> >         send PCI_EXP_SLTCTL_PWR_ON command
> >         wait for power-on to complete
> >         set p_slot->state = STATIC_STATE
> > 
> >                         <-- link up interrupt (result of PWR_ON)
> >   4A  pciehp_isr()
> >         queue INT_LINK_UP work                       # queued: 3-DR 4-LU
> 
> handle_link_event() would eventually dismiss the INT_LINK_UP since
> it knows we are in process of POWERON.
> > 
> >   3C  pciehp_power_thread(DISABLE_REQ)               # process 3-DR
> >         set p_slot->state = POWEROFF_STATE           # (added by patch)
> >         send PCI_EXP_SLTCTL_PWR_OFF command
> >         wait for power-off to complete
> >         set p_slot->state = STATIC_STATE
> > 
> >                         <-- link down interrupt (result of PWR_OFF)
> >   5A  pciehp_isr()
> >         queue INT_LINK_DOWN work                     # queued: 4-LU 5-LD
> > 
> > With this particular ordering, I think we still have the same problem:
> > 5A is the same as 3A, so I think the cycle could repeat.
> 
> I think the sequence is almost right, except the fact since we are protected
> by hotplug_lock, we don't allow another POWERON or POWEROFF to be processed
> until the previous POWER* operation is completed entirely.

handle_link_event() is protected by "lock" but not by "hotplug_lock",
so I think it can queue ENABLE/DISABLE items even before the previous
POWER* operation completes.

You're right that I omitted the hotplug_lock details.  I added them to
my outline (at https://goo.gl/szqWTC if you're interested), but I
don't see how that prevents the scenario above.

> Just to summarize, we only queue the POWEROFF due to surprise link down
> and another POWERON due to link becoming back up. The transient link-down 
> events are coveniently ignored.

I'm leery about ignoring events, though it happens to be convenient in
this case.  I think we're ignoring them because we're running work
items simultaneously with other items, and I think that concurrency is
unnecessary complexity.

I think it would be safer to queue every event and process every event
serially.
Bjorn Helgaas March 7, 2017, 12:24 a.m. UTC | #8
On Fri, Feb 03, 2017 at 10:51:04AM -0600, Bjorn Helgaas wrote:
> On Thu, Feb 02, 2017 at 10:00:53PM -0800, Raj, Ashok wrote:
> > ...
> > Just to summarize, we only queue the POWEROFF due to surprise link down
> > and another POWERON due to link becoming back up. The transient link-down 
> > events are coveniently ignored.
> 
> I'm leery about ignoring events, though it happens to be convenient in
> this case.  I think we're ignoring them because we're running work
> items simultaneously with other items, and I think that concurrency is
> unnecessary complexity.
> 
> I think it would be safer to queue every event and process every event
> serially.

Hi Ashok,

Just a ping to make sure we're not deadlocked.  I'm waiting for you,
so I hope you're not also waiting for me :)  I'm not trying to rush you;
I just don't want to drop this by mistake.

Bjorn
Ashok Raj March 8, 2017, 12:27 p.m. UTC | #9
On Mon, Mar 06, 2017 at 06:24:17PM -0600, Bjorn Helgaas wrote:
> On Fri, Feb 03, 2017 at 10:51:04AM -0600, Bjorn Helgaas wrote:
> 
> Hi Ashok,
> 
> Just a ping to make sure we're not deadlocked.  I'm waiting for you,
> so I hope you're not also waiting for me :)  I'm not trying to rush you;
> I just don't want to drop this by mistake.
> 
Hi Bjorn

no we aren't deadlocked :-). I didn't get around changing it to ordered
queue yet, mostly worried about having to retest all the different 
combinations with ATTN, POWER_CTL, SLD.

I'm depending on other folks to test SLD. They are tied up with other
issues ATM.

I have had another OEM test with several disks and multiple ATTN's 
pressed/cancel and current code seems to be working well so far, except the 
SLD case.

The change in the patch was only ensuring that we don't start another 
POWER_ON or POWER_OFF before the earlier operation was complete.

Would it be alright to fix SLD with this version while we can probe a clean
approach that can give us sufficient time to test a clean approach that works
with all the different combinations and OEM systems?

Cheers
Ashok
Bjorn Helgaas March 9, 2017, 2:37 p.m. UTC | #10
On Wed, Mar 08, 2017 at 04:27:26AM -0800, Raj, Ashok wrote:
> On Mon, Mar 06, 2017 at 06:24:17PM -0600, Bjorn Helgaas wrote:
> > On Fri, Feb 03, 2017 at 10:51:04AM -0600, Bjorn Helgaas wrote:
> > 
> > Hi Ashok,
> > 
> > Just a ping to make sure we're not deadlocked.  I'm waiting for you,
> > so I hope you're not also waiting for me :)  I'm not trying to rush you;
> > I just don't want to drop this by mistake.
> > 
> Hi Bjorn
> 
> no we aren't deadlocked :-). I didn't get around changing it to ordered
> queue yet, mostly worried about having to retest all the different 
> combinations with ATTN, POWER_CTL, SLD.
> 
> I'm depending on other folks to test SLD. They are tied up with other
> issues ATM.
> 
> I have had another OEM test with several disks and multiple ATTN's 
> pressed/cancel and current code seems to be working well so far, except the 
> SLD case.
> 
> The change in the patch was only ensuring that we don't start another 
> POWER_ON or POWER_OFF before the earlier operation was complete.
> 
> Would it be alright to fix SLD with this version while we can probe a clean
> approach that can give us sufficient time to test a clean approach that works
> with all the different combinations and OEM systems?

This version avoids some SLD issues, but we can't guarantee that it is
a complete solution.

I don't really like to put in an incomplete solution because it
reduces the urgency for doing a proper fix, and it also complicates
debugging if we trip over an SLD issue we haven't seen yet during
testing.

Bjorn
diff mbox

Patch

diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index ec0b4c1..4cf4772 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -182,6 +182,9 @@  static void pciehp_power_thread(struct work_struct *work)
 	switch (info->req) {
 	case DISABLE_REQ:
 		mutex_lock(&p_slot->hotplug_lock);
+		mutex_lock(&p_slot->lock);
+		p_slot->state = POWEROFF_STATE;
+		mutex_unlock(&p_slot->lock);
 		pciehp_disable_slot(p_slot);
 		mutex_unlock(&p_slot->hotplug_lock);
 		mutex_lock(&p_slot->lock);
@@ -190,6 +193,9 @@  static void pciehp_power_thread(struct work_struct *work)
 		break;
 	case ENABLE_REQ:
 		mutex_lock(&p_slot->hotplug_lock);
+		mutex_lock(&p_slot->lock);
+		p_slot->state = POWERON_STATE;
+		mutex_unlock(&p_slot->lock);
 		ret = pciehp_enable_slot(p_slot);
 		mutex_unlock(&p_slot->hotplug_lock);
 		if (ret)
@@ -209,8 +215,6 @@  static void pciehp_queue_power_work(struct slot *p_slot, int req)
 {
 	struct power_work_info *info;
 
-	p_slot->state = (req == ENABLE_REQ) ? POWERON_STATE : POWEROFF_STATE;
-
 	info = kmalloc(sizeof(*info), GFP_KERNEL);
 	if (!info) {
 		ctrl_err(p_slot->ctrl, "no memory to queue %s request\n",