scsi: lpfc: Fix potential deadlock on &phba->hbalock

Message ID 20230726155342.51623-1-dg573847474@gmail.com (mailing list archive)
State Rejected
Series scsi: lpfc: Fix potential deadlock on &phba->hbalock

Commit Message

Chengfeng Ye July 26, 2023, 3:53 p.m. UTC
As &phba->hbalock is acquired in hardirq context, for example by
lpfc_sli_intr_handler(), process context code that acquires
&phba->hbalock should disable irqs. Otherwise a deadlock could happen if
the irq preempts execution while the lock is held in process context on
the same CPU.

Most lock acquisition sites disable irqs, but inside the callback
lpfc_cmpl_els_uvem() the lock is acquired without explicitly disabling
irqs. The callers of this callback do not seem to disable irqs either.

[Deadlock Scenario]
lpfc_cmpl_els_uvem()
    -> spin_lock(&phba->hbalock)
        <irq>
        -> lpfc_sli_intr_handler()
        -> spin_lock(&phba->hbalock); (deadlock here)

This flaw was found by an experimental static analysis tool I am
developing for irq-related deadlock.

The patch fixes the potential deadlock by using spin_lock_irqsave(), just
like the other call sites.
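
The scenario and the proposed fix can be pictured with a toy user-space
model of a single CPU. This is only a sketch: every name below (toy_cpu,
toy_isr, toy_critical_section, and so on) is hypothetical and none of it is
kernel or lpfc code. The "deadlock" is counted rather than spun on, so the
model terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; these are NOT kernel APIs. */
struct toy_cpu {
    bool lock_held;     /* stands in for phba->hbalock being held */
    bool irqs_enabled;  /* local interrupt-enable flag */
    bool irq_pending;   /* interrupt raised while irqs were disabled */
    int  handler_runs;  /* handler invocations that completed */
    int  deadlocks;     /* handler found the lock held on this CPU */
};

/* Interrupt handler that needs the same lock as the interrupted code. */
static void toy_isr(struct toy_cpu *cpu)
{
    if (cpu->lock_held) {
        /*
         * A real spinlock would spin forever here: the lock holder is
         * the interrupted context, which cannot resume until the
         * handler returns. The model only records the event.
         */
        cpu->deadlocks++;
        return;
    }
    cpu->lock_held = true;
    cpu->handler_runs++;
    cpu->lock_held = false;
}

/* Deliver the interrupt now if enabled, otherwise leave it pending. */
static void toy_raise_irq(struct toy_cpu *cpu)
{
    if (cpu->irqs_enabled)
        toy_isr(cpu);
    else
        cpu->irq_pending = true;
}

/* Restore the saved interrupt state and deliver a pending interrupt. */
static void toy_restore_irqs(struct toy_cpu *cpu, bool saved)
{
    cpu->irqs_enabled = saved;
    if (cpu->irqs_enabled && cpu->irq_pending) {
        cpu->irq_pending = false;
        toy_isr(cpu);
    }
}

/*
 * Critical section with an interrupt arriving in the middle.
 * disable_irqs == false models spin_lock()/spin_unlock();
 * disable_irqs == true models spin_lock_irqsave()/spin_unlock_irqrestore().
 */
void toy_critical_section(struct toy_cpu *cpu, bool disable_irqs)
{
    bool saved = cpu->irqs_enabled;

    if (disable_irqs)
        cpu->irqs_enabled = false;
    cpu->lock_held = true;

    toy_raise_irq(cpu);          /* interrupt fires while the lock is held */

    assert(cpu->lock_held);      /* still inside the critical section */
    cpu->lock_held = false;
    toy_restore_irqs(cpu, saved);
}
```

Starting from a toy_cpu with irqs_enabled set, the plain variant records one
deadlock and no completed handler run, while the irq-disabling variant
defers the handler until after the unlock and records none. The model says
nothing about whether the two code paths can actually meet on real hardware.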

Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
---
 drivers/scsi/lpfc/lpfc_els.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Justin Tee July 26, 2023, 10:16 p.m. UTC | #1
Hi Chengfeng,

lpfc_cmpl_els_uvem is for the VMID feature that could only ever be
called on an SLI4 type HBA.
lpfc_sli_intr_handler can only ever be called on an SLI3 type HBA.

So, the deadlock being referred to can never happen.
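
Put differently, a lock/irq report is only actionable when both code paths
can run on the same type of HBA. The sketch below shows one way such a
feasibility filter might look. It is hypothetical: the toy_lock_site
structure, the TOY_SLI3/TOY_SLI4 masks and toy_can_deadlock() are
illustration only and are not taken from the lpfc driver or from the
analysis tool mentioned in the patch.

```c
#include <stdbool.h>

/* Hypothetical bitmask of SLI revisions on which a function can run. */
enum toy_sli_mask {
    TOY_SLI3 = 1u << 0,
    TOY_SLI4 = 1u << 1,
};

/* One place where the lock is taken, as a checker might record it. */
struct toy_lock_site {
    const char  *func;           /* function that takes the lock */
    unsigned int sli_mask;       /* revisions on which it is reachable */
    bool         irq_context;    /* true for a hardirq handler */
    bool         irqs_disabled;  /* true if taken with an irq-disabling variant */
};

/*
 * Report a possible self-deadlock only if:
 *  - 'isr' runs in hardirq context and 'path' does not,
 *  - 'path' takes the lock with irqs still enabled, and
 *  - both can execute on at least one common SLI revision.
 */
bool toy_can_deadlock(const struct toy_lock_site *path,
                      const struct toy_lock_site *isr)
{
    if (!isr->irq_context || path->irq_context)
        return false;
    if (path->irqs_disabled)
        return false;
    return (path->sli_mask & isr->sli_mask) != 0;
}
```

With lpfc_cmpl_els_uvem recorded as TOY_SLI4 only and lpfc_sli_intr_handler
as TOY_SLI3 only, the masks do not intersect and the pair is filtered out,
which is the point made above.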

Thanks,
Justin

Chengfeng Ye July 27, 2023, 5:40 a.m. UTC | #2
Hi Justin,

Thanks very much for the reply. It was negligent of me not to have noticed
this, and I am sorry for that.

I inspected my tool's bug report again and found that
lpfc_sli4_intr_handler() also acquires that lock.

lpfc_sli4_intr_handler()
    -> lpfc_sli4_hba_intr_handler()
    -> spin_lock_irqsave(&phba->hbalock, iflag);

It seems this ISR is called on an SLI4 type HBA. Taking this one into
account, could it be a deadlock problem?

Thanks again,
Chengfeng
Justin Tee July 27, 2023, 5:53 p.m. UTC | #3
Hi Chengfeng,

That’s still an unlikely scenario:

        /* Check device state for handling interrupt */
        if (unlikely(lpfc_intr_state_check(phba))) {
                /* Check again for link_state with lock held */
                spin_lock_irqsave(&phba->hbalock, iflag);
                if (phba->link_state < LPFC_LINK_DOWN)
                        /* Flush, clear interrupt, and rearm the EQ */
                        lpfc_sli4_eqcq_flush(phba, fpeq);
                spin_unlock_irqrestore(&phba->hbalock, iflag);
                return IRQ_NONE;
        }

In order to enter that if statement and obtain the hbalock, the PCI
channel has to be offline or the HBA’s link has to be in an uninitialized
state.  If either of those were true, lpfc_cmpl_els_uvem would never get
called to begin with.
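
The argument can be written down as two mutually exclusive predicates over
the device state. The sketch below is a toy model, not driver code: the
toy_hba structure, the toy link-state values and all function names are
hypothetical. The second predicate encodes the claim made above (the
completion only runs on a healthy, initialized device) as an assumption; the
model does not prove it.

```c
#include <stdbool.h>

/* Placeholder link states; only their ordering matters in this model. */
enum toy_link_state {
    TOY_LINK_INIT = 0,   /* below "link down": not yet initialized */
    TOY_LINK_DOWN = 1,
    TOY_LINK_UP   = 2,
};

struct toy_hba {
    bool                pci_offline;
    enum toy_link_state link_state;
};

/* Models the condition under which the handler takes the early-exit path. */
bool toy_intr_state_bad(const struct toy_hba *h)
{
    return h->pci_offline || h->link_state < TOY_LINK_DOWN;
}

/* In the quoted snippet, hbalock is taken only on that early-exit path. */
bool toy_isr_takes_hbalock(const struct toy_hba *h)
{
    return toy_intr_state_bad(h);
}

/* ASSUMPTION from the thread: the completion never runs in a bad state. */
bool toy_uvem_cmpl_can_run(const struct toy_hba *h)
{
    return !toy_intr_state_bad(h);
}

/* Count device states in which both lock holders could be active. */
int toy_count_overlap(void)
{
    int overlap = 0;

    for (int off = 0; off <= 1; off++) {
        for (int ls = TOY_LINK_INIT; ls <= TOY_LINK_UP; ls++) {
            struct toy_hba h = {
                .pci_offline = (off != 0),
                .link_state  = (enum toy_link_state)ls,
            };

            if (toy_isr_takes_hbalock(&h) && toy_uvem_cmpl_can_run(&h))
                overlap++;
        }
    }
    return overlap;
}
```

Under that assumption toy_count_overlap() finds no state in which both sides
hold the lock, so the whole argument rests on the assumption itself: whether
the completion can really never run while the device is in such a state.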

Thanks,
Justin



Patch

diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index 2bad9954c355..9667b4937b3a 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -12398,6 +12398,7 @@  lpfc_cmpl_els_uvem(struct lpfc_hba *phba, struct lpfc_iocbq *icmdiocb,
 	u32 ulp_word4 = get_job_word4(phba, rspiocb);
 	struct lpfc_dmabuf *dmabuf = icmdiocb->cmd_dmabuf;
 	struct lpfc_vmid *vmid;
+	unsigned long flags;
 
 	vmid = vmid_context->vmp;
 	if (!ndlp || ndlp->nlp_state != NLP_STE_UNMAPPED_NODE)
@@ -12419,11 +12420,11 @@  lpfc_cmpl_els_uvem(struct lpfc_hba *phba, struct lpfc_iocbq *icmdiocb,
 				 ulp_status, ulp_word4);
 		goto out;
 	}
-	spin_lock(&phba->hbalock);
+	spin_lock_irqsave(&phba->hbalock, flags);
 	/* Set IN USE flag */
 	vport->vmid_flag |= LPFC_VMID_IN_USE;
 	phba->pport->vmid_flag |= LPFC_VMID_IN_USE;
-	spin_unlock(&phba->hbalock);
+	spin_unlock_irqrestore(&phba->hbalock, flags);
 
 	if (vmid_context->instantiated) {
 		write_lock(&vport->vmid_lock);