
dmaengine: plx_dma: Fix potential deadlock on &plxdev->ring_lock

Message ID 20230726104827.60382-1-dg573847474@gmail.com (mailing list archive)
State Superseded
Series dmaengine: plx_dma: Fix potential deadlock on &plxdev->ring_lock

Commit Message

Chengfeng Ye July 26, 2023, 10:48 a.m. UTC
As plx_dma_process_desc() is invoked both by the tasklet plx_dma_desc_task()
under softirq context and by the plx_dma_tx_status() callback, which executes
under process context, the acquisition of &plxdev->ring_lock inside
plx_dma_process_desc() should disable irqs; otherwise a deadlock could happen
if the tasklet softirq preempts the process-context code while the lock is
held on the same CPU.

Possible deadlock scenario:
plx_dma_tx_status()
    -> plx_dma_process_desc()
    -> spin_lock(&plxdev->ring_lock)
        <tasklet softirq>
        -> plx_dma_desc_task()
        -> plx_dma_process_desc()
        -> spin_lock(&plxdev->ring_lock) (deadlock here)

This flaw was found by an experimental static analysis tool I am developing
for irq-related deadlocks.

This tentative patch fixes the potential deadlock by using spin_lock_irqsave()
in plx_dma_process_desc() to disable irqs while the lock is held.
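
In pattern form, the change is simply (a minimal sketch; the full diff is
at the bottom of this page):

    unsigned long flags;

    /* With local irqs disabled, the tasklet softirq cannot preempt us
     * on this CPU while the lock is held.
     */
    spin_lock_irqsave(&plxdev->ring_lock, flags);
    /* ... complete finished descriptors ... */
    spin_unlock_irqrestore(&plxdev->ring_lock, flags);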

Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
---
 drivers/dma/plx_dma.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Chengfeng Ye July 26, 2023, 10:58 a.m. UTC | #1
Fixes: 1d05a0bdb420 ("dmaengine: plx_dma: Move spin_lock_bh() to spin_lock()")
Logan Gunthorpe July 26, 2023, 3:57 p.m. UTC | #2
On 2023-07-26 04:48, Chengfeng Ye wrote:
> As plx_dma_process_desc() is invoked both by the tasklet plx_dma_desc_task()
> under softirq context and by the plx_dma_tx_status() callback, which executes
> under process context, the acquisition of &plxdev->ring_lock inside
> plx_dma_process_desc() should disable irqs; otherwise a deadlock could happen
> if the tasklet softirq preempts the process-context code while the lock is
> held on the same CPU.
> 
> Possible deadlock scenario:
> plx_dma_tx_status()
>     -> plx_dma_process_desc()
>     -> spin_lock(&plxdev->ring_lock)
>         <tasklet softirq>
>         -> plx_dma_desc_task()
>         -> plx_dma_process_desc()
>         -> spin_lock(&plxdev->ring_lock) (deadlock here)
> 
> This flaw was found by an experimental static analysis tool I am developing
> for irq-related deadlocks.
> 
> This tentative patch fixes the potential deadlock by using spin_lock_irqsave()
> in plx_dma_process_desc() to disable irqs while the lock is held.
> 
> Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>

Makes sense. Thanks!

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Logan
Christophe JAILLET July 26, 2023, 9 p.m. UTC | #3
On 26/07/2023 at 17:57, Logan Gunthorpe wrote:
> 
> 
> On 2023-07-26 04:48, Chengfeng Ye wrote:
>> As plx_dma_process_desc() is invoked both by the tasklet plx_dma_desc_task()
>> under softirq context and by the plx_dma_tx_status() callback, which executes
>> under process context, the acquisition of &plxdev->ring_lock inside
>> plx_dma_process_desc() should disable irqs; otherwise a deadlock could happen
>> if the tasklet softirq preempts the process-context code while the lock is
>> held on the same CPU.
>>
>> Possible deadlock scenario:
>> plx_dma_tx_status()
>>      -> plx_dma_process_desc()
>>      -> spin_lock(&plxdev->ring_lock)
>>          <tasklet softirq>
>>          -> plx_dma_desc_task()
>>          -> plx_dma_process_desc()
>>          -> spin_lock(&plxdev->ring_lock) (deadlock here)
>>
>> This flaw was found by an experimental static analysis tool I am developing
>> for irq-related deadlocks.
>>
>> This tentative patch fixes the potential deadlock by using spin_lock_irqsave()
>> in plx_dma_process_desc() to disable irqs while the lock is held.
>>
>> Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
> 
> Makes sense. Thanks!
> 
> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
> 
> Logan
> 

Hi,

as explained in another reply [1], would spin_lock_bh() be enough in 
such a case?

CJ

[1]: 
https://lore.kernel.org/all/5125e39b-0faf-63fc-0c51-982b2a567e21@wanadoo.fr/
Logan Gunthorpe July 26, 2023, 9:10 p.m. UTC | #4
On 2023-07-26 15:00, Christophe JAILLET wrote:
> On 26/07/2023 at 17:57, Logan Gunthorpe wrote:
>>
>>
>> On 2023-07-26 04:48, Chengfeng Ye wrote:
>>> As plx_dma_process_desc() is invoked both by the tasklet plx_dma_desc_task()
>>> under softirq context and by the plx_dma_tx_status() callback, which
>>> executes under process context, the acquisition of &plxdev->ring_lock
>>> inside plx_dma_process_desc() should disable irqs; otherwise a deadlock
>>> could happen if the tasklet softirq preempts the process-context code
>>> while the lock is held on the same CPU.
>>>
>>> Possible deadlock scenario:
>>> plx_dma_tx_status()
>>>      -> plx_dma_process_desc()
>>>      -> spin_lock(&plxdev->ring_lock)
>>>          <tasklet softirq>
>>>          -> plx_dma_desc_task()
>>>          -> plx_dma_process_desc()
>>>          -> spin_lock(&plxdev->ring_lock) (deadlock here)
>>>
>>> This flaw was found by an experimental static analysis tool I am
>>> developing for irq-related deadlocks.
>>>
>>> This tentative patch fixes the potential deadlock by using
>>> spin_lock_irqsave() in plx_dma_process_desc() to disable irqs while
>>> the lock is held.
>>>
>>> Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
>>
>> Makes sense. Thanks!
>>
>> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
>>
>> Logan
>>
> 
> Hi,
> 
> as explained in another reply [1], would spin_lock_bh() be enough in
> such a case?

The driver originally used spin_lock_bh(). That was removed by Yunbo Yu in
2022, who said it was unnecessary when used with a tasklet:

1d05a0bdb420 ("dmaengine: plx_dma: Move spin_lock_bh() to spin_lock()")

If spin_lock_bh() is correct (which is what I originally thought when I
wrote the driver, though I'm a bit confused now), then I guess Yunbo's
change was simply incorrect. It sounded sensible at the time, but it looks
like there are two call sites of plx_dma_process_desc(): one in the
tasklet and one not in the tasklet. The one not in the tasklet needs to
use the bh version.
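
The two paths, in sketch form (my summary, not verbatim driver code):

    plx_dma_desc_task()                        /* tasklet: softirq context */
        -> plx_dma_process_desc()
            -> spin_lock(&plxdev->ring_lock)   /* fine: the tasklet cannot preempt itself */

    plx_dma_tx_status()                        /* process context */
        -> plx_dma_process_desc()
            -> spin_lock(&plxdev->ring_lock)   /* unsafe: the tasklet can preempt here */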

So perhaps we should just revert 1d05a0bdb420?

Logan
Chengfeng Ye July 27, 2023, 6:48 a.m. UTC | #5
Hi Logan and Christophe,

Thanks much for the reply and reminder, and yes, spin_lock_bh() should
be better.

When I wrote the patch I thought spin_lock_bh() could not be nested, and
was afraid that if some outside caller invoked the .dma_tx_status()
callback with softirqs already disabled, spin_unlock_bh() would
unintentionally re-enable them. spin_lock_irqsave() is always safer in
general, so I used it.

But I just checked the documentation [1] on these APIs and found that the
_bh() variants can be nested. Using spin_lock_bh() should then be better
for performance reasons.
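
A minimal illustration of the nesting, with two hypothetical locks a and b
(my own example, not driver code):

    spin_lock_bh(&a);       /* softirq-disable depth: 0 -> 1 */
    spin_lock_bh(&b);       /* depth: 1 -> 2 */
    spin_unlock_bh(&b);     /* depth: 2 -> 1, softirqs stay disabled */
    spin_unlock_bh(&a);     /* depth: 1 -> 0, softirqs may run again */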


> So perhaps we should just revert 1d05a0bdb420?
Then for this one I think reverting 1d05a0bdb420 should be enough. May I
ask: to revert that patch, should I do anything further (like sending a
new patch)?

> as explained in another reply [1], would spin_lock_bh() be enough in
> such a case?
For the other one [2], I will send a v2 patch changing it to spin_lock_bh().

[1] http://books.gigatux.nl/mirror/kerneldevelopment/0672327201/ch07lev1sec6.html
[2] https://lore.kernel.org/all/5125e39b-0faf-63fc-0c51-982b2a567e21@wanadoo.fr/

Thanks again,
Chengfeng
Eric Schwarz July 27, 2023, 9:45 a.m. UTC | #6
Hello,

On 27.07.2023 at 08:48, Chengfeng Ye wrote:
> Hi Logan and Christophe,
> 
> Thanks much for the reply and reminder, and yes, spin_lock_bh() should
> be better.
> 
> When I wrote the patch I thought spin_lock_bh() could not be nested, and
> was afraid that if some outside caller invoked the .dma_tx_status()
> callback with softirqs already disabled, spin_unlock_bh() would
> unintentionally re-enable them. spin_lock_irqsave() is always safer in
> general, so I used it.
> 
> But I just checked the documentation [1] on these APIs and found that the
> _bh() variants can be nested. Using spin_lock_bh() should then be better
> for performance reasons.
> 
> 
>> So perhaps we should just revert 1d05a0bdb420?
> Then for this one I think reverting 1d05a0bdb420 should be enough. May I
> ask: to revert that patch, should I do anything further (like sending a
> new patch)?
> 
>> as explained in another reply [1], would spin_lock_bh() be enough in
>> such a case?
> For the other one [2], I will send a v2 patch changing it to spin_lock_bh().
> 
> [1] http://books.gigatux.nl/mirror/kerneldevelopment/0672327201/ch07lev1sec6.html
> [2] https://lore.kernel.org/all/5125e39b-0faf-63fc-0c51-982b2a567e21@wanadoo.fr/

For uniformity reasons across drivers, and also so that nothing else
gets missed, please compare your requirements and solution to the
implementation of the "altera-msgdma" driver (altera-msgdma.c).

W/ special emphasis on commit edf10919e5fc ("dmaengine: altera: fix 
spinlock usage")

spin_lock_bh was changed to spin_lock_irqsave w/ this patch.

Cheers
Eric
Chengfeng Ye July 27, 2023, 2:34 p.m. UTC | #7
Hi Eric,

> W/ special emphasis on commit edf10919e5fc ("dmaengine: altera: fix
> spinlock usage")
> spin_lock_bh was changed to spin_lock_irqsave w/ this patch.

Yeah, that patch fixes a deadlock between an irq and a kernel thread and
thus uses spin_lock_irqsave(); it is exactly the kind of bug I am
reporting. The slight difference is that the deadlock in that commit
involves a hardirq, and is therefore fixed with spin_lock_irqsave(),
while this one involves a softirq.
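
My rough rule of thumb for the two cases (my own summary with a hypothetical
lock, not code from either commit):

    /* Lock also taken in a hardirq handler (the altera-msgdma case): */
    spin_lock_irqsave(&lock, flags);    /* from process context */

    /* Lock shared only with a tasklet/softirq (the plx_dma case): */
    spin_lock_bh(&lock);                /* from process context */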

> For uniformity reasons across drivers, and also so that nothing else
> gets missed, please compare your requirements and solution to the
> implementation of the "altera-msgdma" driver (altera-msgdma.c).

Maybe for our case spin_lock_bh() is enough, since we are handling a
deadlock between a tasklet and a kernel thread. It's true that uniformity
is an important concern, and as I glance at other DMA drivers,
spin_lock_irqsave() is in general used more frequently than spin_lock_bh().
But this driver already consistently uses spin_lock_bh() on
&plxdev->ring_lock; changing only this function to spin_lock_irqsave()
could instead make the code look a bit weird.

Even though my original patch did use spin_lock_irqsave(), I now think
that, at least for this bug fix, just reverting back to spin_lock_bh()
may be better and will not complicate things much.
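
The reverted form would presumably just be (a sketch of the revert, not the
actual v2 patch):

    static void plx_dma_process_desc(struct plx_dma_dev *plxdev)
    {
        spin_lock_bh(&plxdev->ring_lock);
        /* ... complete finished descriptors ... */
        spin_unlock_bh(&plxdev->ring_lock);
    }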

Thanks,
Chengfeng
Logan Gunthorpe July 27, 2023, 3:31 p.m. UTC | #8
On 7/27/23 00:48, Chengfeng Ye wrote:
> Hi Logan and Christophe,
> 
> Thanks much for the reply and reminder, and yes, spin_lock_bh() should
> be better.
> 
> When I wrote the patch I thought spin_lock_bh() could not be nested, and
> was afraid that if some outside caller invoked the .dma_tx_status()
> callback with softirqs already disabled, spin_unlock_bh() would
> unintentionally re-enable them. spin_lock_irqsave() is always safer in
> general, so I used it.
> 
> But I just checked the documentation [1] on these APIs and found that the
> _bh() variants can be nested. Using spin_lock_bh() should then be better
> for performance reasons.
> 
> 
>> So perhaps we should just revert 1d05a0bdb420?
> Then for this one I think reverting 1d05a0bdb420 should be enough. May I
> ask: to revert that patch, should I do anything further (like sending a
> new patch)?

Yes, I think you can just send a revert patch explaining the reasoning
further in a commit message.

Thanks,

Logan
Chengfeng Ye July 29, 2023, 6:01 p.m. UTC | #9
> Yes, I think you can just send a revert patch explaining the reasoning
> further in a commit message.

A v2 patch has just been sent to address the problem.

Best Regards,
Chengfeng

Patch

diff --git a/drivers/dma/plx_dma.c b/drivers/dma/plx_dma.c
index 34b6416c3287..46068b8d83f9 100644
--- a/drivers/dma/plx_dma.c
+++ b/drivers/dma/plx_dma.c
@@ -135,9 +135,10 @@  static void plx_dma_process_desc(struct plx_dma_dev *plxdev)
 {
 	struct dmaengine_result res;
 	struct plx_dma_desc *desc;
+	unsigned long lock_flags;
 	u32 flags;
 
-	spin_lock(&plxdev->ring_lock);
+	spin_lock_irqsave(&plxdev->ring_lock, lock_flags);
 
 	while (plxdev->tail != plxdev->head) {
 		desc = plx_dma_get_desc(plxdev, plxdev->tail);
@@ -165,7 +166,7 @@  static void plx_dma_process_desc(struct plx_dma_dev *plxdev)
 		plxdev->tail++;
 	}
 
-	spin_unlock(&plxdev->ring_lock);
+	spin_unlock_irqrestore(&plxdev->ring_lock, lock_flags);
 }
 
 static void plx_dma_abort_desc(struct plx_dma_dev *plxdev)