[v2] dmaengine: plx_dma: Fix potential deadlock on &plxdev->ring_lock

Message ID 20230729175952.4068-1-dg573847474@gmail.com (mailing list archive)
State Changes Requested

Commit Message

Chengfeng Ye July 29, 2023, 5:59 p.m. UTC
As plx_dma_process_desc() is invoked both by the tasklet plx_dma_desc_task()
in softirq context and by the plx_dma_tx_status() callback, which executes in
process context, the acquisition of &plxdev->ring_lock inside
plx_dma_process_desc() should disable softirqs. Otherwise a deadlock could
occur if the tasklet's softirq preempts the process-context code while that
code holds the lock on the same CPU.

Possible deadlock scenario:
plx_dma_tx_status()
    -> plx_dma_process_desc()
    -> spin_lock(&plxdev->ring_lock)
        <tasklet softirq>
        -> plx_dma_desc_task()
        -> plx_dma_process_desc()
        -> spin_lock(&plxdev->ring_lock) (deadlock here)

This flaw was found by an experimental static analysis tool I am developing
for irq-related deadlocks.

The lock was changed from spin_lock_bh() to spin_lock() by a previous patch
for performance reasons, which unintentionally introduced this potential
deadlock.

This patch reverts to spin_lock_bh() to fix the deadlock.

Fixes: 1d05a0bdb420 ("dmaengine: plx_dma: Move spin_lock_bh() to spin_lock()")
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>

Changes in v2
- Consistently use spin_lock_bh() on &plxdev->ring_lock instead of
spin_lock_irqsave().
---
 drivers/dma/plx_dma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Logan Gunthorpe July 30, 2023, 11:50 p.m. UTC | #1
On 7/29/23 11:59, Chengfeng Ye wrote:
> As plx_dma_process_desc() is invoked both by the tasklet plx_dma_desc_task()
> in softirq context and by the plx_dma_tx_status() callback, which executes in
> process context, the acquisition of &plxdev->ring_lock inside
> plx_dma_process_desc() should disable softirqs. Otherwise a deadlock could
> occur if the tasklet's softirq preempts the process-context code while that
> code holds the lock on the same CPU.
> 
> Possible deadlock scenario:
> plx_dma_tx_status()
>     -> plx_dma_process_desc()
>     -> spin_lock(&plxdev->ring_lock)
>         <tasklet softirq>
>         -> plx_dma_desc_task()
>         -> plx_dma_process_desc()
>         -> spin_lock(&plxdev->ring_lock) (deadlock here)
> 
> This flaw was found by an experimental static analysis tool I am developing
> for irq-related deadlocks.
> 
> The lock was changed from spin_lock_bh() to spin_lock() by a previous patch
> for performance reasons, which unintentionally introduced this potential
> deadlock.
> 
> This patch reverts to spin_lock_bh() to fix the deadlock.
> 
> Fixes: 1d05a0bdb420 ("dmaengine: plx_dma: Move spin_lock_bh() to spin_lock()")
> Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
> 

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Thanks!

Logan

Eric Schwarz Aug. 28, 2023, 4:18 p.m. UTC | #2
Hello,

On 29.07.2023 at 19:59, Chengfeng Ye wrote:
> This flaw was found by an experimental static analysis tool I am developing
> for irq-related deadlocks.

Just out of curiosity, did or could either of the following find that issue too?
- Linux kernel debug options such as CONFIG_DEBUG_SPINLOCK
- Smatch [1]

I have also found an article by Dan Carpenter about the lock-checking
capability of Smatch, which IMHO relates to what you are doing [2].

The question is whether the checks/algorithms you have developed
already exist in other tools, or whether they could be added to an
existing tool that is already widespread in the community and
used accordingly.

Many thanks in advance for your reply.

[1] https://github.com/error27/smatch
[2] https://blogs.oracle.com/linux/post/writing-the-ultimate-locking-check

Cheers
Eric

Chengfeng Ye Aug. 29, 2023, 3:10 a.m. UTC | #3
Hi Eric,

Thank you for your interest in it.

For dynamic detection, the answer is yes.
Lockdep, which should be enabled by CONFIG_DEBUG_SPINLOCK,
can detect such deadlocks. The problem is that detection requires
the right input and the exact thread interleaving to trigger the bug;
otherwise the bug stays buried and goes undetected.

For static analysis, I think the answer is no. Smatch, like the
static deadlock detection algorithms in CBMC[1] and Infer[2], should be
designed to reason about thread interaction but not interrupts. Handling
interrupts requires new algorithms, which I am working on.

Besides, may I ask a question? I sent some patches[3][4] weeks
ago but have not yet received a reply. Will reviewers look at the patches
later, or should I ping them again?

[1] http://www.cprover.org/deadlock-detection/
[2] https://github.com/facebook/infer
[3] https://lore.kernel.org/lkml/20230726062313.77121-1-dg573847474@gmail.com/
[4] https://lore.kernel.org/lkml/20230726051727.64088-1-dg573847474@gmail.com/

Thanks,
Chengfeng

Eric Schwarz Aug. 29, 2023, 11:05 a.m. UTC | #4
Hello Chengfeng,

On 29.08.2023 at 05:10, Chengfeng Ye wrote:
> Hi Eric,
> 
> Thank you for your interest in it.

Thanks for getting back to me.

> For dynamic detection, the answer is yes.
> Lockdep, which should be enabled by CONFIG_DEBUG_SPINLOCK,
> can detect such deadlocks. The problem is that detection requires
> the right input and the exact thread interleaving to trigger the bug;
> otherwise the bug stays buried and goes undetected.
> 
> For static analysis, I think the answer is no. Smatch, like the
> static deadlock detection algorithms in CBMC[1] and Infer[2], should be
> designed to reason about thread interaction but not interrupts. Handling
> interrupts requires new algorithms, which I am working on.

Will you publish your work later, e.g. on GitHub?
It might even make sense to integrate your work into
scripts/checkpatch.pl of the Linux kernel (or the like).
Basically, if a patch fails a locking check, it should not be
committed anyway.
IMHO the quality standard expected of the code should always
be the same. So adding it to a mandatory check procedure (a script that
must be run before committing patches) and/or to the "0-DAY CI Kernel
Test Service" [5] would definitely be worth a thought.

> Besides, may I ask a question? I sent some patches[3][4] weeks
> ago but have not yet received a reply. Will reviewers look at the patches
> later, or should I ping them again?

There is never a guarantee of who will review your patch on the
mailing list, or when. It is a best-effort system run mainly by volunteers.
Just give people a bit of time, since it is currently also holiday season.
You may ping the subsystem maintainer once some time has passed,
since he is responsible for handling the patches.
BTW, I think you have already pinged indirectly with your e-mail.

> [1] http://www.cprover.org/deadlock-detection/
> [2] https://github.com/facebook/infer
> [3] https://lore.kernel.org/lkml/20230726062313.77121-1-dg573847474@gmail.com/
> [4] https://lore.kernel.org/lkml/20230726051727.64088-1-dg573847474@gmail.com/

[5] https://github.com/intel/lkp-tests/wiki

Cheers
Eric
Patch

diff --git a/drivers/dma/plx_dma.c b/drivers/dma/plx_dma.c
index 34b6416c3287..7693c067a1aa 100644
--- a/drivers/dma/plx_dma.c
+++ b/drivers/dma/plx_dma.c
@@ -137,7 +137,7 @@  static void plx_dma_process_desc(struct plx_dma_dev *plxdev)
 	struct plx_dma_desc *desc;
 	u32 flags;
 
-	spin_lock(&plxdev->ring_lock);
+	spin_lock_bh(&plxdev->ring_lock);
 
 	while (plxdev->tail != plxdev->head) {
 		desc = plx_dma_get_desc(plxdev, plxdev->tail);
@@ -165,7 +165,7 @@  static void plx_dma_process_desc(struct plx_dma_dev *plxdev)
 		plxdev->tail++;
 	}
 
-	spin_unlock(&plxdev->ring_lock);
+	spin_unlock_bh(&plxdev->ring_lock);
 }
 
 static void plx_dma_abort_desc(struct plx_dma_dev *plxdev)