Message ID | 20230726104827.60382-1-dg573847474@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | dmaengine: plx_dma: Fix potential deadlock on &plxdev->ring_lock |
Fixes: 1d05a0bdb420 ("dmaengine: plx_dma: Move spin_lock_bh() to spin_lock() ")
On 2023-07-26 04:48, Chengfeng Ye wrote:
> As plx_dma_process_desc() is invoked both by the tasklet plx_dma_desc_task()
> under softirq context and by the plx_dma_tx_status() callback executed under
> process context, the acquisition of &plxdev->ring_lock inside
> plx_dma_process_desc() should disable irq, otherwise a deadlock could happen
> if the irq preempts the execution of process-context code while the lock is
> held in process context on the same CPU.
>
> Possible deadlock scenario:
> plx_dma_tx_status()
>     -> plx_dma_process_desc()
>     -> spin_lock(&plxdev->ring_lock)
>         <tasklet softirq>
>         -> plx_dma_desc_task()
>         -> plx_dma_process_desc()
>         -> spin_lock(&plxdev->ring_lock) (deadlock here)
>
> This flaw was found by an experimental static analysis tool I am developing
> for irq-related deadlocks.
>
> The tentative patch fixes the potential deadlock by using spin_lock_irqsave()
> in plx_dma_process_desc() to disable irq while the lock is held.
>
> Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>

Makes sense. Thanks!

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Logan
On 26/07/2023 17:57, Logan Gunthorpe wrote:
>
>
> On 2023-07-26 04:48, Chengfeng Ye wrote:
>> As plx_dma_process_desc() is invoked both by the tasklet plx_dma_desc_task()
>> under softirq context and by the plx_dma_tx_status() callback executed under
>> process context, the acquisition of &plxdev->ring_lock inside
>> plx_dma_process_desc() should disable irq, otherwise a deadlock could happen
>> if the irq preempts the execution of process-context code while the lock is
>> held in process context on the same CPU.
>>
>> Possible deadlock scenario:
>> plx_dma_tx_status()
>>     -> plx_dma_process_desc()
>>     -> spin_lock(&plxdev->ring_lock)
>>         <tasklet softirq>
>>         -> plx_dma_desc_task()
>>         -> plx_dma_process_desc()
>>         -> spin_lock(&plxdev->ring_lock) (deadlock here)
>>
>> This flaw was found by an experimental static analysis tool I am developing
>> for irq-related deadlocks.
>>
>> The tentative patch fixes the potential deadlock by using spin_lock_irqsave()
>> in plx_dma_process_desc() to disable irq while the lock is held.
>>
>> Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
>
> Makes sense. Thanks!
>
> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
>
> Logan
>

Hi,

as explained in another reply [1], would spin_lock_bh() be enough in
such a case?

CJ

[1]: https://lore.kernel.org/all/5125e39b-0faf-63fc-0c51-982b2a567e21@wanadoo.fr/
On 2023-07-26 15:00, Christophe JAILLET wrote:
> On 26/07/2023 17:57, Logan Gunthorpe wrote:
>>
>>
>> On 2023-07-26 04:48, Chengfeng Ye wrote:
>>> As plx_dma_process_desc() is invoked both by the tasklet plx_dma_desc_task()
>>> under softirq context and by the plx_dma_tx_status() callback executed
>>> under process context, the acquisition of &plxdev->ring_lock inside
>>> plx_dma_process_desc() should disable irq, otherwise a deadlock could
>>> happen if the irq preempts the execution of process-context code while
>>> the lock is held in process context on the same CPU.
>>>
>>> Possible deadlock scenario:
>>> plx_dma_tx_status()
>>>     -> plx_dma_process_desc()
>>>     -> spin_lock(&plxdev->ring_lock)
>>>         <tasklet softirq>
>>>         -> plx_dma_desc_task()
>>>         -> plx_dma_process_desc()
>>>         -> spin_lock(&plxdev->ring_lock) (deadlock here)
>>>
>>> This flaw was found by an experimental static analysis tool I am
>>> developing for irq-related deadlocks.
>>>
>>> The tentative patch fixes the potential deadlock by using
>>> spin_lock_irqsave() in plx_dma_process_desc() to disable irq while the
>>> lock is held.
>>>
>>> Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
>>
>> Makes sense. Thanks!
>>
>> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
>>
>> Logan
>>
>
> Hi,
>
> as explained in another reply [1], would spin_lock_bh() be enough in
> such a case?

The driver originally used spin_lock_bh(). It was removed by Yunbo Yu in
2022, who said it was unnecessary when used with a tasklet:

1d05a0bdb420 ("dmaengine: plx_dma: Move spin_lock_bh() to spin_lock() ")

If spin_lock_bh() is correct (which is what I originally thought when I
wrote the driver, though I'm a bit confused now) then I guess Yunbo's
change was simply incorrect. It sounded sensible at the time, but there
are two call sites of plx_dma_process_desc(): one in the tasklet and one
not in the tasklet. The one not in the tasklet needs to use the _bh
version.

So perhaps we should just revert 1d05a0bdb420?

Logan
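For reference, a minimal sketch of what the suggested revert would restore
in plx_dma_process_desc(); this is an illustration reconstructed from the
locking discussed in the thread and the diff below, not a copy of the
driver source:

	static void plx_dma_process_desc(struct plx_dma_dev *plxdev)
	{
		/* The _bh variant disables softirqs on this CPU, so the
		 * tasklet plx_dma_desc_task() cannot preempt a
		 * process-context caller such as plx_dma_tx_status()
		 * while the ring lock is held.
		 */
		spin_lock_bh(&plxdev->ring_lock);

		while (plxdev->tail != plxdev->head) {
			/* descriptor completion handling elided */
			plxdev->tail++;
		}

		spin_unlock_bh(&plxdev->ring_lock);
	}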
Hi Logan and Christophe,

Thanks much for the reply and the reminder; yes, spin_lock_bh() should
be better.

When I wrote the patch I thought spin_lock_bh() could not be nested, and
was afraid that if some outside caller invoked the .dma_tx_status()
callback with softirqs already disabled, spin_unlock_bh() would
unintentionally re-enable them. spin_lock_irqsave() is always safer in
general, thus I used it.

But I just checked the documentation [1] on these APIs and found that
the _bh() variants can be nested. Using spin_lock_bh() should therefore
be better for performance reasons.

> So perhaps we should just revert 1d05a0bdb420?
Then for this one I think reverting 1d05a0bdb420 should be enough. To
revert that patch, should I do anything further (like sending a new
patch)?

> as explained in another reply [1], would spin_lock_bh() be enough in
> such a case?
For the other one [2], I will send a v2 patch that changes to
spin_lock_bh().

[1] http://books.gigatux.nl/mirror/kerneldevelopment/0672327201/ch07lev1sec6.html
[2] https://lore.kernel.org/all/5125e39b-0faf-63fc-0c51-982b2a567e21@wanadoo.fr/

Thanks again,
Chengfeng
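A minimal sketch of the nesting behaviour mentioned above, assuming the
reference-counted softirq-disable semantics described in [1]; the lock
names are made up for illustration:

	static DEFINE_SPINLOCK(outer_lock);
	static DEFINE_SPINLOCK(inner_lock);

	static void nested_bh_example(void)
	{
		spin_lock_bh(&outer_lock);	/* softirq-disable depth: 1 */
		spin_lock_bh(&inner_lock);	/* depth: 2 */
		spin_unlock_bh(&inner_lock);	/* depth: 1, softirqs stay off */
		spin_unlock_bh(&outer_lock);	/* depth: 0, re-enabled only here */
	}

Because the disable depth is counted, the inner unlock does not re-enable
softirqs behind the outer critical section's back.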
Hello,

On 27.07.2023 08:48, Chengfeng Ye wrote:
> Hi Logan and Christophe,
>
> Thanks much for the reply and the reminder; yes, spin_lock_bh() should
> be better.
>
> When I wrote the patch I thought spin_lock_bh() could not be nested, and
> was afraid that if some outside caller invoked the .dma_tx_status()
> callback with softirqs already disabled, spin_unlock_bh() would
> unintentionally re-enable them. spin_lock_irqsave() is always safer in
> general, thus I used it.
>
> But I just checked the documentation [1] on these APIs and found that
> the _bh() variants can be nested. Using spin_lock_bh() should therefore
> be better for performance reasons.
>
>> So perhaps we should just revert 1d05a0bdb420?
> Then for this one I think reverting 1d05a0bdb420 should be enough. To
> revert that patch, should I do anything further (like sending a new
> patch)?
>
>> as explained in another reply [1], would spin_lock_bh() be enough in
>> such a case?
> For the other one [2], I will send a v2 patch that changes to
> spin_lock_bh().
>
> [1] http://books.gigatux.nl/mirror/kerneldevelopment/0672327201/ch07lev1sec6.html
> [2] https://lore.kernel.org/all/5125e39b-0faf-63fc-0c51-982b2a567e21@wanadoo.fr/

For uniformity across drivers, and so that nothing else gets missed,
please compare your requirements and solution to the implementation of
the "altera-msgdma" driver (altera-msgdma.c), with special emphasis on
commit edf10919e5fc ("dmaengine: altera: fix spinlock usage"):
spin_lock_bh was changed to spin_lock_irqsave with this patch.

Cheers
Eric
Hi Eric,

> with special emphasis on commit edf10919e5fc ("dmaengine: altera: fix
> spinlock usage"): spin_lock_bh was changed to spin_lock_irqsave with
> this patch.

Yeah, that patch fixes a deadlock between an irq and a kernel thread and
thus uses spin_lock_irqsave(), exactly the kind of bug I am reporting.
The slight difference is that the deadlock in that commit involves a
hardirq, and is therefore fixed with spin_lock_irqsave(), while this one
involves a softirq.

> For uniformity across drivers, and so that nothing else gets missed,
> please compare your requirements and solution to the implementation of
> the "altera-msgdma" driver (altera-msgdma.c).

For our case spin_lock_bh() seems to be enough, since we are handling a
deadlock between a tasklet and a kernel thread. It is true that
uniformity is an important concern, and as I glance at other DMA
drivers, spin_lock_irqsave() is in general used more frequently than
spin_lock_bh(). But this driver already consistently uses spin_lock_bh()
on &plxdev->ring_lock; switching to spin_lock_irqsave() in just this one
function could instead make the code look a bit inconsistent.

Even though my original patch did use spin_lock_irqsave(), I now think
that, at least for this bug fix, simply reverting to spin_lock_bh() is
better and will not complicate things much.

Thanks,
Chengfeng
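To spell out the trade-off being weighed here, a sketch with a made-up
lock name, not driver code:

	static DEFINE_SPINLOCK(example_lock);

	static void irqsave_vs_bh_example(void)
	{
		unsigned long flags;

		/* Disables hardirqs (and thus also softirqs) on this CPU:
		 * needed when the lock is also taken from a hardirq handler,
		 * as in the altera-msgdma case fixed by edf10919e5fc.
		 */
		spin_lock_irqsave(&example_lock, flags);
		spin_unlock_irqrestore(&example_lock, flags);

		/* Disables only softirqs: sufficient when the other
		 * contender is a tasklet, as with &plxdev->ring_lock here,
		 * and cheaper because hardirqs stay enabled.
		 */
		spin_lock_bh(&example_lock);
		spin_unlock_bh(&example_lock);
	}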
On 7/27/23 00:48, Chengfeng Ye wrote:
> Hi Logan and Christophe,
>
> Thanks much for the reply and the reminder; yes, spin_lock_bh() should
> be better.
>
> When I wrote the patch I thought spin_lock_bh() could not be nested, and
> was afraid that if some outside caller invoked the .dma_tx_status()
> callback with softirqs already disabled, spin_unlock_bh() would
> unintentionally re-enable them. spin_lock_irqsave() is always safer in
> general, thus I used it.
>
> But I just checked the documentation [1] on these APIs and found that
> the _bh() variants can be nested. Using spin_lock_bh() should therefore
> be better for performance reasons.
>
>> So perhaps we should just revert 1d05a0bdb420?
> Then for this one I think reverting 1d05a0bdb420 should be enough. To
> revert that patch, should I do anything further (like sending a new
> patch)?

Yes, I think you can just send a revert patch, explaining the reasoning
further in the commit message.

Thanks,

Logan
> Yes, I think you can just send a revert patch, explaining the reasoning
> further in the commit message.

A v2 patch has just been sent to address the problem.

Best Regards,
Chengfeng
diff --git a/drivers/dma/plx_dma.c b/drivers/dma/plx_dma.c
index 34b6416c3287..46068b8d83f9 100644
--- a/drivers/dma/plx_dma.c
+++ b/drivers/dma/plx_dma.c
@@ -135,9 +135,10 @@ static void plx_dma_process_desc(struct plx_dma_dev *plxdev)
 {
 	struct dmaengine_result res;
 	struct plx_dma_desc *desc;
+	unsigned long lock_flags;
 	u32 flags;
 
-	spin_lock(&plxdev->ring_lock);
+	spin_lock_irqsave(&plxdev->ring_lock, lock_flags);
 
 	while (plxdev->tail != plxdev->head) {
 		desc = plx_dma_get_desc(plxdev, plxdev->tail);
@@ -165,7 +166,7 @@ static void plx_dma_process_desc(struct plx_dma_dev *plxdev)
 		plxdev->tail++;
 	}
 
-	spin_unlock(&plxdev->ring_lock);
+	spin_unlock_irqrestore(&plxdev->ring_lock, lock_flags);
 }
 
 static void plx_dma_abort_desc(struct plx_dma_dev *plxdev)
As plx_dma_process_desc() is invoked both by the tasklet plx_dma_desc_task()
under softirq context and by the plx_dma_tx_status() callback executed under
process context, the acquisition of &plxdev->ring_lock inside
plx_dma_process_desc() should disable irq, otherwise a deadlock could happen
if the irq preempts the execution of process-context code while the lock is
held in process context on the same CPU.

Possible deadlock scenario:
plx_dma_tx_status()
    -> plx_dma_process_desc()
    -> spin_lock(&plxdev->ring_lock)
        <tasklet softirq>
        -> plx_dma_desc_task()
        -> plx_dma_process_desc()
        -> spin_lock(&plxdev->ring_lock) (deadlock here)

This flaw was found by an experimental static analysis tool I am developing
for irq-related deadlocks.

The tentative patch fixes the potential deadlock by using spin_lock_irqsave()
in plx_dma_process_desc() to disable irq while the lock is held.

Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
---
 drivers/dma/plx_dma.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
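For context, the two paths into plx_dma_process_desc() that the scenario
above describes, sketched with simplified bodies; the helper and field
names (chan_to_plx_dma_dev, desc_task) are assumptions for illustration,
not verified against the driver source:

	/* Process context: the dmaengine device_tx_status callback. */
	static enum dma_status plx_dma_tx_status(struct dma_chan *chan,
			dma_cookie_t cookie, struct dma_tx_state *txstate)
	{
		struct plx_dma_dev *plxdev = chan_to_plx_dma_dev(chan);

		plx_dma_process_desc(plxdev);	/* takes ring_lock */

		return dma_cookie_status(chan, cookie, txstate);
	}

	/* Softirq context: the tasklet that can preempt the path above
	 * on the same CPU while ring_lock is already held.
	 */
	static void plx_dma_desc_task(struct tasklet_struct *t)
	{
		struct plx_dma_dev *plxdev = from_tasklet(plxdev, t, desc_task);

		plx_dma_process_desc(plxdev);	/* same lock -> deadlock */
	}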