Message ID | 20230504113509.184633-2-angelogioacchino.delregno@collabora.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Krzysztof Wilczyński |
Headers | show |
Series | MediaTek PCIe Gen3: Suspend fixes | expand |
On 04.05.23 13:35, AngeloGioacchino Del Regno wrote: Hi, looking at your patch I am afraid there is an issue. > In mtk_pcie_suspend_noirq() and mtk_pcie_resume_noirq() we are, > respectively, disabling and enabling generation of interrupts and > then saving and restoring the enabled interrupts register: since > we're using noirq PM callbacks, that can be safely done without > holding any spin lock. Why? You can still race with another CPU in task context. That is if you say that you do not need locking to touch PCIE_INT_ENABLE_REG that is fine, but then why do you remove it from one place only? It is also touched in mtk_pcie_probe() at a minimum. > That was noticed because of, and solves, the following issue: > > <4>[ 74.185982] ======================================================== > <4>[ 74.192629] WARNING: possible irq lock inversion dependency detected > <4>[ 74.199276] 6.3.0-next-20230428+ #51 Tainted: G W > <4>[ 74.205664] -------------------------------------------------------- > <4>[ 74.212309] systemd-sleep/809 just changed the state of lock: > <4>[ 74.218347] ffff65a5c34c65a0 (&pcie->irq_lock){+...}-{2:2}, at: mtk_pcie_resume+0x50/0xa8 > <4>[ 74.226870] but this lock was taken by another, HARDIRQ-safe lock in the past: > <4>[ 74.234389] (&irq_desc_lock_class){-.-.}-{2:2} > <4>[ 74.234409] > <4>[ 74.234409] > <4>[ 74.234409] and interrupts could create inverse lock ordering between them. > <4>[ 74.234409] > <4>[ 74.251704] > <4>[ 74.251704] other info that might help us debug this: > <4>[ 74.258785] Possible interrupt unsafe locking scenario: > <4>[ 74.258785] > <4>[ 74.266126] CPU0 CPU1 > <4>[ 74.270942] ---- ---- > <4>[ 74.275758] lock(&pcie->irq_lock); Lock A > <4>[ 74.279627] local_irq_disable(); strictly speaking irrelevant > <4>[ 74.285836] lock(&irq_desc_lock_class); lock B > <4>[ 74.292667] lock(&pcie->irq_lock); lock A > <4>[ 74.299061] <Interrupt> You do not need that interrupt. > <4>[ 74.301960] lock(&irq_desc_lock_class); lock B > <4>[ 74.306438] > <4>[ 74.306438] *** DEADLOCK ***
Il 08/05/23 09:44, Oliver Neukum ha scritto: > On 04.05.23 13:35, AngeloGioacchino Del Regno wrote: > > Hi, > > looking at your patch I am afraid there is an issue. > >> In mtk_pcie_suspend_noirq() and mtk_pcie_resume_noirq() we are, >> respectively, disabling and enabling generation of interrupts and >> then saving and restoring the enabled interrupts register: since >> we're using noirq PM callbacks, that can be safely done without >> holding any spin lock. > > Why? You can still race with another CPU in task context. > That is if you say that you do not need locking to touch > PCIE_INT_ENABLE_REG that is fine, but then why do you remove > it from one place only? > It is also touched in mtk_pcie_probe() at a minimum. > > >> That was noticed because of, and solves, the following issue: >> >> <4>[ 74.185982] ======================================================== >> <4>[ 74.192629] WARNING: possible irq lock inversion dependency detected >> <4>[ 74.199276] 6.3.0-next-20230428+ #51 Tainted: G W >> <4>[ 74.205664] -------------------------------------------------------- >> <4>[ 74.212309] systemd-sleep/809 just changed the state of lock: >> <4>[ 74.218347] ffff65a5c34c65a0 (&pcie->irq_lock){+...}-{2:2}, at: >> mtk_pcie_resume+0x50/0xa8 >> <4>[ 74.226870] but this lock was taken by another, HARDIRQ-safe lock in the past: >> <4>[ 74.234389] (&irq_desc_lock_class){-.-.}-{2:2} >> <4>[ 74.234409] >> <4>[ 74.234409] >> <4>[ 74.234409] and interrupts could create inverse lock ordering between them. >> <4>[ 74.234409] >> <4>[ 74.251704] >> <4>[ 74.251704] other info that might help us debug this: >> <4>[ 74.258785] Possible interrupt unsafe locking scenario: >> <4>[ 74.258785] >> <4>[ 74.266126] CPU0 CPU1 >> <4>[ 74.270942] ---- ---- >> <4>[ 74.275758] lock(&pcie->irq_lock); > > Lock A > >> <4>[ 74.279627] local_irq_disable(); > > strictly speaking irrelevant > >> <4>[ 74.285836] lock(&irq_desc_lock_class); > > lock B > >> <4>[ 74.292667] lock(&pcie->irq_lock); > > lock A > >> <4>[ 74.299061] <Interrupt> > > You do not need that interrupt. > >> <4>[ 74.301960] lock(&irq_desc_lock_class); > > lock B > >> <4>[ 74.306438] >> <4>[ 74.306438] *** DEADLOCK *** Sorry for the very late reply. I just noticed this. I'm unsure, at this point, about how to solve this warning; ideas? Thanks, Angelo
On Thu, May 04, 2023 at 01:35:08PM +0200, AngeloGioacchino Del Regno wrote: > In mtk_pcie_suspend_noirq() and mtk_pcie_resume_noirq() we are, > respectively, disabling and enabling generation of interrupts and > then saving and restoring the enabled interrupts register: since > we're using noirq PM callbacks, that can be safely done without > holding any spin lock. Tangent: it's annoying that pcie-mediatek.c and pcie-mediatek-gen3.c use identical "mtk_pcie_suspend_noirq()" names. That makes browsing harder than it needs to be. But I see that you refer to mediatek-gen3. > +++ b/drivers/pci/controller/pcie-mediatek-gen3.c > @@ -963,8 +963,6 @@ static void mtk_pcie_irq_save(struct mtk_gen3_pcie *pcie) > { > int i; > > - raw_spin_lock(&pcie->irq_lock); > - > pcie->saved_irq_state = readl_relaxed(pcie->base + PCIE_INT_ENABLE_REG); > > for (i = 0; i < PCIE_MSI_SET_NUM; i++) { > @@ -973,16 +971,12 @@ static void mtk_pcie_irq_save(struct mtk_gen3_pcie *pcie) > msi_set->saved_irq_state = readl_relaxed(msi_set->base + > PCIE_MSI_SET_ENABLE_OFFSET); > } > - > - raw_spin_unlock(&pcie->irq_lock); > } Jianjun added mtk_pcie_irq_save() and mtk_pcie_irq_restore() with d537dc125f07 ("PCI: mediatek-gen3: Add system PM support"). I suggest looking at other drivers and structuring mediatek-gen3 similarly, including using similar function names. No other drivers have a .*_pcie_irq_save() function. Several have .*_pcie_host_init(), and some of those do include some IRQ setup. Bjorn
Il 08/06/23 19:15, Bjorn Helgaas ha scritto: > On Thu, May 04, 2023 at 01:35:08PM +0200, AngeloGioacchino Del Regno wrote: >> In mtk_pcie_suspend_noirq() and mtk_pcie_resume_noirq() we are, >> respectively, disabling and enabling generation of interrupts and >> then saving and restoring the enabled interrupts register: since >> we're using noirq PM callbacks, that can be safely done without >> holding any spin lock. > > Tangent: it's annoying that pcie-mediatek.c and pcie-mediatek-gen3.c > use identical "mtk_pcie_suspend_noirq()" names. That makes browsing > harder than it needs to be. But I see that you refer to > mediatek-gen3. > >> +++ b/drivers/pci/controller/pcie-mediatek-gen3.c >> @@ -963,8 +963,6 @@ static void mtk_pcie_irq_save(struct mtk_gen3_pcie *pcie) >> { >> int i; >> >> - raw_spin_lock(&pcie->irq_lock); >> - >> pcie->saved_irq_state = readl_relaxed(pcie->base + PCIE_INT_ENABLE_REG); >> >> for (i = 0; i < PCIE_MSI_SET_NUM; i++) { >> @@ -973,16 +971,12 @@ static void mtk_pcie_irq_save(struct mtk_gen3_pcie *pcie) >> msi_set->saved_irq_state = readl_relaxed(msi_set->base + >> PCIE_MSI_SET_ENABLE_OFFSET); >> } >> - >> - raw_spin_unlock(&pcie->irq_lock); >> } > > Jianjun added mtk_pcie_irq_save() and mtk_pcie_irq_restore() with > d537dc125f07 ("PCI: mediatek-gen3: Add system PM support"). > > I suggest looking at other drivers and structuring mediatek-gen3 > similarly, including using similar function names. No other drivers > have a .*_pcie_irq_save() function. Several have .*_pcie_host_init(), > and some of those do include some IRQ setup. > > Bjorn Hello Bjorn, thanks for the feedback! Yes, I of course refer to the Gen3 driver... I'll check other drivers and will try to improve the consistency of this one with the others as soon as I have some bandwidth to perform the job. Thanks again, Angelo
diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c index b8612ce5f4d0..52f52ca5db71 100644 --- a/drivers/pci/controller/pcie-mediatek-gen3.c +++ b/drivers/pci/controller/pcie-mediatek-gen3.c @@ -963,8 +963,6 @@ static void mtk_pcie_irq_save(struct mtk_gen3_pcie *pcie) { int i; - raw_spin_lock(&pcie->irq_lock); - pcie->saved_irq_state = readl_relaxed(pcie->base + PCIE_INT_ENABLE_REG); for (i = 0; i < PCIE_MSI_SET_NUM; i++) { @@ -973,16 +971,12 @@ static void mtk_pcie_irq_save(struct mtk_gen3_pcie *pcie) msi_set->saved_irq_state = readl_relaxed(msi_set->base + PCIE_MSI_SET_ENABLE_OFFSET); } - - raw_spin_unlock(&pcie->irq_lock); } static void mtk_pcie_irq_restore(struct mtk_gen3_pcie *pcie) { int i; - raw_spin_lock(&pcie->irq_lock); - writel_relaxed(pcie->saved_irq_state, pcie->base + PCIE_INT_ENABLE_REG); for (i = 0; i < PCIE_MSI_SET_NUM; i++) { @@ -991,8 +985,6 @@ static void mtk_pcie_irq_restore(struct mtk_gen3_pcie *pcie) writel_relaxed(msi_set->saved_irq_state, msi_set->base + PCIE_MSI_SET_ENABLE_OFFSET); } - - raw_spin_unlock(&pcie->irq_lock); } static int mtk_pcie_turn_off_link(struct mtk_gen3_pcie *pcie)
In mtk_pcie_suspend_noirq() and mtk_pcie_resume_noirq() we are, respectively, disabling and enabling generation of interrupts and then saving and restoring the enabled interrupts register: since we're using noirq PM callbacks, that can be safely done without holding any spin lock. That was noticed because of, and solves, the following issue: <4>[ 74.185982] ======================================================== <4>[ 74.192629] WARNING: possible irq lock inversion dependency detected <4>[ 74.199276] 6.3.0-next-20230428+ #51 Tainted: G W <4>[ 74.205664] -------------------------------------------------------- <4>[ 74.212309] systemd-sleep/809 just changed the state of lock: <4>[ 74.218347] ffff65a5c34c65a0 (&pcie->irq_lock){+...}-{2:2}, at: mtk_pcie_resume+0x50/0xa8 <4>[ 74.226870] but this lock was taken by another, HARDIRQ-safe lock in the past: <4>[ 74.234389] (&irq_desc_lock_class){-.-.}-{2:2} <4>[ 74.234409] <4>[ 74.234409] <4>[ 74.234409] and interrupts could create inverse lock ordering between them. <4>[ 74.234409] <4>[ 74.251704] <4>[ 74.251704] other info that might help us debug this: <4>[ 74.258785] Possible interrupt unsafe locking scenario: <4>[ 74.258785] <4>[ 74.266126] CPU0 CPU1 <4>[ 74.270942] ---- ---- <4>[ 74.275758] lock(&pcie->irq_lock); <4>[ 74.279627] local_irq_disable(); <4>[ 74.285836] lock(&irq_desc_lock_class); <4>[ 74.292667] lock(&pcie->irq_lock); <4>[ 74.299061] <Interrupt> <4>[ 74.301960] lock(&irq_desc_lock_class); <4>[ 74.306438] <4>[ 74.306438] *** DEADLOCK *** Fixes: d537dc125f07 ("PCI: mediatek-gen3: Add system PM support") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> --- drivers/pci/controller/pcie-mediatek-gen3.c | 8 -------- 1 file changed, 8 deletions(-)