diff mbox series

[v2] ufs: core: fix deadlock when rtc update

Message ID 20240715063831.29792-1-peter.wang@mediatek.com (mailing list archive)
State New

Commit Message

Peter Wang (王信友) July 15, 2024, 6:38 a.m. UTC
From: Peter Wang <peter.wang@mediatek.com>

There is a deadlock when runtime suspend waits for the flush of RTC work,
and the RTC work calls ufshcd_rpm_get_sync to wait for runtime resume.

Here is the deadlock backtrace:
kworker/0:1     D 4892.876354 10 10971 4859 0x4208060 0x8 10 0 120 670730152367
ptr            f0ffff80c2e40000 0 1 0x00000001 0x000000ff 0x000000ff 0x000000ff
<ffffffee5e71ddb0> __switch_to+0x1a8/0x2d4
<ffffffee5e71e604> __schedule+0x684/0xa98
<ffffffee5e71ea60> schedule+0x48/0xc8
<ffffffee5e725f78> schedule_timeout+0x48/0x170
<ffffffee5e71fb74> do_wait_for_common+0x108/0x1b0
<ffffffee5e71efe0> wait_for_completion+0x44/0x60
<ffffffee5d6de968> __flush_work+0x39c/0x424
<ffffffee5d6decc0> __cancel_work_sync+0xd8/0x208
<ffffffee5d6dee2c> cancel_delayed_work_sync+0x14/0x28
<ffffffee5e2551b8> __ufshcd_wl_suspend+0x19c/0x480
<ffffffee5e255fb8> ufshcd_wl_runtime_suspend+0x3c/0x1d4
<ffffffee5dffd80c> scsi_runtime_suspend+0x78/0xc8
<ffffffee5df93580> __rpm_callback+0x94/0x3e0
<ffffffee5df90b0c> rpm_suspend+0x2d4/0x65c
<ffffffee5df91448> __pm_runtime_suspend+0x80/0x114
<ffffffee5dffd95c> scsi_runtime_idle+0x38/0x6c
<ffffffee5df912f4> rpm_idle+0x264/0x338
<ffffffee5df90f14> __pm_runtime_idle+0x80/0x110
<ffffffee5e24ce44> ufshcd_rtc_work+0x128/0x1e4
<ffffffee5d6e3a40> process_one_work+0x26c/0x650
<ffffffee5d6e65c8> worker_thread+0x260/0x3d8
<ffffffee5d6edec8> kthread+0x110/0x134
<ffffffee5d616b18> ret_from_fork+0x10/0x20

Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
Cc: <stable@vger.kernel.org> 6.9.x

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/ufs/core/ufshcd-priv.h | 5 +++++
 drivers/ufs/core/ufshcd.c      | 5 ++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

Comments

Bean Huo July 15, 2024, 9:29 a.m. UTC | #1
On Mon, 2024-07-15 at 14:38 +0800, peter.wang@mediatek.com wrote:
> -       ufshcd_rpm_get_sync(hba);
> +       /* Skip update RTC if RPM state is not RPM_ACTIVE */
> +       if (ufshcd_rpm_get_if_active(hba) <= 0)
> +               return;

I understand the intention of this 'return', but my understanding is
that you assume __ufshcd_wl_resume() will schedule the RTC update work.
However, the RPM status does not become RPM_ACTIVE until
__ufshcd_wl_resume() completes and
__update_runtime_status(dev, RPM_ACTIVE) is called.

If the RTC update work runs before __update_runtime_status(dev,
RPM_ACTIVE) and you return here, then no RTC work will be scheduled.

Do you think that is possible?
Bean Huo July 15, 2024, 9:34 a.m. UTC | #2
On Mon, 2024-07-15 at 14:38 +0800, peter.wang@mediatek.com wrote:
> @@ -8171,7 +8171,10 @@ static void ufshcd_update_rtc(struct ufs_hba *hba)
>          */
>         val = ts64.tv_sec - hba->dev_info.rtc_time_baseline;
>  
> -       ufshcd_rpm_get_sync(hba);
> +       /* Skip update RTC if RPM state is not RPM_ACTIVE */
> +       if (ufshcd_rpm_get_if_active(hba) <= 0)
> +               return;
> +
>         err = ufshcd_query_attr(hba, UPIU_QUERY_OPCODE_WRITE_ATTR, QUERY_ATTR_IDN_SECONDS_PASSED,
>                                 0, 0, &val);
>         ufshcd_rpm_put_sync(hba);

My suggestion would be to skip only the update here and still
reschedule the work for the next period, so that it does not affect
the suspend/resume flow you are worried about.
Bean Huo July 15, 2024, 9:37 a.m. UTC | #3
On Mon, 2024-07-15 at 14:38 +0800, peter.wang@mediatek.com wrote:
> Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
> Cc: <stable@vger.kernel.org> 6.9.x
> 
> Signed-off-by: Peter Wang <peter.wang@mediatek.com>

Please ignore my previous two emails. I see that this version only
skips the update, not the scheduling.

Reviewed-by: Bean Huo <beanhuo@micron.com>
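
The distinction drawn above, skipping the update versus skipping the
rescheduling, can be illustrated with a small user-space model. This is
only a sketch: struct sched_hba, sched_update_rtc() and sched_rtc_work()
are hypothetical names, and the rescheduling condition (device active
and a non-zero update period) is an assumption about the driver's work
function, not something shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the parts of struct ufs_hba used here. */
struct sched_hba {
	bool rpm_active;                   /* models "RPM state is RPM_ACTIVE" */
	bool dev_active;                   /* assumed rescheduling precondition */
	unsigned int rtc_update_period_ms; /* assumed rescheduling precondition */
	int rtc_updates;                   /* number of RTC writes performed */
	int reschedules;                   /* number of times the work was re-armed */
};

/* Models the patched update path: bail out early when not active. */
static void sched_update_rtc(struct sched_hba *hba)
{
	assert(hba);
	if (!hba->rpm_active)
		return;         /* skip this update only */
	hba->rtc_updates++;
}

/* Models the work function: the early return above does not affect re-arming. */
static void sched_rtc_work(struct sched_hba *hba)
{
	sched_update_rtc(hba);
	if (hba->dev_active && hba->rtc_update_period_ms)
		hba->reschedules++; /* stands in for schedule_delayed_work() */
}
```

In this model a skipped update costs one period at most, because the
work is re-armed regardless of whether the update ran.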
Peter Wang (王信友) July 15, 2024, 11:48 a.m. UTC | #4
On Mon, 2024-07-15 at 11:37 +0200, Bean Huo wrote:
>  On Mon, 2024-07-15 at 14:38 +0800, peter.wang@mediatek.com wrote:
> > Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
> > Cc: <stable@vger.kernel.org> 6.9.x
> > 
> > Signed-off-by: Peter Wang <peter.wang@mediatek.com>
> 
> ignore my previous two emails, I saw you have just skipped update, not
> skip schedule in this version.
> 
> Reviewed-by: Bean Huo <beanhuo@micron.com>

Hi Bean,

Yes, this version only skips the RTC update. Thanks for the review.

Peter
Bart Van Assche July 15, 2024, 5:17 p.m. UTC | #5
On 7/14/24 11:38 PM, peter.wang@mediatek.com wrote:
> There is a deadlock when runtime suspend waits for the flush of RTC work,
> and the RTC work calls ufshcd_rpm_get_sync to wait for runtime resume.

The above description is too brief - a description of how the fix works
is missing. Please include a more detailed description in future
patches.

Anyway:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
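
One way to picture the deadlock in the commit message is as a wait-for
cycle between two actors: the suspend path waits for the RTC work to
finish, and the RTC work waits for a resume that cannot complete while
that suspend is in progress. The sketch below is a user-space model of
that description only; the enum names and has_wait_cycle() are
hypothetical and do not correspond to kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* The two actors from the commit message; NOBODY means "not blocked". */
enum actor { SUSPEND_PATH, RTC_WORK, NUM_ACTORS, NOBODY = -1 };

/*
 * waits_on[a] names the actor that 'a' is blocked on, or NOBODY.
 * Returns true if following the chain from any actor leads back to it.
 */
static bool has_wait_cycle(const int waits_on[NUM_ACTORS])
{
	assert(waits_on);
	for (int start = 0; start < NUM_ACTORS; start++) {
		int cur = waits_on[start];

		for (int steps = 0; cur != NOBODY && steps < NUM_ACTORS; steps++) {
			if (cur == start)
				return true;
			cur = waits_on[cur];
		}
	}
	return false;
}
```

With the blocking ufshcd_rpm_get_sync() call the model has the edges
SUSPEND_PATH -> RTC_WORK and RTC_WORK -> SUSPEND_PATH, which form a
cycle. With a non-blocking check the RTC work waits on NOBODY and the
cycle disappears.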
Peter Wang (王信友) July 16, 2024, 2:03 a.m. UTC | #6
On Mon, 2024-07-15 at 10:17 -0700, Bart Van Assche wrote:
>  On 7/14/24 11:38 PM, peter.wang@mediatek.com wrote:
> > There is a deadlock when runtime suspend waits for the flush of RTC work,
> > and the RTC work calls ufshcd_rpm_get_sync to wait for runtime resume.
> 
> The above description is too brief - a description of how the fix works
> is missing. Please include a more detailed description in future
> patches.
> 
> Anyway:
> 
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>

Hi Bart,

I will improve this in future patches.

Thanks for the review.
Peter
Martin K. Petersen July 16, 2024, 2:57 a.m. UTC | #7
> There is a deadlock when runtime suspend waits for the flush of RTC
> work, and the RTC work calls ufshcd_rpm_get_sync to wait for runtime
> resume.

Applied to 6.11/scsi-staging, thanks!
Martin K. Petersen July 23, 2024, 1:23 a.m. UTC | #8
On Mon, 15 Jul 2024 14:38:31 +0800, peter.wang@mediatek.com wrote:

> There is a deadlock when runtime suspend waits for the flush of RTC work,
> and the RTC work calls ufshcd_rpm_get_sync to wait for runtime resume.
> 
> Here is deadlock backtrace
> kworker/0:1     D 4892.876354 10 10971 4859 0x4208060 0x8 10 0 120 670730152367
> ptr            f0ffff80c2e40000 0 1 0x00000001 0x000000ff 0x000000ff 0x000000ff
> <ffffffee5e71ddb0> __switch_to+0x1a8/0x2d4
> <ffffffee5e71e604> __schedule+0x684/0xa98
> <ffffffee5e71ea60> schedule+0x48/0xc8
> <ffffffee5e725f78> schedule_timeout+0x48/0x170
> <ffffffee5e71fb74> do_wait_for_common+0x108/0x1b0
> <ffffffee5e71efe0> wait_for_completion+0x44/0x60
> <ffffffee5d6de968> __flush_work+0x39c/0x424
> <ffffffee5d6decc0> __cancel_work_sync+0xd8/0x208
> <ffffffee5d6dee2c> cancel_delayed_work_sync+0x14/0x28
> <ffffffee5e2551b8> __ufshcd_wl_suspend+0x19c/0x480
> <ffffffee5e255fb8> ufshcd_wl_runtime_suspend+0x3c/0x1d4
> <ffffffee5dffd80c> scsi_runtime_suspend+0x78/0xc8
> <ffffffee5df93580> __rpm_callback+0x94/0x3e0
> <ffffffee5df90b0c> rpm_suspend+0x2d4/0x65c
> <ffffffee5df91448> __pm_runtime_suspend+0x80/0x114
> <ffffffee5dffd95c> scsi_runtime_idle+0x38/0x6c
> <ffffffee5df912f4> rpm_idle+0x264/0x338
> <ffffffee5df90f14> __pm_runtime_idle+0x80/0x110
> <ffffffee5e24ce44> ufshcd_rtc_work+0x128/0x1e4
> <ffffffee5d6e3a40> process_one_work+0x26c/0x650
> <ffffffee5d6e65c8> worker_thread+0x260/0x3d8
> <ffffffee5d6edec8> kthread+0x110/0x134
> <ffffffee5d616b18> ret_from_fork+0x10/0x20
> 
> [...]

Applied to 6.11/scsi-queue, thanks!

[1/1] ufs: core: fix deadlock when rtc update
      https://git.kernel.org/mkp/scsi/c/3911af778f20

Patch

diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h
index f42d99ce5bf1..81d6f0cfb148 100644
--- a/drivers/ufs/core/ufshcd-priv.h
+++ b/drivers/ufs/core/ufshcd-priv.h
@@ -329,6 +329,11 @@  static inline int ufshcd_rpm_get_sync(struct ufs_hba *hba)
 	return pm_runtime_get_sync(&hba->ufs_device_wlun->sdev_gendev);
 }
 
+static inline int ufshcd_rpm_get_if_active(struct ufs_hba *hba)
+{
+	return pm_runtime_get_if_active(&hba->ufs_device_wlun->sdev_gendev);
+}
+
 static inline int ufshcd_rpm_put_sync(struct ufs_hba *hba)
 {
 	return pm_runtime_put_sync(&hba->ufs_device_wlun->sdev_gendev);
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 46433ecf0c4d..2e5adfa0f757 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -8171,7 +8171,10 @@  static void ufshcd_update_rtc(struct ufs_hba *hba)
 	 */
 	val = ts64.tv_sec - hba->dev_info.rtc_time_baseline;
 
-	ufshcd_rpm_get_sync(hba);
+	/* Skip update RTC if RPM state is not RPM_ACTIVE */
+	if (ufshcd_rpm_get_if_active(hba) <= 0)
+		return;
+
 	err = ufshcd_query_attr(hba, UPIU_QUERY_OPCODE_WRITE_ATTR, QUERY_ATTR_IDN_SECONDS_PASSED,
 				0, 0, &val);
 	ufshcd_rpm_put_sync(hba);
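
The effect of replacing the blocking get with a conditional one can be
sketched in user space. The model below mirrors the documented return
convention of pm_runtime_get_if_active(): negative if runtime PM is
disabled, 0 if the device is not RPM_ACTIVE, and 1 with the usage
count incremented if it is. Everything else (struct model_dev, the
model_* functions, the rtc_writes counter) is hypothetical and exists
only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum model_rpm_status { MODEL_ACTIVE, MODEL_RESUMING, MODEL_SUSPENDED, MODEL_SUSPENDING };

/* Hypothetical stand-in for the runtime PM fields of a device. */
struct model_dev {
	enum model_rpm_status status;
	bool rpm_disabled;
	int usage_count;
	int rtc_writes; /* counts simulated RTC attribute writes */
};

/* Never blocks: takes a reference only if the device is already active. */
static int model_get_if_active(struct model_dev *dev)
{
	if (dev->rpm_disabled)
		return -EINVAL;
	if (dev->status != MODEL_ACTIVE)
		return 0;
	dev->usage_count++;
	return 1;
}

static void model_put(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Same shape as the patched hunk: skip the write unless a reference was taken. */
static bool model_update_rtc(struct model_dev *dev)
{
	if (model_get_if_active(dev) <= 0)
		return false;
	dev->rtc_writes++; /* stands in for the attribute write */
	model_put(dev);
	return true;
}
```

The "<= 0" test matters in this model: both the "not active" case (0)
and the "runtime PM disabled" case (negative) return before any
reference is taken, so the put is never reached without a matching get.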