diff mbox series

[v1] ufs: core: fix deadlock when rtc update

Message ID 20240712094355.21572-1-peter.wang@mediatek.com (mailing list archive)
State New
Headers show
Series [v1] ufs: core: fix deadlock when rtc update | expand

Commit Message

Peter Wang (王信友) July 12, 2024, 9:43 a.m. UTC
From: Peter Wang <peter.wang@mediatek.com>

Three have deadlock when runtime suspend wait flush rtc work,
and rtc work call ufshcd_rpm_put_sync to wait runtime resume.

Here is deadlock backtrace
kworker/0:1     D 4892.876354 10 10971 4859 0x4208060 0x8 10 0 120 670730152367
ptr            f0ffff80c2e40000 0 1 0x00000001 0x000000ff 0x000000ff 0x000000ff
<ffffffee5e71ddb0> __switch_to+0x1a8/0x2d4
<ffffffee5e71e604> __schedule+0x684/0xa98
<ffffffee5e71ea60> schedule+0x48/0xc8
<ffffffee5e725f78> schedule_timeout+0x48/0x170
<ffffffee5e71fb74> do_wait_for_common+0x108/0x1b0
<ffffffee5e71efe0> wait_for_completion+0x44/0x60
<ffffffee5d6de968> __flush_work+0x39c/0x424
<ffffffee5d6decc0> __cancel_work_sync+0xd8/0x208
<ffffffee5d6dee2c> cancel_delayed_work_sync+0x14/0x28
<ffffffee5e2551b8> __ufshcd_wl_suspend+0x19c/0x480
<ffffffee5e255fb8> ufshcd_wl_runtime_suspend+0x3c/0x1d4
<ffffffee5dffd80c> scsi_runtime_suspend+0x78/0xc8
<ffffffee5df93580> __rpm_callback+0x94/0x3e0
<ffffffee5df90b0c> rpm_suspend+0x2d4/0x65c
<ffffffee5df91448> __pm_runtime_suspend+0x80/0x114
<ffffffee5dffd95c> scsi_runtime_idle+0x38/0x6c
<ffffffee5df912f4> rpm_idle+0x264/0x338
<ffffffee5df90f14> __pm_runtime_idle+0x80/0x110
<ffffffee5e24ce44> ufshcd_rtc_work+0x128/0x1e4
<ffffffee5d6e3a40> process_one_work+0x26c/0x650
<ffffffee5d6e65c8> worker_thread+0x260/0x3d8
<ffffffee5d6edec8> kthread+0x110/0x134
<ffffffee5d616b18> ret_from_fork+0x10/0x20

Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
Cc: <stable@vger.kernel.org> 6.6.x

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/ufs/core/ufshcd.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Bean Huo July 12, 2024, 9:51 a.m. UTC | #1
On Fri, 2024-07-12 at 17:43 +0800, peter.wang@mediatek.com wrote:
> From: Peter Wang <peter.wang@mediatek.com>
> 
> Three have deadlock when runtime suspend wait flush rtc work,
> and rtc work call ufshcd_rpm_put_sync to wait runtime resume.
Reviewed-by: Bean Huo <beanhuo@micron.com>
Avri Altman July 12, 2024, 10:33 a.m. UTC | #2
> @@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct work_struct
> *work)
> 
>         hba = container_of(to_delayed_work(work), struct ufs_hba,
> ufs_rtc_update_work);
Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works?
And remove it in the 2nd if clause?

Thanks,
Avri
> 
> -        /* Update RTC only when there are no requests in progress and UFSHCI is
> operational */
> -       if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state ==
> UFSHCD_STATE_OPERATIONAL)
> +        /*
> +         * Update RTC only when
> +         * 1. there are no requests in progress
> +         * 2. UFSHCI is operational
> +         * 3. pm operation is not in progress
> +         */
> +       if (!ufshcd_is_ufs_dev_busy(hba) &&
> +           hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL &&
> +           !hba->pm_op_in_progress)
>                 ufshcd_update_rtc(hba);
> 
>         if (ufshcd_is_ufs_dev_active(hba) && hba->dev_info.rtc_update_period)
> --
> 2.18.0
Bean Huo July 12, 2024, 10:49 a.m. UTC | #3
On Fri, 2024-07-12 at 10:33 +0000, Avri Altman wrote:
> > @@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct
> > work_struct
> > *work)
> > 
> >          hba = container_of(to_delayed_work(work), struct ufs_hba,
> > ufs_rtc_update_work);
> Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works?
> And remove it in the 2nd if clause?

Avri, 

we need to reschedule next time work in the below code.  if return,
cannot.

whatelse I missed?

kind regards, 
Bean
Avri Altman July 12, 2024, 12:31 p.m. UTC | #4
> 
> On Fri, 2024-07-12 at 10:33 +0000, Avri Altman wrote:
> > > @@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct
> > > work_struct
> > > *work)
> > >
> > >          hba = container_of(to_delayed_work(work), struct ufs_hba,
> > > ufs_rtc_update_work);
> > Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works?
> > And remove it in the 2nd if clause?
> 
> Avri,
> 
> we need to reschedule next time work in the below code.  if return, cannot.
> 
> whatelse I missed?
a) If (!ufshcd_is_ufs_dev_active(hba)) - will not schedule ?
b) schedule on next __ufshcd_wl_resume?

> 
> kind regards,
> Bean
Bart Van Assche July 12, 2024, 5:34 p.m. UTC | #5
On 7/12/24 2:43 AM, peter.wang@mediatek.com wrote:
> Three have deadlock when runtime suspend wait flush rtc work,
> and rtc work call ufshcd_rpm_put_sync to wait runtime resume.

"Three have"? The above description is very hard to understand. Please
improve it.

> -	 /* Update RTC only when there are no requests in progress and UFSHCI is operational */
> -	if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL)
> +	 /*
> +	  * Update RTC only when
> +	  * 1. there are no requests in progress
> +	  * 2. UFSHCI is operational
> +	  * 3. pm operation is not in progress
> +	  */
> +	if (!ufshcd_is_ufs_dev_busy(hba) &&
> +	    hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL &&
> +	    !hba->pm_op_in_progress)
>   		ufshcd_update_rtc(hba);
>   
>   	if (ufshcd_is_ufs_dev_active(hba) && hba->dev_info.rtc_update_period)

The above seems racy to me. I don't think there is any mechanism that
prevents that hba->pm_op_in_progress is set after it has been checked
and before ufshcd_update_rtc() is called. Has it been considered to add
an ufshcd_rpm_get_sync_nowait() call before the hba->pm_op_in_progress
check and a ufshcd_rpm_put_sync() call after the ufshcd_update_rtc()
call?

Thanks,

Bart.
Bean Huo July 13, 2024, 8:17 p.m. UTC | #6
On Fri, 2024-07-12 at 12:31 +0000, Avri Altman wrote:
> > > >           hba = container_of(to_delayed_work(work), struct
> > > > ufs_hba,
> > > > ufs_rtc_update_work);
> > > Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works?
> > > And remove it in the 2nd if clause?
> > 
> > Avri,
> > 
> > we need to reschedule next time work in the below code.  if return,
> > cannot.
> > 
> > whatelse I missed?
> a) If (!ufshcd_is_ufs_dev_active(hba)) - will not schedule ?
> b) schedule on next __ufshcd_wl_resume?

hba->pm_op_in_progress is true during __ufshcd_wl_resume(), will not
schedule update work.
Bean Huo July 14, 2024, 10:37 p.m. UTC | #7
On Fri, 2024-07-12 at 10:34 -0700, Bart Van Assche wrote:
> > -        /* Update RTC only when there are no requests in progress
> > and UFSHCI is operational */
> > -       if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state ==
> > UFSHCD_STATE_OPERATIONAL)
> > +        /*
> > +         * Update RTC only when
> > +         * 1. there are no requests in progress
> > +         * 2. UFSHCI is operational
> > +         * 3. pm operation is not in progress
> > +         */
> > +       if (!ufshcd_is_ufs_dev_busy(hba) &&
> > +           hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL &&
> > +           !hba->pm_op_in_progress)
> >                 ufshcd_update_rtc(hba);
> >    
> >         if (ufshcd_is_ufs_dev_active(hba) && hba-
> > >dev_info.rtc_update_period)
> 
> The above seems racy to me. I don't think there is any mechanism that
> prevents that hba->pm_op_in_progress is set after it has been checked
> and before ufshcd_update_rtc() is called. Has it been considered to
> add
> an ufshcd_rpm_get_sync_nowait() call before the hba-
> >pm_op_in_progress
> check and a ufshcd_rpm_put_sync() call after the ufshcd_update_rtc()
> call?
> 
> Thanks,
> 
> Bart.

Bart,

do you want this:

diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-
priv.h
index ce36154ce963..2b74d6329b9d 100644
--- a/drivers/ufs/core/ufshcd-priv.h
+++ b/drivers/ufs/core/ufshcd-priv.h
@@ -311,6 +311,25 @@ static inline int ufshcd_update_ee_usr_mask(struct
ufs_hba *hba,
                                        &hba->ee_drv_mask, set, clr);
 }
 
+static inline int ufshcd_rpm_get_sync_nowait(struct ufs_hba *hba)
+{
+       int ret = 0;
+       struct device *dev = &hba->ufs_device_wlun->sdev_gendev;
+
+       pm_runtime_get_noresume(dev);
+
+       /* Check if the device is already active */
+       if (pm_runtime_active(dev))
+               return 0;
+
+       /* Attempt to resume the device without blocking */
+       ret = pm_request_resume(dev);
+       if (ret < 0)
+               pm_runtime_put_noidle(dev);
+
+       return ret;
+}
+
 static inline int ufshcd_rpm_get_sync(struct ufs_hba *hba)
 {
        return pm_runtime_get_sync(&hba->ufs_device_wlun->sdev_gendev);
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index bea00e069e9a..1b7fc4ce9e5c 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -8209,10 +8209,8 @@ static void ufshcd_update_rtc(struct ufs_hba
*hba)
         */
        val = ts64.tv_sec - hba->dev_info.rtc_time_baseline;
 
-       ufshcd_rpm_get_sync(hba);
        err = ufshcd_query_attr(hba, UPIU_QUERY_OPCODE_WRITE_ATTR,
QUERY_ATTR_IDN_SECONDS_PASSED,
                                0, 0, &val);
-       ufshcd_rpm_put_sync(hba);
 
        if (err)
                dev_err(hba->dev, "%s: Failed to update rtc %d\n",
__func__, err);
@@ -8226,10 +8224,14 @@ static void ufshcd_rtc_work(struct work_struct
*work)
 
        hba = container_of(to_delayed_work(work), struct ufs_hba,
ufs_rtc_update_work);
 
+       if (ufshcd_rpm_get_sync_nowait(hba))
+               goto out;
+
         /* Update RTC only when there are no requests in progress and
UFSHCI is operational */
        if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state ==
UFSHCD_STATE_OPERATIONAL)
                ufshcd_update_rtc(hba);
-
+       ufshcd_rpm_put_sync(hba);
+out:
        if (ufshcd_is_ufs_dev_active(hba) && hba-
>dev_info.rtc_update_period)
                schedule_delayed_work(&hba->ufs_rtc_update_work,
                                      msecs_to_jiffies(hba-
>dev_info.rtc_update_period));
(END)




or can we change cancel_delayed_work_sync(&hba->ufs_rtc_update_work);
to cancel_delayed_work(&hba->ufs_rtc_update_work);  ??



kind regards,
Bean
Peter Wang (王信友) July 15, 2024, 6:27 a.m. UTC | #8
On Fri, 2024-07-12 at 12:31 +0000, Avri Altman wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  > 
> > On Fri, 2024-07-12 at 10:33 +0000, Avri Altman wrote:
> > > > @@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct
> > > > work_struct
> > > > *work)
> > > >
> > > >          hba = container_of(to_delayed_work(work), struct
> ufs_hba,
> > > > ufs_rtc_update_work);
> > > Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works?
> > > And remove it in the 2nd if clause?
> > 
> > Avri,
> > 
> > we need to reschedule next time work in the below code.  if return,
> cannot.
> > 
> > whatelse I missed?
> a) If (!ufshcd_is_ufs_dev_active(hba)) - will not schedule ?
> b) schedule on next __ufshcd_wl_resume?
> 

Hi Avri,

Yes, if dev is not active (RPM state is not RPM_ACTIVE), will not
schedule rtc work and schedule on next __ufshcd_wl_resume.

Thanks.
Peter


> > 
> > kind regards,
> > Bean
>
Peter Wang (王信友) July 15, 2024, 6:29 a.m. UTC | #9
On Fri, 2024-07-12 at 10:34 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 7/12/24 2:43 AM, peter.wang@mediatek.com wrote:
> > Three have deadlock when runtime suspend wait flush rtc work,
> > and rtc work call ufshcd_rpm_put_sync to wait runtime resume.
> 
> "Three have"? The above description is very hard to understand.
> Please
> improve it.

Hi Bart,

Sorry, will improve the description next version.

> 
> > - /* Update RTC only when there are no requests in progress and
> UFSHCI is operational */
> > -if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state ==
> UFSHCD_STATE_OPERATIONAL)
> > + /*
> > +  * Update RTC only when
> > +  * 1. there are no requests in progress
> > +  * 2. UFSHCI is operational
> > +  * 3. pm operation is not in progress
> > +  */
> > +if (!ufshcd_is_ufs_dev_busy(hba) &&
> > +    hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL &&
> > +    !hba->pm_op_in_progress)
> >   ufshcd_update_rtc(hba);
> >   
> >   if (ufshcd_is_ufs_dev_active(hba) && hba-
> >dev_info.rtc_update_period)
> 
> The above seems racy to me. I don't think there is any mechanism that
> prevents that hba->pm_op_in_progress is set after it has been checked
> and before ufshcd_update_rtc() is called. Has it been considered to
> add
> an ufshcd_rpm_get_sync_nowait() call before the hba-
> >pm_op_in_progress
> check and a ufshcd_rpm_put_sync() call after the ufshcd_update_rtc()
> call?
> 
> Thanks,
> 
> Bart.

Yes, check pm_op_in_progress still cannot guarantee the absence of
race 
conditions. But use ufshcd_rpm_get_sync_nowait might be a bit
complicated.
How about use ufshcd_rpm_get_if_active? I will update next version.

Thanks.
Peter
Peter Wang (王信友) July 15, 2024, 6:31 a.m. UTC | #10
On Mon, 2024-07-15 at 00:37 +0200, Bean Huo wrote:
> 
> or can we change cancel_delayed_work_sync(&hba->ufs_rtc_update_work);
> to cancel_delayed_work(&hba->ufs_rtc_update_work);  ??
> 
> 
> 
> kind regards,
> Bean

Hi Bean,

Using cancel_delayed_work instead of cancel_delayed_work_sync could
work, 
but it may lead to wasted time resume, RTC update, and suspend again.
It has increased the system's power consumption.

Thanks.
Peter
diff mbox series

Patch

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 46433ecf0c4d..bfcf2d468b5e 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -8188,8 +8188,15 @@  static void ufshcd_rtc_work(struct work_struct *work)
 
 	hba = container_of(to_delayed_work(work), struct ufs_hba, ufs_rtc_update_work);
 
-	 /* Update RTC only when there are no requests in progress and UFSHCI is operational */
-	if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL)
+	 /*
+	  * Update RTC only when
+	  * 1. there are no requests in progress
+	  * 2. UFSHCI is operational
+	  * 3. pm operation is not in progress
+	  */
+	if (!ufshcd_is_ufs_dev_busy(hba) &&
+	    hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL &&
+	    !hba->pm_op_in_progress)
 		ufshcd_update_rtc(hba);
 
 	if (ufshcd_is_ufs_dev_active(hba) && hba->dev_info.rtc_update_period)