diff mbox series

[v3] scsi: ufs: correct ufshcd_shutdown flow

Message ID 20220721065833.26887-1-peter.wang@mediatek.com (mailing list archive)
State Changes Requested
Headers show
Series [v3] scsi: ufs: correct ufshcd_shutdown flow | expand

Commit Message

Peter Wang (王信友) July 21, 2022, 6:58 a.m. UTC
From: Peter Wang <peter.wang@mediatek.com>

Both ufshcd_shtdown and ufshcd_wl_shutdown could run concurrently.
And it have race condition when ufshcd_wl_shutdown on-going and
clock/power turn off by ufshcd_shutdown.

The normal case:
ufshcd_wl_shutdown -> ufshcd_shtdown
ufshcd_shtdown should turn off clock/power.

The abnormal case:
ufshcd_shtdown -> ufshcd_wl_shutdown
Wait ufshcd_wl_shutdown set device to power off and turn off clock/power.
If timeout happen, means device still in active mode, cannot turn off
clock/power directly. Skip and keep clock/power on in this case.

Also remove pm_runtime_get_sync because shutdown is focus on
turn off clock/power. We don't need turn on(resume) and turn off.
The second reason is ufshcd_wl_shutdown call ufshcd_rpm_get_sync
already, if ufshcd_shtdown wait ufshcd_wl_shutdown finish,
hba->dev is already resume and no need pm_runtime_get_sync.

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/ufs/core/ufshcd.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Comments

Bart Van Assche July 22, 2022, 9:12 p.m. UTC | #1
On 7/20/22 23:58, peter.wang@mediatek.com wrote:
> Both ufshcd_shtdown and ufshcd_wl_shutdown could run concurrently.
> And it have race condition when ufshcd_wl_shutdown on-going and
> clock/power turn off by ufshcd_shutdown.
> 
> The normal case:
> ufshcd_wl_shutdown -> ufshcd_shtdown
> ufshcd_shtdown should turn off clock/power.
> 
> The abnormal case:
> ufshcd_shtdown -> ufshcd_wl_shutdown

How can this happen since device_shutdown() iterates over devices in the 
opposite order in which these have been created?

Thanks,

Bart.
Stanley Jhu July 25, 2022, 2:11 a.m. UTC | #2
On Sat, Jul 23, 2022 at 5:15 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 7/20/22 23:58, peter.wang@mediatek.com wrote:
> > Both ufshcd_shtdown and ufshcd_wl_shutdown could run concurrently.
> > And it have race condition when ufshcd_wl_shutdown on-going and
> > clock/power turn off by ufshcd_shutdown.
> >
> > The normal case:
> > ufshcd_wl_shutdown -> ufshcd_shtdown
> > ufshcd_shtdown should turn off clock/power.
> >
> > The abnormal case:
> > ufshcd_shtdown -> ufshcd_wl_shutdown
>
> How can this happen since device_shutdown() iterates over devices in the
> opposite order in which these have been created?
>

Is it possible for more than one initiator to invoke
kernel_power_off() at the same time?

In this case, the order of device shutdown is not promised anymore
because devices_kset is manipulated simultaneously by multiple
initiators in device_shutdown().

> Thanks,
>
> Bart.
Peter Wang (王信友) July 25, 2022, 3:47 a.m. UTC | #3
On 7/23/22 5:12 AM, Bart Van Assche wrote:
> On 7/20/22 23:58, peter.wang@mediatek.com wrote:
>> Both ufshcd_shtdown and ufshcd_wl_shutdown could run concurrently.
>> And it have race condition when ufshcd_wl_shutdown on-going and
>> clock/power turn off by ufshcd_shutdown.
>>
>> The normal case:
>> ufshcd_wl_shutdown -> ufshcd_shtdown
>> ufshcd_shtdown should turn off clock/power.
>>
>> The abnormal case:
>> ufshcd_shtdown -> ufshcd_wl_shutdown
>
> How can this happen since device_shutdown() iterates over devices in 
> the opposite order in which these have been created?
>
> Thanks,
>
> Bart.

Hi Bart,

Because kernel_restart is export, and mediatek may call kernel_restart 
while shutdown is on going.
EXPORT_SYMBOL_GPL(kernel_restart);
kernel_restart -> kernel_restart_prepare -> device_shutdown

So, there may have two thread execute device_shutdown concurrently.
And the list_lock in device_shutdown function is not protect shutdown 
callback function,
caused two callback function(ufshcd_shutdown/ufshcd_wl_shutdown) could 
run concurrently.

Here is the error log that two thread in device_shutdown.
[37684.002227] [T1500641] platform +platform:112b0000.ufshci 
kpoc_charger: ufshcd-mtk 112b0000.ufshci: [name:core&]shutdown
[37684.002264] [T1600339] scsi +scsi:0:0:0:49488 charger_thread: 
ufs_device_wlun 0:0:0:49488: [name:core&]shutdown

Thanks.
BR
Peter
Bart Van Assche July 25, 2022, 5:07 p.m. UTC | #4
On 7/24/22 20:47, Peter Wang wrote:
> Because kernel_restart is export, and mediatek may call kernel_restart 
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Is this code upstream?

> while shutdown is on going.
> EXPORT_SYMBOL_GPL(kernel_restart);
> kernel_restart -> kernel_restart_prepare -> device_shutdown
> 
> So, there may have two thread execute device_shutdown concurrently.
> And the list_lock in device_shutdown function is not protect shutdown 
> callback function,
> caused two callback function(ufshcd_shutdown/ufshcd_wl_shutdown) could 
> run concurrently.
> 
> Here is the error log that two thread in device_shutdown.
> [37684.002227] [T1500641] platform +platform:112b0000.ufshci 
> kpoc_charger: ufshcd-mtk 112b0000.ufshci: [name:core&]shutdown
> [37684.002264] [T1600339] scsi +scsi:0:0:0:49488 charger_thread: 
> ufs_device_wlun 0:0:0:49488: [name:core&]shutdown

Hi Peter,

I had not yet taken a look at the kernel_restart() function. Now that I 
have taken a look, it seems to me that kernel_restart() calls must be 
serialized via the system_transition_mutex. Please make sure that the 
kernel_restart() calls are serialized.

Thanks,

Bart.
Peter Wang (王信友) July 26, 2022, 9:16 a.m. UTC | #5
Hi Bart,

On 7/26/22 1:07 AM, Bart Van Assche wrote:
> On 7/24/22 20:47, Peter Wang wrote:
>> Because kernel_restart is export, and mediatek may call kernel_restart 
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Is this code upstream?
>
No, it not upstream.
And sorry that I correct mediatek usage, it is kernel_power_off, not 
kernel_restart.

>> while shutdown is on going.
>> EXPORT_SYMBOL_GPL(kernel_restart);
>> kernel_restart -> kernel_restart_prepare -> device_shutdown
>>
>> So, there may have two thread execute device_shutdown concurrently.
>> And the list_lock in device_shutdown function is not protect shutdown 
>> callback function,
>> caused two callback function(ufshcd_shutdown/ufshcd_wl_shutdown) 
>> could run concurrently.
>>
>> Here is the error log that two thread in device_shutdown.
>> [37684.002227] [T1500641] platform +platform:112b0000.ufshci 
>> kpoc_charger: ufshcd-mtk 112b0000.ufshci: [name:core&]shutdown
>> [37684.002264] [T1600339] scsi +scsi:0:0:0:49488 charger_thread: 
>> ufs_device_wlun 0:0:0:49488: [name:core&]shutdown
>
> Hi Peter,
>
> I had not yet taken a look at the kernel_restart() function. Now that 
> I have taken a look, it seems to me that kernel_restart() calls must 
> be serialized via the system_transition_mutex. Please make sure that 
> the kernel_restart() calls are serialized.
>
> Thanks,
>
> Bart.

I think kernel_power_off could use system_transition_mutex to protect 
shutdown racing.
We will try it, Thanks for the suggestion.
And if no need wait, it could more simple in this patch. I will upload 
next version again.

Thanks.
Peter
diff mbox series

Patch

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index c7b337480e3e..47b639fd28b9 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -58,6 +58,9 @@ 
 /* Task management command timeout */
 #define TM_CMD_TIMEOUT	100 /* msecs */
 
+/* Shutdown wait devcie into power off timeout */
+#define UFS_SHUTDOWN_TIMEOUT	500 /* msecs */
+
 /* maximum number of retries for a general UIC command  */
 #define UFS_UIC_COMMAND_RETRIES 3
 
@@ -9461,10 +9464,15 @@  EXPORT_SYMBOL(ufshcd_runtime_resume);
  */
 int ufshcd_shutdown(struct ufs_hba *hba)
 {
-	if (ufshcd_is_ufs_dev_poweroff(hba) && ufshcd_is_link_off(hba))
-		goto out;
+	unsigned long timeout;
 
-	pm_runtime_get_sync(hba->dev);
+	/* Wait ufshcd_wl_shutdown clear ufs state */
+	timeout = jiffies + msecs_to_jiffies(UFS_SHUTDOWN_TIMEOUT);
+	while (!ufshcd_is_ufs_dev_poweroff(hba) || !ufshcd_is_link_off(hba)) {
+		if (time_after(jiffies, timeout))
+			goto out;
+		msleep(1);
+	}
 
 	ufshcd_suspend(hba);
 out: