Message ID | 20221208072520.26210-1-peter.wang@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v6] ufs: core: wlun suspend SSU/enter hibern8 fail recovery |
On Thu, Dec 08, 2022 at 03:25:20PM +0800, peter.wang@mediatek.com wrote:
> From: Peter Wang <peter.wang@mediatek.com>
>
> When SSU/enter hibern8 fail in wlun suspend flow, trigger error
> handler and return busy to break the suspend.
> If not, wlun runtime pm status become error and the consumer will
> stuck in runtime suspend status.
>
> Fixes: b294ff3e3449 ("scsi: ufs: core: Enable power management for wlun")
> Cc: stable@vger.kernel.org
> Signed-off-by: Peter Wang <peter.wang@mediatek.com>
> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  drivers/ufs/core/ufshcd.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index b1f59a5fe632..31ed3fdb5266 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -6070,6 +6070,14 @@ void ufshcd_schedule_eh_work(struct ufs_hba *hba)
>  	}
>  }
>
> +static void ufshcd_force_error_recovery(struct ufs_hba *hba)
> +{
> +	spin_lock_irq(hba->host->host_lock);
> +	hba->force_reset = true;
> +	ufshcd_schedule_eh_work(hba);
> +	spin_unlock_irq(hba->host->host_lock);
> +}
> +
>  static void ufshcd_clk_scaling_allow(struct ufs_hba *hba, bool allow)
>  {
>  	down_write(&hba->clk_scaling_lock);
> @@ -9049,6 +9057,15 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>
>  		if (!hba->dev_info.b_rpm_dev_flush_capable) {
>  			ret = ufshcd_set_dev_pwr_mode(hba, req_dev_pwr_mode);
> +			if (ret && pm_op != UFS_SHUTDOWN_PM) {
> +				/*
> +				 * If return err in suspend flow, IO will hang.
> +				 * Trigger error handler and break suspend for
> +				 * error recovery.
> +				 */
> +				ufshcd_force_error_recovery(hba);
> +				ret = -EBUSY;
> +			}
>  			if (ret)
>  				goto enable_scaling;
>  		}
> @@ -9060,6 +9077,15 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>  	 */
>  	check_for_bkops = !ufshcd_is_ufs_dev_deepsleep(hba);
>  	ret = ufshcd_link_state_transition(hba, req_link_state, check_for_bkops);
> +	if (ret && pm_op != UFS_SHUTDOWN_PM) {
> +		/*
> +		 * If return err in suspend flow, IO will hang.
> +		 * Trigger error handler and break suspend for
> +		 * error recovery.
> +		 */
> +		ufshcd_force_error_recovery(hba);
> +		ret = -EBUSY;
> +	}
>  	if (ret)
>  		goto set_dev_active;
>
> --
> 2.18.0
>

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him
a patch that has triggered this response. He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created. Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- This looks like a new version of a previously submitted patch, but you
  did not list below the --- line any changes from the previous version.
  Please read the section entitled "The canonical patch format" in the
  kernel file, Documentation/SubmittingPatches for what needs to be done
  here to properly describe this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot
Peter,

> When SSU/enter hibern8 fail in wlun suspend flow, trigger error
> handler and return busy to break the suspend. If not, wlun runtime pm
> status become error and the consumer will stuck in runtime suspend
> status.

Applied to 6.2/scsi-staging, thanks!
> Applied to 6.2/scsi-staging, thanks!
There is an interesting side effect of the patch in this iteration
(which I am not sure was present in the past iteration I tried):
If the device auto-suspends while a purge is running, the controller is
seemingly reset and thus the purge is aborted (with no patch at all
it hangs).
That might be ok behaviour though - it will just make it an explicit
requirement to disable runtime suspend during the management
operation.
localhost ~ # ufs-utils fl -t 6 -e -p /dev/bsg/ufs-bsg0
localhost ~ # ufs-utils attr -a -p /dev/bsg/ufs-bsg0 | grep bPurgeStatus
bPurgeStatus := 0x00
[ 25.801980] ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 2, result 2
[ 25.802002] ufs_device_wlun 0:0:0:49488: Sense Key : Not Ready [current]
[ 25.802009] ufs_device_wlun 0:0:0:49488: Add. Sense: No additional sense information
[ 25.802020] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_suspend failed: -16
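The requirement mentioned above, keeping runtime suspend off while a management operation such as purge runs, could possibly be met from user space through the generic `power/control` sysfs attribute, which accepts `on` (keep the device active) and `auto` (allow runtime PM). The following is only a minimal sketch under stated assumptions: the helper name `set_runtime_pm_control` is hypothetical, and the location of the attribute for the UFS device is not given in this thread, so the caller has to supply the path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "on" (block runtime suspend) or "auto"
 * (allow runtime suspend) to a power/control attribute.
 * control_path is an assumption supplied by the caller; the thread
 * does not name the sysfs path of the UFS device.
 * Returns 0 on success, -1 on any error.
 */
int set_runtime_pm_control(const char *control_path, const char *mode)
{
	FILE *f;
	int ok;

	assert(control_path != NULL && mode != NULL);

	/* Only the two documented values are accepted. */
	if (strcmp(mode, "on") != 0 && strcmp(mode, "auto") != 0)
		return -1;

	f = fopen(control_path, "w");
	if (!f)
		return -1;

	ok = fputs(mode, f) >= 0;
	/* fclose() flushes the buffer, so a failed write can surface here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A purge initiator might call this with `on` before setting fPurgeEnable and with `auto` once it has read the final bPurgeStatus, so that the suspend path is never entered in between.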
On Wed, 2022-12-21 at 08:00 +1100, Daniil Lunev wrote:
> > Applied to 6.2/scsi-staging, thanks!
>
> There is an interesting side effect of the patch in this iteration
> (which I am not sure was present in the past iteration I tried):
> If the device auto-suspends while a purge is running, the controller is
> seemingly reset and thus the purge is aborted (with no patch at all
> it hangs).
> That might be ok behaviour though - it will just make it an explicit
> requirement to disable runtime suspend during the management
> operation.
>

Hi Daniil,

I am not sure whether this is the same reason we get the SSU (sleep)
failure. But without this patch, system IO hangs when a purge is
ongoing, which is no better.

And I have another idea about rpm and purge.
To disable runtime suspend when a purge operation is ongoing:

1. Disable rpm when fPurgeEnable is set, poll until bPurgeStatus
   becomes 0, then re-enable rpm.
   But polling bPurgeStatus will extend the rpm timer, so we don't
   really need to disable rpm, right?

2. Check bPurgeStatus on entering runtime suspend, and return EBUSY to
   break the suspend if bPurgeStatus is not 0.
   This is the correct design for telling the rpm framework that the
   driver is busy with a purge and that suspending is inappropriate.
   But it should be similar to the current flow, which returns EBUSY
   when SSU fails?

So, with the current design, if the purge initiator does not want to
see rpm EBUSY, it should poll bPurgeStatus.
What do you think?

Thanks.
BR
Peter

> localhost ~ # ufs-utils fl -t 6 -e -p /dev/bsg/ufs-bsg0
> localhost ~ # ufs-utils attr -a -p /dev/bsg/ufs-bsg0 | grep bPurgeStatus
> bPurgeStatus := 0x00
>
> [ 25.801980] ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 2, result 2
> [ 25.802002] ufs_device_wlun 0:0:0:49488: Sense Key : Not Ready [current]
> [ 25.802009] ufs_device_wlun 0:0:0:49488: Add. Sense: No additional sense information
> [ 25.802020] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_suspend failed: -16
On Wed, Dec 21, 2022 at 4:59 PM Peter Wang (王信友) <peter.wang@mediatek.com> wrote:
> But without this patch, system IO hangs when a purge is ongoing,
> which is no better.

Yes, that is why I am just pointing this out as a matter of fact, not
as a bug. It is arguable whether resetting the controller in the
deadlock situation is the proper thing to do, but it might be the next
best thing, so I don't argue against that either.

> So, with the current design, if the purge initiator does not want to
> see rpm EBUSY, it should poll bPurgeStatus.
> What do you think?

I am actually not sure whether management operations extend the
timeout - they go through the bsg interface, and I am not sure it
properly resets the timeouts on all possible nexus interfaces; I need
to check that. But even if it does, there are two problems:

* If you make the kernel poll that parameter, the application level
  will miss the completion code (since after the completion is queried
  once, it returns Not Started afterwards).
* Application polling is race-prone. We set runtime suspend to 100ms,
  so depending on scheduling quirks it may miss the event.

--Daniil
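The first problem above (a kernel-side poll consuming the completion code before the application sees it) can be illustrated with a small user-space model. This is only a sketch of the behaviour as described in this thread: the type `purge_dev`, the function `read_purge_status`, and the status names and numeric values are all assumptions made for illustration, not driver code or a statement of the UFS specification.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative status codes; names and values are assumptions. */
enum purge_status {
	PURGE_IDLE = 0,		/* "Not Started" in the thread */
	PURGE_IN_PROGRESS = 1,
	PURGE_COMPLETED = 3,
};

/* Hypothetical stand-in for the device-side purge state. */
struct purge_dev {
	enum purge_status status;
};

/*
 * Models the read-once behaviour described above: a completion
 * status is reported to the first reader only, and the device then
 * goes back to idle.
 */
enum purge_status read_purge_status(struct purge_dev *dev)
{
	enum purge_status seen;

	assert(dev != NULL);

	seen = dev->status;
	if (seen == PURGE_COMPLETED)
		dev->status = PURGE_IDLE;

	return seen;
}
```

In this model, if a kernel poll happens to be the first caller after the purge finishes, it receives `PURGE_COMPLETED` and a later application query only sees `PURGE_IDLE`, which is indistinguishable from a purge that never ran.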
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index b1f59a5fe632..31ed3fdb5266 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6070,6 +6070,14 @@ void ufshcd_schedule_eh_work(struct ufs_hba *hba)
 	}
 }
 
+static void ufshcd_force_error_recovery(struct ufs_hba *hba)
+{
+	spin_lock_irq(hba->host->host_lock);
+	hba->force_reset = true;
+	ufshcd_schedule_eh_work(hba);
+	spin_unlock_irq(hba->host->host_lock);
+}
+
 static void ufshcd_clk_scaling_allow(struct ufs_hba *hba, bool allow)
 {
 	down_write(&hba->clk_scaling_lock);
@@ -9049,6 +9057,15 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 
 		if (!hba->dev_info.b_rpm_dev_flush_capable) {
 			ret = ufshcd_set_dev_pwr_mode(hba, req_dev_pwr_mode);
+			if (ret && pm_op != UFS_SHUTDOWN_PM) {
+				/*
+				 * If return err in suspend flow, IO will hang.
+				 * Trigger error handler and break suspend for
+				 * error recovery.
+				 */
+				ufshcd_force_error_recovery(hba);
+				ret = -EBUSY;
+			}
 			if (ret)
 				goto enable_scaling;
 		}
@@ -9060,6 +9077,15 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 	 */
 	check_for_bkops = !ufshcd_is_ufs_dev_deepsleep(hba);
 	ret = ufshcd_link_state_transition(hba, req_link_state, check_for_bkops);
+	if (ret && pm_op != UFS_SHUTDOWN_PM) {
+		/*
+		 * If return err in suspend flow, IO will hang.
+		 * Trigger error handler and break suspend for
+		 * error recovery.
+		 */
+		ufshcd_force_error_recovery(hba);
+		ret = -EBUSY;
+	}
 	if (ret)
 		goto set_dev_active;
 
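The error-handling pattern that the patch adds at both call sites can be modelled outside the kernel. The sketch below is a user-space approximation only: `hba_model`, `pm_op_model`, `force_error_recovery` and `suspend_step_result` are hypothetical names, the `eh_scheduled` flag merely stands in for `ufshcd_schedule_eh_work()`, and the host lock taken by the real helper is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counterpart of enum ufs_pm_op. */
enum pm_op_model { MODEL_RUNTIME_PM, MODEL_SYSTEM_PM, MODEL_SHUTDOWN_PM };

/* Hypothetical, heavily reduced stand-in for struct ufs_hba. */
struct hba_model {
	bool force_reset;	/* mirrors hba->force_reset */
	bool eh_scheduled;	/* stands in for ufshcd_schedule_eh_work() */
};

/* Models ufshcd_force_error_recovery() without the host lock. */
static void force_error_recovery(struct hba_model *hba)
{
	hba->force_reset = true;
	hba->eh_scheduled = true;
}

/*
 * Post-processes the return value of one suspend step the way the
 * patch does: any failure outside shutdown triggers error recovery
 * and is reported as -EBUSY so that the suspend is abandoned.
 */
int suspend_step_result(struct hba_model *hba, enum pm_op_model op, int ret)
{
	assert(hba != NULL);

	if (ret && op != MODEL_SHUTDOWN_PM) {
		force_error_recovery(hba);
		ret = -EBUSY;
	}

	return ret;
}
```

On Linux, `EBUSY` is 16, which matches the `ufshcd_wl_runtime_suspend failed: -16` line in the log quoted earlier in the thread. During shutdown the original error code is passed through unchanged and no recovery is scheduled.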