[RESEND,v2] scsi: ufs: clear doorbell for hibern8 errors when using ah8

Message ID 1634619427-171880-1-git-send-email-kwmad.kim@samsung.com (mailing list archive)
State Changes Requested

Commit Message

Kiwoong Kim Oct. 19, 2021, 4:57 a.m. UTC
Changes from v1:
* Change when pending commands are requeued

When a SCSI command is dispatched right after the host completes
all the pending requests and the UFS driver tries to ring a doorbell,
the host might still be entering hibern8.
If a hibern8 error occurs during that period, the doorbell
might not be zero and should be cleared.
However, the current ufshcd_err_handler goes directly to reset
without clearing the doorbell when the driver's link state is broken.
This patch requeues the pending commands after host reset.

Here is an actual symptom that I hit. Tag #17 was still
pending even after host reset, and then the block timer
expired.

exynos-ufs 11100000.ufs: ufshcd_check_errors: Auto Hibern8
Enter failed - status: 0x00000040, upmcrs: 0x00000001
..
host_regs: 00000050: b8671000 00000008 00020000 00000000
..
exynos-ufs 11100000.ufs: ufshcd_abort: Device abort task at tag 17

Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
---
 drivers/scsi/ufs/ufshcd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
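
The host_regs line in the log above can be read against the UFSHCI register map. The following is a minimal userspace sketch, not driver code. It assumes the dump row starts at offset 0x50 and holds four consecutive 32-bit registers, which would put the UTP Transfer Request List Door Bell Register (offset 0x58) in the third word. The helper names dump_word() and print_pending_tags() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* UFSHCI offset of the UTP Transfer Request List Door Bell Register. */
#define REG_UTRLDBR 0x58u

/*
 * Return the 32-bit word at register offset `reg` from one dump row that
 * starts at offset `row_base` and holds four consecutive words.
 * (Hypothetical helper; the row layout is an assumption about the log.)
 */
static uint32_t dump_word(const uint32_t row[4], uint32_t row_base, uint32_t reg)
{
	assert(reg >= row_base && reg < row_base + 16 && (reg - row_base) % 4 == 0);
	return row[(reg - row_base) / 4];
}

/* Print every tag whose doorbell bit is still set; return how many are set. */
static int print_pending_tags(uint32_t doorbell)
{
	int count = 0;

	for (int tag = 0; tag < 32; tag++) {
		if (doorbell & (UINT32_C(1) << tag)) {
			printf("tag %d still pending\n", tag);
			count++;
		}
	}
	return count;
}
```

Fed the row { 0xb8671000, 0x00000008, 0x00020000, 0x00000000 } with a base of 0x50, dump_word() returns 0x00020000 for the doorbell. Only bit 17 is set, which agrees with the "Device abort task at tag 17" message.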

Comments

Adrian Hunter Oct. 20, 2021, 3:33 p.m. UTC | #1
On 19/10/2021 07:57, Kiwoong Kim wrote:
> Changes from v1:
> * Change the time to requeue pended commands
> 
> When an scsi command is dispatched right after host complete
> all the pended requests and ufs driver tries to ring a doorbell,
> host might be still during entering into hibern8.
> If the hibern8 error occurrs during that period, the doorbell
> might not be zero and clearing it should have done.
> But, current ufshcd_err_handler goes directly to reset
> w/o clearing the doorbell when the driver's link state is broken.

So you mean HCE 1->0 does not clear the doorbell register?

> This patch is to requeue pended commands after host reset.

So you mean HCE 0->1 does clear the doorbell register?

> 
> Here's an actual symptom that I've faced. At the time, tag #17
> is still pended even after host reset. And then the block timer
> is expired.
> 
> exynos-ufs 11100000.ufs: ufshcd_check_errors: Auto Hibern8
> Enter failed - status: 0x00000040, upmcrs: 0x00000001
> ..
> host_regs: 00000050: b8671000 00000008 00020000 00000000
> ..
> exynos-ufs 11100000.ufs: ufshcd_abort: Device abort task at tag 17
> 
> Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
> ---
>  drivers/scsi/ufs/ufshcd.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 9faf02c..e5d4ef7 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -7136,8 +7136,10 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
>  	err = ufshcd_hba_enable(hba);
>  
>  	/* Establish the link again and restore the device */
> -	if (!err)
> +	if (!err) {
> +		ufshcd_retry_aborted_requests(hba);
>  		err = ufshcd_probe_hba(hba, false);
> +	}
>  
>  	if (err)
>  		dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
>

Can Guo Oct. 21, 2021, 6:10 a.m. UTC | #2
On 2021-10-19 12:57, Kiwoong Kim wrote:
> Changes from v1:
> * Change the time to requeue pended commands
> 
> When an scsi command is dispatched right after host complete
> all the pended requests and ufs driver tries to ring a doorbell,
> host might be still during entering into hibern8.
> If the hibern8 error occurrs during that period, the doorbell
> might not be zero and clearing it should have done.
> But, current ufshcd_err_handler goes directly to reset
> w/o clearing the doorbell when the driver's link state is broken.

         /*
          * Stop the host controller and complete the requests
          * cleared by h/w
          */
         ufshcd_hba_stop(hba);
         hba->silence_err_logs = true;
         ufshcd_complete_requests(hba);

Same question as Adrian's: ufshcd_hba_stop() should clear all doorbell
bits, since it disables the UFS host controller, and then
ufshcd_complete_requests() completes any pending requests, no?
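
To make the question concrete, here is a simplified userspace model, not the driver itself. It assumes completion is derived as "outstanding and not in the doorbell", and it makes the hardware behaviour on HCE 1->0 a parameter, since that is the point under discussion. All names (model_hba, model_hba_stop, model_complete_requests) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal stand-in for the driver's host state. */
struct model_hba {
	uint32_t doorbell;         /* models the doorbell register as read from hardware */
	uint32_t outstanding_reqs; /* tags issued by the driver and not yet completed */
};

/*
 * Model HCE 1->0. Whether the hardware clears the doorbell at this point
 * is exactly the open question, so the caller chooses the behaviour.
 */
static void model_hba_stop(struct model_hba *hba, bool hw_clears_doorbell)
{
	assert(hba);
	if (hw_clears_doorbell)
		hba->doorbell = 0;
}

/*
 * Complete every outstanding tag whose doorbell bit is clear and return
 * the bitmap of tags completed by this call.
 */
static uint32_t model_complete_requests(struct model_hba *hba)
{
	uint32_t completed;

	assert(hba);
	completed = ~hba->doorbell & hba->outstanding_reqs;
	hba->outstanding_reqs &= ~completed;
	return completed;
}
```

In this model, if the stop step leaves bit 17 set in the doorbell, tag 17 stays in outstanding_reqs after completion, which is the symptom reported. If the stop step clears the doorbell, the same completion call retires tag 17.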

> This patch is to requeue pended commands after host reset.
> 
> Here's an actual symptom that I've faced. At the time, tag #17
> is still pended even after host reset. And then the block timer
> is expired.
> 
> exynos-ufs 11100000.ufs: ufshcd_check_errors: Auto Hibern8
> Enter failed - status: 0x00000040, upmcrs: 0x00000001
> ..
> host_regs: 00000050: b8671000 00000008 00020000 00000000
> ..
> exynos-ufs 11100000.ufs: ufshcd_abort: Device abort task at tag 17
> 
> Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
> ---
>  drivers/scsi/ufs/ufshcd.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 9faf02c..e5d4ef7 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -7136,8 +7136,10 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
>  	err = ufshcd_hba_enable(hba);
> 
>  	/* Establish the link again and restore the device */
> -	if (!err)
> +	if (!err) {
> +		ufshcd_retry_aborted_requests(hba);
>  		err = ufshcd_probe_hba(hba, false);
> +	}
> 
>  	if (err)
>  		dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);

Kiwoong Kim Oct. 21, 2021, 11:04 a.m. UTC | #3
> On 19/10/2021 07:57, Kiwoong Kim wrote:
> > Changes from v1:
> > * Change the time to requeue pended commands
> >
> > When an scsi command is dispatched right after host complete all the
> > pended requests and ufs driver tries to ring a doorbell, host might be
> > still during entering into hibern8.
> > If the hibern8 error occurrs during that period, the doorbell might
> > not be zero and clearing it should have done.
> > But, current ufshcd_err_handler goes directly to reset w/o clearing
> > the doorbell when the driver's link state is broken.
> 
> So you mean HCE 1->0 does not clear the doorbell register?
> 
> > This patch is to requeue pended commands after host reset.
> 
> So you mean HCE 0->1 does clear the doorbell register?


I discussed this again, and he does not seem to read the description that way,
because he focused only on the term 'disable' in the description.
Instead, there is a vendor SFR to clear all the contexts.

Yes, the description contains the following, but I think he may have taken it to mean that the clearing is done when setting the bit to one.
--
When HCE is ‘0’ and software writes ‘1’, the host 
controller hardware shall execute the step 2 described in 7.1.1 of this standard, 
including >>>>> reset <<<<< of the host UTP and UIC layers.

Of course, some statements, such as 8.2.2 UIC Error Handling, seem to show that setting it to zero means clearing.
But going by the description alone, it is not quite clear to me.

Anyway, let me know how to deal with this.

Thanks.
Kiwoong Kim

Kiwoong Kim Oct. 21, 2021, 11:05 a.m. UTC | #4
> Same ask as Adrian did, ufshcd_hba_stop() should clear all doorbell bits
> as it disables UFS host controller, then ufshcd_complete_requests()
> completes any pending requests, no?

I replied to Adrian's feedback.

Thanks.
Kiwoong Kim

Adrian Hunter Oct. 25, 2021, 5:38 a.m. UTC | #5
On 21/10/2021 14:04, Kiwoong Kim wrote:
>> On 19/10/2021 07:57, Kiwoong Kim wrote:
>>> Changes from v1:
>>> * Change the time to requeue pended commands
>>>
>>> When an scsi command is dispatched right after host complete all the
>>> pended requests and ufs driver tries to ring a doorbell, host might be
>>> still during entering into hibern8.
>>> If the hibern8 error occurrs during that period, the doorbell might
>>> not be zero and clearing it should have done.
>>> But, current ufshcd_err_handler goes directly to reset w/o clearing
>>> the doorbell when the driver's link state is broken.
>>
>> So you mean HCE 1->0 does not clear the doorbell register?
>>
>>> This patch is to requeue pended commands after host reset.
>>
>> So you mean HCE 0->1 does clear the doorbell register?
> 
> 
> I talked about this again and maybe he didn't seem to accept its description like that
> Because he just focused on the term 'disable' in the description.
> Instead, there is an vendor sfr to clear all the contexts.
> 
> Yes, the description contains like this, but I think he could think it's done when setting one.
> --
> When HCE is ‘0’ and software writes ‘1’, the host 
> controller hardware shall execute the step 2 described in 7.1.1 of this standard, 
> including >>>>> reset <<<<< of the host UTP and UIC layers.
> 
> Of course, some statements, such as 8.2.2. UIC Error Handling, seems to show setting zero means clearing.
> But speaking the description, it's not quite clear to me.
> 
> Anyway, let me know how to deal with this.

It seems vendor-specific.  Perhaps export ufshcd_complete_requests()
and call it from vendor ops->hce_enable_notify(hba, POST_CHANGE) ?

Note that Bart submitted a patch to remove ufshcd_retry_aborted_requests().
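
The suggestion above (a vendor hook that runs at POST_CHANGE) could look roughly like the following. This is a self-contained userspace model under stated assumptions, not a patch: every type and function name prefixed with model_ is hypothetical, and the "vendor SFR clears all contexts" step is reduced to zeroing a modelled doorbell field.

```c
#include <assert.h>
#include <stdint.h>

enum model_notify_status { MODEL_PRE_CHANGE, MODEL_POST_CHANGE };

struct model_hba;

/* Hypothetical subset of a vendor operations table. */
struct model_vops {
	int (*hce_enable_notify)(struct model_hba *hba,
				 enum model_notify_status status);
};

struct model_hba {
	uint32_t doorbell;         /* modelled doorbell register */
	uint32_t outstanding_reqs; /* tags the driver still considers in flight */
	const struct model_vops *vops;
};

/* Stand-in for the completion helper that would be exported. */
static void model_complete_requests(struct model_hba *hba)
{
	/* Keep only tags whose doorbell bit is still set. */
	hba->outstanding_reqs &= hba->doorbell;
}

/* Hypothetical vendor hook: act only after the controller is enabled. */
static int model_vendor_hce_enable_notify(struct model_hba *hba,
					  enum model_notify_status status)
{
	if (status != MODEL_POST_CHANGE)
		return 0;

	/* Assumed vendor-specific step: an SFR write that clears all contexts. */
	hba->doorbell = 0;
	model_complete_requests(hba);
	return 0;
}

/* Model of enabling the host controller around the vendor notifications. */
static int model_hba_enable(struct model_hba *hba)
{
	int err = 0;

	assert(hba);
	if (hba->vops && hba->vops->hce_enable_notify)
		err = hba->vops->hce_enable_notify(hba, MODEL_PRE_CHANGE);
	if (err)
		return err;

	/* HCE 0->1 would be written here; assumed not to clear the doorbell. */

	if (hba->vops && hba->vops->hce_enable_notify)
		err = hba->vops->hce_enable_notify(hba, MODEL_POST_CHANGE);
	return err;
}
```

The point of this shape is that the core enable path stays unchanged for controllers that clear the doorbell on their own, while hardware that needs the extra clear handles it in its own hook.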

Patch

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 9faf02c..e5d4ef7 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -7136,8 +7136,10 @@  static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
 	err = ufshcd_hba_enable(hba);
 
 	/* Establish the link again and restore the device */
-	if (!err)
+	if (!err) {
+		ufshcd_retry_aborted_requests(hba);
 		err = ufshcd_probe_hba(hba, false);
+	}
 
 	if (err)
 		dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);