diff mbox series

[v1] scsi: ufs: clear doorbell for hibern8 errors when using ah8

Message ID 1633592411-129911-1-git-send-email-kwmad.kim@samsung.com (mailing list archive)
State Superseded
Headers show
Series [v1] scsi: ufs: clear doorbell for hibern8 errors when using ah8 | expand

Commit Message

Kiwoong Kim Oct. 7, 2021, 7:40 a.m. UTC
When an scsi command is dispatched right after host complete
all the pending requests goes to idle and ufs driver tries
to ring a doorbell, host might be still during entering into
hibern8. If an error occurrs during that period, the doorbell
might not be zero. In this case, clearing it should be needed
to requeue its command.
Currently, ufshcd_err_handler goes directly to reset when the
driver's link state is broken. This patch is to make it clear
doorbells in the case. In the situation, communication would
be disabled, so TM isn't necessary or we can say it's not
available.

Here's an actual symptom that I've faced. At the time, tag #17
is still pended even after host reset. And then the block timer
is expired.

exynos-ufs 11100000.ufs: ufshcd_check_errors: Auto Hibern8
Enter failed - status: 0x00000040, upmcrs: 0x00000001
..
host_regs: 00000050: b8671000 00000008 00020000 00000000
..
exynos-ufs 11100000.ufs: ufshcd_abort: Device abort task at tag 17

Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
---
 drivers/scsi/ufs/ufshcd.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Bart Van Assche Oct. 7, 2021, 9:47 p.m. UTC | #1
On 10/7/21 12:40 AM, Kiwoong Kim wrote:
> When an scsi command is dispatched right after host complete
> all the pending requests goes to idle and ufs driver tries
> to ring a doorbell, host might be still during entering into
> hibern8. If an error occurrs during that period, the doorbell
> might not be zero. In this case, clearing it should be needed
> to requeue its command.
> Currently, ufshcd_err_handler goes directly to reset when the
> driver's link state is broken. This patch is to make it clear
> doorbells in the case. In the situation, communication would
> be disabled, so TM isn't necessary or we can say it's not
> available.

The above text is too hard to comprehend. Please make it more clear. As 
an example, in the first sentence, what does "goes to idle" apply to? 
Does it apply to "SCSI command", "pending requests" or something else?

What mechanism does "If an error occurs" refer to? A SCSI command 
processing error, a link error or another type of error?

> Here's an actual symptom that I've faced. At the time, tag #17
> is still pended even after host reset. And then the block timer
> is expired.

pended -> pending.

> exynos-ufs 11100000.ufs: ufshcd_check_errors: Auto Hibern8
> Enter failed - status: 0x00000040, upmcrs: 0x00000001
> ..
> host_regs: 00000050: b8671000 00000008 00020000 00000000
> ..
> exynos-ufs 11100000.ufs: ufshcd_abort: Device abort task at tag 17

The relationship between the example and the description above the 
example is not clear. If a command is pending before the error handler 
starts, aborting that command fails and the host is not reset then the 
command will still be pending after the error handler has finished.

> @@ -6138,7 +6139,12 @@ static void ufshcd_err_handler(struct Scsi_Host *host)
>   	     (hba->saved_uic_err & (UFSHCD_UIC_DL_NAC_RECEIVED_ERROR |
>   				    UFSHCD_UIC_DL_TCx_REPLAY_ERROR)))) {
>   		needs_reset = true;
> -		goto do_reset;
> +		spin_lock_irqsave(hba->host->host_lock, flags);
> +		if (!hba->ahit && ufshcd_is_link_broken(hba))
> +			link_broken_in_ah8 = true;
> +		spin_unlock_irqrestore(hba->host->host_lock, flags);
> +		if (!link_broken_in_ah8)
> +			goto do_reset;
>   	}
>   

My understanding is that hba->ahit == 0 means that auto-hibernation is 
disabled. Hence, the above code sets 'link_broken_in_ah8' only if 
auto-hibernation is disabled. So what does the name of the variable 
'link_broken_in_ah8' mean?

> @@ -6168,6 +6174,9 @@ static void ufshcd_err_handler(struct Scsi_Host *host)
>   		}
>   	}
>   
> +	if (link_broken_in_ah8)
> +		goto lock_skip_pending_xfer_clear;
> +
>   	/* Clear pending task management requests */
>   	for_each_set_bit(tag, &hba->outstanding_tasks, hba->nutmrs) {
>   		if (ufshcd_clear_tm_cmd(hba, tag)) {

Why is skipping the ufshcd_clear_tm_cmd() calls useful in this case?

Thanks,

Bart.
Kiwoong Kim Oct. 8, 2021, 2:02 a.m. UTC | #2
> On 10/7/21 12:40 AM, Kiwoong Kim wrote:
> > When an scsi command is dispatched right after host complete all the
> > pending requests goes to idle and ufs driver tries to ring a doorbell,
> > host might be still during entering into hibern8. If an error occurrs
> > during that period, the doorbell might not be zero. In this case,
> > clearing it should be needed to requeue its command.
> > Currently, ufshcd_err_handler goes directly to reset when the driver's
> > link state is broken. This patch is to make it clear doorbells in the
> > case. In the situation, communication would be disabled, so TM isn't
> > necessary or we can say it's not available.
> 
> The above text is too hard to comprehend. Please make it more clear. As an
> example, in the first sentence, what does "goes to idle" apply to?
> Does it apply to "SCSI command", "pending requests" or something else?

My 'goes to idle' means a state of no pended UTP requests.
I write 'scsi command' because the symptom that I saw is related with scsi command.

> 
> What mechanism does "If an error occurs" refer to? A SCSI command
> processing error, a link error or another type of error?

Hibern8 errors written in the title.

> 
> > Here's an actual symptom that I've faced. At the time, tag #17 is
> > still pended even after host reset. And then the block timer is
> > expired.
> 
> pended -> pending.

Got it.
> 
> > exynos-ufs 11100000.ufs: ufshcd_check_errors: Auto Hibern8 Enter
> > failed - status: 0x00000040, upmcrs: 0x00000001 ..
> > host_regs: 00000050: b8671000 00000008 00020000 00000000 ..
> > exynos-ufs 11100000.ufs: ufshcd_abort: Device abort task at tag 17
> 
> The relationship between the example and the description above the example
> is not clear. If a command is pending before the error handler starts,
> aborting that command fails and the host is not reset then the command
> will still be pending after the error handler has finished.
> 
> > @@ -6138,7 +6139,12 @@ static void ufshcd_err_handler(struct Scsi_Host
> *host)
> >   	     (hba->saved_uic_err & (UFSHCD_UIC_DL_NAC_RECEIVED_ERROR |
> >   				    UFSHCD_UIC_DL_TCx_REPLAY_ERROR)))) {
> >   		needs_reset = true;
> > -		goto do_reset;
> > +		spin_lock_irqsave(hba->host->host_lock, flags);
> > +		if (!hba->ahit && ufshcd_is_link_broken(hba))
> > +			link_broken_in_ah8 = true;
> > +		spin_unlock_irqrestore(hba->host->host_lock, flags);
> > +		if (!link_broken_in_ah8)
> > +			goto do_reset;
> >   	}
> >
> 
> My understanding is that hba->ahit == 0 means that auto-hibernation is
> disabled. Hence, the above code sets 'link_broken_in_ah8' only if auto-
> hibernation is disabled. So what does the name of the variable
> 'link_broken_in_ah8' mean?

Mistake. And while I'm commenting, I get a better idea and will revise this patch.

> 
> > @@ -6168,6 +6174,9 @@ static void ufshcd_err_handler(struct Scsi_Host
> *host)
> >   		}
> >   	}
> >
> > +	if (link_broken_in_ah8)
> > +		goto lock_skip_pending_xfer_clear;
> > +
> >   	/* Clear pending task management requests */
> >   	for_each_set_bit(tag, &hba->outstanding_tasks, hba->nutmrs) {
> >   		if (ufshcd_clear_tm_cmd(hba, tag)) {
> 
> Why is skipping the ufshcd_clear_tm_cmd() calls useful in this case?
> 
> Thanks,
> 
> Bart.
>
diff mbox series

Patch

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 9faf02c..13f406d 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6076,6 +6076,7 @@  static void ufshcd_err_handler(struct Scsi_Host *host)
 	int err = 0, pmc_err;
 	int tag;
 	bool needs_reset = false, needs_restore = false;
+	bool link_broken_in_ah8 = false;
 
 	down(&hba->host_sem);
 	spin_lock_irqsave(hba->host->host_lock, flags);
@@ -6138,7 +6139,12 @@  static void ufshcd_err_handler(struct Scsi_Host *host)
 	     (hba->saved_uic_err & (UFSHCD_UIC_DL_NAC_RECEIVED_ERROR |
 				    UFSHCD_UIC_DL_TCx_REPLAY_ERROR)))) {
 		needs_reset = true;
-		goto do_reset;
+		spin_lock_irqsave(hba->host->host_lock, flags);
+		if (!hba->ahit && ufshcd_is_link_broken(hba))
+			link_broken_in_ah8 = true;
+		spin_unlock_irqrestore(hba->host->host_lock, flags);
+		if (!link_broken_in_ah8)
+			goto do_reset;
 	}
 
 	/*
@@ -6168,6 +6174,9 @@  static void ufshcd_err_handler(struct Scsi_Host *host)
 		}
 	}
 
+	if (link_broken_in_ah8)
+		goto lock_skip_pending_xfer_clear;
+
 	/* Clear pending task management requests */
 	for_each_set_bit(tag, &hba->outstanding_tasks, hba->nutmrs) {
 		if (ufshcd_clear_tm_cmd(hba, tag)) {