diff mbox series

[v1,1/3] scsi: ufs: fix empty check of error history

Message ID 1578147968-30938-2-git-send-email-stanley.chu@mediatek.com (mailing list archive)
State Mainlined
Commit 645728a6448fece4817acdb7efba7c19e5c3e311
Headers show
Series scsi: ufs: fix error history and complete device reset history | expand

Commit Message

Stanley Chu Jan. 4, 2020, 2:26 p.m. UTC
Currently checking if an error history element is empty or
not is by its "value". In most cases, value is error code.

However this checking is not correct because some errors or
events do not specify any values in error history so values
remain as 0, and this will lead to incorrect empty checking.

Fix it by checking "timestamp" instead of "value" because
timestamp will be always assigned for all history elements

Cc: Alim Akhtar <alim.akhtar@samsung.com>
Cc: Asutosh Das <asutoshd@codeaurora.org>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Bean Huo <beanhuo@micron.com>
Cc: Can Guo <cang@codeaurora.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
---
 drivers/scsi/ufs/ufshcd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Asutosh Das (asd) Jan. 6, 2020, 5:34 p.m. UTC | #1
On 2020-01-04 06:26, Stanley Chu wrote:
> Currently checking if an error history element is empty or
> not is by its "value". In most cases, value is error code.
> 
> However this checking is not correct because some errors or
> events do not specify any values in error history so values
> remain as 0, and this will lead to incorrect empty checking.
> 
> Fix it by checking "timestamp" instead of "value" because
> timestamp will be always assigned for all history elements
> 
> Cc: Alim Akhtar <alim.akhtar@samsung.com>
> Cc: Asutosh Das <asutoshd@codeaurora.org>
> Cc: Avri Altman <avri.altman@wdc.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Bean Huo <beanhuo@micron.com>
> Cc: Can Guo <cang@codeaurora.org>
> Cc: Matthias Brugger <matthias.bgg@gmail.com>
> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
> ---
>  drivers/scsi/ufs/ufshcd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 1b97f2dc0b63..bae43da00bb6 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -385,7 +385,7 @@ static void ufshcd_print_err_hist(struct ufs_hba 
> *hba,
>  	for (i = 0; i < UFS_ERR_REG_HIST_LENGTH; i++) {
>  		int p = (i + err_hist->pos) % UFS_ERR_REG_HIST_LENGTH;
> 
> -		if (err_hist->reg[p] == 0)
> +		if (err_hist->tstamp[p] == 0)
>  			continue;
>  		dev_err(hba->dev, "%s[%d] = 0x%x at %lld us\n", err_name, p,
>  			err_hist->reg[p], ktime_to_us(err_hist->tstamp[p]));

Looks good to me.

Reviewed by:- Asutosh Das <asutoshd@codeaurora.org>
Bean Huo Jan. 13, 2020, 9:28 a.m. UTC | #2
Hi, Stanley

> 
> Currently checking if an error history element is empty or not is by its "value". In
> most cases, value is error code.
> 
> However this checking is not correct because some errors or events do not
> specify any values in error history so values remain as 0, and this will lead to
> incorrect empty checking.
> 
Do you think this is a bug of UFS host controller? According to the UFS host Spec, 
If there had error detected in each layer, at least bit31 in its error code register
Should be set to 1.

Why there was an error happening, but host didn't set this bit31?

Thanks.
//Bean
Stanley Chu Jan. 13, 2020, 9:43 a.m. UTC | #3
Hi Bean,

On Mon, 2020-01-13 at 09:28 +0000, Bean Huo (beanhuo) wrote:
> Hi, Stanley
> 
> > 
> > Currently checking if an error history element is empty or not is by its "value". In
> > most cases, value is error code.
> > 
> > However this checking is not correct because some errors or events do not
> > specify any values in error history so values remain as 0, and this will lead to
> > incorrect empty checking.
> > 
> Do you think this is a bug of UFS host controller? According to the UFS host Spec, 
> If there had error detected in each layer, at least bit31 in its error code register
> Should be set to 1.
> 
> Why there was an error happening, but host didn't set this bit31?
> 

Thanks so much for review.

Yes, the case bit[31] set is true for UIC errors.

However the users of UFS error history, i.e., users of
ufshcd_update_reg_hlist(), are not only UIC errors. Some other essential
events will update history too, for example, device reset events and
abort events.

Take "device reset events" as example: parameter "val" may be 0 while
invoking ufshcd_update_reg_hlist(). If this happens, the device reset
event will not be printed out because its err_hist->reg[p] is 0
according to the original logic in ufshcd_print_err_hist().

Feel free to correct above description if it is wrong.

Thanks,
Stanley
Bean Huo Jan. 13, 2020, 9:58 a.m. UTC | #4
> Subject: RE: [EXT] [PATCH v1 1/3] scsi: ufs: fix empty check of error history
> 
> Hi Bean,
> 
> On Mon, 2020-01-13 at 09:28 +0000, Bean Huo (beanhuo) wrote:
> > Hi, Stanley
> >
> > >
> > > Currently checking if an error history element is empty or not is by
> > > its "value". In most cases, value is error code.
> > >
> > > However this checking is not correct because some errors or events
> > > do not specify any values in error history so values remain as 0,
> > > and this will lead to incorrect empty checking.
> > >
> > Do you think this is a bug of UFS host controller? According to the
> > UFS host Spec, If there had error detected in each layer, at least
> > bit31 in its error code register Should be set to 1.
> >
> > Why there was an error happening, but host didn't set this bit31?
> >
> 
> Thanks so much for review.
> 
> Yes, the case bit[31] set is true for UIC errors.
> 
> However the users of UFS error history, i.e., users of ufshcd_update_reg_hlist(),
> are not only UIC errors. Some other essential events will update history too, for
> example, device reset events and abort events.
> 
> Take "device reset events" as example: parameter "val" may be 0 while invoking
> ufshcd_update_reg_hlist(). If this happens, the device reset event will not be
> printed out because its err_hist->reg[p] is 0 according to the original logic in
> ufshcd_print_err_hist().
> 
> Feel free to correct above description if it is wrong.
> 
> Thanks,
> Stanley

Hi, Stanley
Thanks, now clear, it is not controller negative act issue.

Reviewed-by: Bean Huo <beanhuo@micron.com>
diff mbox series

Patch

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 1b97f2dc0b63..bae43da00bb6 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -385,7 +385,7 @@  static void ufshcd_print_err_hist(struct ufs_hba *hba,
 	for (i = 0; i < UFS_ERR_REG_HIST_LENGTH; i++) {
 		int p = (i + err_hist->pos) % UFS_ERR_REG_HIST_LENGTH;
 
-		if (err_hist->reg[p] == 0)
+		if (err_hist->tstamp[p] == 0)
 			continue;
 		dev_err(hba->dev, "%s[%d] = 0x%x at %lld us\n", err_name, p,
 			err_hist->reg[p], ktime_to_us(err_hist->tstamp[p]));