diff mbox series

[V2,1/1] scsi/ses: Saw "Failed to get diagnostic page 0x1"

Message ID 1631300645-27662-1-git-send-email-wenxiong@linux.vnet.ibm.com (mailing list archive)
State Superseded
Headers show
Series [V2,1/1] scsi/ses: Saw "Failed to get diagnostic page 0x1" | expand

Commit Message

wenxiong@linux.vnet.ibm.com Sept. 10, 2021, 7:04 p.m. UTC
From: Wen Xiong <root@ltczz405-lp2.aus.stglabs.ibm.com>

We saw two errors with Slider drawer:
- Failed to get diagnostic page 0x1 during booting up
- /sys/class/enclosure is empty with ipr adapter + Slider drawer

From scsi logging level with error=3, looks ses_recv_diag not try on a UA.
The patch addes retrying on a UA in ses_recv_diag();

Signed-Off-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/scsi/ses.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

kernel test robot Sept. 11, 2021, 12:23 a.m. UTC | #1
Hi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on scsi/for-next]
[also build test ERROR on mkp-scsi/for-next v5.14 next-20210910]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/wenxiong-linux-vnet-ibm-com/scsi-ses-Saw-Failed-to-get-diagnostic-page-0x1/20210911-043434
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
config: microblaze-buildonly-randconfig-r001-20210910 (attached as .config)
compiler: microblaze-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/a117eeeef2a13989a97ac0e10d86ffa6314f481e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review wenxiong-linux-vnet-ibm-com/scsi-ses-Saw-Failed-to-get-diagnostic-page-0x1/20210911-043434
        git checkout a117eeeef2a13989a97ac0e10d86ffa6314f481e
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=microblaze SHELL=/bin/bash drivers/scsi/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/posix_types.h:5,
                    from include/uapi/linux/types.h:14,
                    from include/linux/types.h:6,
                    from include/linux/kasan-checks.h:5,
                    from include/asm-generic/rwonce.h:26,
                    from ./arch/microblaze/include/generated/asm/rwonce.h:1,
                    from include/linux/compiler.h:264,
                    from include/asm-generic/bug.h:5,
                    from ./arch/microblaze/include/generated/asm/bug.h:1,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/gfp.h:5,
                    from include/linux/slab.h:15,
                    from drivers/scsi/ses.c:8:
   drivers/scsi/ses.c: In function 'ses_recv_diag':
>> include/linux/stddef.h:8:14: warning: passing argument 7 of 'scsi_execute_req' makes integer from pointer without a cast [-Wint-conversion]
       8 | #define NULL ((void *)0)
         |              ^~~~~~~~~~~
         |              |
         |              void *
   drivers/scsi/ses.c:95:42: note: in expansion of macro 'NULL'
      95 |                         bufflen, &sshdr, NULL, SES_TIMEOUT, SES_RETRIES, NULL);
         |                                          ^~~~
   In file included from include/scsi/scsi_cmnd.h:12,
                    from drivers/scsi/ses.c:15:
   include/scsi/scsi_device.h:467:61: note: expected 'int' but argument is of type 'void *'
     467 |         unsigned bufflen, struct scsi_sense_hdr *sshdr, int timeout,
         |                                                         ~~~~^~~~~~~
>> drivers/scsi/ses.c:61:21: warning: passing argument 9 of 'scsi_execute_req' makes pointer from integer without a cast [-Wint-conversion]
      61 | #define SES_RETRIES 3
         |                     ^
         |                     |
         |                     int
   drivers/scsi/ses.c:95:61: note: in expansion of macro 'SES_RETRIES'
      95 |                         bufflen, &sshdr, NULL, SES_TIMEOUT, SES_RETRIES, NULL);
         |                                                             ^~~~~~~~~~~
   In file included from include/scsi/scsi_cmnd.h:12,
                    from drivers/scsi/ses.c:15:
   include/scsi/scsi_device.h:468:27: note: expected 'int *' but argument is of type 'int'
     468 |         int retries, int *resid)
         |                      ~~~~~^~~~~
>> drivers/scsi/ses.c:94:24: error: too many arguments to function 'scsi_execute_req'
      94 |                 ret =  scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf,
         |                        ^~~~~~~~~~~~~~~~
   In file included from include/scsi/scsi_cmnd.h:12,
                    from drivers/scsi/ses.c:15:
   include/scsi/scsi_device.h:465:19: note: declared here
     465 | static inline int scsi_execute_req(struct scsi_device *sdev,
         |                   ^~~~~~~~~~~~~~~~


vim +/scsi_execute_req +94 drivers/scsi/ses.c

    59	
    60	#define SES_TIMEOUT (30 * HZ)
  > 61	#define SES_RETRIES 3
    62	
    63	static void init_device_slot_control(unsigned char *dest_desc,
    64					     struct enclosure_component *ecomp,
    65					     unsigned char *status)
    66	{
    67		memcpy(dest_desc, status, 4);
    68		dest_desc[0] = 0;
    69		/* only clear byte 1 for ENCLOSURE_COMPONENT_DEVICE */
    70		if (ecomp->type == ENCLOSURE_COMPONENT_DEVICE)
    71			dest_desc[1] = 0;
    72		dest_desc[2] &= 0xde;
    73		dest_desc[3] &= 0x3c;
    74	}
    75	
    76	
    77	static int ses_recv_diag(struct scsi_device *sdev, int page_code,
    78				 void *buf, int bufflen)
    79	{
    80		int ret;
    81		unsigned char cmd[] = {
    82			RECEIVE_DIAGNOSTIC,
    83			1,		/* Set PCV bit */
    84			page_code,
    85			bufflen >> 8,
    86			bufflen & 0xff,
    87			0
    88		};
    89		unsigned char recv_page_code;
    90		struct scsi_sense_hdr sshdr;
    91		int retries = SES_RETRIES;
    92	
    93		do {
  > 94			ret =  scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf,
    95				bufflen, &sshdr, NULL, SES_TIMEOUT, SES_RETRIES, NULL);
    96	
    97		} while (scsi_sense_valid(&sshdr) &&
    98	                 sshdr.sense_key == UNIT_ATTENTION && --retries);
    99	
   100		if (unlikely(ret))
   101			return ret;
   102	
   103		recv_page_code = ((unsigned char *)buf)[0];
   104	
   105		if (likely(recv_page_code == page_code))
   106			return ret;
   107	
   108		/* successful diagnostic but wrong page code.  This happens to some
   109		 * USB devices, just print a message and pretend there was an error */
   110	
   111		sdev_printk(KERN_ERR, sdev,
   112			    "Wrong diagnostic page; asked for %d got %u\n",
   113			    page_code, recv_page_code);
   114	
   115		return -EINVAL;
   116	}
   117	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
James Bottomley Sept. 11, 2021, 12:44 a.m. UTC | #2
On Fri, 2021-09-10 at 14:04 -0500, wenxiong@linux.vnet.ibm.com wrote:
> From: Wen Xiong <root@ltczz405-lp2.aus.stglabs.ibm.com>
> 
> We saw two errors with Slider drawer:
> - Failed to get diagnostic page 0x1 during booting up
> - /sys/class/enclosure is empty with ipr adapter + Slider drawer
> 
> From scsi logging level with error=3, looks ses_recv_diag not try on
> a UA. The patch addes retrying on a UA in ses_recv_diag();

Do we know why the device is returning a UA?  Presumably it's a check
condition UA meaning the device is trying to tell us something and
wants us to request sense?

> Signed-Off-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
> Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
> ---
>  drivers/scsi/ses.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
> index c2afba2a5414..93f6a8ce1bea 100644
> --- a/drivers/scsi/ses.c
> +++ b/drivers/scsi/ses.c
> @@ -87,9 +87,16 @@ static int ses_recv_diag(struct scsi_device *sdev,
> int page_code,
>  		0
>  	};
>  	unsigned char recv_page_code;
> +	struct scsi_sense_hdr sshdr;
> +	int retries = SES_RETRIES;
> +
> +	do {
> +		ret =  scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE,
> buf,
> +			bufflen, &sshdr, NULL, SES_TIMEOUT, 

This grew an additional argument: you want to replace the NULL with the
sshdr, I think ... and compile test it next time.

> SES_RETRIES, NULL);

I think you want a 1 here instead of SES_RETRIES because you're
retrying for SES_RETRIES in an outer loop now, so if you maxed out both
sets of retries, you'd retry for SES_RETRIES^2.

If this is a CC/UA, you can simplify all this by setting

cmd->expecting_cc_ua = 1

and avoiding the loop.

James
Martin K. Petersen Sept. 13, 2021, 9:13 p.m. UTC | #3
Hi Wendy!

> I am going to re-do V2 patch with retries. Is it ok?

We probably do not want to blindly retry on all errors. We should only
retry in cases where the command has the potential to subsequently
succeed. Hence the request to retry "transient errors" in my previous
mail.

Classifying which errors should result in a retry is the important
piece. And for that we need to know exactly what error your tray is
returning.

I also suggest you have a look at scsi_check_sense().

Thanks!
Martin K. Petersen Sept. 14, 2021, 3:27 p.m. UTC | #4
Wendy,

> Below is error with enabling scsi_log_level;
>  
> [108017.427791] ses 0:0:9:0: tag#641 Sense Key : Unit Attention [current]
> [108017.427793] ses 0:0:9:0: tag#641 Add. Sense: Bus device reset function occurred

OK. I propose you refine your check to retry on NOT_READY as well as
UNIT_ATTENTION with ASC 0x29. I think that would be a good start. Do the
same for ses_send_diag().

Thanks!
James Bottomley Sept. 14, 2021, 3:50 p.m. UTC | #5
On Tue, 2021-09-14 at 11:27 -0400, Martin K. Petersen wrote:
> Wendy,
> 
> > Below is error with enabling scsi_log_level;
> >  
> > [108017.427791] ses 0:0:9:0: tag#641 Sense Key : Unit Attention
> > [current]
> > [108017.427793] ses 0:0:9:0: tag#641 Add. Sense: Bus device reset
> > function occurred
> 
> OK. I propose you refine your check to retry on NOT_READY as well as
> UNIT_ATTENTION with ASC 0x29. I think that would be a good start. Do
> the same for ses_send_diag().

Something is wrong with the outbound email: the list isn't getting
copies of this:

https://lore.kernel.org/all/yq1v934yysg.fsf@ca-mkp.ca.oracle.com/

because you're sending HTML email ... please don't.

I still think setting expecting_cc_ua is the best thing to do because
it's designed for exactly the problem you're seeing, but open coding it
in the loop would be fine as well.

James
Martin K. Petersen Sept. 14, 2021, 3:52 p.m. UTC | #6
Wendy,

A few small additional nits...

> From: Wen Xiong <root@ltczz405-lp2.aus.stglabs.ibm.com>

Please make sure your email address is correct.

> Signed-Off-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>

This should be Signed-off-by: and you need a space before your email
address.

> +		ret =  scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf,
> +			bufflen, &sshdr, NULL, SES_TIMEOUT, SES_RETRIES, NULL);

The sense header goes in the field before SES_TIMEOUT:

int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
                 int data_direction, void *buffer, unsigned bufflen,
                 unsigned char *sense, struct scsi_sense_hdr *sshdr,
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 int timeout, int retries, u64 flags, req_flags_t rq_flags,
                 ^^^^^^^^^^^
                 int *resid)

Thanks!

PS. Bonus points for whoever fixes up the scsi_execute calls to have
less than 10,000 arguments. One would be good. And some sensible input
validation/type checking.
Martin K. Petersen Sept. 14, 2021, 6:13 p.m. UTC | #7
James,

> I still think setting expecting_cc_ua is the best thing to do because
> it's designed for exactly the problem you're seeing, but open coding
> it in the loop would be fine as well.

The problem with that approach is that (as far as I can tell) we didn't
issue the reset in question.

We could explicitly set the flag in ses_recv_diag(). However, even
though scsi_check_sense() returns RETRY, scsi_decide_disposition()
bypasses it because a check condition is considered a "no retry" command
by scsi_noretry_cmd(). So despite expecting_cc_ua being set, the retry
is not performed.

Fun can of worms...
diff mbox series

Patch

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index c2afba2a5414..93f6a8ce1bea 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -87,9 +87,16 @@  static int ses_recv_diag(struct scsi_device *sdev, int page_code,
 		0
 	};
 	unsigned char recv_page_code;
+	struct scsi_sense_hdr sshdr;
+	int retries = SES_RETRIES;
+
+	do {
+		ret =  scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf,
+			bufflen, &sshdr, NULL, SES_TIMEOUT, SES_RETRIES, NULL);
+
+	} while (scsi_sense_valid(&sshdr) &&
+                 sshdr.sense_key == UNIT_ATTENTION && --retries);
 
-	ret =  scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, bufflen,
-				NULL, SES_TIMEOUT, SES_RETRIES, NULL);
 	if (unlikely(ret))
 		return ret;