SCSI regression in 4.11

Message ID	20170227171931.30b9f619@xeon-e3 (mailing list archive)
State	Not Applicable, archived
Headers	show Return-Path: <linux-scsi-owner@kernel.org> Date: Mon, 27 Feb 2017 17:19:31 -0800 From: Stephen Hemminger <stephen@networkplumber.org> To: Christoph Hellwig <hch@lst.de>, Linus Torvalds <torvalds@linux-foundation.org>, "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>, "Martin K. Petersen" <martin.petersen@oracle.com>, "K. Y. Srinivasan" <kys@microsoft.com>, Dexuan Cui <decui@microsoft.com>, Jens Axboe <axboe@kernel.dk>, "Long Li" <longli@microsoft.com>, "Josh Poulson" <jopoulso@microsoft.com>, v-adsuho@microsoft.com Cc: linux-scsi@vger.kernel.org Subject: Re: SCSI regression in 4.11 Message-ID: <20170227171931.30b9f619@xeon-e3> In-Reply-To: <20170227152955.1362aabb@xeon-e3> References: <20170227152955.1362aabb@xeon-e3> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk

Stephen Hemminger Feb. 28, 2017, 1:19 a.m. UTC

On Mon, 27 Feb 2017 15:30:30 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:

> Something in SCSI in 4.11 broke booting on Hyper-V Generation 2 VM with 8 VCPU and 4G of memory.
> Both Linus's current tree (4.11 pre-rc1) and linux-next fail in a similar manner. It looks like some error
> in SCSI device detection because there is only a single device.
> 
> The offending commit causing the regression is:
> 
> $ git bisect bad
> e9c787e65c0c36529745be47d490d998b4b6e589 is the first bad commit
> commit e9c787e65c0c36529745be47d490d998b4b6e589
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Mon Jan 2 21:55:26 2017 +0300
> 
>     scsi: allocate scsi_cmnd structures as part of struct request
>     
>     Rely on the new block layer functionality to allocate additional driver
>     specific data behind struct request instead of implementing it in SCSI
>     itѕelf.
>     
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
>     Reviewed-by: Hannes Reinecke <hare@suse.com>
>     Signed-off-by: Jens Axboe <axboe@fb.com>
> 
> :040000 040000 6ff016fcdae227efeb19c1c301b17ccd7ea35da6 70d79f99d9b79ecf4dccbe067fc697219f5c78da M	drivers
> :040000 040000 a672ff52df8b2c211b3f98cae4a88d8a96ccde0b 1aaaed7de0994f597c7f8290c722a0b4a7789429 M	include
> 
> I checked and tree is current and up to date and includes 
> commit ee5242360424b9b967454e9183767323d10cf985
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Tue Feb 21 10:04:55 2017 +0100
> 
>     scsi: zero per-cmd driver data before each I/O
> 
> Kernel config is attached. It started with Ubuntu config, but then did localmodconfig and pruned
> out unnecessary stuff.
> 

This problem I am seeing looks like the one addressed by:

Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each I/O")

but that is already in linux-next.

Noticed another place where memset(of the data was being done not the extra bits.
Tried this, but didn't fix it either...

Jens Axboe Feb. 28, 2017, 2:16 a.m. UTC | #1

On 02/27/2017 06:19 PM, Stephen Hemminger wrote:
> On Mon, 27 Feb 2017 15:30:30 -0800
> Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
>> Something in SCSI in 4.11 broke booting on Hyper-V Generation 2 VM with 8 VCPU and 4G of memory.
>> Both Linus's current tree (4.11 pre-rc1) and linux-next fail in a similar manner. It looks like some error
>> in SCSI device detection because there is only a single device.
>>
>> The offending commit causing the regression is:
>>
>> $ git bisect bad
>> e9c787e65c0c36529745be47d490d998b4b6e589 is the first bad commit
>> commit e9c787e65c0c36529745be47d490d998b4b6e589
>> Author: Christoph Hellwig <hch@lst.de>
>> Date:   Mon Jan 2 21:55:26 2017 +0300
>>
>>     scsi: allocate scsi_cmnd structures as part of struct request
>>     
>>     Rely on the new block layer functionality to allocate additional driver
>>     specific data behind struct request instead of implementing it in SCSI
>>     itѕelf.
>>     
>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>     Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
>>     Reviewed-by: Hannes Reinecke <hare@suse.com>
>>     Signed-off-by: Jens Axboe <axboe@fb.com>
>>
>> :040000 040000 6ff016fcdae227efeb19c1c301b17ccd7ea35da6 70d79f99d9b79ecf4dccbe067fc697219f5c78da M	drivers
>> :040000 040000 a672ff52df8b2c211b3f98cae4a88d8a96ccde0b 1aaaed7de0994f597c7f8290c722a0b4a7789429 M	include
>>
>> I checked and tree is current and up to date and includes 
>> commit ee5242360424b9b967454e9183767323d10cf985
>> Author: Christoph Hellwig <hch@lst.de>
>> Date:   Tue Feb 21 10:04:55 2017 +0100
>>
>>     scsi: zero per-cmd driver data before each I/O
>>
>> Kernel config is attached. It started with Ubuntu config, but then did localmodconfig and pruned
>> out unnecessary stuff.
>>
> 
> This problem I am seeing looks like the one addressed by:
> 
> Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each I/O")
> 
> but that is already in linux-next.
> 
> Noticed another place where memset(of the data was being done not the extra bits.
> Tried this, but didn't fix it either...

Yeah, that fix is already in Linus's tree. But it does look like a replica of
what Dexuan reported. Out of curiosity, does it boot if you enable
CONFIG_SCSI_MQ_DEFAULT?

Christoph, looks like the previous fix wasn't complete...

Christoph Hellwig Feb. 28, 2017, 2:08 p.m. UTC | #2

On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:
> Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each I/O")
> 
> but that is already in linux-next.
> 
> Noticed another place where memset(of the data was being done not the extra bits.
> Tried this, but didn't fix it either...

Are you using blk-mq or the legacy request code?

Jens Axboe Feb. 28, 2017, 3:32 p.m. UTC | #3

On 02/28/2017 07:08 AM, Christoph Hellwig wrote:
> On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:
>> Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each I/O")
>>
>> but that is already in linux-next.
>>
>> Noticed another place where memset(of the data was being done not the extra bits.
>> Tried this, but didn't fix it either...
> 
> Are you using blk-mq or the legacy request code?

Stephen doesn't have MQ set in the config he posted, I'm assuming he didn't
boot with scsi_mod.use_blk_mq=true. In a previous email, I asked if turning
on MQ makes a difference.

James Bottomley Feb. 28, 2017, 5:06 p.m. UTC | #4

On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
> On 02/28/2017 07:08 AM, Christoph Hellwig wrote:
> > On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:
> > > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each
> > > I/O")
> > > 
> > > but that is already in linux-next.
> > > 
> > > Noticed another place where memset(of the data was being done not
> > > the extra bits.
> > > Tried this, but didn't fix it either...
> > 
> > Are you using blk-mq or the legacy request code?
> 
> Stephen doesn't have MQ set in the config he posted, I'm assuming he 
> didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I 
> asked if turning on MQ makes a difference.

OK, since we're not making much progress, Stephen, could you insert
some debugging into the storvsc driver?  The trace clearly shows we're
getting zeros back in the buffer when we should have data from the
initial scan.  Firstly, does the vmbus think it's transferring any data
for the INQUIRY and READ_CAPACITY commands (looks like
storvsc_command_completion() data_transfer_length)?  If it does,
there's probably an issue initialising the sg list.  If it doesn't,
we're probably sending bogus commands.

James

Stephen Hemminger Feb. 28, 2017, 5:16 p.m. UTC | #5

On Tue, 28 Feb 2017 09:06:13 -0800
James Bottomley <jejb@linux.vnet.ibm.com> wrote:

> On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
> > On 02/28/2017 07:08 AM, Christoph Hellwig wrote:  
> > > On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:  
> > > > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each
> > > > I/O")
> > > > 
> > > > but that is already in linux-next.
> > > > 
> > > > Noticed another place where memset(of the data was being done not
> > > > the extra bits.
> > > > Tried this, but didn't fix it either...  
> > > 
> > > Are you using blk-mq or the legacy request code?  
> > 
> > Stephen doesn't have MQ set in the config he posted, I'm assuming he 
> > didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I 
> > asked if turning on MQ makes a difference.  
> 
> OK, since we're not making much progress, Stephen, could you insert
> some debugging into the storvsc driver?  The trace clearly shows we're
> getting zeros back in the buffer when we should have data from the
> initial scan.  Firstly, does the vmbus think it's transferring any data
> for the INQUIRY and READ_CAPACITY commands (looks like
> storvsc_command_completion() data_transfer_length)?  If it does,
> there's probably an issue initialising the sg list.  If it doesn't,
> we're probably sending bogus commands.
> 
> James

I tried booting with scsi_mod.use_blk_mq=true and that did nothing.
Rebuilding now with 
CONFIG_SCSI_MQ_DEFAULT=y

Sure, can instrument. after that test.

Jens Axboe Feb. 28, 2017, 5:31 p.m. UTC | #6

On 02/28/2017 10:16 AM, Stephen Hemminger wrote:
> On Tue, 28 Feb 2017 09:06:13 -0800
> James Bottomley <jejb@linux.vnet.ibm.com> wrote:
> 
>> On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
>>> On 02/28/2017 07:08 AM, Christoph Hellwig wrote:  
>>>> On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:  
>>>>> Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each
>>>>> I/O")
>>>>>
>>>>> but that is already in linux-next.
>>>>>
>>>>> Noticed another place where memset(of the data was being done not
>>>>> the extra bits.
>>>>> Tried this, but didn't fix it either...  
>>>>
>>>> Are you using blk-mq or the legacy request code?  
>>>
>>> Stephen doesn't have MQ set in the config he posted, I'm assuming he 
>>> didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I 
>>> asked if turning on MQ makes a difference.  
>>
>> OK, since we're not making much progress, Stephen, could you insert
>> some debugging into the storvsc driver?  The trace clearly shows we're
>> getting zeros back in the buffer when we should have data from the
>> initial scan.  Firstly, does the vmbus think it's transferring any data
>> for the INQUIRY and READ_CAPACITY commands (looks like
>> storvsc_command_completion() data_transfer_length)?  If it does,
>> there's probably an issue initialising the sg list.  If it doesn't,
>> we're probably sending bogus commands.
>>
>> James
> 
> I tried booting with scsi_mod.use_blk_mq=true and that did nothing.
> Rebuilding now with 
> CONFIG_SCSI_MQ_DEFAULT=y

If you already booted with scsi_mod.use_blk_mq=true and tested that,
then don't bother with the config option. They basically toggle
the same switch.

> Sure, can instrument. after that test.

That'd be great!

Stephen Hemminger Feb. 28, 2017, 5:33 p.m. UTC | #7

On Tue, 28 Feb 2017 15:08:12 +0100
Christoph Hellwig <hch@lst.de> wrote:

> On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:
> > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each I/O")
> > 
> > but that is already in linux-next.
> > 
> > Noticed another place where memset(of the data was being done not the extra bits.
> > Tried this, but didn't fix it either...  
> 
> Are you using blk-mq or the legacy request code?

I was using legacy, but even with CONFIG_SCSI_MQ_DEFAULT=y
the same failure occurs.

Stephen Hemminger Feb. 28, 2017, 6:57 p.m. UTC | #8

On Tue, 28 Feb 2017 09:06:13 -0800
James Bottomley <jejb@linux.vnet.ibm.com> wrote:

> On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
> > On 02/28/2017 07:08 AM, Christoph Hellwig wrote:  
> > > On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:  
> > > > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each
> > > > I/O")
> > > > 
> > > > but that is already in linux-next.
> > > > 
> > > > Noticed another place where memset(of the data was being done not
> > > > the extra bits.
> > > > Tried this, but didn't fix it either...  
> > > 
> > > Are you using blk-mq or the legacy request code?  
> > 
> > Stephen doesn't have MQ set in the config he posted, I'm assuming he 
> > didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I 
> > asked if turning on MQ makes a difference.  
> 
> OK, since we're not making much progress, Stephen, could you insert
> some debugging into the storvsc driver?  The trace clearly shows we're
> getting zeros back in the buffer when we should have data from the
> initial scan.  Firstly, does the vmbus think it's transferring any data
> for the INQUIRY and READ_CAPACITY commands (looks like
> storvsc_command_completion() data_transfer_length)?  If it does,
> there's probably an issue initialising the sg list.  If it doesn't,
> we're probably sending bogus commands.
> 
> James
> 

The following code in storvsc looks suspicious

static void storvsc_on_io_completion(struct storvsc_device *stor_device,
				  struct vstor_packet *vstor_packet,
				  struct storvsc_cmd_request *request)
{
	struct vstor_packet *stor_pkt;
	struct hv_device *device = stor_device->device;

	stor_pkt = &request->vstor_packet;

	/*
	 * The current SCSI handling on the host side does
	 * not correctly handle:
	 * INQUIRY command with page code parameter set to 0x80
	 * MODE_SENSE command with cmd[2] == 0x1c
	 *
	 * Setup srb and scsi status so this won't be fatal.
	 * We do this so we can distinguish truly fatal failues
	 * (srb status == 0x4) and off-line the device in that case.
	 */

	if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
	   (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
		vstor_packet->vm_srb.scsi_status = 0;
		vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
	}

If SCSI layer is sending inquiry about devices to do scanning then wouldn't this
workaround break things?  Maybe a better to fully test for the broken command.

Original commit was:

commit 4ed51a21c0f69e1379cf858fc21a9d9022bfe0e7
Author: K. Y. Srinivasan <kys@microsoft.com>
Date:   Sat Aug 27 11:31:26 2011 -0700

    Staging: hv: storvsc: Fixup srb and scsi status for INQUIRY and MODE_SENSE
    
    The current VHD handler on the Windows Host does not correctly handle
    INQUIRY and MODE_SENSE commands with some options. Fixup srb_status
    in these cases since the failure is not fatal.
    
    Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
    Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

James Bottomley Feb. 28, 2017, 11:48 p.m. UTC | #9

On Tue, 2017-02-28 at 10:57 -0800, Stephen Hemminger wrote:
> On Tue, 28 Feb 2017 09:06:13 -0800
> James Bottomley <jejb@linux.vnet.ibm.com> wrote:
> 
> > On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
> > > On 02/28/2017 07:08 AM, Christoph Hellwig wrote:  
> > > > On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger
> > > > wrote:  
> > > > > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before
> > > > > each
> > > > > I/O")
> > > > > 
> > > > > but that is already in linux-next.
> > > > > 
> > > > > Noticed another place where memset(of the data was being done
> > > > > not
> > > > > the extra bits.
> > > > > Tried this, but didn't fix it either...  
> > > > 
> > > > Are you using blk-mq or the legacy request code?  
> > > 
> > > Stephen doesn't have MQ set in the config he posted, I'm assuming
> > > he 
> > > didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I
> > > asked if turning on MQ makes a difference.  
> > 
> > OK, since we're not making much progress, Stephen, could you insert
> > some debugging into the storvsc driver?  The trace clearly shows
> > we're
> > getting zeros back in the buffer when we should have data from the
> > initial scan.  Firstly, does the vmbus think it's transferring any
> > data
> > for the INQUIRY and READ_CAPACITY commands (looks like
> > storvsc_command_completion() data_transfer_length)?  If it does,
> > there's probably an issue initialising the sg list.  If it doesn't,
> > we're probably sending bogus commands.
> > 
> > James
> > 
> 
> The following code in storvsc looks suspicious
> 
> static void storvsc_on_io_completion(struct storvsc_device
> *stor_device,
> 				  struct vstor_packet *vstor_packet,
> 				  struct storvsc_cmd_request *request)
> {
> 	struct vstor_packet *stor_pkt;
> 	struct hv_device *device = stor_device->device;
> 
> 	stor_pkt = &request->vstor_packet;
> 
> 	/*
> 	 * The current SCSI handling on the host side does
> 	 * not correctly handle:
> 	 * INQUIRY command with page code parameter set to 0x80
> 	 * MODE_SENSE command with cmd[2] == 0x1c
> 	 *
> 	 * Setup srb and scsi status so this won't be fatal.
> 	 * We do this so we can distinguish truly fatal failues
> 	 * (srb status == 0x4) and off-line the device in that case.
> 	 */
> 
> 	if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> 	   (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
> 		vstor_packet->vm_srb.scsi_status = 0;
> 		vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
> 	}
> 
> If SCSI layer is sending inquiry about devices to do scanning then 
> wouldn't this workaround break things?  Maybe a better to fully test 
> for the broken command.

Let's concentrate on INQUIRY since that's the first command in the
probe sequence.  I think it's completing successfully because your
hyperv layer says it has 36 bytes of transfer and that's the size of a
successful initial INQUIRY, so the fact that the code above would break
stuff if the INQUIRY failed is orthogonal to the the current problem.

can you print out some of the DMA buffer in storvsc_on_io_completion()?

I think just the stor_pkt->vm_srb.cdb[0] (to identify the command
completing) and byte 5 of the buffer will tell us what we need to know.
 It's going to be complex to get byte 5, you'll need to do a
kmap_atomic_pfn on request->payload->range.pfn_array[0] and then look
at byte 5.  If that's zero it means there's some problem with hyperv
writing to the pfn if it's 0x24 (expected value for an initial inquiry)
we've got a problem somewhere in bio completion not copying the value
back.

James

SCSI regression in 4.11

Commit Message

Comments

Patch