diff mbox series

scsi: isci: initialize shost fully before calling scsi_add_host()

Message ID 20190108205043.3122-1-logang@deltatee.com (mailing list archive)
State New, archived
Headers show
Series scsi: isci: initialize shost fully before calling scsi_add_host() | expand

Commit Message

Logan Gunthorpe Jan. 8, 2019, 8:50 p.m. UTC
scsi_mq_setup_tags(), which is called by scsi_add_host(), calculates
the command size to allocate based on the prot_capabilities. In the
isci driver, scsi_host_set_prot() is called after scsi_add_host()
so the command size gets calculated to be smaller than it needs to be.
Eventually, scsi_mq_init_request() locates the 'prot_sdb' after the command
assuming it was sized correctly and a buffer overrun may occur.

However, seeing blk_mq_alloc_rqs() rounds up to the nearest cache line
size, the mistake can go unnoticed.

The bug was noticed after the struct request size was reduced by
commit 9d037ad707ed ("block: remove req->timeout_list")

Which likely reduced the allocated space for the request by an entire
cache line, enough that the overflow could be hit and it caused a panic,
on boot, at:

  RIP: 0010:t10_pi_complete+0x77/0x1c0
  Call Trace:
    <IRQ>
    sd_done+0xf5/0x340
    scsi_finish_command+0xc3/0x120
    blk_done_softirq+0x83/0xb0
    __do_softirq+0xa1/0x2e6
    irq_exit+0xbc/0xd0
    call_function_single_interrupt+0xf/0x20
    </IRQ>

sd_done() would call scsi_prot_sg_count() which reads the number of
entities in 'prot_sdb', but seeing 'prot_sdb' is located after the end of
the allocated space it reads a garbage number and erroneously calls
t10_pi_complete().

To prevent this, the calls to scsi_host_set_prot() are moved into
isci_host_alloc() before the call to scsi_add_host(). Out of caution,
also move the similar call to scsi_host_set_guard().

Fixes: 3d2d75254915 ("[SCSI] isci: T10 DIF support")
Link: http://lkml.kernel.org/r/da851333-eadd-163a-8c78-e1f4ec5ec857@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Intel SCU Linux support <intel-linux-scu@intel.com>
Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jeff Moyer <jmoyer@redhat.com>
---
 drivers/scsi/isci/init.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

Comments

Jeff Moyer Jan. 8, 2019, 9:25 p.m. UTC | #1
Logan Gunthorpe <logang@deltatee.com> writes:

> scsi_mq_setup_tags(), which is called by scsi_add_host(), calculates
> the command size to allocate based on the prot_capabilities. In the
> isci driver, scsi_host_set_prot() is called after scsi_add_host()
> so the command size gets calculated to be smaller than it needs to be.
> Eventually, scsi_mq_init_request() locates the 'prot_sdb' after the command
> assuming it was sized correctly and a buffer overrun may occur.

[...]

> To prevent this, the calls to scsi_host_set_prot() are moved into
> isci_host_alloc() before the call to scsi_add_host(). Out of caution,
> also move the similar call to scsi_host_set_guard().
>
> Fixes: 3d2d75254915 ("[SCSI] isci: T10 DIF support")
> Link: http://lkml.kernel.org/r/da851333-eadd-163a-8c78-e1f4ec5ec857@deltatee.com
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Cc: Intel SCU Linux support <intel-linux-scu@intel.com>
> Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Jeff Moyer <jmoyer@redhat.com>

Nice job, and excellent commit message.  We'll need a similar patch for
lpfc.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>

> ---
>  drivers/scsi/isci/init.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c
> index 68b90c4f79a3..1727d0c71b12 100644
> --- a/drivers/scsi/isci/init.c
> +++ b/drivers/scsi/isci/init.c
> @@ -576,6 +576,13 @@ static struct isci_host *isci_host_alloc(struct pci_dev *pdev, int id)
>  	shost->max_lun = ~0;
>  	shost->max_cmd_len = MAX_COMMAND_SIZE;
>  
> +	/* turn on DIF support */
> +	scsi_host_set_prot(shost,
> +			   SHOST_DIF_TYPE1_PROTECTION |
> +			   SHOST_DIF_TYPE2_PROTECTION |
> +			   SHOST_DIF_TYPE3_PROTECTION);
> +	scsi_host_set_guard(shost, SHOST_DIX_GUARD_CRC);
> +
>  	err = scsi_add_host(shost, &pdev->dev);
>  	if (err)
>  		goto err_shost;
> @@ -663,13 +670,6 @@ static int isci_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  			goto err_host_alloc;
>  		}
>  		pci_info->hosts[i] = h;
> -
> -		/* turn on DIF support */
> -		scsi_host_set_prot(to_shost(h),
> -				   SHOST_DIF_TYPE1_PROTECTION |
> -				   SHOST_DIF_TYPE2_PROTECTION |
> -				   SHOST_DIF_TYPE3_PROTECTION);
> -		scsi_host_set_guard(to_shost(h), SHOST_DIX_GUARD_CRC);
>  	}
>  
>  	err = isci_setup_interrupts(pdev);
Jens Axboe Jan. 8, 2019, 9:30 p.m. UTC | #2
On 1/8/19 1:50 PM, Logan Gunthorpe wrote:
> scsi_mq_setup_tags(), which is called by scsi_add_host(), calculates
> the command size to allocate based on the prot_capabilities. In the
> isci driver, scsi_host_set_prot() is called after scsi_add_host()
> so the command size gets calculated to be smaller than it needs to be.
> Eventually, scsi_mq_init_request() locates the 'prot_sdb' after the command
> assuming it was sized correctly and a buffer overrun may occur.
> 
> However, seeing blk_mq_alloc_rqs() rounds up to the nearest cache line
> size, the mistake can go unnoticed.
> 
> The bug was noticed after the struct request size was reduced by
> commit 9d037ad707ed ("block: remove req->timeout_list")
> 
> Which likely reduced the allocated space for the request by an entire
> cache line, enough that the overflow could be hit and it caused a panic,
> on boot, at:
> 
>   RIP: 0010:t10_pi_complete+0x77/0x1c0
>   Call Trace:
>     <IRQ>
>     sd_done+0xf5/0x340
>     scsi_finish_command+0xc3/0x120
>     blk_done_softirq+0x83/0xb0
>     __do_softirq+0xa1/0x2e6
>     irq_exit+0xbc/0xd0
>     call_function_single_interrupt+0xf/0x20
>     </IRQ>
> 
> sd_done() would call scsi_prot_sg_count() which reads the number of
> entities in 'prot_sdb', but seeing 'prot_sdb' is located after the end of
> the allocated space it reads a garbage number and erroneously calls
> t10_pi_complete().
> 
> To prevent this, the calls to scsi_host_set_prot() are moved into
> isci_host_alloc() before the call to scsi_add_host(). Out of caution,
> also move the similar call to scsi_host_set_guard().

Nice work!

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Martin K. Petersen Jan. 9, 2019, 3:29 a.m. UTC | #3
Logan,

> To prevent this, the calls to scsi_host_set_prot() are moved into
> isci_host_alloc() before the call to scsi_add_host(). Out of caution,
> also move the similar call to scsi_host_set_guard().

Applied to 5.0/scsi-fixes. Thanks much!
Christoph Hellwig Jan. 9, 2019, 6:41 p.m. UTC | #4
This looks good.  I wonder if there is any good way to prevent other
drivers from picking up this bug byt using a better interface, but
that should not delay your fix.
John Garry Jan. 10, 2019, 9:11 a.m. UTC | #5
On 09/01/2019 18:41, Christoph Hellwig wrote:
> This looks good.  I wonder if there is any good way to prevent other
> drivers from picking up this bug byt using a better interface, but
> that should not delay your fix.
>
> .
>

I noticed that hisi_sas has this same problem but I forgot to fix it.

So how about just drop these APIs and let the user set the shost 
protection parameters directly, like other shost parameters, which 
should make it a bit clearer when these should be set, i.e. before 
scsi_add_host()?

John
Martin K. Petersen Jan. 12, 2019, 2:34 a.m. UTC | #6
John,

> So how about just drop these APIs and let the user set the shost
> protection parameters directly, like other shost parameters,

The protection interfaces here obviously predate the block layer
allocation changes that made this particular issue pop up.

> which should make it a bit clearer when these should be set,
> i.e. before scsi_add_host()?

In general, I am not so keen on the somewhat messy intersection between
the host parameters and the host template. The static host templates
made lots of sense in the days of Seagate ST01 and fixed hardware
capabilities.  But reality is that most modern controllers have to query
firmware interfaces to figure out what their actual capabilities are.

So in this case I think that accessor functions are actually better
because they allow us to print a big fat warning when you twiddle
something you shouldn't post-initialization. So that's something I think
we could--and should--improve.
John Garry Jan. 14, 2019, 12:10 p.m. UTC | #7
On 12/01/2019 02:34, Martin K. Petersen wrote:
>
> John,
>
>> So how about just drop these APIs and let the user set the shost
>> protection parameters directly, like other shost parameters,
>
> The protection interfaces here obviously predate the block layer
> allocation changes that made this particular issue pop up.
>
>> which should make it a bit clearer when these should be set,
>> i.e. before scsi_add_host()?
>
> In general, I am not so keen on the somewhat messy intersection between
> the host parameters and the host template. The static host templates
> made lots of sense in the days of Seagate ST01 and fixed hardware
> capabilities.  But reality is that most modern controllers have to query
> firmware interfaces to figure out what their actual capabilities are.

Hi Martin,

I am not suggested setting the parameters via scsi host template, but 
rather dynamically (as we currently do) but just drop the set helper 
functions, like:

     shost->max_channel = 1;
     shost->max_cmd_len = 16;

     ...

     if (hisi_hba->prot_mask) {
         dev_info(dev, "Registering for DIF/DIX prot_mask=0x%x\n",
              prot_mask);
-        scsi_host_set_prot(hisi_hba->shost, prot_mask);
+        shost->prot_capabilities = prot_mask;
     }

     rc = scsi_add_host(shost, dev);
     if (rc)
         goto err_out_ha;

     rc = sas_register_ha(sha);
     if (rc)
         goto err_out_register_ha;

I find that it is not crystal clear when scsi_host_set_prot() and 
scsi_host_set_guard() should be called, but not so for setting the shost 
parameters directly, which is common.

>
> So in this case I think that accessor functions are actually better
> because they allow us to print a big fat warning when you twiddle
> something you shouldn't post-initialization. So that's something I think
> we could--and should--improve.
>

Sure, this is an alternative, but I would rather make it obvious when 
these parameters should be set so that this would not be required.

Thanks,
John
Martin K. Petersen Jan. 16, 2019, 2:54 a.m. UTC | #8
Hi John,

>> So in this case I think that accessor functions are actually better
>> because they allow us to print a big fat warning when you twiddle
>> something you shouldn't post-initialization. So that's something I think
>> we could--and should--improve.
>>
> Sure, this is an alternative, but I would rather make it obvious when
> these parameters should be set so that this would not be required.

I would like to have a mechanism in place that warns if you twiddle
things that break assumptions made at host registration time. That's not
a scenario the old registration interface was designed to handle.

I am not sure I agree with your assertion that setting these masks in
the struct prior to scsi_add_host() makes this ordering requirement much
more obvious. It is not like you are passing in a list of parameters and
then receiving a separately instantiated immutable scsi_host object. You
are performing an operation on something you already have and own.

That's why I commented that the current intersection between
compile-time static host template, dynamically discovered
pre-registration scsi_host parameters, and the registered runtime
scsi_host struct is somewhat messy.

Btw. I have no attachment to the prot wrappers whatsoever. The reason
they exist is that the SCSI integrity support was optional. And
therefore we had stub functions so things could be compiled without any
of the integrity fields being present in the various SCSI structs. So I
don't have any problem killing the wrappers except they may actually be
handy for regressions like this one where you could #error if the driver
writer violates the ordering requirement.
John Garry Jan. 16, 2019, 2:44 p.m. UTC | #9
On 16/01/2019 02:54, Martin K. Petersen wrote:
>
> Hi John,
>

Hi Martin,

>>> So in this case I think that accessor functions are actually better
>>> because they allow us to print a big fat warning when you twiddle
>>> something you shouldn't post-initialization. So that's something I think
>>> we could--and should--improve.
>>>
>> Sure, this is an alternative, but I would rather make it obvious when
>> these parameters should be set so that this would not be required.
>
> I would like to have a mechanism in place that warns if you twiddle
> things that break assumptions made at host registration time.

Yes, something more robust would be good.

That's not
> a scenario the old registration interface was designed to handle.
>
> I am not sure I agree with your assertion that setting these masks in
> the struct prior to scsi_add_host() makes this ordering requirement much
> more obvious.
It is not like you are passing in a list of parameters and
> then receiving a separately instantiated immutable scsi_host object. You
> are performing an operation on something you already have and own.
>
> That's why I commented that the current intersection between
> compile-time static host template, dynamically discovered
> pre-registration scsi_host parameters, and the registered runtime
> scsi_host struct is somewhat messy.
>
> Btw. I have no attachment to the prot wrappers whatsoever. The reason
> they exist is that the SCSI integrity support was optional. And
> therefore we had stub functions so things could be compiled without any
> of the integrity fields being present in the various SCSI structs.

I never noticed stubs for setting/getting 
Scsi_host.prot_{capabilities,guard_type}

So I
> don't have any problem killing the wrappers except they may actually be
> handy for regressions like this one where you could #error if the driver
> writer violates the ordering requirement.
>

We set many Scsi_host parameters without such safeguarding, and I don't 
know what's special about these protection-related members.

Thanks,
John
diff mbox series

Patch

diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c
index 68b90c4f79a3..1727d0c71b12 100644
--- a/drivers/scsi/isci/init.c
+++ b/drivers/scsi/isci/init.c
@@ -576,6 +576,13 @@  static struct isci_host *isci_host_alloc(struct pci_dev *pdev, int id)
 	shost->max_lun = ~0;
 	shost->max_cmd_len = MAX_COMMAND_SIZE;
 
+	/* turn on DIF support */
+	scsi_host_set_prot(shost,
+			   SHOST_DIF_TYPE1_PROTECTION |
+			   SHOST_DIF_TYPE2_PROTECTION |
+			   SHOST_DIF_TYPE3_PROTECTION);
+	scsi_host_set_guard(shost, SHOST_DIX_GUARD_CRC);
+
 	err = scsi_add_host(shost, &pdev->dev);
 	if (err)
 		goto err_shost;
@@ -663,13 +670,6 @@  static int isci_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 			goto err_host_alloc;
 		}
 		pci_info->hosts[i] = h;
-
-		/* turn on DIF support */
-		scsi_host_set_prot(to_shost(h),
-				   SHOST_DIF_TYPE1_PROTECTION |
-				   SHOST_DIF_TYPE2_PROTECTION |
-				   SHOST_DIF_TYPE3_PROTECTION);
-		scsi_host_set_guard(to_shost(h), SHOST_DIX_GUARD_CRC);
 	}
 
 	err = isci_setup_interrupts(pdev);