| Message ID | 20240117213620.132880-1-brking@linux.vnet.ibm.com |
|---|---|
| State | Changes Requested |
| Series | scsi: Update max_hw_sectors on rescan |
On 17/01/2024 21:36, Brian King wrote:
> This addresses an issue discovered on ibmvfc LUNs. For this driver,
> max_sectors is negotiated with the VIOS. This gets done at initialization
> time, then LUNs get scanned and things generally work fine. However,
> this attribute can be changed on the VIOS, either due to a sysadmin
> change or potentially a VIOS code level change. If this decreases
> to a smaller value, due to one of these reasons, the next time the
> ibmvfc driver performs an NPIV login, it will only be able to use
> the smaller value. In the case of a VIOS reboot, when the VIOS goes
> down, all paths through that VIOS will go to devloss state. When
> the VIOS comes back up, ibmvfc negotiates max_sectors and will only
> be able to get the smaller value and it will update shost->max_sectors.

Are you saying that the driver will manually update shost->max_sectors after adding the scsi host? I didn't think that was permitted.

Thanks,
John

> However, when LUNs are scanned, the devloss paths will be found
> and brought back online, still using the old max_hw_sectors. This
> change ensures that max_hw_sectors gets updated.
>
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
On 1/18/24 9:44 AM, John Garry wrote:
> Are you saying that the driver will manually update shost->max_sectors after adding the scsi host? I didn't think that was permitted.

That is what happens. The characteristics of the underlying hardware can change across a virtual adapter reset.

Thanks,

Brian
On 18/01/2024 17:22, Brian King wrote:
>> Are you saying that the driver will manually update shost->max_sectors after adding the scsi host? I didn't think that was permitted.
>
> That is what happens. The characteristics of the underlying hardware can change across a virtual adapter reset.

That's unfortunate.

I don't think that it's a good idea to change shost->max_sectors after adding the scsi host, or to add core code to condone doing it. Indeed, there is code in the scsi_add_host() path that limits shost->max_sectors based on DMA mapping constraints, and it should not be ignored.

Would it be possible to initially set shost->max_sectors for this adapter to the lowest anticipated value for that adapter and not change it thereafter?

Thanks,
John
On 1/19/24 3:02 AM, John Garry wrote:
> I don't think that it's a good idea to change shost->max_sectors after adding the scsi host, or to add core code to condone doing it. Indeed, there is code in the scsi_add_host() path that limits shost->max_sectors based on DMA mapping constraints, and it should not be ignored.

Good point. However, this patch only lowers max_hw_sectors if shost->max_sectors has since been decreased.

> Would it be possible to initially set shost->max_sectors for this adapter to the lowest anticipated value for that adapter and not change it thereafter?

Different physical backing devices support different ranges of values, and the physical backing device can change dynamically. There is currently no defined way for the client to determine what the lowest possible value is. The downside to adding such an attribute would be that we'd then always be limited to an arbitrarily small value, which would limit performance.

Thanks,

Brian
On 1/17/24 3:36 PM, Brian King wrote:
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 44680f65ea14..01f2b38daab3 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1162,6 +1162,7 @@ static int scsi_probe_and_add_lun(struct scsi_target *starget,
>          blist_flags_t bflags;
>          int res = SCSI_SCAN_NO_RESPONSE, result_len = 256;
>          struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> +        struct request_queue *q;
>
>          /*
>           * The rescan flag is used as an optimization, the first scan of a
> @@ -1182,6 +1183,10 @@ static int scsi_probe_and_add_lun(struct scsi_target *starget,
>                          *bflagsp = scsi_get_device_flags(sdev,
>                                                           sdev->vendor,
>                                                           sdev->model);
> +                q = sdev->request_queue;
> +                if (queue_max_hw_sectors(q) > shost->max_sectors)
> +                        blk_queue_max_hw_sectors(q, shost->max_sectors);
> +

What happens if commands that are larger than the new shost->max_sectors get sent to the driver/device?

For example, if we called fc_remote_port_add and scsi_target_unblock puts the existing devices into SDEV_RUNNING, then we do the scsi_scan_target call and hit the code above, could we have commands in the request_queue already (we relogin before fast_io_fail even fires, so the commands never get failed)? It looks like commands have already passed checks like bio_may_exceed_limit and will be sent to the driver. Will the driver/device spit out an error? Is this ok, or do you need some sort of flush and limit re-check/re-split?
We can't change the host-wide limit here (it wouldn't apply to all LUs anyway). If your limit is per-LU, you can call blk_queue_max_hw_sectors from ->slave_configure.
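For reference, a minimal sketch of the per-LU approach Christoph describes, assuming a hypothetical driver that keeps its negotiated limit in host private data (`examplefc_host`, its `max_sectors` field, and the function name are illustrative, not actual ibmvfc code):

```c
#include <linux/blkdev.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

/* Hypothetical per-host state; a real driver would keep its negotiated limit here. */
struct examplefc_host {
        unsigned int max_sectors;
};

/* Runs when a LU is first set up; clamp this LU's queue to the negotiated limit. */
static int examplefc_slave_configure(struct scsi_device *sdev)
{
        struct examplefc_host *vhost = shost_priv(sdev->host);

        blk_queue_max_hw_sectors(sdev->request_queue, vhost->max_sectors);
        return 0;
}
```

The driver would wire this up via the `.slave_configure` member of its scsi_host_template. As Brian notes in the reply below, this hook is not invoked in the devloss/relogin rescan path this thread is concerned with.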
On 1/24/24 3:24 AM, Christoph Hellwig wrote:
> We can't change the host-wide limit here (it wouldn't apply to all
> LUs anyway). If your limit is per-LU, you can call
> blk_queue_max_hw_sectors from ->slave_configure.

Unfortunately, it doesn't look like slave_configure gets called in the scenario in question. In this case we already have a scsi_device created, but it's in devloss state and the FC transport layer is bringing it back online. There is also the point that Mike brought up: if fast fail tmo has not yet fired, there could be I/O still in the queue that is now too large.

To answer your earlier question, Mike: if the VIOS receives a request that is too large, it closes the CRQ, forcing an entire reinit / discovery, so it's definitely not something we want to encounter. I'm trying to get this behavior improved so that only the one command fails, but that's not what happens today.

I suppose I could iterate through all the LUNs and call blk_queue_max_hw_sectors on them, but I'm not sure if that solves the problem. It would close the window that Mike highlighted, but if there are commands outstanding when this occurs that are larger than the new max_hw_sectors and they get requeued, will they get split in the block layer when they get resent to the LLD, or will they just get resent as-is? If it's the latter, I'd get a request larger than what I can support.

-Brian
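A rough sketch of the per-LUN iteration Brian mentions, as an illustration only (`example_clamp_all_luns` is a made-up name; this does nothing about commands already queued or in flight, which is the open question above):

```c
#include <linux/blkdev.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>

/*
 * Illustration only: after the host has renegotiated a smaller
 * shost->max_sectors, walk every attached LU and shrink its queue limit.
 */
static void example_clamp_all_luns(struct Scsi_Host *shost)
{
        struct scsi_device *sdev;

        shost_for_each_device(sdev, shost) {
                struct request_queue *q = sdev->request_queue;

                if (queue_max_hw_sectors(q) > shost->max_sectors)
                        blk_queue_max_hw_sectors(q, shost->max_sectors);
        }
}
```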
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 44680f65ea14..01f2b38daab3 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1162,6 +1162,7 @@ static int scsi_probe_and_add_lun(struct scsi_target *starget,
         blist_flags_t bflags;
         int res = SCSI_SCAN_NO_RESPONSE, result_len = 256;
         struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
+        struct request_queue *q;
 
         /*
          * The rescan flag is used as an optimization, the first scan of a
@@ -1182,6 +1183,10 @@ static int scsi_probe_and_add_lun(struct scsi_target *starget,
                         *bflagsp = scsi_get_device_flags(sdev,
                                                          sdev->vendor,
                                                          sdev->model);
+                q = sdev->request_queue;
+                if (queue_max_hw_sectors(q) > shost->max_sectors)
+                        blk_queue_max_hw_sectors(q, shost->max_sectors);
+
                 return SCSI_SCAN_LUN_PRESENT;
         }
         scsi_device_put(sdev);
@@ -2006,4 +2011,3 @@ void scsi_forget_host(struct Scsi_Host *shost)
         }
         spin_unlock_irqrestore(shost->host_lock, flags);
 }
-
This addresses an issue discovered on ibmvfc LUNs. For this driver, max_sectors is negotiated with the VIOS. This gets done at initialization time, then LUNs get scanned and things generally work fine. However, this attribute can be changed on the VIOS, either due to a sysadmin change or potentially a VIOS code level change. If this decreases to a smaller value, due to one of these reasons, the next time the ibmvfc driver performs an NPIV login, it will only be able to use the smaller value. In the case of a VIOS reboot, when the VIOS goes down, all paths through that VIOS will go to devloss state. When the VIOS comes back up, ibmvfc negotiates max_sectors and will only be able to get the smaller value and it will update shost->max_sectors. However, when LUNs are scanned, the devloss paths will be found and brought back online, still using the old max_hw_sectors. This change ensures that max_hw_sectors gets updated.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/scsi/scsi_scan.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)