Message ID | 20211008043546.6006-1-decui@microsoft.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | [v3] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma() | expand |
On 08/10/2021 05:35, Dexuan Cui wrote: > After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during > boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int" > value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets > shost->cmd_per_lun to a negative "short" value. > > Use min_t(int, ...) to fix the issue. > > Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue") > Cc:stable@vger.kernel.org > Signed-off-by: Dexuan Cui<decui@microsoft.com> > Reviewed-by: Haiyang Zhang<haiyangz@microsoft.com> > Reviewed-by: Ming Lei<ming.lei@redhat.com> Reviewed-by: John Garry <john.garry@huawei.com> thanks
Dexuan, > After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during > boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int" > value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets > shost->cmd_per_lun to a negative "short" value. > > Use min_t(int, ...) to fix the issue. I queued this up as a short term workaround. However, I am hoping that the rework of the scaling code in storvsc lands soon.
> From: Martin K. Petersen <martin.petersen@oracle.com> > Sent: Tuesday, October 12, 2021 9:42 AM > > Dexuan, > > > After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during > > boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int" > > value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets > > shost->cmd_per_lun to a negative "short" value. > > > > Use min_t(int, ...) to fix the issue. > > I queued this up as a short term workaround. However, I am hoping that > the rework of the scaling code in storvsc lands soon. Thanks, Martin! I know Michael Kelley will improve the netvsc. Regarding this patch, I'm not sure if it's a "workaround": if it's incorrect to set a bigger-than-SHRT_MAX scsi_driver.can_queue value, probably we should change scsi_driver.can_queue from "int" to "u16"? BTW, I guess the "cmd_per_lun" should also be "u16" rather than "short"? This was discussed in May, and it looks like the conclusion was not clear to me: https://lwn.net/ml/linux-kernel/457d23a9-deb0-4ee1-fe7f-5a63605d9686@huawei.com/ Thanks, -- Dexuan
On Thu, 7 Oct 2021 21:35:46 -0700, Dexuan Cui wrote: > After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during > boot because the hv_storvsc driver sets scsi_driver.can_queue to an "int" > value that exceeds SHRT_MAX, and hence scsi_add_host_with_dma() sets > shost->cmd_per_lun to a negative "short" value. > > Use min_t(int, ...) to fix the issue. > > [...] Applied to 5.15/scsi-fixes, thanks! [1/1] scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma() https://git.kernel.org/mkp/scsi/c/50b6cb351636
Dexuan, > Regarding this patch, I'm not sure if it's a "workaround": if it's > incorrect to set a bigger-than-SHRT_MAX scsi_driver.can_queue value, > probably we should change scsi_driver.can_queue from "int" to "u16"? > BTW, I guess the "cmd_per_lun" should also be "u16" rather than > "short"? I agree that it would be nice to get all this cleaned up. Several, somewhat peculiar, 25-year old design choices. cmd_per_lun has traditionally been in the ballpark of low hundreds, can_queue typically in the low thousands. And the block layer currently caps at ~10K. Happy to take patches fixing this up, although I am a bit worried about how much churn it will generate. That said, I do think that cleaning this up is somewhat orthogonal to the issue with storvsc. I suspect that allowing a huge amount of concurrent outstanding commands is going to be detrimental to performance for most workloads. And from that perspective I think that the short->int fix, while valid given the type discrepancy, is just treating the symptom. Therefore I consider the short->int fix a workaround. And the proper fix involves looking closely at things are scaled in the storvsc case. Which I have noted that Michael is working on.
> From: Martin K. Petersen <martin.petersen@oracle.com> > Sent: Tuesday, October 12, 2021 2:28 PM > To: Dexuan Cui <decui@microsoft.com> > > > Regarding this patch, I'm not sure if it's a "workaround": if it's > > incorrect to set a bigger-than-SHRT_MAX scsi_driver.can_queue value, > > probably we should change scsi_driver.can_queue from "int" to "u16"? > > > BTW, I guess the "cmd_per_lun" should also be "u16" rather than > > "short"? > > I agree that it would be nice to get all this cleaned up. Several, > somewhat peculiar, 25-year old design choices. > > cmd_per_lun has traditionally been in the ballpark of low hundreds, > can_queue typically in the low thousands. And the block layer currently > caps at ~10K. Happy to take patches fixing this up, although I am a bit > worried about how much churn it will generate. Thanks for the explanation! > That said, I do think that cleaning this up is somewhat orthogonal to > the issue with storvsc. I suspect that allowing a huge amount of > concurrent outstanding commands is going to be detrimental to > performance for most workloads. And from that perspective I think that > the short->int fix, while valid given the type discrepancy, is just > treating the symptom. Agreed. > Therefore I consider the short->int fix a workaround. And the proper fix > involves looking closely at things are scaled in the storvsc case. Which > I have noted that Michael is working on. Agreed. My v1 actually tried to work around the storvsc driver instread. :-)
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index 3f6f14f0cafb..24b72ee4246f 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -220,7 +220,8 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, goto fail; } - shost->cmd_per_lun = min_t(short, shost->cmd_per_lun, + /* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */ + shost->cmd_per_lun = min_t(int, shost->cmd_per_lun, shost->can_queue); error = scsi_init_sense_cache(shost);