Message ID | 20240214221411.2888112-1-martin.petersen@oracle.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | [v2] scsi: core: Consult supported VPD page list prior to fetching page | expand |
On Wed, Feb 14, 2024 at 2:14 PM Martin K. Petersen <martin.petersen@oracle.com> wrote: > > Commit c92a6b5d6335 ("scsi: core: Query VPD size before getting full > page") removed the logic which checks whether a VPD page is present on > the supported pages list before asking for the page itself. That was > done because SPC helpfully states "The Supported VPD Pages VPD page > list may or may not include all the VPD pages that are able to be > returned by the device server". Testing had revealed a few devices > that supported some of the 0xBn pages but didn't actually list them in > page 0. > > Julian Sikorski bisected a problem with his drive resetting during > discovery to the commit above. As it turns out, this particular drive > firmware will crash if we attempt to fetch page 0xB9. > > Various approaches were attempted to work around this. In the end, > reinstating the logic that consults VPD page 0 before fetching any > other page was the path of least resistance. A firmware update for the > devices which originally compelled us to remove the check has since > been released. > > Cc: stable@vger.kernel.org > Cc: Bart Van Assche <bvanassche@acm.org> > Fixes: c92a6b5d6335 ("scsi: core: Query VPD size before getting full page") > Reported-by: Julian Sikorski <belegdol@gmail.com> > Tested-by: Julian Sikorski <belegdol@gmail.com> > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> > > --- > > v2: Address Bart's comments. > --- > drivers/scsi/scsi.c | 22 ++++++++++++++++++++-- > include/scsi/scsi_device.h | 4 ---- > 2 files changed, 20 insertions(+), 6 deletions(-) > > diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c > index 76d369343c7a..8cad9792a562 100644 > --- a/drivers/scsi/scsi.c > +++ b/drivers/scsi/scsi.c > @@ -328,21 +328,39 @@ static int scsi_vpd_inquiry(struct scsi_device *sdev, unsigned char *buffer, > return result + 4; > } > > +enum scsi_vpd_parameters { > + SCSI_VPD_HEADER_SIZE = 4, > + SCSI_VPD_LIST_SIZE = 36, > +}; > + > static int scsi_get_vpd_size(struct scsi_device *sdev, u8 page) > { > - unsigned char vpd_header[SCSI_VPD_HEADER_SIZE] __aligned(4); > + unsigned char vpd[SCSI_VPD_LIST_SIZE] __aligned(4); > int result; > > if (sdev->no_vpd_size) > return SCSI_DEFAULT_VPD_LEN; > > + /* > + * Fetch the supported pages VPD and validate that the requested page > + * number is present. > + */ > + if (page != 0) { > + result = scsi_vpd_inquiry(sdev, vpd, 0, sizeof(vpd)); > + if (result < SCSI_VPD_HEADER_SIZE) > + return 0; > + > + result -= SCSI_VPD_HEADER_SIZE; > + if (!memchr(&vpd[SCSI_VPD_HEADER_SIZE], page, result)) > + return 0; > + } > /* > * Fetch the VPD page header to find out how big the page > * is. This is done to prevent problems on legacy devices > * which can not handle allocation lengths as large as > * potentially requested by the caller. > */ > - result = scsi_vpd_inquiry(sdev, vpd_header, page, sizeof(vpd_header)); > + result = scsi_vpd_inquiry(sdev, vpd, page, SCSI_VPD_HEADER_SIZE); > if (result < 0) > return 0; > > diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h > index cb019c80763b..72a6b3923fc7 100644 > --- a/include/scsi/scsi_device.h > +++ b/include/scsi/scsi_device.h > @@ -100,10 +100,6 @@ struct scsi_vpd { > unsigned char data[]; > }; > > -enum scsi_vpd_parameters { > - SCSI_VPD_HEADER_SIZE = 4, > -}; > - > struct scsi_device { > struct Scsi_Host *host; > struct request_queue *request_queue; > -- > 2.42.1 > > Reviewed-by: Lee Duncan <lee.duncan@suse.com>
On 2/14/24 14:14, Martin K. Petersen wrote: > Commit c92a6b5d6335 ("scsi: core: Query VPD size before getting full > page") removed the logic which checks whether a VPD page is present on > the supported pages list before asking for the page itself. That was > done because SPC helpfully states "The Supported VPD Pages VPD page > list may or may not include all the VPD pages that are able to be > returned by the device server". Testing had revealed a few devices > that supported some of the 0xBn pages but didn't actually list them in > page 0. > > Julian Sikorski bisected a problem with his drive resetting during > discovery to the commit above. As it turns out, this particular drive > firmware will crash if we attempt to fetch page 0xB9. > > Various approaches were attempted to work around this. In the end, > reinstating the logic that consults VPD page 0 before fetching any > other page was the path of least resistance. A firmware update for the > devices which originally compelled us to remove the check has since > been released. > > Cc: stable@vger.kernel.org > Cc: Bart Van Assche <bvanassche@acm.org> > Fixes: c92a6b5d6335 ("scsi: core: Query VPD size before getting full page") > Reported-by: Julian Sikorski <belegdol@gmail.com> > Tested-by: Julian Sikorski <belegdol@gmail.com> > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> BTW, here is another report related to this patch: https://lore.kernel.org/linux-scsi/64phxapjp742qob7gr74o2tnnkaic6wmxgfa3uxn33ukrwumbi@cfd6kmix3bbm/ Vitaly, can you help with testing this patch? See also: https://lore.kernel.org/linux-scsi/20240214221411.2888112-1-martin.petersen@oracle.com/
Bart, On Thu, Feb 15, 2024 at 10:28:06AM -0800, Bart Van Assche wrote: > On 2/14/24 14:14, Martin K. Petersen wrote: > > Commit c92a6b5d6335 ("scsi: core: Query VPD size before getting full > > page") removed the logic which checks whether a VPD page is present on > > the supported pages list before asking for the page itself. That was > > done because SPC helpfully states "The Supported VPD Pages VPD page > > list may or may not include all the VPD pages that are able to be > > returned by the device server". Testing had revealed a few devices > > that supported some of the 0xBn pages but didn't actually list them in > > page 0. > > > > Julian Sikorski bisected a problem with his drive resetting during > > discovery to the commit above. As it turns out, this particular drive > > firmware will crash if we attempt to fetch page 0xB9. > > > > Various approaches were attempted to work around this. In the end, > > reinstating the logic that consults VPD page 0 before fetching any > > other page was the path of least resistance. A firmware update for the > > devices which originally compelled us to remove the check has since > > been released. > > > > Cc: stable@vger.kernel.org > > Cc: Bart Van Assche <bvanassche@acm.org> > > Fixes: c92a6b5d6335 ("scsi: core: Query VPD size before getting full page") > > Reported-by: Julian Sikorski <belegdol@gmail.com> > > Tested-by: Julian Sikorski <belegdol@gmail.com> > > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> > > Reviewed-by: Bart Van Assche <bvanassche@acm.org> > > BTW, here is another report related to this patch: > https://lore.kernel.org/linux-scsi/64phxapjp742qob7gr74o2tnnkaic6wmxgfa3uxn33ukrwumbi@cfd6kmix3bbm/ > > Vitaly, can you help with testing this patch? See also: > https://lore.kernel.org/linux-scsi/20240214221411.2888112-1-martin.petersen@oracle.com/ With this patch applied over 6.6.16 problem still persists. Also reading page 0x89 (mentioned in patch description) does not show any signs of crash (any errors in dmesg with it of with the following reads): [root@host-226 ~]# sg_vpd -p 0x89 /dev/sdc ATA information VPD page: SAT Vendor identification: LSI SAT Product identification: LSI SATL SAT Product revision level: 0008 Device signature indicates SATA transport Command code: 0xec ATA command IDENTIFY DEVICE response summary: model: WDC WD2005FBYZ-01YCBB2 serial number: WD-WMC6N0J5VKLH firmware revision: RR07 Thanks,
Vitaly,
> With this patch applied over 6.6.16 problem still persists.
I figured. Couldn't understand why any of these changes would lead to
the symptoms you are experiencing.
I think my original patch just exposed a timing issue. Since your device
has a SATL, we should never be sending a WRITE SAME command in the first
place.
I have another patch in the pipeline which, in combination with
Christoph's atomic queue limit update series, should close this gap.
I'll reach out when it's ready for testing.
Thanks for your help!
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 76d369343c7a..8cad9792a562 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -328,21 +328,39 @@ static int scsi_vpd_inquiry(struct scsi_device *sdev, unsigned char *buffer, return result + 4; } +enum scsi_vpd_parameters { + SCSI_VPD_HEADER_SIZE = 4, + SCSI_VPD_LIST_SIZE = 36, +}; + static int scsi_get_vpd_size(struct scsi_device *sdev, u8 page) { - unsigned char vpd_header[SCSI_VPD_HEADER_SIZE] __aligned(4); + unsigned char vpd[SCSI_VPD_LIST_SIZE] __aligned(4); int result; if (sdev->no_vpd_size) return SCSI_DEFAULT_VPD_LEN; + /* + * Fetch the supported pages VPD and validate that the requested page + * number is present. + */ + if (page != 0) { + result = scsi_vpd_inquiry(sdev, vpd, 0, sizeof(vpd)); + if (result < SCSI_VPD_HEADER_SIZE) + return 0; + + result -= SCSI_VPD_HEADER_SIZE; + if (!memchr(&vpd[SCSI_VPD_HEADER_SIZE], page, result)) + return 0; + } /* * Fetch the VPD page header to find out how big the page * is. This is done to prevent problems on legacy devices * which can not handle allocation lengths as large as * potentially requested by the caller. */ - result = scsi_vpd_inquiry(sdev, vpd_header, page, sizeof(vpd_header)); + result = scsi_vpd_inquiry(sdev, vpd, page, SCSI_VPD_HEADER_SIZE); if (result < 0) return 0; diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index cb019c80763b..72a6b3923fc7 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -100,10 +100,6 @@ struct scsi_vpd { unsigned char data[]; }; -enum scsi_vpd_parameters { - SCSI_VPD_HEADER_SIZE = 4, -}; - struct scsi_device { struct Scsi_Host *host; struct request_queue *request_queue;