Message ID | 20220302053559.32147-3-martin.petersen@oracle.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | [01/14] scsi: mpt3sas: Use cached ATA Information VPD page | expand |
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
Looks good, Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Two minior nits below: > + result = scsi_vpd_inquiry(sdev, vpd_header, page, sizeof(vpd_header)); > + > + if (result < 0) > + return 0; can we remove the empty line? > + memset(buf, 0, buf_len); > + result = scsi_vpd_inquiry(sdev, buf, page, vpd_len); > if (result < 0) > - goto fail; > + return -EINVAL; > + else if (result > vpd_len) { > + dev_warn_once(&sdev->sdev_gendev, > + "%s: VPD page 0x%02x result %d > %d bytes\n", > + __func__, page, result, vpd_len); > + vpd_len = min(result, buf_len); > + goto retry_pg; > + } > Adding {} to the if block as well makes it look nicer IMHO
On 3/1/22 21:35, Martin K. Petersen wrote: > +static int scsi_get_vpd_size(struct scsi_device *sdev, u8 page) > +{ > + unsigned char vpd_header[SCSI_VPD_HEADER_SIZE]; > + int result; The SCSI core sets the minimum DMA alignment to 4 bytes. How about aligning the vpd_header[] array explicitly to a four-byte boundary such that the block layer does not have to allocate a temporary buffer? > + vpd_len = min(vpd_len, buf_len); > > - found: > - result = scsi_vpd_inquiry(sdev, buf, page, buf_len); > +retry_pg: > + /* > + * Fetch the actual page. Since the appropriate size was reported > + * by the device it is now safe to ask for something bigger. > + */ > + memset(buf, 0, buf_len); > + result = scsi_vpd_inquiry(sdev, buf, page, vpd_len); > if (result < 0) > - goto fail; > + return -EINVAL; > + else if (result > vpd_len) { > + dev_warn_once(&sdev->sdev_gendev, > + "%s: VPD page 0x%02x result %d > %d bytes\n", > + __func__, page, result, vpd_len); > + vpd_len = min(result, buf_len); > + goto retry_pg; > + } Will an endless loop be triggered if the VPD page length is larger than 'buf_len'? Thanks, Bart.
Bart, >> + vpd_len = min(vpd_len, buf_len); >> - found: >> - result = scsi_vpd_inquiry(sdev, buf, page, buf_len); >> +retry_pg: >> + /* >> + * Fetch the actual page. Since the appropriate size was reported >> + * by the device it is now safe to ask for something bigger. >> + */ >> + memset(buf, 0, buf_len); >> + result = scsi_vpd_inquiry(sdev, buf, page, vpd_len); >> if (result < 0) >> - goto fail; >> + return -EINVAL; >> + else if (result > vpd_len) { >> + dev_warn_once(&sdev->sdev_gendev, >> + "%s: VPD page 0x%02x result %d > %d bytes\n", >> + __func__, page, result, vpd_len); >> + vpd_len = min(result, buf_len); >> + goto retry_pg; >> + } > > Will an endless loop be triggered if the VPD page length is larger > than 'buf_len'? Ah, transplant thinko from scsi_get_vpd_buf() which reallocates the buffer on mismatch. Will fix. Thanks!
Johannes, > Two minior nits below: > >> + result = scsi_vpd_inquiry(sdev, vpd_header, page, sizeof(vpd_header)); >> + >> + if (result < 0) >> + return 0; > > can we remove the empty line? Sure! >> + memset(buf, 0, buf_len); >> + result = scsi_vpd_inquiry(sdev, buf, page, vpd_len); >> if (result < 0) >> - goto fail; >> + return -EINVAL; >> + else if (result > vpd_len) { >> + dev_warn_once(&sdev->sdev_gendev, >> + "%s: VPD page 0x%02x result %d > %d bytes\n", >> + __func__, page, result, vpd_len); >> + vpd_len = min(result, buf_len); >> + goto retry_pg; >> + } >> > > Adding {} to the if block as well makes it look nicer IMHO Roger.
Hi! On a customer system, we found that after commit c92a6b5d63359dd6d2ce6ea88ecd8e31dd769f6b (scsi: core: Query VPD size before getting full page) `qemu-img convert` writing to a certain HDD would be very slow when zero blocks are involved. What the commit seems to have changed is make it possible to issue `fallocate` calls with "FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE" flags. Without the `fallocate` support, it would use an alternative method to write out zeroes that is considerably faster. The question is if allowing `fallocate` in this case is an intentional effect of commit c92a6b5d? The target is an HDD behind a MegaRAID controller configured as JBOD. We see the same behavior if LVM is used on top of the disk. These calls are rather slow on this system, especially when the offset is not aligned to 4k. 2 MiB fallocate 4k aligned: --- root@pve1:~# time fallocate --punch-hole --keep-size -l 2M /dev/sda --offset 4M real 0m0.030s user 0m0.001s sys 0m0.000s --- 2 MiB fallocate at 4 MiB - 512 byte: --- root@pve1:~# time fallocate --punch-hole --keep-size -l 2M /dev/sda --offset 4294966784 real 0m0.707s user 0m0.000s sys 0m0.001s --- In our case, `qemu-img convert` introduces a misalignment of -512 bytes every 2 GiB because it handles up to INT_MAX (2^31-1) 512 byte sectors at a time. Improving the alignment in `qemu-img convert` is another approach we are considering. During boot (no matter which of the tested kernel versions) we see the following log line: --- Jun 03 15:28:32 pve1 kernel: pci 0000:01:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) --- That PCI address is the controller to which the disk is connected: --- root@pve1:~# lspci -nnk -s 01:00.0 01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] [1000:005f] (rev 02) Subsystem: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] [1000:9343] Kernel driver in use: megaraid_sas Kernel modules: megaraid_sas --- The system in question is using a MegaRAID 9341-4i controller with the following firmware versions: FW Package Build = 24.21.0-0159 BIOS Version = 6.36.00.3_4.19.08.00_0x06180206 FW Version = 4.680.01-8576 A test with kernel 6.10-rc2 did show that fallocate support is still detected. I found the following other bug report that looks similar [0] from February. But the mentioned patch [1] (if this is the correct one) seems to be already included in recent kernel versions. The sg_vpd info, as was requested in [0] for this system follows. [2] We checked the output of `sg_vpd -a /dev/sda` and the output does not differ between a kernel that allows fallocate and one that doesn't. If you need additional information or tests, please let me know. I have access to the affected system. Thanks and kind regards, Aaron [0] https://lore.kernel.org/all/20240210011831.47f55oe67utq2yr7@altlinux.org/T/#mb37c7d063c07d356d11111231fac4c048d7678e5 [1] https://lore.kernel.org/linux-scsi/20240214221411.2888112-1-martin.petersen@oracle.com/ [2] root@pve1:~# sg_readcap -l /dev/sda Read Capacity results: Protection: prot_en=0, p_type=0, p_i_exponent=0 Logical block provisioning: lbpme=0, lbprz=0 Last LBA=7814037167 (0x1d1c0beaf), Number of logical blocks=7814037168 Logical block length=512 bytes Logical blocks per physical block exponent=3 [so physical block length=4096 bytes] Lowest aligned LBA=0 Hence: Device size: 4000787030016 bytes, 3815447.8 MiB, 4000.79 GB, 4.00 TB root@pve1:~# sg_vpd -l /dev/sda Supported VPD pages VPD page: [PQual=0 Peripheral device type: disk] 0x00 Supported VPD pages [sv] 0x80 Unit serial number [sn] 0x83 Device identification [di] 0x87 Mode page policy [mpp] 0x89 ATA information (SAT) [ai] 0x8a Power condition [pc] 0xb0 Block limits (SBC) [bl] 0xb1 Block device characteristics (SBC) [bdc] 0xb2 Logical block provisioning (SBC) [lbpv] root@pve1:~# sg_vpd -p 0xb0 /dev/sda Block limits VPD page (SBC): Write same non-zero (WSNZ): 0 Maximum compare and write length: 0 blocks [Command not implemented] Optimal transfer length granularity: 0 blocks [not reported] Maximum transfer length: 0 blocks [not reported] Optimal transfer length: 0 blocks [not reported] Maximum prefetch transfer length: 0 blocks [ignored] Maximum unmap LBA count: 0 [Unmap command not implemented] Maximum unmap block descriptor count: 0 [Unmap command not implemented] Optimal unmap granularity: 0 blocks [not reported] Unmap granularity alignment valid: false Unmap granularity alignment: 0 [invalid] Maximum write same length: 0 blocks [not reported] Maximum atomic transfer length: 0 blocks [not reported] Atomic alignment: 0 [unaligned atomic writes permitted] Atomic transfer length granularity: 0 [no granularity requirement Maximum atomic transfer length with atomic boundary: 0 blocks [not reported] Maximum atomic boundary size: 0 blocks [can only write atomic 1 block] root@pve1:~# sg_vpd -p 0xb1 /dev/sda Block device characteristics VPD page (SBC): Nominal rotation rate: 7200 rpm Product type: Not specified WABEREQ=0 WACEREQ=0 Nominal form factor: 3.5 inch ZONED=0 RBWZ=0 BOCS=0 FUAB=0 VBULS=0 DEPOPULATION_TIME=0 (seconds) root@pve1:~# sg_vpd -p 0xb2 /dev/sda Logical block provisioning VPD page (SBC): Unmap command supported (LBPU): 0 Write same (16) with unmap bit supported (LBPWS): 0 Write same (10) with unmap bit supported (LBPWS10): 0 Logical block provisioning read zeros (LBPRZ): 0 Anchored LBAs supported (ANC_SUP): 0 Threshold exponent: 0 [threshold sets not supported] Descriptor present (DP): 0 Minimum percentage: 0 [not reported] Provisioning type: 0 (not known or fully provisioned) Threshold percentage: 0 [percentages not supported] On 2022-03-02 06:35, Martin K. Petersen wrote: > We currently default to 255 bytes when fetching VPD pages during discovery. > However, we have had a few devices that are known to wedge if the requested > buffer exceeds a certain size. See commit af73623f5f10 ("[SCSI] sd: Reduce > buffer size for vpd request") which works around one example of this > problem in the SCSI disk driver. > > With commit d188b0675b21 ("scsi: core: Add sysfs attributes for VPD pages > 0h and 89h") we now risk triggering the same issue in the generic midlayer > code. > > The problem with the ATA VPD page in particular is that the SCSI portion of > the page is trailed by 512 bytes of verbatim ATA Identify Device > information. However, not all controllers actually provide the additional > 512 bytes and will lock up if one asks for more than the 64 bytes > containing the SCSI protocol fields. > > Instead of picking a new, somewhat arbitrary, number of bytes for the VPD > buffer size, start fetching the 4-byte header for each page. The header > contains the size of the page as far as the device is concerned. We can use > the reported size to specify the correct allocation length when > subsequently fetching the full page. > > The header validation is done by a new helper function scsi_get_vpd_size() > and both scsi_get_vpd_page() and scsi_get_vpd_buf() now rely on this to > query the page size. > > In addition, scsi_get_vpd_page() is simplified to mirror the logic in > scsi_get_vpd_page(). This involves removing the Supported VPD Pages lookup > prior to attempting to query a page. There does not appear any evidence, > even in the oldest SCSI specs, that this step is required. We already rely > on scsi_get_vpd_page() throughout the stack and this function never > consulted the Supported VPD Pages. Since this has not caused any problems > it should be safe to remove the precondition from scsi_get_vpd_page(). > > Instrumented runs also revealed that the Supported VPD Pages lookup had > little effect since the device page index often was larger than the > supplied buffer size. As a result, inquiries frequently bypassed the index > check and went through the "If we ran off the end of the buffer, give us > the benefit of the doubt" code path which assumed the page was present > despite not being listed. The revised code takes both the page size > reported by the device as well as the size of the buffer provided by the > scsi_get_vpd_page() caller into account. > > Fixes: d188b0675b21 ("scsi: core: Add sysfs attributes for VPD pages 0h and 89h") > Reported-by: Maciej W. Rozycki <macro@orcam.me.uk> > Tested-by: Maciej W. Rozycki <macro@orcam.me.uk> > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Aaron, > The target is an HDD behind a MegaRAID controller configured as JBOD. Running in JBOD mode is probably part of the problem. > root@pve1:~# sg_vpd -l /dev/sda > Supported VPD pages VPD page: > [PQual=0 Peripheral device type: disk] > 0x00 Supported VPD pages [sv] > 0x80 Unit serial number [sn] > 0x83 Device identification [di] > 0x87 Mode page policy [mpp] > 0x89 ATA information (SAT) [ai] > 0x8a Power condition [pc] > 0xb0 Block limits (SBC) [bl] > 0xb1 Block device characteristics (SBC) [bdc] > 0xb2 Logical block provisioning (SBC) [lbpv] I am working on a fix for what's probably a related issue. I would appreciate if you could do two things: 1. Please send me the output of: # sg_opcodes /dev/sda 2. Try building and booting: https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/log/?h=6.10/scsi-fixes and see if that makes a difference in your case. Thanks! Martin
Hi Martin, thanks for getting back to me. On 2024-06-05 04:51, Martin K. Petersen wrote: > > Aaron, > >> The target is an HDD behind a MegaRAID controller configured as JBOD. > > Running in JBOD mode is probably part of the problem. > >> root@pve1:~# sg_vpd -l /dev/sda >> Supported VPD pages VPD page: >> [PQual=0 Peripheral device type: disk] >> 0x00 Supported VPD pages [sv] >> 0x80 Unit serial number [sn] >> 0x83 Device identification [di] >> 0x87 Mode page policy [mpp] >> 0x89 ATA information (SAT) [ai] >> 0x8a Power condition [pc] >> 0xb0 Block limits (SBC) [bl] >> 0xb1 Block device characteristics (SBC) [bdc] >> 0xb2 Logical block provisioning (SBC) [lbpv] > > I am working on a fix for what's probably a related issue. > > I would appreciate if you could do two things: > > 1. Please send me the output of: > > # sg_opcodes /dev/sda Here you go: --- root@pve1:~# sg_opcodes /dev/sda ATA HGST HUS726T4TAL W984 Peripheral device type: disk Report supported operation codes: Illegal request, Invalid opcode --- I get the same result for the other 2 block devices configured on this RAID controller. One is another JBOD device (an SSD), the other is a RAID-1. > > 2. Try building and booting: > > https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/log/?h=6.10/scsi-fixes > > and see if that makes a difference in your case. I checked out the 6.10/scsi-fixes branch which was on commit d53b681 scsi: ufs: mcq: Fix error output and clean up ufshcd_mcq_abort() to build a new kernel to test. Unfortunately, I must report that running the `fallocate`[0] command still works and does not fail with an unsupported error. If there is more I can do to help, please let me know. Best regards, Aaron [0] fallocate --punch-hole --keep-size -l 2M /dev/sda --offset 4M
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index 211aace69c22..af896d0647a7 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -321,6 +321,32 @@ static int scsi_vpd_inquiry(struct scsi_device *sdev, unsigned char *buffer, return get_unaligned_be16(&buffer[2]) + 4; } +static int scsi_get_vpd_size(struct scsi_device *sdev, u8 page) +{ + unsigned char vpd_header[SCSI_VPD_HEADER_SIZE]; + int result; + + /* + * Fetch the VPD page header to find out how big the page + * is. This is done to prevent problems on legacy devices + * which can not handle allocation lengths as large as + * potentially requested by the caller. + */ + result = scsi_vpd_inquiry(sdev, vpd_header, page, sizeof(vpd_header)); + + if (result < 0) + return 0; + + if (result < SCSI_VPD_HEADER_SIZE) { + dev_warn_once(&sdev->sdev_gendev, + "%s: short VPD page 0x%02x length: %d bytes\n", + __func__, page, result); + return 0; + } + + return result; +} + /** * scsi_get_vpd_page - Get Vital Product Data from a SCSI device * @sdev: The device to ask @@ -330,47 +356,42 @@ static int scsi_vpd_inquiry(struct scsi_device *sdev, unsigned char *buffer, * * SCSI devices may optionally supply Vital Product Data. Each 'page' * of VPD is defined in the appropriate SCSI document (eg SPC, SBC). - * If the device supports this VPD page, this routine returns a pointer - * to a buffer containing the data from that page. The caller is - * responsible for calling kfree() on this pointer when it is no longer - * needed. If we cannot retrieve the VPD page this routine returns %NULL. + * If the device supports this VPD page, this routine fills @buf + * with the data from that page and return 0. If the VPD page is not + * supported or its content cannot be retrieved, -EINVAL is returned. */ int scsi_get_vpd_page(struct scsi_device *sdev, u8 page, unsigned char *buf, int buf_len) { - int i, result; + int result, vpd_len; - if (sdev->skip_vpd_pages) - goto fail; - - /* Ask for all the pages supported by this device */ - result = scsi_vpd_inquiry(sdev, buf, 0, buf_len); - if (result < 4) - goto fail; - - /* If the user actually wanted this page, we can skip the rest */ - if (page == 0) - return 0; + if (!scsi_device_supports_vpd(sdev)) + return -EINVAL; - for (i = 4; i < min(result, buf_len); i++) - if (buf[i] == page) - goto found; + vpd_len = scsi_get_vpd_size(sdev, page); + if (vpd_len <= 0) + return -EINVAL; - if (i < result && i >= buf_len) - /* ran off the end of the buffer, give us benefit of doubt */ - goto found; - /* The device claims it doesn't support the requested page */ - goto fail; + vpd_len = min(vpd_len, buf_len); - found: - result = scsi_vpd_inquiry(sdev, buf, page, buf_len); +retry_pg: + /* + * Fetch the actual page. Since the appropriate size was reported + * by the device it is now safe to ask for something bigger. + */ + memset(buf, 0, buf_len); + result = scsi_vpd_inquiry(sdev, buf, page, vpd_len); if (result < 0) - goto fail; + return -EINVAL; + else if (result > vpd_len) { + dev_warn_once(&sdev->sdev_gendev, + "%s: VPD page 0x%02x result %d > %d bytes\n", + __func__, page, result, vpd_len); + vpd_len = min(result, buf_len); + goto retry_pg; + } return 0; - - fail: - return -EINVAL; } EXPORT_SYMBOL_GPL(scsi_get_vpd_page); @@ -384,9 +405,17 @@ EXPORT_SYMBOL_GPL(scsi_get_vpd_page); static struct scsi_vpd *scsi_get_vpd_buf(struct scsi_device *sdev, u8 page) { struct scsi_vpd *vpd_buf; - int vpd_len = SCSI_VPD_PG_LEN, result; + int vpd_len, result; + + vpd_len = scsi_get_vpd_size(sdev, page); + if (vpd_len <= 0) + return NULL; retry_pg: + /* + * Fetch the actual page. Since the appropriate size was reported + * by the device it is now safe to ask for something bigger. + */ vpd_buf = kmalloc(sizeof(*vpd_buf) + vpd_len, GFP_KERNEL); if (!vpd_buf) return NULL; @@ -397,6 +426,9 @@ static struct scsi_vpd *scsi_get_vpd_buf(struct scsi_device *sdev, u8 page) return NULL; } if (result > vpd_len) { + dev_warn_once(&sdev->sdev_gendev, + "%s: VPD page 0x%02x result %d > %d bytes\n", + __func__, page, result, vpd_len); vpd_len = result; kfree(vpd_buf); goto retry_pg; diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 647c53b26105..144d3a801c8d 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -100,6 +100,10 @@ struct scsi_vpd { unsigned char data[]; }; +enum scsi_vpd_parameters { + SCSI_VPD_HEADER_SIZE = 4, +}; + struct scsi_device { struct Scsi_Host *host; struct request_queue *request_queue; @@ -141,7 +145,6 @@ struct scsi_device { const char * model; /* ... after scan; point to static string */ const char * rev; /* ... "nullnullnullnull" before scan */ -#define SCSI_VPD_PG_LEN 255 struct scsi_vpd __rcu *vpd_pg0; struct scsi_vpd __rcu *vpd_pg83; struct scsi_vpd __rcu *vpd_pg80;