Message ID | 20190214221558.09174756@endymion
---|---
State | Changes Requested
Series | sd: disable logical block provisioning if 'lbpme' is not set
On Thu, 2019-02-14 at 22:15 +0100, Jean Delvare wrote:
> From: Hannes Reinecke <hare@suse.com>
> 
> When evaluating the 'block limits' VPD page we need to check if
> the 'lbpme' (logical block provisioning management enable) bit
> is set in the READ CAPACITY (16) output.
> If it isn't we can safely assume that we cannot use DISCARD on
> this device.
> 
> [JD: forward-ported to kernel v4.20]
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> Signed-off-by: Jean Delvare <jdelvare@suse.com>
> ---
> Hannes, please double-check that my forward-port is correct.
> 
>  drivers/scsi/sd.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -411,6 +411,13 @@ provisioning_mode_store(struct device *d
>  	if (mode < 0)
>  		return -EINVAL;
>  
> +	/*
> +	 * If logical block provisioning isn't enabled we can only
> +	 * select 'disable' here.
> +	 */
> +	if (!sdkp->lbpme && mode != SD_LBP_DISABLE)
> +		return -EINVAL;
> +
>  	sd_config_discard(sdkp, mode);
>  
>  	return count;
> @@ -2942,8 +2949,10 @@ static void sd_read_block_limits(struct
>  
>  	sdkp->max_ws_blocks = (u32)get_unaligned_be64(&buffer[36]);
>  
> -	if (!sdkp->lbpme)
> +	if (!sdkp->lbpme) {
> +		sd_config_discard(sdkp, SD_LBP_DISABLE);
>  		goto out;
> +	}
>  
>  	lba_count = get_unaligned_be32(&buffer[20]);
>  	desc_count = get_unaligned_be32(&buffer[24]);

What is the impact of this patch on SATA SSDs? Since these SSDs do not
support logical provisioning, does this patch break trim support for
these SSDs?

Bart.
Jean,

> When evaluating the 'block limits' VPD page we need to check if the
> 'lbpme' (logical block provisioning management enable) bit is set in
> the READ CAPACITY (16) output. If it isn't we can safely assume that
> we cannot use DISCARD on this device.

That's not entirely correct. We still support devices which predate
logical block provisioning getting standardized in SBC. These devices
are typically driven by the zeroing method and don't have LBPME, nor
the LBP VPD page to key off of. Instead they are triggered by setting
the provisioning_mode via udev.

I am not sure how many of these are still around, I am hoping very
few. But I'd prefer to not break anything.

Can you describe the specific problem your patch is aiming to address?
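A side note for readers unfamiliar with the knob Martin refers to: the
provisioning_mode sysfs attribute accepts a small set of keywords mapped
to the SD_LBP_* constants, and the legacy zeroing method is selected by
writing "writesame_zero" (typically from a udev rule). The table below is
paraphrased from memory of drivers/scsi/sd.c, not an exact quote of the
source:

	/*
	 * Roughly the lbp_mode[] table in drivers/scsi/sd.c (paraphrased
	 * from memory, not an exact quote). provisioning_mode_store()
	 * matches the written string against these names; the legacy
	 * zeroing method Martin mentions is "writesame_zero".
	 */
	static const char *lbp_mode[] = {
		[SD_LBP_FULL]		= "full",
		[SD_LBP_UNMAP]		= "unmap",
		[SD_LBP_WS16]		= "writesame_16",
		[SD_LBP_WS10]		= "writesame_10",
		[SD_LBP_ZERO]		= "writesame_zero",
		[SD_LBP_DISABLE]	= "disable",
	};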
Hi Bart,

Sorry to resend this, my previous reply had a stray backslash in the
headers (no idea why) that confused certain MTAs, so at least Hannes did
not receive it.

I'm taking advantage of this resend to update it with the latest
information, so keep reading.

On Fri, 15 Feb 2019 07:56:11 -0800, Bart Van Assche wrote:
> On Thu, 2019-02-14 at 22:15 +0100, Jean Delvare wrote:
> > From: Hannes Reinecke <hare@suse.com>
> > 
> > When evaluating the 'block limits' VPD page we need to check if
> > the 'lbpme' (logical block provisioning management enable) bit
> > is set in the READ CAPACITY (16) output.
> > If it isn't we can safely assume that we cannot use DISCARD on
> > this device.
> > 
> > [JD: forward-ported to kernel v4.20]
> > 
> > Signed-off-by: Hannes Reinecke <hare@suse.com>
> > Signed-off-by: Jean Delvare <jdelvare@suse.com>
> > ---
> > Hannes, please double-check that my forward-port is correct.
> > 
> >  drivers/scsi/sd.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > --- a/drivers/scsi/sd.c
> > +++ b/drivers/scsi/sd.c
> > @@ -411,6 +411,13 @@ provisioning_mode_store(struct device *d
> >  	if (mode < 0)
> >  		return -EINVAL;
> >  
> > +	/*
> > +	 * If logical block provisioning isn't enabled we can only
> > +	 * select 'disable' here.
> > +	 */
> > +	if (!sdkp->lbpme && mode != SD_LBP_DISABLE)
> > +		return -EINVAL;
> > +
> >  	sd_config_discard(sdkp, mode);
> >  
> >  	return count;
> > @@ -2942,8 +2949,10 @@ static void sd_read_block_limits(struct
> >  
> >  	sdkp->max_ws_blocks = (u32)get_unaligned_be64(&buffer[36]);
> >  
> > -	if (!sdkp->lbpme)
> > +	if (!sdkp->lbpme) {
> > +		sd_config_discard(sdkp, SD_LBP_DISABLE);
> >  		goto out;
> > +	}
> >  
> >  	lba_count = get_unaligned_be32(&buffer[20]);
> >  	desc_count = get_unaligned_be32(&buffer[24]);
> 
> What is the impact of this patch on SATA SSDs? Since these SSDs do not support
> logical provisioning, does this patch break trim support for these SSDs?

Thanks for your feedback. It should be noted that I am only the
messenger here, not the author of the patch. I am only trying to ensure
that upstream is not missing a fix that we have in our SUSE kernels. My
knowledge about the sd driver or storage in general is very thin, but
I'll do my best to answer questions and address issues. Any help from
the experts will be appreciated.

What makes you think that SATA SSDs do not support logical provisioning?
I added log messages to the sd driver and booted the instrumented driver
on my workstation, which has 1 SATA SSD and 2 SATA HDDs.
sd_read_block_limits is called 3 times for each (not sure why). For the
SSD, sdkp->lbpme=1. For the HDDs, sdkp->lbpme=0. So I think the code
should do the right thing for SATA SSDs?

[ 1.351917] scsi 0:0:0:0: Direct-Access ATA Samsung SSD 850 2B6Q PQ: 0 ANSI: 5
[ 1.352975] scsi_device 0:0:0:0: sd_read_block_limits called, sdkp->lbpme=1
[ 1.366916] scsi 1:0:0:0: Direct-Access ATA ST2000VM003-1ET1 SC12 PQ: 0 ANSI: 5
[ 1.367669] scsi_device 1:0:0:0: sd_read_block_limits called, sdkp->lbpme=0
[ 1.391266] scsi 3:0:0:0: Direct-Access ATA ST1000DM003-1ER1 CC45 PQ: 0 ANSI: 5
[ 1.391854] scsi_device 3:0:0:0: sd_read_block_limits called, sdkp->lbpme=0

Of course that's only one sample. Meanwhile I noticed that sdkp->lbpme
can be read from the "thin_provisioning" sysfs attribute, without
instrumenting the kernel. So I checked on my father's system and my
wife's system, and both an INTEL SSDSA2CT04 (2011) and a SAMSUNG SSD 830
(2013) have sdkp->lbpme=1.

So SATA SSDs seem to be safe.
One thing that worries me more is that SATA HDDs have provisioning_mode
set to "full" (SD_LBP_FULL) by default, while Hannes' patch only allows
writing "disable" (SD_LBP_DISABLE) to this sysfs attribute file when
sdkp->lbpme=0. If SD_LBP_FULL was valid at boot time, then writing that
value back should certainly be allowed. Hannes?
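To make Jean's concern concrete, here is a hypothetical relaxation of the
check in provisioning_mode_store(). It was not posted in the thread; it is
only a sketch of what "also allow the boot-time default" could look like:

	/*
	 * Hypothetical variant (not part of the posted patch): when LBPME
	 * is not set, still accept 'full', which is the default
	 * provisioning_mode for such devices, and only reject the modes
	 * that would enable discard.
	 */
	if (!sdkp->lbpme && mode != SD_LBP_DISABLE && mode != SD_LBP_FULL)
		return -EINVAL;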
On 2/18/19 10:56 AM, Jean Delvare wrote:
> Hi Bart,
> 
> Sorry to resend this, my previous reply had a stray backslash in the
> headers (no idea why) that confused certain MTAs, so at least Hannes
> did not receive it.
> 
> I'm taking advantage of this resend to update it with the latest
> information, so keep reading.
> 
> On Fri, 15 Feb 2019 07:56:11 -0800, Bart Van Assche wrote:
>> On Thu, 2019-02-14 at 22:15 +0100, Jean Delvare wrote:
>>> From: Hannes Reinecke <hare@suse.com>
>>>
>>> When evaluating the 'block limits' VPD page we need to check if
>>> the 'lbpme' (logical block provisioning management enable) bit
>>> is set in the READ CAPACITY (16) output.
>>> If it isn't we can safely assume that we cannot use DISCARD on
>>> this device.
>>>
>>> [JD: forward-ported to kernel v4.20]
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.com>
>>> Signed-off-by: Jean Delvare <jdelvare@suse.com>
>>> ---
>>> Hannes, please double-check that my forward-port is correct.
>>>
>>>  drivers/scsi/sd.c | 11 ++++++++++-
>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> --- a/drivers/scsi/sd.c
>>> +++ b/drivers/scsi/sd.c
>>> @@ -411,6 +411,13 @@ provisioning_mode_store(struct device *d
>>>  	if (mode < 0)
>>>  		return -EINVAL;
>>>  
>>> +	/*
>>> +	 * If logical block provisioning isn't enabled we can only
>>> +	 * select 'disable' here.
>>> +	 */
>>> +	if (!sdkp->lbpme && mode != SD_LBP_DISABLE)
>>> +		return -EINVAL;
>>> +
>>>  	sd_config_discard(sdkp, mode);
>>>  
>>>  	return count;
>>> @@ -2942,8 +2949,10 @@ static void sd_read_block_limits(struct
>>>  
>>>  	sdkp->max_ws_blocks = (u32)get_unaligned_be64(&buffer[36]);
>>>  
>>> -	if (!sdkp->lbpme)
>>> +	if (!sdkp->lbpme) {
>>> +		sd_config_discard(sdkp, SD_LBP_DISABLE);
>>>  		goto out;
>>> +	}
>>>  
>>>  	lba_count = get_unaligned_be32(&buffer[20]);
>>>  	desc_count = get_unaligned_be32(&buffer[24]);
>>
>> What is the impact of this patch on SATA SSDs? Since these SSDs do not support
>> logical provisioning, does this patch break trim support for these SSDs?
> 
None. SATA SSDs as provided by libata will always set the 'lbpme' bit if
TRIM is supported:

	if (ata_id_has_trim(args->id) && !(dev->horkage & ATA_HORKAGE_NOTRIM)) {
		rbuf[14] |= 0x80; /* LBPME */

		if (ata_id_has_zero_after_trim(args->id) &&
		    dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) {
			ata_dev_info(dev, "Enabling discard_zeroes_data\n");
			rbuf[14] |= 0x40; /* LBPRZ */
		}
	}

Of course, I cannot speak for other SATL implementations (mpt3sas etc.),
but they are known to have other issues, too, so I don't really care.

Cheers,

Hannes
Hi Martin,

On Fri, 2019-02-15 at 22:30 -0500, Martin K. Petersen wrote:
> > When evaluating the 'block limits' VPD page we need to check if the
> > 'lbpme' (logical block provisioning management enable) bit is set in
> > the READ CAPACITY (16) output. If it isn't we can safely assume that
> > we cannot use DISCARD on this device.
> 
> That's not entirely correct. We still support devices which predate
> logical block provisioning getting standardized in SBC. These devices
> are typically driven by the zeroing method and don't have LBPME, nor
> the LBP VPD page to key off of. Instead they are triggered by setting
> the provisioning_mode via udev.
> 
> I am not sure how many of these are still around, I am hoping very
> few. But I'd prefer to not break anything.

So basically the condition "if (!sdkp->lbpme)" should be replaced by
something like "if (LBP_VPD_page_is_supported && !sdkp->lbpme)"?

Unfortunately it does not look like the driver keeps track of whether
read_capacity_16() executed successfully. Do we need a new flag in sdkp
to remember read_capacity_16() success? Or can we just test
sdkp->lbpvpd?

> Can you describe the specific problem your patch is aiming to address?

The problem that was reported to us by several customers, and which this
patch attempts to address, is file system corruption. Specifically,
metadata corruption reported on XFS filesystems by
xfs_agf_read_verify(). Investigation revealed that the problematic areas
had been zeroed out at the block device level. This was in an Azure VM,
using dm-stripe, with an fstrim service running automatically on a
weekly basis (a default SUSE setting when the system has one or more
SSDs).

The details are beyond me, to be honest, but my limited understanding is
that calling TRIM on virtual block devices for which TRIM makes no sense
ended up zeroing storage areas and causing the corruption. Hannes or
Jeff may be able to explain it better, hopefully.
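Jean's "can we just test sdkp->lbpvpd?" question above could look
something like the following in provisioning_mode_store(). This is only
an illustration, not code posted in the thread, and Martin explains
further down why keying off the LBP VPD page does not work for devices
that predate it:

	/*
	 * Illustration only (not posted in the thread): only enforce the
	 * LBPME check for devices that actually expose the Logical Block
	 * Provisioning VPD page. See Martin's objection below about
	 * devices that predate the VPD page.
	 */
	if (sdkp->lbpvpd && !sdkp->lbpme && mode != SD_LBP_DISABLE)
		return -EINVAL;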
On 2/18/19 5:48 AM, Jean Delvare wrote:
> Hi Martin,
> 
> On Fri, 2019-02-15 at 22:30 -0500, Martin K. Petersen wrote:
>>> When evaluating the 'block limits' VPD page we need to check if the
>>> 'lbpme' (logical block provisioning management enable) bit is set in
>>> the READ CAPACITY (16) output. If it isn't we can safely assume that
>>> we cannot use DISCARD on this device.
>>
>> That's not entirely correct. We still support devices which predate
>> logical block provisioning getting standardized in SBC. These devices
>> are typically driven by the zeroing method and don't have LBPME, nor
>> the LBP VPD page to key off of. Instead they are triggered by setting
>> the provisioning_mode via udev.
>>
>> I am not sure how many of these are still around, I am hoping very
>> few. But I'd prefer to not break anything.
> 
> So basically the condition "if (!sdkp->lbpme)" should be replaced by
> something like "if (LBP_VPD_page_is_supported && !sdkp->lbpme)"?
> 
> Unfortunately it does not look like the driver keeps track of whether
> read_capacity_16() executed successfully. Do we need a new flag in sdkp
> to remember read_capacity_16() success? Or can we just test
> sdkp->lbpvpd?
> 
>> Can you describe the specific problem your patch is aiming to address?
> 
> The problem that was reported to us by several customers, and which
> this patch attempts to address, is file system corruption. Specifically,
> metadata corruption reported on XFS filesystems by
> xfs_agf_read_verify(). Investigation revealed that the problematic
> areas had been zeroed out at the block device level. This was in an
> Azure VM, using dm-stripe, with an fstrim service running automatically
> on a weekly basis (a default SUSE setting when the system has one or
> more SSDs).
> 
> The details are beyond me, to be honest, but my limited understanding
> is that calling TRIM on virtual block devices for which TRIM makes no
> sense ended up zeroing storage areas and causing the corruption. Hannes
> or Jeff may be able to explain it better, hopefully.

That's more or less how the problem presents itself, but it's not that
we issue an fstrim on a block and it gets zeroed. That would be fine
from the file system perspective. It's that the fstrim is issued and
unrelated blocks are zeroed. Once I saw the storage layer was
responsible, I stopped digging and passed it to Hannes.

-Jeff
On Tue, 2019-02-19 at 10:07 -0500, Jeff Mahoney wrote:
> That's more or less how the problem presents itself, but it's not that
> we issue an fstrim on a block and it gets zeroed. That would be fine
> from the file system perspective. It's that the fstrim is issued and
> unrelated blocks are zeroed. Once I saw the storage layer was
> responsible, I stopped digging and passed it to Hannes.

How can issuing an fstrim cause unrelated blocks to be zeroed? I think
the root cause needs to be identified before we continue with this
patch.

Thanks,

Bart.
Jean,

> So basically the condition "if (!sdkp->lbpme)" should be replaced by
> something like "if (LBP_VPD_page_is_supported && !sdkp->lbpme)"?

We can not depend on the VPD page here. It was added to the spec long
after the LBPME flag.
Jeff,

> That's more or less how the problem presents itself, but it's not that
> we issue an fstrim on a block and it gets zeroed. That would be fine
> from the file system perspective. It's that the fstrim is issued and
> unrelated blocks are zeroed. Once I saw the storage layer was
> responsible, I stopped digging and passed it to Hannes.

--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -411,6 +411,13 @@ provisioning_mode_store(struct device *d
 	if (mode < 0)
 		return -EINVAL;
 
+	/*
+	 * If logical block provisioning isn't enabled we can only
+	 * select 'disable' here.
+	 */
+	if (!sdkp->lbpme && mode != SD_LBP_DISABLE)
+		return -EINVAL;
+
 	sd_config_discard(sdkp, mode);
 
 	return count;

This first hunk disables the ability for the user to set the
provisioning_mode to SD_LBP_ZERO. Zero mode is only triggered through
sysfs and is never set by the kernel since there is no way we can detect
if a device is capable. Zero mode predates any of the thin provisioning
knobs being added to the SCSI protocol. This mode is how enterprise
arrays used to work, and is akin to zero block detection on modern SSDs.

@@ -2942,8 +2949,10 @@ static void sd_read_block_limits(struct
 
 	sdkp->max_ws_blocks = (u32)get_unaligned_be64(&buffer[36]);
 
-	if (!sdkp->lbpme)
+	if (!sdkp->lbpme) {
+		sd_config_discard(sdkp, SD_LBP_DISABLE);
 		goto out;
+	}
 
 	lba_count = get_unaligned_be32(&buffer[20]);
 	desc_count = get_unaligned_be32(&buffer[24]);

The only place we ever set lbpme (and thus enable discard) is if the
device sets the LBPME field in READ CAPACITY(16). So there should not be
any way for this code to do anything. If sdkp->lbpme is unset, discard
is already disabled (unless the device temporarily reports lbpme and
then switches it off!?).

Under the assumption that this patch actually fixed the problem, it
smells like there was a udev or user script at play here that twiddled
the provisioning mode in sysfs.

I concur that the discard detection logic is gnarly because we support
several iterations of what turned out to be a constantly moving target
in the SCSI spec. But I don't see how the kernel would go down the path
of automatically enabling discards unless the device reported LBPME in
the first place.

Definitely need more data.
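For context on Martin's point about where lbpme comes from: the flag is
set while parsing the READ CAPACITY(16) response, and that is also the
only place the kernel enables discard on its own. The snippet below is
paraphrased from memory of the v4.20-era read_capacity_16() in
drivers/scsi/sd.c, not an exact quote of the source:

	/*
	 * Paraphrased from memory of read_capacity_16() in
	 * drivers/scsi/sd.c (not an exact quote): lbpme is only set when
	 * byte 14 of the READ CAPACITY(16) data carries the LBPME bit,
	 * and discard is only auto-enabled in that same branch.
	 */
	if (buffer[14] & 0x80) { /* LBPME */
		sdkp->lbpme = 1;

		if (buffer[14] & 0x40) /* LBPRZ */
			sdkp->lbprz = 1;

		sd_config_discard(sdkp, SD_LBP_WS16);
	}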
Bart,

> How can issuing an fstrim cause unrelated blocks to be zeroed? I think
> the root cause needs to be identified before we continue with this
> patch.

Absolutely!