Message ID | yq1wri6zel9.fsf@sermon.lab.mkp.net (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On Wed, May 04 2011 at 11:10am -0400, Martin K. Petersen <martin.petersen@oracle.com> wrote: > >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes: > > Lukas> Nevertheless there is something weird going on, because even when > Lukas> I create striped volume I get this: > > Could you please try the following patch? It has a bunch of small tweaks > to the discard stack in it. I'll split it up before posting for real but > I'd like to know if it fixes your issue... > > > block/libata/scsi: Various logical block provisioning fixes > > - Add sysfs documentation for the discard topology parameters > > - Fix discard stacking problem > > - Switch our libata SAT over to using the WRITE SAME limits > > - UNMAP alignment needs to be converted to bytes > > - Only report alignment and zeroes_data if the device supports discard > > Reported-by: Lukas Czerner <lczerner@redhat.com> > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> > > diff --git a/block/blk-settings.c b/block/blk-settings.c > index 1fa7692..42d3bf5 100644 > --- a/block/blk-settings.c > +++ b/block/blk-settings.c > @@ -120,7 +120,7 @@ void blk_set_default_limits(struct queue_limits *lim) > lim->discard_granularity = 0; > lim->discard_alignment = 0; > lim->discard_misaligned = 0; > - lim->discard_zeroes_data = -1; > + lim->discard_zeroes_data = 1; > lim->logical_block_size = lim->physical_block_size = lim->io_min = 512; > lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT); > lim->alignment_offset = 0; lim->discard_zeroes_data = -1; was suspect to me too. But why default to 1 here? > @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn) > > blk_set_default_limits(&q->limits); > blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS); > + q->limits.discard_zeroes_data = 0; > > /* > * by default assume old behaviour and bounce for any highmem page Only to then reset to 0 here? Shouldn't we default to 0 and only set to 1 where applicable (e.g. sd_config_discard)? -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: Mike> lim->discard_zeroes_data = -1; was suspect to me too. Mike> But why default to 1 here? Because otherwise DM would default to having dzd to "unsupported", meaning the feature would never be turned on regardless of the bottom device capabilities. The old approach used the -1 value to indicate "has not been set". That was only really intended as a value for the stacking drivers, not for the LLDs. It was a bit of a hack and I'd rather deal with dzd the same way as we do with clustering. >> @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue >> *q, make_request_fn *mfn) >> >> blk_set_default_limits(&q->limits); >> blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS); >> + q->limits.discard_zeroes_data = 0; >> >> /* >> * by default assume old behaviour and bounce for any highmem page Mike> Only to then reset to 0 here? Shouldn't we default to 0 and only Mike> set to 1 where applicable (e.g. sd_config_discard)? My first approach was to set it in dm-table.c before stacking. But I thought it was icky to have the stacking driver ask for defaults and then have to tweak them for things to work correctly. The other option is to have blk_set_default_stacking_limits(). Or we could add a flag to blk_set_default_limits to indicate whether this is a LLD or a stacking driver. We already special-case BLK_SAFE_MAX_SECTORS when setting the request function. And that's the only non-stacking user of the default limits call. So that's why I disabled dzd there. Since this is a stable bugfix I also wanted to keep it small and simple. But I'm totally open to suggestions.
On Wed, 4 May 2011, Martin K. Petersen wrote: > >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes: > > Lukas> Nevertheless there is something weird going on, because even when > Lukas> I create striped volume I get this: > > Could you please try the following patch? It has a bunch of small tweaks > to the discard stack in it. I'll split it up before posting for real but > I'd like to know if it fixes your issue... It works thanks! This is before the patch applied: NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO sdb 0 0B 0B 0 ??sdb1 4293918720 0B 0B 0 ??sdb2 4293918720 0B 0B 0 ??sdb3 4293918720 0B 0B 0 ??vg_test-lvol0 (dm-0) 0 512B 16E 1 sdd 0 512B 16E 1 ??sdd1 0 512B 16E 1 ??vg_test-lvol0 (dm-0) 0 512B 16E 1 and this is with the patch: NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO sdb 0 0B 0B 0 ??sdb1 0 0B 0B 0 ??sdb2 0 0B 0B 0 ??sdb3 0 0B 0B 0 ??vg_test-lvol0 (dm-0) 0 512B 2G 0 sdd 0 512B 2G 1 ??sdd1 0 512B 2G 1 ??vg_test-lvol0 (dm-0) 0 512B 2G 0 It also works properly with two devices supporting discard. NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO sdd 0 512B 2G 1 ??sdd1 0 512B 2G 1 ? ??vg_test-lvol0 (dm-0) 0 512B 2G 1 ??sdd2 0 512B 2G 1 ??vg_test-lvol0 (dm-0) 0 512B 2G 1 Also I am not sure why this changed discard_max_bytes from 4294966784 to 2147450880 in my case, that does not seem right...lsblk in the first place apparently overflowed. Thanks for solving this! -Lukas > > > block/libata/scsi: Various logical block provisioning fixes > > - Add sysfs documentation for the discard topology parameters > > - Fix discard stacking problem > > - Switch our libata SAT over to using the WRITE SAME limits > > - UNMAP alignment needs to be converted to bytes > > - Only report alignment and zeroes_data if the device supports discard > > Reported-by: Lukas Czerner <lczerner@redhat.com> > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> > > diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block > index 4873c75..c1eb41c 100644 > --- a/Documentation/ABI/testing/sysfs-block > +++ b/Documentation/ABI/testing/sysfs-block > @@ -142,3 +142,67 @@ Description: > with the previous I/O request are enabled. When set to 2, > all merge tries are disabled. The default value is 0 - > which enables all types of merge tries. > + > +What: /sys/block/<disk>/discard_alignment > +Date: May 2011 > +Contact: Martin K. Petersen <martin.petersen@oracle.com> > +Description: > + Devices that support discard functionality may > + internally allocate space in units that are bigger than > + the exported logical block size. The discard_alignment > + parameter indicates how many bytes the beginning of the > + device is offset from the internal allocation unit's > + natural alignment. > + > +What: /sys/block/<disk>/<partition>/discard_alignment > +Date: May 2011 > +Contact: Martin K. Petersen <martin.petersen@oracle.com> > +Description: > + Devices that support discard functionality may > + internally allocate space in units that are bigger than > + the exported logical block size. The discard_alignment > + parameter indicates how many bytes the beginning of the > + partition is offset from the internal allocation unit's > + natural alignment. > + > +What: /sys/block/<disk>/queue/discard_granularity > +Date: May 2011 > +Contact: Martin K. Petersen <martin.petersen@oracle.com> > +Description: > + Devices that support discard functionality may > + internally allocate space using units that are bigger > + than the logical block size. The discard_granularity > + parameter indicates the size of the internal allocation > + unit in bytes if reported by the device. Otherwise the > + discard_granularity will be set to match the device's > + physical block size. A discard_granularity of 0 means > + that the device does not support discard functionality. > + > +What: /sys/block/<disk>/queue/discard_max_bytes > +Date: May 2011 > +Contact: Martin K. Petersen <martin.petersen@oracle.com> > +Description: > + Devices that support discard functionality may have > + internal limits on the number of bytes that can be > + trimmed or unmapped in a single operation. Some storage > + protocols also have inherent limits on the number of > + blocks that can be described in a single command. The > + discard_max_bytes parameter is set by the device driver > + to the maximum number of bytes that can be discarded in > + a single operation. Discard requests issued to the > + device must not exceed this limit. A discard_max_bytes > + value of 0 means that the device does not support > + discard functionality. > + > +What: /sys/block/<disk>/queue/discard_zeroes_data > +Date: May 2011 > +Contact: Martin K. Petersen <martin.petersen@oracle.com> > +Description: > + Devices that support discard functionality may return > + stale or random data when a previously discarded block > + is read back. This can cause problems if the filesystem > + expects discarded blocks to be explicitly cleared. If a > + device reports that it deterministically returns zeroes > + when a discarded area is read the discard_zeroes_data > + parameter will be set to one. Otherwise it will be 0 and > + the result of reading a discarded area is undefined. > diff --git a/block/blk-settings.c b/block/blk-settings.c > index 1fa7692..42d3bf5 100644 > --- a/block/blk-settings.c > +++ b/block/blk-settings.c > @@ -120,7 +120,7 @@ void blk_set_default_limits(struct queue_limits *lim) > lim->discard_granularity = 0; > lim->discard_alignment = 0; > lim->discard_misaligned = 0; > - lim->discard_zeroes_data = -1; > + lim->discard_zeroes_data = 1; > lim->logical_block_size = lim->physical_block_size = lim->io_min = 512; > lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT); > lim->alignment_offset = 0; > @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn) > > blk_set_default_limits(&q->limits); > blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS); > + q->limits.discard_zeroes_data = 0; > > /* > * by default assume old behaviour and bounce for any highmem page > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index e2f57e9e..30ea95f 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -2138,7 +2138,7 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf) > * with the unmap bit set. > */ > if (ata_id_has_trim(args->id)) { > - put_unaligned_be32(65535 * 512 / 8, &rbuf[20]); > + put_unaligned_be64(65535 * 512 / 8, &rbuf[36]); > put_unaligned_be32(1, &rbuf[28]); > } > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index bd0806e..cc081dd 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -490,7 +490,8 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode) > unsigned int max_blocks = 0; > > q->limits.discard_zeroes_data = sdkp->lbprz; > - q->limits.discard_alignment = sdkp->unmap_alignment; > + q->limits.discard_alignment = sdkp->unmap_alignment * > + (logical_block_size >> 9); > q->limits.discard_granularity = > max(sdkp->physical_block_size, > sdkp->unmap_granularity * logical_block_size); > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index 2ad95fa..1416e3c 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -257,7 +257,7 @@ struct queue_limits { > unsigned char misaligned; > unsigned char discard_misaligned; > unsigned char cluster; > - signed char discard_zeroes_data; > + unsigned char discard_zeroes_data; > }; > > struct request_queue > @@ -1066,13 +1066,16 @@ static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector > { > unsigned int alignment = (sector << 9) & (lim->discard_granularity - 1); > > + if (!lim->max_discard_sectors) > + return 0; > + > return (lim->discard_granularity + lim->discard_alignment - alignment) > & (lim->discard_granularity - 1); > } > > static inline unsigned int queue_discard_zeroes_data(struct request_queue *q) > { > - if (q->limits.discard_zeroes_data == 1) > + if (q->limits.max_discard_sectors && q->limits.discard_zeroes_data == 1) > return 1; > > return 0; > -- -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
>>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
Lukas> Also I am not sure why this changed discard_max_bytes from
Lukas> 4294966784 to 2147450880 in my case, that does not seem
Lukas> right...lsblk in the first place apparently overflowed.
2G is correct. The relevant libata patch was missing from the series I
submitted during the merge window and I don't see it in the archives. It
was at the end of the stack so I guess I had an off-by-one in my commit
range...
On Wed, 4 May 2011, Martin K. Petersen wrote: > >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes: > > Lukas> Also I am not sure why this changed discard_max_bytes from > Lukas> 4294966784 to 2147450880 in my case, that does not seem > Lukas> right...lsblk in the first place apparently overflowed. > > 2G is correct. The relevant libata patch was missing from the series I > submitted during the merge window and I don't see it in the archives. It > was at the end of the stack so I guess I had an off-by-one in my commit > range... > Ok, thanks. -Lukas -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Wed, May 04 2011 at 12:50pm -0400, Martin K. Petersen <martin.petersen@oracle.com> wrote: > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: > > Mike> lim->discard_zeroes_data = -1; was suspect to me too. > Mike> But why default to 1 here? > > Because otherwise DM would default to having dzd to "unsupported", > meaning the feature would never be turned on regardless of the bottom > device capabilities. > > The old approach used the -1 value to indicate "has not been set". That > was only really intended as a value for the stacking drivers, not for > the LLDs. It was a bit of a hack and I'd rather deal with dzd the same > way as we do with clustering. > > > >> @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue > >> *q, make_request_fn *mfn) > >> > >> blk_set_default_limits(&q->limits); > >> blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS); > >> + q->limits.discard_zeroes_data = 0; > >> > >> /* > >> * by default assume old behaviour and bounce for any highmem page > > Mike> Only to then reset to 0 here? Shouldn't we default to 0 and only > Mike> set to 1 where applicable (e.g. sd_config_discard)? > > My first approach was to set it in dm-table.c before stacking. But I > thought it was icky to have the stacking driver ask for defaults and > then have to tweak them for things to work correctly. > > The other option is to have blk_set_default_stacking_limits(). Or we > could add a flag to blk_set_default_limits to indicate whether this is a > LLD or a stacking driver. > > We already special-case BLK_SAFE_MAX_SECTORS when setting the request > function. And that's the only non-stacking user of the default limits > call. So that's why I disabled dzd there. Since this is a stable bugfix > I also wanted to keep it small and simple. But I'm totally open to > suggestions. Your current approach sounds good. Might be good to briefly speak to the duality of the stacking vs non-stacking approach in the associated patch header. Thanks for clarifying. Mike -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Hey Martin, On Wed, May 04 2011 at 11:10am -0400, Martin K. Petersen <martin.petersen@oracle.com> wrote: > >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes: > > Lukas> Nevertheless there is something weird going on, because even when > Lukas> I create striped volume I get this: > > Could you please try the following patch? It has a bunch of small tweaks > to the discard stack in it. I'll split it up before posting for real but > I'd like to know if it fixes your issue... Would be ideal to get these fixes staged for 2.6.40. Do you intend to split these patches up and post for 2.6.40 inclussion? Please advise, thanks! > block/libata/scsi: Various logical block provisioning fixes > > - Add sysfs documentation for the discard topology parameters > > - Fix discard stacking problem > > - Switch our libata SAT over to using the WRITE SAME limits > > - UNMAP alignment needs to be converted to bytes > > - Only report alignment and zeroes_data if the device supports discard > > Reported-by: Lukas Czerner <lczerner@redhat.com> > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> > > diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block > index 4873c75..c1eb41c 100644 > --- a/Documentation/ABI/testing/sysfs-block > +++ b/Documentation/ABI/testing/sysfs-block > @@ -142,3 +142,67 @@ Description: > with the previous I/O request are enabled. When set to 2, > all merge tries are disabled. The default value is 0 - > which enables all types of merge tries. > + > +What: /sys/block/<disk>/discard_alignment > +Date: May 2011 > +Contact: Martin K. Petersen <martin.petersen@oracle.com> > +Description: > + Devices that support discard functionality may > + internally allocate space in units that are bigger than > + the exported logical block size. The discard_alignment > + parameter indicates how many bytes the beginning of the > + device is offset from the internal allocation unit's > + natural alignment. > + > +What: /sys/block/<disk>/<partition>/discard_alignment > +Date: May 2011 > +Contact: Martin K. Petersen <martin.petersen@oracle.com> > +Description: > + Devices that support discard functionality may > + internally allocate space in units that are bigger than > + the exported logical block size. The discard_alignment > + parameter indicates how many bytes the beginning of the > + partition is offset from the internal allocation unit's > + natural alignment. > + > +What: /sys/block/<disk>/queue/discard_granularity > +Date: May 2011 > +Contact: Martin K. Petersen <martin.petersen@oracle.com> > +Description: > + Devices that support discard functionality may > + internally allocate space using units that are bigger > + than the logical block size. The discard_granularity > + parameter indicates the size of the internal allocation > + unit in bytes if reported by the device. Otherwise the > + discard_granularity will be set to match the device's > + physical block size. A discard_granularity of 0 means > + that the device does not support discard functionality. > + > +What: /sys/block/<disk>/queue/discard_max_bytes > +Date: May 2011 > +Contact: Martin K. Petersen <martin.petersen@oracle.com> > +Description: > + Devices that support discard functionality may have > + internal limits on the number of bytes that can be > + trimmed or unmapped in a single operation. Some storage > + protocols also have inherent limits on the number of > + blocks that can be described in a single command. The > + discard_max_bytes parameter is set by the device driver > + to the maximum number of bytes that can be discarded in > + a single operation. Discard requests issued to the > + device must not exceed this limit. A discard_max_bytes > + value of 0 means that the device does not support > + discard functionality. > + > +What: /sys/block/<disk>/queue/discard_zeroes_data > +Date: May 2011 > +Contact: Martin K. Petersen <martin.petersen@oracle.com> > +Description: > + Devices that support discard functionality may return > + stale or random data when a previously discarded block > + is read back. This can cause problems if the filesystem > + expects discarded blocks to be explicitly cleared. If a > + device reports that it deterministically returns zeroes > + when a discarded area is read the discard_zeroes_data > + parameter will be set to one. Otherwise it will be 0 and > + the result of reading a discarded area is undefined. > diff --git a/block/blk-settings.c b/block/blk-settings.c > index 1fa7692..42d3bf5 100644 > --- a/block/blk-settings.c > +++ b/block/blk-settings.c > @@ -120,7 +120,7 @@ void blk_set_default_limits(struct queue_limits *lim) > lim->discard_granularity = 0; > lim->discard_alignment = 0; > lim->discard_misaligned = 0; > - lim->discard_zeroes_data = -1; > + lim->discard_zeroes_data = 1; > lim->logical_block_size = lim->physical_block_size = lim->io_min = 512; > lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT); > lim->alignment_offset = 0; > @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn) > > blk_set_default_limits(&q->limits); > blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS); > + q->limits.discard_zeroes_data = 0; > > /* > * by default assume old behaviour and bounce for any highmem page > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index e2f57e9e..30ea95f 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -2138,7 +2138,7 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf) > * with the unmap bit set. > */ > if (ata_id_has_trim(args->id)) { > - put_unaligned_be32(65535 * 512 / 8, &rbuf[20]); > + put_unaligned_be64(65535 * 512 / 8, &rbuf[36]); > put_unaligned_be32(1, &rbuf[28]); > } > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index bd0806e..cc081dd 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -490,7 +490,8 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode) > unsigned int max_blocks = 0; > > q->limits.discard_zeroes_data = sdkp->lbprz; > - q->limits.discard_alignment = sdkp->unmap_alignment; > + q->limits.discard_alignment = sdkp->unmap_alignment * > + (logical_block_size >> 9); > q->limits.discard_granularity = > max(sdkp->physical_block_size, > sdkp->unmap_granularity * logical_block_size); > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index 2ad95fa..1416e3c 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -257,7 +257,7 @@ struct queue_limits { > unsigned char misaligned; > unsigned char discard_misaligned; > unsigned char cluster; > - signed char discard_zeroes_data; > + unsigned char discard_zeroes_data; > }; > > struct request_queue > @@ -1066,13 +1066,16 @@ static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector > { > unsigned int alignment = (sector << 9) & (lim->discard_granularity - 1); > > + if (!lim->max_discard_sectors) > + return 0; > + > return (lim->discard_granularity + lim->discard_alignment - alignment) > & (lim->discard_granularity - 1); > } > > static inline unsigned int queue_discard_zeroes_data(struct request_queue *q) > { > - if (q->limits.discard_zeroes_data == 1) > + if (q->limits.max_discard_sectors && q->limits.discard_zeroes_data == 1) > return 1; > > return 0; > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On Wed, May 18 2011 at 8:16am -0400, Mike Snitzer <snitzer@redhat.com> wrote: > Hey Martin, > > On Wed, May 04 2011 at 11:10am -0400, > Martin K. Petersen <martin.petersen@oracle.com> wrote: > > > >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes: > > > > Lukas> Nevertheless there is something weird going on, because even when > > Lukas> I create striped volume I get this: > > > > Could you please try the following patch? It has a bunch of small tweaks > > to the discard stack in it. I'll split it up before posting for real but > > I'd like to know if it fixes your issue... > > Would be ideal to get these fixes staged for 2.6.40. > > Do you intend to split these patches up and post for 2.6.40 inclussion? Ah, I now see you posted them and Jens picked some up.. thanks Mike -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block index 4873c75..c1eb41c 100644 --- a/Documentation/ABI/testing/sysfs-block +++ b/Documentation/ABI/testing/sysfs-block @@ -142,3 +142,67 @@ Description: with the previous I/O request are enabled. When set to 2, all merge tries are disabled. The default value is 0 - which enables all types of merge tries. + +What: /sys/block/<disk>/discard_alignment +Date: May 2011 +Contact: Martin K. Petersen <martin.petersen@oracle.com> +Description: + Devices that support discard functionality may + internally allocate space in units that are bigger than + the exported logical block size. The discard_alignment + parameter indicates how many bytes the beginning of the + device is offset from the internal allocation unit's + natural alignment. + +What: /sys/block/<disk>/<partition>/discard_alignment +Date: May 2011 +Contact: Martin K. Petersen <martin.petersen@oracle.com> +Description: + Devices that support discard functionality may + internally allocate space in units that are bigger than + the exported logical block size. The discard_alignment + parameter indicates how many bytes the beginning of the + partition is offset from the internal allocation unit's + natural alignment. + +What: /sys/block/<disk>/queue/discard_granularity +Date: May 2011 +Contact: Martin K. Petersen <martin.petersen@oracle.com> +Description: + Devices that support discard functionality may + internally allocate space using units that are bigger + than the logical block size. The discard_granularity + parameter indicates the size of the internal allocation + unit in bytes if reported by the device. Otherwise the + discard_granularity will be set to match the device's + physical block size. A discard_granularity of 0 means + that the device does not support discard functionality. + +What: /sys/block/<disk>/queue/discard_max_bytes +Date: May 2011 +Contact: Martin K. Petersen <martin.petersen@oracle.com> +Description: + Devices that support discard functionality may have + internal limits on the number of bytes that can be + trimmed or unmapped in a single operation. Some storage + protocols also have inherent limits on the number of + blocks that can be described in a single command. The + discard_max_bytes parameter is set by the device driver + to the maximum number of bytes that can be discarded in + a single operation. Discard requests issued to the + device must not exceed this limit. A discard_max_bytes + value of 0 means that the device does not support + discard functionality. + +What: /sys/block/<disk>/queue/discard_zeroes_data +Date: May 2011 +Contact: Martin K. Petersen <martin.petersen@oracle.com> +Description: + Devices that support discard functionality may return + stale or random data when a previously discarded block + is read back. This can cause problems if the filesystem + expects discarded blocks to be explicitly cleared. If a + device reports that it deterministically returns zeroes + when a discarded area is read the discard_zeroes_data + parameter will be set to one. Otherwise it will be 0 and + the result of reading a discarded area is undefined. diff --git a/block/blk-settings.c b/block/blk-settings.c index 1fa7692..42d3bf5 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -120,7 +120,7 @@ void blk_set_default_limits(struct queue_limits *lim) lim->discard_granularity = 0; lim->discard_alignment = 0; lim->discard_misaligned = 0; - lim->discard_zeroes_data = -1; + lim->discard_zeroes_data = 1; lim->logical_block_size = lim->physical_block_size = lim->io_min = 512; lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT); lim->alignment_offset = 0; @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn) blk_set_default_limits(&q->limits); blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS); + q->limits.discard_zeroes_data = 0; /* * by default assume old behaviour and bounce for any highmem page diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index e2f57e9e..30ea95f 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -2138,7 +2138,7 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf) * with the unmap bit set. */ if (ata_id_has_trim(args->id)) { - put_unaligned_be32(65535 * 512 / 8, &rbuf[20]); + put_unaligned_be64(65535 * 512 / 8, &rbuf[36]); put_unaligned_be32(1, &rbuf[28]); } diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index bd0806e..cc081dd 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -490,7 +490,8 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode) unsigned int max_blocks = 0; q->limits.discard_zeroes_data = sdkp->lbprz; - q->limits.discard_alignment = sdkp->unmap_alignment; + q->limits.discard_alignment = sdkp->unmap_alignment * + (logical_block_size >> 9); q->limits.discard_granularity = max(sdkp->physical_block_size, sdkp->unmap_granularity * logical_block_size); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 2ad95fa..1416e3c 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -257,7 +257,7 @@ struct queue_limits { unsigned char misaligned; unsigned char discard_misaligned; unsigned char cluster; - signed char discard_zeroes_data; + unsigned char discard_zeroes_data; }; struct request_queue @@ -1066,13 +1066,16 @@ static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector { unsigned int alignment = (sector << 9) & (lim->discard_granularity - 1); + if (!lim->max_discard_sectors) + return 0; + return (lim->discard_granularity + lim->discard_alignment - alignment) & (lim->discard_granularity - 1); } static inline unsigned int queue_discard_zeroes_data(struct request_queue *q) { - if (q->limits.discard_zeroes_data == 1) + if (q->limits.max_discard_sectors && q->limits.discard_zeroes_data == 1) return 1; return 0;