| Message ID | 1439099990.7880.0.camel@hasee (mailing list archive) |
|---|---|
| State | Accepted, archived |
| Delegated to: | Mike Snitzer |
On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote:
> +/*
> + * Ensure that max discard sectors doesn't overflow bi_size and hopefully
> + * it is of the proper granularity as long as the granularity is a power
> + * of two.
> + */
> +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9)

This isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS. If we ever do
something like Kent's multipage biovecs we'll actually need it for
regular read/write bios in addition to discard and write same.

Except for that the patch looks reasonable to me.
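The overflow arithmetic behind that constant, as a minimal user-space sketch (illustrative only; the macro follows the patch, everything else here is assumed):

```c
#include <stdio.h>
#include <stdint.h>

/* Cap from the patch: 1 << 31 bytes expressed in 512-byte sectors. */
#define MAX_BIO_SECTORS ((1U << 31) >> 9)

int main(void)
{
	/* 2^22 sectors -> 2^31 bytes, which still fits in a 32-bit bi_size */
	uint32_t bytes = MAX_BIO_SECTORS << 9;

	printf("max sectors per bio: %u (%u bytes)\n", MAX_BIO_SECTORS, bytes);
	return 0;
}
```

Because 1 << 31 is a power of two, any power-of-two granularity divides it evenly, which is the "hopefully it is of the proper granularity" point in the comment.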
On Sat, 2015-08-08 at 23:41 -0700, Christoph Hellwig wrote:
> On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote:
> > +/*
> > + * Ensure that max discard sectors doesn't overflow bi_size and hopefully
> > + * it is of the proper granularity as long as the granularity is a power
> > + * of two.
> > + */
> > +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9)
>
> This isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS. If we ever do
> something like Kent's multipage biovecs we'll actually need it for
> regular read/write bios in addition to discard and write same.
>
> Except for that the patch looks reasonable to me.

Will change it to MAX_BIO_SECTORS.
May I add your ACK?
On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
> Will change it to MAX_BIO_SECTORS.
> May I add your ACK?

Yes, please go ahead.
On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote:
> On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
> > Will change it to MAX_BIO_SECTORS.
> > May I add your ACK?
>
> Yes, please go ahead.

Thanks. I'll send a new version of the series once the device-mapper
folks ack.

Hi Mike,

I have updated my tree. Could you pull and re-test?
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req

The 2 thin-provisioning tests passed.

Hope I can have your ACK soon.

Thanks.
On Sun, Aug 09 2015 at 3:18am -0400,
Ming Lin <mlin@kernel.org> wrote:

> Thanks. I'll send a new version of the series once the device-mapper
> folks ack.
>
> Hi Mike,
>
> I have updated my tree. Could you pull and re-test?
> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req
>
> The 2 thin-provisioning tests passed.

I've merged your latest branch with my dm-4.3 branch; I had one conflict
in the merge due to the dm_merge_bvec() change from 4.2-rc6 (no
surprise). I've published the result here:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=block-late-bio-splitting

It passes the device-mapper-test-suite's 'thin-provisioning' tests.

> Hope I can have your ACK soon.

Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same
(instead of UINT_MAX >> 9)?

Aside from that, I'm in favor of seeing this late bio splitting patchset
finally land upstream (hopefully in time for the 4.3 merge, Jens?):

Acked-by: Mike Snitzer <snitzer@redhat.com>

p.s. I'll be working with Joe Thornber on optimizing DM (particularly
dm-thinp and dm-cache) once this patchset is included upstream. You'll
see I've already added a couple WIP dm-thinp patches on top.
On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote:
> It passes the device-mapper-test-suite's 'thin-provisioning' tests.
>
> Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same
> (instead of UINT_MAX >> 9)?

I also prefer using MAX_BIO_SECTORS.
Otherwise, we may have non page size aligned splits.

Say, write_same 8G.

Using UINT_MAX >> 9, we'll have 2 sector aligned splits in
blkdev_issue_write_same():
0 - (4G - 512 - 1)
(4G - 512) - (8G - 1)

This looks weird.

Using MAX_BIO_SECTORS, we'll have 4 page size aligned splits:
0 - (2G - 1)
2G - (4G - 1)
4G - (6G - 1)
6G - (8G - 1)

I'll use MAX_BIO_SECTORS in blkdev_issue_write_same() if no objection.

> Aside from that, I'm in favor of seeing this late bio splitting patchset
> finally land upstream (hopefully in time for the 4.3 merge, Jens?):
>
> Acked-by: Mike Snitzer <snitzer@redhat.com>

Thanks!

May I add your Ack to "PATCH: block: kill merge_bvec_fn() completely"
also?

> p.s. I'll be working with Joe Thornber on optimizing DM (particularly
> dm-thinp and dm-cache) once this patchset is included upstream. You'll
> see I've already added a couple WIP dm-thinp patches on top.
On Mon, 2015-08-10 at 09:14 -0700, Ming Lin wrote:
> Using UINT_MAX >> 9, we'll have 2 sector aligned splits in
> blkdev_issue_write_same():
> 0 - (4G - 512 - 1)
> (4G - 512) - (8G - 1)

Actually, 3 sector aligned splits.

0 - (4G - 512 - 1)
(4G - 512) - (8G - 512 - 1)
(8G - 512) - (8G - 1)
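Ming's boundary arithmetic can be reproduced with a small user-space model of the chunking loop in blkdev_issue_write_same() — an illustrative sketch, not the kernel code; the helper name and output format are made up:

```c
#include <stdio.h>
#include <stdint.h>

#define SECTOR_SHIFT	9
#define MAX_BIO_SECTORS	((1U << 31) >> SECTOR_SHIFT)	/* 2 GiB cap */

/* Model of the per-bio chunking loop: print each [start, end) in sectors. */
static void show_splits(uint64_t nr_sects, uint32_t max_sects)
{
	uint64_t sector = 0;

	while (nr_sects) {
		uint32_t req = nr_sects < max_sects ? nr_sects : max_sects;

		printf("  [%llu, %llu)\n",
		       (unsigned long long)sector,
		       (unsigned long long)(sector + req));
		sector += req;
		nr_sects -= req;
	}
}

int main(void)
{
	uint64_t eight_gib = 8ULL << (30 - SECTOR_SHIFT);	/* 8 GiB in sectors */

	printf("cap = UINT_MAX >> 9 (three odd-sized splits):\n");
	show_splits(eight_gib, UINT32_MAX >> SECTOR_SHIFT);
	printf("cap = MAX_BIO_SECTORS (four aligned 2 GiB splits):\n");
	show_splits(eight_gib, MAX_BIO_SECTORS);
	return 0;
}
```

With the UINT_MAX >> 9 cap the chunks are 8388607, 8388607, and 2 sectors; with MAX_BIO_SECTORS they are four equal 2 GiB chunks.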
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: Mike> Shouldn't we also be using MAX_BIO_SECTORS in Mike> blkdev_issue_write_same (instead of UINT_MAX >> 9)? The granularity of WRITE SAME is always 1 logical block and there is no reason to round it down to the next power of two. +/* + * Ensure that max discard sectors doesn't overflow bi_size and hopefully + * it is of the proper granularity as long as the granularity is a power + * of two. + */ +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9) + That's fine for SATA since we're already capping at 2TB minus change. But it means we'll be capping unnecessarily on SCSI. And larger range counts are impending in SATA as well. So this goes back to my original comment: The only place there can be a discard granularity is for SCSI UNMAP. And consequently, we should only enforce alignment and granularity when that code path is taken in sd.c. I'm OK with Ming's patch series in general. Let's leave the discard cap at UINT_MAX and I'll twiddle the rest in SCSI. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
>>>>> "Ming" == Ming Lin <mlin@kernel.org> writes:
Ming,
Ming> I also prefer using MAX_BIO_SECTORS. Otherwise, we may have non
Ming> page size aligned splits.
This does not matter for write same and discard since there is only a
single logical block of payload. Also, given limitations in SATA we're
always issuing 2GB-32KB sized discards. Rounding those down to an even
1GB would impact performance.
I am sympathetic to wanting to issue I/Os that are aligned to powers of
two. But for most devices this matters little since the additional cost
is limited to misaligned head and tail blocks.
One thing that may be worth considering is switching bi_size from bytes
to blocks for REQ_FS. That would give us some headroom without
increasing bi_size beyond 32 bits. But I'm not entirely sure it's worth
the pain.
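A quick back-of-the-envelope for that suggestion (hypothetical, not a proposed patch): with 512-byte sectors, a byte-granular 32-bit bi_size caps a bio at 4 GiB, while a block-granular one would allow 2 TiB:

```c
#include <stdio.h>
#include <stdint.h>

/* Headroom comparison for a 32-bit bi_size counted in bytes vs blocks. */
int main(void)
{
	uint64_t max_bytes  = (uint64_t)UINT32_MAX + 1;		/* ~4 GiB  */
	uint64_t max_blocks = ((uint64_t)UINT32_MAX + 1) << 9;	/* ~2 TiB  */

	printf("bi_size in bytes : %llu GiB per bio\n",
	       (unsigned long long)(max_bytes >> 30));
	printf("bi_size in blocks: %llu TiB per bio\n",
	       (unsigned long long)(max_blocks >> 40));
	return 0;
}
```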
On Mon, Aug 10 2015 at 12:14pm -0400,
Ming Lin <mlin@kernel.org> wrote:

> On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote:
> > Aside from that, I'm in favor of seeing this late bio splitting patchset
> > finally land upstream (hopefully in time for the 4.3 merge, Jens?):
> >
> > Acked-by: Mike Snitzer <snitzer@redhat.com>
>
> Thanks!
>
> May I add your Ack to "PATCH: block: kill merge_bvec_fn() completely"
> also?

Sure, but please fold in the removal of dm.c comments I made in this
merge commit:

http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=block-late-bio-splitting&id=d6df875bb65ef1ee10c91cf09cb58d009286321f

thanks.
On Mon, 2015-08-10 at 12:22 -0400, Martin K. Petersen wrote:
> That's fine for SATA since we're already capping at 2GB minus change.
> But it means we'll be capping unnecessarily on SCSI. And larger range
> counts are impending in SATA as well.
>
> So this goes back to my original comment: The only place there can be a
> discard granularity is for SCSI UNMAP. And consequently, we should only
> enforce alignment and granularity when that code path is taken in sd.c.
>
> I'm OK with Ming's patch series in general. Let's leave the discard cap
> at UINT_MAX and I'll twiddle the rest in SCSI.

Just to make sure I didn't misunderstand.

Did you mean to still use (UINT_MAX >> 9) in blkdev_issue_discard()?

	req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);

instead of:

	req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS);

But that doesn't work for dm-thinp. See Kent's suggestion to use 1<<31:
https://www.redhat.com/archives/dm-devel/2015-August/msg00053.html

> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

Thanks!
On Mon, Aug 10, 2015 at 11:13 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> Sure, but please fold in the removal of dm.c comments I made in this
> merge commit:
>
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=block-late-bio-splitting&id=d6df875bb65ef1ee10c91cf09cb58d009286321f

Rebased on top of the latest Linus tree (4.2-rc6+):

https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/diff/drivers/md/dm.c?h=block-generic-req&id=3dd8509a3f05152cca83bc41c37a8ba3e9119736

> thanks.
>>>>> "Ming" == Ming Lin <mlin@kernel.org> writes:
Ming> Did you mean to still use (UINT_MAX >> 9) in blkdev_issue_discard()?
Ming> But that doesn't work for dm-thinp. See Kent's suggestion to use
Ming> 1<<31.
I'm not sure why things are not working for dm-thinp. Presumably Kent's
code would split the discard at a granularity boundary so why would that
cause problems for dm?
In looking at this I just found out that we'll corrupt data on certain
SCSI configs with the granularity enforcement in place. I'll have to
conjure up a fix for that...
On Mon, Aug 10 2015 at 10:00pm -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> Ming> Did you mean to still use (UINT_MAX >> 9) in blkdev_issue_discard()?
>
> Ming> But that doesn't work for dm-thinp. See Kent's suggestion to use
> Ming> 1<<31.
>
> I'm not sure why things are not working for dm-thinp. Presumably Kent's
> code would split the discard at a granularity boundary so why would that
> cause problems for dm?

DM-thinp processes discards internally before it passes them down (if
configured to do so). If a discard is smaller than the granularity of a
thinp block (whose size is configurable), or if the start and end of the
discard's extent are misaligned (relative to the thinp blocks mapped to
the logical extent), then the discard won't actually discard partial
thinp blocks.
On Mon, Aug 10, 2015 at 10:41:55PM -0400, Mike Snitzer wrote:
> DM-thinp processes discards internally before it passes them down (if
> configured to do so). If a discard is smaller than the granularity of a
> thinp block (whose size is configurable), or if the start and end of the
> discard's extent are misaligned (relative to the thinp blocks mapped to
> the logical extent), then the discard won't actually discard partial
> thinp blocks.

This kind of logic really doesn't belong in dm - if it's needed, it
really belongs in bio_split() (which is supposed to work correctly for
discards - so if it is needed, then bio_split() needs fixing...)

IMO though it belongs in the driver - if a discard needs to be dropped
because it's too small and the hardware can't do it, that should be the
driver's responsibility.
On Mon, Aug 10 2015 at 11:38pm -0400,
Kent Overstreet <kent.overstreet@gmail.com> wrote:

> This kind of logic really doesn't belong in dm - if it's needed, it
> really belongs in bio_split() (which is supposed to work correctly for
> discards - so if it is needed, then bio_split() needs fixing...)

DM thinp does advertise a discard_granularity that reflects the thinp
blocksize. blk_queue_split() does look like it'd do the right thing.
But the splitting that DM thinp is doing is a long-standing
implementation (in DM core) that will need to be carefully
reviewed/rewritten. We can tackle it after all this late splitting code
lands.

> IMO though it belongs in the driver - if a discard needs to be dropped
> because it's too small and the hardware can't do it, that should be the
> driver's responsibility.

This isn't about the hardware's limits. This is about the intermediate
remapping/stacking driver's own space management hooking off of the
discard bio too.
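To make the partial-block behavior concrete, here is an illustrative user-space sketch — not dm-thinp code; the helper name, rounding logic, and block size are assumptions — of how only whole provisioned blocks inside a discard extent can be reclaimed:

```c
#include <stdio.h>
#include <stdint.h>

/*
 * A thin-provisioning target can only reclaim whole provisioned blocks,
 * so the usable part of a discard is the extent rounded *inward* to
 * block boundaries; misaligned head and tail sectors are silently kept.
 */
static void usable_discard(uint64_t start, uint64_t len, uint64_t block_sects)
{
	uint64_t first = (start + block_sects - 1) / block_sects; /* round up   */
	uint64_t last  = (start + len) / block_sects;		  /* round down */

	if (first >= last)
		printf("discard [%llu, %llu): no whole blocks, dropped\n",
		       (unsigned long long)start,
		       (unsigned long long)(start + len));
	else
		printf("discard [%llu, %llu): blocks %llu..%llu reclaimed\n",
		       (unsigned long long)start,
		       (unsigned long long)(start + len),
		       (unsigned long long)first,
		       (unsigned long long)(last - 1));
}

int main(void)
{
	/* hypothetical 128-sector (64 KiB) thinp blocks */
	usable_discard(100, 50, 128);	/* smaller than one block: dropped  */
	usable_discard(100, 500, 128);	/* head and tail trimmed to blocks  */
	return 0;
}
```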
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> DM-thinp processes discards internally before it passes them down
Mike> (if configured to do so). If a discard is smaller than the
Mike> granularity of a thinp block (whose size is configurable) or if
Mike> the start and end of the discard's extent is misaligned (relative
Mike> to the thinp blocks mapped to the logical extent) then the discard
Mike> won't actually discard partial thinp blocks.
That's fine. You can throw away anything you don't like as long as
discard_zeroes_data=0.
But I don't understand why having an artificial cap at 2GB fixes
things. Other than making it less likely for you to receive a runt by
virtue of being aligned to a power of two.
On Tue, Aug 11 2015 at 1:36pm -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> That's fine. You can throw away anything you don't like as long as
> discard_zeroes_data=0.

Correct, dm-thinp sets discard_zeroes_data=0.

> But I don't understand why having an artificial cap at 2GB fixes
> things. Other than making it less likely for you to receive a runt by
> virtue of being aligned to a power of two.

That is the benefit. And when coupled with the new default max_discard
of 64K (pending change from Jens for 4.3), this 2GB upper limit really
isn't such a big deal. Unless I'm missing something...
>>>>> "Kent" == Kent Overstreet <kent.overstreet@gmail.com> writes:
Kent> This kind of logic really doesn't belong in dm
Well it does in this case since the thinp personality actually
provisions and unprovisions space.
But there is a difference between what dm thinp acts on for its own
internal provisioning purposes and what it passes down the stack. I am
really against dropping information anywhere along the path. We don't
round off read/write requests either.
The queue limits were meant as hints to mkfs.* so that on-disk data
structures could be laid out in an aligned and storage friendly way. I
never intended for the hints to affect runtime behavior.
Kent> IMO though it belongs in the driver - if a discard needs to be
Kent> dropped because it's too small and the hardware can't do it, that
Kent> should be the driver's responsibility.
I agree except I really don't want to lop off anything unless the device
locks up if we send it partial blocks. There was an array that had
problems a while back but I believe they have been fixed.
The fundamental premise should be that we pass as comprehensive
information as we can. And the device can then decide to ignore all or
parts of the request. That's fundamentally how things work at the
protocol level in both SCSI and SATA. I don't see any reason why the
Linux I/O stack should behave in a different manner.
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> That is the benefit. And when coupled with the new default
Mike> max_discard of 64K (pending change from Jens for 4.3) this 2GB
Mike> upper limit really isn't such a big deal. Unless I'm missing
Mike> something...
2GB is fine for current SATA due to the stupid range descriptors. But
there are some changes in the pipeline to support descriptors with
bigger ranges so that will change.
However, we currently do 4GB discards on SCSI so the proposed cap will
cut that in half. I'm OK with that as an interim solution, though.
But I'm a bit concerned about what might be lurking in dm thinp if you
trip over partial blocks like in Ming's example...
>>>>> "Martin" == Martin K Petersen <martin.petersen@oracle.com> writes:
Martin> I agree except I really don't want to lop off anything unless
Martin> the device locks up if we send it partial blocks. There was an
Martin> array that had problems a while back but I believe they have
Martin> been fixed.
Oh, and there are several arrays out there that have allocation units
that are not powers of two. *sigh*
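The alignment code being removed from blkdev_issue_discard() (see the diff at the end of the thread) handled exactly this case: because a granularity need not be a power of two, it used division/modulo rather than bit masks. A user-space model of that rounding, with hypothetical numbers:

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Illustrative model of the removed logic: shorten a split so it ends at
 * the previous granularity boundary, leaving the next chunk aligned.
 * Works for any granularity, power of two or not, hence '%' not masks.
 */
static uint64_t trim_to_boundary(uint64_t sector, uint64_t req_sects,
				 uint64_t granularity, uint64_t alignment)
{
	uint64_t end = sector + req_sects;

	if ((end % granularity) != alignment)
		end -= (end - alignment) % granularity;
	return end > sector ? end - sector : req_sects;
}

int main(void)
{
	/* hypothetical 3-sector allocation units, no alignment offset */
	printf("%llu\n", (unsigned long long)
	       trim_to_boundary(0, 1000, 3, 0));	/* prints 999 */
	return 0;
}
```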
On Tue, 2015-08-11 at 14:05 -0400, Martin K. Petersen wrote:
> Martin> I agree except I really don't want to lop off anything unless
> Martin> the device locks up if we send it partial blocks. There was an
> Martin> array that had problems a while back but I believe they have
> Martin> been fixed.
>
> Oh, and there are several arrays out there that have allocation units
> that are not powers of two. *sigh*

Do you still agree we cap discard to 2G as an interim solution?
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/commit/?h=block-generic-req&id=39d2f5f

If OK, I'll send out a new version asking Jens to apply.
>>>>> "Ming" == Ming Lin <mlin@kernel.org> writes:
Ming> Do you still agree we cap discard to 2G as an interim solution?
I can live with the 2G cap for 4.3 but would like to see it fixed
properly in 4.4.
On Tue, 2015-08-11 at 20:24 -0400, Martin K. Petersen wrote:
> Ming> Do you still agree we cap discard to 2G as an interim solution?
>
> I can live with the 2G cap for 4.3 but would like to see it fixed
> properly in 4.4.

Sure. I'd like to work on the fix.
On Mon, Aug 10, 2015 at 8:02 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> p.s. I'll be working with Joe Thornber on optimizing DM (particularly
> dm-thinp and dm-cache) once this patchset is included upstream. You'll
> see I've already added a couple WIP dm-thinp patches on top.

Hi Mike,

Just to avoid duplicated work: are you going to work on the
dm-thinp/dm-cache discard rewrite? Otherwise, I can work on it.

Thanks,
Ming
On Mon, Aug 17, 2015 at 10:09 PM, Ming Lin <mlin@kernel.org> wrote:
> Just to avoid duplicated work: are you going to work on the
> dm-thinp/dm-cache discard rewrite? Otherwise, I can work on it.

It seems dm-stripe discard also needs a rewrite.
On Tue, Aug 18 2015 at 3:04am -0400,
Ming Lin <mlin@kernel.org> wrote:

> On Mon, Aug 17, 2015 at 10:09 PM, Ming Lin <mlin@kernel.org> wrote:
> > Just to avoid duplicated work: are you going to work on the
> > dm-thinp/dm-cache discard rewrite?
>
> It seems dm-stripe discard also needs a rewrite.

Can you elaborate on what you feel needs rewriting in these targets?

This is the basic initial cleanup I had in mind for dm-thinp:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=dm-4.4&id=cb0aca0a6bfad6b7f7146dde776f374082a73db6

A much more involved refactoring of the dm-cache and dm-thinp targets to
eliminate the need for splitting will involve bio-prison range locking
and a new metadata format for both targets to express ranges as opposed
to blocks. This line of work is on Joe's radar but it is much further
out given the associated on-disk metadata format change.

That aside, I do need to look at DM core to see how we can do things
differently so that block core's bio_split() et al is doing the
splitting rather than DM core having a role.

I'd prefer to be the one working on these DM changes. But if you have
ideas of how things should be cleaned up I'd be happy to consider them.

Thanks,
Mike
```diff
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..4859e4b 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -26,6 +26,13 @@ static void bio_batch_end_io(struct bio *bio, int err)
 	bio_put(bio);
 }
 
+/*
+ * Ensure that max discard sectors doesn't overflow bi_size and hopefully
+ * it is of the proper granularity as long as the granularity is a power
+ * of two.
+ */
+#define MAX_DISCARD_SECTORS ((1U << 31) >> 9)
+
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:	blockdev to issue discard for
@@ -43,8 +50,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	int type = REQ_WRITE | REQ_DISCARD;
-	unsigned int max_discard_sectors, granularity;
-	int alignment;
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
@@ -56,21 +61,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
 
-	/* Zero-sector (unknown) and one-sector granularities are the same. */
-	granularity = max(q->limits.discard_granularity >> 9, 1U);
-	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-	/*
-	 * Ensure that max_discard_sectors is of the proper
-	 * granularity, so that requests stay aligned after a split.
-	 */
-	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors)) {
-		/* Avoid infinite loop below. Being cautious never hurts. */
-		return -EOPNOTSUPP;
-	}
-
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
@@ -84,7 +74,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	blk_start_plug(&plug);
 	while (nr_sects) {
 		unsigned int req_sects;
-		sector_t end_sect, tmp;
+		sector_t end_sect;
 
 		bio = bio_alloc(gfp_mask, 1);
 		if (!bio) {
@@ -92,21 +82,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 			break;
 		}
 
-		req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-		/*
-		 * If splitting a request, and the next starting sector would be
-		 * misaligned, stop the discard at the previous aligned sector.
-		 */
+		req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS);
 		end_sect = sector + req_sects;
-		tmp = end_sect;
-		if (req_sects < nr_sects &&
-		    sector_div(tmp, granularity) != alignment) {
-			end_sect = end_sect - alignment;
-			sector_div(end_sect, granularity);
-			end_sect = end_sect * granularity + alignment;
-			req_sects = end_sect - sector;
-		}
 
 		bio->bi_iter.bi_sector = sector;
 		bio->bi_end_io = bio_batch_end_io;
@@ -166,10 +143,8 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	if (!q)
 		return -ENXIO;
 
-	max_write_same_sectors = q->limits.max_write_same_sectors;
-
-	if (max_write_same_sectors == 0)
-		return -EOPNOTSUPP;
+	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
+	max_write_same_sectors = UINT_MAX >> 9;
 
 	atomic_set(&bb.done, 1);
 	bb.flags = 1 << BIO_UPTODATE;
```
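For orientation, a hedged sketch of the caller's view of the interface the patch modifies — blkdev_issue_discard()'s signature is unchanged; the helper below and its use of the whole-device size are illustrative assumptions, not part of the patch:

```c
#include <linux/blkdev.h>
#include <linux/fs.h>

/*
 * Hypothetical helper: discard every sector of a block device.  The
 * caller no longer worries about granularity or per-bio size limits;
 * blkdev_issue_discard() chunks the range into MAX_DISCARD_SECTORS-sized
 * bios internally.
 */
static int discard_whole_bdev(struct block_device *bdev)
{
	sector_t nr_sects = i_size_read(bdev->bd_inode) >> 9;

	return blkdev_issue_discard(bdev, 0, nr_sects, GFP_KERNEL, 0);
}
```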