| Message ID | 4DABDC60.2090009@fusionio.com (mailing list archive) |
|---|---|
| State | Not Applicable, archived |
On Mon, 18 Apr 2011 08:38:24 +0200 Jens Axboe <jaxboe@fusionio.com> wrote:

> On 2011-04-18 00:19, NeilBrown wrote:
> > On Mon, 11 Apr 2011 14:11:58 +0200 Jens Axboe <jaxboe@fusionio.com> wrote:
> >
> >>> Yes. But I need to know when to release the requests that I have stored.
> >>> I need to know when ->write_pages or ->read_pages or whatever has finished
> >>> submitting a pile of pages so that I can start processing the request that I
> >>> have put aside. So I need a callback from blk_finish_plug.
> >>
> >> OK fair enough, I'll add your callback patch.
> >>
> >
> > But you didn't, did you? You added a completely different patch which is
> > completely pointless.
> > If you don't like my patch I would really prefer you said so rather than
> > silently replacing it with something completely different (and broken).
>
> First of all, you were CC'ed on all that discussion, yet didn't speak up
> until now. This was last week. Secondly, please change your tone.

Yes, I was CC'ed on a discussion. In that discussion it was never mentioned
that you had completely changed the patch I sent you, and it never contained
the new patch in-line for review. Nothing that was discussed was
particularly relevant to md's needs, so there was nothing to speak up about.

Yes - there were 'git pull' requests and I could have done a pull myself to
review the code, but there seemed to be no urgency because you had already
agreed to apply my patch.
When I did finally pull the patches (after all the other issues had settled
down and I had time to finish off the RAID side) I found ... what I found.

I apologise for my tone, but I was very frustrated.

>
> > I'll try to explain again.
> >
> > md does not use __make_request. At all.
> > md does not use 'struct request'. At all.
> >
> > The 'list' in 'struct blk_plug' is a list of 'struct request'.
>
> I'm well aware of these facts, but thanks for bringing it up.
>
> > Therefore md cannot put anything useful on the list in 'struct blk_plug'.
> >
> > So when blk_flush_plug_list calls queue_unplugged() on a queue that belonged
> > to a request found on the blk_plug list, that queue cannot possibly ever be
> > for an 'md' device (because no 'struct request' ever belongs to an md device,
> > because md does not use 'struct request').
> >
> > So your patch (commit f75664570d8b) doesn't help MD at all.
> >
> > For md, I need to attach something to blk_plug which somehow identifies an md
> > device, so that blk_finish_plug can get to that device and let it unplug.
> > The most sensible thing to have is a completely generic callback. That way
> > different block devices (which choose not to use __make_request) can attach
> > different sorts of things to blk_plug.
> >
> > So can we please have my original patch applied? (Revised version using
> > list_splice_init included below.)
> >
> > Or if not, a clear explanation of why not?
>
> So correct me if I'm wrong here, but the _only_ real difference between
> this patch and the current code in the tree is the checking of the
> callback list indicating a need to flush the callbacks. And that's
> definitely an oversight. It should be functionally equivalent if md
> would just flag this need to get a callback: instead of queueing a
> callback on the list, set plug->need_unplug from md, and have
> blk_needs_flush_plug() do:
>
>         return plug && (!list_empty(&plug->list) || plug->need_unplug);
>
> instead. Something like the below, completely untested.
>

No, that is not the only real difference.
The real difference is that in the current code, md has no way to register
anything with a blk_plug, because you can only register a 'struct request'
on a blk_plug, and md doesn't make any use of 'struct request'.

As I said in the email you quote above:

> > Therefore md cannot put anything useful on the list in 'struct blk_plug'.

That is the heart of the problem.

NeilBrown
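[For readers following the thread: the shape of the callback patch Neil is asking for is roughly the following - a minimal sketch that assumes a second list head (cb_list) is added to struct blk_plug. Names follow the patch under discussion, but the details here are illustrative, not the exact merged code.]

    /*
     * Sketch of the generic callback Neil describes: a driver that never
     * builds a 'struct request' (like md) hangs a small callback record on
     * the plug instead, and blk_finish_plug invokes it. Illustrative only.
     */
    struct blk_plug_cb {
            struct list_head list;                  /* chained on plug->cb_list */
            void (*callback)(struct blk_plug_cb *); /* run at unplug time */
    };

    /*
     * Hypothetical helper: md registers one callback record per plug.
     * cb->list must have been initialized (INIT_LIST_HEAD) by its owner.
     */
    static void plug_cb_register(struct blk_plug *plug, struct blk_plug_cb *cb)
    {
            if (list_empty(&cb->list))
                    list_add_tail(&cb->list, &plug->cb_list);
    }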
On 2011-04-18 09:25, NeilBrown wrote:
> On Mon, 18 Apr 2011 08:38:24 +0200 Jens Axboe <jaxboe@fusionio.com> wrote:
>
>> On 2011-04-18 00:19, NeilBrown wrote:
>>> On Mon, 11 Apr 2011 14:11:58 +0200 Jens Axboe <jaxboe@fusionio.com> wrote:
>>>
>>>>> Yes. But I need to know when to release the requests that I have stored.
>>>>> I need to know when ->write_pages or ->read_pages or whatever has finished
>>>>> submitting a pile of pages so that I can start processing the request that I
>>>>> have put aside. So I need a callback from blk_finish_plug.
>>>>
>>>> OK fair enough, I'll add your callback patch.
>>>>
>>>
>>> But you didn't, did you? You added a completely different patch which is
>>> completely pointless.
>>> If you don't like my patch I would really prefer you said so rather than
>>> silently replacing it with something completely different (and broken).
>>
>> First of all, you were CC'ed on all that discussion, yet didn't speak up
>> until now. This was last week. Secondly, please change your tone.
>
> Yes, I was CC'ed on a discussion. In that discussion it was never mentioned
> that you had completely changed the patch I sent you, and it never contained
> the new patch in-line for review. Nothing that was discussed was
> particularly relevant to md's needs, so there was nothing to speak up about.
>
> Yes - there were 'git pull' requests and I could have done a pull myself to
> review the code, but there seemed to be no urgency because you had already
> agreed to apply my patch.
> When I did finally pull the patches (after all the other issues had settled
> down and I had time to finish off the RAID side) I found ... what I found.
>
> I apologise for my tone, but I was very frustrated.
>
>>
>>> I'll try to explain again.
>>>
>>> md does not use __make_request. At all.
>>> md does not use 'struct request'. At all.
>>>
>>> The 'list' in 'struct blk_plug' is a list of 'struct request'.
>>
>> I'm well aware of these facts, but thanks for bringing it up.
>>
>>> Therefore md cannot put anything useful on the list in 'struct blk_plug'.
>>>
>>> So when blk_flush_plug_list calls queue_unplugged() on a queue that belonged
>>> to a request found on the blk_plug list, that queue cannot possibly ever be
>>> for an 'md' device (because no 'struct request' ever belongs to an md device,
>>> because md does not use 'struct request').
>>>
>>> So your patch (commit f75664570d8b) doesn't help MD at all.
>>>
>>> For md, I need to attach something to blk_plug which somehow identifies an md
>>> device, so that blk_finish_plug can get to that device and let it unplug.
>>> The most sensible thing to have is a completely generic callback. That way
>>> different block devices (which choose not to use __make_request) can attach
>>> different sorts of things to blk_plug.
>>>
>>> So can we please have my original patch applied? (Revised version using
>>> list_splice_init included below.)
>>>
>>> Or if not, a clear explanation of why not?
>>
>> So correct me if I'm wrong here, but the _only_ real difference between
>> this patch and the current code in the tree is the checking of the
>> callback list indicating a need to flush the callbacks. And that's
>> definitely an oversight. It should be functionally equivalent if md
>> would just flag this need to get a callback: instead of queueing a
>> callback on the list, set plug->need_unplug from md, and have
>> blk_needs_flush_plug() do:
>>
>>         return plug && (!list_empty(&plug->list) || plug->need_unplug);
>>
>> instead.
>> Something like the below, completely untested.
>>
>
> No, that is not the only real difference.
>
> The real difference is that in the current code, md has no way to register
> anything with a blk_plug, because you can only register a 'struct request'
> on a blk_plug, and md doesn't make any use of 'struct request'.
>
> As I said in the email you quote above:
>
>>> Therefore md cannot put anything useful on the list in 'struct blk_plug'.
>
> That is the heart of the problem.

Hmm, I don't really see a way to avoid the list in that case. You really
do need some way to queue items; a single callback, flag, or pointer
will not suffice.

I've added the patch and removed the (now) useless ->unplugged_fn
callback. I suggest you base your md changes on top of my for-linus
branch and tell me when you are confident it looks good, then I'll pull
in your MD changes and submit them later today.

OK with you?
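[The flush side that pairs with the registration sketched earlier would then look something like the following, using the list_splice_init approach Neil mentioned. Again a sketch, not the verbatim merged code.]

    /*
     * Run from blk_finish_plug before the request list is flushed.
     * Splicing the callbacks onto a private list first means a callback
     * may safely re-register itself or submit more I/O while we walk it.
     */
    static void flush_plug_callbacks(struct blk_plug *plug)
    {
            LIST_HEAD(callbacks);

            if (list_empty(&plug->cb_list))
                    return;

            list_splice_init(&plug->cb_list, &callbacks);

            while (!list_empty(&callbacks)) {
                    struct blk_plug_cb *cb = list_first_entry(&callbacks,
                                                              struct blk_plug_cb,
                                                              list);

                    list_del_init(&cb->list);
                    cb->callback(cb);       /* e.g. md wakes its raid thread */
            }
    }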
[[NOTE to dm-devel people - one of the patches here removes some
now-unused code from dm-raid.c plus a declaration from device-mapper.h]]

On Mon, 18 Apr 2011 10:10:18 +0200 Jens Axboe <jaxboe@fusionio.com> wrote:

> On 2011-04-18 09:25, NeilBrown wrote:
>
>>> Therefore md cannot put anything useful on the list in 'struct blk_plug'.
>>
>> That is the heart of the problem.
>
> Hmm, I don't really see a way to avoid the list in that case. You really
> do need some way to queue items; a single callback, flag, or pointer
> will not suffice.
>
> I've added the patch and removed the (now) useless ->unplugged_fn
> callback. I suggest you base your md changes on top of my for-linus
> branch and tell me when you are confident it looks good, then I'll pull
> in your MD changes and submit them later today.
>
> OK with you?

Yes, that's perfect. Thanks.

All of my plugging-related patches are now in a 'for-jens' branch:

The following changes since commit 99e22598e9a8e0a996d69c8c0f6b7027cb57720a:

  block: drop queue lock before calling __blk_run_queue() for kblockd punt (2011-04-18 09:59:55 +0200)

are available in the git repository at:
  git://neil.brown.name/md for-jens

NeilBrown (6):
      md: use new plugging interface for RAID IO.
      md/dm - remove remains of plug_fn callback.
      md - remove old plugging code.
      md: provide generic support for handling unplug callbacks.
      md: incorporate new plugging into raid5.
      md: fix up raid1/raid10 unplugging.

 drivers/md/dm-raid.c          |    8 ----
 drivers/md/md.c               |   87 +++++++++++++++++++++-------------------
 drivers/md/md.h               |   26 ++----------
 drivers/md/raid1.c            |   29 +++++++-------
 drivers/md/raid10.c           |   27 ++++++------
 drivers/md/raid5.c            |   61 ++++++++++++----------------
 drivers/md/raid5.h            |    2 -
 include/linux/device-mapper.h |    1 -
 8 files changed, 103 insertions(+), 138 deletions(-)

Thanks,
NeilBrown
On 2011-04-18 10:33, NeilBrown wrote:
>
> [[NOTE to dm-devel people - one of the patches here removes some
> now-unused code from dm-raid.c plus a declaration from device-mapper.h]]
>
> On Mon, 18 Apr 2011 10:10:18 +0200 Jens Axboe <jaxboe@fusionio.com> wrote:
>
>> On 2011-04-18 09:25, NeilBrown wrote:
>
>>>>> Therefore md cannot put anything useful on the list in 'struct blk_plug'.
>>>
>>> That is the heart of the problem.
>>
>> Hmm, I don't really see a way to avoid the list in that case. You really
>> do need some way to queue items; a single callback, flag, or pointer
>> will not suffice.
>>
>> I've added the patch and removed the (now) useless ->unplugged_fn
>> callback. I suggest you base your md changes on top of my for-linus
>> branch and tell me when you are confident it looks good, then I'll pull
>> in your MD changes and submit them later today.
>>
>> OK with you?
>>
>
> Yes, that's perfect. Thanks.
>
> All of my plugging-related patches are now in a 'for-jens' branch:
>
> The following changes since commit 99e22598e9a8e0a996d69c8c0f6b7027cb57720a:
>
>   block: drop queue lock before calling __blk_run_queue() for kblockd punt (2011-04-18 09:59:55 +0200)
>
> are available in the git repository at:
>   git://neil.brown.name/md for-jens
>
> NeilBrown (6):
>       md: use new plugging interface for RAID IO.
>       md/dm - remove remains of plug_fn callback.
>       md - remove old plugging code.
>       md: provide generic support for handling unplug callbacks.
>       md: incorporate new plugging into raid5.
>       md: fix up raid1/raid10 unplugging.
>
>  drivers/md/dm-raid.c          |    8 ----
>  drivers/md/md.c               |   87 +++++++++++++++++++++-------------------
>  drivers/md/md.h               |   26 ++----------
>  drivers/md/raid1.c            |   29 +++++++-------
>  drivers/md/raid10.c           |   27 ++++++------
>  drivers/md/raid5.c            |   61 ++++++++++++----------------
>  drivers/md/raid5.h            |    2 -
>  include/linux/device-mapper.h |    1 -
>  8 files changed, 103 insertions(+), 138 deletions(-)

Great, thanks a lot Neil! It's pulled in now, will send the request to
Linus today.
Btw, I really start to wonder if the request level is the right place to do this on-stack plugging. Wouldn't it be better to just plug bios in the on-stack queue? That way we could also stop doing the special case merging when adding to the plug list, and leave all the merging / I/O schedule logic in the __make_request path. Probably not .39 material, but worth a prototype? Also what this dicussion brought up is that the block layer data structures are highly confusing. Using a small subset of the request_queue also for make_request based driver just doesn't make sense. It seems like we should try to migrate the required state to struct gendisk, and submit I/O through a block_device_ops.submit method, leaving the request_queue as an internal abstraction for the request based drivers. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
On 04/18/2011 11:19 AM, hch@infradead.org wrote:
> Btw, I really start to wonder if the request level is the right place
> to do this on-stack plugging. Wouldn't it be better to just plug
> bios in the on-stack queue? That way we could also stop doing the
> special-case merging when adding to the plug list, and leave all the
> merging / I/O scheduler logic in the __make_request path. Probably
> not .39 material, but worth a prototype?
>
> Also, what this discussion brought up is that the block layer data
> structures are highly confusing. Using a small subset of the
> request_queue also for make_request-based drivers just doesn't make
> sense. It seems like we should try to migrate the required state
> to struct gendisk, and submit I/O through a block_device_ops.submit
> method, leaving the request_queue as an internal abstraction for
> the request-based drivers.
>
Good point.

It would also help with the device-mapper redesign agk and myself
discussed at LSF. Having a block_device_ops.submit function would
allow us to remap the actual request queue generically; and we would
even be able to address more than one request queue, which sounds
awfully similar to what Jens is trying to do ...

Cheers,

Hannes
On 2011-04-18 11:19, hch@infradead.org wrote:
> Btw, I really start to wonder if the request level is the right place
> to do this on-stack plugging. Wouldn't it be better to just plug
> bios in the on-stack queue? That way we could also stop doing the
> special-case merging when adding to the plug list, and leave all the
> merging / I/O scheduler logic in the __make_request path. Probably
> not .39 material, but worth a prototype?
>
> Also, what this discussion brought up is that the block layer data
> structures are highly confusing. Using a small subset of the
> request_queue also for make_request-based drivers just doesn't make
> sense. It seems like we should try to migrate the required state
> to struct gendisk, and submit I/O through a block_device_ops.submit
> method, leaving the request_queue as an internal abstraction for
> the request-based drivers.

Partially agree. I've never really liked the two methods we have, where
the light version was originally meant for stacked devices but gets used
elsewhere now too. It also causes I/O scheduling problems, and then you
get things like request-based dm to work around that.

But the idea is really to move towards more localized, private queueing;
the multiqueue setup will apply really well there too. I'm trying to
flesh out the design of that; ideally it would be nice to unify the
different bits we have now.

But I agree on pulling the stacked bits into some lower part, like the
gendisk. It would clean that up nicely.
On 2011-04-18 11:40, Hannes Reinecke wrote:
> On 04/18/2011 11:19 AM, hch@infradead.org wrote:
>> Btw, I really start to wonder if the request level is the right place
>> to do this on-stack plugging. Wouldn't it be better to just plug
>> bios in the on-stack queue? That way we could also stop doing the
>> special-case merging when adding to the plug list, and leave all the
>> merging / I/O scheduler logic in the __make_request path. Probably
>> not .39 material, but worth a prototype?
>>
>> Also, what this discussion brought up is that the block layer data
>> structures are highly confusing. Using a small subset of the
>> request_queue also for make_request-based drivers just doesn't make
>> sense. It seems like we should try to migrate the required state
>> to struct gendisk, and submit I/O through a block_device_ops.submit
>> method, leaving the request_queue as an internal abstraction for
>> the request-based drivers.
>>
> Good point.
>
> It would also help with the device-mapper redesign agk and myself
> discussed at LSF. Having a block_device_ops.submit function would
> allow us to remap the actual request queue generically; and we would
> even be able to address more than one request queue, which sounds
> awfully similar to what Jens is trying to do ...

The multiqueue bits would still have one request_queue, but multiple
queueing structures (I called those blk_queue_ctx, iirc).
> NeilBrown (6):
>       md: use new plugging interface for RAID IO.
>       md/dm - remove remains of plug_fn callback.
>       md - remove old plugging code.
>       md: provide generic support for handling unplug callbacks.
>       md: incorporate new plugging into raid5.
>       md: fix up raid1/raid10 unplugging.

Looking over more of the unplugging leftovers: is there a reason to keep
the unplug_work bits in CFQ? They seem to rather counter the current
scheme (and CFQ is the last user of kblockd outside of blk-core.c).
> md: provide generic support for handling unplug callbacks.
This looks like some horribly ugly code to me. The real fix is to do
the plugging in the block layer for bios instead of requests. The
effect should be about the same, except that merging will become a
little easier, as all bios will be on the list when calling into
__make_request or its equivalent. Even better, if we extend the
list_sort callback to also sort by the start block, it will actually
simplify the merge algorithm a lot, as it only needs to do front merges
and no back merges for the on-stack merging.

In addition, it should also allow for much more optimal queue_lock
round trips - we can keep it locked at the end of what's currently
__make_request so it is available for the next bio on the list. If that
bio can be merged now that we have the lock, and/or we optimize
get_request_wait not to sleep in the fast path, we could get down to a
single queue_lock round trip for each unplug.
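[As context for the sorting remark above: blk_flush_plug_list() already sorts the plug list with list_sort() using a queue-only comparator, plug_rq_cmp(). A sketch of the extended comparator being suggested, written against the existing request-based plug list since the bio-based variant is only a proposal:]

    /*
     * Sketch: order the plugged requests by destination queue, then by
     * start sector. With the list in ascending sector order, an element
     * can only merge in front of its successor, so the on-stack merge
     * logic would need front merges only. Hypothetical extension of the
     * existing plug_rq_cmp(); illustrative, not merged code.
     */
    static int plug_rq_sector_cmp(void *priv, struct list_head *a,
                                  struct list_head *b)
    {
            struct request *rqa = list_entry(a, struct request, queuelist);
            struct request *rqb = list_entry(b, struct request, queuelist);

            if (rqa->q != rqb->q)
                    return rqa->q < rqb->q ? -1 : 1;
            if (blk_rq_pos(rqa) != blk_rq_pos(rqb))
                    return blk_rq_pos(rqa) < blk_rq_pos(rqb) ? -1 : 1;
            return 0;
    }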
On Mon, 18 Apr 2011 17:30:48 -0400 "hch@infradead.org" <hch@infradead.org> wrote:

>> md: provide generic support for handling unplug callbacks.
>
> This looks like some horribly ugly code to me. The real fix is to do
> the plugging in the block layer for bios instead of requests. The
> effect should be about the same, except that merging will become a
> little easier, as all bios will be on the list when calling into
> __make_request or its equivalent. Even better, if we extend the
> list_sort callback to also sort by the start block, it will actually
> simplify the merge algorithm a lot, as it only needs to do front merges
> and no back merges for the on-stack merging.
>
> In addition, it should also allow for much more optimal queue_lock
> round trips - we can keep it locked at the end of what's currently
> __make_request so it is available for the next bio on the list. If that
> bio can be merged now that we have the lock, and/or we optimize
> get_request_wait not to sleep in the fast path, we could get down to a
> single queue_lock round trip for each unplug.

Does the following match your thinking? I'm trying to get to a more
concrete understanding...

- We change the ->make_request_fn interface so that it takes a list of
  bios rather than a single bio - linked on ->bi_next. These bios must
  all have the same ->bi_bdev. They *might* be sorted by bi_sector
  (that needs to be decided).

- generic_make_request currently queues bios if there is already an
  active request (this limits recursion). We enhance this to also queue
  bios when code calls blk_start_plug. In effect, generic_make_request
  becomes:

      if (current->plug)
              blk_add_to_plug(current->plug, bio);
      else {
              struct blk_plug plug;

              blk_start_plug(&plug);
              __generic_make_request(bio);
              blk_finish_plug(&plug);
      }

- __generic_make_request would sort the list of bios by bi_bdev (and
  maybe bi_sector) and pass them along to the different
  ->make_request_fn functions.

  As there are likely to be only a few different bi_bdev values (often
  just one) but hopefully lots and lots of bios, it might be more
  efficient to do a linear bucket sort based on bi_bdev, and only sort
  those buckets on bi_sector if required.

Then make_request_fn handlers can expect to get lots of bios at once,
can optimise their handling as seems appropriate, and not require any
further plugging.

Is that at all close to what you are thinking?

NeilBrown
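[A sketch of the per-device dispatch step from the last bullet, under the stated assumptions - every name here (flush_plugged_bios, and the convention that ->make_request_fn accepts a bi_next-linked chain) is hypothetical, since this interface was never implemented:]

    /*
     * Hypothetical flush step for the proposal above: repeatedly peel all
     * bios sharing the first bio's bi_bdev off the chain (preserving
     * submission order) and hand each same-device batch to that device's
     * make_request_fn in a single call. Purely illustrative.
     */
    static void flush_plugged_bios(struct bio *head)
    {
            while (head) {
                    struct block_device *bdev = head->bi_bdev;
                    struct bio *batch = NULL, **tail = &batch, **link = &head;

                    while (*link) {
                            struct bio *b = *link;

                            if (b->bi_bdev == bdev) {
                                    *link = b->bi_next;     /* unlink */
                                    b->bi_next = NULL;
                                    *tail = b;              /* append to batch */
                                    tail = &b->bi_next;
                            } else
                                    link = &b->bi_next;
                    }
                    /* proposed: one ->make_request_fn call, many bios */
                    bdev_get_queue(bdev)->make_request_fn(bdev_get_queue(bdev),
                                                          batch);
            }
    }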
On Tue, Apr 19, 2011 at 08:38:13AM +1000, NeilBrown wrote:
> Is that at all close to what you are thinking?
Yes, pretty much like that.
diff --git a/block/blk-core.c b/block/blk-core.c
index 78b7b0c..e1f5635 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1305,12 +1305,12 @@ get_rq:
 	 */
 	if (list_empty(&plug->list))
 		trace_block_plug(q);
-	else if (!plug->should_sort) {
+	else if (!(plug->flags & BLK_PLUG_F_SORT)) {
 		struct request *__rq;
 
 		__rq = list_entry_rq(plug->list.prev);
 		if (__rq->q != q)
-			plug->should_sort = 1;
+			plug->flags |= BLK_PLUG_F_SORT;
 	}
 	/*
 	 * Debug flag, kill later
@@ -2638,7 +2638,7 @@ void blk_start_plug(struct blk_plug *plug)
 
 	plug->magic = PLUG_MAGIC;
 	INIT_LIST_HEAD(&plug->list);
-	plug->should_sort = 0;
+	plug->flags = 0;
 
 	/*
 	 * If this is a nested plug, don't actually assign it. It will be
@@ -2693,9 +2693,9 @@ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 
 	list_splice_init(&plug->list, &list);
 
-	if (plug->should_sort) {
+	if (plug->flags & BLK_PLUG_F_SORT) {
 		list_sort(NULL, &list, plug_rq_cmp);
-		plug->should_sort = 0;
+		plug->flags &= ~BLK_PLUG_F_SORT;
 	}
 
 	q = NULL;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ec0357d..1a0b76b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -860,7 +860,12 @@ extern void blk_put_queue(struct request_queue *);
 struct blk_plug {
 	unsigned long magic;
 	struct list_head list;
-	unsigned int should_sort;
+	unsigned int flags;
+};
+
+enum {
+	BLK_PLUG_F_SORT = 1,
+	BLK_PLUG_F_NEED_UNPLUG = 2,
 };
 
 extern void blk_start_plug(struct blk_plug *);
@@ -887,7 +892,8 @@ static inline bool blk_needs_flush_plug(struct task_struct *tsk)
 {
 	struct blk_plug *plug = tsk->plug;
 
-	return plug && !list_empty(&plug->list);
+	return plug && (!list_empty(&plug->list) ||
+			(plug->flags & BLK_PLUG_F_NEED_UNPLUG));
 }
 
 /*