Message ID | 1480734921-23701-2-git-send-email-axboe@fb.com (mailing list archive)
---|---
State | New, archived
On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
> No functional changes with this patch, it's just in preparation for
> supporting legacy schedulers on blk-mq.

Ewww. I think without refactoring to make clear what 'use_mq_path'
means here, and better naming, this is a total non-starter. Even with
that we'll now have yet another code path to worry about. Is there
any chance to instead consolidate into a single path?

> struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
> {
> -	if (q->mq_ops)
> +	if (blk_use_mq_path(q))
> 		return blk_mq_alloc_request(q, rw,
> 			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
> 			0 : BLK_MQ_REQ_NOWAIT);

So now with blk-mq and an elevator set we go into blk_old_get_request,
which will simply allocate new requests. How does this not break
every existing driver?
On 12/05/2016 06:05 AM, Christoph Hellwig wrote:
> On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
>> No functional changes with this patch, it's just in preparation for
>> supporting legacy schedulers on blk-mq.
>
> Ewww. I think without refactoring to make clear what 'use_mq_path'
> means here, and better naming, this is a total non-starter. Even with
> that we'll now have yet another code path to worry about. Is there
> any chance to instead consolidate into a single path?

It's not pretty at all. I should have prefaced this patchset with saying
that it's an experiment in seeing what it would take to simply use the
old IO schedulers, as a temporary measure, on blk/scsi-mq. I did clean
it up a bit after posting:

http://git.kernel.dk/cgit/linux-block/log/?h=blk-mq-legacy-sched

but I'm not going to claim this is anywhere near merge ready, nor clean.

>> struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
>> {
>> -	if (q->mq_ops)
>> +	if (blk_use_mq_path(q))
>> 		return blk_mq_alloc_request(q, rw,
>> 			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
>> 			0 : BLK_MQ_REQ_NOWAIT);
>
> So now with blk-mq and an elevator set we go into blk_old_get_request,
> which will simply allocate new requests. How does this not break
> every existing driver?

Since Johannes found that confusing, maybe I should explain how it all
works.

Basically, if we are blk-mq (q->mq_ops) and now have an IO scheduler
attached (q->elevator), then the path from bio to request is essentially
the old one. We allocate a request through get_request() and friends,
and insert it with the elevator interface. When the queue is later run,
our blk_mq_sched_dispatch() asks the IO scheduler for a request, and if
successful, we allocate a real MQ request and dispatch that. So all the
driver ever sees is MQ requests, and it uses the MQ interface. Only the
internal bits now look at blk_use_mq_path() to tell whether they should
alloc+insert with that.

The above will break if we have drivers manually allocating a request
through blk_get_request(), and then using MQ functions to insert/run it.
I didn't audit all of that, and I think most (all?) of them just use the
MQ interfaces. That would also currently be broken, since we either run
the software queues or run through the elevator, not both.
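[Editor's sketch, for readers following along: a rough outline of the
dispatch flow Jens describes above. The helper names marked hypothetical
are illustrative glue only; the real code lives in the
blk-mq-legacy-sched branch linked above and differs in detail.]

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

static void blk_mq_sched_dispatch(struct blk_mq_hw_ctx *hctx)
{
	struct request_queue *q = hctx->queue;
	struct request *old_rq, *rq;

	/*
	 * Pull the next request the legacy IO scheduler wants
	 * dispatched. This is one of the "old" requests that were
	 * allocated via get_request() and inserted via the elevator.
	 */
	while ((old_rq = blk_fetch_request(q)) != NULL) {
		/*
		 * Turn it into a real MQ request, so the driver only
		 * ever sees MQ requests (see the promotion sketch
		 * further down the thread).
		 */
		rq = blk_mq_sched_promote(q, old_rq);	/* hypothetical */
		if (!rq)
			break;	/* out of tags, retry on the next queue run */

		blk_mq_dispatch_rq(hctx, rq);		/* hypothetical */
	}
}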
On Mon, Dec 05, 2016 at 08:07:10AM -0700, Jens Axboe wrote:
> On 12/05/2016 06:05 AM, Christoph Hellwig wrote:
> > On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
> >> No functional changes with this patch, it's just in preparation for
> >> supporting legacy schedulers on blk-mq.
> >
> > Ewww. I think without refactoring to make clear what 'use_mq_path'
> > means here, and better naming, this is a total non-starter. Even with
> > that we'll now have yet another code path to worry about. Is there
> > any chance to instead consolidate into a single path?
>
> It's not pretty at all. I should have prefaced this patchset with saying
> that it's an experiment in seeing what it would take to simply use the
> old IO schedulers, as a temporary measure, on blk/scsi-mq. I did clean
> it up a bit after posting:
>
> http://git.kernel.dk/cgit/linux-block/log/?h=blk-mq-legacy-sched
>
> but I'm not going to claim this is anywhere near merge ready, nor clean.
>
> >> struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
> >> {
> >> -	if (q->mq_ops)
> >> +	if (blk_use_mq_path(q))
> >> 		return blk_mq_alloc_request(q, rw,
> >> 			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
> >> 			0 : BLK_MQ_REQ_NOWAIT);
> >
> > So now with blk-mq and an elevator set we go into blk_old_get_request,
> > which will simply allocate new requests. How does this not break
> > every existing driver?
>
> Since Johannes found that confusing, maybe I should explain how it all
> works.

To clarify the naming, how about something like blk_mq_use_sched() (to
align with blk_mq_sched_dispatch())?
On 12/05/2016 08:49 AM, Johannes Thumshirn wrote:
> On Mon, Dec 05, 2016 at 08:07:10AM -0700, Jens Axboe wrote:
>> On 12/05/2016 06:05 AM, Christoph Hellwig wrote:
>>> On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
>>>> No functional changes with this patch, it's just in preparation for
>>>> supporting legacy schedulers on blk-mq.
>>>
>>> Ewww. I think without refactoring to make clear what 'use_mq_path'
>>> means here, and better naming, this is a total non-starter. Even with
>>> that we'll now have yet another code path to worry about. Is there
>>> any chance to instead consolidate into a single path?
>>
>> It's not pretty at all. I should have prefaced this patchset with saying
>> that it's an experiment in seeing what it would take to simply use the
>> old IO schedulers, as a temporary measure, on blk/scsi-mq. I did clean
>> it up a bit after posting:
>>
>> http://git.kernel.dk/cgit/linux-block/log/?h=blk-mq-legacy-sched
>>
>> but I'm not going to claim this is anywhere near merge ready, nor clean.
>>
>>>> struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
>>>> {
>>>> -	if (q->mq_ops)
>>>> +	if (blk_use_mq_path(q))
>>>> 		return blk_mq_alloc_request(q, rw,
>>>> 			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
>>>> 			0 : BLK_MQ_REQ_NOWAIT);
>>>
>>> So now with blk-mq and an elevator set we go into blk_old_get_request,
>>> which will simply allocate new requests. How does this not break
>>> every existing driver?
>>
>> Since Johannes found that confusing, maybe I should explain how it all
>> works.
>
> To clarify the naming, how about something like blk_mq_use_sched() (to
> align with blk_mq_sched_dispatch())?

Yeah, that is a much better name indeed. I'll make that change.
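[Editor's sketch: the rename agreed on here would presumably just invert
the sense of the helper this patch adds to blk.h (see the diff below);
this is a sketch, not code from the series.]

#include <linux/blkdev.h>

/* Sketch: logical inverse of blk_use_mq_path() from this patch */
static inline bool blk_mq_use_sched(struct request_queue *q)
{
	/* a blk-mq queue with a (legacy) IO scheduler attached */
	return q->mq_ops && q->elevator;
}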
On Sat, Dec 3, 2016 at 11:15 AM, Jens Axboe <axboe@fb.com> wrote:
> No functional changes with this patch, it's just in preparation for
> supporting legacy schedulers on blk-mq.
>
> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
>  block/blk-core.c  |  2 +-
>  block/blk-exec.c  |  2 +-
>  block/blk-flush.c | 26 ++++++++++++++------------
>  block/blk.h       | 12 +++++++++++-
>  4 files changed, 27 insertions(+), 15 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 3f2eb8d80189..0e23589ab3bf 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1310,7 +1310,7 @@ static struct request *blk_old_get_request(struct request_queue *q, int rw,
>
>  struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
>  {
> -	if (q->mq_ops)
> +	if (blk_use_mq_path(q))
> 		return blk_mq_alloc_request(q, rw,
> 			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
> 			0 : BLK_MQ_REQ_NOWAIT);

Another way might be to use the mq allocator to allocate the rq in the
mq_sched case, such as: just replace mempool_alloc() in __get_request()
with blk_mq_alloc_request(). In this way, it should be possible to
avoid one extra rq allocation in blk_mq_sched_dispatch(), and keep mq's
benefit of rq preallocation, which can avoid holding the queue_lock
during the allocation too.

[ remainder of the quoted patch trimmed; the full diff appears below ]
On 12/05/2016 10:00 AM, Ming Lei wrote:
> On Sat, Dec 3, 2016 at 11:15 AM, Jens Axboe <axboe@fb.com> wrote:
>> No functional changes with this patch, it's just in preparation for
>> supporting legacy schedulers on blk-mq.
>>
>> Signed-off-by: Jens Axboe <axboe@fb.com>
>> ---
>>  block/blk-core.c  |  2 +-
>>  block/blk-exec.c  |  2 +-
>>  block/blk-flush.c | 26 ++++++++++++++------------
>>  block/blk.h       | 12 +++++++++++-
>>  4 files changed, 27 insertions(+), 15 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 3f2eb8d80189..0e23589ab3bf 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1310,7 +1310,7 @@ static struct request *blk_old_get_request(struct request_queue *q, int rw,
>>
>>  struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
>>  {
>> -	if (q->mq_ops)
>> +	if (blk_use_mq_path(q))
>> 		return blk_mq_alloc_request(q, rw,
>> 			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
>> 			0 : BLK_MQ_REQ_NOWAIT);
>
> Another way might be to use the mq allocator to allocate the rq in the
> mq_sched case, such as: just replace mempool_alloc() in __get_request()
> with blk_mq_alloc_request(). In this way, it should be possible to
> avoid one extra rq allocation in blk_mq_sched_dispatch(), and keep mq's
> benefit of rq preallocation, which can avoid holding the queue_lock
> during the allocation too.

One problem with the MQ rq allocation is that it's tied to the device
queue depth. This is a problem for scheduling, since we want to have a
larger pool of requests that the IO scheduler can use, so that we
actually have something that we can schedule with. This is a non-starter
on QD=1 devices, but it's also a problem for SATA with 31 effectively
usable tags.

That's why I split it in two, so we have the "old" requests that we hand
to the scheduler. I know the 'rq' field copy isn't super pretty, though.
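[Editor's sketch: a hypothetical illustration of the split Jens
describes, pairing a scheduler-owned legacy request with a tag-backed MQ
request at dispatch time. The function name and the field subset copied
are illustrative only, not the actual code from the branch.]

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

static struct request *blk_mq_sched_promote(struct request_queue *q,
					    struct request *old_rq)
{
	struct request *rq;

	/*
	 * This allocation is bounded by the device queue depth (the tag
	 * space), e.g. 31 usable tags on SATA, or just one on QD=1
	 * devices, which is why the scheduler keeps its own larger pool
	 * of legacy requests.
	 */
	rq = blk_mq_alloc_request(q, rq_data_dir(old_rq), BLK_MQ_REQ_NOWAIT);
	if (IS_ERR(rq))
		return NULL;	/* no tag free, dispatch stops for now */

	/* the 'rq' field copy mentioned above (illustrative subset) */
	rq->cmd_flags  = old_rq->cmd_flags;
	rq->__sector   = old_rq->__sector;
	rq->__data_len = old_rq->__data_len;
	rq->bio        = old_rq->bio;
	rq->biotail    = old_rq->biotail;

	return rq;
}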
On Tue, Dec 6, 2016 at 1:09 AM, Jens Axboe <axboe@fb.com> wrote:
> On 12/05/2016 10:00 AM, Ming Lei wrote:
>> On Sat, Dec 3, 2016 at 11:15 AM, Jens Axboe <axboe@fb.com> wrote:
>>> No functional changes with this patch, it's just in preparation for
>>> supporting legacy schedulers on blk-mq.
>>>
>>> Signed-off-by: Jens Axboe <axboe@fb.com>
>>> ---
>>>  block/blk-core.c  |  2 +-
>>>  block/blk-exec.c  |  2 +-
>>>  block/blk-flush.c | 26 ++++++++++++++------------
>>>  block/blk.h       | 12 +++++++++++-
>>>  4 files changed, 27 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index 3f2eb8d80189..0e23589ab3bf 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -1310,7 +1310,7 @@ static struct request *blk_old_get_request(struct request_queue *q, int rw,
>>>
>>>  struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
>>>  {
>>> -	if (q->mq_ops)
>>> +	if (blk_use_mq_path(q))
>>> 		return blk_mq_alloc_request(q, rw,
>>> 			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
>>> 			0 : BLK_MQ_REQ_NOWAIT);
>>
>> Another way might be to use the mq allocator to allocate the rq in the
>> mq_sched case, such as: just replace mempool_alloc() in __get_request()
>> with blk_mq_alloc_request(). In this way, it should be possible to
>> avoid one extra rq allocation in blk_mq_sched_dispatch(), and keep mq's
>> benefit of rq preallocation, which can avoid holding the queue_lock
>> during the allocation too.
>
> One problem with the MQ rq allocation is that it's tied to the device
> queue depth. This is a problem for scheduling, since we want to have a
> larger pool of requests that the IO scheduler can use, so that we
> actually have something that we can schedule with. This is a non-starter
> on QD=1 devices, but it's also a problem for SATA with 31 effectively
> usable tags.
>
> That's why I split it in two, so we have the "old" requests that we hand
> to the scheduler. I know the 'rq' field copy isn't super pretty, though.

OK, got it, thanks for your explanation.

So could we fall back to mempool_alloc() for allocating the rq (with the
mq request size) if the MQ rq allocator fails? Then in this way the
extra rq allocation in blk_mq_alloc_request() may be killed.

Thanks,
Ming
On 12/05/2016 12:22 PM, Ming Lei wrote:
> On Tue, Dec 6, 2016 at 1:09 AM, Jens Axboe <axboe@fb.com> wrote:
>> On 12/05/2016 10:00 AM, Ming Lei wrote:
>>> On Sat, Dec 3, 2016 at 11:15 AM, Jens Axboe <axboe@fb.com> wrote:
>>>> No functional changes with this patch, it's just in preparation for
>>>> supporting legacy schedulers on blk-mq.
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@fb.com>
>>>> ---
>>>>  block/blk-core.c  |  2 +-
>>>>  block/blk-exec.c  |  2 +-
>>>>  block/blk-flush.c | 26 ++++++++++++++------------
>>>>  block/blk.h       | 12 +++++++++++-
>>>>  4 files changed, 27 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>> index 3f2eb8d80189..0e23589ab3bf 100644
>>>> --- a/block/blk-core.c
>>>> +++ b/block/blk-core.c
>>>> @@ -1310,7 +1310,7 @@ static struct request *blk_old_get_request(struct request_queue *q, int rw,
>>>>
>>>>  struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
>>>>  {
>>>> -	if (q->mq_ops)
>>>> +	if (blk_use_mq_path(q))
>>>> 		return blk_mq_alloc_request(q, rw,
>>>> 			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
>>>> 			0 : BLK_MQ_REQ_NOWAIT);
>>>
>>> Another way might be to use the mq allocator to allocate the rq in the
>>> mq_sched case, such as: just replace mempool_alloc() in __get_request()
>>> with blk_mq_alloc_request(). In this way, it should be possible to
>>> avoid one extra rq allocation in blk_mq_sched_dispatch(), and keep mq's
>>> benefit of rq preallocation, which can avoid holding the queue_lock
>>> during the allocation too.
>>
>> One problem with the MQ rq allocation is that it's tied to the device
>> queue depth. This is a problem for scheduling, since we want to have a
>> larger pool of requests that the IO scheduler can use, so that we
>> actually have something that we can schedule with. This is a non-starter
>> on QD=1 devices, but it's also a problem for SATA with 31 effectively
>> usable tags.
>>
>> That's why I split it in two, so we have the "old" requests that we hand
>> to the scheduler. I know the 'rq' field copy isn't super pretty, though.
>
> OK, got it, thanks for your explanation.
>
> So could we fall back to mempool_alloc() for allocating the rq (with the
> mq request size) if the MQ rq allocator fails? Then in this way the
> extra rq allocation in blk_mq_alloc_request() may be killed.

We could, yes, though I'm not sure it's worth special casing that. The
copy is pretty damn cheap compared to the high costs of going through
the legacy path. And given that, I'd probably prefer to keep it all the
same, regardless of the depth of the device. I don't think the change
would be noticeable.
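[Editor's sketch: what Ming's fallback would amount to. This is
hypothetical; in particular q->sched_rq_pool is an invented mempool
sized for MQ requests, not an existing field.]

#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/mempool.h>

static struct request *sched_alloc_request(struct request_queue *q, int rw)
{
	struct request *rq;

	/* try the tag-backed MQ allocator first */
	rq = blk_mq_alloc_request(q, rw, BLK_MQ_REQ_NOWAIT);
	if (!IS_ERR(rq))
		return rq;	/* tag-backed, no promotion copy needed later */

	/* tag space exhausted: hand the scheduler an untagged request */
	rq = mempool_alloc(q->sched_rq_pool, GFP_NOIO);	/* hypothetical pool */
	if (rq)
		blk_rq_init(q, rq);
	return rq;
}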
On Mon, Dec 5, 2016 at 10:07 AM, Jens Axboe <axboe@fb.com> wrote:
>
> On 12/05/2016 06:05 AM, Christoph Hellwig wrote:
> > On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
> >> No functional changes with this patch, it's just in preparation for
> >> supporting legacy schedulers on blk-mq.
> >
> > Ewww. I think without refactoring to make clear what 'use_mq_path'
> > means here, and better naming, this is a total non-starter. Even with
> > that we'll now have yet another code path to worry about. Is there
> > any chance to instead consolidate into a single path?
>
> It's not pretty at all. I should have prefaced this patchset with saying
> that it's an experiment in seeing what it would take to simply use the
> old IO schedulers, as a temporary measure, on blk/scsi-mq. I did clean
> it up a bit after posting:
>
> http://git.kernel.dk/cgit/linux-block/log/?h=blk-mq-legacy-sched
>
> but I'm not going to claim this is anywhere near merge ready, nor clean.

Nice to see you've lowered your standards...

Maybe now we can revisit this line of work? ;)
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip

To fix:
https://bugzilla.kernel.org/show_bug.cgi?id=119841

> >> struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
> >> {
> >> -	if (q->mq_ops)
> >> +	if (blk_use_mq_path(q))
> >> 		return blk_mq_alloc_request(q, rw,
> >> 			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
> >> 			0 : BLK_MQ_REQ_NOWAIT);
> >
> > So now with blk-mq and an elevator set we go into blk_old_get_request,
> > which will simply allocate new requests. How does this not break
> > every existing driver?
>
> Since Johannes found that confusing, maybe I should explain how it all
> works.
>
> Basically, if we are blk-mq (q->mq_ops) and now have an IO scheduler
> attached (q->elevator), then the path from bio to request is essentially
> the old one. We allocate a request through get_request() and friends,
> and insert it with the elevator interface. When the queue is later run,
> our blk_mq_sched_dispatch() asks the IO scheduler for a request, and if
> successful, we allocate a real MQ request and dispatch that. So all the
> driver ever sees is MQ requests, and it uses the MQ interface. Only the
> internal bits now look at blk_use_mq_path() to tell whether they should
> alloc+insert with that.
>
> The above will break if we have drivers manually allocating a request
> through blk_get_request(), and then using MQ functions to insert/run it.
> I didn't audit all of that, and I think most (all?) of them just use the
> MQ interfaces. That would also currently be broken, since we either run
> the software queues or run through the elevator, not both.

I'm not seeing anything in elevator_switch() that would prevent an
elevator from attempting to be used on an mq device with > 1 hw queue.
But I could easily be missing it.

That aside, this patchset has all the makings of a _serious_ problem
for dm-mq multipath (e.g. if dm_mod.use_blk_mq=Y and
dm_mod.dm_mq_nr_hw_queues=1). There is a bunch of code in
drivers/md/dm-rq.c that looks at q->mq_ops vs not to determine if mq
is used vs the old .request_fn.

I think we really need a way to force an allocated mq request_queue
(with a single hw_queue) to not support this technological terror
you've constructed. (*cough* http://es.memegenerator.net/instance/58341711 )

In fact I'd prefer there be a mechanism to only allow this if some
opt-in interface is used... or the inverse: dm-mq can allocate a blk-mq
request_queue that will _never_ support this.

I could be super dense on this line of work. But what is the point of
all this? Seems like a really unfortunate distraction that makes the
block code all the more encumbered with fiddly old vs new logic. So
now we're opening old .request_fn users up to blk-mq-with-scheduler vs
non-blk-mq bugs.

Color me skeptical. In time, maybe I'll warm to all this, but for now
we need a big "OFF!" switch (aside from DEFAULT_MQ_SCHED_NONE, in the
request_queue allocation interface).

FWIW, dm-multipath has supported a top-level .request_fn request_queue,
legacy elevator and all, stacked on blk-mq queue(s) for quite a while.
If people _really_ want this you could easily give it to them by
forcing the use of the DM "multipath" target with multipath's
"queue_mode" feature set to "rq".
On 12/05/2016 03:40 PM, Mike Snitzer wrote:
> On Mon, Dec 5, 2016 at 10:07 AM, Jens Axboe <axboe@fb.com> wrote:
>>
>> On 12/05/2016 06:05 AM, Christoph Hellwig wrote:
>>> On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
>>>> No functional changes with this patch, it's just in preparation for
>>>> supporting legacy schedulers on blk-mq.
>>>
>>> Ewww. I think without refactoring to make clear what 'use_mq_path'
>>> means here, and better naming, this is a total non-starter. Even with
>>> that we'll now have yet another code path to worry about. Is there
>>> any chance to instead consolidate into a single path?
>>
>> It's not pretty at all. I should have prefaced this patchset with saying
>> that it's an experiment in seeing what it would take to simply use the
>> old IO schedulers, as a temporary measure, on blk/scsi-mq. I did clean
>> it up a bit after posting:
>>
>> http://git.kernel.dk/cgit/linux-block/log/?h=blk-mq-legacy-sched
>>
>> but I'm not going to claim this is anywhere near merge ready, nor clean.
>
> Nice to see you've lowered your standards...
>
> Maybe now we can revisit this line of work? ;)
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip

I haven't lowered my standards. I thought this posting was pretty clear:
it's an experiment in what supporting legacy schedulers would look
like. As you quote me above, this is NOT proposed for merging, nor do I
consider it anywhere near clean.

> I'm not seeing anything in elevator_switch() that would prevent an
> elevator from attempting to be used on an mq device with > 1 hw queue.
> But I could easily be missing it.

You missed it, it's in blk_mq_sched_init(), called from
elv_iosched_store() when trying to switch (or set up new) schedulers.

> That aside, this patchset has all the makings of a _serious_ problem
> for dm-mq multipath (e.g. if dm_mod.use_blk_mq=Y and
> dm_mod.dm_mq_nr_hw_queues=1). There is a bunch of code in
> drivers/md/dm-rq.c that looks at q->mq_ops vs not to determine if mq
> is used vs the old .request_fn.
>
> I think we really need a way to force an allocated mq request_queue
> (with a single hw_queue) to not support this technological terror
> you've constructed. (*cough*

See BLK_MQ_F_NO_SCHED.

> I could be super dense on this line of work. But what is the point of
> all this? Seems like a really unfortunate distraction that makes the
> block code all the more encumbered with fiddly old vs new logic. So
> now we're opening old .request_fn users up to blk-mq-with-scheduler vs
> non-blk-mq bugs.

See above, it's just an experiment in seeing what this would look like,
how transparent (or not) we could make that.

Don't overthink any of this, and don't start making plans or coming up
with problems on why X or Y would not work with whatever interface
variant of dm. That's jumping the gun.
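[Editor's sketch: for drivers that want the "OFF" switch, opting out
would presumably look like this at tag-set setup time. Based on the
BLK_MQ_F_NO_SCHED flag from Jens's experimental branch; my_mq_ops is a
placeholder and the flag's final semantics may differ.]

#include <linux/blk-mq.h>

static struct blk_mq_tag_set my_tag_set = {
	.ops		= &my_mq_ops,		/* placeholder blk_mq_ops */
	.nr_hw_queues	= 1,
	.queue_depth	= 64,
	.numa_node	= NUMA_NO_NODE,
	/* opt this queue out of legacy-scheduler support */
	.flags		= BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_SCHED,
};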
On Mon, Dec 05 2016 at 5:50pm -0500, Jens Axboe <axboe@fb.com> wrote:

> On 12/05/2016 03:40 PM, Mike Snitzer wrote:
> > On Mon, Dec 5, 2016 at 10:07 AM, Jens Axboe <axboe@fb.com> wrote:
> >>
> >> On 12/05/2016 06:05 AM, Christoph Hellwig wrote:
> >>> On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
> >>>> No functional changes with this patch, it's just in preparation for
> >>>> supporting legacy schedulers on blk-mq.
> >>>
> >>> Ewww. I think without refactoring to make clear what 'use_mq_path'
> >>> means here, and better naming, this is a total non-starter. Even with
> >>> that we'll now have yet another code path to worry about. Is there
> >>> any chance to instead consolidate into a single path?
> >>
> >> It's not pretty at all. I should have prefaced this patchset with saying
> >> that it's an experiment in seeing what it would take to simply use the
> >> old IO schedulers, as a temporary measure, on blk/scsi-mq. I did clean
> >> it up a bit after posting:
> >>
> >> http://git.kernel.dk/cgit/linux-block/log/?h=blk-mq-legacy-sched
> >>
> >> but I'm not going to claim this is anywhere near merge ready, nor clean.
> >
> > Nice to see you've lowered your standards...
> >
> > Maybe now we can revisit this line of work? ;)
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip
>
> I haven't lowered my standards. I thought this posting was pretty clear:
> it's an experiment in what supporting legacy schedulers would look
> like. As you quote me above, this is NOT proposed for merging, nor do I
> consider it anywhere near clean.

Where'd your sense of humor go?

> > I'm not seeing anything in elevator_switch() that would prevent an
> > elevator from attempting to be used on an mq device with > 1 hw queue.
> > But I could easily be missing it.
>
> You missed it, it's in blk_mq_sched_init(), called from
> elv_iosched_store() when trying to switch (or set up new) schedulers.
>
> > That aside, this patchset has all the makings of a _serious_ problem
> > for dm-mq multipath (e.g. if dm_mod.use_blk_mq=Y and
> > dm_mod.dm_mq_nr_hw_queues=1). There is a bunch of code in
> > drivers/md/dm-rq.c that looks at q->mq_ops vs not to determine if mq
> > is used vs the old .request_fn.
> >
> > I think we really need a way to force an allocated mq request_queue
> > (with a single hw_queue) to not support this technological terror
> > you've constructed. (*cough*
>
> See BLK_MQ_F_NO_SCHED.

Yeap, missed it, thanks. Reviewing patches via gmail _sucks_. I
should've just looked at your git branch(es) from the start.

> > I could be super dense on this line of work. But what is the point of
> > all this? Seems like a really unfortunate distraction that makes the
> > block code all the more encumbered with fiddly old vs new logic. So
> > now we're opening old .request_fn users up to blk-mq-with-scheduler vs
> > non-blk-mq bugs.
>
> See above, it's just an experiment in seeing what this would look like,
> how transparent (or not) we could make that.

OK, seems not very transparent so far. But that aside, I'm more curious
about what the goal(s) and/or benefit(s) might be?

I know that before you were hopeful to eventually eliminate the old
.request_fn path in block core (in favor of blk-mq, once it grew IO
scheduling capabilities). But by tying blk-mq through to the old
request path (and associated IO schedulers) it certainly complicates
getting rid of all the legacy code.

Selfishly, I'm looking forward to eliminating the old .request_fn
request-based code in DM core. This advance to supporting the old IO
schedulers makes that less likely.

> Don't overthink any of this, and don't start making plans or coming up
> with problems on why X or Y would not work with whatever interface
> variant of dm. That's jumping the gun.

Not overthinking.. just thinking ;)

But if this does happen then maybe I should look to invert the
request-based DM core cleanup: remove all the old .request_fn support
and impose the same (namely IO scheduler enabled DM multipath) via
dm_mod.use_blk_mq=Y and dm_mod.dm_mq_nr_hw_queues=1

Mike
No functional changes with this patch, it's just in preparation for
supporting legacy schedulers on blk-mq.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 block/blk-core.c  |  2 +-
 block/blk-exec.c  |  2 +-
 block/blk-flush.c | 26 ++++++++++++++------------
 block/blk.h       | 12 +++++++++++-
 4 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3f2eb8d80189..0e23589ab3bf 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1310,7 +1310,7 @@ static struct request *blk_old_get_request(struct request_queue *q, int rw,
 
 struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
 {
-	if (q->mq_ops)
+	if (blk_use_mq_path(q))
 		return blk_mq_alloc_request(q, rw,
 			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
 			0 : BLK_MQ_REQ_NOWAIT);
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 3ecb00a6cf45..73b8a701ae6d 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -64,7 +64,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
 	 * don't check dying flag for MQ because the request won't
 	 * be reused after dying flag is set
 	 */
-	if (q->mq_ops) {
+	if (blk_use_mq_path(q)) {
 		blk_mq_insert_request(rq, at_head, true, false);
 		return;
 	}
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 1bdbb3d3e5f5..0b68a1258bdd 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -133,14 +133,16 @@ static void blk_flush_restore_request(struct request *rq)
 
 static bool blk_flush_queue_rq(struct request *rq, bool add_front)
 {
-	if (rq->q->mq_ops) {
+	struct request_queue *q = rq->q;
+
+	if (blk_use_mq_path(q)) {
 		blk_mq_add_to_requeue_list(rq, add_front, true);
 		return false;
 	} else {
 		if (add_front)
-			list_add(&rq->queuelist, &rq->q->queue_head);
+			list_add(&rq->queuelist, &q->queue_head);
 		else
-			list_add_tail(&rq->queuelist, &rq->q->queue_head);
+			list_add_tail(&rq->queuelist, &q->queue_head);
 		return true;
 	}
 }
@@ -201,7 +203,7 @@ static bool blk_flush_complete_seq(struct request *rq,
 		BUG_ON(!list_empty(&rq->queuelist));
 		list_del_init(&rq->flush.list);
 		blk_flush_restore_request(rq);
-		if (q->mq_ops)
+		if (blk_use_mq_path(q))
 			blk_mq_end_request(rq, error);
 		else
 			__blk_end_request_all(rq, error);
@@ -224,7 +226,7 @@ static void flush_end_io(struct request *flush_rq, int error)
 	unsigned long flags = 0;
 	struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
 
-	if (q->mq_ops) {
+	if (blk_use_mq_path(q)) {
 		struct blk_mq_hw_ctx *hctx;
 
 		/* release the tag's ownership to the req cloned from */
@@ -240,7 +242,7 @@ static void flush_end_io(struct request *flush_rq, int error)
 	/* account completion of the flush request */
 	fq->flush_running_idx ^= 1;
 
-	if (!q->mq_ops)
+	if (!blk_use_mq_path(q))
 		elv_completed_request(q, flush_rq);
 
 	/* and push the waiting requests to the next stage */
@@ -267,7 +269,7 @@ static void flush_end_io(struct request *flush_rq, int error)
 			blk_run_queue_async(q);
 	}
 	fq->flush_queue_delayed = 0;
-	if (q->mq_ops)
+	if (blk_use_mq_path(q))
 		spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 }
 
@@ -315,7 +317,7 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
 	 * be in flight at the same time. And acquire the tag's
 	 * ownership for flush req.
 	 */
-	if (q->mq_ops) {
+	if (blk_use_mq_path(q)) {
 		struct blk_mq_hw_ctx *hctx;
 
 		flush_rq->mq_ctx = first_rq->mq_ctx;
@@ -409,7 +411,7 @@ void blk_insert_flush(struct request *rq)
 	 * complete the request.
 	 */
 	if (!policy) {
-		if (q->mq_ops)
+		if (blk_use_mq_path(q))
 			blk_mq_end_request(rq, 0);
 		else
 			__blk_end_bidi_request(rq, 0, 0, 0);
@@ -425,9 +427,9 @@ void blk_insert_flush(struct request *rq)
 	 */
 	if ((policy & REQ_FSEQ_DATA) &&
 	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
-		if (q->mq_ops) {
+		if (blk_use_mq_path(q))
 			blk_mq_insert_request(rq, false, false, true);
-		} else
+		else
 			list_add_tail(&rq->queuelist, &q->queue_head);
 		return;
 	}
@@ -440,7 +442,7 @@ void blk_insert_flush(struct request *rq)
 	INIT_LIST_HEAD(&rq->flush.list);
 	rq->rq_flags |= RQF_FLUSH_SEQ;
 	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
-	if (q->mq_ops) {
+	if (blk_use_mq_path(q)) {
 		rq->end_io = mq_flush_data_end_io;
 
 		spin_lock_irq(&fq->mq_flush_lock);
diff --git a/block/blk.h b/block/blk.h
index 041185e5f129..094fc10429c3 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -36,10 +36,20 @@ extern struct kmem_cache *request_cachep;
 extern struct kobj_type blk_queue_ktype;
 extern struct ida blk_queue_ida;
 
+/*
+ * Use the MQ path if we have mq_ops, but not if we are using an IO
+ * scheduler. For the scheduler, we should use the legacy path. Only
+ * for internal use in the block layer.
+ */
+static inline bool blk_use_mq_path(struct request_queue *q)
+{
+	return q->mq_ops && !q->elevator;
+}
+
 static inline struct blk_flush_queue *blk_get_flush_queue(
 		struct request_queue *q, struct blk_mq_ctx *ctx)
 {
-	if (q->mq_ops)
+	if (blk_use_mq_path(q))
 		return blk_mq_map_queue(q, ctx->cpu)->fq;
 	return q->fq;
 }