Message ID | 20110811004216.GA24810@july (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
2011/8/11 Kyungmin Park <kmpark@infradead.org>: > Hi Jens > > Now eMMC device requires the upper layer information to improve the data > performance and reliability. > > . Context ID > Using the context information, it can sort out the data internally and improve the performance. > The main problem is that it's needed to define "What's the context". > Actually I expect cfq queue has own unique ID but it doesn't so decide to use the pid instead > > . Data Tag > Using the Data Tag (1-bit information), It writes the data at SLC area when it's hot data. So it can make the chip more reliable. > First I expect the REQ_META but current ext4 doesn't pass the WRITE_META. only use the READ_META. so it needs to investigate it. > > With these characteristics, it's helpful to teach the device. After some consideration. it's needed to pass out these information at request data structure. > > Sample usage is following in drivers/mmc/card/block.c > > struct elevator_queue *e = md->queue.queue->elevator; > struct request_hint hint; > int ret; > > if (e->ops->elevator_get_req_hint_fn && req) > ret = e->ops->elevator_get_req_hint_fn(req, &hint); please put this to blkdev.h or similar. directly using it here is abnormal. > Thank you, > Kyungmin Park > --- > Changelog v2 > - Don't add the request member. instead add new elevator ops > > --- > diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c > index 1f96ad6..5089f67 100644 > --- a/block/cfq-iosched.c > +++ b/block/cfq-iosched.c > @@ -3800,6 +3800,18 @@ queue_fail: > return 1; > } > > +static int cfq_get_request_hint(struct request *rq, struct request_hint *hint) > +{ > + struct cfq_queue *cfqq = RQ_CFQQ(rq); > + > + if (cfqq) { > + hint->context = cfqq->pid; > + hint->hot = !!(rq->cmd_flags & REQ_META); > + } if the rq is in cfq, the cfqq is always valid, so don't need the check. Thanks, Shaohua -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2011/8/11 Shaohua Li <shli@kernel.org>: > 2011/8/11 Kyungmin Park <kmpark@infradead.org>: >> Hi Jens >> >> Now eMMC device requires the upper layer information to improve the data >> performance and reliability. >> >> . Context ID >> Using the context information, it can sort out the data internally and improve the performance. >> The main problem is that it's needed to define "What's the context". >> Actually I expect cfq queue has own unique ID but it doesn't so decide to use the pid instead >> >> . Data Tag >> Using the Data Tag (1-bit information), It writes the data at SLC area when it's hot data. So it can make the chip more reliable. >> First I expect the REQ_META but current ext4 doesn't pass the WRITE_META. only use the READ_META. so it needs to investigate it. >> >> With these characteristics, it's helpful to teach the device. After some consideration. it's needed to pass out these information at request data structure. >> >> Sample usage is following in drivers/mmc/card/block.c >> >> struct elevator_queue *e = md->queue.queue->elevator; >> struct request_hint hint; >> int ret; >> >> if (e->ops->elevator_get_req_hint_fn && req) >> ret = e->ops->elevator_get_req_hint_fn(req, &hint); > please put this to blkdev.h or similar. directly using it here > is abnormal. BTW, we can add a (rq->cmd_flags & REQ_ELVPRIV) check here to make sure the request is at io scheduler. -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2011-08-11 03:14, Shaohua Li wrote: > 2011/8/11 Shaohua Li <shli@kernel.org>: >> 2011/8/11 Kyungmin Park <kmpark@infradead.org>: >>> Hi Jens >>> >>> Now eMMC device requires the upper layer information to improve the data >>> performance and reliability. >>> >>> . Context ID >>> Using the context information, it can sort out the data internally and improve the performance. >>> The main problem is that it's needed to define "What's the context". >>> Actually I expect cfq queue has own unique ID but it doesn't so decide to use the pid instead >>> >>> . Data Tag >>> Using the Data Tag (1-bit information), It writes the data at SLC area when it's hot data. So it can make the chip more reliable. >>> First I expect the REQ_META but current ext4 doesn't pass the WRITE_META. only use the READ_META. so it needs to investigate it. >>> >>> With these characteristics, it's helpful to teach the device. After some consideration. it's needed to pass out these information at request data structure. >>> >>> Sample usage is following in drivers/mmc/card/block.c >>> >>> struct elevator_queue *e = md->queue.queue->elevator; >>> struct request_hint hint; >>> int ret; >>> >>> if (e->ops->elevator_get_req_hint_fn && req) >>> ret = e->ops->elevator_get_req_hint_fn(req, &hint); >> please put this to blkdev.h or similar. directly using it here >> is abnormal. > BTW, we can add a (rq->cmd_flags & REQ_ELVPRIV) check here to make > sure the request is at io scheduler. Yep, that should all go inside elv_get_request_context() or whatever is a good name. I don't want the hint structure, the caller can just check the request flags himself. So something like: int elv_get_request_context(struct request_queue *q, struct request *rq) { struct elevator_queue *e = q->elevator; if (!(rq->cmd_flags & REQ_ELVPRIV)) return -1; if (e->ops->elevator_get_req_context_fn) return e->ops->elevator_get_req_context_fn(q, rq); return -1; } and then cfq/others adding that helper to provide the mapping. Context is a bad name, but so is hint. Perhaps app_key would be better, as it more direcly infers what is being returned.
On Thu, Aug 11, 2011 at 5:23 PM, Jens Axboe <jaxboe@fusionio.com> wrote: > On 2011-08-11 03:14, Shaohua Li wrote: >> 2011/8/11 Shaohua Li <shli@kernel.org>: >>> 2011/8/11 Kyungmin Park <kmpark@infradead.org>: >>>> Hi Jens >>>> >>>> Now eMMC device requires the upper layer information to improve the data >>>> performance and reliability. >>>> >>>> . Context ID >>>> Using the context information, it can sort out the data internally and improve the performance. >>>> The main problem is that it's needed to define "What's the context". >>>> Actually I expect cfq queue has own unique ID but it doesn't so decide to use the pid instead >>>> >>>> . Data Tag >>>> Using the Data Tag (1-bit information), It writes the data at SLC area when it's hot data. So it can make the chip more reliable. >>>> First I expect the REQ_META but current ext4 doesn't pass the WRITE_META. only use the READ_META. so it needs to investigate it. >>>> >>>> With these characteristics, it's helpful to teach the device. After some consideration. it's needed to pass out these information at request data structure. >>>> >>>> Sample usage is following in drivers/mmc/card/block.c >>>> >>>> struct elevator_queue *e = md->queue.queue->elevator; >>>> struct request_hint hint; >>>> int ret; >>>> >>>> if (e->ops->elevator_get_req_hint_fn && req) >>>> ret = e->ops->elevator_get_req_hint_fn(req, &hint); >>> please put this to blkdev.h or similar. directly using it here >>> is abnormal. >> BTW, we can add a (rq->cmd_flags & REQ_ELVPRIV) check here to make >> sure the request is at io scheduler. > > Yep, that should all go inside elv_get_request_context() or whatever is > a good name. I don't want the hint structure, the caller can just check > the request flags himself. Okay I see, it has to modify the filesystem to send the meta request, REQ_META flags to know it from device drivers. > > So something like: > > int elv_get_request_context(struct request_queue *q, struct request *rq) > { > struct elevator_queue *e = q->elevator; > > if (!(rq->cmd_flags & REQ_ELVPRIV)) > return -1; Also need to check the !rq case, mmc send the NULL request to wait the previous request wait. > > if (e->ops->elevator_get_req_context_fn) > return e->ops->elevator_get_req_context_fn(q, rq); > > return -1; > } > > and then cfq/others adding that helper to provide the mapping. > > Context is a bad name, but so is hint. Perhaps app_key would be better, > as it more direcly infers what is being returned. I'll send the new version as you suggested. Thank you, Kyungmin Park > > -- > Jens Axboe > > -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2011-08-11 10:56, Kyungmin Park wrote: > On Thu, Aug 11, 2011 at 5:23 PM, Jens Axboe <jaxboe@fusionio.com> wrote: >> On 2011-08-11 03:14, Shaohua Li wrote: >>> 2011/8/11 Shaohua Li <shli@kernel.org>: >>>> 2011/8/11 Kyungmin Park <kmpark@infradead.org>: >>>>> Hi Jens >>>>> >>>>> Now eMMC device requires the upper layer information to improve the data >>>>> performance and reliability. >>>>> >>>>> . Context ID >>>>> Using the context information, it can sort out the data internally and improve the performance. >>>>> The main problem is that it's needed to define "What's the context". >>>>> Actually I expect cfq queue has own unique ID but it doesn't so decide to use the pid instead >>>>> >>>>> . Data Tag >>>>> Using the Data Tag (1-bit information), It writes the data at SLC area when it's hot data. So it can make the chip more reliable. >>>>> First I expect the REQ_META but current ext4 doesn't pass the WRITE_META. only use the READ_META. so it needs to investigate it. >>>>> >>>>> With these characteristics, it's helpful to teach the device. After some consideration. it's needed to pass out these information at request data structure. >>>>> >>>>> Sample usage is following in drivers/mmc/card/block.c >>>>> >>>>> struct elevator_queue *e = md->queue.queue->elevator; >>>>> struct request_hint hint; >>>>> int ret; >>>>> >>>>> if (e->ops->elevator_get_req_hint_fn && req) >>>>> ret = e->ops->elevator_get_req_hint_fn(req, &hint); >>>> please put this to blkdev.h or similar. directly using it here >>>> is abnormal. >>> BTW, we can add a (rq->cmd_flags & REQ_ELVPRIV) check here to make >>> sure the request is at io scheduler. >> >> Yep, that should all go inside elv_get_request_context() or whatever is >> a good name. I don't want the hint structure, the caller can just check >> the request flags himself. > Okay I see, it has to modify the filesystem to send the meta request, > REQ_META flags to know it from device drivers. Right, the submitter if the IO is the one that has to decide what is hot or not. >> So something like: >> >> int elv_get_request_context(struct request_queue *q, struct request *rq) >> { >> struct elevator_queue *e = q->elevator; >> >> if (!(rq->cmd_flags & REQ_ELVPRIV)) >> return -1; > Also need to check the !rq case, mmc send the NULL request to wait the > previous request wait. No, that would be a usage bug.
On Thu, Aug 11, 2011 at 09:42:16AM +0900, Kyungmin Park wrote: > Hi Jens > > Now eMMC device requires the upper layer information to improve the data > performance and reliability. > > . Context ID > Using the context information, it can sort out the data internally and improve the performance. > The main problem is that it's needed to define "What's the context". > Actually I expect cfq queue has own unique ID but it doesn't so decide to use the pid instead > Hi, Can you please give little more details about the optimization you will do with this pid information? Also what happens in the case of noop and deadline which don't maintain per process queues and can't provide this information. > . Data Tag > Using the Data Tag (1-bit information), It writes the data at SLC area when it's hot data. So it can make the chip more reliable. Sorry, I am not very familiar with flash technology. Can you please give some more details that what do you do with this information about "hot" data and how does that make chip more reliable. > First I expect the REQ_META but current ext4 doesn't pass the WRITE_META. only use the READ_META. so it needs to investigate it. So are you planning to later fix file systems to appropriately mark meta data requests? Thanks Vivek -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2011-08-11 15:33, Vivek Goyal wrote: > On Thu, Aug 11, 2011 at 09:42:16AM +0900, Kyungmin Park wrote: >> Hi Jens >> >> Now eMMC device requires the upper layer information to improve the data >> performance and reliability. >> >> . Context ID >> Using the context information, it can sort out the data internally and improve the performance. >> The main problem is that it's needed to define "What's the context". >> Actually I expect cfq queue has own unique ID but it doesn't so decide to use the pid instead >> > > Hi, > > Can you please give little more details about the optimization you will > do with this pid information? It is provided in one of the other email threads for this patch. > Also what happens in the case of noop and deadline which don't maintain > per process queues and can't provide this information. It'll still work, it isn't really tied to the CFQ way of diviying things up. >> First I expect the REQ_META but current ext4 doesn't pass the WRITE_META. only use the READ_META. so it needs to investigate it. > > So are you planning to later fix file systems to appropriately mark meta > data requests? One thing that occured to me is that equating META to HOT is not necessarily a good idea. Meta data isn't necessarily more "hot" than regular data, it all depends on how it's being used. So I think it would be a lot more appropriate to pass down this information specifically, instead of overloading REQ_META.
On Thu, Aug 11, 2011 at 03:41:15PM +0200, Jens Axboe wrote: > On 2011-08-11 15:33, Vivek Goyal wrote: > > On Thu, Aug 11, 2011 at 09:42:16AM +0900, Kyungmin Park wrote: > >> Hi Jens > >> > >> Now eMMC device requires the upper layer information to improve the data > >> performance and reliability. > >> > >> . Context ID > >> Using the context information, it can sort out the data internally and improve the performance. > >> The main problem is that it's needed to define "What's the context". > >> Actually I expect cfq queue has own unique ID but it doesn't so decide to use the pid instead > >> > > > > Hi, > > > > Can you please give little more details about the optimization you will > > do with this pid information? > > It is provided in one of the other email threads for this patch. Ok, thanks. I just read the other mail thread. So the idea is that when multiple requests from multiple processes are in flight in driver, then using context information (pid in this case), driver can potentially do more efficient scheduling of these requests. CFQ kind of already does that atleast for sync-idle queues as driver will see requests only from one context for extended period of time. So this optimization is primarily useful for random reads and writer queues where we do not idle. (Well if rotational=0, then we don't idle on even on sync-idle so not sure if these mmc chips set rotational=0 or not). > > > Also what happens in the case of noop and deadline which don't maintain > > per process queues and can't provide this information. > > It'll still work, it isn't really tied to the CFQ way of diviying things > up. IIUC, for noop and deadline no optimizatoin will take place as no context information is available and things will default back to status quo? Atleast this patch implements hook only for CFQ. > > >> First I expect the REQ_META but current ext4 doesn't pass the WRITE_META. only use the READ_META. so it needs to investigate it. > > > > So are you planning to later fix file systems to appropriately mark meta > > data requests? > > One thing that occured to me is that equating META to HOT is not > necessarily a good idea. Meta data isn't necessarily more "hot" than > regular data, it all depends on how it's being used. So I think it would > be a lot more appropriate to pass down this information specifically, > instead of overloading REQ_META. I think so. I guess it depends on what "HOT" means and filesystem should understand REQ_HOT and flag bio/req appropriately. But if this optimization is especially targeted at meta data, then using REQ_META will make sense too. Whatever flag it is (HOT, META), I am wondering why IO scheduler need to come into the picture for this information. This is kind of between filesystem and driver. As driver should see all the request flags, it should just be able to check for presence of flag and not call into elevator/IO scheduler. On a side note, if we are willing to keep pid/iocontext information in struct request, then this optimization should work with noop/deadline too. Thanks Vivek -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 11, 2011 at 11:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote: > On Thu, Aug 11, 2011 at 03:41:15PM +0200, Jens Axboe wrote: >> On 2011-08-11 15:33, Vivek Goyal wrote: >> > On Thu, Aug 11, 2011 at 09:42:16AM +0900, Kyungmin Park wrote: >> >> Hi Jens >> >> >> >> Now eMMC device requires the upper layer information to improve the data >> >> performance and reliability. >> >> >> >> . Context ID >> >> Using the context information, it can sort out the data internally and improve the performance. >> >> The main problem is that it's needed to define "What's the context". >> >> Actually I expect cfq queue has own unique ID but it doesn't so decide to use the pid instead >> >> >> > >> > Hi, >> > >> > Can you please give little more details about the optimization you will >> > do with this pid information? >> >> It is provided in one of the other email threads for this patch. > > Ok, thanks. I just read the other mail thread. > > So the idea is that when multiple requests from multiple processes are > in flight in driver, then using context information (pid in this case), > driver can potentially do more efficient scheduling of these requests. > > CFQ kind of already does that atleast for sync-idle queues as driver will > see requests only from one context for extended period of time. So this > optimization is primarily useful for random reads and writer queues where > we do not idle. (Well if rotational=0, then we don't idle on even on > sync-idle so not sure if these mmc chips set rotational=0 or not). Currently mmc set the non-rotational device. > >> >> > Also what happens in the case of noop and deadline which don't maintain >> > per process queues and can't provide this information. >> >> It'll still work, it isn't really tied to the CFQ way of diviying things >> up. > > IIUC, for noop and deadline no optimizatoin will take place as no context > information is available and things will default back to status quo? Atleast > this patch implements hook only for CFQ. now it focuss on the cfq and I don't see any benefit if it uses the noop and deadline. > >> >> >> First I expect the REQ_META but current ext4 doesn't pass the WRITE_META. only use the READ_META. so it needs to investigate it. >> > >> > So are you planning to later fix file systems to appropriately mark meta >> > data requests? >> >> One thing that occured to me is that equating META to HOT is not >> necessarily a good idea. Meta data isn't necessarily more "hot" than >> regular data, it all depends on how it's being used. So I think it would >> be a lot more appropriate to pass down this information specifically, >> instead of overloading REQ_META. > > I think so. I guess it depends on what "HOT" means and filesystem should > understand REQ_HOT and flag bio/req appropriately. > > But if this optimization is especially targeted at meta data, then using > REQ_META will make sense too. Right, that's the matter, what's the HOT data? if the filesystem provides the hot data exactly. it's best. the remaining is the heuristic. anyway, host give the hot data information chip, then chip handle it another place from cold data. e.g., SLC area or some place to improve the performance and reliability > > Whatever flag it is (HOT, META), I am wondering why IO scheduler need to > come into the picture for this information. This is kind of between > filesystem and driver. As driver should see all the request flags, it > should just be able to check for presence of flag and not call into > elevator/IO scheduler. > > On a side note, if we are willing to keep pid/iocontext information in > struct request, then this optimization should work with noop/deadline too. Okay, If the concept is accepted. Thank you, Kyungmin Park -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 1f96ad6..5089f67 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -3800,6 +3800,18 @@ queue_fail: return 1; } +static int cfq_get_request_hint(struct request *rq, struct request_hint *hint) +{ + struct cfq_queue *cfqq = RQ_CFQQ(rq); + + if (cfqq) { + hint->context = cfqq->pid; + hint->hot = !!(rq->cmd_flags & REQ_META); + } + + return 0; +} + static void cfq_kick_queue(struct work_struct *work) { struct cfq_data *cfqd = @@ -4211,6 +4223,7 @@ static struct elevator_type iosched_cfq = { .elevator_latter_req_fn = elv_rb_latter_request, .elevator_set_req_fn = cfq_set_request, .elevator_put_req_fn = cfq_put_request, + .elevator_get_req_hint_fn = cfq_get_request_hint, .elevator_may_queue_fn = cfq_may_queue, .elevator_init_fn = cfq_init_queue, .elevator_exit_fn = cfq_exit_queue, diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 0e67c45..8d9592b 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -71,6 +71,11 @@ enum rq_cmd_type_bits { #define BLK_MAX_CDB 16 +struct request_hint { + int context; /* Context ID */ + unsigned int hot:1; /* Hot/cold */ +}; + /* * try to put the fields that are referenced together in the same cacheline. * if you modify this structure, be sure to check block/blk-core.c:blk_rq_init() diff --git a/include/linux/elevator.h b/include/linux/elevator.h index d800d51..114d427 100644 --- a/include/linux/elevator.h +++ b/include/linux/elevator.h @@ -26,6 +26,7 @@ typedef int (elevator_may_queue_fn) (struct request_queue *, int); typedef int (elevator_set_req_fn) (struct request_queue *, struct request *, gfp_t); typedef void (elevator_put_req_fn) (struct request *); +typedef int (elevator_get_req_hint_fn) (struct request *, struct request_hint *); typedef void (elevator_activate_req_fn) (struct request_queue *, struct request *); typedef void (elevator_deactivate_req_fn) (struct request_queue *, struct request *); @@ -52,6 +53,7 @@ struct elevator_ops elevator_set_req_fn *elevator_set_req_fn; elevator_put_req_fn *elevator_put_req_fn; + elevator_get_req_hint_fn *elevator_get_req_hint_fn; elevator_may_queue_fn *elevator_may_queue_fn;