diff mbox series

blktrace: Provide event for request merging

Message ID 20200617135823.980-1-jack@suse.cz (mailing list archive)
State New, archived
Headers show
Series blktrace: Provide event for request merging | expand

Commit Message

Jan Kara June 17, 2020, 1:58 p.m. UTC
Currently blk-mq does not report any event when two requests get merged
in the elevator. This then results in difficult to understand sequence
of events like:

...
  8,0   34     1579     0.608765271  2718  I  WS 215023504 + 40 [dbench]
  8,0   34     1584     0.609184613  2719  A  WS 215023544 + 56 <- (8,4) 2160568
  8,0   34     1585     0.609184850  2719  Q  WS 215023544 + 56 [dbench]
  8,0   34     1586     0.609188524  2719  G  WS 215023544 + 56 [dbench]
  8,0    3      602     0.609684162   773  D  WS 215023504 + 96 [kworker/3:1H]
  8,0   34     1591     0.609843593     0  C  WS 215023504 + 96 [0]

and you can only guess (after quite some headscratching since the above
excerpt is intermixed with a lot of other IO) that request 215023544+56
got merged to request 215023504+40. Provide proper event for request
merging like we used to do in the legacy block layer.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-merge.c            |  2 ++
 include/trace/events/block.h | 15 +++++++++++++++
 kernel/trace/blktrace.c      | 10 ++++++++++
 3 files changed, 27 insertions(+)

Comments

Chaitanya Kulkarni June 17, 2020, 5:41 p.m. UTC | #1
Jan,

On 6/17/20 7:03 AM, Jan Kara wrote:
> Currently blk-mq does not report any event when two requests get merged
> in the elevator. This then results in difficult to understand sequence
> of events like:
> 
> ...
>    8,0   34     1579     0.608765271  2718  I  WS 215023504 + 40 [dbench]
>    8,0   34     1584     0.609184613  2719  A  WS 215023544 + 56 <- (8,4) 2160568
>    8,0   34     1585     0.609184850  2719  Q  WS 215023544 + 56 [dbench]
>    8,0   34     1586     0.609188524  2719  G  WS 215023544 + 56 [dbench]
>    8,0    3      602     0.609684162   773  D  WS 215023504 + 96 [kworker/3:1H]
>    8,0   34     1591     0.609843593     0  C  WS 215023504 + 96 [0]
> 
> and you can only guess (after quite some headscratching since the above
> excerpt is intermixed with a lot of other IO) that request 215023544+56
> got merged to request 215023504+40. Provide proper event for request
> merging like we used to do in the legacy block layer.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

The attempt_merge function is also responsible for the discard merging 
which doesn't have any direction specified in ELEVATOR_XXX identifiers.
In this patch we care unconditionally generating back merge event
irrespective of the req_op.

Do we need to either generate event iff ELEVATOR_BACK_MERGE true case or
add another trace point for discard ? given that it may lead to
confusion since elevator flags for the discard doesn't specify the 
direction.
Christoph Hellwig June 18, 2020, 6:53 a.m. UTC | #2
On Wed, Jun 17, 2020 at 03:58:23PM +0200, Jan Kara wrote:
>  	blk_account_io_merge_request(next);
>  
> +	trace_block_rq_merge(q, next);

q can be derived from next, no need to explicitly pass it.  And yes,
I know a lot of existing tracepoints do so, but I plan to fix that up
as well.
Jan Kara June 18, 2020, 12:04 p.m. UTC | #3
Hello,

On Wed 17-06-20 17:41:36, Chaitanya Kulkarni wrote:
> On 6/17/20 7:03 AM, Jan Kara wrote:
> > Currently blk-mq does not report any event when two requests get merged
> > in the elevator. This then results in difficult to understand sequence
> > of events like:
> > 
> > ...
> >    8,0   34     1579     0.608765271  2718  I  WS 215023504 + 40 [dbench]
> >    8,0   34     1584     0.609184613  2719  A  WS 215023544 + 56 <- (8,4) 2160568
> >    8,0   34     1585     0.609184850  2719  Q  WS 215023544 + 56 [dbench]
> >    8,0   34     1586     0.609188524  2719  G  WS 215023544 + 56 [dbench]
> >    8,0    3      602     0.609684162   773  D  WS 215023504 + 96 [kworker/3:1H]
> >    8,0   34     1591     0.609843593     0  C  WS 215023504 + 96 [0]
> > 
> > and you can only guess (after quite some headscratching since the above
> > excerpt is intermixed with a lot of other IO) that request 215023544+56
> > got merged to request 215023504+40. Provide proper event for request
> > merging like we used to do in the legacy block layer.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> The attempt_merge function is also responsible for the discard merging 
> which doesn't have any direction specified in ELEVATOR_XXX identifiers.
> In this patch we care unconditionally generating back merge event
> irrespective of the req_op.
> 
> Do we need to either generate event iff ELEVATOR_BACK_MERGE true case or
> add another trace point for discard ? given that it may lead to
> confusion since elevator flags for the discard doesn't specify the 
> direction.

attempt_merge() is always called so that 'req' is always the request with
the lower sector, 'next' is the request with a higher sector, and 'next' is
discarded and 'req' is updated. So attempt_merge() always performs only one
direction of a merge and I don't think it makes any sence to distinguish
back merges and forward merges in this case. So discards don't need any
special treatment either AFAICT.

								Honza
Jan Kara June 18, 2020, 12:05 p.m. UTC | #4
On Wed 17-06-20 23:53:59, Christoph Hellwig wrote:
> On Wed, Jun 17, 2020 at 03:58:23PM +0200, Jan Kara wrote:
> >  	blk_account_io_merge_request(next);
> >  
> > +	trace_block_rq_merge(q, next);
> 
> q can be derived from next, no need to explicitly pass it.  And yes,
> I know a lot of existing tracepoints do so, but I plan to fix that up
> as well.

OK, I'll update the patch.

								Honza
Chaitanya Kulkarni June 18, 2020, 3:36 p.m. UTC | #5
On 6/18/20 5:04 AM, Jan Kara wrote:
>> The attempt_merge function is also responsible for the discard merging
>> which doesn't have any direction specified in ELEVATOR_XXX identifiers.
>> In this patch we care unconditionally generating back merge event
>> irrespective of the req_op.
>>
>> Do we need to either generate event iff ELEVATOR_BACK_MERGE true case or
>> add another trace point for discard ? given that it may lead to
>> confusion since elevator flags for the discard doesn't specify the
>> direction.
> attempt_merge() is always called so that 'req' is always the request with
> the lower sector, 'next' is the request with a higher sector, and 'next' is
> discarded and 'req' is updated. So attempt_merge() always performs only one
> direction of a merge and I don't think it makes any sence to distinguish
> back merges and forward merges in this case. So discards don't need any
> special treatment either AFAICT.
> 
Please disregard my comment regarding adding a separate tracpoint for 
discard.

I understand the code and that is what I'm saying it will only back-merge.

The concern is just like we have ELEVATOR_BACK_MERGE used in 
attempt_merge, we should also have ELEVATOR_DISCARD_BACK_MERGE and use 
it the attempt_merge() then this tracepoint will map to the back-merge 
which your patch does with the use of BLK_TC_BACKMERGE nothing needed in 
your patch to change regarding tracepoint.

> 								Honza
> 
> -- Jan Kara <jack@suse.com> SUSE Labs, CR
Jan Kara June 18, 2020, 4:13 p.m. UTC | #6
On Wed 17-06-20 23:53:59, Christoph Hellwig wrote:
> On Wed, Jun 17, 2020 at 03:58:23PM +0200, Jan Kara wrote:
> >  	blk_account_io_merge_request(next);
> >  
> > +	trace_block_rq_merge(q, next);
> 
> q can be derived from next, no need to explicitly pass it.  And yes,
> I know a lot of existing tracepoints do so, but I plan to fix that up
> as well.

I had a look into it now and I could do this but that would mean that
block_rq_merge trace event would now differ from all other similar events
and we couldn't use block_rq event class and have to define our own and so
overall it would IMO make future conversion to get rid of 'q' argument more
difficult, not simpler. So I can do this but are you really sure?

								Honza
Chaitanya Kulkarni June 18, 2020, 5:31 p.m. UTC | #7
On 6/18/20 9:13 AM, Jan Kara wrote:
> On Wed 17-06-20 23:53:59, Christoph Hellwig wrote:
>> On Wed, Jun 17, 2020 at 03:58:23PM +0200, Jan Kara wrote:
>>>   	blk_account_io_merge_request(next);
>>>   
>>> +	trace_block_rq_merge(q, next);
>> q can be derived from next, no need to explicitly pass it.  And yes,
>> I know a lot of existing tracepoints do so, but I plan to fix that up
>> as well.
> I had a look into it now and I could do this but that would mean that
> block_rq_merge trace event would now differ from all other similar events
> and we couldn't use block_rq event class and have to define our own and so
> overall it would IMO make future conversion to get rid of 'q' argument more
> difficult, not simpler. So I can do this but are you really sure?
> 
> 								Honza
> -- Jan Kara <jack@suse.com> SUSE Labs, CR

If you guys are okay then I can look into this and send out a patch and
we can keep this patch separate.
Christoph Hellwig June 19, 2020, 6:52 a.m. UTC | #8
On Thu, Jun 18, 2020 at 06:13:31PM +0200, Jan Kara wrote:
> On Wed 17-06-20 23:53:59, Christoph Hellwig wrote:
> > On Wed, Jun 17, 2020 at 03:58:23PM +0200, Jan Kara wrote:
> > >  	blk_account_io_merge_request(next);
> > >  
> > > +	trace_block_rq_merge(q, next);
> > 
> > q can be derived from next, no need to explicitly pass it.  And yes,
> > I know a lot of existing tracepoints do so, but I plan to fix that up
> > as well.
> 
> I had a look into it now and I could do this but that would mean that
> block_rq_merge trace event would now differ from all other similar events
> and we couldn't use block_rq event class and have to define our own and so
> overall it would IMO make future conversion to get rid of 'q' argument more
> difficult, not simpler. So I can do this but are you really sure?

Ok, let's keep the original version for now then.
Jan Kara June 25, 2020, 7:51 p.m. UTC | #9
On Wed 17-06-20 15:58:23, Jan Kara wrote:
> Currently blk-mq does not report any event when two requests get merged
> in the elevator. This then results in difficult to understand sequence
> of events like:
> 
> ...
>   8,0   34     1579     0.608765271  2718  I  WS 215023504 + 40 [dbench]
>   8,0   34     1584     0.609184613  2719  A  WS 215023544 + 56 <- (8,4) 2160568
>   8,0   34     1585     0.609184850  2719  Q  WS 215023544 + 56 [dbench]
>   8,0   34     1586     0.609188524  2719  G  WS 215023544 + 56 [dbench]
>   8,0    3      602     0.609684162   773  D  WS 215023504 + 96 [kworker/3:1H]
>   8,0   34     1591     0.609843593     0  C  WS 215023504 + 96 [0]
> 
> and you can only guess (after quite some headscratching since the above
> excerpt is intermixed with a lot of other IO) that request 215023544+56
> got merged to request 215023504+40. Provide proper event for request
> merging like we used to do in the legacy block layer.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Jens, it seems people are fine with this patch in the end. Can you please
merge it? Thanks!

								Honza

> ---
>  block/blk-merge.c            |  2 ++
>  include/trace/events/block.h | 15 +++++++++++++++
>  kernel/trace/blktrace.c      | 10 ++++++++++
>  3 files changed, 27 insertions(+)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index f0b0bae075a0..9c9fb21584b6 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -793,6 +793,8 @@ static struct request *attempt_merge(struct request_queue *q,
>  	 */
>  	blk_account_io_merge_request(next);
>  
> +	trace_block_rq_merge(q, next);
> +
>  	/*
>  	 * ownership of bio passed from next to req, return 'next' for
>  	 * the caller to free
> diff --git a/include/trace/events/block.h b/include/trace/events/block.h
> index 81b43f5bdf23..b81205560782 100644
> --- a/include/trace/events/block.h
> +++ b/include/trace/events/block.h
> @@ -211,6 +211,21 @@ DEFINE_EVENT(block_rq, block_rq_issue,
>  	TP_ARGS(q, rq)
>  );
>  
> +/**
> + * block_rq_merge - merge request with another one in the elevator
> + * @q: queue holding operation
> + * @rq: block IO operation operation request
> + *
> + * Called when block operation request @rq from queue @q is merged to another
> + * request queued in the elevator.
> + */
> +DEFINE_EVENT(block_rq, block_rq_merge,
> +
> +	TP_PROTO(struct request_queue *q, struct request *rq),
> +
> +	TP_ARGS(q, rq)
> +);
> +
>  /**
>   * block_bio_bounce - used bounce buffer when processing block operation
>   * @q: queue holding the block operation
> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> index ea47f2084087..41521216d3eb 100644
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -834,6 +834,13 @@ static void blk_add_trace_rq_issue(void *ignore,
>  			 blk_trace_request_get_cgid(q, rq));
>  }
>  
> +static void blk_add_trace_rq_merge(void *ignore,
> +				   struct request_queue *q, struct request *rq)
> +{
> +	blk_add_trace_rq(rq, 0, blk_rq_bytes(rq), BLK_TA_BACKMERGE,
> +			 blk_trace_request_get_cgid(q, rq));
> +}
> +
>  static void blk_add_trace_rq_requeue(void *ignore,
>  				     struct request_queue *q,
>  				     struct request *rq)
> @@ -1115,6 +1122,8 @@ static void blk_register_tracepoints(void)
>  	WARN_ON(ret);
>  	ret = register_trace_block_rq_issue(blk_add_trace_rq_issue, NULL);
>  	WARN_ON(ret);
> +	ret = register_trace_block_rq_merge(blk_add_trace_rq_merge, NULL);
> +	WARN_ON(ret);
>  	ret = register_trace_block_rq_requeue(blk_add_trace_rq_requeue, NULL);
>  	WARN_ON(ret);
>  	ret = register_trace_block_rq_complete(blk_add_trace_rq_complete, NULL);
> @@ -1161,6 +1170,7 @@ static void blk_unregister_tracepoints(void)
>  	unregister_trace_block_bio_bounce(blk_add_trace_bio_bounce, NULL);
>  	unregister_trace_block_rq_complete(blk_add_trace_rq_complete, NULL);
>  	unregister_trace_block_rq_requeue(blk_add_trace_rq_requeue, NULL);
> +	unregister_trace_block_rq_merge(blk_add_trace_rq_merge, NULL);
>  	unregister_trace_block_rq_issue(blk_add_trace_rq_issue, NULL);
>  	unregister_trace_block_rq_insert(blk_add_trace_rq_insert, NULL);
>  
> -- 
> 2.16.4
>
Chaitanya Kulkarni June 26, 2020, 12:03 a.m. UTC | #10
Hi Jan and Jens
On 6/25/20 12:51 PM, Jan Kara wrote:
> On Wed 17-06-20 15:58:23, Jan Kara wrote:
>> Currently blk-mq does not report any event when two requests get merged
>> in the elevator. This then results in difficult to understand sequence
>> of events like:
>>
>> ...
>>    8,0   34     1579     0.608765271  2718  I  WS 215023504 + 40 [dbench]
>>    8,0   34     1584     0.609184613  2719  A  WS 215023544 + 56 <- (8,4) 2160568
>>    8,0   34     1585     0.609184850  2719  Q  WS 215023544 + 56 [dbench]
>>    8,0   34     1586     0.609188524  2719  G  WS 215023544 + 56 [dbench]
>>    8,0    3      602     0.609684162   773  D  WS 215023504 + 96 [kworker/3:1H]
>>    8,0   34     1591     0.609843593     0  C  WS 215023504 + 96 [0]
>>
>> and you can only guess (after quite some headscratching since the above
>> excerpt is intermixed with a lot of other IO) that request 215023544+56
>> got merged to request 215023504+40. Provide proper event for request
>> merging like we used to do in the legacy block layer.
>>
>> Signed-off-by: Jan Kara<jack@suse.cz>
> Jens, it seems people are fine with this patch in the end. Can you please
> merge it? Thanks!
> 

I'm sending blktrace cleanup based on the discussion [1], if you guys 
want I can also add this patch and keep Jan Author and have cleanup done
with this patch included separtely.

Please let me know that is not accepted I'll send my series separately.

[1] https://marc.info/?l=linux-block&m=159296928424805&w=2
>
Jens Axboe June 26, 2020, 3:06 a.m. UTC | #11
On 6/25/20 1:51 PM, Jan Kara wrote:
> On Wed 17-06-20 15:58:23, Jan Kara wrote:
>> Currently blk-mq does not report any event when two requests get merged
>> in the elevator. This then results in difficult to understand sequence
>> of events like:
>>
>> ...
>>   8,0   34     1579     0.608765271  2718  I  WS 215023504 + 40 [dbench]
>>   8,0   34     1584     0.609184613  2719  A  WS 215023544 + 56 <- (8,4) 2160568
>>   8,0   34     1585     0.609184850  2719  Q  WS 215023544 + 56 [dbench]
>>   8,0   34     1586     0.609188524  2719  G  WS 215023544 + 56 [dbench]
>>   8,0    3      602     0.609684162   773  D  WS 215023504 + 96 [kworker/3:1H]
>>   8,0   34     1591     0.609843593     0  C  WS 215023504 + 96 [0]
>>
>> and you can only guess (after quite some headscratching since the above
>> excerpt is intermixed with a lot of other IO) that request 215023544+56
>> got merged to request 215023504+40. Provide proper event for request
>> merging like we used to do in the legacy block layer.
>>
>> Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Jens, it seems people are fine with this patch in the end. Can you please
> merge it? Thanks!

Applied for 5.9, thanks.
diff mbox series

Patch

diff --git a/block/blk-merge.c b/block/blk-merge.c
index f0b0bae075a0..9c9fb21584b6 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -793,6 +793,8 @@  static struct request *attempt_merge(struct request_queue *q,
 	 */
 	blk_account_io_merge_request(next);
 
+	trace_block_rq_merge(q, next);
+
 	/*
 	 * ownership of bio passed from next to req, return 'next' for
 	 * the caller to free
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 81b43f5bdf23..b81205560782 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -211,6 +211,21 @@  DEFINE_EVENT(block_rq, block_rq_issue,
 	TP_ARGS(q, rq)
 );
 
+/**
+ * block_rq_merge - merge request with another one in the elevator
+ * @q: queue holding operation
+ * @rq: block IO operation operation request
+ *
+ * Called when block operation request @rq from queue @q is merged to another
+ * request queued in the elevator.
+ */
+DEFINE_EVENT(block_rq, block_rq_merge,
+
+	TP_PROTO(struct request_queue *q, struct request *rq),
+
+	TP_ARGS(q, rq)
+);
+
 /**
  * block_bio_bounce - used bounce buffer when processing block operation
  * @q: queue holding the block operation
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index ea47f2084087..41521216d3eb 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -834,6 +834,13 @@  static void blk_add_trace_rq_issue(void *ignore,
 			 blk_trace_request_get_cgid(q, rq));
 }
 
+static void blk_add_trace_rq_merge(void *ignore,
+				   struct request_queue *q, struct request *rq)
+{
+	blk_add_trace_rq(rq, 0, blk_rq_bytes(rq), BLK_TA_BACKMERGE,
+			 blk_trace_request_get_cgid(q, rq));
+}
+
 static void blk_add_trace_rq_requeue(void *ignore,
 				     struct request_queue *q,
 				     struct request *rq)
@@ -1115,6 +1122,8 @@  static void blk_register_tracepoints(void)
 	WARN_ON(ret);
 	ret = register_trace_block_rq_issue(blk_add_trace_rq_issue, NULL);
 	WARN_ON(ret);
+	ret = register_trace_block_rq_merge(blk_add_trace_rq_merge, NULL);
+	WARN_ON(ret);
 	ret = register_trace_block_rq_requeue(blk_add_trace_rq_requeue, NULL);
 	WARN_ON(ret);
 	ret = register_trace_block_rq_complete(blk_add_trace_rq_complete, NULL);
@@ -1161,6 +1170,7 @@  static void blk_unregister_tracepoints(void)
 	unregister_trace_block_bio_bounce(blk_add_trace_bio_bounce, NULL);
 	unregister_trace_block_rq_complete(blk_add_trace_rq_complete, NULL);
 	unregister_trace_block_rq_requeue(blk_add_trace_rq_requeue, NULL);
+	unregister_trace_block_rq_merge(blk_add_trace_rq_merge, NULL);
 	unregister_trace_block_rq_issue(blk_add_trace_rq_issue, NULL);
 	unregister_trace_block_rq_insert(blk_add_trace_rq_insert, NULL);