diff mbox series

[2/5] block: add request->io_data_len

Message ID 20200408201450.3959560-3-tj@kernel.org (mailing list archive)
State New, archived
Headers show
Series [1/5] blk-iocost: switch to fixed non-auto-decaying use_delay | expand

Commit Message

Tejun Heo April 8, 2020, 8:14 p.m. UTC
Currently, at the time of completeion, there's no way of knowing how big a
request was. blk-iocost will need this information to account for IO size when
calculating expected latencies.

This patch adds rq->io_data_len which remembers blk_rq_bytes() at the time the
request gets issued. The field is enabled iff CONFIG_BLK_IO_DATA_LEN is set and
doesn't increase the size of the struct even when enabled.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/Kconfig          | 3 +++
 block/blk-mq.c         | 6 ++++++
 include/linux/blkdev.h | 8 ++++++++
 3 files changed, 17 insertions(+)

Comments

Ming Lei April 9, 2020, 1:44 a.m. UTC | #1
Hi Tejun,

On Wed, Apr 08, 2020 at 04:14:47PM -0400, Tejun Heo wrote:
> Currently, at the time of completeion, there's no way of knowing how big a
> request was. blk-iocost will need this information to account for IO size when
> calculating expected latencies.
> 
> This patch adds rq->io_data_len which remembers blk_rq_bytes() at the time the
> request gets issued. The field is enabled iff CONFIG_BLK_IO_DATA_LEN is set and
> doesn't increase the size of the struct even when enabled.

Almost all __blk_mq_end_request() follow blk_update_request(), so the
completed bytes can be passed to __blk_mq_end_request(), then we can
avoid to introduce this field.

Also there is just 20 callers of __blk_mq_end_request(), looks this kind
of change shouldn't be too big.


Thanks, 
Ming
Tejun Heo April 9, 2020, 2:11 a.m. UTC | #2
On Thu, Apr 09, 2020 at 09:44:06AM +0800, Ming Lei wrote:
> Almost all __blk_mq_end_request() follow blk_update_request(), so the
> completed bytes can be passed to __blk_mq_end_request(), then we can
> avoid to introduce this field.

But on some drivers blk_update_request() may be called multiple times before
__blk_mq_end_request() is called and what's needed here is the total number of
bytes in the whole request, not just in the final completion.

> Also there is just 20 callers of __blk_mq_end_request(), looks this kind
> of change shouldn't be too big.

This would work iff we get rid of partial completions and if we get rid of
partial completions, we might as well stop exposing blk_update_request() and
__blk_mq_end_request().

Thanks.
Ming Lei April 9, 2020, 2:38 a.m. UTC | #3
On Wed, Apr 08, 2020 at 10:11:19PM -0400, Tejun Heo wrote:
> On Thu, Apr 09, 2020 at 09:44:06AM +0800, Ming Lei wrote:
> > Almost all __blk_mq_end_request() follow blk_update_request(), so the
> > completed bytes can be passed to __blk_mq_end_request(), then we can
> > avoid to introduce this field.
> 
> But on some drivers blk_update_request() may be called multiple times before
> __blk_mq_end_request() is called and what's needed here is the total number of
> bytes in the whole request, not just in the final completion.

OK.

Another choice might be to record request bytes in rq's payload
when calling .queue_rq() only for these drivers.

> 
> > Also there is just 20 callers of __blk_mq_end_request(), looks this kind
> > of change shouldn't be too big.
> 
> This would work iff we get rid of partial completions and if we get rid of
> partial completions, we might as well stop exposing blk_update_request() and
> __blk_mq_end_request().

Indeed, we can store the completed bytes in request payload, so looks killing
partial completion shouldn't be too hard.

Thanks,
Ming
Bart Van Assche April 9, 2020, 3:44 a.m. UTC | #4
On 2020-04-08 13:14, Tejun Heo wrote:
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 32868fbedc9e..bfd34c6a27ef 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -142,6 +142,14 @@ struct request {
>  
>  	/* the following two fields are internal, NEVER access directly */
>  	unsigned int __data_len;	/* total data len */
> +#ifdef CONFIG_BLK_RQ_IO_DATA_LEN
> +	/*
> +	 * Total data len at the time of issue. This doesn't get deducted by
> +	 * blk_update_request() and can be used by completion path to determine
> +	 * the request size.
> +	 */
> +	unsigned int io_data_len;
> +#endif
>  	sector_t __sector;		/* sector cursor */
>  
>  	struct bio *bio;

So we have one struct member with the description "total data len" and
another struct member with the description "total data len at the time
of issue"? How could one not get confused by these descriptions?

This change makes the comment above __data_len incorrect. Please update
that comment or move io_data_len in front of that comment.

How does this change interact with the code in drivers/scsi/sd.c that
manipulates __data_len directly?

Thanks,

Bart.
Pavel Begunkov April 9, 2020, 5:08 a.m. UTC | #5
On 09/04/2020 05:38, Ming Lei wrote:
> On Wed, Apr 08, 2020 at 10:11:19PM -0400, Tejun Heo wrote:
>> On Thu, Apr 09, 2020 at 09:44:06AM +0800, Ming Lei wrote:
>>> Almost all __blk_mq_end_request() follow blk_update_request(), so the
>>> completed bytes can be passed to __blk_mq_end_request(), then we can
>>> avoid to introduce this field.
>>
>> But on some drivers blk_update_request() may be called multiple times before
>> __blk_mq_end_request() is called and what's needed here is the total number of
>> bytes in the whole request, not just in the final completion.
> 
> OK.
> 
> Another choice might be to record request bytes in rq's payload
> when calling .queue_rq() only for these drivers.
> 
>>
>>> Also there is just 20 callers of __blk_mq_end_request(), looks this kind
>>> of change shouldn't be too big.
>>
>> This would work iff we get rid of partial completions and if we get rid of
>> partial completions, we might as well stop exposing blk_update_request() and
>> __blk_mq_end_request().
> 
> Indeed, we can store the completed bytes in request payload, so looks killing
> partial completion shouldn't be too hard.

struct request already has such field (see @stats_sectors) because of the same
root-cause. I'd prefer killing it as well by following Ming's way, but otherwise
it could be easily adopted.
Tejun Heo April 13, 2020, 1:52 p.m. UTC | #6
Hello,

On Wed, Apr 08, 2020 at 08:44:17PM -0700, Bart Van Assche wrote:
> On 2020-04-08 13:14, Tejun Heo wrote:
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 32868fbedc9e..bfd34c6a27ef 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -142,6 +142,14 @@ struct request {
> >  
> >  	/* the following two fields are internal, NEVER access directly */
> >  	unsigned int __data_len;	/* total data len */
> > +#ifdef CONFIG_BLK_RQ_IO_DATA_LEN
> > +	/*
> > +	 * Total data len at the time of issue. This doesn't get deducted by
> > +	 * blk_update_request() and can be used by completion path to determine
> > +	 * the request size.
> > +	 */
> > +	unsigned int io_data_len;
> > +#endif
> >  	sector_t __sector;		/* sector cursor */
> >  
> >  	struct bio *bio;
> 
> So we have one struct member with the description "total data len" and
> another struct member with the description "total data len at the time
> of issue"? How could one not get confused by these descriptions?

The new one explicitly says it doesn't get deducted by update_request.

> This change makes the comment above __data_len incorrect. Please update
> that comment or move io_data_len in front of that comment.

Sure.

> How does this change interact with the code in drivers/scsi/sd.c that
> manipulates __data_len directly?

It doesn't.

Thanks.
Tejun Heo April 13, 2020, 1:56 p.m. UTC | #7
Hello,

On Thu, Apr 09, 2020 at 10:38:57AM +0800, Ming Lei wrote:
> On Wed, Apr 08, 2020 at 10:11:19PM -0400, Tejun Heo wrote:
> > On Thu, Apr 09, 2020 at 09:44:06AM +0800, Ming Lei wrote:
> > > Almost all __blk_mq_end_request() follow blk_update_request(), so the
> > > completed bytes can be passed to __blk_mq_end_request(), then we can
> > > avoid to introduce this field.
> > 
> > But on some drivers blk_update_request() may be called multiple times before
> > __blk_mq_end_request() is called and what's needed here is the total number of
> > bytes in the whole request, not just in the final completion.
> 
> OK.
> 
> Another choice might be to record request bytes in rq's payload
> when calling .queue_rq() only for these drivers.

There sure are multiple ways to skin a cat.

> > > Also there is just 20 callers of __blk_mq_end_request(), looks this kind
> > > of change shouldn't be too big.
> > 
> > This would work iff we get rid of partial completions and if we get rid of
> > partial completions, we might as well stop exposing blk_update_request() and
> > __blk_mq_end_request().
> 
> Indeed, we can store the completed bytes in request payload, so looks killing
> partial completion shouldn't be too hard.

There's a reason why we've had partial completions. On slower IO devices, like
floppy, partial completions actually are advantageous. I'm not arguing this
still holds up as a valid justification but getting rid of partial completions
isn't just a decision about plumbing details either.

Thanks.
Tejun Heo April 13, 2020, 2:02 p.m. UTC | #8
Hello,

On Thu, Apr 09, 2020 at 08:08:59AM +0300, Pavel Begunkov wrote:
> struct request already has such field (see @stats_sectors) because of the same
> root-cause. I'd prefer killing it as well by following Ming's way, but otherwise
> it could be easily adopted.

Oh, I completely missed that field. Lemme just use that one instead. Please
disregard this patch. As for killing partial completions, while I'm not
necessarily against it, it isn't a no brainer either.

Thanks.
diff mbox series

Patch

diff --git a/block/Kconfig b/block/Kconfig
index 3bc76bb113a0..48308e600dc8 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -26,6 +26,9 @@  menuconfig BLOCK
 
 if BLOCK
 
+config BLK_RQ_IO_DATA_LEN
+	bool
+
 config BLK_RQ_ALLOC_TIME
 	bool
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6291ceedee4..64ed22712fe4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -415,6 +415,9 @@  struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 		return ERR_PTR(-EWOULDBLOCK);
 
 	rq->__data_len = 0;
+#ifdef CONFIG_BLK_RQ_IO_DATA_LEN
+	rq->io_data_len = 0;
+#endif
 	rq->__sector = (sector_t) -1;
 	rq->bio = rq->biotail = NULL;
 	return rq;
@@ -655,6 +658,9 @@  void blk_mq_start_request(struct request *rq)
 
 	trace_block_rq_issue(q, rq);
 
+#ifdef CONFIG_BLK_RQ_IO_DATA_LEN
+	rq->io_data_len = blk_rq_bytes(rq);
+#endif
 	if (test_bit(QUEUE_FLAG_STATS, &q->queue_flags)) {
 		rq->io_start_time_ns = ktime_get_ns();
 		rq->stats_sectors = blk_rq_sectors(rq);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 32868fbedc9e..bfd34c6a27ef 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -142,6 +142,14 @@  struct request {
 
 	/* the following two fields are internal, NEVER access directly */
 	unsigned int __data_len;	/* total data len */
+#ifdef CONFIG_BLK_RQ_IO_DATA_LEN
+	/*
+	 * Total data len at the time of issue. This doesn't get deducted by
+	 * blk_update_request() and can be used by completion path to determine
+	 * the request size.
+	 */
+	unsigned int io_data_len;
+#endif
 	sector_t __sector;		/* sector cursor */
 
 	struct bio *bio;