
[1/2] block: Optimize blk_mq_submit_bio() for the cache hit scenario

Message ID: 20241216201901.2670237-2-bvanassche@acm.org
State: New
Series: Move blk_mq_submit_bio() error handling

Commit Message

Bart Van Assche Dec. 16, 2024, 8:19 p.m. UTC
Help the CPU branch predictor in case of a cache hit by handling the cache
hit scenario first.

Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig Dec. 17, 2024, 4:16 a.m. UTC | #1
On Mon, Dec 16, 2024 at 12:19:00PM -0800, Bart Van Assche wrote:
> Help the CPU branch predictor in case of a cache hit by handling the cache
> hit scenario first.

Numbers, please.
Bart Van Assche Dec. 17, 2024, 7:22 p.m. UTC | #2
On 12/16/24 8:16 PM, Christoph Hellwig wrote:
> On Mon, Dec 16, 2024 at 12:19:00PM -0800, Bart Van Assche wrote:
>> Help the CPU branch predictor in case of a cache hit by handling the cache
>> hit scenario first.
> 
> Numbers, please.

For a single CPU core and with the brd driver and fio and the io_uring
I/O engine, I see the following performance in a VM (three test runs for
each test case):

Without this patch:      1619K, 1641K, 1638K IOPS or 1633 K +/- 10 K.
With this patch applied: 1650K, 1633K, 1635K IOPS or 1639 K +/-  8 K.

So there is a small performance improvement but the improvement is
smaller than the measurement error. Is this sufficient data to proceed
with this patch?

Thanks,

Bart.
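
A minimal reproduction of the setup Bart describes (brd, fio with the
io_uring engine, a single CPU core) could look like the sketch below.
The exact fio job is not given in the thread, so the device size, block
size and queue depth are illustrative assumptions:

# ram-backed block device, 1 GiB (rd_size is in KiB)
modprobe brd rd_nr=1 rd_size=1048576

fio --name=cachehit --filename=/dev/ram0 --ioengine=io_uring \
    --rw=randread --bs=4k --iodepth=64 --direct=1 \
    --cpus_allowed=0 --numjobs=1 --time_based --runtime=30
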
Jens Axboe Dec. 17, 2024, 9:15 p.m. UTC | #3
On 12/17/24 12:22 PM, Bart Van Assche wrote:
> On 12/16/24 8:16 PM, Christoph Hellwig wrote:
>> On Mon, Dec 16, 2024 at 12:19:00PM -0800, Bart Van Assche wrote:
>>> Help the CPU branch predictor in case of a cache hit by handling the cache
>>> hit scenario first.
>>
>> Numbers, please.
> 
> For a single CPU core and with the brd driver and fio and the io_uring
> I/O engine, I see the following performance in a VM (three test runs for
> each test case):
> 
> Without this patch:      1619K, 1641K, 1638K IOPS or 1633 K +/- 10 K.
> With this patch applied: 1650K, 1633K, 1635K IOPS or 1639 K +/-  8 K.
> 
> So there is a small performance improvement but the improvement is
> smaller than the measurement error. Is this sufficient data to proceed
> with this patch?

I think it's fine; it's going to be very hard to reliably measure. I
tend to prefer the expected case to be the one checked for, e.g.:

rq = get_cached();
if (rq)
	...

as then the expected outcome is what you're reading next anyway.
Either that, or:

rq = get_cached();
if (unlikely(!rq))
	goto some_failure_path;

If you really wanted to measure it, you'd probably skip the IOPS
part and just look at branch misprediction in perf.
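
For the measurement Jens suggests, perf's branch-prediction counters
can be read directly while the workload runs; this is only a sketch,
assuming the fio job above (or any similar load) is active:

# overall branch/miss counters, all CPUs, 10 second sample
perf stat -e branches,branch-misses -a -- sleep 10

# or attribute mispredicts to the submission path itself
# (needs kernel symbols available to perf)
perf record -e branch-misses -a -- sleep 10
perf annotate blk_mq_submit_bio
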
Christoph Hellwig Dec. 18, 2024, 6:59 a.m. UTC | #4
On Tue, Dec 17, 2024 at 11:22:47AM -0800, Bart Van Assche wrote:
> For a single CPU core and with the brd driver and fio and the io_uring
> I/O engine, I see the following performance in a VM (three test runs for
> each test case):
>
> Without this patch:      1619K, 1641K, 1638K IOPS or 1633 K +/- 10 K.
> With this patch applied: 1650K, 1633K, 1635K IOPS or 1639 K +/-  8 K.
>
> So there is a small performance improvement but the improvement is
> smaller than the measurement error. Is this sufficient data to proceed
> with this patch?

I think it's sufficient data that we should not claim that it is an
optimization. The resulting code still looks nicer to me, but arguing
for it as an optimization feels like BS.

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7ee21346a41e..8d2aab4d9ba9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3102,12 +3102,12 @@  void blk_mq_submit_bio(struct bio *bio)
 		goto queue_exit;
 
 new_request:
-	if (!rq) {
+	if (rq) {
+		blk_mq_use_cached_rq(rq, plug, bio);
+	} else {
 		rq = blk_mq_get_new_requests(q, plug, bio, nr_segs);
 		if (unlikely(!rq))
 			goto queue_exit;
-	} else {
-		blk_mq_use_cached_rq(rq, plug, bio);
 	}
 
 	trace_block_getrq(bio);