[RFC,for-next,v2,0/4] enable pcpu bio caching for IRQ I/O

Message ID: cover.1666122465.git.asml.silence@gmail.com

Message

Pavel Begunkov Oct. 18, 2022, 7:50 p.m. UTC
This series implements per-CPU (pcpu) bio caching for normal, IRQ-driven
I/O, extending REQ_ALLOC_CACHE, which is currently limited to iopoll. The
allocation side still only works from non-IRQ context, which is why it's
not enabled by default, but turning it on for other users (e.g.
filesystems) is a matter of passing a flag.
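
To make the split between the two sides concrete, here is a toy userspace
model of a cache that is filled from the completion (IRQ) side and drained
only from task context. It is a sketch under assumptions, not the code from
this series: every name (toy_cache, toy_put_irq, toy_splice, toy_get), the
two-list layout, and the SPLICE_THRESHOLD value are hypothetical and chosen
only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SPLICE_THRESHOLD 4	/* hypothetical value, picked for the demo */

struct item {
	struct item *next;
};

/* Imagine one of these per CPU. All names here are illustrative. */
struct toy_cache {
	struct item *free_list;		/* used only from task context */
	struct item *free_list_irq;	/* filled by the completion side */
	unsigned int nr;
	unsigned int nr_irq;
};

/* Completion side: park a freed item on the IRQ list. */
static void toy_put_irq(struct toy_cache *c, struct item *it)
{
	it->next = c->free_list_irq;
	c->free_list_irq = it;
	c->nr_irq++;
}

/*
 * Move everything from the IRQ list to the task-context list.
 * A real kernel implementation would need interrupts disabled here;
 * this single-threaded model has nothing to protect against.
 */
static void toy_splice(struct toy_cache *c)
{
	struct item *it;

	while ((it = c->free_list_irq) != NULL) {
		c->free_list_irq = it->next;
		it->next = c->free_list;
		c->free_list = it;
	}
	c->nr += c->nr_irq;
	c->nr_irq = 0;
}

/*
 * Allocation side (task context only). Returns NULL on a cache miss,
 * in which case the caller would fall back to the regular allocator.
 */
static struct item *toy_get(struct toy_cache *c)
{
	struct item *it;

	if (!c->free_list) {
		if (c->nr_irq < SPLICE_THRESHOLD)
			return NULL;
		toy_splice(c);
	}
	it = c->free_list;
	assert(it != NULL);
	c->free_list = it->next;
	c->nr--;
	return it;
}
```

In this model, toy_get() misses until at least SPLICE_THRESHOLD items have
been parked by toy_put_irq(), then moves them over in one batch, so the
task-context path touches the IRQ-side list only occasionally.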

t/io_uring with an Optane SSD setup showed +7% for batches of 32 requests
and +4.3% for batches of 8.

IRQ, 128/32/32, cache off
IOPS=59.08M, BW=28.84GiB/s, IOS/call=31/31
IOPS=59.30M, BW=28.96GiB/s, IOS/call=32/32
IOPS=59.97M, BW=29.28GiB/s, IOS/call=31/31
IOPS=59.92M, BW=29.26GiB/s, IOS/call=32/32
IOPS=59.81M, BW=29.20GiB/s, IOS/call=32/31

IRQ, 128/32/32, cache on
IOPS=64.05M, BW=31.27GiB/s, IOS/call=32/31
IOPS=64.22M, BW=31.36GiB/s, IOS/call=32/32
IOPS=64.04M, BW=31.27GiB/s, IOS/call=31/31
IOPS=63.16M, BW=30.84GiB/s, IOS/call=32/32

IRQ, 32/8/8, cache off
IOPS=50.60M, BW=24.71GiB/s, IOS/call=7/8
IOPS=50.22M, BW=24.52GiB/s, IOS/call=8/7
IOPS=49.54M, BW=24.19GiB/s, IOS/call=8/8
IOPS=50.07M, BW=24.45GiB/s, IOS/call=7/7
IOPS=50.46M, BW=24.64GiB/s, IOS/call=8/8

IRQ, 32/8/8, cache on
IOPS=51.39M, BW=25.09GiB/s, IOS/call=8/7
IOPS=52.52M, BW=25.64GiB/s, IOS/call=7/8
IOPS=52.57M, BW=25.67GiB/s, IOS/call=8/8
IOPS=52.58M, BW=25.67GiB/s, IOS/call=8/7
IOPS=52.61M, BW=25.69GiB/s, IOS/call=8/8
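
The quoted percentages can be cross-checked against the IOPS samples above
by comparing per-configuration means. The helper below is only a sanity
check of that arithmetic; the function and array names are made up for this
sketch, and the samples are copied from the runs listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples; n must be non-zero. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative gain of the "on" mean over the "off" mean, in percent. */
static double gain_pct(const double *off, size_t n_off,
		       const double *on, size_t n_on)
{
	return (mean(on, n_on) / mean(off, n_off) - 1.0) * 100.0;
}

/* IOPS in millions, copied from the four result blocks above. */
static const double off_32[] = { 59.08, 59.30, 59.97, 59.92, 59.81 };
static const double on_32[]  = { 64.05, 64.22, 64.04, 63.16 };
static const double off_8[]  = { 50.60, 50.22, 49.54, 50.07, 50.46 };
static const double on_8[]   = { 51.39, 52.52, 52.57, 52.58, 52.61 };
```

Calling gain_pct(off_32, 5, on_32, 4) gives roughly 7.1 and
gain_pct(off_8, 5, on_8, 5) gives roughly 4.3, which matches the +7% and
+4.3% figures.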

The main part is in patch 3. It would be great to take patch 1 separately
for 6.1 for extra safety.

v2: fix botched splicing threshold checks

Pavel Begunkov (4):
  bio: safeguard REQ_ALLOC_CACHE bio put
  bio: split pcpu cache part of bio_put into a helper
  block/bio: add pcpu caching for non-polling bio_put
  io_uring/rw: enable bio caches for IRQ rw

 block/bio.c   | 94 ++++++++++++++++++++++++++++++++++++++++-----------
 io_uring/rw.c |  3 +-
 2 files changed, 76 insertions(+), 21 deletions(-)

Comments

Christoph Hellwig Oct. 20, 2022, 8:32 a.m. UTC | #1
On Tue, Oct 18, 2022 at 08:50:54PM +0100, Pavel Begunkov wrote:
> This series implements bio pcpu caching for normal / IRQ-driven I/O
> extending REQ_ALLOC_CACHE currently limited to iopoll. The allocation side
> still only works from non-irq context, which is the reason it's not enabled
> by default, but turning it on for other users (e.g. filesystems) is
> as a matter of passing a flag.
> 
> t/io_uring with an Optane SSD setup showed +7% for batches of 32 requests
> and +4.3% for batches of 8.

This looks much nicer to me than the previous attempt exposing the bio
internals to io_uring, thanks.

Pavel Begunkov Oct. 20, 2022, 12:40 p.m. UTC | #2
On 10/20/22 09:32, Christoph Hellwig wrote:
> On Tue, Oct 18, 2022 at 08:50:54PM +0100, Pavel Begunkov wrote:
>> This series implements bio pcpu caching for normal / IRQ-driven I/O
>> extending REQ_ALLOC_CACHE currently limited to iopoll. The allocation side
>> still only works from non-irq context, which is the reason it's not enabled
>> by default, but turning it on for other users (e.g. filesystems) is
>> as a matter of passing a flag.
>>
>> t/io_uring with an Optane SSD setup showed +7% for batches of 32 requests
>> and +4.3% for batches of 8.
> 
> This looks much nicer to me than the previous attempt exposing the bio
> internals to io_uring, thanks.

Yeah, I saw the one Jens posted before but I wanted this one to be more
generic, i.e. applicable not only to io_uring. Thanks for taking a look.

Jens Axboe Oct. 20, 2022, 12:50 p.m. UTC | #3
On Tue, 18 Oct 2022 20:50:54 +0100, Pavel Begunkov wrote:
> This series implements bio pcpu caching for normal / IRQ-driven I/O
> extending REQ_ALLOC_CACHE currently limited to iopoll. The allocation side
> still only works from non-irq context, which is the reason it's not enabled
> by default, but turning it on for other users (e.g. filesystems) is
> as a matter of passing a flag.
> 
> t/io_uring with an Optane SSD setup showed +7% for batches of 32 requests
> and +4.3% for batches of 8.
> 
> [...]

Applied, thanks!

[1/4] bio: safeguard REQ_ALLOC_CACHE bio put
      commit: d4347d50407daea6237872281ece64c4bdf1ec99

Best regards,

Jens Axboe Oct. 20, 2022, 12:53 p.m. UTC | #4
On 10/20/22 5:40 AM, Pavel Begunkov wrote:
> On 10/20/22 09:32, Christoph Hellwig wrote:
>> On Tue, Oct 18, 2022 at 08:50:54PM +0100, Pavel Begunkov wrote:
>>> This series implements bio pcpu caching for normal / IRQ-driven I/O
>>> extending REQ_ALLOC_CACHE currently limited to iopoll. The allocation side
>>> still only works from non-irq context, which is the reason it's not enabled
>>> by default, but turning it on for other users (e.g. filesystems) is
>>> as a matter of passing a flag.
>>>
>>> t/io_uring with an Optane SSD setup showed +7% for batches of 32 requests
>>> and +4.3% for batches of 8.
>>
>> This looks much nicer to me than the previous attempt exposing the bio
>> internals to io_uring, thanks.
> 
> Yeah, I saw the one Jens posted before but I wanted this one to be more
> generic, i.e. applicable not only to io_uring. Thanks for taking a look.

It is indeed better like that, also because we can get rid of the alloc
cache flag long term and just have it be the way that bio allocations
work.