[V3,0/5] loop: improve loop aio perf by IOCB_NOWAIT

Message ID 20250322012617.354222-1-ming.lei@redhat.com (mailing list archive)

Ming Lei March 22, 2025, 1:26 a.m. UTC
Hello Jens,

This patchset improves loop aio performance by using IOCB_NOWAIT to avoid
queueing aio commands to workqueue context, and refactors lo_rw_aio() a bit
along the way.
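
For readers unfamiliar with the pattern, here is a rough user-space analogue of
"try NOWAIT first, fall back to a blocking context". It is not the loop driver
code: it assumes Linux with a glibc that exposes preadv2() and RWF_NOWAIT, and
the names read_nowait_first(), demo_roundtrip() and enum read_path are
hypothetical, made up for this sketch. In the driver the fallback would be the
workqueue; here it is simply a synchronous pread().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Which path served the read (hypothetical names for this sketch). */
enum read_path { PATH_NOWAIT, PATH_BLOCKING };

/*
 * Try a non-blocking read first. If the kernel says it would block
 * (EAGAIN) or NOWAIT is not supported here (EOPNOTSUPP/ENOSYS), fall
 * back to a plain blocking pread(). Any other error is returned as -1
 * with errno set by preadv2().
 */
ssize_t read_nowait_first(int fd, void *buf, size_t len, off_t off,
                          enum read_path *path)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    ssize_t ret;

    assert(buf != NULL && path != NULL);

    ret = preadv2(fd, &iov, 1, off, RWF_NOWAIT);
    if (ret >= 0) {
        *path = PATH_NOWAIT;
        return ret;
    }
    if (errno != EAGAIN && errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;

    /* Stand-in for "queue the command to a worker that may sleep". */
    *path = PATH_BLOCKING;
    return pread(fd, buf, len, off);
}

/* Write a small temp file, read it back through the helper, compare. */
int demo_roundtrip(void)
{
    char tmpl[] = "/tmp/nowait-demo-XXXXXX";
    const char msg[] = "loop aio via NOWAIT";
    char buf[sizeof(msg)] = { 0 };
    enum read_path path;
    size_t done = 0;
    int fd = mkstemp(tmpl);

    if (fd < 0)
        return -1;
    unlink(tmpl);
    if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg)) {
        close(fd);
        return -1;
    }
    /* A NOWAIT read may be short, so loop until everything is read. */
    while (done < sizeof(msg)) {
        ssize_t n = read_nowait_first(fd, buf + done, sizeof(msg) - done,
                                      (off_t)done, &path);
        if (n <= 0)
            break;
        done += (size_t)n;
    }
    close(fd);
    return (done == sizeof(msg) && memcmp(buf, msg, sizeof(msg)) == 0) ? 0 : -1;
}
```

Calling demo_roundtrip() from a main() should return 0 when the data reads back
intact, whichever path ends up serving the read on the local filesystem.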

In my test VM, loop disk performance becomes very close to that of the backing
block device (nvme/mq virtio-scsi).

And Mikulas verified that this approach can improve 12-job sequential rw I/O by
~5X, and that it basically solves the reported problem together with the loop
MQ change.

https://lore.kernel.org/linux-block/a8e5c76a-231f-07d1-a394-847de930f638@redhat.com/

The loop MQ change will be posted as a standalone patch, because it needs
a losetup change.


Thanks,
Ming

V3:
	- add reviewed-by tag
	- rename variable & improve commit log & comment on 5/5 (Christoph)

V2:
	- patch style fix & cleanup (Christoph)
	- fix randwrite perf regression on sparse backing file
	- drop MQ change


Ming Lei (5):
  loop: simplify do_req_filebacked()
  loop: cleanup lo_rw_aio()
  loop: move command blkcg/memcg initialization into loop_queue_work
  loop: try to handle loop aio command via NOWAIT IO first
  loop: add hint for handling aio via IOCB_NOWAIT

 drivers/block/loop.c | 227 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 181 insertions(+), 46 deletions(-)

Comments

Jens Axboe March 22, 2025, 5:40 p.m. UTC | #1
On Sat, 22 Mar 2025 09:26:09 +0800, Ming Lei wrote:
> This patchset improves loop aio perf by using IOCB_NOWAIT for avoiding to queue aio
> command to workqueue context, meantime refactor lo_rw_aio() a bit.
> 
> In my test VM, loop disk perf becomes very close to perf of the backing block
> device(nvme/mq virtio-scsi).
> 
> And Mikulas verified that this way can improve 12jobs sequential rw io by
> ~5X, and basically solve the reported problem together with loop MQ change.
> 
> [...]

Applied, thanks!

[1/5] loop: simplify do_req_filebacked()
      commit: 04dcb8a909b5b68464ec5ccb123e9614f3ac333d
[2/5] loop: cleanup lo_rw_aio()
      commit: 832c9fec8e2314170c5451023565b94f05477aa7
[3/5] loop: move command blkcg/memcg initialization into loop_queue_work
      commit: a23d34a31758000b2b158288226bf24f96d8864d
[4/5] loop: try to handle loop aio command via NOWAIT IO first
      commit: dfc77a934a3acdb13dadf237b7417c6a31b19da8
[5/5] loop: add hint for handling aio via IOCB_NOWAIT
      commit: 4c3f4bad7a6e9022489a9f8392f7147ed3ce74b1

Best regards,

Jens Axboe March 24, 2025, 2:50 p.m. UTC | #2
On 3/22/25 11:40 AM, Jens Axboe wrote:
> 
> On Sat, 22 Mar 2025 09:26:09 +0800, Ming Lei wrote:
>> This patchset improves loop aio perf by using IOCB_NOWAIT for avoiding to queue aio
>> command to workqueue context, meantime refactor lo_rw_aio() a bit.
>>
>> In my test VM, loop disk perf becomes very close to perf of the backing block
>> device(nvme/mq virtio-scsi).
>>
>> And Mikulas verified that this way can improve 12jobs sequential rw io by
>> ~5X, and basically solve the reported problem together with loop MQ change.
>>
>> [...]
> 
> Applied, thanks!
> 
> [1/5] loop: simplify do_req_filebacked()
>       commit: 04dcb8a909b5b68464ec5ccb123e9614f3ac333d
> [2/5] loop: cleanup lo_rw_aio()
>       commit: 832c9fec8e2314170c5451023565b94f05477aa7
> [3/5] loop: move command blkcg/memcg initialization into loop_queue_work
>       commit: a23d34a31758000b2b158288226bf24f96d8864d
> [4/5] loop: try to handle loop aio command via NOWAIT IO first
>       commit: dfc77a934a3acdb13dadf237b7417c6a31b19da8
> [5/5] loop: add hint for handling aio via IOCB_NOWAIT
>       commit: 4c3f4bad7a6e9022489a9f8392f7147ed3ce74b1

Just a heads-up that I had applied this for testing, not necessarily to
get included. To clear up that confusion, I have retained patches 1-3
for now, and then we can queue up 4-5/5 later when everybody is happy
with them.

Ming Lei March 25, 2025, 1:59 a.m. UTC | #3
On Mon, Mar 24, 2025 at 08:50:14AM -0600, Jens Axboe wrote:
> On 3/22/25 11:40 AM, Jens Axboe wrote:
> > 
> > On Sat, 22 Mar 2025 09:26:09 +0800, Ming Lei wrote:
> >> This patchset improves loop aio perf by using IOCB_NOWAIT for avoiding to queue aio
> >> command to workqueue context, meantime refactor lo_rw_aio() a bit.
> >>
> >> In my test VM, loop disk perf becomes very close to perf of the backing block
> >> device(nvme/mq virtio-scsi).
> >>
> >> And Mikulas verified that this way can improve 12jobs sequential rw io by
> >> ~5X, and basically solve the reported problem together with loop MQ change.
> >>
> >> [...]
> > 
> > Applied, thanks!
> > 
> > [1/5] loop: simplify do_req_filebacked()
> >       commit: 04dcb8a909b5b68464ec5ccb123e9614f3ac333d
> > [2/5] loop: cleanup lo_rw_aio()
> >       commit: 832c9fec8e2314170c5451023565b94f05477aa7
> > [3/5] loop: move command blkcg/memcg initialization into loop_queue_work
> >       commit: a23d34a31758000b2b158288226bf24f96d8864d
> > [4/5] loop: try to handle loop aio command via NOWAIT IO first
> >       commit: dfc77a934a3acdb13dadf237b7417c6a31b19da8
> > [5/5] loop: add hint for handling aio via IOCB_NOWAIT
> >       commit: 4c3f4bad7a6e9022489a9f8392f7147ed3ce74b1
> 
> Just a heads-up that I had applied this for testing, not necessarily to
> get included. To clear up that confusion, I have retained patches 1-3
> for now, and then we can queue up 4-5/5 later when everybody is happy
> with them.

Fine.

I'd like to see the reason if there is one; I don't see it anywhere. :-)

And it should have been posted on the mailing list.

Christoph suggested a per-cmd struct, which does cause a regression for
the usual sequential I/O workload in both throughput and CPU utilization;
this was already observed 10 years ago when enabling loop dio/aio.

https://lore.kernel.org/lkml/1439778711-9621-4-git-send-email-ming.lei@canonical.com/

And my recent test shows the same result:

https://lore.kernel.org/linux-block/Z9I2lm31KOQ784nb@fedora/

Mikulas's test shows that the per-cmd struct works much worse than this patchset:

https://lore.kernel.org/linux-block/7b8b8a24-f36b-d213-cca1-d8857b6aca02@redhat.com/

Anything else?


Thanks,
Ming

Jens Axboe March 25, 2025, 12:07 p.m. UTC | #4
On 3/24/25 7:59 PM, Ming Lei wrote:
> On Mon, Mar 24, 2025 at 08:50:14AM -0600, Jens Axboe wrote:
>> On 3/22/25 11:40 AM, Jens Axboe wrote:
>>>
>>> On Sat, 22 Mar 2025 09:26:09 +0800, Ming Lei wrote:
>>>> This patchset improves loop aio perf by using IOCB_NOWAIT for avoiding to queue aio
>>>> command to workqueue context, meantime refactor lo_rw_aio() a bit.
>>>>
>>>> In my test VM, loop disk perf becomes very close to perf of the backing block
>>>> device(nvme/mq virtio-scsi).
>>>>
>>>> And Mikulas verified that this way can improve 12jobs sequential rw io by
>>>> ~5X, and basically solve the reported problem together with loop MQ change.
>>>>
>>>> [...]
>>>
>>> Applied, thanks!
>>>
>>> [1/5] loop: simplify do_req_filebacked()
>>>       commit: 04dcb8a909b5b68464ec5ccb123e9614f3ac333d
>>> [2/5] loop: cleanup lo_rw_aio()
>>>       commit: 832c9fec8e2314170c5451023565b94f05477aa7
>>> [3/5] loop: move command blkcg/memcg initialization into loop_queue_work
>>>       commit: a23d34a31758000b2b158288226bf24f96d8864d
>>> [4/5] loop: try to handle loop aio command via NOWAIT IO first
>>>       commit: dfc77a934a3acdb13dadf237b7417c6a31b19da8
>>> [5/5] loop: add hint for handling aio via IOCB_NOWAIT
>>>       commit: 4c3f4bad7a6e9022489a9f8392f7147ed3ce74b1
>>
>> Just a heads-up that I had applied this for testing, not necessarily to
>> get included. To clear up that confusion, I have retained patches 1-3
>> for now, and then we can queue up 4-5/5 later when everybody is happy
>> with them.
> 
> Fine.
> 
> I'd see the reason if there is, looks not see it anywhere, :-)
> 
> And it should have been posted on mail list.

There's no reason, it's what I emailed above. It's just that 4-5/5
aren't fully reviewed yet. We can still make 6.15 if folks are happy
with it, just wanted to ensure it had enough time on the list to ensure
that that is the case.