[v3,0/5] io_uring/rsrc: coalescing multi-hugepage registered buffers

Message ID 20240513082300.515905-1-cliang01.li@samsung.com (mailing list archive)

Message

Chenliang Li May 13, 2024, 8:22 a.m. UTC
Registered buffers are stored and processed in the form of bvec array,
each bvec element typically points to a PAGE_SIZE page but can also work
with hugepages. Specifically, a buffer consisting of a hugepage is
coalesced to use only one hugepage bvec entry during registration.
This coalescing feature saves both space and DMA-mapping time.

However, currently the coalescing feature doesn't work for multi-hugepage
buffers. For a buffer with several 2M hugepages, we still split it into
thousands of 4K page bvec entries while in fact, we can just use a
handful of hugepage bvecs.
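
To put rough numbers on "thousands" versus "a handful", here is a small Python sketch of the arithmetic. It is only an illustration, not code from the series: the 4K page shift, the 2M hugepage shift, the 16-byte bvec size (a typical 64-bit layout) and the helper names are all assumptions.

```python
PAGE_SHIFT = 12        # assumed 4 KiB base page
HUGEPAGE_SHIFT = 21    # assumed 2 MiB hugepage
BVEC_BYTES = 16        # assumed size of one bvec entry on a 64-bit build


def nr_entries(addr: int, length: int, shift: int) -> int:
    """Count the 2**shift-sized units touched by [addr, addr + length)."""
    size = 1 << shift
    first = addr >> shift
    last = (addr + length + size - 1) >> shift
    return last - first


def table_bytes(entries: int) -> int:
    """Memory used by a bvec array with the given number of entries."""
    return entries * BVEC_BYTES


if __name__ == "__main__":
    addr = 0x7F0000000000      # hypothetical 2 MiB-aligned user address
    length = 8 << 20           # 8 MiB buffer, i.e. 4 * 2 MiB
    for name, shift in (("4K pages", PAGE_SHIFT), ("2M hugepages", HUGEPAGE_SHIFT)):
        n = nr_entries(addr, length, shift)
        print(f"{name}: {n} entries, {table_bytes(n)} bytes")
```

Under these assumptions the 8M buffer from the test below needs 2048 entries (32768 bytes) at 4K granularity and 4 entries (64 bytes) at 2M granularity. A buffer that starts partway into a hugepage touches one extra hugepage, so it needs one extra entry.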

This patch series enables coalescing registered buffers with more than
one hugepage. It reduces the DMA-mapping time and saves memory for
this kind of buffer.
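
The idea of one entry per hugepage can be modelled in a few lines of Python. This is a simplified sketch and not the logic in the patches: pages are represented as plain page frame numbers, every hugepage is assumed to be a naturally aligned 2M folio, and `try_coalesce` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple

PAGE_SHIFT = 12                                     # assumed 4 KiB base page
FOLIO_SHIFT = 21                                    # assumed 2 MiB hugepage folio
PAGES_PER_FOLIO = 1 << (FOLIO_SHIFT - PAGE_SHIFT)   # 512 pages per folio
FOLIO_MASK = ~(PAGES_PER_FOLIO - 1)


def try_coalesce(pfns: List[int]) -> Optional[List[Tuple[int, int]]]:
    """Group pinned pages into (folio_head_pfn, nr_pages) runs.

    Returns None when one entry per folio cannot describe the buffer:
    pages inside a folio are not consecutive, or the buffer leaves a
    folio before its last page or enters one after its first page.
    """
    if not pfns:
        return None
    runs = []
    head = pfns[0] & FOLIO_MASK
    count = 1
    for prev, cur in zip(pfns, pfns[1:]):
        cur_head = cur & FOLIO_MASK
        if cur_head == head:
            # Still inside the same folio: pages must be consecutive.
            if cur != prev + 1:
                return None
            count += 1
        else:
            # Crossing folios: leave at the last page, enter at the first.
            if prev != head + PAGES_PER_FOLIO - 1 or cur != cur_head:
                return None
            runs.append((head, count))
            head, count = cur_head, 1
    runs.append((head, count))
    return runs


if __name__ == "__main__":
    # Four hugepages at unrelated (hypothetical) physical locations.
    heads = [0x10000, 0x40000, 0x20000, 0x80000]
    pfns = [h + i for h in heads for i in range(PAGES_PER_FOLIO)]
    print(len(pfns), "pages ->", len(try_coalesce(pfns)), "entries")
```

In this model the hugepages do not have to be physically adjacent to each other. Only the pages within each hugepage must be contiguous, which is why the result is one entry per hugepage rather than a single entry for the whole buffer.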

Perf diff of 8M(4*2M) hugepage fixed buffer fio test:

fio/t/io_uring -d64 -s32 -c32 -b8388608 -p0 -B1 -F0 -n1 -O1 -r10 \
-R1 /dev/nvme0n1

Before          After           Symbol

5.90%                           [k] __blk_rq_map_sg
3.70%                           [k] dma_direct_map_sg
3.07%                           [k] dma_pool_alloc
1.12%                           [k] sg_next
                +0.44%          [k] dma_map_page_attrs

The first three patches prepare for adding multi-hugepage coalescing
to buffer registration, and the 4th patch enables the feature. The 5th
patch adds test cases for this feature in liburing.

-----------------
Changes since v2:

- Modify the loop iterator increment to make code cleaner
- Minor fix to the return procedure in coalesced buffer accounting
- Correct commit messages
- Add test cases in liburing

v2 : https://lore.kernel.org/io-uring/20240513020149.492727-1-cliang01.li@samsung.com/T/#t

Changes since v1:

- Split into 4 patches
- Fix code style issues
- Rearrange the change of code for cleaner look
- Add specialized pinned page accounting procedure for coalesced
  buffers
- Reorder the newly added fields in imu struct for better compaction

v1 : https://lore.kernel.org/io-uring/20240506075303.25630-1-cliang01.li@samsung.com/T/#u

Chenliang Li (5):
  io_uring/rsrc: add hugepage buffer coalesce helpers
  io_uring/rsrc: store folio shift and mask into imu
  io_uring/rsrc: add init and account functions for coalesced imus
  io_uring/rsrc: enable multi-hugepage buffer coalescing
  liburing: add test cases for hugepage registered buffers

 io_uring/rsrc.c | 217 +++++++++++++++++++++++++++++++++++++++---------
 io_uring/rsrc.h |  12 +++
 2 files changed, 191 insertions(+), 38 deletions(-)


base-commit: 59b28a6e37e650c0d601ed87875b6217140cda5d

Comments

Anuj gupta May 13, 2024, 12:09 p.m. UTC | #1
On Mon, May 13, 2024 at 1:59 PM Chenliang Li <cliang01.li@samsung.com> wrote:
>
> Registered buffers are stored and processed in the form of bvec array,
> each bvec element typically points to a PAGE_SIZE page but can also work
> with hugepages. Specifically, a buffer consisting of a hugepage is
> coalesced to use only one hugepage bvec entry during registration.
> This coalescing feature helps to save both the space and DMA-mapping time.
>
> However, currently the coalescing feature doesn't work for multi-hugepage
> buffers. For a buffer with several 2M hugepages, we still split it into
> thousands of 4K page bvec entries while in fact, we can just use a
> handful of hugepage bvecs.
>
> This patch series enables coalescing registered buffers with more than
> one hugepages. It optimizes the DMA-mapping time and saves memory for
> these kind of buffers.
>
> Perf diff of 8M(4*2M) hugepage fixed buffer fio test:
>
> fio/t/io_uring -d64 -s32 -c32 -b8388608 -p0 -B1 -F0 -n1 -O1 -r10 \
> -R1 /dev/nvme0n1

It seems you modified t/io_uring to allocate from hugepages. It would be nice
to mention that part here.
--
Anuj Gupta

Jens Axboe May 13, 2024, 1:40 p.m. UTC | #2
On 5/13/24 6:09 AM, Anuj gupta wrote:
> On Mon, May 13, 2024 at 1:59 PM Chenliang Li <cliang01.li@samsung.com> wrote:
>>
>> Registered buffers are stored and processed in the form of bvec array,
>> each bvec element typically points to a PAGE_SIZE page but can also work
>> with hugepages. Specifically, a buffer consisting of a hugepage is
>> coalesced to use only one hugepage bvec entry during registration.
>> This coalescing feature helps to save both the space and DMA-mapping time.
>>
>> However, currently the coalescing feature doesn't work for multi-hugepage
>> buffers. For a buffer with several 2M hugepages, we still split it into
>> thousands of 4K page bvec entries while in fact, we can just use a
>> handful of hugepage bvecs.
>>
>> This patch series enables coalescing registered buffers with more than
>> one hugepages. It optimizes the DMA-mapping time and saves memory for
>> these kind of buffers.
>>
>> Perf diff of 8M(4*2M) hugepage fixed buffer fio test:
>>
>> fio/t/io_uring -d64 -s32 -c32 -b8388608 -p0 -B1 -F0 -n1 -O1 -r10 \
>> -R1 /dev/nvme0n1
> 
> It seems you modified t/io_uring to allocate from hugepages. It would be nice
> to mention that part here.

Yes, please just send a separate series/patch for both liburing and fio.
This series should be strictly the kernel side changes required, then
reference/link the postings for the t/io_uring and liburing test case(s)
in the cover letter.

Chenliang Li May 14, 2024, 12:14 a.m. UTC | #3
On Mon, 13 May 2024 17:41:14 +0530, Anuj Gupta wrote:
> On Mon, May 13, 2024 at 2:04 PM Chenliang Li <cliang01.li@samsung.com> wrote:
>>
>> Modify the original buffer registration path to expand the
>> one-hugepage coalescing feature to work with multi-hugepage
>> buffers. Separated from previous patches to make it more
>> easily reviewed.

> The last line should not be a part of the commit description IMO.

Will delete that. Thanks.

Chenliang Li May 14, 2024, 12:16 a.m. UTC | #4
On Mon, 13 May 2024 17:39:37 +0530, Anuj Gupta wrote:
> On Mon, May 13, 2024 at 1:59 PM Chenliang Li <cliang01.li@samsung.com> wrote:
>>
>> Registered buffers are stored and processed in the form of bvec array,
>> each bvec element typically points to a PAGE_SIZE page but can also work
>> with hugepages. Specifically, a buffer consisting of a hugepage is
>> coalesced to use only one hugepage bvec entry during registration.
>> This coalescing feature helps to save both the space and DMA-mapping time.
>>
>> However, currently the coalescing feature doesn't work for multi-hugepage
>> buffers. For a buffer with several 2M hugepages, we still split it into
>> thousands of 4K page bvec entries while in fact, we can just use a
>> handful of hugepage bvecs.
>>
>> This patch series enables coalescing registered buffers with more than
>> one hugepages. It optimizes the DMA-mapping time and saves memory for
>> these kind of buffers.
>>
>> Perf diff of 8M(4*2M) hugepage fixed buffer fio test:
>>
>> fio/t/io_uring -d64 -s32 -c32 -b8388608 -p0 -B1 -F0 -n1 -O1 -r10 \
>> -R1 /dev/nvme0n1
> 
> It seems you modified t/io_uring to allocate from hugepages. It would be nice
> to mention that part here.

Yeah, I forgot to mention that. Thanks for pointing it out.

Chenliang Li May 14, 2024, 12:18 a.m. UTC | #5
On Mon, 13 May 2024 07:40:13 -0600, Jens Axboe wrote:
> Yes, please just send a separate series/patch for both liburing and fio.
> This series should be strictly the kernel side changes required, then
> reference/link the postings for the t/io_uring and liburing test case(s)
> in the cover letter.

Sure, will send them separately.