[v2,00/16] Reduce preallocations for maple tree

Message ID: 20230612203953.2093911-1-Liam.Howlett@oracle.com

Liam R. Howlett June 12, 2023, 8:39 p.m. UTC
Initial work on preallocations showed no regression in performance
during testing, but recently some users (both on-list [1] and off-list
[android]) have reported that preallocating the worst-case number of
nodes causes a slowdown.  This patch set reduces the number of
allocations in a few ways.

Most munmap() operations remove a single VMA, so leverage the fact that
the maple tree can store a single pointer at range 0 - 0 without
allocating a node.  This is done by changing the index used in the
'sidetree'.

Re-introduce the entry argument to mas_preallocate() so that a more
intelligent guess of the node count can be made.

During development of v2 of this patch set, I also noticed that the
number of nodes being allocated for a rebalance was beyond what could
possibly be needed.  This is addressed in patch 0008.

Patches are in the following order:
0001-0002: Testing framework for benchmarking some operations
0003-0004: Reduction of maple node allocation in sidetree
0005:      Small cleanup of do_vmi_align_munmap()
0006-0015: mas_preallocate() calculation change
0016:      Change the vma iterator order

Changes since v1:
 - Reduced preallocations for append and slot store - Thanks Peng Zhang
 - Added patch to reduce node allocations for mas_rebalance() (patch 0008)
 - Reduced resets during store setup to avoid duplicate walking (patch 0015)

v1: https://lore.kernel.org/lkml/20230601021605.2823123-1-Liam.Howlett@oracle.com/

[1] https://lore.kernel.org/linux-mm/202305061457.ac15990c-yujie.liu@intel.com/

Liam R. Howlett (16):
  maple_tree: Add benchmarking for mas_for_each
  maple_tree: Add benchmarking for mas_prev()
  mm: Move unmap_vmas() declaration to internal header
  mm: Change do_vmi_align_munmap() side tree index
  mm: Remove prev check from do_vmi_align_munmap()
  maple_tree: Introduce __mas_set_range()
  mm: Remove re-walk from mmap_region()
  maple_tree: Adjust node allocation on mas_rebalance()
  maple_tree: Re-introduce entry to mas_preallocate() arguments
  mm: Use vma_iter_clear_gfp() in nommu
  mm: Set up vma iterator for vma_iter_prealloc() calls
  maple_tree: Move mas_wr_end_piv() below mas_wr_extend_null()
  maple_tree: Update mas_preallocate() testing
  maple_tree: Refine mas_preallocate() node calculations
  maple_tree: Reduce resets during store setup
  mm/mmap: Change vma iteration order in do_vmi_align_munmap()

 fs/exec.c                        |   1 +
 include/linux/maple_tree.h       |  23 ++++-
 include/linux/mm.h               |   4 -
 lib/maple_tree.c                 | 121 ++++++++++++++++------
 lib/test_maple_tree.c            |  74 +++++++++++++
 mm/internal.h                    |  40 ++++++--
 mm/memory.c                      |  19 ++--
 mm/mmap.c                        | 171 ++++++++++++++++---------------
 mm/nommu.c                       |  45 ++++----
 tools/testing/radix-tree/maple.c |  59 ++++++-----
 10 files changed, 364 insertions(+), 193 deletions(-)

Yin, Fengwei June 15, 2023, 8:33 a.m. UTC | #1
Hi Liam,

On 6/13/2023 4:39 AM, Liam R. Howlett wrote:
> Initial work on preallocations showed no regression in performance
> during testing, but recently some users (both on [1] and off [android]
> list) have reported that preallocating the worst-case number of nodes
> has caused some slow down.  This patch set addresses the number of
> allocations in a few ways.
> 
> During munmap() most munmap() operations will remove a single VMA, so
> leverage the fact that the maple tree can place a single pointer at
> range 0 - 0 without allocating.  This is done by changing the index in
> the 'sidetree'.
> 
> Re-introduce the entry argument to mas_preallocate() so that a more
> intelligent guess of the node count can be made.
> 
> During development of v2 of this patch set, I also noticed that the
> number of nodes being allocated for a rebalance was beyond what could
> possibly be needed.  This is addressed in patch 0008.
> 
> Patches are in the following order:
> 0001-0002: Testing framework for benchmarking some operations
> 0003-0004: Reduction of maple node allocation in sidetree
> 0005:      Small cleanup of do_vmi_align_munmap()
> 0006-0015: mas_preallocate() calculation change
> 0016:      Change the vma iterator order
> 
> Changes since v1:
>  - Reduced preallocations for append and slot store - Thanks Peng Zhang
>  - Added patch to reduce node allocations for mas_rebalance() (patch 0008)
>  - Reduced resets during store setup to avoid duplicate walking (patch 0015)

AIM9.page_test results in my local environment:
  before-preallocation:                    763812
  preallocation:                           676600
  preallocation fixup v2 (this patchset):  734060

For this specific test, the v2 patch works perfectly. Thanks.

Regards
Yin, Fengwei
