[v10,00/69] Introducing the Maple Tree

Message ID 20220621204632.3370049-1-Liam.Howlett@oracle.com (mailing list archive)

Liam R. Howlett June 21, 2022, 8:46 p.m. UTC
Hello,

Andrew, this is safe to ignore: it is v9 with all the fixes from the
mm-unstable branch squashed in.  I've revised this patch set for easier
review, as requested by David Hildenbrand.

I rebased this patch set on v5.19-rc3.

git: https://github.com/oracle/linux-uek/tree/howlett/maple/20220621

Patch series "Introducing the Maple Tree".

The maple tree is an RCU-safe range-based B-tree designed to use modern
processor cache efficiently.  There are a number of places in the kernel
where a non-overlapping range-based tree would be beneficial, especially
one with a simple interface.  If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.

The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes.  With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses.  The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.
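
As a rough illustration of the height difference, the sketch below counts
the levels needed to index n entries when every node is completely full
(16 entries per leaf, 10 children per non-leaf node) and compares that
with a perfectly balanced binary tree.  The function names are
hypothetical and not part of this series; real maple nodes are rarely
full and a real rbtree can be deeper than a perfectly balanced tree, so
treat the numbers as a best case only.

```c
#include <assert.h>

/*
 * Levels needed to hold n entries when leaves store up to leaf_fanout
 * entries and non-leaf nodes have up to node_fanout children.
 * Best case: every node is assumed to be full.
 */
unsigned int tree_levels(unsigned long n, unsigned long leaf_fanout,
			 unsigned long node_fanout)
{
	unsigned long nodes;
	unsigned int levels = 1;

	assert(leaf_fanout >= 2 && node_fanout >= 2);

	/* Number of leaves, rounded up without overflowing. */
	nodes = n / leaf_fanout + (n % leaf_fanout != 0);

	/* Each pass adds one level of parents above the current level. */
	while (nodes > 1) {
		nodes = nodes / node_fanout + (nodes % node_fanout != 0);
		levels++;
	}
	return levels;
}

/* Smallest height h such that a binary tree of 2^h - 1 nodes holds n. */
unsigned int binary_tree_levels(unsigned long n)
{
	unsigned long capacity = 0;
	unsigned int levels = 0;

	while (capacity < n) {
		capacity = capacity * 2 + 1;
		levels++;
	}
	return levels;
}
```

For 1000 entries this gives three levels for the 16/10 layout against ten
for the balanced binary tree, which is where the reduction in cache
misses per lookup comes from.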

The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct.  The
long term goal is to reduce or remove the mmap_lock contention.

The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers.  A single write operation will be
allowed at a time.  A reader re-walks if stale data is encountered.  VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.

Davidlohr said:

: Yes I like the maple tree, and at this stage I don't think we can ask for
: more from this series wrt the MM - albeit there seems to still be some
: folks reporting breakage.  Fundamentally I see Liam's work to (re)move
: complexity out of the MM (not to say that the actual maple tree is not
: complex) by consolidating the three complementary data structures very
: much worth it considering performance does not take a hit.  This was very
: much a turn off with the range locking approach, which worst case scenario
: incurred prohibitive overhead.  Also as Liam and Matthew have
: mentioned, RCU opens up a lot of nice performance opportunities, and in
: addition academia[1] has shown outstanding scalability of address spaces
: with the foundation of replacing the locked rbtree with RCU aware trees.

Similar work has been discovered in the academic press:

	https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf

Sheer coincidence.  We designed our tree with the intention of solving the
hardest problem first.  Upon settling on a B-tree variant and a rough
outline, we researched range-based B-trees and RCU B-trees and did find
that article.  So it was nice to find reassurance that we were on the
right path, but our design choice of using ranges made that paper unusable
for us.


Changes:
 - Does not include binder fix - moved above these patches in the tree.
 - Does not include mm/page_alloc optimization to reduce fragmentation -
   moved above these patches in the tree.
 - maple_tree: Fix expanding nulls off the end of the data
 - maple_tree: Fix mas_next() when already on the last node entry
 - maple_tree: Fix 32b parent pointer
 - maple_tree: Fix potential out of range on mas_next()/mas_prev()
 - maple_tree: Fix typo in MAINTAINERS
 - maple_tree: Some checkpatch fix ups
 - maple_tree: fix mt_destroy_walk() on full non-leaf non-alloc nodes
 - maple_tree: change spanning store to work on larger trees
 - maple_tree: make mas_prealloc() error checking more generic
 - maple_tree: Fix return from mas_prealloc()
 - nommu: Fix nommu build issue
 - mm/nommu: fix compile warning in do_mmap()
 - mm/nommu: move preallocations and limit other allocations
 - fs/userfaultfd: fix vma iteration in mas_for_each() loop
 - remove unnecessary include in damon test code
 - mm/mmap: fix leak on expand_downwards() and expand_upwards()
 - mm/mmap: fix do_brk_munmap() when munmapping multiple mappings
 - mm/mmap: fix potential leak on do_mas_align_munmap()
 - mm/mmap: Fix exit_mmap() comment
 - mm/mmap: Drop exit_mmap() mm->mmap = NULL
 - mm/mmap: Allow vma_expand() to lock both anon and file locks
 - mm/mmap: Change do_mas_align_munmap() to avoid preallocations for
   sidetree

I've tried to preserve the data in this change set: each commit message
that absorbed a fix now carries a link to the patch as it was sent to the
mailing lists.


v9: https://lore.kernel.org/lkml/20220504010716.661115-1-Liam.Howlett@oracle.com/
...and
https://lore.kernel.org/lkml/20220504011215.661968-1-Liam.Howlett@oracle.com/

v8: https://lore.kernel.org/lkml/20220426150616.3937571-1-Liam.Howlett@oracle.com/
v7: https://lore.kernel.org/linux-mm/20220404143501.2016403-8-Liam.Howlett@oracle.com/
v6: https://lore.kernel.org/linux-mm/20220215143728.3810954-1-Liam.Howlett@oracle.com/
v5: https://lore.kernel.org/linux-mm/20220202024137.2516438-1-Liam.Howlett@oracle.com/
v4: https://lore.kernel.org/linux-mm/20211201142918.921493-1-Liam.Howlett@oracle.com/
v3: https://lore.kernel.org/linux-mm/20211005012959.1110504-1-Liam.Howlett@oracle.com/
v2: https://lore.kernel.org/linux-mm/20210817154651.1570984-1-Liam.Howlett@oracle.com/
v1: https://lore.kernel.org/linux-mm/20210428153542.2814175-1-Liam.Howlett@Oracle.com/


Liam R. Howlett (44):
  Maple Tree: add new data structure
  radix tree test suite: add pr_err define
  radix tree test suite: add kmem_cache_set_non_kernel()
  radix tree test suite: add allocation counts and size to kmem_cache
  radix tree test suite: add support for slab bulk APIs
  radix tree test suite: add lockdep_is_held to header
  lib/test_maple_tree: add testing for maple tree
  mm: start tracking VMAs with maple tree
  mm/mmap: use the maple tree in find_vma() instead of the rbtree.
  mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree
  mm/mmap: use maple tree for unmapped_area{_topdown}
  kernel/fork: use maple tree for dup_mmap() during forking
  damon: convert __damon_va_three_regions to use the VMA iterator
  mm: remove rb tree.
  mmap: change zeroing of maple tree in __vma_adjust()
  xen: use vma_lookup() in privcmd_ioctl_mmap()
  mm: optimize find_exact_vma() to use vma_lookup()
  mm/khugepaged: optimize collapse_pte_mapped_thp() by using
    vma_lookup()
  mm/mmap: change do_brk_flags() to expand existing VMA and add
    do_brk_munmap()
  mm: use maple tree operations for find_vma_intersection()
  mm/mmap: use advanced maple tree API for mmap_region()
  mm: remove vmacache
  mm: convert vma_lookup() to use mtree_load()
  mm/mmap: move mmap_region() below do_munmap()
  mm/mmap: reorganize munmap to use maple states
  mm/mmap: change do_brk_munmap() to use do_mas_align_munmap()
  arm64: Change elfcore for_each_mte_vma() to use VMA iterator
  fs/proc/base: use maple tree iterators in place of linked list
  userfaultfd: use maple tree iterator to iterate VMAs
  ipc/shm: use VMA iterator instead of linked list
  bpf: remove VMA linked list
  mm/gup: use maple tree navigation instead of linked list
  mm/madvise: use vma_find() instead of vma linked list
  mm/memcontrol: stop using mm->highest_vm_end
  mm/mempolicy: use vma iterator & maple state instead of vma linked
    list
  mm/mprotect: use maple tree navigation instead of vma linked list
  mm/mremap: use vma_find_intersection() instead of vma linked list
  mm/msync: use vma_find() instead of vma linked list
  mm/oom_kill: use maple tree iterators instead of vma linked list
  mm/swapfile: use vma iterator instead of vma linked list
  riscv: use vma iterator for vdso
  mm: remove the vma linked list
  mm/mmap: drop range_has_overlap() function
  mm/mmap.c: pass in mapping to __vma_link_file()

Matthew Wilcox (Oracle) (25):
  mm: add VMA iterator
  mmap: use the VMA iterator in count_vma_pages_range()
  proc: remove VMA rbtree use from nommu
  arm64: remove mmap linked list from vdso
  parisc: remove mmap linked list from cache handling
  powerpc: remove mmap linked list walks
  s390: remove vma linked list walks
  x86: remove vma linked list walks
  xtensa: remove vma linked list walks
  cxl: remove vma linked list walk
  optee: remove vma linked list walk
  um: remove vma linked list walk
  coredump: remove vma linked list walk
  exec: use VMA iterator instead of linked list
  fs/proc/task_mmu: stop using linked list and highest_vm_end
  acct: use VMA iterator instead of linked list
  perf: use VMA iterator
  sched: use maple tree iterator to walk VMAs
  fork: use VMA iterator
  mm/khugepaged: stop using vma linked list
  mm/ksm: use vma iterators instead of vma linked list
  mm/mlock: use vma iterator and maple state instead of vma linked list
  mm/pagewalk: use vma_find() instead of vma linked list
  i915: use the VMA iterator
  nommu: remove uses of VMA linked list

 Documentation/core-api/index.rst              |     1 +
 Documentation/core-api/maple_tree.rst         |   217 +
 MAINTAINERS                                   |    12 +
 arch/arm64/kernel/elfcore.c                   |    16 +-
 arch/arm64/kernel/vdso.c                      |     3 +-
 arch/parisc/kernel/cache.c                    |     9 +-
 arch/powerpc/kernel/vdso.c                    |     6 +-
 arch/powerpc/mm/book3s32/tlb.c                |    11 +-
 arch/powerpc/mm/book3s64/subpage_prot.c       |    13 +-
 arch/riscv/kernel/vdso.c                      |     3 +-
 arch/s390/kernel/vdso.c                       |     3 +-
 arch/s390/mm/gmap.c                           |     6 +-
 arch/um/kernel/tlb.c                          |    14 +-
 arch/x86/entry/vdso/vma.c                     |     9 +-
 arch/x86/kernel/tboot.c                       |     2 +-
 arch/xtensa/kernel/syscall.c                  |    18 +-
 drivers/firmware/efi/efi.c                    |     2 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |    14 +-
 drivers/misc/cxl/fault.c                      |    45 +-
 drivers/tee/optee/call.c                      |    18 +-
 drivers/xen/privcmd.c                         |     2 +-
 fs/coredump.c                                 |    34 +-
 fs/exec.c                                     |    12 +-
 fs/proc/base.c                                |     5 +-
 fs/proc/internal.h                            |     2 +-
 fs/proc/task_mmu.c                            |    74 +-
 fs/proc/task_nommu.c                          |    45 +-
 fs/userfaultfd.c                              |    62 +-
 include/linux/maple_tree.h                    |   685 +
 include/linux/mm.h                            |    77 +-
 include/linux/mm_types.h                      |    43 +-
 include/linux/mm_types_task.h                 |    12 -
 include/linux/sched.h                         |     1 -
 include/linux/userfaultfd_k.h                 |     7 +-
 include/linux/vm_event_item.h                 |     4 -
 include/linux/vmacache.h                      |    28 -
 include/linux/vmstat.h                        |     6 -
 include/trace/events/maple_tree.h             |   123 +
 include/trace/events/mmap.h                   |    73 +
 init/main.c                                   |     2 +
 ipc/shm.c                                     |    21 +-
 kernel/acct.c                                 |    11 +-
 kernel/bpf/task_iter.c                        |    10 +-
 kernel/debug/debug_core.c                     |    12 -
 kernel/events/core.c                          |     3 +-
 kernel/events/uprobes.c                       |     9 +-
 kernel/fork.c                                 |    57 +-
 kernel/sched/fair.c                           |    10 +-
 lib/Kconfig.debug                             |    17 +-
 lib/Makefile                                  |     3 +-
 lib/maple_tree.c                              |  7041 +++
 lib/test_maple_tree.c                         | 38186 ++++++++++++++++
 mm/Makefile                                   |     2 +-
 mm/damon/vaddr-test.h                         |    36 +-
 mm/damon/vaddr.c                              |    53 +-
 mm/debug.c                                    |    14 +-
 mm/gup.c                                      |     7 +-
 mm/huge_memory.c                              |     4 +-
 mm/init-mm.c                                  |     4 +-
 mm/internal.h                                 |     8 +-
 mm/khugepaged.c                               |    13 +-
 mm/ksm.c                                      |    18 +-
 mm/madvise.c                                  |     2 +-
 mm/memcontrol.c                               |     6 +-
 mm/memory.c                                   |    33 +-
 mm/mempolicy.c                                |    56 +-
 mm/mlock.c                                    |    35 +-
 mm/mmap.c                                     |  2173 +-
 mm/mprotect.c                                 |     7 +-
 mm/mremap.c                                   |    22 +-
 mm/msync.c                                    |     2 +-
 mm/nommu.c                                    |   249 +-
 mm/oom_kill.c                                 |     3 +-
 mm/pagewalk.c                                 |     2 +-
 mm/swapfile.c                                 |     4 +-
 mm/util.c                                     |    32 -
 mm/vmacache.c                                 |   117 -
 mm/vmstat.c                                   |     4 -
 tools/include/linux/slab.h                    |     4 +
 tools/testing/radix-tree/.gitignore           |     2 +
 tools/testing/radix-tree/Makefile             |     9 +-
 tools/testing/radix-tree/generated/autoconf.h |     1 +
 tools/testing/radix-tree/linux.c              |   160 +-
 tools/testing/radix-tree/linux/kernel.h       |     1 +
 tools/testing/radix-tree/linux/lockdep.h      |     2 +
 tools/testing/radix-tree/linux/maple_tree.h   |     7 +
 tools/testing/radix-tree/maple.c              |    59 +
 .../radix-tree/trace/events/maple_tree.h      |     5 +
 88 files changed, 48396 insertions(+), 1859 deletions(-)
 create mode 100644 Documentation/core-api/maple_tree.rst
 create mode 100644 include/linux/maple_tree.h
 delete mode 100644 include/linux/vmacache.h
 create mode 100644 include/trace/events/maple_tree.h
 create mode 100644 lib/maple_tree.c
 create mode 100644 lib/test_maple_tree.c
 delete mode 100644 mm/vmacache.c
 create mode 100644 tools/testing/radix-tree/linux/maple_tree.h
 create mode 100644 tools/testing/radix-tree/maple.c
 create mode 100644 tools/testing/radix-tree/trace/events/maple_tree.h

Comments

David Hildenbrand June 21, 2022, 9:04 p.m. UTC | #1
On 21.06.22 22:46, Liam Howlett wrote:
> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> 
> Using the maple tree interface mt_find() will handle the RCU locking and
> will start searching at the address up to the limit, ULONG_MAX in this
> case.
> 
> Add kernel documentation to this API.
> 
> Link: https://lkml.kernel.org/r/20220504010716.661115-13-Liam.Howlett@oracle.com
> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: David Howells <dhowells@redhat.com>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: SeongJae Park <sj@kernel.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>  mm/mmap.c | 28 ++++++++++------------------
>  1 file changed, 10 insertions(+), 18 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d7e6baa2f40f..fdb61252448f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2486,11 +2486,18 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>  
>  EXPORT_SYMBOL(get_unmapped_area);
>  
> -/* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
> +/**
> + * find_vma() - Find the VMA for a given address, or the next vma.
> + * @mm: The mm_struct to check
> + * @addr: The address
> + *
> + * Returns: The VMA associated with addr, or the next vma.
> + * May return %NULL in the case of no vma at addr or above.

Nit: inconsistent use of VMA vs. vma.

> + */
>  struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
>  {
> -	struct rb_node *rb_node;
>  	struct vm_area_struct *vma;
> +	unsigned long index = addr;
>  
>  	mmap_assert_locked(mm);
>  	/* Check the cache first. */
> @@ -2498,22 +2505,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
>  	if (likely(vma))
>  		return vma;
>  
> -	rb_node = mm->mm_rb.rb_node;
> -
> -	while (rb_node) {
> -		struct vm_area_struct *tmp;
> -
> -		tmp = rb_entry(rb_node, struct vm_area_struct, vm_rb);
> -
> -		if (tmp->vm_end > addr) {
> -			vma = tmp;
> -			if (tmp->vm_start <= addr)
> -				break;
> -			rb_node = rb_node->rb_left;
> -		} else
> -			rb_node = rb_node->rb_right;
> -	}
> -
> +	vma = mt_find(&mm->mm_mt, &index, ULONG_MAX);

I guess it would be handy to have a mt_find() variant that simply
consumes an address, because for example here, we don't actually care
about the output semantics? Does anything speak against such a utility
function or is this here really just a corner case?

That would make that code *even easier* to read.

>  	if (vma)
>  		vmacache_update(addr, vma);
>  	return vma;

Reviewed-by: David Hildenbrand <david@redhat.com>
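
A by-value lookup helper of the kind suggested above could look roughly
like the following.  This is a userspace sketch only: struct range_entry,
struct range_set, range_find() and range_find_addr() are hypothetical
stand-ins, not maple tree API, and the linear scan merely mimics the
in/out index convention that the mt_find() call in the patch appears to
follow (start at *index, stop at max, advance *index past the entry
found).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: covers [start, end). */
struct range_entry {
	unsigned long start;
	unsigned long end;	/* exclusive */
};

/* Hypothetical stand-in for the tree: sorted, non-overlapping entries. */
struct range_set {
	const struct range_entry *entries;
	size_t nr;
};

/*
 * Search from *index up to max (inclusive).  On success, return the first
 * entry that ends above *index and advance *index past that entry.
 */
const struct range_entry *range_find(const struct range_set *set,
				     unsigned long *index, unsigned long max)
{
	assert(set != NULL && index != NULL);

	for (size_t i = 0; i < set->nr; i++) {
		const struct range_entry *e = &set->entries[i];

		if (e->end <= *index)
			continue;	/* entirely below the search start */
		if (e->start > max)
			break;		/* entirely above the search limit */
		*index = e->end;
		return e;
	}
	return NULL;
}

/* By-value variant: the caller does not care where the search ended. */
const struct range_entry *range_find_addr(const struct range_set *set,
					  unsigned long addr)
{
	unsigned long index = addr;

	return range_find(set, &index, ULONG_MAX);
}
```

With such a wrapper the caller no longer needs a local index variable
whose updated value is never read, which is the readability gain being
asked for; the cost is one more entry point that does nearly the same
thing as the existing one.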

David Hildenbrand June 21, 2022, 9:13 p.m. UTC | #2
On 21.06.22 22:46, Liam Howlett wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> This simplifies the implementation and is faster than using the linked
> list.
> 
> Link: https://lkml.kernel.org/r/20220504010716.661115-12-Liam.Howlett@oracle.com
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: David Howells <dhowells@redhat.com>
> Cc: SeongJae Park <sj@kernel.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>  mm/mmap.c | 24 +++++++-----------------
>  1 file changed, 7 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 6be7833c781b..d7e6baa2f40f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -675,29 +675,19 @@ munmap_vma_range(struct mm_struct *mm, unsigned long start, unsigned long len,
>  
>  	return 0;
>  }
> +
>  static unsigned long count_vma_pages_range(struct mm_struct *mm,
>  		unsigned long addr, unsigned long end)
>  {
> -	unsigned long nr_pages = 0;
> +	VMA_ITERATOR(vmi, mm, addr);
>  	struct vm_area_struct *vma;
> +	unsigned long nr_pages = 0;
>  
> -	/* Find first overlapping mapping */
> -	vma = find_vma_intersection(mm, addr, end);
> -	if (!vma)
> -		return 0;
> -
> -	nr_pages = (min(end, vma->vm_end) -
> -		max(addr, vma->vm_start)) >> PAGE_SHIFT;
> -
> -	/* Iterate over the rest of the overlaps */
> -	for (vma = vma->vm_next; vma; vma = vma->vm_next) {
> -		unsigned long overlap_len;
> -
> -		if (vma->vm_start > end)
> -			break;
> +	for_each_vma_range(vmi, vma, end) {
> +		unsigned long vm_start = max(addr, vma->vm_start);
> +		unsigned long vm_end = min(end, vma->vm_end);

I thought using max_t and min_t was the latest advisory. I might be
wrong and simply kept doing it that way ;)

>  
> -		overlap_len = min(end, vma->vm_end) - vma->vm_start;
> -		nr_pages += overlap_len >> PAGE_SHIFT;
> +		nr_pages += (vm_end - vm_start) / PAGE_SIZE;

PHYS_PFN(vm_end - vm_start)

>  	}
>  
>  	return nr_pages;

Reviewed-by: David Hildenbrand <david@redhat.com>
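
The arithmetic in the new loop body can be checked in isolation with a
small userspace sketch.  Everything here is an assumption made for
illustration: struct vma_sketch and count_pages_in_range() are
hypothetical names, the VMAs are a plain sorted array instead of a maple
tree, and the page size is fixed at 4 KiB.  Each VMA is clamped to
[addr, end) and the overlap is converted to pages with a shift, which for
a power-of-two page size gives the same result as dividing by PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for a VMA: covers [vm_start, vm_end). */
struct vma_sketch {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Count the pages of the given VMAs that fall inside [addr, end). */
unsigned long count_pages_in_range(const struct vma_sketch *vmas, size_t nr,
				   unsigned long addr, unsigned long end)
{
	unsigned long nr_pages = 0;

	assert(addr <= end);

	for (size_t i = 0; i < nr; i++) {
		/* Clamp the VMA to the requested window. */
		unsigned long lo = vmas[i].vm_start > addr ?
				   vmas[i].vm_start : addr;
		unsigned long hi = vmas[i].vm_end < end ?
				   vmas[i].vm_end : end;

		if (lo >= hi)
			continue;	/* no overlap with [addr, end) */

		nr_pages += (hi - lo) >> SKETCH_PAGE_SHIFT;
	}
	return nr_pages;
}
```

Unlike the iterator in the patch, this sketch visits every array element
and skips the ones outside the window explicitly; the iterator only
yields VMAs that intersect the range in the first place.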

Liam R. Howlett June 24, 2022, 1:05 p.m. UTC | #3
* David Hildenbrand <david@redhat.com> [220621 17:04]:
> On 21.06.22 22:46, Liam Howlett wrote:
> > From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> > 
> > Using the maple tree interface mt_find() will handle the RCU locking and
> > will start searching at the address up to the limit, ULONG_MAX in this
> > case.
> > 
> > Add kernel documentation to this API.
> > 
> > Link: https://lkml.kernel.org/r/20220504010716.661115-13-Liam.Howlett@oracle.com
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: David Howells <dhowells@redhat.com>
> > Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > Cc: SeongJae Park <sj@kernel.org>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Davidlohr Bueso <dave@stgolabs.net>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > ---
> >  mm/mmap.c | 28 ++++++++++------------------
> >  1 file changed, 10 insertions(+), 18 deletions(-)
> > 
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index d7e6baa2f40f..fdb61252448f 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2486,11 +2486,18 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> >  
> >  EXPORT_SYMBOL(get_unmapped_area);
> >  
> > -/* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
> > +/**
> > + * find_vma() - Find the VMA for a given address, or the next vma.
> > + * @mm: The mm_struct to check
> > + * @addr: The address
> > + *
> > + * Returns: The VMA associated with addr, or the next vma.
> > + * May return %NULL in the case of no vma at addr or above.
> 
> Nit: inconsistent use of VMA vs. vma.

I'll update the comment.

> 
> > + */
> >  struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
> >  {
> > -	struct rb_node *rb_node;
> >  	struct vm_area_struct *vma;
> > +	unsigned long index = addr;
> >  
> >  	mmap_assert_locked(mm);
> >  	/* Check the cache first. */
> > @@ -2498,22 +2505,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
> >  	if (likely(vma))
> >  		return vma;
> >  
> > -	rb_node = mm->mm_rb.rb_node;
> > -
> > -	while (rb_node) {
> > -		struct vm_area_struct *tmp;
> > -
> > -		tmp = rb_entry(rb_node, struct vm_area_struct, vm_rb);
> > -
> > -		if (tmp->vm_end > addr) {
> > -			vma = tmp;
> > -			if (tmp->vm_start <= addr)
> > -				break;
> > -			rb_node = rb_node->rb_left;
> > -		} else
> > -			rb_node = rb_node->rb_right;
> > -	}
> > -
> > +	vma = mt_find(&mm->mm_mt, &index, ULONG_MAX);
> 
> I guess it would be handy to have a mt_find() variant that simply
> consumes an address, because for example here, we don't actually care
> about the output semantics? Does anything speak against such a utility
> function or is this here really just a corner case?
> 
> That would make that code *even easier* to read.

That makes sense.  I just don't want to have a whole lot of APIs that
do the same thing.  I'll add it to my list and see how often we see the
pattern though.

> 
> >  	if (vma)
> >  		vmacache_update(addr, vma);
> >  	return vma;
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> -- 
> Thanks,
> 
> David / dhildenb
>

Liam R. Howlett June 24, 2022, 1:10 p.m. UTC | #4
* David Hildenbrand <david@redhat.com> [220621 17:13]:
> On 21.06.22 22:46, Liam Howlett wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > This simplifies the implementation and is faster than using the linked
> > list.
> > 
> > Link: https://lkml.kernel.org/r/20220504010716.661115-12-Liam.Howlett@oracle.com
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: David Howells <dhowells@redhat.com>
> > Cc: SeongJae Park <sj@kernel.org>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Davidlohr Bueso <dave@stgolabs.net>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > ---
> >  mm/mmap.c | 24 +++++++-----------------
> >  1 file changed, 7 insertions(+), 17 deletions(-)
> > 
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 6be7833c781b..d7e6baa2f40f 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -675,29 +675,19 @@ munmap_vma_range(struct mm_struct *mm, unsigned long start, unsigned long len,
> >  
> >  	return 0;
> >  }
> > +
> >  static unsigned long count_vma_pages_range(struct mm_struct *mm,
> >  		unsigned long addr, unsigned long end)
> >  {
> > -	unsigned long nr_pages = 0;
> > +	VMA_ITERATOR(vmi, mm, addr);
> >  	struct vm_area_struct *vma;
> > +	unsigned long nr_pages = 0;
> >  
> > -	/* Find first overlapping mapping */
> > -	vma = find_vma_intersection(mm, addr, end);
> > -	if (!vma)
> > -		return 0;
> > -
> > -	nr_pages = (min(end, vma->vm_end) -
> > -		max(addr, vma->vm_start)) >> PAGE_SHIFT;
> > -
> > -	/* Iterate over the rest of the overlaps */
> > -	for (vma = vma->vm_next; vma; vma = vma->vm_next) {
> > -		unsigned long overlap_len;
> > -
> > -		if (vma->vm_start > end)
> > -			break;
> > +	for_each_vma_range(vmi, vma, end) {
> > +		unsigned long vm_start = max(addr, vma->vm_start);
> > +		unsigned long vm_end = min(end, vma->vm_end);
> 
> I thought using max_t and min_t was the latest advisory. I might be
> wrong and simply kept doing it that way ;)

They are the same type so I think this is okay.

> 
> >  
> > -		overlap_len = min(end, vma->vm_end) - vma->vm_start;
> > -		nr_pages += overlap_len >> PAGE_SHIFT;
> > +		nr_pages += (vm_end - vm_start) / PAGE_SIZE;
> 
> PHYS_PFN(vm_end - vm_start)
> 
> >  	}
> >  
> >  	return nr_pages;
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> -- 
> Thanks,
> 
> David / dhildenb
>