[v3,00/10] mm: remove vma_merge()

Message ID: cover.1725040657.git.lorenzo.stoakes@oracle.com

Lorenzo Stoakes Aug. 30, 2024, 6:10 p.m. UTC
Andrew: This is rebased on v8 of Liam's series [4], so his series should
be merged first, with mine applied on top of that. Thanks!


The infamous vma_merge() function has been the cause of a great deal of
pain, bugs and confusion for a very long time.

It is subtle, contains many corner cases, tries to do far too much and is
as a result very fragile.

That the function needs a numbering system to cover each possible
eventuality, with the many branches of its implementation each annotated
with the case being handled, speaks to all this.

Some of this complexity is inherent - unfortunately there is no getting
away from the need to figure out precisely how to execute the merge,
whether we need to remove VMAs, whether it is safe to do so, what
constitutes a mergeable VMA and so on.

However, a lot of the complexity is not inherent but instead a product of
the function's 'organic' development.

Liam has gone to great lengths to improve the situation as a part of his
maple tree implementation, greatly improving the readability of the code,
and Vlastimil and I have additionally gone to lengths to try to improve
things further.

However, with the availability of userland VMA testing, it now becomes
possible to perform a rather more significant refactoring while maintaining
confidence in its correct operation.

An attempt was previously made by Vlastimil [0] to eliminate vma_merge(),
however it was rather - brutal - and an astute reader might refer to the
date of that patch for insight as to its intent.

This series instead divides merge operations into two natural kinds -
merges which occur when a NEW vma is being added to the address space, and
merges which occur when a vma is being MODIFIED.

Happily, the vma_expand() function introduced by Liam, which has the
capacity for also deleting a subsequent VMA, covers each of the NEW vma
cases.

By abstracting the actual final commit of changes to a VMA to its own
function, commit_merge(), and writing a wrapper around vma_expand() for new
VMA cases, vma_merge_new_range(), we can avoid having to use vma_merge()
for these instances altogether.

By doing so we are also able to then de-duplicate all existing merge logic
in mmap_region() and do_brk_flags() and have everything invoke this new
function, so we universally take the same approach to merging new VMAs.

Having done so, we can then completely rework vma_merge() into
vma_merge_existing_range() and use this for the instances where a merge is
proposed for a region of an existing VMA.

This eliminates vma_merge() and its numbered cases and instead divides
things into logical cases - merge both, merge left, merge right (the latter
2 being either partial or full merges).
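
As a rough illustration of these logical cases, here is a minimal userland
sketch. It is not the kernel implementation: struct range, enum merge_kind,
classify_merge() and merge_is_full() are hypothetical names, and
compatibility is reduced to a flags comparison, which is an assumption made
purely to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: a half-open range [start, end) plus flags. */
struct range {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

enum merge_kind { MERGE_NONE, MERGE_LEFT, MERGE_RIGHT, MERGE_BOTH };

/*
 * Classify a proposed change of [start, end), which lies within vma, to
 * new_flags. prev and next are the neighbours of vma and may be NULL.
 * Assumption: two ranges are mergeable if adjacent with equal flags.
 */
enum merge_kind classify_merge(const struct range *prev,
			       const struct range *vma,
			       const struct range *next,
			       unsigned long start, unsigned long end,
			       unsigned long new_flags)
{
	bool left, right;

	assert(vma && vma->start <= start && start < end && end <= vma->end);

	/* A left merge needs the range to begin exactly where vma begins. */
	left = prev && start == vma->start && prev->end == start &&
	       prev->flags == new_flags;
	/* A right merge needs the range to end exactly where vma ends. */
	right = next && end == vma->end && next->start == end &&
		next->flags == new_flags;

	if (left && right)
		return MERGE_BOTH;
	if (left)
		return MERGE_LEFT;
	if (right)
		return MERGE_RIGHT;
	return MERGE_NONE;
}

/* A merge is full when the range spans all of vma, partial otherwise. */
bool merge_is_full(const struct range *vma,
		   unsigned long start, unsigned long end)
{
	return start == vma->start && end == vma->end;
}
```

In this model a merge of both sides is necessarily a full merge, since it
requires the range to touch both ends of vma.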

The code is heavily annotated with ASCII diagrams and greatly simplified in
comparison to the existing vma_merge() function.

Having made this change, we take the opportunity to address an issue with
merging VMAs possessing a vm_ops->close() hook - commit 714965ca8252
("mm/mmap: start distinguishing if vma can be removed in mergeability
test") and commit fc0c8f9089c2 ("mm, mmap: fix vma_merge() case 7 with
vma_ops->close") make efforts to relax how we handle these, making
assumptions about which VMAs might end up deleted (and thus, if possessing
a vm_ops->close() hook, cannot be).

This refactor means we do not need to guess, so we instead explicitly
disallow a merge only in instances where a VMA with a vm_ops->close() hook
would be deleted (and try a smaller merge in cases where this is possible).
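
The rule can be sketched in userland as below. This is a simplified model
rather than the kernel code: restrict_for_close() and enum merge_kind are
hypothetical names, and the table of which VMAs each merge would delete
(merge both removes the middle VMA and next, a full left or right merge
removes the middle VMA, partial merges remove nothing) is an assumption of
the sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum merge_kind { MERGE_NONE, MERGE_LEFT, MERGE_RIGHT, MERGE_BOTH };

/*
 * Hypothetical model, not kernel code.
 *
 * kind:           the merge we would like to perform
 * full:           whether the merge consumes the whole middle VMA
 * vma_has_close:  the middle VMA has a close() hook
 * next_has_close: the following VMA has a close() hook
 *
 * Returns the largest merge that deletes no VMA with a close() hook.
 */
enum merge_kind restrict_for_close(enum merge_kind kind, bool full,
				   bool vma_has_close, bool next_has_close)
{
	bool deletes_vma, deletes_next;

	/* In this model, merging both sides always consumes the middle VMA. */
	assert(kind != MERGE_BOTH || full);

	deletes_vma = kind == MERGE_BOTH || (kind != MERGE_NONE && full);
	deletes_next = kind == MERGE_BOTH;

	/* The middle VMA cannot be removed, so there is nothing smaller to try. */
	if (deletes_vma && vma_has_close)
		return MERGE_NONE;

	/* next cannot be removed: fall back to merging with prev only. */
	if (deletes_next && next_has_close)
		return MERGE_LEFT;

	return kind;
}
```

For example, under these assumptions a merge of both sides where only next
has a close() hook is reduced to a left merge, while a partial merge is
never restricted because it deletes nothing.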

In addition to these changes, we introduce a new vma_merge_struct
abstraction to allow VMA merge state to be threaded through the operation
neatly.
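
To give a feel for what threading merge state through a single descriptor
looks like, here is a minimal userland sketch applied to the new-range
case. It is not the kernel's vma_merge_struct: struct merge_state, its
fields, enum merge_result and merge_new_range() are hypothetical, and
compatibility is again assumed to be a plain flags comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: a half-open range [start, end) plus flags. */
struct range {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

enum merge_result { MERGE_NOMERGE, MERGE_SUCCESS };

/* Hypothetical descriptor carrying everything one merge attempt needs. */
struct merge_state {
	struct range *prev;	/* range before the gap, or NULL */
	struct range *next;	/* range after the gap, or NULL */
	unsigned long start;	/* proposed new range is [start, end) */
	unsigned long end;
	unsigned long flags;
	enum merge_result result;	/* outcome, written by the callee */
	bool next_removed;	/* set if next was absorbed into prev */
};

/*
 * Try to merge a proposed new range with its neighbours by expanding one
 * of them. Returns the expanded range, or NULL if no merge was possible.
 */
struct range *merge_new_range(struct merge_state *ms)
{
	bool left, right;

	assert(ms->start < ms->end);

	left = ms->prev && ms->prev->end == ms->start &&
	       ms->prev->flags == ms->flags;
	right = ms->next && ms->next->start == ms->end &&
		ms->next->flags == ms->flags;

	ms->result = MERGE_NOMERGE;
	ms->next_removed = false;

	if (left) {
		/* Expand prev over the new range, and over next if it merges too. */
		ms->prev->end = right ? ms->next->end : ms->end;
		ms->next_removed = right;
		ms->result = MERGE_SUCCESS;
		return ms->prev;
	}
	if (right) {
		/* Expand next downwards to cover the new range. */
		ms->next->start = ms->start;
		ms->result = MERGE_SUCCESS;
		return ms->next;
	}
	return NULL;
}
```

The caller fills in the descriptor once and reads the outcome back from it,
rather than passing a long list of parameters and decoding the result from
the return value alone.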

There is heavy unit testing provided for all merge functionality, added
prior to the refactoring, allowing for before/after testing.

The vm_ops->close() change also introduces exhaustive testing to
demonstrate that this functions as expected, and in addition to this the
reproduction code from commit fc0c8f9089c2 ("mm, mmap: fix vma_merge() case
7 with vma_ops->close") was tested and confirmed passing.

[0]:https://lore.kernel.org/linux-mm/20240401192623.18575-2-vbabka@suse.cz/
[1]:https://lore.kernel.org/all/20240830040101.822209-1-Liam.Howlett@oracle.com/
[2]:https://lore.kernel.org/linux-mm/c0ef6b6a-1c9b-4da2-a180-c8e1c73b1c28@lucifer.local/
[3]:https://lore.kernel.org/all/9dcddc2c-482b-4e12-a409-eee8d902ba26@lucifer.local/
[4]:https://lore.kernel.org/all/20240830040101.822209-1-Liam.Howlett@oracle.com/

v3:
* Rebased on Liam's v8 'Avoid MAP_FIXED gap exposure' series [1].
* Fixed issue with copy_vma() vma iterator positioning as per [2] (formerly
  fixed via a fix patch).
* Fixed issue with vma_merge_extend() not correctly obtaining the next VMA as
  per [3] (formerly fixed via a fix patch) - Thanks Mark Brown!
* General whitespace fixes.
* Improved comments.
* Added comments for bool params for clarity.
* Removed unnecessary syntactic change in vma_merge().
* Removed unnecessary else from mmap_region().
* Introduced vma_iter_next_rewind(), are_anon_vmas_compatible(),
  can_vma_merge_left(), can_vma_merge_right().
* Cleaned up logic in vma_merge_new_range().
* Cleaned up logic in vma_merge_existing_range().
* Eliminated vma_lookup() from all VMA merge code.
* Added vma_merge_extend() regression test + confirmed fails before fix + passes
  after.
* Added copy_vma() regression test + confirmed triggers assert before fix +
  doesn't after.
* Confirmed _all_ self-tests passing at same rate before/after changes.
* Confirmed no perf impact.

v2:
* Updated tests to function without the vmg change, and moved earlier in
  series so we can test against the code _exactly_ as it was previously.
* Added vmg->mm to store mm_struct and avoid hacky container_of() in
  vma_merge() prior to refactor. It's logical to thread this through.
* Stopped specifying vmg->vma for vma_merge_new_vma() from the start,
  which was previously removed later in the series.
* Improved vma_modify_flags() to be better formatted for a large number of
  flags.
* Removed if (vma) { ... } logic in mmap_region() and integrated the
  approach from a later commit of putting logic into the if (next &&... )
  block. Improved comment about why we are doing this.
* Introduced VMG_STATE() and VMG_VMA_STATE() macros and use these to avoid
  duplication of initialisation of vmg state.
* Expanded the commit message for abstracting the policy comparison to
  explain the logic.
* Reverted the use of vmg in vma_shrink() and split_vma().
* Reverted the cleanup of __split_vma() int -> bool as at this point fully
  irrelevant to series.
* Reinstated incorrectly removed vmg.uffd_ctx assignment in mmap_region().
* Removed a confusing comment about assignment of vmg.end in early version
  of mmap_region().
* Renamed vma_merge_new_vma() to vma_merge_new_range() and
  vma_merge_modified() to vma_merge_existing_range(). This makes it clearer
  what we're attempting to do.
* Stopped setting vmg parameters in do_brk_flags() that we did not set in
  the original implementation, i.e. vma parameters for things like
  anon_vma, uffd context, etc. which in the original implementation are not
  checked in can_vma_merge_after().
* Moved VM_SPECIAL maple tree rewalk out of if (!prev && !next) { ... }
  block in vma_merge_new_range() (which was changed to !next anyway). This
  should always be done in the VM_SPECIAL case if vmg->prev is specified.
* Updated vma_merge_new_range() to correct the case where prev and next
  could each be merged individually with the proposed range, but not
  together.
* Updated vma_merge_new_range() to require that the caller sets prev and
  next. This simplifies the logic and avoids unnecessary maple tree walks.
* Updated mmap_region() to update vmg->flags from vma->vm_flags on merge
  reattempt.
* Updated callers of vma_merge_new_range() to ensure we always point the
  iterator at prev if it exists.
* Added new state field to vmg to allow for errors to be returned.
* Adjusted do_brk_flags() to read vmg->state and handle memory allocation
  failures accordingly.
* Stopped double-assigning VM_SOFTDIRTY in do_brk_flags().
* Separated out move of vma_prepare(), init_vma_prep(), vma_complete(),
  can_vma_merge_before(), can_vma_merge_after() functions to separate
  commit.
* Adjusted commit_merge() change to initially _only_ have parameters
  relevant to vma_expand() to make review easier.
* Reinstated 'vma iterator must be pointing to start' comment in
  commit_merge().
* Adjusted commit_merge() again when introducing vma_merge_existing_range()
  to accept parameters specific to existing range merges.
* Removed unnecessary abstraction of vmg->end in vma_merge_existing_range()
  as only used once.
* Abstracted expanded parameter to local variable for clarity in
  vma_merge_existing_range().
* Unlink anon_vma objects if VMA pre-allocation fails on commit_merge() in
  vma_merge_existing_range() if any were duplicated. This was incorrectly
  excluded from the refactor.
* Moved comment from close commit regarding merge_will_delete_both to
  previous commit as unchanged behaviour.
* Corrected failure to assign vmg->flags after applying VM_ACCOUNT in
  mmap_region() (this had caused a ~5% regression in do_brk_flags()
  incidentally, now resolved).
* Added vmi assumptions and asserts in merge functions.
* Added lock asserts in merge functions.
* Added an assert to vma_merge_new_range() to ensure no VMA within
  [vmg->start, vmg->end).
* Added additional comments describing why we are moving the iterator to
  avoid maple tree re-walks.
* Added new test for the case of prev, next both with vm_ops->close()
  adding a new VMA, which should result in prev being expanded but NOT
  merged with next.
* Adjusted test code to do a mock version of anon_vma duplication, and
  cleanup after itself.
* Adjusted test code to allow vma preallocation to fail so we can test
  how we handle this.
* Added a test to assert correct anon_vma duplication behaviour.
* Added a test to assert that preallocation failure results in anon_vmas
  being unlinked.
* Corrected vma_expand() assumption - we need vma, next not prev.
* Reinstated removed VM_WARN_ON() around vp.anon_vma state in
  commit_merge().
* Rebased over Pedro + Liam's changes.
* Updated test logic to handle current->{mm,pid,comm} fields after rebase
  on Liam's changes which use these. Also added stub for pr_warn_once() for
  the same reason.
* Adjusted logic fundamentals based on rebase - vma_merge_new_range() now
  assumes vmi is pointing at the gap...
https://lore.kernel.org/all/cover.1724441678.git.lorenzo.stoakes@oracle.com/

v1:
https://lore.kernel.org/linux-mm/cover.1722849859.git.lorenzo.stoakes@oracle.com/

Lorenzo Stoakes (10):
  tools: improve vma test Makefile
  tools: add VMA merge tests
  mm: introduce vma_merge_struct and abstract vma_merge(),vma_modify()
  mm: remove duplicated open-coded VMA policy check
  mm: abstract vma_expand() to use vma_merge_struct
  mm: avoid using vma_merge() for new VMAs
  mm: make vma_prepare() and friends static and internal to vma.c
  mm: introduce commit_merge(), abstracting final commit of merge
  mm: refactor vma_merge() into modify-only vma_merge_existing_range()
  mm: rework vm_ops->close() handling on VMA merge

 mm/mmap.c                        |  103 +--
 mm/vma.c                         | 1307 ++++++++++++++++------------
 mm/vma.h                         |  179 ++--
 tools/testing/vma/Makefile       |    6 +-
 tools/testing/vma/vma.c          | 1366 +++++++++++++++++++++++++++++-
 tools/testing/vma/vma_internal.h |   51 +-
 6 files changed, 2316 insertions(+), 696 deletions(-)

--
2.46.0