diff mbox series

[2/5] mm: further refactor commit_merge()

Message ID 0bbb2efbdc2ac2af2f6d73679cbb0811544d0647.1737929364.git.lorenzo.stoakes@oracle.com (mailing list archive)
State New
Headers show
Series mm: further simplify VMA merge operation | expand

Commit Message

Lorenzo Stoakes Jan. 27, 2025, 3:50 p.m. UTC
The current VMA merge mechanism contains a number of confusing mechanisms
around removal of VMAs on merge and the shrinking of the VMA adjacent to
vma->target in the case of merges which result in a partial merge with that
adjacent VMA.

Since we now have a STABLE set of VMAs - prev, middle, next - we are now
able to have the caller of commit_merge() explicitly tell us which VMAs
need deleting, using newly introduced internal VMA merge flags.

Doing so allows us to embed this state within the VMG and remove the
confusing remove, remove2 parameters from commit_merge().

We additionally are able to eliminate the highly confusing and misleading
'expanded' parameter - a parameter that in reality refers to whether or not
the return VMA is the target one or the one immediately adjacent.

We can infer which is the case from whether or not the adj_start parameter
is negative. This also allows us to simplify further logic around iterator
configuration and VMA iterator stores.

Doing so means we can also eliminate the adjust parameter, as we are able
to infer which VMA ought to be adjusted from adj_start - a positive value
implies we adjust the start of 'middle', a negative one implies we adjust
the start of 'next'.

We are then able to have commit_merge() explicitly return the target VMA,
or NULL on inability to pre-allocate memory. Errors were previously
filtered so behaviour does not change.

This patch has no change in functional behaviour.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/vma.c                | 83 +++++++++++++++++++++++------------------
 mm/vma.h                | 10 +++++
 tools/testing/vma/vma.c |  3 ++
 3 files changed, 60 insertions(+), 36 deletions(-)

Comments

Vlastimil Babka Jan. 28, 2025, 3:05 p.m. UTC | #1
On 1/27/25 16:50, Lorenzo Stoakes wrote:
> The current VMA merge mechanism contains a number of confusing mechanisms
> around removal of VMAs on merge and the shrinking of the VMA adjacent to
> vma->target in the case of merges which result in a partial merge with that
> adjacent VMA.
> 
> Since we now have a STABLE set of VMAs - prev, middle, next - we are now
> able to have the caller of commit_merge() explicitly tell us which VMAs
> need deleting, using newly introduced internal VMA merge flags.
> 
> Doing so allows us to embed this state within the VMG and remove the
> confusing remove, remove2 parameters from commit_merge().
> 
> We additionally are able to eliminate the highly confusing and misleading
> 'expanded' parameter - a parameter that in reality refers to whether or not
> the return VMA is the target one or the one immediately adjacent.
> 
> We can infer which is the case from whether or not the adj_start parameter
> is negative. This also allows us to simplify further logic around iterator
> configuration and VMA iterator stores.
> 
> Doing so means we can also eliminate the adjust parameter, as we are able
> to infer which VMA ought to be adjusted from adj_start - a positive value
> implies we adjust the start of 'middle', a negative one implies we adjust
> the start of 'next'.
> 
> We are then able to have commit_merge() explicitly return the target VMA,
> or NULL on inability to pre-allocate memory. Errors were previously
> filtered so behaviour does not change.
> 
> This patch has no change in functional behaviour.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Some of the parts would be quite a hack for a final state of things, but I
understand it's a intermediate step for review purposes.

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Vlastimil Babka Jan. 28, 2025, 3:45 p.m. UTC | #2
On 1/27/25 16:50, Lorenzo Stoakes wrote:
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -67,6 +67,16 @@ enum vma_merge_flags {
>  	 * at the gap.
>  	 */
>  	VMG_FLAG_JUST_EXPAND = 1 << 0,
> +	/*
> +	 * Internal flag used during the merge operation to indicate we will
> +	 * remove vmg->middle.
> +	 */
> +	__VMG_FLAG_REMOVE_MIDDLE = 1 << 1,
> +	/*
> +	 * Internal flag used during the merge operationr to indicate we will
> +	 * remove vmg->next.
> +	 */
> +	__VMG_FLAG_REMOVE_NEXT = 1 << 2,
>  };

Hm this is actually kinda weird? It's an enum, but the values of it are
defined as different bits. And then struct vma_merge_struct has a "enum
vma_merge_flags merge_flags;" but we don't store to it a single "enum
vma_merge_flags" value defined above, but a combination of those. Is that
even legal to do in C?

AFAIK the more common pattern is enum that has normal incremental values
that are used for the shifts.

But I don't think we need all of this at all here? Just have bitfields in
struct vma_merge_struct?

bool just_expand : 1;
bool remove_middle : 1;
...

>  /*
> diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
> index 3c0572120e94..8cce67237d86 100644
> --- a/tools/testing/vma/vma.c
> +++ b/tools/testing/vma/vma.c
> @@ -154,6 +154,9 @@ static void vmg_set_range(struct vma_merge_struct *vmg, unsigned long start,
>  	vmg->end = end;
>  	vmg->pgoff = pgoff;
>  	vmg->flags = flags;
> +
> +	vmg->merge_flags = VMG_FLAG_DEFAULT;
> +	vmg->target = NULL;
>  }
>  
>  /*
Lorenzo Stoakes Jan. 28, 2025, 4:07 p.m. UTC | #3
On Tue, Jan 28, 2025 at 04:45:01PM +0100, Vlastimil Babka wrote:
> On 1/27/25 16:50, Lorenzo Stoakes wrote:
> > --- a/mm/vma.h
> > +++ b/mm/vma.h
> > @@ -67,6 +67,16 @@ enum vma_merge_flags {
> >  	 * at the gap.
> >  	 */
> >  	VMG_FLAG_JUST_EXPAND = 1 << 0,
> > +	/*
> > +	 * Internal flag used during the merge operation to indicate we will
> > +	 * remove vmg->middle.
> > +	 */
> > +	__VMG_FLAG_REMOVE_MIDDLE = 1 << 1,
> > +	/*
> > +	 * Internal flag used during the merge operationr to indicate we will
> > +	 * remove vmg->next.
> > +	 */
> > +	__VMG_FLAG_REMOVE_NEXT = 1 << 2,
> >  };
>
> Hm this is actually kinda weird? It's an enum, but the values of it are
> defined as different bits. And then struct vma_merge_struct has a "enum
> vma_merge_flags merge_flags;" but we don't store to it a single "enum
> vma_merge_flags" value defined above, but a combination of those. Is that
> even legal to do in C?

Yes it's legal to do. And we already did it. And other parts of the kernel do
it.

I get that it breaks a switch (enum val) { case ... } statement but we don't do
that.

>
> AFAIK the more common pattern is enum that has normal incremental values
> that are used for the shifts.
>
> But I don't think we need all of this at all here? Just have bitfields in
> struct vma_merge_struct?
>
> bool just_expand : 1;
> bool remove_middle : 1;

I find that ugly, and it necessitates the addition of a new field for every new
flag.

It also prevents any masking stuff going forward and clutters everything.

It also makes the interface confusiing, because now you have users having to
know there's a field that lets you do X rather than just a simple flags field
that can encapsulate all state.

And some of those fields are now internal...

If you were to insist we have to change this, then I'd pefer a set of defines
and the but then it'd be a question of whether we typedef something for that or
just pass an unsigned long.

I prefer having the type safety of the enum even if it pedantically 'not
correct'.

C doesn't give you many sane choices for this. I am doing my part to make rust
more of a thing in mm which will help on this front ;)

> ...
>
> >  /*
> > diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
> > index 3c0572120e94..8cce67237d86 100644
> > --- a/tools/testing/vma/vma.c
> > +++ b/tools/testing/vma/vma.c
> > @@ -154,6 +154,9 @@ static void vmg_set_range(struct vma_merge_struct *vmg, unsigned long start,
> >  	vmg->end = end;
> >  	vmg->pgoff = pgoff;
> >  	vmg->flags = flags;
> > +
> > +	vmg->merge_flags = VMG_FLAG_DEFAULT;
> > +	vmg->target = NULL;
> >  }
> >
> >  /*
>
Lorenzo Stoakes Jan. 28, 2025, 4:42 p.m. UTC | #4
On Tue, Jan 28, 2025 at 04:07:00PM +0000, Lorenzo Stoakes wrote:
> On Tue, Jan 28, 2025 at 04:45:01PM +0100, Vlastimil Babka wrote:
> > On 1/27/25 16:50, Lorenzo Stoakes wrote:
> > > --- a/mm/vma.h
> > > +++ b/mm/vma.h
> > > @@ -67,6 +67,16 @@ enum vma_merge_flags {
> > >  	 * at the gap.
> > >  	 */
> > >  	VMG_FLAG_JUST_EXPAND = 1 << 0,
> > > +	/*
> > > +	 * Internal flag used during the merge operation to indicate we will
> > > +	 * remove vmg->middle.
> > > +	 */
> > > +	__VMG_FLAG_REMOVE_MIDDLE = 1 << 1,
> > > +	/*
> > > +	 * Internal flag used during the merge operationr to indicate we will
> > > +	 * remove vmg->next.
> > > +	 */
> > > +	__VMG_FLAG_REMOVE_NEXT = 1 << 2,
> > >  };
> >
> > Hm this is actually kinda weird? It's an enum, but the values of it are
> > defined as different bits. And then struct vma_merge_struct has a "enum
> > vma_merge_flags merge_flags;" but we don't store to it a single "enum
> > vma_merge_flags" value defined above, but a combination of those. Is that
> > even legal to do in C?
>
> Yes it's legal to do. And we already did it. And other parts of the kernel do
> it.
>
> I get that it breaks a switch (enum val) { case ... } statement but we don't do
> that.
>
> >
> > AFAIK the more common pattern is enum that has normal incremental values
> > that are used for the shifts.
> >
> > But I don't think we need all of this at all here? Just have bitfields in
> > struct vma_merge_struct?
> >
> > bool just_expand : 1;
> > bool remove_middle : 1;
>
> I find that ugly, and it necessitates the addition of a new field for every new
> flag.
>
> It also prevents any masking stuff going forward and clutters everything.
>
> It also makes the interface confusiing, because now you have users having to
> know there's a field that lets you do X rather than just a simple flags field
> that can encapsulate all state.
>
> And some of those fields are now internal...
>
> If you were to insist we have to change this, then I'd pefer a set of defines
> and the but then it'd be a question of whether we typedef something for that or
> just pass an unsigned long.
>
> I prefer having the type safety of the enum even if it pedantically 'not
> correct'.
>
> C doesn't give you many sane choices for this. I am doing my part to make rust
> more of a thing in mm which will help on this front ;)
>

Actually, looking more closely, this is not a common pattern and the weirdness
you say is confusing.

The alternative of a bare flags field sucks badly, so while I dislike the
aesthetics of the bitfields, the fact you can't mask, the fact it's not a
clean 'state parameter' now, it's probably marginally better overall
vs. alternatives.

C doesn't help you here very much... some other languages have a concept of
a 'flags enum' or at least a specifically-typed value that can be used this
way.

We can add comments to make this less sucky like:

/* Flags which callers can use to modify merge behaviour */
bool just_expand :1;
/* Internal flags set during merge process */
bool __blahdy_blah :1;

TL;DR - will convert to bitfields, you're right I'm wrong, beer + pizzas in
Prague soon! ;)


> > ...
> >
> > >  /*
> > > diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
> > > index 3c0572120e94..8cce67237d86 100644
> > > --- a/tools/testing/vma/vma.c
> > > +++ b/tools/testing/vma/vma.c
> > > @@ -154,6 +154,9 @@ static void vmg_set_range(struct vma_merge_struct *vmg, unsigned long start,
> > >  	vmg->end = end;
> > >  	vmg->pgoff = pgoff;
> > >  	vmg->flags = flags;
> > > +
> > > +	vmg->merge_flags = VMG_FLAG_DEFAULT;
> > > +	vmg->target = NULL;
> > >  }
> > >
> > >  /*
> >
>
diff mbox series

Patch

diff --git a/mm/vma.c b/mm/vma.c
index 68a301a76297..955c5ebd5739 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -120,8 +120,8 @@  static void init_multi_vma_prep(struct vma_prepare *vp,
 	memset(vp, 0, sizeof(struct vma_prepare));
 	vp->vma = vma;
 	vp->anon_vma = vma->anon_vma;
-	vp->remove = remove;
-	vp->remove2 = remove2;
+	vp->remove = remove ? remove : remove2;
+	vp->remove2 = remove ? remove2 : NULL;
 	vp->adj_next = next;
 	if (!vp->anon_vma && next)
 		vp->anon_vma = next->anon_vma;
@@ -129,7 +129,6 @@  static void init_multi_vma_prep(struct vma_prepare *vp,
 	vp->file = vma->vm_file;
 	if (vp->file)
 		vp->mapping = vma->vm_file->f_mapping;
-
 }
 
 /*
@@ -629,22 +628,40 @@  void validate_mm(struct mm_struct *mm)
 }
 #endif /* CONFIG_DEBUG_VM_MAPLE_TREE */
 
-/* Actually perform the VMA merge operation. */
-static int commit_merge(struct vma_merge_struct *vmg,
-			struct vm_area_struct *adjust,
-			struct vm_area_struct *remove,
-			struct vm_area_struct *remove2,
-			long adj_start,
-			bool expanded)
+/*
+ * Actually perform the VMA merge operation.
+ *
+ * On success, returns the merged VMA. Otherwise returns NULL.
+ */
+static struct vm_area_struct *commit_merge(struct vma_merge_struct *vmg,
+		long adj_start)
 {
 	struct vma_prepare vp;
+	struct vm_area_struct *remove = NULL;
+	struct vm_area_struct *remove2 = NULL;
+	struct vm_area_struct *adjust = NULL;
+	/*
+	 * In all cases but that of merge right, shrink next, we write
+	 * vmg->target to the maple tree and return this as the merged VMA.
+	 */
+	bool merge_target = adj_start >= 0;
+
+	if (vmg->merge_flags & __VMG_FLAG_REMOVE_MIDDLE)
+		remove = vmg->middle;
+	if (vmg->merge_flags & __VMG_FLAG_REMOVE_NEXT)
+		remove2 = vmg->next;
+
+	if (adj_start > 0)
+		adjust = vmg->middle;
+	else if (adj_start < 0)
+		adjust = vmg->next;
 
 	init_multi_vma_prep(&vp, vmg->target, adjust, remove, remove2);
 
 	VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
 		   vp.anon_vma != adjust->anon_vma);
 
-	if (expanded) {
+	if (merge_target) {
 		/* Note: vma iterator must be pointing to 'start'. */
 		vma_iter_config(vmg->vmi, vmg->start, vmg->end);
 	} else {
@@ -653,27 +670,26 @@  static int commit_merge(struct vma_merge_struct *vmg,
 	}
 
 	if (vma_iter_prealloc(vmg->vmi, vmg->target))
-		return -ENOMEM;
+		return NULL;
 
 	vma_prepare(&vp);
 	vma_adjust_trans_huge(vmg->target, vmg->start, vmg->end, adj_start);
 	vma_set_range(vmg->target, vmg->start, vmg->end, vmg->pgoff);
 
-	if (expanded)
+	if (merge_target)
 		vma_iter_store(vmg->vmi, vmg->target);
 
 	if (adj_start) {
 		adjust->vm_start += adj_start;
 		adjust->vm_pgoff += PHYS_PFN(adj_start);
-		if (adj_start < 0) {
-			WARN_ON(expanded);
+
+		if (!merge_target)
 			vma_iter_store(vmg->vmi, adjust);
-		}
 	}
 
 	vma_complete(&vp, vmg->vmi, vmg->target->vm_mm);
 
-	return 0;
+	return merge_target ? vmg->target : vmg->next;
 }
 
 /* We can only remove VMAs when merging if they do not have a close hook. */
@@ -718,7 +734,6 @@  static __must_check struct vm_area_struct *vma_merge_existing_range(
 	struct vm_area_struct *prev = vmg->prev;
 	struct vm_area_struct *next, *res;
 	struct vm_area_struct *anon_dup = NULL;
-	struct vm_area_struct *adjust = NULL;
 	unsigned long start = vmg->start;
 	unsigned long end = vmg->end;
 	bool left_side = middle && start == middle->vm_start;
@@ -727,7 +742,6 @@  static __must_check struct vm_area_struct *vma_merge_existing_range(
 	long adj_start = 0;
 	bool merge_will_delete_middle, merge_will_delete_next;
 	bool merge_left, merge_right, merge_both;
-	bool expanded;
 
 	mmap_assert_write_locked(vmg->mm);
 	VM_WARN_ON_VMG(!middle, vmg); /* We are modifying a VMA, so caller must specify. */
@@ -846,10 +860,11 @@  static __must_check struct vm_area_struct *vma_merge_existing_range(
 		vmg->start = prev->vm_start;
 		vmg->pgoff = prev->vm_pgoff;
 
-		if (!merge_will_delete_middle) {
-			adjust = middle;
+		/*
+		 * We both expand prev and shrink middle.
+		 */
+		if (!merge_will_delete_middle)
 			adj_start = vmg->end - middle->vm_start;
-		}
 
 		err = dup_anon_vma(prev, middle, &anon_dup);
 	} else { /* merge_right */
@@ -883,7 +898,6 @@  static __must_check struct vm_area_struct *vma_merge_existing_range(
 			vmg->end = start;
 			vmg->pgoff = middle->vm_pgoff;
 
-			adjust = next;
 			adj_start = -(middle->vm_end - start);
 		}
 
@@ -893,17 +907,13 @@  static __must_check struct vm_area_struct *vma_merge_existing_range(
 	if (err)
 		goto abort;
 
-	/*
-	 * In nearly all cases, we expand vmg->middle. There is one exception -
-	 * merge_right where we partially span the VMA. In this case we shrink
-	 * the end of vmg->middle and adjust the start of vmg->next accordingly.
-	 */
-	expanded = !merge_right || merge_will_delete_middle;
+	if (merge_will_delete_middle)
+		vmg->merge_flags |= __VMG_FLAG_REMOVE_MIDDLE;
+	if (merge_will_delete_next)
+		vmg->merge_flags |= __VMG_FLAG_REMOVE_NEXT;
 
-	if (commit_merge(vmg, adjust,
-			 merge_will_delete_middle ? middle : NULL,
-			 merge_will_delete_next ? next : NULL,
-			 adj_start, expanded)) {
+	res = commit_merge(vmg, adj_start);
+	if (!res) {
 		if (anon_dup)
 			unlink_anon_vmas(anon_dup);
 
@@ -911,9 +921,7 @@  static __must_check struct vm_area_struct *vma_merge_existing_range(
 		return NULL;
 	}
 
-	res = merge_left ? prev : next;
 	khugepaged_enter_vma(res, vmg->flags);
-
 	vmg->state = VMA_MERGE_SUCCESS;
 	return res;
 
@@ -1076,7 +1084,10 @@  int vma_expand(struct vma_merge_struct *vmg)
 		       middle->vm_end > vmg->end, vmg);
 
 	vmg->target = middle;
-	if (commit_merge(vmg, NULL, remove_next ? next : NULL, NULL, 0, true))
+	if (remove_next)
+		vmg->merge_flags |= __VMG_FLAG_REMOVE_NEXT;
+
+	if (!commit_merge(vmg, 0))
 		goto nomem;
 
 	return 0;
diff --git a/mm/vma.h b/mm/vma.h
index 5b5dd07e478c..ffbfefb9a83d 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -67,6 +67,16 @@  enum vma_merge_flags {
 	 * at the gap.
 	 */
 	VMG_FLAG_JUST_EXPAND = 1 << 0,
+	/*
+	 * Internal flag used during the merge operation to indicate we will
+	 * remove vmg->middle.
+	 */
+	__VMG_FLAG_REMOVE_MIDDLE = 1 << 1,
+	/*
+	 * Internal flag used during the merge operationr to indicate we will
+	 * remove vmg->next.
+	 */
+	__VMG_FLAG_REMOVE_NEXT = 1 << 2,
 };
 
 /*
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index 3c0572120e94..8cce67237d86 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -154,6 +154,9 @@  static void vmg_set_range(struct vma_merge_struct *vmg, unsigned long start,
 	vmg->end = end;
 	vmg->pgoff = pgoff;
 	vmg->flags = flags;
+
+	vmg->merge_flags = VMG_FLAG_DEFAULT;
+	vmg->target = NULL;
 }
 
 /*