diff mbox series

[v8,3/3] mm: add anonymous vma name refcounting

Message ID 20210827191858.2037087-4-surenb@google.com (mailing list archive)
State New
Headers show
Series Anonymous VMA naming patches | expand

Commit Message

Suren Baghdasaryan Aug. 27, 2021, 7:18 p.m. UTC
While forking a process with high number (64K) of named anonymous vmas the
overhead caused by strdup() is noticeable. Experiments with ARM64 Android
device show up to 40% performance regression when forking a process with
64k unpopulated anonymous vmas using the max name lengths vs the same
process with the same number of anonymous vmas having no name.
Introduce anon_vma_name refcounted structure to avoid the overhead of
copying vma names during fork() and when splitting named anonymous vmas.
When a vma is duplicated, instead of copying the name we increment the
refcount of this structure. Multiple vmas can point to the same
anon_vma_name as long as they increment the refcount. The name member of
anon_vma_name structure is assigned at structure allocation time and is
never changed. If vma name changes then the refcount of the original
structure is dropped, a new anon_vma_name structure is allocated
to hold the new name and the vma pointer is updated to point to the new
structure.
With this approach the fork() performance regressions is reduced 3-4x
times and with usecases using more reasonable number of VMAs (a few
thousand) the regressions is not measurable.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/mm_types.h |  9 ++++++++-
 mm/madvise.c             | 42 +++++++++++++++++++++++++++++++++-------
 2 files changed, 43 insertions(+), 8 deletions(-)

Comments

Kees Cook Aug. 28, 2021, 5:28 a.m. UTC | #1
On Fri, Aug 27, 2021 at 12:18:58PM -0700, Suren Baghdasaryan wrote:
> While forking a process with high number (64K) of named anonymous vmas the
> overhead caused by strdup() is noticeable. Experiments with ARM64 Android
> device show up to 40% performance regression when forking a process with
> 64k unpopulated anonymous vmas using the max name lengths vs the same
> process with the same number of anonymous vmas having no name.
> Introduce anon_vma_name refcounted structure to avoid the overhead of
> copying vma names during fork() and when splitting named anonymous vmas.
> When a vma is duplicated, instead of copying the name we increment the
> refcount of this structure. Multiple vmas can point to the same
> anon_vma_name as long as they increment the refcount. The name member of
> anon_vma_name structure is assigned at structure allocation time and is
> never changed. If vma name changes then the refcount of the original
> structure is dropped, a new anon_vma_name structure is allocated
> to hold the new name and the vma pointer is updated to point to the new
> structure.
> With this approach the fork() performance regressions is reduced 3-4x
> times and with usecases using more reasonable number of VMAs (a few
> thousand) the regressions is not measurable.

I like the refcounting; thank you!

Since patch2 adds a lot of things that are changed by patch3; maybe
combine them?
Suren Baghdasaryan Aug. 28, 2021, 9:13 p.m. UTC | #2
On Fri, Aug 27, 2021 at 10:28 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Aug 27, 2021 at 12:18:58PM -0700, Suren Baghdasaryan wrote:
> > While forking a process with high number (64K) of named anonymous vmas the
> > overhead caused by strdup() is noticeable. Experiments with ARM64 Android
> > device show up to 40% performance regression when forking a process with
> > 64k unpopulated anonymous vmas using the max name lengths vs the same
> > process with the same number of anonymous vmas having no name.
> > Introduce anon_vma_name refcounted structure to avoid the overhead of
> > copying vma names during fork() and when splitting named anonymous vmas.
> > When a vma is duplicated, instead of copying the name we increment the
> > refcount of this structure. Multiple vmas can point to the same
> > anon_vma_name as long as they increment the refcount. The name member of
> > anon_vma_name structure is assigned at structure allocation time and is
> > never changed. If vma name changes then the refcount of the original
> > structure is dropped, a new anon_vma_name structure is allocated
> > to hold the new name and the vma pointer is updated to point to the new
> > structure.
> > With this approach the fork() performance regressions is reduced 3-4x
> > times and with usecases using more reasonable number of VMAs (a few
> > thousand) the regressions is not measurable.
>
> I like the refcounting; thank you!
>
> Since patch2 adds a lot of things that are changed by patch3; maybe
> combine them?

I thought it would be easier to review with the main logic being
written using a basic type (string) first and then replace the basic
type with a more complex refcounted structure. Also, if someone would
like to rerun the tests and measure the regression of strdup vs
refcounting approach, keeping this patch separate makes it easier to
set up these tests.
If that's not convenient I can absolutely squash them together.

>
> --
> Kees Cook
Rolf Eike Beer Aug. 30, 2021, 7:03 a.m. UTC | #3
Am Freitag, 27. August 2021, 21:18:58 CEST schrieb Suren Baghdasaryan:
> While forking a process with high number (64K) of named anonymous vmas the
> overhead caused by strdup() is noticeable. Experiments with ARM64 Android
> device show up to 40% performance regression when forking a process with
> 64k unpopulated anonymous vmas using the max name lengths vs the same
> process with the same number of anonymous vmas having no name.
> Introduce anon_vma_name refcounted structure to avoid the overhead of
> copying vma names during fork() and when splitting named anonymous vmas.
> When a vma is duplicated, instead of copying the name we increment the
> refcount of this structure. Multiple vmas can point to the same
> anon_vma_name as long as they increment the refcount. The name member of
> anon_vma_name structure is assigned at structure allocation time and is
> never changed. If vma name changes then the refcount of the original
> structure is dropped, a new anon_vma_name structure is allocated
> to hold the new name and the vma pointer is updated to point to the new
> structure.
> With this approach the fork() performance regressions is reduced 3-4x
> times and with usecases using more reasonable number of VMAs (a few
> thousand) the regressions is not measurable.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  include/linux/mm_types.h |  9 ++++++++-
>  mm/madvise.c             | 42 +++++++++++++++++++++++++++++++++-------
>  2 files changed, 43 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index bc029f3fca6a..32ac5dc5ebf3 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -63,6 +63,27 @@ static int madvise_need_mmap_write(int behavior)
>  	}
>  }
> 
> +static struct anon_vma_name *anon_vma_name_alloc(const char *name)
> +{
> +	struct anon_vma_name *anon_name;
> +	size_t len = strlen(name);
> +
> +	/* Add 1 for NUL terminator at the end of the anon_name->name */
> +	anon_name = kzalloc(sizeof(*anon_name) + len + 1,
> +			    GFP_KERNEL);
> +	kref_init(&anon_name->kref);
> +	strcpy(anon_name->name, name);
> +
> +	return anon_name;
> +}

Given that you overwrite anything in that struct anyway this could be reduced 
to kmalloc(), no? And it definitely needs a NULL check.

Eike
Suren Baghdasaryan Aug. 30, 2021, 4:12 p.m. UTC | #4
On Mon, Aug 30, 2021 at 12:03 AM Rolf Eike Beer <eb@emlix.com> wrote:
>
> Am Freitag, 27. August 2021, 21:18:58 CEST schrieb Suren Baghdasaryan:
> > While forking a process with high number (64K) of named anonymous vmas the
> > overhead caused by strdup() is noticeable. Experiments with ARM64 Android
> > device show up to 40% performance regression when forking a process with
> > 64k unpopulated anonymous vmas using the max name lengths vs the same
> > process with the same number of anonymous vmas having no name.
> > Introduce anon_vma_name refcounted structure to avoid the overhead of
> > copying vma names during fork() and when splitting named anonymous vmas.
> > When a vma is duplicated, instead of copying the name we increment the
> > refcount of this structure. Multiple vmas can point to the same
> > anon_vma_name as long as they increment the refcount. The name member of
> > anon_vma_name structure is assigned at structure allocation time and is
> > never changed. If vma name changes then the refcount of the original
> > structure is dropped, a new anon_vma_name structure is allocated
> > to hold the new name and the vma pointer is updated to point to the new
> > structure.
> > With this approach the fork() performance regressions is reduced 3-4x
> > times and with usecases using more reasonable number of VMAs (a few
> > thousand) the regressions is not measurable.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> >  include/linux/mm_types.h |  9 ++++++++-
> >  mm/madvise.c             | 42 +++++++++++++++++++++++++++++++++-------
> >  2 files changed, 43 insertions(+), 8 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index bc029f3fca6a..32ac5dc5ebf3 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -63,6 +63,27 @@ static int madvise_need_mmap_write(int behavior)
> >       }
> >  }
> >
> > +static struct anon_vma_name *anon_vma_name_alloc(const char *name)
> > +{
> > +     struct anon_vma_name *anon_name;
> > +     size_t len = strlen(name);
> > +
> > +     /* Add 1 for NUL terminator at the end of the anon_name->name */
> > +     anon_name = kzalloc(sizeof(*anon_name) + len + 1,
> > +                         GFP_KERNEL);
> > +     kref_init(&anon_name->kref);
> > +     strcpy(anon_name->name, name);
> > +
> > +     return anon_name;
> > +}
>
> Given that you overwrite anything in that struct anyway this could be reduced
> to kmalloc(), no? And it definitely needs a NULL check.

Ack. I'll address both points in the next revision.
Thanks!
Suren.

>
> Eike
> --
> Rolf Eike Beer, emlix GmbH, http://www.emlix.com
> Fon +49 551 30664-0, Fax +49 551 30664-11
> Gothaer Platz 3, 37083 Göttingen, Germany
> Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160
> Geschäftsführung: Heike Jordan, Dr. Uwe Kracke – Ust-IdNr.: DE 205 198 055
>
> emlix - smart embedded open source
diff mbox series

Patch

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 26a30f7a5228..a7361acf2921 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -5,6 +5,7 @@ 
 #include <linux/mm_types_task.h>
 
 #include <linux/auxvec.h>
+#include <linux/kref.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/rbtree.h>
@@ -302,6 +303,12 @@  struct vm_userfaultfd_ctx {
 struct vm_userfaultfd_ctx {};
 #endif /* CONFIG_USERFAULTFD */
 
+struct anon_vma_name {
+	struct kref kref;
+	/* The name needs to be at the end because it is dynamically sized. */
+	char name[];
+};
+
 /*
  * This struct describes a virtual memory area. There is one of these
  * per VM-area/task. A VM area is any part of the process virtual memory
@@ -353,7 +360,7 @@  struct vm_area_struct {
 			unsigned long rb_subtree_last;
 		} shared;
 		/* Serialized by mmap_sem. */
-		char *anon_name;
+		struct anon_vma_name *anon_name;
 	};
 
 	/*
diff --git a/mm/madvise.c b/mm/madvise.c
index bc029f3fca6a..32ac5dc5ebf3 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -63,6 +63,27 @@  static int madvise_need_mmap_write(int behavior)
 	}
 }
 
+static struct anon_vma_name *anon_vma_name_alloc(const char *name)
+{
+	struct anon_vma_name *anon_name;
+	size_t len = strlen(name);
+
+	/* Add 1 for NUL terminator at the end of the anon_name->name */
+	anon_name = kzalloc(sizeof(*anon_name) + len + 1,
+			    GFP_KERNEL);
+	kref_init(&anon_name->kref);
+	strcpy(anon_name->name, name);
+
+	return anon_name;
+}
+
+static void vma_anon_name_free(struct kref *kref)
+{
+	struct anon_vma_name *anon_name =
+			container_of(kref, struct anon_vma_name, kref);
+	kfree(anon_name);
+}
+
 static inline bool has_vma_anon_name(struct vm_area_struct *vma)
 {
 	return !vma->vm_file && vma->anon_name;
@@ -75,7 +96,7 @@  const char *vma_anon_name(struct vm_area_struct *vma)
 
 	mmap_assert_locked(vma->vm_mm);
 
-	return vma->anon_name;
+	return vma->anon_name->name;
 }
 
 void dup_vma_anon_name(struct vm_area_struct *orig_vma,
@@ -84,37 +105,44 @@  void dup_vma_anon_name(struct vm_area_struct *orig_vma,
 	if (!has_vma_anon_name(orig_vma))
 		return;
 
-	new_vma->anon_name = kstrdup(orig_vma->anon_name, GFP_KERNEL);
+	kref_get(&orig_vma->anon_name->kref);
+	new_vma->anon_name = orig_vma->anon_name;
 }
 
 void free_vma_anon_name(struct vm_area_struct *vma)
 {
+	struct anon_vma_name *anon_name;
+
 	if (!has_vma_anon_name(vma))
 		return;
 
-	kfree(vma->anon_name);
+	anon_name = vma->anon_name;
 	vma->anon_name = NULL;
+	kref_put(&anon_name->kref, vma_anon_name_free);
 }
 
 /* mmap_lock should be write-locked */
 static void replace_vma_anon_name(struct vm_area_struct *vma, const char *name)
 {
+	const char *anon_name;
+
 	if (!name) {
 		free_vma_anon_name(vma);
 		return;
 	}
 
-	if (vma->anon_name) {
+	anon_name = vma_anon_name(vma);
+	if (anon_name) {
 		/* Should never happen, to dup use dup_vma_anon_name() */
-		WARN_ON(vma->anon_name == name);
+		WARN_ON(anon_name == name);
 
 		/* Same name, nothing to do here */
-		if (!strcmp(name, vma->anon_name))
+		if (!strcmp(name, anon_name))
 			return;
 
 		free_vma_anon_name(vma);
 	}
-	vma->anon_name = kstrdup(name, GFP_KERNEL);
+	vma->anon_name = anon_vma_name_alloc(name);
 }
 
 /*