diff mbox series

[1/5] kselftests: vm: Add mremap tests

Message ID 20200930222130.4175584-2-kaleshsingh@google.com (mailing list archive)
State New, archived
Headers show
Series Speed up mremap on large regions | expand

Commit Message

Kalesh Singh Sept. 30, 2020, 10:21 p.m. UTC
Test mremap on regions of various sizes and alignments and validate
data after remapping. Also provide total time for remapping
the region which is useful for performance comparison of the mremap
optimizations that move pages at the PMD/PUD levels if HAVE_MOVE_PMD
and/or HAVE_MOVE_PUD are enabled.

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
 tools/testing/selftests/vm/.gitignore    |   1 +
 tools/testing/selftests/vm/Makefile      |   1 +
 tools/testing/selftests/vm/mremap_test.c | 243 +++++++++++++++++++++++
 tools/testing/selftests/vm/run_vmtests   |  11 +
 4 files changed, 256 insertions(+)
 create mode 100644 tools/testing/selftests/vm/mremap_test.c

Comments

John Hubbard Oct. 1, 2020, 7:24 a.m. UTC | #1
On 9/30/20 3:21 PM, Kalesh Singh wrote:
> Test mremap on regions of various sizes and alignments and validate
> data after remapping. Also provide total time for remapping
> the region which is useful for performance comparison of the mremap
> optimizations that move pages at the PMD/PUD levels if HAVE_MOVE_PMD
> and/or HAVE_MOVE_PUD are enabled.
> 
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>   tools/testing/selftests/vm/.gitignore    |   1 +
>   tools/testing/selftests/vm/Makefile      |   1 +
>   tools/testing/selftests/vm/mremap_test.c | 243 +++++++++++++++++++++++
>   tools/testing/selftests/vm/run_vmtests   |  11 +
>   4 files changed, 256 insertions(+)
>   create mode 100644 tools/testing/selftests/vm/mremap_test.c
> 

Hi,

This takes 100x longer to run than it should: 1:46 min of running time on
my x86_64 test machine. The entire selftests/vm test suite takes 45 sec on a
bad day, where a bad day is defined as up until about tomorrow, when I will
post a compaction_test.c patch that will cut that time down to about half, or
24 sec total run time...for 22 tests!

In other words, most tests here should take about 1 or 2 seconds, unless they
are exceptionally special snowflakes.

At the very least, the invocation within run_vmtests could pass in a parameter
to tell it to run a shorter test. But there's also opportunities to speed it
up, too.

...
> +
> +#define MAKE_TEST(source_align, destination_align, size,	\
> +		  overlaps, should_fail, test_name)		\
> +{								\
> +	.name = test_name,					\
> +	.config = {						\
> +		.src_alignment = source_align,			\
> +		.dest_alignment = destination_align,		\
> +		.region_size = size,				\
> +		.overlapping = overlaps,			\
> +	},							\
> +	.expect_failure = should_fail				\
> +}
> +

OK...

> +#define MAKE_SIMPLE_TEST(source_align, destination_align, size)	\
> +	MAKE_TEST(source_align, destination_align, size, 0, 0,	\
> +		  #size " mremap - Source " #source_align	\
> +		  " aligned, Destination " #destination_align	\
> +		  " aligned")
> +

...and not OK. :)  Because this is just obscuring things. Both the
code and the output are harder to read. For these tiny test programs,
clarity is what we want, not necessarily compactness on the screen.
Because people want to get in, understand what they seen in the code
and match it up with what is printed to stdout--without spending much
time. (And that includes run time, as hinted at above.)

...
> +
> +/* Returns the time taken for the remap on success else returns -1. */
> +static long long remap_region(struct config c)
> +{
> +	void *addr, *src_addr, *dest_addr;
> +	int i, j;
> +	struct timespec t_start = {0, 0}, t_end = {0, 0};
> +	long long  start_ns, end_ns, align_mask, ret, offset;
> +	char pattern[] = {0xa8, 0xcd, 0xfe};

I'd recommend using rand() to help choose the pattern, and using different
patterns for different runs. When testing memory, it's a pitfall to have
the same test pattern.

Normally, you'd also want to report the random seed or the test pattern(s)
or both to stdout, and provide a way to run with the same pattern, but
here I don't *think* you care: all patterns should have the same performance.

> +	int pattern_size = ARRAY_SIZE(pattern);
> +
> +	src_addr = get_source_mapping(c);
> +	if (!src_addr) {
> +		ret = -1;
> +		goto out;
> +	}
> +
> +	/* Set byte pattern */
> +	for (i = 0; i < c.region_size; i++) {
> +		for (j = 0; i+j < c.region_size && j < pattern_size; j++)
> +			memset((char *) src_addr + i+j, pattern[j], 1);
> +		i += pattern_size-1;
> +	}
> +
> +	align_mask = ~(c.dest_alignment - 1);
> +	offset = (c.overlapping) ? -c.dest_alignment : c.dest_alignment;

A comment for what the above two lines are doing would be a nice touch.

...
> +	start_ns = t_start.tv_sec * 1000000000ULL + t_start.tv_nsec;
> +	end_ns = t_end.tv_sec * 1000000000ULL + t_end.tv_nsec;

A const or #defined for all those 0000's would help.

...
> +int main(int argc, char *argv[])
> +{
> +	int failures = 0;
> +	int i;
> +
> +	struct test test_cases[] = {
> +		/* Expected mremap failures */
> +		MAKE_TEST(_4KB, _4KB, _4KB, 1 /* overlaps */, 1 /* fails */,

Named flags instead of 1's and 0's would avoid the need for messy comments.

> +			  "mremap - Source and Destination Regions Overlapping"),
> +		MAKE_TEST(_4KB, _1KB, _4KB, 0 /* overlaps */, 1 /* fails */,
> +			  "mremap - Destination Address Misaligned (1KB-aligned)"),
> +		MAKE_TEST(_1KB, _4KB, _4KB, 0 /* overlaps */, 1 /* fails */,
> +			  "mremap - Source Address Misaligned (1KB-aligned)"),
> +
> +		/* Src addr PTE aligned */
> +		MAKE_SIMPLE_TEST(PTE, PTE, _8KB),
> +
> +		/* Src addr 1MB aligned */
> +		MAKE_SIMPLE_TEST(_1MB, PTE, _2MB),
> +		MAKE_SIMPLE_TEST(_1MB, _1MB, _2MB),
> +
> +		/* Src addr PMD aligned */
> +		MAKE_SIMPLE_TEST(PMD, PTE, _4MB),
> +		MAKE_SIMPLE_TEST(PMD, _1MB, _4MB),
> +		MAKE_SIMPLE_TEST(PMD, PMD, _4MB),
> +
> +		/* Src addr PUD aligned */
> +		MAKE_SIMPLE_TEST(PUD, PTE, _2GB),
> +		MAKE_SIMPLE_TEST(PUD, _1MB, _2GB),
> +		MAKE_SIMPLE_TEST(PUD, PMD, _2GB),
> +		MAKE_SIMPLE_TEST(PUD, PUD, _2GB),


Too concise. Not fun lining these up with the stdout report.


thanks,
Kalesh Singh Oct. 1, 2020, 3:46 p.m. UTC | #2
On Thu, Oct 1, 2020 at 3:24 AM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 9/30/20 3:21 PM, Kalesh Singh wrote:
> > Test mremap on regions of various sizes and alignments and validate
> > data after remapping. Also provide total time for remapping
> > the region which is useful for performance comparison of the mremap
> > optimizations that move pages at the PMD/PUD levels if HAVE_MOVE_PMD
> > and/or HAVE_MOVE_PUD are enabled.
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >   tools/testing/selftests/vm/.gitignore    |   1 +
> >   tools/testing/selftests/vm/Makefile      |   1 +
> >   tools/testing/selftests/vm/mremap_test.c | 243 +++++++++++++++++++++++
> >   tools/testing/selftests/vm/run_vmtests   |  11 +
> >   4 files changed, 256 insertions(+)
> >   create mode 100644 tools/testing/selftests/vm/mremap_test.c
> >
>
> Hi,
>
> This takes 100x longer to run than it should: 1:46 min of running time on
> my x86_64 test machine. The entire selftests/vm test suite takes 45 sec on a
> bad day, where a bad day is defined as up until about tomorrow, when I will
> post a compaction_test.c patch that will cut that time down to about half, or
> 24 sec total run time...for 22 tests!
>
> In other words, most tests here should take about 1 or 2 seconds, unless they
> are exceptionally special snowflakes.
>
> At the very least, the invocation within run_vmtests could pass in a parameter
> to tell it to run a shorter test. But there's also opportunities to speed it
> up, too.


Hi John. Thanks for the comments.

The bulk of the test time comes from setting and verifying the byte
pattern in 1GB
or larger regions for testing the HAVE_MOVE_PUD functionality. Without testing
1GB or larger regions the test takes 0.18 seconds on my x86_64 system.

One option I think would be to only validate to a certain threshold of the remap
region. We can have a flag to specify a threshold or to validate the
full size of the
remapped region. I did some initial testing with a 4MB threshold and
the total time
dropped to 0.38 seconds from 1:12 minutes (for verifying the entire
remapped region).
The 4MB threshold would cover the full region of all the tests
excluding those for the
1GB and 2GB sized regions. Let me know what you think.

Your other comments below sound good to me. I’ll make those changes in
the next version.

Thanks,
Kalesh

>
> ...
> > +
> > +#define MAKE_TEST(source_align, destination_align, size,     \
> > +               overlaps, should_fail, test_name)             \
> > +{                                                            \
> > +     .name = test_name,                                      \
> > +     .config = {                                             \
> > +             .src_alignment = source_align,                  \
> > +             .dest_alignment = destination_align,            \
> > +             .region_size = size,                            \
> > +             .overlapping = overlaps,                        \
> > +     },                                                      \
> > +     .expect_failure = should_fail                           \
> > +}
> > +
>
> OK...
>
> > +#define MAKE_SIMPLE_TEST(source_align, destination_align, size)      \
> > +     MAKE_TEST(source_align, destination_align, size, 0, 0,  \
> > +               #size " mremap - Source " #source_align       \
> > +               " aligned, Destination " #destination_align   \
> > +               " aligned")
> > +
>
> ...and not OK. :)  Because this is just obscuring things. Both the
> code and the output are harder to read. For these tiny test programs,
> clarity is what we want, not necessarily compactness on the screen.
> Because people want to get in, understand what they seen in the code
> and match it up with what is printed to stdout--without spending much
> time. (And that includes run time, as hinted at above.)
>
> ...
> > +
> > +/* Returns the time taken for the remap on success else returns -1. */
> > +static long long remap_region(struct config c)
> > +{
> > +     void *addr, *src_addr, *dest_addr;
> > +     int i, j;
> > +     struct timespec t_start = {0, 0}, t_end = {0, 0};
> > +     long long  start_ns, end_ns, align_mask, ret, offset;
> > +     char pattern[] = {0xa8, 0xcd, 0xfe};
>
> I'd recommend using rand() to help choose the pattern, and using different
> patterns for different runs. When testing memory, it's a pitfall to have
> the same test pattern.
>
> Normally, you'd also want to report the random seed or the test pattern(s)
> or both to stdout, and provide a way to run with the same pattern, but
> here I don't *think* you care: all patterns should have the same performance.
>
> > +     int pattern_size = ARRAY_SIZE(pattern);
> > +
> > +     src_addr = get_source_mapping(c);
> > +     if (!src_addr) {
> > +             ret = -1;
> > +             goto out;
> > +     }
> > +
> > +     /* Set byte pattern */
> > +     for (i = 0; i < c.region_size; i++) {
> > +             for (j = 0; i+j < c.region_size && j < pattern_size; j++)
> > +                     memset((char *) src_addr + i+j, pattern[j], 1);
> > +             i += pattern_size-1;
> > +     }
> > +
> > +     align_mask = ~(c.dest_alignment - 1);
> > +     offset = (c.overlapping) ? -c.dest_alignment : c.dest_alignment;
>
> A comment for what the above two lines are doing would be a nice touch.
>
> ...
> > +     start_ns = t_start.tv_sec * 1000000000ULL + t_start.tv_nsec;
> > +     end_ns = t_end.tv_sec * 1000000000ULL + t_end.tv_nsec;
>
> A const or #defined for all those 0000's would help.
>
> ...
> > +int main(int argc, char *argv[])
> > +{
> > +     int failures = 0;
> > +     int i;
> > +
> > +     struct test test_cases[] = {
> > +             /* Expected mremap failures */
> > +             MAKE_TEST(_4KB, _4KB, _4KB, 1 /* overlaps */, 1 /* fails */,
>
> Named flags instead of 1's and 0's would avoid the need for messy comments.
>
> > +                       "mremap - Source and Destination Regions Overlapping"),
> > +             MAKE_TEST(_4KB, _1KB, _4KB, 0 /* overlaps */, 1 /* fails */,
> > +                       "mremap - Destination Address Misaligned (1KB-aligned)"),
> > +             MAKE_TEST(_1KB, _4KB, _4KB, 0 /* overlaps */, 1 /* fails */,
> > +                       "mremap - Source Address Misaligned (1KB-aligned)"),
> > +
> > +             /* Src addr PTE aligned */
> > +             MAKE_SIMPLE_TEST(PTE, PTE, _8KB),
> > +
> > +             /* Src addr 1MB aligned */
> > +             MAKE_SIMPLE_TEST(_1MB, PTE, _2MB),
> > +             MAKE_SIMPLE_TEST(_1MB, _1MB, _2MB),
> > +
> > +             /* Src addr PMD aligned */
> > +             MAKE_SIMPLE_TEST(PMD, PTE, _4MB),
> > +             MAKE_SIMPLE_TEST(PMD, _1MB, _4MB),
> > +             MAKE_SIMPLE_TEST(PMD, PMD, _4MB),
> > +
> > +             /* Src addr PUD aligned */
> > +             MAKE_SIMPLE_TEST(PUD, PTE, _2GB),
> > +             MAKE_SIMPLE_TEST(PUD, _1MB, _2GB),
> > +             MAKE_SIMPLE_TEST(PUD, PMD, _2GB),
> > +             MAKE_SIMPLE_TEST(PUD, PUD, _2GB),
>
>
> Too concise. Not fun lining these up with the stdout report.
>
>
> thanks,
> --
> John Hubbard
> NVIDIA
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
John Hubbard Oct. 1, 2020, 6:36 p.m. UTC | #3
On 10/1/20 8:46 AM, Kalesh Singh wrote:
> On Thu, Oct 1, 2020 at 3:24 AM John Hubbard <jhubbard@nvidia.com> wrote:
>> On 9/30/20 3:21 PM, Kalesh Singh wrote:
...
> The bulk of the test time comes from setting and verifying the byte
> pattern in 1GB
> or larger regions for testing the HAVE_MOVE_PUD functionality. Without testing
> 1GB or larger regions the test takes 0.18 seconds on my x86_64 system.
> 
> One option I think would be to only validate to a certain threshold of the remap
> region. We can have a flag to specify a threshold or to validate the
> full size of the
> remapped region. I did some initial testing with a 4MB threshold and
> the total time
> dropped to 0.38 seconds from 1:12 minutes (for verifying the entire
> remapped region).
> The 4MB threshold would cover the full region of all the tests
> excluding those for the
> 1GB and 2GB sized regions. Let me know what you think.
>

That sounds beautiful.

thanks,
diff mbox series

Patch

diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore
index 849e8226395a..b3a183c36cb5 100644
--- a/tools/testing/selftests/vm/.gitignore
+++ b/tools/testing/selftests/vm/.gitignore
@@ -8,6 +8,7 @@  thuge-gen
 compaction_test
 mlock2-tests
 mremap_dontunmap
+mremap_test
 on-fault-limit
 transhuge-stress
 protection_keys
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index a9026706d597..f044808b45fa 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -16,6 +16,7 @@  TEST_GEN_FILES += map_populate
 TEST_GEN_FILES += mlock-random-test
 TEST_GEN_FILES += mlock2-tests
 TEST_GEN_FILES += mremap_dontunmap
+TEST_GEN_FILES += mremap_test
 TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
diff --git a/tools/testing/selftests/vm/mremap_test.c b/tools/testing/selftests/vm/mremap_test.c
new file mode 100644
index 000000000000..09dc9a1ef81f
--- /dev/null
+++ b/tools/testing/selftests/vm/mremap_test.c
@@ -0,0 +1,243 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Google LLC
+ */
+#define _GNU_SOURCE
+
+#include <errno.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <time.h>
+
+#include "../kselftest.h"
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+
+struct config {
+	unsigned long long src_alignment;
+	unsigned long long dest_alignment;
+	unsigned long long region_size;
+	int overlapping;
+};
+
+struct test {
+	const char *name;
+	struct config config;
+	int expect_failure;
+};
+
+enum {
+	_1KB = 1ULL << 10,	/* 1KB -> not page aligned */
+	_4KB = 4ULL << 10,
+	_8KB = 8ULL << 10,
+	_1MB = 1ULL << 20,
+	_2MB = 2ULL << 20,
+	_4MB = 4ULL << 20,
+	_1GB = 1ULL << 30,
+	_2GB = 2ULL << 30,
+	PTE = _4KB,
+	PMD = _2MB,
+	PUD = _1GB,
+};
+
+#define MAKE_TEST(source_align, destination_align, size,	\
+		  overlaps, should_fail, test_name)		\
+{								\
+	.name = test_name,					\
+	.config = {						\
+		.src_alignment = source_align,			\
+		.dest_alignment = destination_align,		\
+		.region_size = size,				\
+		.overlapping = overlaps,			\
+	},							\
+	.expect_failure = should_fail				\
+}
+
+#define MAKE_SIMPLE_TEST(source_align, destination_align, size)	\
+	MAKE_TEST(source_align, destination_align, size, 0, 0,	\
+		  #size " mremap - Source " #source_align	\
+		  " aligned, Destination " #destination_align	\
+		  " aligned")
+
+/*
+ * Returns the start address of the mapping on success, else returns
+ * NULL on failure.
+ */
+static void *get_source_mapping(struct config c)
+{
+	unsigned long long addr = 0ULL;
+	void *src_addr = NULL;
+retry:
+	addr += c.src_alignment;
+	src_addr = mmap((void *) addr, c.region_size, PROT_READ | PROT_WRITE,
+			MAP_FIXED | MAP_ANONYMOUS | MAP_SHARED, -1, 0);
+	if (src_addr == MAP_FAILED) {
+		if (errno == EPERM)
+			goto retry;
+		goto error;
+	}
+	/*
+	 * Check that the address is aligned to the specified alignment. Addresses
+	 * which have alignments that are multiples of that specified are not considered
+	 * valid. For instance, 1GB address is 2MB-aligned, however it will not be
+	 * considered valid for a requested alignment of 2MB. This is done to
+	 * reduce coincidental alignment in the tests.
+	 */
+	if (((unsigned long long) src_addr & (c.src_alignment - 1)) ||
+			!((unsigned long long) src_addr & c.src_alignment))
+		goto retry;
+
+	if (!src_addr)
+		goto error;
+
+	return src_addr;
+error:
+	ksft_print_msg("Failed to map source region: %s\n",
+			strerror(errno));
+	return NULL;
+}
+
+/* Returns the time taken for the remap on success else returns -1. */
+static long long remap_region(struct config c)
+{
+	void *addr, *src_addr, *dest_addr;
+	int i, j;
+	struct timespec t_start = {0, 0}, t_end = {0, 0};
+	long long  start_ns, end_ns, align_mask, ret, offset;
+	char pattern[] = {0xa8, 0xcd, 0xfe};
+	int pattern_size = ARRAY_SIZE(pattern);
+
+	src_addr = get_source_mapping(c);
+	if (!src_addr) {
+		ret = -1;
+		goto out;
+	}
+
+	/* Set byte pattern */
+	for (i = 0; i < c.region_size; i++) {
+		for (j = 0; i+j < c.region_size && j < pattern_size; j++)
+			memset((char *) src_addr + i+j, pattern[j], 1);
+		i += pattern_size-1;
+	}
+
+	align_mask = ~(c.dest_alignment - 1);
+	offset = (c.overlapping) ? -c.dest_alignment : c.dest_alignment;
+	addr = (void *) (((unsigned long long) src_addr + c.region_size + offset)
+			& align_mask);
+
+	/* See comment in get_source_mapping() */
+	if (!((unsigned long long) addr & c.dest_alignment))
+		addr = (void *) ((unsigned long long) addr | c.dest_alignment);
+
+	clock_gettime(CLOCK_MONOTONIC, &t_start);
+	dest_addr = mremap(src_addr, c.region_size, c.region_size,
+			MREMAP_MAYMOVE|MREMAP_FIXED, (char *) addr);
+	clock_gettime(CLOCK_MONOTONIC, &t_end);
+
+	if (dest_addr == MAP_FAILED) {
+		ksft_print_msg("mremap failed: %s\n", strerror(errno));
+		ret = -1;
+		goto clean_up_src;
+	}
+
+	/* Verify byte pattern after remapping */
+	for (i = 0; i < c.region_size; i++) {
+		for (j = 0; i+j < c.region_size && j < pattern_size; j++) {
+			if (((char *) dest_addr)[i+j] != (char) pattern[j]) {
+				ksft_print_msg("Data after remap doesn't match at offset %d\n",
+						i+j);
+				ksft_print_msg("Expected: %#x\t Got: %#x\n", pattern[j] & 0xff,
+						((char *) dest_addr)[i+j] & 0xff);
+				ret = -1;
+				goto clean_up_dest;
+			}
+		}
+		i += pattern_size-1;
+	}
+
+	start_ns = t_start.tv_sec * 1000000000ULL + t_start.tv_nsec;
+	end_ns = t_end.tv_sec * 1000000000ULL + t_end.tv_nsec;
+	ret = end_ns - start_ns;
+
+/*
+ * Since the destination address is specified using MREMAP_FIXED, subsequent mremap will unmap any
+ * previous mapping at the address range specified by dest_addr and region_size. This significantly
+ * affects the remap time of subsequent tests. So we clean up mappings after each test.
+ */
+clean_up_dest:
+	munmap(dest_addr, c.region_size);
+clean_up_src:
+	munmap(src_addr, c.region_size);
+out:
+	return ret;
+}
+
+static void run_mremap_test_case(struct test test_case, int *failures)
+{
+	long long remap_time = remap_region(test_case.config);
+
+	if (remap_time < 0) {
+		if (test_case.expect_failure)
+			ksft_test_result_pass("%s\n\tExpected mremap failure\n", test_case.name);
+		else {
+			ksft_test_result_fail("%s\n", test_case.name);
+			*failures += 1;
+		}
+	} else
+		ksft_test_result_pass("%s\n\tmremap time: %12lldns\n", test_case.name, remap_time);
+}
+
+int main(int argc, char *argv[])
+{
+	int failures = 0;
+	int i;
+
+	struct test test_cases[] = {
+		/* Expected mremap failures */
+		MAKE_TEST(_4KB, _4KB, _4KB, 1 /* overlaps */, 1 /* fails */,
+			  "mremap - Source and Destination Regions Overlapping"),
+		MAKE_TEST(_4KB, _1KB, _4KB, 0 /* overlaps */, 1 /* fails */,
+			  "mremap - Destination Address Misaligned (1KB-aligned)"),
+		MAKE_TEST(_1KB, _4KB, _4KB, 0 /* overlaps */, 1 /* fails */,
+			  "mremap - Source Address Misaligned (1KB-aligned)"),
+
+		/* Src addr PTE aligned */
+		MAKE_SIMPLE_TEST(PTE, PTE, _8KB),
+
+		/* Src addr 1MB aligned */
+		MAKE_SIMPLE_TEST(_1MB, PTE, _2MB),
+		MAKE_SIMPLE_TEST(_1MB, _1MB, _2MB),
+
+		/* Src addr PMD aligned */
+		MAKE_SIMPLE_TEST(PMD, PTE, _4MB),
+		MAKE_SIMPLE_TEST(PMD, _1MB, _4MB),
+		MAKE_SIMPLE_TEST(PMD, PMD, _4MB),
+
+		/* Src addr PUD aligned */
+		MAKE_SIMPLE_TEST(PUD, PTE, _2GB),
+		MAKE_SIMPLE_TEST(PUD, _1MB, _2GB),
+		MAKE_SIMPLE_TEST(PUD, PMD, _2GB),
+		MAKE_SIMPLE_TEST(PUD, PUD, _2GB),
+	};
+
+	struct test perf_test_cases[] = {
+		/* mremap 1GB region - Page table level aligned time comparison */
+		MAKE_SIMPLE_TEST(PTE, PTE, _1GB),
+		MAKE_SIMPLE_TEST(PMD, PMD, _1GB),
+		MAKE_SIMPLE_TEST(PUD, PUD, _1GB),
+	};
+
+	ksft_set_plan(ARRAY_SIZE(test_cases) + ARRAY_SIZE(perf_test_cases));
+
+	for (i = 0; i < ARRAY_SIZE(test_cases); i++)
+		run_mremap_test_case(test_cases[i], &failures);
+
+	ksft_print_msg("\nmremap HAVE_MOVE_PMD/PUD optimization time comparison for 1GB region:\n");
+	for (i = 0; i < ARRAY_SIZE(perf_test_cases); i++)
+		run_mremap_test_case(perf_test_cases[i], &failures);
+
+	if (failures > 0)
+		ksft_exit_fail();
+	else
+		ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests
index a3f4f30f0a2e..d578ad831813 100755
--- a/tools/testing/selftests/vm/run_vmtests
+++ b/tools/testing/selftests/vm/run_vmtests
@@ -241,6 +241,17 @@  else
 	echo "[PASS]"
 fi
 
+echo "-------------------"
+echo "running mremap_test"
+echo "-------------------"
+./mremap_test
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+	exitcode=1
+else
+	echo "[PASS]"
+fi
+
 echo "-----------------"
 echo "running thuge-gen"
 echo "-----------------"