[v4,2/5] arm64: mremap speedup - Enable HAVE_MOVE_PMD

Message ID 20201014005320.2233162-3-kaleshsingh@google.com
State New
Series Speed up mremap on large regions

Commit Message

Kalesh Singh Oct. 14, 2020, 12:53 a.m. UTC
HAVE_MOVE_PMD enables remapping pages at the PMD level if both the
source and destination addresses are PMD-aligned.
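
To make the alignment condition concrete, here is a minimal userspace sketch.
It assumes a 2 MiB PMD (4 KiB base pages); PMD_SIZE_ASSUMED and
both_pmd_aligned() are illustrative names, not kernel symbols, and the
kernel's own check may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, so one PMD entry maps 2 MiB. */
#define PMD_SIZE_ASSUMED ((uintptr_t)2 << 20)

/* True when both addresses sit exactly on a PMD boundary. */
static bool both_pmd_aligned(uintptr_t src, uintptr_t dst)
{
	return ((src | dst) & (PMD_SIZE_ASSUMED - 1)) == 0;
}
```

Under that assumption, a source at 0x40000000 and a destination at 0x80200000
qualify, while a source at 0x40001000 does not.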

HAVE_MOVE_PMD is already enabled on x86. The original patch [1] that
introduced this config did not enable it on arm64 at the time because
of performance issues with flushing the TLB on every PMD move. These
issues have since been addressed in more recent releases with
improvements to the arm64 TLB invalidation and core mmu_gather code as
Will Deacon mentioned in [2].

The data below show an approximately 8x improvement in performance when
HAVE_MOVE_PMD is enabled on arm64.

--------- Test Results ----------

The following results were obtained on an arm64 device running a 5.4
kernel, by remapping a PMD-aligned, 1GB sized region to a PMD-aligned
destination. The results from 10 iterations of the test are given below.
All times are in nanoseconds.

Control    HAVE_MOVE_PMD

9220833    1247761
9002552    1219896
9254115    1094792
8725885    1227760
9308646    1043698
9001667    1101771
8793385    1159896
8774636    1143594
9553125    1025833
9374010    1078125

9100885.4  1134312.6    <-- Mean Time in nanoseconds
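
The mean row and the speedup it implies can be re-derived from the ten
samples. The sketch below copies the numbers from the table; the helper
names mean() and speedup() are only for illustration.

```c
#include <stddef.h>

/* Timings copied from the table above, in nanoseconds. */
static const double control_ns[10] = {
	9220833, 9002552, 9254115, 8725885, 9308646,
	9001667, 8793385, 8774636, 9553125, 9374010,
};
static const double move_pmd_ns[10] = {
	1247761, 1219896, 1094792, 1227760, 1043698,
	1101771, 1159896, 1143594, 1025833, 1078125,
};

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Ratio of the control mean to the HAVE_MOVE_PMD mean. */
static double speedup(void)
{
	return mean(control_ns, 10) / mean(move_pmd_ns, 10);
}
```

The two means come out at 9100885.4 and 1134312.6, and their ratio is
roughly 8.02.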

Total mremap time for a 1GB-sized, PMD-aligned region drops from
~9.1 milliseconds to ~1.1 milliseconds (~8x speedup).
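
The test program itself is not part of this patch. A measurement of this
kind could be sketched in userspace as follows. This is an assumption-laden
reconstruction, not the author's benchmark: it assumes a 2 MiB PMD, and
reserve_aligned() and time_mremap_ns() are hypothetical helpers. The slack
around each reservation is deliberately left mapped to keep the sketch short.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for mremap() and the MREMAP_* flags */
#endif
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/* Assumption: 4 KiB base pages, so one PMD entry maps 2 MiB. */
#define PMD_SIZE_ASSUMED ((uintptr_t)2 << 20)

/* Map size + one PMD of anonymous memory and return a PMD-aligned start. */
static void *reserve_aligned(size_t size, int prot)
{
	void *p = mmap(NULL, size + PMD_SIZE_ASSUMED, prot,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	return (void *)(((uintptr_t)p + PMD_SIZE_ASSUMED - 1) &
			~(PMD_SIZE_ASSUMED - 1));
}

/* Time one mremap() of size bytes; returns nanoseconds, or -1 on failure. */
static long long time_mremap_ns(size_t size)
{
	struct timespec t0, t1;
	char *src = reserve_aligned(size, PROT_READ | PROT_WRITE);
	char *dst = reserve_aligned(size, PROT_NONE);
	void *res;

	if (!src || !dst)
		return -1;
	memset(src, 0x5a, size);	/* fault pages in so page tables exist */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	res = mremap(src, size, size, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	if (res != (void *)dst)		/* MAP_FAILED or an unexpected address */
		return -1;
	if (dst[0] != 0x5a || dst[size - 1] != 0x5a)
		return -1;		/* contents did not move with the mapping */

	return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (t1.tv_nsec - t0.tv_nsec);
}
```

Calling time_mremap_ns((size_t)1 << 30) would correspond to the 1GB case
described above; a smaller multiple of 2 MiB is enough to check that the
sketch runs.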

[1] https://lore.kernel.org/r/20181108181201.88826-3-joelaf@google.com
[2] https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg140837.html

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
Changes in v4:
  - Add Kirill's Acked-by.

 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

Comments

Will Deacon Oct. 15, 2020, 10:55 a.m. UTC | #1
On Wed, Oct 14, 2020 at 12:53:07AM +0000, Kalesh Singh wrote:
> HAVE_MOVE_PMD enables remapping pages at the PMD level if both the
> source and destination addresses are PMD-aligned.
> 
> HAVE_MOVE_PMD is already enabled on x86. The original patch [1] that
> introduced this config did not enable it on arm64 at the time because
> of performance issues with flushing the TLB on every PMD move. These
> issues have since been addressed in more recent releases with
> improvements to the arm64 TLB invalidation and core mmu_gather code as
> Will Deacon mentioned in [2].
> 
> The data below show an approximately 8x improvement in performance
> when HAVE_MOVE_PMD is enabled on arm64.
> 
> --------- Test Results ----------
> 
> The following results were obtained on an arm64 device running a 5.4
> kernel, by remapping a PMD-aligned, 1GB sized region to a PMD-aligned
> destination. The results from 10 iterations of the test are given below.
> All times are in nanoseconds.
> 
> Control    HAVE_MOVE_PMD
> 
> 9220833    1247761
> 9002552    1219896
> 9254115    1094792
> 8725885    1227760
> 9308646    1043698
> 9001667    1101771
> 8793385    1159896
> 8774636    1143594
> 9553125    1025833
> 9374010    1078125
> 
> 9100885.4  1134312.6    <-- Mean Time in nanoseconds
> 
> Total mremap time for a 1GB-sized, PMD-aligned region drops from
> ~9.1 milliseconds to ~1.1 milliseconds (~8x speedup).
> 
> [1] https://lore.kernel.org/r/20181108181201.88826-3-joelaf@google.com
> [2] https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg140837.html
> 
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
> Changes in v4:
>   - Add Kirill's Acked-by.

Argh, I thought we already enabled this for PMDs back in 2018! Looks like
we forgot to actually do that after I improved the performance of the TLB
invalidation.

I'll pick this one patch up for 5.10.

Will

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 4b136e923ccb..434d6791e869 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -123,6 +123,7 @@  config ARM64
 	select GENERIC_VDSO_TIME_NS
 	select HANDLE_DOMAIN_IRQ
 	select HARDIRQS_SW_RESEND
+	select HAVE_MOVE_PMD
 	select HAVE_PCI
 	select HAVE_ACPI_APEI if (ACPI && EFI)
 	select HAVE_ALIGNED_STRUCT_PAGE if SLUB