[v5,01/10] mm: x86, arm64: add arch_has_hw_pte_young()

Message ID 20211111041510.402534-2-yuzhao@google.com (mailing list archive)
State New, archived
Series Multigenerational LRU Framework

Commit Message

Yu Zhao Nov. 11, 2021, 4:15 a.m. UTC
Some architectures automatically set the accessed bit in PTEs, e.g.,
x86 and arm64 v8.2. On architectures that do not have this capability,
clearing the accessed bit in a PTE triggers a page fault following the
TLB miss of this PTE.

Being aware of this capability can help make better decisions, i.e.,
whether to limit the size of each batch of PTEs and the burst of
batches when clearing the accessed bit.

Signed-off-by: Yu Zhao <yuzhao@google.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
---
 arch/arm64/include/asm/cpufeature.h |  5 +++++
 arch/arm64/include/asm/pgtable.h    | 13 ++++++++-----
 arch/arm64/kernel/cpufeature.c      | 10 ++++++++++
 arch/arm64/tools/cpucaps            |  1 +
 arch/x86/include/asm/pgtable.h      |  6 +++---
 include/linux/pgtable.h             | 13 +++++++++++++
 mm/memory.c                         | 14 +-------------
 7 files changed, 41 insertions(+), 21 deletions(-)
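
A minimal sketch of the kind of decision the commit message describes,
assuming a hypothetical caller and batch sizes that are not part of this
patch:

/*
 * Illustration only: cap the number of PTEs whose accessed bit is
 * cleared per batch when hardware does not set the bit for us, since
 * each cleared bit then costs an extra page fault.
 */
#define PTE_BATCH_LARGE	512	/* assumed values, for illustration */
#define PTE_BATCH_SMALL	64

static unsigned long pte_clear_young_batch(void)
{
	/* "false": ask about all CPUs, so preemption may stay enabled */
	return arch_has_hw_pte_young(false) ? PTE_BATCH_LARGE :
					      PTE_BATCH_SMALL;
}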

Comments

Will Deacon Nov. 11, 2021, 8:59 a.m. UTC | #1
On Wed, Nov 10, 2021 at 09:15:01PM -0700, Yu Zhao wrote:
> Some architectures automatically set the accessed bit in PTEs, e.g.,
> x86 and arm64 v8.2. On architectures that do not have this capability,
> clearing the accessed bit in a PTE triggers a page fault following the
> TLB miss of this PTE.
> 
> Being aware of this capability can help make better decisions, i.e.,
> whether to limit the size of each batch of PTEs and the burst of
> batches when clearing the accessed bit.
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
> ---
>  arch/arm64/include/asm/cpufeature.h |  5 +++++
>  arch/arm64/include/asm/pgtable.h    | 13 ++++++++-----
>  arch/arm64/kernel/cpufeature.c      | 10 ++++++++++
>  arch/arm64/tools/cpucaps            |  1 +
>  arch/x86/include/asm/pgtable.h      |  6 +++---
>  include/linux/pgtable.h             | 13 +++++++++++++
>  mm/memory.c                         | 14 +-------------
>  7 files changed, 41 insertions(+), 21 deletions(-)

*Please* cc the maintainers on arch patches. I asked you that last time,
too:

https://lore.kernel.org/r/20210819091923.GA15467@willie-the-truck

> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 6ec7036ef7e1..940615d33845 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2157,6 +2157,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>  		.matches = has_hw_dbm,
>  		.cpu_enable = cpu_enable_hw_dbm,
>  	},
> +	{
> +		.desc = "Hardware update of the Access flag",
> +		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
> +		.capability = ARM64_HW_AF,
> +		.sys_reg = SYS_ID_AA64MMFR1_EL1,
> +		.sign = FTR_UNSIGNED,
> +		.field_pos = ID_AA64MMFR1_HADBS_SHIFT,
> +		.min_field_value = 1,
> +		.matches = has_cpuid_feature,
> +	},

As before, please don't make this a system feature as it will prohibit
onlining of late CPUs with mismatched access flag support and I really
don't see that being necessary.

You should just be able to use arch_faults_on_old_pte() as-is.

Will
Yu Zhao Nov. 11, 2021, 11:11 a.m. UTC | #2
On Thu, Nov 11, 2021 at 1:59 AM Will Deacon <will@kernel.org> wrote:
>
> On Wed, Nov 10, 2021 at 09:15:01PM -0700, Yu Zhao wrote:
> > Some architectures automatically set the accessed bit in PTEs, e.g.,
> > x86 and arm64 v8.2. On architectures that do not have this capability,
> > clearing the accessed bit in a PTE triggers a page fault following the
> > TLB miss of this PTE.
> >
> > Being aware of this capability can help make better decisions, i.e.,
> > whether to limit the size of each batch of PTEs and the burst of
> > batches when clearing the accessed bit.
> >
> > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
> > ---
> >  arch/arm64/include/asm/cpufeature.h |  5 +++++
> >  arch/arm64/include/asm/pgtable.h    | 13 ++++++++-----
> >  arch/arm64/kernel/cpufeature.c      | 10 ++++++++++
> >  arch/arm64/tools/cpucaps            |  1 +
> >  arch/x86/include/asm/pgtable.h      |  6 +++---
> >  include/linux/pgtable.h             | 13 +++++++++++++
> >  mm/memory.c                         | 14 +-------------
> >  7 files changed, 41 insertions(+), 21 deletions(-)
>
> *Please* cc the maintainers on arch patches. I asked you that last time,
> too:
>
> https://lore.kernel.org/r/20210819091923.GA15467@willie-the-truck
>
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 6ec7036ef7e1..940615d33845 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -2157,6 +2157,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> >               .matches = has_hw_dbm,
> >               .cpu_enable = cpu_enable_hw_dbm,
> >       },
> > +     {
> > +             .desc = "Hardware update of the Access flag",
> > +             .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> > +             .capability = ARM64_HW_AF,
> > +             .sys_reg = SYS_ID_AA64MMFR1_EL1,
> > +             .sign = FTR_UNSIGNED,
> > +             .field_pos = ID_AA64MMFR1_HADBS_SHIFT,
> > +             .min_field_value = 1,
> > +             .matches = has_cpuid_feature,
> > +     },
>
> As before, please don't make this a system feature as it will prohibit
> onlining of late CPUs with mismatched access flag support and I really
> don't see that being necessary.
>
> You should just be able to use arch_faults_on_old_pte() as-is.

Well, I did explain why last time, and there wasn't any pushback
after I listed my reasons:

1) there is no known hardware that does this, i.e., has mismatched
accessed bit capabilities.
2) there is no foreseeable benefit in supporting such a theoretical use case

There are many other opportunities ARM could explore, e.g., supporting
the accessed bit on non-leaf entries as x86 has done for decades.
Dwelling on this theoretical use case, IMO, is just a typical example
of overengineering.

But I do hear you -- as I mentioned last time, I don't have a problem
with doing it your way. Please touch on the two points I raised so
that I know where we stand. Otherwise I might again assume I've
convinced you to do it my way :)

Also, why should I be able to use arch_faults_on_old_pte() as-is? That
implies we can make both of the following decisions based on the same
information:
1) set the accessed bit, since the CPU we are running on won't do it for us
2) don't clear the accessed bit on too many entries, since the CPU we
are running on will take a page fault upon seeing a cleared accessed
bit on an entry
Are these conditions equivalent? Technically no, but it probably makes
no difference anyway.

My real problem with arch_faults_on_old_pte() is that it needs to be
called with preemption disabled. Again, IMO, a static branch for such
a common capability (from the x86 perspective) is a no-brainer, because
there is zero return on investment in leaving the door open for
mismatched accessed bit capabilities.
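
A minimal sketch of the two call patterns being contrasted here; the
wrapper names are assumed for illustration and are not part of this
patch:

/*
 * Per-CPU check: only meaningful with preemption disabled, because the
 * answer depends on which CPU the code is currently running on.
 */
static bool hw_pte_young_on_this_cpu(void)
{
	bool ret;

	preempt_disable();
	ret = arch_has_hw_pte_young(true);	/* cpu_has_hw_af() on arm64 */
	preempt_enable();

	return ret;
}

/*
 * System-wide check: backed by a static key via cpus_have_const_cap()
 * on arm64, so it can be called from preemptible context.
 */
static bool hw_pte_young_on_all_cpus(void)
{
	return arch_has_hw_pte_young(false);	/* system_has_hw_af() on arm64 */
}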

Patch

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index ef6be92b1921..99518b4b2a9e 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -779,6 +779,11 @@  static inline bool system_supports_tlb_range(void)
 		cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
 }
 
+static inline bool system_has_hw_af(void)
+{
+	return IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && cpus_have_const_cap(ARM64_HW_AF);
+}
+
 extern int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
 
 static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index dfa76afa0ccf..880ae658a69a 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -993,13 +993,16 @@  static inline void update_mmu_cache(struct vm_area_struct *vma,
  * page after fork() + CoW for pfn mappings. We don't always have a
  * hardware-managed access flag on arm64.
  */
-static inline bool arch_faults_on_old_pte(void)
+static inline bool arch_has_hw_pte_young(bool local)
 {
-	WARN_ON(preemptible());
+	if (local) {
+		WARN_ON(preemptible());
+		return cpu_has_hw_af();
+	}
 
-	return !cpu_has_hw_af();
+	return system_has_hw_af();
 }
-#define arch_faults_on_old_pte		arch_faults_on_old_pte
+#define arch_has_hw_pte_young		arch_has_hw_pte_young
 
 /*
  * Experimentally, it's cheap to set the access flag in hardware and we
@@ -1007,7 +1010,7 @@  static inline bool arch_faults_on_old_pte(void)
  */
 static inline bool arch_wants_old_prefaulted_pte(void)
 {
-	return !arch_faults_on_old_pte();
+	return arch_has_hw_pte_young(true);
 }
 #define arch_wants_old_prefaulted_pte	arch_wants_old_prefaulted_pte
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6ec7036ef7e1..940615d33845 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2157,6 +2157,16 @@  static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_hw_dbm,
 		.cpu_enable = cpu_enable_hw_dbm,
 	},
+	{
+		.desc = "Hardware update of the Access flag",
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.capability = ARM64_HW_AF,
+		.sys_reg = SYS_ID_AA64MMFR1_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64MMFR1_HADBS_SHIFT,
+		.min_field_value = 1,
+		.matches = has_cpuid_feature,
+	},
 #endif
 	{
 		.desc = "CRC32 instructions",
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 49305c2e6dfd..d52f50671e60 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -35,6 +35,7 @@  HAS_STAGE2_FWB
 HAS_SYSREG_GIC_CPUIF
 HAS_TLB_RANGE
 HAS_VIRT_HOST_EXTN
+HW_AF
 HW_DBM
 KVM_PROTECTED_MODE
 MISMATCHED_CACHE_TYPE
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 448cd01eb3ec..c60b16f8b741 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1397,10 +1397,10 @@  static inline bool arch_has_pfn_modify_check(void)
 	return boot_cpu_has_bug(X86_BUG_L1TF);
 }
 
-#define arch_faults_on_old_pte arch_faults_on_old_pte
-static inline bool arch_faults_on_old_pte(void)
+#define arch_has_hw_pte_young arch_has_hw_pte_young
+static inline bool arch_has_hw_pte_young(bool local)
 {
-	return false;
+	return true;
 }
 
 #endif	/* __ASSEMBLY__ */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e24d2c992b11..a64c6124c137 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -258,6 +258,19 @@  static inline int pmdp_clear_flush_young(struct vm_area_struct *vma,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
 
+#ifndef arch_has_hw_pte_young
+/*
+ * Return whether the accessed bit is supported by the local CPU or all CPUs.
+ *
+ * Those arches which have hw access flag feature need to implement their own
+ * helper. By default, "false" means pagefault will be hit on old pte.
+ */
+static inline bool arch_has_hw_pte_young(bool local)
+{
+	return false;
+}
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long address,
diff --git a/mm/memory.c b/mm/memory.c
index c52be6d6b605..012fad94bc77 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -121,18 +121,6 @@  int randomize_va_space __read_mostly =
 					2;
 #endif
 
-#ifndef arch_faults_on_old_pte
-static inline bool arch_faults_on_old_pte(void)
-{
-	/*
-	 * Those arches which don't have hw access flag feature need to
-	 * implement their own helper. By default, "true" means pagefault
-	 * will be hit on old pte.
-	 */
-	return true;
-}
-#endif
-
 #ifndef arch_wants_old_prefaulted_pte
 static inline bool arch_wants_old_prefaulted_pte(void)
 {
@@ -2769,7 +2757,7 @@  static inline bool cow_user_page(struct page *dst, struct page *src,
 	 * On architectures with software "accessed" bits, we would
 	 * take a double page fault, so mark it accessed here.
 	 */
-	if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
+	if (!arch_has_hw_pte_young(true) && !pte_young(vmf->orig_pte)) {
 		pte_t entry;
 
 		vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);