
[v10,2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64

Message ID 20190930015740.84362-3-justin.he@arm.com (mailing list archive)
State: New, archived
Series: fix double page fault on arm64

Commit Message

Jia He Sept. 30, 2019, 1:57 a.m. UTC
On arm64 without the hardware Access Flag, copying from user will fail
because the pte is old and cannot be marked young. So we always end up with
a zeroed page after fork() + CoW for pfn mappings. We don't always have a
hardware-managed access flag on arm64.

Hence implement arch_faults_on_old_pte() on arm64 to indicate that accessing
an old pte might cause a page fault.

Signed-off-by: Jia He <justin.he@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
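
For context, patch 3/3 of this series consults this hook on the CoW path in
mm/memory.c, making the source pte young before the copy so that the kernel's
uaccess copy cannot fault. A rough sketch of that usage, paraphrased rather
than the verbatim hunk from the series:

	/* Paraphrased from patch 3/3 (mm/memory.c, cow_user_page() path). */
	if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
		pte_t entry = pte_mkyoung(vmf->orig_pte);

		/*
		 * Mark the source pte young so __copy_from_user_inatomic()
		 * cannot fault; otherwise we would silently fall back to a
		 * zero-filled destination page.
		 */
		if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0))
			update_mmu_cache(vma, addr, vmf->pte);
	}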

Comments

Will Deacon Oct. 1, 2019, 12:50 p.m. UTC | #1
On Mon, Sep 30, 2019 at 09:57:39AM +0800, Jia He wrote:
> On arm64 without the hardware Access Flag, copying from user will fail
> because the pte is old and cannot be marked young. So we always end up with
> a zeroed page after fork() + CoW for pfn mappings. We don't always have a
> hardware-managed access flag on arm64.
> 
> Hence implement arch_faults_on_old_pte() on arm64 to indicate that accessing
> an old pte might cause a page fault.
> 
> Signed-off-by: Jia He <justin.he@arm.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7576df00eb50..e96fb82f62de 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>  #define phys_to_ttbr(addr)	(addr)
>  #endif
>  
> +/*
> + * On arm64 without hardware Access Flag, copying from user will fail because
> + * the pte is old and cannot be marked young. So we always end up with zeroed
> + * page after fork() + CoW for pfn mappings. We don't always have a
> + * hardware-managed access flag on arm64.
> + */
> +static inline bool arch_faults_on_old_pte(void)
> +{
> +	WARN_ON(preemptible());
> +
> +	return !cpu_has_hw_af();
> +}

Does this work correctly in a KVM guest? (i.e. is the MMFR sanitised in that
case, despite not being the case on the host?)

Will
Marc Zyngier Oct. 1, 2019, 1:32 p.m. UTC | #2
On Tue, 1 Oct 2019 13:50:32 +0100
Will Deacon <will@kernel.org> wrote:

> On Mon, Sep 30, 2019 at 09:57:39AM +0800, Jia He wrote:
> > On arm64 without the hardware Access Flag, copying from user will fail
> > because the pte is old and cannot be marked young. So we always end up with
> > a zeroed page after fork() + CoW for pfn mappings. We don't always have a
> > hardware-managed access flag on arm64.
> > 
> > Hence implement arch_faults_on_old_pte() on arm64 to indicate that accessing
> > an old pte might cause a page fault.
> > 
> > Signed-off-by: Jia He <justin.he@arm.com>
> > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > ---
> >  arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index 7576df00eb50..e96fb82f62de 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> >  #define phys_to_ttbr(addr)	(addr)
> >  #endif
> >  
> > +/*
> > + * On arm64 without hardware Access Flag, copying from user will fail because
> > + * the pte is old and cannot be marked young. So we always end up with zeroed
> > + * page after fork() + CoW for pfn mappings. We don't always have a
> > + * hardware-managed access flag on arm64.
> > + */
> > +static inline bool arch_faults_on_old_pte(void)
> > +{
> > +	WARN_ON(preemptible());
> > +
> > +	return !cpu_has_hw_af();
> > +}  
> 
> Does this work correctly in a KVM guest? (i.e. is the MMFR sanitised in that
> case, despite not being the case on the host?)

Yup, all the 64bit MMFRs are trapped (HCR_EL2.TID3 is set for an
AArch64 guest), and we return the sanitised version.

But that's an interesting remark: we're now trading an extra fault on
CPUs that do not support HWAFDBS for a guaranteed trap for each and
every guest under the sun that will hit the COW path...

My gut feeling is that this is going to be pretty visible. Jia, do you
have any numbers for this kind of behaviour?

Thanks,

	M.
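
For reference, the cpu_has_hw_af() helper introduced by patch 1/3 of this
series reads ID_AA64MMFR1_EL1 directly, which is why the trapping behaviour
Marc describes matters in a guest. Its approximate shape:

	/*
	 * Approximate form of the helper from patch 1/3. In a guest, the
	 * MRS of ID_AA64MMFR1_EL1 traps to EL2 (HCR_EL2.TID3 is set), so
	 * KVM returns the sanitised value, not the raw hardware register.
	 */
	static inline bool cpu_has_hw_af(void)
	{
		if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM)) {
			u64 mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);

			return cpuid_feature_extract_unsigned_field(mmfr1,
						ID_AA64MMFR1_HADBS_SHIFT);
		}
		return false;
	}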
Jia He Oct. 8, 2019, 1:55 a.m. UTC | #3
Hi Will and Marc

> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: October 1, 2019 21:32
> To: Will Deacon <will@kernel.org>
> Cc: Justin He (Arm Technology China) <Justin.He@arm.com>; Catalin
> Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>;
> Matthew Wilcox <willy@infradead.org>; Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
> <punitagrawal@gmail.com>; Thomas Gleixner <tglx@linutronix.de>;
> Andrew Morton <akpm@linux-foundation.org>; hejianet@gmail.com; Kaly
> Xin (Arm Technology China) <Kaly.Xin@arm.com>
> Subject: Re: [PATCH v10 2/3] arm64: mm: implement
> arch_faults_on_old_pte() on arm64
> 
> On Tue, 1 Oct 2019 13:50:32 +0100
> Will Deacon <will@kernel.org> wrote:
> 
> > On Mon, Sep 30, 2019 at 09:57:39AM +0800, Jia He wrote:
> > > On arm64 without the hardware Access Flag, copying from user will fail
> > > because the pte is old and cannot be marked young. So we always end up
> > > with a zeroed page after fork() + CoW for pfn mappings. We don't always
> > > have a hardware-managed access flag on arm64.
> > >
> > > Hence implement arch_faults_on_old_pte() on arm64 to indicate that
> > > accessing an old pte might cause a page fault.
> > >
> > > Signed-off-by: Jia He <justin.he@arm.com>
> > > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > > ---
> > >  arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
> > >  1 file changed, 14 insertions(+)
> > >
> > > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > > index 7576df00eb50..e96fb82f62de 100644
> > > --- a/arch/arm64/include/asm/pgtable.h
> > > +++ b/arch/arm64/include/asm/pgtable.h
> > > @@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> > >  #define phys_to_ttbr(addr)	(addr)
> > >  #endif
> > >
> > > +/*
> > > + * On arm64 without hardware Access Flag, copying from user will fail because
> > > + * the pte is old and cannot be marked young. So we always end up with zeroed
> > > + * page after fork() + CoW for pfn mappings. We don't always have a
> > > + * hardware-managed access flag on arm64.
> > > + */
> > > +static inline bool arch_faults_on_old_pte(void)
> > > +{
> > > +	WARN_ON(preemptible());
> > > +
> > > +	return !cpu_has_hw_af();
> > > +}
> >
> > Does this work correctly in a KVM guest? (i.e. is the MMFR sanitised in that
> > case, despite not being the case on the host?)
> 
> Yup, all the 64bit MMFRs are trapped (HCR_EL2.TID3 is set for an
> AArch64 guest), and we return the sanitised version.
Thanks for Marc's explanation. I verified the patch series on a kvm guest
(-M virt) with a simulated nvdimm device created by qemu. The host is a
ThunderX2 aarch64 machine.

> 
> But that's an interesting remark: we're now trading an extra fault on
> CPUs that do not support HWAFDBS for a guaranteed trap for each and
> every guest under the sun that will hit the COW path...
> 
> My gut feeling is that this is going to be pretty visible. Jia, do you
> have any numbers for this kind of behaviour?
It is not the common COW path, but the COW path for PFN mapping pages only.
I added a g_counter before pte_mkyoung in the force_mkyoung path when testing
vmmalloc_fork at [1].

In this test case, it starts M forked processes and N pthreads. With the
default M=2, N=4, the g_counter is about 241; that is, the path added by my
patch series is hit about 241 times. If I set M=20 and N=40 for TEST3, the
g_counter is about 1492.
  
[1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork


--
Cheers,
Justin (Jia He)
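
The pattern vmmalloc_fork exercises boils down to fork() plus a CoW write on
a pfn (DAX) mapping. A minimal user-space sketch of the trigger; the pmem
path is hypothetical and the file is assumed to hold known non-zero data:

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <sys/wait.h>
	#include <unistd.h>

	int main(void)
	{
		/* Hypothetical path: a file on a DAX-mounted filesystem. */
		int fd = open("/mnt/pmem/testfile", O_RDWR);
		char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE, fd, 0);
		char first = p[0];	/* read fault maps the DAX pfn */

		if (fork() == 0) {
			/*
			 * The write faults and the kernel CoW-copies from
			 * the pfn page; in the failing case the source pte
			 * is old, the copy faults, and the child silently
			 * gets a zero-filled page instead.
			 */
			p[1] = 1;
			printf("p[0] = %#x (expected %#x)\n",
			       (unsigned char)p[0], (unsigned char)first);
			_exit(0);
		}
		wait(NULL);
		return 0;
	}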
Jia He Oct. 8, 2019, 2:30 a.m. UTC | #4
> -----Original Message-----
> From: Justin He (Arm Technology China)
> Sent: October 8, 2019 9:55
> To: Marc Zyngier <maz@kernel.org>; Will Deacon <will@kernel.org>
> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>;
> Matthew Wilcox <willy@infradead.org>; Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
> <punitagrawal@gmail.com>; Thomas Gleixner <tglx@linutronix.de>;
> Andrew Morton <akpm@linux-foundation.org>; hejianet@gmail.com; Kaly
> Xin (Arm Technology China) <Kaly.Xin@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH v10 2/3] arm64: mm: implement
> arch_faults_on_old_pte() on arm64
> 
> Hi Will and Marc
> 
> > -----Original Message-----
> > From: Marc Zyngier <maz@kernel.org>
> > Sent: October 1, 2019 21:32
> > To: Will Deacon <will@kernel.org>
> > Cc: Justin He (Arm Technology China) <Justin.He@arm.com>; Catalin
> > Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> > <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>;
> > Matthew Wilcox <willy@infradead.org>; Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> > linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
> > <punitagrawal@gmail.com>; Thomas Gleixner <tglx@linutronix.de>;
> > Andrew Morton <akpm@linux-foundation.org>; hejianet@gmail.com;
> Kaly
> > Xin (Arm Technology China) <Kaly.Xin@arm.com>
> > Subject: Re: [PATCH v10 2/3] arm64: mm: implement
> > arch_faults_on_old_pte() on arm64
> >
> > On Tue, 1 Oct 2019 13:50:32 +0100
> > Will Deacon <will@kernel.org> wrote:
> >
> > > On Mon, Sep 30, 2019 at 09:57:39AM +0800, Jia He wrote:
> > > > On arm64 without the hardware Access Flag, copying from user will fail
> > > > because the pte is old and cannot be marked young. So we always end up
> > > > with a zeroed page after fork() + CoW for pfn mappings. We don't always
> > > > have a hardware-managed access flag on arm64.
> > > >
> > > > Hence implement arch_faults_on_old_pte() on arm64 to indicate that
> > > > accessing an old pte might cause a page fault.
> > > >
> > > > Signed-off-by: Jia He <justin.he@arm.com>
> > > > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > ---
> > > >  arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
> > > >  1 file changed, 14 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > > > index 7576df00eb50..e96fb82f62de 100644
> > > > --- a/arch/arm64/include/asm/pgtable.h
> > > > +++ b/arch/arm64/include/asm/pgtable.h
> > > > @@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> > > >  #define phys_to_ttbr(addr)	(addr)
> > > >  #endif
> > > >
> > > > +/*
> > > > + * On arm64 without hardware Access Flag, copying from user will fail because
> > > > + * the pte is old and cannot be marked young. So we always end up with zeroed
> > > > + * page after fork() + CoW for pfn mappings. We don't always have a
> > > > + * hardware-managed access flag on arm64.
> > > > + */
> > > > +static inline bool arch_faults_on_old_pte(void)
> > > > +{
> > > > +	WARN_ON(preemptible());
> > > > +
> > > > +	return !cpu_has_hw_af();
> > > > +}
> > >
> > > Does this work correctly in a KVM guest? (i.e. is the MMFR sanitised in that
> > > case, despite not being the case on the host?)
> >
> > Yup, all the 64bit MMFRs are trapped (HCR_EL2.TID3 is set for an
> > AArch64 guest), and we return the sanitised version.
> Thanks for Marc's explanation. I verified the patch series on a kvm guest
> (-M virt) with a simulated nvdimm device created by qemu. The host is a
> ThunderX2 aarch64 machine.
> 
> >
> > But that's an interesting remark: we're now trading an extra fault on
> > CPUs that do not support HWAFDBS for a guaranteed trap for each and
> > every guest under the sun that will hit the COW path...
> >
> > My gut feeling is that this is going to be pretty visible. Jia, do you
> > have any numbers for this kind of behaviour?
> It is not the common COW path, but the COW path for PFN mapping pages only.
> I added a g_counter before pte_mkyoung in the force_mkyoung path when testing
> vmmalloc_fork at [1].
> 
> In this test case, it starts M forked processes and N pthreads. With the
> default M=2, N=4, the g_counter is about 241; that is, the path added by my
> patch series is hit about 241 times. If I set M=20 and N=40 for TEST3, the
> g_counter is about 1492.

The time overhead of the vmmalloc_fork test is:
real    0m5.411s
user    0m4.206s
sys     0m2.699s

> 
> [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> 
> 
> --
> Cheers,
> Justin (Jia He)
>
Marc Zyngier Oct. 8, 2019, 7:46 a.m. UTC | #5
On Tue, 8 Oct 2019 01:55:04 +0000
"Justin He (Arm Technology China)" <Justin.He@arm.com> wrote:

> Hi Will and Marc
> 
> > -----Original Message-----
> > From: Marc Zyngier <maz@kernel.org>
> > Sent: October 1, 2019 21:32
> > To: Will Deacon <will@kernel.org>
> > Cc: Justin He (Arm Technology China) <Justin.He@arm.com>; Catalin
> > Marinas <Catalin.Marinas@arm.com>; Mark Rutland
> > <Mark.Rutland@arm.com>; James Morse <James.Morse@arm.com>;
> > Matthew Wilcox <willy@infradead.org>; Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com>; linux-arm-kernel@lists.infradead.org;
> > linux-kernel@vger.kernel.org; linux-mm@kvack.org; Punit Agrawal
> > <punitagrawal@gmail.com>; Thomas Gleixner <tglx@linutronix.de>;
> > Andrew Morton <akpm@linux-foundation.org>; hejianet@gmail.com; Kaly
> > Xin (Arm Technology China) <Kaly.Xin@arm.com>
> > Subject: Re: [PATCH v10 2/3] arm64: mm: implement
> > arch_faults_on_old_pte() on arm64
> > 
> > On Tue, 1 Oct 2019 13:50:32 +0100
> > Will Deacon <will@kernel.org> wrote:
> >   
> > > On Mon, Sep 30, 2019 at 09:57:39AM +0800, Jia He wrote:  
> > > > On arm64 without the hardware Access Flag, copying from user will fail
> > > > because the pte is old and cannot be marked young. So we always end up
> > > > with a zeroed page after fork() + CoW for pfn mappings. We don't always
> > > > have a hardware-managed access flag on arm64.
> > > >
> > > > Hence implement arch_faults_on_old_pte() on arm64 to indicate that
> > > > accessing an old pte might cause a page fault.
> > > >
> > > > Signed-off-by: Jia He <justin.he@arm.com>
> > > > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > ---
> > > >  arch/arm64/include/asm/pgtable.h | 14 ++++++++++++++
> > > >  1 file changed, 14 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > > > index 7576df00eb50..e96fb82f62de 100644
> > > > --- a/arch/arm64/include/asm/pgtable.h
> > > > +++ b/arch/arm64/include/asm/pgtable.h
> > > > @@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> > > >  #define phys_to_ttbr(addr)	(addr)
> > > >  #endif
> > > >
> > > > +/*
> > > > + * On arm64 without hardware Access Flag, copying from user will fail because
> > > > + * the pte is old and cannot be marked young. So we always end up with zeroed
> > > > + * page after fork() + CoW for pfn mappings. We don't always have a
> > > > + * hardware-managed access flag on arm64.
> > > > + */
> > > > +static inline bool arch_faults_on_old_pte(void)
> > > > +{
> > > > +	WARN_ON(preemptible());
> > > > +
> > > > +	return !cpu_has_hw_af();
> > > > +}  
> > >
> > > Does this work correctly in a KVM guest? (i.e. is the MMFR sanitised in that
> > > case, despite not being the case on the host?)
> > 
> > Yup, all the 64bit MMFRs are trapped (HCR_EL2.TID3 is set for an
> > AArch64 guest), and we return the sanitised version.  
> Thanks for Marc's explanation. I verified the patch series on a kvm guest
> (-M virt) with a simulated nvdimm device created by qemu. The host is a
> ThunderX2 aarch64 machine.
> 
> > 
> > But that's an interesting remark: we're now trading an extra fault on
> > CPUs that do not support HWAFDBS for a guaranteed trap for each and
> > every guest under the sun that will hit the COW path...
> > 
> > My gut feeling is that this is going to be pretty visible. Jia, do you
> > have any numbers for this kind of behaviour?  
> It is not the common COW path, but the COW path for PFN mapping pages only.
> I added a g_counter before pte_mkyoung in the force_mkyoung path when testing
> vmmalloc_fork at [1].
> 
> In this test case, it starts M forked processes and N pthreads. With the
> default M=2, N=4, the g_counter is about 241; that is, the path added by my
> patch series is hit about 241 times. If I set M=20 and N=40 for TEST3, the
> g_counter is about 1492.

I must confess I'm not so much interested in random microbenchmarks,
but more in actual applications that could potentially be impacted by
this. The numbers you're quoting here seem pretty small, which would
indicate a low overhead, but that's not indicative of what would happen
in real life.

I guess that we can leave it at that for now, and turn it into a CPU
feature (with the associated static key) if this shows up anywhere.

Thanks,

	M.


>   
> [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> 
> 
> --
> Cheers,
> Justin (Jia He)
>
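
A rough sketch of what Marc's CPU-feature suggestion might look like;
ARM64_HW_AF is an invented capability name, not one this series defines:

	/*
	 * Hypothetical only: expose hardware AF as an arm64 cpucap so the
	 * check compiles down to a static branch. With a system-wide
	 * capability, the preemptible() check also becomes unnecessary.
	 */
	static inline bool arch_faults_on_old_pte(void)
	{
		return !cpus_have_const_cap(ARM64_HW_AF);
	}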

Patch

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7576df00eb50..e96fb82f62de 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -885,6 +885,20 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 #define phys_to_ttbr(addr)	(addr)
 #endif
 
+/*
+ * On arm64 without hardware Access Flag, copying from user will fail because
+ * the pte is old and cannot be marked young. So we always end up with zeroed
+ * page after fork() + CoW for pfn mappings. We don't always have a
+ * hardware-managed access flag on arm64.
+ */
+static inline bool arch_faults_on_old_pte(void)
+{
+	WARN_ON(preemptible());
+
+	return !cpu_has_hw_af();
+}
+#define arch_faults_on_old_pte arch_faults_on_old_pte
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */
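
The final #define is the usual override idiom: it suppresses the generic
fallback that patch 3/3 adds in mm/memory.c, which is roughly:

	/*
	 * Default for architectures whose hardware updates the access
	 * flag, or which do not override the hook: old ptes never fault.
	 */
	#ifndef arch_faults_on_old_pte
	static inline bool arch_faults_on_old_pte(void)
	{
		return false;
	}
	#endif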