| Message ID | 20170809200755.11234-5-tycho@docker.com (mailing list archive) |
|---|---|
| State | New, archived |
On Wed, Aug 09, 2017 at 02:07:49PM -0600, Tycho Andersen wrote:
> From: Juerg Haefliger <juerg.haefliger@hpe.com>
>
> Add a hook for flushing a single TLB entry on arm64.
>
> Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
> Tested-by: Tycho Andersen <tycho@docker.com>
> ---
>  arch/arm64/include/asm/tlbflush.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index af1c76981911..8e0c49105d3e 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -184,6 +184,14 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end
>  	isb();
>  }
>
> +static inline void __flush_tlb_one(unsigned long addr)
> +{
> +	dsb(ishst);
> +	__tlbi(vaae1is, addr >> 12);
> +	dsb(ish);
> +	isb();
> +}

Is this going to be called by generic code?

It would be nice if we could drop 'kernel' into the name, to make it
clear this is intended to affect the kernel mappings, which have
different maintenance requirements to user mappings.

We should be able to implement this more simply as:

static inline void flush_tlb_kernel_page(unsigned long addr)
{
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
}

Thanks,
Mark.
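[Editorial note: for readers less familiar with arm64 TLB maintenance, the proposed helper is repeated below with each step annotated. The comments are editorial additions, not part of the submitted patch.]

```c
/* The helper from the patch above, with editorial annotations. */
static inline void __flush_tlb_one(unsigned long addr)
{
	/*
	 * Make any prior page-table updates visible to the page-table
	 * walkers before issuing the invalidate (DSB: inner-shareable
	 * domain, stores only).
	 */
	dsb(ishst);

	/*
	 * TLBI VAAE1IS: invalidate by VA, All ASIDs, EL1, Inner
	 * Shareable -- broadcast to every CPU in the inner-shareable
	 * domain. The register operand carries VA[55:12], hence the
	 * shift by 12.
	 */
	__tlbi(vaae1is, addr >> 12);

	/* Wait for the broadcast invalidation to complete everywhere... */
	dsb(ish);

	/* ...then synchronize this CPU's context before continuing. */
	isb();
}
```

Note that the "IS" (Inner Shareable) TLBI variants broadcast to all CPUs in the inner-shareable domain, so unlike the x86 __flush_tlb_one() discussed later in this thread, this arm64 helper is not a local-only flush.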
Hi Mark,

On Sat, Aug 12, 2017 at 12:26:03PM +0100, Mark Rutland wrote:
> On Wed, Aug 09, 2017 at 02:07:49PM -0600, Tycho Andersen wrote:
> > From: Juerg Haefliger <juerg.haefliger@hpe.com>
> >
> > Add a hook for flushing a single TLB entry on arm64.
> >
> > Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
> > Tested-by: Tycho Andersen <tycho@docker.com>
> > ---
> >  arch/arm64/include/asm/tlbflush.h | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> > index af1c76981911..8e0c49105d3e 100644
> > --- a/arch/arm64/include/asm/tlbflush.h
> > +++ b/arch/arm64/include/asm/tlbflush.h
> > @@ -184,6 +184,14 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end
> >  	isb();
> >  }
> >
> > +static inline void __flush_tlb_one(unsigned long addr)
> > +{
> > +	dsb(ishst);
> > +	__tlbi(vaae1is, addr >> 12);
> > +	dsb(ish);
> > +	isb();
> > +}
>
> Is this going to be called by generic code?

Yes, it's called in mm/xpfo.c:xpfo_kunmap.

> It would be nice if we could drop 'kernel' into the name, to make it
> clear this is intended to affect the kernel mappings, which have
> different maintenance requirements to user mappings.
>
> We should be able to implement this more simply as:
>
> static inline void flush_tlb_kernel_page(unsigned long addr)
> {
> 	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> }

It's named __flush_tlb_one after the x86 (and a few other arches)
function of the same name. I can change it to flush_tlb_kernel_page,
but then we'll need some x86-specific code to map the name as well.

Maybe since it's called from generic code that's warranted though?
I'll change the implementation for now; let me know what you want to
do about the name.

Cheers,

Tycho
On Mon, Aug 14, 2017 at 10:35:36AM -0600, Tycho Andersen wrote:
> Hi Mark,
>
> On Sat, Aug 12, 2017 at 12:26:03PM +0100, Mark Rutland wrote:
> > On Wed, Aug 09, 2017 at 02:07:49PM -0600, Tycho Andersen wrote:
> > > +static inline void __flush_tlb_one(unsigned long addr)
> > > +{
> > > +	dsb(ishst);
> > > +	__tlbi(vaae1is, addr >> 12);
> > > +	dsb(ish);
> > > +	isb();
> > > +}
> >
> > Is this going to be called by generic code?
>
> Yes, it's called in mm/xpfo.c:xpfo_kunmap.
>
> > It would be nice if we could drop 'kernel' into the name, to make it
> > clear this is intended to affect the kernel mappings, which have
> > different maintenance requirements to user mappings.
>
> It's named __flush_tlb_one after the x86 (and a few other arches)
> function of the same name. I can change it to flush_tlb_kernel_page,
> but then we'll need some x86-specific code to map the name as well.
>
> Maybe since it's called from generic code that's warranted though?
> I'll change the implementation for now; let me know what you want to
> do about the name.

I think it would be preferable to do so, to align with
flush_tlb_kernel_range(), which is an existing generic interface.

That said, is there any reason not to use flush_tlb_kernel_range()
directly?

Thanks,
Mark.
On Mon, Aug 14, 2017 at 05:50:47PM +0100, Mark Rutland wrote:
> On Mon, Aug 14, 2017 at 10:35:36AM -0600, Tycho Andersen wrote:
> > Hi Mark,
> >
> > On Sat, Aug 12, 2017 at 12:26:03PM +0100, Mark Rutland wrote:
> > > On Wed, Aug 09, 2017 at 02:07:49PM -0600, Tycho Andersen wrote:
> > > > +static inline void __flush_tlb_one(unsigned long addr)
> > > > +{
> > > > +	dsb(ishst);
> > > > +	__tlbi(vaae1is, addr >> 12);
> > > > +	dsb(ish);
> > > > +	isb();
> > > > +}
> > >
> > > Is this going to be called by generic code?
> >
> > Yes, it's called in mm/xpfo.c:xpfo_kunmap.
> >
> > > It would be nice if we could drop 'kernel' into the name, to make it
> > > clear this is intended to affect the kernel mappings, which have
> > > different maintenance requirements to user mappings.
>
> > It's named __flush_tlb_one after the x86 (and a few other arches)
> > function of the same name. I can change it to flush_tlb_kernel_page,
> > but then we'll need some x86-specific code to map the name as well.
> >
> > Maybe since it's called from generic code that's warranted though?
> > I'll change the implementation for now; let me know what you want to
> > do about the name.
>
> I think it would be preferable to do so, to align with
> flush_tlb_kernel_range(), which is an existing generic interface.
>
> That said, is there any reason not to use flush_tlb_kernel_range()
> directly?

I don't think so; I'll change the generic code to that and drop this
patch.

Thanks!

Tycho
Hi Mark,

On Mon, Aug 14, 2017 at 05:50:47PM +0100, Mark Rutland wrote:
> That said, is there any reason not to use flush_tlb_kernel_range()
> directly?

So it turns out that there is a difference between __flush_tlb_one() and
flush_tlb_kernel_range() on x86: flush_tlb_kernel_range() flushes all
the TLBs via on_each_cpu(), whereas __flush_tlb_one() only flushes the
local TLB (which I think is enough here). As you might expect, this is
quite a performance hit (at least under kvm); I ran a little kernbench:

# __flush_tlb_one
Wed Aug 23 15:47:33 UTC 2017
4.13.0-rc5+
Average Half load -j 2 Run (std deviation):
Elapsed Time 50.3233 (1.82716)
User Time 87.1233 (1.26871)
System Time 15.36 (0.500899)
Percent CPU 203.667 (4.04145)
Context Switches 7350.33 (1339.65)
Sleeps 16008.3 (980.362)

Average Optimal load -j 4 Run (std deviation):
Elapsed Time 27.4267 (0.215019)
User Time 88.6983 (1.91501)
System Time 13.1933 (2.39488)
Percent CPU 286.333 (90.6083)
Context Switches 11393 (4509.14)
Sleeps 15764.7 (698.048)

# flush_tlb_kernel_range()
Wed Aug 23 16:00:03 UTC 2017
4.13.0-rc5+
Average Half load -j 2 Run (std deviation):
Elapsed Time 86.57 (1.06099)
User Time 103.25 (1.85475)
System Time 75.4433 (0.415852)
Percent CPU 205.667 (3.21455)
Context Switches 9363.33 (1361.57)
Sleeps 14703.3 (1439.12)

Average Optimal load -j 4 Run (std deviation):
Elapsed Time 51.27 (0.615873)
User Time 110.328 (7.93884)
System Time 74.06 (1.55788)
Percent CPU 288 (90.2197)
Context Switches 16557.5 (7930.01)
Sleeps 14774.7 (921.746)

So, I think we need to keep something like __flush_tlb_one around. I'll
call it flush_one_local_tlb() for now, and will cc x86@ on the next
version to see if they have any insight.

Cheers,

Tycho
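[Editorial note: the gap Tycho measured comes down to the two x86 paths sketched below, paraphrased from the v4.13-era arch/x86 code with VM-event counters and the flush-ceiling heuristic elided; see arch/x86/mm/tlb.c of that era for the real implementation.]

```c
/*
 * Simplified sketch of the two v4.13-era x86 paths being compared
 * (bookkeeping and the flush-ceiling heuristic elided).
 */

/* Local-only: invalidate one entry on the *current* CPU. */
static inline void __flush_tlb_one(unsigned long addr)
{
	asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
}

struct flush_tlb_info {
	unsigned long start;
	unsigned long end;
};

/* Runs on every CPU, in IPI context. */
static void do_kernel_range_flush(void *info)
{
	struct flush_tlb_info *f = info;
	unsigned long addr;

	/* Flush the range entry by entry on this CPU. */
	for (addr = f->start; addr < f->end; addr += PAGE_SIZE)
		__flush_tlb_one(addr);
}

/* Cross-CPU: IPIs every online CPU and flushes there, too. */
void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
	struct flush_tlb_info info = { .start = start, .end = end };

	on_each_cpu(do_kernel_range_flush, &info, 1);
}
```

The IPI broadcast in on_each_cpu() on every unmap is what shows up as the roughly fivefold system-time increase in the kernbench runs above.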
On Wed, Aug 23, 2017 at 10:58:42AM -0600, Tycho Andersen wrote:
> Hi Mark,
>
> On Mon, Aug 14, 2017 at 05:50:47PM +0100, Mark Rutland wrote:
> > That said, is there any reason not to use flush_tlb_kernel_range()
> > directly?
>
> So it turns out that there is a difference between __flush_tlb_one() and
> flush_tlb_kernel_range() on x86: flush_tlb_kernel_range() flushes all
> the TLBs via on_each_cpu(), whereas __flush_tlb_one() only flushes the
> local TLB (which I think is enough here).

That sounds suspicious; I don't think that __flush_tlb_one() is
sufficient.

If you only do local TLB maintenance, then the page is left accessible
to other CPUs via the (stale) kernel mappings, i.e. the page isn't
exclusively mapped by userspace.

Thanks,
Mark.
On 08/23/2017 07:04 PM, Mark Rutland wrote:
> On Wed, Aug 23, 2017 at 10:58:42AM -0600, Tycho Andersen wrote:
>> Hi Mark,
>>
>> On Mon, Aug 14, 2017 at 05:50:47PM +0100, Mark Rutland wrote:
>>> That said, is there any reason not to use flush_tlb_kernel_range()
>>> directly?
>>
>> So it turns out that there is a difference between __flush_tlb_one() and
>> flush_tlb_kernel_range() on x86: flush_tlb_kernel_range() flushes all
>> the TLBs via on_each_cpu(), whereas __flush_tlb_one() only flushes the
>> local TLB (which I think is enough here).
>
> That sounds suspicious; I don't think that __flush_tlb_one() is
> sufficient.
>
> If you only do local TLB maintenance, then the page is left accessible
> to other CPUs via the (stale) kernel mappings, i.e. the page isn't
> exclusively mapped by userspace.

We flush all CPUs to get rid of stale entries when a new page is
allocated to userspace that was previously allocated to the kernel.
Is that the scenario you were thinking of?

...Juerg

> Thanks,
> Mark.
On Wed, Aug 30, 2017 at 07:31:25AM +0200, Juerg Haefliger wrote:
>
>
> On 08/23/2017 07:04 PM, Mark Rutland wrote:
> > On Wed, Aug 23, 2017 at 10:58:42AM -0600, Tycho Andersen wrote:
> >> Hi Mark,
> >>
> >> On Mon, Aug 14, 2017 at 05:50:47PM +0100, Mark Rutland wrote:
> >>> That said, is there any reason not to use flush_tlb_kernel_range()
> >>> directly?
> >>
> >> So it turns out that there is a difference between __flush_tlb_one() and
> >> flush_tlb_kernel_range() on x86: flush_tlb_kernel_range() flushes all
> >> the TLBs via on_each_cpu(), whereas __flush_tlb_one() only flushes the
> >> local TLB (which I think is enough here).
> >
> > That sounds suspicious; I don't think that __flush_tlb_one() is
> > sufficient.
> >
> > If you only do local TLB maintenance, then the page is left accessible
> > to other CPUs via the (stale) kernel mappings, i.e. the page isn't
> > exclusively mapped by userspace.
>
> We flush all CPUs to get rid of stale entries when a new page is
> allocated to userspace that was previously allocated to the kernel.
> Is that the scenario you were thinking of?

I think there are two cases: the one you describe above, where the
pages are first allocated, and a second one, where e.g. the pages are
mapped into the kernel because of DMA or whatever. In the case you
describe above, I think we're doing the right thing (which is why my
test worked correctly, because it tested this case).

In the second case, when the pages are unmapped (i.e. the kernel is
done doing DMA), do we need to flush the other CPUs' TLBs? I think the
current code is not quite correct, because if multiple tasks (CPUs)
map the pages, only the TLB of the last one is flushed when the
mapping is cleared, because the TLB is only flushed when ->mapcount
drops to zero, leaving stale entries in the other TLBs. It's not clear
to me what to do about this case.

Thoughts?

Tycho

> ...Juerg
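[Editorial note: the ->mapcount behavior Tycho describes has roughly the shape sketched below. This is a hypothetical paraphrase built from his description, not the verbatim series; set_kpte() and flush_one_local_tlb() are names taken from the series and from Tycho's earlier message, respectively.]

```c
/* Minimal per-page state for this sketch. */
struct xpfo {
	atomic_t mapcount;	/* number of concurrent kernel mappings */
	struct page *page;
};

/*
 * Hypothetical paraphrase of the xpfo_kunmap() path under discussion,
 * reconstructed from the description above -- not the verbatim patch.
 */
static void xpfo_kunmap_sketch(void *kaddr, struct xpfo *xpfo)
{
	/*
	 * Earlier unmappers just drop their reference and return,
	 * performing no TLB maintenance at all.
	 */
	if (!atomic_dec_and_test(&xpfo->mapcount))
		return;

	/* Last unmapper: remove the kernel mapping for the page... */
	set_kpte(kaddr, xpfo->page, __pgprot(0));

	/*
	 * ...and flush. With a local-only flush, only the TLB of the
	 * CPU that happens to run this final kunmap is cleaned; any
	 * other CPU that mapped the page earlier keeps a stale entry.
	 */
	flush_one_local_tlb((unsigned long)kaddr);
}
```

Whether that final flush must be a broadcast, and whether earlier unmappers need maintenance of their own, is exactly the open question in the rest of the thread.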
On 08/30/2017 06:47 PM, Tycho Andersen wrote:
> On Wed, Aug 30, 2017 at 07:31:25AM +0200, Juerg Haefliger wrote:
>>
>>
>> On 08/23/2017 07:04 PM, Mark Rutland wrote:
>>> On Wed, Aug 23, 2017 at 10:58:42AM -0600, Tycho Andersen wrote:
>>>> Hi Mark,
>>>>
>>>> On Mon, Aug 14, 2017 at 05:50:47PM +0100, Mark Rutland wrote:
>>>>> That said, is there any reason not to use flush_tlb_kernel_range()
>>>>> directly?
>>>>
>>>> So it turns out that there is a difference between __flush_tlb_one() and
>>>> flush_tlb_kernel_range() on x86: flush_tlb_kernel_range() flushes all
>>>> the TLBs via on_each_cpu(), whereas __flush_tlb_one() only flushes the
>>>> local TLB (which I think is enough here).
>>>
>>> That sounds suspicious; I don't think that __flush_tlb_one() is
>>> sufficient.
>>>
>>> If you only do local TLB maintenance, then the page is left accessible
>>> to other CPUs via the (stale) kernel mappings, i.e. the page isn't
>>> exclusively mapped by userspace.
>>
>> We flush all CPUs to get rid of stale entries when a new page is
>> allocated to userspace that was previously allocated to the kernel.
>> Is that the scenario you were thinking of?
>
> I think there are two cases: the one you describe above, where the
> pages are first allocated, and a second one, where e.g. the pages are
> mapped into the kernel because of DMA or whatever. In the case you
> describe above, I think we're doing the right thing (which is why my
> test worked correctly, because it tested this case).
>
> In the second case, when the pages are unmapped (i.e. the kernel is
> done doing DMA), do we need to flush the other CPUs' TLBs? I think the
> current code is not quite correct, because if multiple tasks (CPUs)
> map the pages, only the TLB of the last one is flushed when the
> mapping is cleared, because the TLB is only flushed when ->mapcount
> drops to zero, leaving stale entries in the other TLBs. It's not clear
> to me what to do about this case.

For this to happen, multiple CPUs need to have the same userspace page
mapped at the same time. Is this a valid scenario?

...Juerg

> Thoughts?
>
> Tycho
On Thu, Aug 31, 2017 at 11:43:53AM +0200, Juerg Haefliger wrote:
> On 08/30/2017 06:47 PM, Tycho Andersen wrote:
> > On Wed, Aug 30, 2017 at 07:31:25AM +0200, Juerg Haefliger wrote:
> >>
> >>
> >> On 08/23/2017 07:04 PM, Mark Rutland wrote:
> >>> On Wed, Aug 23, 2017 at 10:58:42AM -0600, Tycho Andersen wrote:
> >>>> Hi Mark,
> >>>>
> >>>> On Mon, Aug 14, 2017 at 05:50:47PM +0100, Mark Rutland wrote:
> >>>>> That said, is there any reason not to use flush_tlb_kernel_range()
> >>>>> directly?
> >>>>
> >>>> So it turns out that there is a difference between __flush_tlb_one() and
> >>>> flush_tlb_kernel_range() on x86: flush_tlb_kernel_range() flushes all
> >>>> the TLBs via on_each_cpu(), whereas __flush_tlb_one() only flushes the
> >>>> local TLB (which I think is enough here).
> >>>
> >>> That sounds suspicious; I don't think that __flush_tlb_one() is
> >>> sufficient.
> >>>
> >>> If you only do local TLB maintenance, then the page is left accessible
> >>> to other CPUs via the (stale) kernel mappings, i.e. the page isn't
> >>> exclusively mapped by userspace.
> >>
> >> We flush all CPUs to get rid of stale entries when a new page is
> >> allocated to userspace that was previously allocated to the kernel.
> >> Is that the scenario you were thinking of?
> >
> > I think there are two cases: the one you describe above, where the
> > pages are first allocated, and a second one, where e.g. the pages are
> > mapped into the kernel because of DMA or whatever. In the case you
> > describe above, I think we're doing the right thing (which is why my
> > test worked correctly, because it tested this case).
> >
> > In the second case, when the pages are unmapped (i.e. the kernel is
> > done doing DMA), do we need to flush the other CPUs' TLBs? I think the
> > current code is not quite correct, because if multiple tasks (CPUs)
> > map the pages, only the TLB of the last one is flushed when the
> > mapping is cleared, because the TLB is only flushed when ->mapcount
> > drops to zero, leaving stale entries in the other TLBs. It's not clear
> > to me what to do about this case.
>
> For this to happen, multiple CPUs need to have the same userspace page
> mapped at the same time. Is this a valid scenario?

I believe so. I think you could trigger that with a multi-threaded
application running across several CPUs. All those threads would share
the same page tables.

Thanks,
Mark.
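[Editorial note: Mark's scenario in code form, as a minimal userspace illustration (not from the thread). Any ordinary multi-threaded program qualifies: its threads share one set of page tables, so several CPUs can cache TLB entries for the same user page at once.]

```c
#include <pthread.h>
#include <stdlib.h>

/* One shared buffer; both threads fault it in, so the CPUs they run on
 * can each end up with a TLB entry for the same user page. */
static char *shared;

static void *touch(void *arg)
{
	/* Each thread writes the shared page from whichever CPU the
	 * scheduler picks; the page tables (and the page) are common. */
	shared[0] = 1;
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	shared = malloc(4096);
	pthread_create(&a, NULL, touch, NULL);
	pthread_create(&b, NULL, touch, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	free(shared);
	return 0;
}
```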
```diff
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index af1c76981911..8e0c49105d3e 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -184,6 +184,14 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end
 	isb();
 }
 
+static inline void __flush_tlb_one(unsigned long addr)
+{
+	dsb(ishst);
+	__tlbi(vaae1is, addr >> 12);
+	dsb(ish);
+	isb();
+}
+
 /*
  * Used to invalidate the TLB (walk caches) corresponding to intermediate page
  * table levels (pgd/pud/pmd).
```