Message ID | 20200219174354.84726-5-roger.pau@citrix.com
State      | Superseded
Series     | x86: improve assisted tlb flush and use it in guest mode
On 19.02.2020 18:43, Roger Pau Monne wrote:
> Introduce a specific flag to request a HVM guest TLB flush, which is
> an ASID/VPID tickle that forces a linear TLB flush for all HVM
> guests.

Here and below, what do you mean by "linear"? I guess you mean TLBs
holding translations from guest linear to guest physical, but I think
this could do with then also saying so, even if it's more words.

> This was previously unconditionally done in each pre_flush call, but
> that's not required: HVM guests not using shadow don't require
> linear TLB flushes as Xen doesn't modify the guest page tables in
> that case (ie: when using HAP).

This explains the correctness in one direction. What about the removal
of this from the switch_cr3_cr4() path? And what about our safety
assumptions from the ticking of tlbflush_clock, where we then imply
that pages e.g. about to be freed can't have any translations left in
any TLBs anymore?

> --- a/xen/include/asm-x86/flushtlb.h
> +++ b/xen/include/asm-x86/flushtlb.h
> @@ -105,6 +105,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
>  #define FLUSH_VCPU_STATE 0x1000
>  /* Flush the per-cpu root page table */
>  #define FLUSH_ROOT_PGTBL 0x2000
> +/* Flush all HVM guests linear TLB (using ASID/VPID) */
> +#define FLUSH_GUESTS_TLB 0x4000

For one, the "all" is pretty misleading. A single such request doesn't
do this for all vCPU-s of all HVM guests, does it? I'm also struggling
with the 'S' in "GUESTS" - why is it not just "GUEST"? I admit the
names of the involved functions (hvm_flush_guest_tlbs(),
hvm_asid_flush_core()) are somewhat misleading, as they don't actually
do any flushing, they merely arrange for what is in the TLB to no
longer be able to be used, so giving this a suitable name is
"historically" complicated. What if we did away with the
hvm_flush_guest_tlbs() wrapper, naming the constant here then after
hvm_asid_flush_core(), e.g. FLUSH_ASID_CORE?

I also think this constant might better be zero when !HVM.

Jan
On Fri, Feb 28, 2020 at 05:14:05PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > Introduce a specific flag to request a HVM guest TLB flush, which
> > is an ASID/VPID tickle that forces a linear TLB flush for all HVM
> > guests.
>
> Here and below, what do you mean by "linear"? I guess you mean TLBs
> holding translations from guest linear to guest physical, but I
> think this could do with then also saying so, even if it's more
> words.

Yes, will change.

> > This was previously unconditionally done in each pre_flush call,
> > but that's not required: HVM guests not using shadow don't require
> > linear TLB flushes as Xen doesn't modify the guest page tables in
> > that case (ie: when using HAP).
>
> This explains the correctness in one direction. What about the
> removal of this from the switch_cr3_cr4() path?

AFAICT that's never used by shadow code to modify cr3 or cr4, and
hence doesn't require a guest linear TLB flush.

> And what about our safety assumptions from the ticking of
> tlbflush_clock, where we then imply that pages e.g. about to be
> freed can't have any translations left in any TLBs anymore?

I'm slightly confused. That flush only affects the HVM guests' linear
TLB, and hence is not under Xen control unless shadow mode is used.
Pages to be freed in the HAP case need to be flushed from the EPT/NPT,
but whether there are references left in the guest TLB pointing to
that gfn really doesn't matter to Xen at all, since the guest is in
full control of its MMU and TLB in that case.

For shadow any change in the guest page-tables should already be
accompanied by a guest TLB flush, or else the guest could access no
longer present entries, regardless of whether the affected pages are
freed or not.

> > --- a/xen/include/asm-x86/flushtlb.h
> > +++ b/xen/include/asm-x86/flushtlb.h
> > @@ -105,6 +105,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
> >  #define FLUSH_VCPU_STATE 0x1000
> >  /* Flush the per-cpu root page table */
> >  #define FLUSH_ROOT_PGTBL 0x2000
> > +/* Flush all HVM guests linear TLB (using ASID/VPID) */
> > +#define FLUSH_GUESTS_TLB 0x4000
>
> For one, the "all" is pretty misleading. A single such request
> doesn't do this for all vCPU-s of all HVM guests, does it?

It kind of does because it tickles the pCPU ASID/VPID generation ID,
so any vCPU scheduled on the selected pCPUs will get a new ASID/VPID
allocated and thus a clean TLB.

> I'm also struggling with the 'S' in "GUESTS" - why is it not just
> "GUEST"?

Any guest vCPUs running on the selected pCPUs will get a new ASID/VPID
and thus a clean TLB.

> I admit the names of the involved functions (hvm_flush_guest_tlbs(),
> hvm_asid_flush_core()) are somewhat misleading, as they don't
> actually do any flushing, they merely arrange for what is in the TLB
> to no longer be able to be used, so giving this a suitable name is
> "historically" complicated. What if we did away with the
> hvm_flush_guest_tlbs() wrapper, naming the constant here then after
> hvm_asid_flush_core(), e.g. FLUSH_ASID_CORE?

I'm not opposed to renaming. The comment before the definition was
also meant to clarify its usage, and hence the explicit mention of
ASID/VPID.

> I also think this constant might better be zero when !HVM.

Yes, that's a good idea.

Thanks, Roger.
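The generation-based "tickle" Roger describes lives in
xen/arch/x86/hvm/asid.c. A simplified sketch of the mechanism, abridged
and with comments added (illustrative, not verbatim Xen code):

```c
/*
 * Sketch of the per-pCPU ASID "tickle", modelled on Xen's
 * xen/arch/x86/hvm/asid.c (field names abridged, details illustrative).
 */
struct hvm_asid_data {
    uint64_t core_asid_generation;  /* bumped to retire all ASIDs at once */
    bool disabled;                  /* fall back to full flushes if set */
};

static DEFINE_PER_CPU(struct hvm_asid_data, hvm_asid_data);

void hvm_asid_flush_core(void)
{
    struct hvm_asid_data *data = &this_cpu(hvm_asid_data);

    if ( data->disabled )
        return;

    /*
     * Nothing is flushed here: bumping the generation merely marks every
     * ASID currently handed out on this pCPU as stale, so the next VM
     * entry of any vCPU allocates a fresh ASID and thereby starts with a
     * clean guest-visible TLB.
     */
    if ( likely(++data->core_asid_generation != 0) )
        return;

    /* 64-bit generations never wrap in practice; if they do, fail safe. */
    data->disabled = true;
}
```

Each vCPU records the generation its current ASID was allocated under;
on the next VM entry the generation check fails and a fresh ASID is
handed out, which is what makes the stale guest-linear translations
unreachable without touching any TLB entry directly.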
On 28.02.2020 17:50, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 05:14:05PM +0100, Jan Beulich wrote:
>> On 19.02.2020 18:43, Roger Pau Monne wrote:
>>> This was previously unconditionally done in each pre_flush call,
>>> but that's not required: HVM guests not using shadow don't require
>>> linear TLB flushes as Xen doesn't modify the guest page tables in
>>> that case (ie: when using HAP).
>>
>> This explains the correctness in one direction. What about the
>> removal of this from the switch_cr3_cr4() path?
>
> AFAICT that's never used by shadow code to modify cr3 or cr4, and
> hence doesn't require a guest linear TLB flush.

XSA-294 tells me to be very conservative here. It is not necessarily
the direct use by shadow code that matters; toggle_guest_*() isn't
used directly by it, either.

>> And what about our safety assumptions from the ticking of
>> tlbflush_clock, where we then imply that pages e.g. about to be
>> freed can't have any translations left in any TLBs anymore?
>
> I'm slightly confused. That flush only affects the HVM guests'
> linear TLB, and hence is not under Xen control unless shadow mode is
> used. Pages to be freed in the HAP case need to be flushed from the
> EPT/NPT, but whether there are references left in the guest TLB
> pointing to that gfn really doesn't matter to Xen at all, since the
> guest is in full control of its MMU and TLB in that case.

Ah yes, sorry, I probably didn't get my thinking right around combined
mappings and when they get invalidated.

>>> --- a/xen/include/asm-x86/flushtlb.h
>>> +++ b/xen/include/asm-x86/flushtlb.h
>>> @@ -105,6 +105,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
>>>  #define FLUSH_VCPU_STATE 0x1000
>>>  /* Flush the per-cpu root page table */
>>>  #define FLUSH_ROOT_PGTBL 0x2000
>>> +/* Flush all HVM guests linear TLB (using ASID/VPID) */
>>> +#define FLUSH_GUESTS_TLB 0x4000
>>
>> For one, the "all" is pretty misleading. A single such request
>> doesn't do this for all vCPU-s of all HVM guests, does it?
>
> It kind of does because it tickles the pCPU ASID/VPID generation ID,
> so any vCPU scheduled on the selected pCPUs will get a new ASID/VPID
> allocated and thus a clean TLB.
>
>> I'm also struggling with the 'S' in "GUESTS" - why is it not just
>> "GUEST"?
>
> Any guest vCPUs running on the selected pCPUs will get a new
> ASID/VPID and thus a clean TLB.

Right, I see. Yet ...

>> I admit the names of the involved functions
>> (hvm_flush_guest_tlbs(), hvm_asid_flush_core()) are somewhat
>> misleading, as they don't actually do any flushing, they merely
>> arrange for what is in the TLB to no longer be able to be used, so
>> giving this a suitable name is "historically" complicated. What if
>> we did away with the hvm_flush_guest_tlbs() wrapper, naming the
>> constant here then after hvm_asid_flush_core(), e.g.
>> FLUSH_ASID_CORE?
>
> I'm not opposed to renaming. The comment before the definition was
> also meant to clarify its usage, and hence the explicit mention of
> ASID/VPID.

... there's also one more argument for renaming: The present name
doesn't convey at all that this operation is HVM-only (i.e. PV guests
wouldn't have their TLBs [as far as one can call them "their"]
flushed).

Jan
On Mon, Mar 02, 2020 at 11:31:23AM +0100, Jan Beulich wrote:
> On 28.02.2020 17:50, Roger Pau Monné wrote:
> > On Fri, Feb 28, 2020 at 05:14:05PM +0100, Jan Beulich wrote:
> >> On 19.02.2020 18:43, Roger Pau Monne wrote:
> >>> This was previously unconditionally done in each pre_flush call,
> >>> but that's not required: HVM guests not using shadow don't
> >>> require linear TLB flushes as Xen doesn't modify the guest page
> >>> tables in that case (ie: when using HAP).
> >>
> >> This explains the correctness in one direction. What about the
> >> removal of this from the switch_cr3_cr4() path?
> >
> > AFAICT that's never used by shadow code to modify cr3 or cr4, and
> > hence doesn't require a guest linear TLB flush.
>
> XSA-294 tells me to be very conservative here. It is not necessarily
> the direct use by shadow code that matters; toggle_guest_*() isn't
> used directly by it, either.

toggle_guest_{mode/pt} seems to be exclusively used by PV guests. I'm
fine with adding extra flushes to be on the safe side, but those
functions are never used against a HVM guest AFAICT. The only reason
to flush a HVM guest's 'internal' TLB is when using shadow, and in
that case the shadow code must already take care of issuing such
flushes.

> >> I admit the names of the involved functions
> >> (hvm_flush_guest_tlbs(), hvm_asid_flush_core()) are somewhat
> >> misleading, as they don't actually do any flushing, they merely
> >> arrange for what is in the TLB to no longer be able to be used,
> >> so giving this a suitable name is "historically" complicated.
> >> What if we did away with the hvm_flush_guest_tlbs() wrapper,
> >> naming the constant here then after hvm_asid_flush_core(), e.g.
> >> FLUSH_ASID_CORE?
> >
> > I'm not opposed to renaming. The comment before the definition was
> > also meant to clarify its usage, and hence the explicit mention of
> > ASID/VPID.
>
> ... there's also one more argument for renaming: The present name
> doesn't convey at all that this operation is HVM-only (i.e. PV
> guests wouldn't have their TLBs [as far as one can call them
> "their"] flushed).

Do you think FLUSH_ASID_CORE is clear enough, or would you prefer
FLUSH_HVM_ASID_CORE?

Thanks, Roger.
On 02.03.2020 17:50, Roger Pau Monné wrote:
> On Mon, Mar 02, 2020 at 11:31:23AM +0100, Jan Beulich wrote:
>> On 28.02.2020 17:50, Roger Pau Monné wrote:
>>> On Fri, Feb 28, 2020 at 05:14:05PM +0100, Jan Beulich wrote:
>>>> On 19.02.2020 18:43, Roger Pau Monne wrote:
>>>>> This was previously unconditionally done in each pre_flush call,
>>>>> but that's not required: HVM guests not using shadow don't
>>>>> require linear TLB flushes as Xen doesn't modify the guest page
>>>>> tables in that case (ie: when using HAP).
>>>>
>>>> This explains the correctness in one direction. What about the
>>>> removal of this from the switch_cr3_cr4() path?
>>>
>>> AFAICT that's never used by shadow code to modify cr3 or cr4, and
>>> hence doesn't require a guest linear TLB flush.
>>
>> XSA-294 tells me to be very conservative here. It is not
>> necessarily the direct use by shadow code that matters;
>> toggle_guest_*() isn't used directly by it, either.
>
> toggle_guest_{mode/pt} seems to be exclusively used by PV guests.
> I'm fine with adding extra flushes to be on the safe side, but those
> functions are never used against a HVM guest AFAICT. The only reason
> to flush a HVM guest's 'internal' TLB is when using shadow, and in
> that case the shadow code must already take care of issuing such
> flushes.

What I'm asking for primarily is to extend the description. If it is
clear enough, it ought to also be clear enough that no flushes need
inserting anywhere.

>>>> I admit the names of the involved functions
>>>> (hvm_flush_guest_tlbs(), hvm_asid_flush_core()) are somewhat
>>>> misleading, as they don't actually do any flushing, they merely
>>>> arrange for what is in the TLB to no longer be able to be used,
>>>> so giving this a suitable name is "historically" complicated.
>>>> What if we did away with the hvm_flush_guest_tlbs() wrapper,
>>>> naming the constant here then after hvm_asid_flush_core(), e.g.
>>>> FLUSH_ASID_CORE?
>>>
>>> I'm not opposed to renaming. The comment before the definition was
>>> also meant to clarify its usage, and hence the explicit mention of
>>> ASID/VPID.
>>
>> ... there's also one more argument for renaming: The present name
>> doesn't convey at all that this operation is HVM-only (i.e. PV
>> guests wouldn't have their TLBs [as far as one can call them
>> "their"] flushed).
>
> Do you think FLUSH_ASID_CORE is clear enough, or would you prefer
> FLUSH_HVM_ASID_CORE?

While in principle in our code base ASID implies HVM, perhaps the
latter would still be even slightly better.

Jan
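A minimal sketch of the outcome of this exchange: the FLUSH_HVM_ASID_CORE
name combined with Jan's "zero when !HVM" suggestion, assuming the flag
value of this revision (0x4000) and Xen's CONFIG_HVM Kconfig symbol; the
committed form may differ in detail:

```c
/* Sketch: making the flag compile away in !HVM builds (illustrative). */
#ifdef CONFIG_HVM
/* Force an ASID/VPID re-allocation, invalidating HVM guest-linear TLBs */
# define FLUSH_HVM_ASID_CORE 0x4000
#else
# define FLUSH_HVM_ASID_CORE 0
#endif
```

With the constant being 0 in PV-only builds, a test such as
`flags & FLUSH_HVM_ASID_CORE` is constant-false, so the compiler can drop
the corresponding call to hvm_flush_guest_tlbs() (or whatever replaces it)
entirely.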
diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 03f92c23dc..e7ccd4ec7b 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -59,8 +59,6 @@ static u32 pre_flush(void)
         raise_softirq(NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ);
 
  skip_clocktick:
-    hvm_flush_guest_tlbs();
-
     return t2;
 }
 
@@ -221,6 +219,9 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
         do_tlb_flush();
     }
 
+    if ( flags & FLUSH_GUESTS_TLB )
+        hvm_flush_guest_tlbs();
+
     if ( flags & FLUSH_CACHE )
     {
         const struct cpuinfo_x86 *c = &current_cpu_data;
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 121ddf1255..4847f24d3b 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -363,7 +363,7 @@ static int oos_remove_write_access(struct vcpu *v, mfn_t gmfn,
     }
 
     if ( ftlb )
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     return 0;
 }
@@ -939,7 +939,7 @@ static void _shadow_prealloc(struct domain *d, unsigned int pages)
             /* See if that freed up enough space */
             if ( d->arch.paging.shadow.free_pages >= pages )
             {
-                flush_tlb_mask(d->dirty_cpumask);
+                flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
                 return;
             }
         }
@@ -993,7 +993,7 @@ static void shadow_blow_tables(struct domain *d)
                                pagetable_get_mfn(v->arch.shadow_table[i]), 0);
 
     /* Make sure everyone sees the unshadowings */
-    flush_tlb_mask(d->dirty_cpumask);
+    flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 }
 
 void shadow_blow_tables_per_domain(struct domain *d)
@@ -1102,7 +1102,7 @@ mfn_t shadow_alloc(struct domain *d,
         if ( unlikely(!cpumask_empty(&mask)) )
         {
             perfc_incr(shadow_alloc_tlbflush);
-            flush_tlb_mask(&mask);
+            flush_mask(&mask, FLUSH_TLB | FLUSH_GUESTS_TLB);
         }
         /* Now safe to clear the page for reuse */
         clear_domain_page(page_to_mfn(sp));
@@ -2290,7 +2290,7 @@ void sh_remove_shadows(struct domain *d, mfn_t gmfn, int fast, int all)
 
     /* Need to flush TLBs now, so that linear maps are safe next time we
      * take a fault. */
-    flush_tlb_mask(d->dirty_cpumask);
+    flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     paging_unlock(d);
 }
@@ -3005,7 +3005,7 @@ static void sh_unshadow_for_p2m_change(struct domain *d, unsigned long gfn,
         {
             sh_remove_all_shadows_and_parents(d, mfn);
             if ( sh_remove_all_mappings(d, mfn, _gfn(gfn)) )
-                flush_tlb_mask(d->dirty_cpumask);
+                flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
         }
     }
 
@@ -3045,7 +3045,7 @@ static void sh_unshadow_for_p2m_change(struct domain *d, unsigned long gfn,
                 }
                 omfn = mfn_add(omfn, 1);
             }
-            flush_tlb_mask(&flushmask);
+            flush_mask(&flushmask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
             if ( npte )
                 unmap_domain_page(npte);
@@ -3332,7 +3332,7 @@ int shadow_track_dirty_vram(struct domain *d,
                 }
             }
             if ( flush_tlb )
-                flush_tlb_mask(d->dirty_cpumask);
+                flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
             goto out;
 
 out_sl1ma:
@@ -3402,7 +3402,7 @@ bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
     }
 
     /* Flush TLBs on all CPUs with dirty vcpu state. */
-    flush_tlb_mask(mask);
+    flush_mask(mask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     /* Done. */
     for_each_vcpu ( d, v )
diff --git a/xen/arch/x86/mm/shadow/hvm.c b/xen/arch/x86/mm/shadow/hvm.c
index a219266fa2..64077d181b 100644
--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -590,7 +590,7 @@ static void validate_guest_pt_write(struct vcpu *v, mfn_t gmfn,
 
     if ( rc & SHADOW_SET_FLUSH )
         /* Need to flush TLBs to pick up shadow PT changes */
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     if ( rc & SHADOW_SET_ERROR )
     {
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 26798b317c..22aeb97b1e 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3066,7 +3066,7 @@ static int sh_page_fault(struct vcpu *v,
             perfc_incr(shadow_rm_write_flush_tlb);
             smp_wmb();
             atomic_inc(&d->arch.paging.shadow.gtable_dirty_version);
-            flush_tlb_mask(d->dirty_cpumask);
+            flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
         }
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
@@ -3575,7 +3575,7 @@ static bool sh_invlpg(struct vcpu *v, unsigned long linear)
     if ( mfn_to_page(sl1mfn)->u.sh.type
          == SH_type_fl1_shadow )
     {
-        flush_tlb_local();
+        flush_local(FLUSH_TLB | FLUSH_GUESTS_TLB);
         return false;
     }
 
@@ -3810,7 +3810,7 @@ sh_update_linear_entries(struct vcpu *v)
          * table entry. But, without this change, it would fetch the wrong
          * value due to a stale TLB.
          */
-        flush_tlb_local();
+        flush_local(FLUSH_TLB | FLUSH_GUESTS_TLB);
     }
 }
 
@@ -4011,7 +4011,7 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
      * (old) shadow linear maps in the writeable mapping heuristics. */
 #if GUEST_PAGING_LEVELS == 2
     if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
     sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow);
 #elif GUEST_PAGING_LEVELS == 3
     /* PAE guests have four shadow_table entries, based on the
@@ -4035,7 +4035,7 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
             }
         }
         if ( flush )
-            flush_tlb_mask(d->dirty_cpumask);
+            flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
         /* Now install the new shadows. */
         for ( i = 0; i < 4; i++ )
         {
@@ -4056,7 +4056,7 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
     }
 #elif GUEST_PAGING_LEVELS == 4
     if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
     sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow);
     if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
     {
@@ -4502,7 +4502,7 @@ static void sh_pagetable_dying(paddr_t gpa)
             }
         }
         if ( flush )
-            flush_tlb_mask(d->dirty_cpumask);
+            flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
         /* Remember that we've seen the guest use this interface, so we
          * can rely on it using it in future, instead of guessing at
@@ -4539,7 +4539,7 @@ static void sh_pagetable_dying(paddr_t gpa)
         mfn_to_page(gmfn)->pagetable_dying = true;
         shadow_unhook_mappings(d, smfn, 1/* user pages only */);
         /* Now flush the TLB: we removed toplevel mappings. */
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
     }
 
     /* Remember that we've seen the guest use this interface, so we
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 2cfe4e6e97..07f9bc6103 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -105,6 +105,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
 #define FLUSH_VCPU_STATE 0x1000
 /* Flush the per-cpu root page table */
 #define FLUSH_ROOT_PGTBL 0x2000
+/* Flush all HVM guests linear TLB (using ASID/VPID) */
+#define FLUSH_GUESTS_TLB 0x4000
 
 /* Flush local TLBs/caches. */
 unsigned int flush_area_local(const void *va, unsigned int flags);