Message ID | 20190528103313.1343-40-jgross@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xen: add core scheduling support | expand |
On 28/05/2019 11:32, Juergen Gross wrote: > Instead of dynamically decide whether the previous vcpu was using full "deciding" > or default GDT just add a percpu variable for that purpose. This at "was using a full or default GDT, just add" > once removes the need for testing vcpu_ids to differ twice. > > Cache the need_full_gdt(nd) value in a local variable. What's the point of doing this? I know the logic is rather complicated in __context_switch(), but at least it is visually consistent. After this change, it is asymmetric and harder to follow. > > Signed-off-by: Juergen Gross <jgross@suse.com> > Reviewed-by: Jan Beulich <jbeulich@suse.com> > --- > RFC V2: new patch (split from previous one) > V1: init percpu flag at cpu startup > rename variable (Jan Beulich) > --- > xen/arch/x86/cpu/common.c | 3 +++ > xen/arch/x86/domain.c | 16 +++++++++++----- > xen/include/asm-x86/desc.h | 1 + > 3 files changed, 15 insertions(+), 5 deletions(-) > > diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c > index 33f5d32557..8b90356fe5 100644 > --- a/xen/arch/x86/cpu/common.c > +++ b/xen/arch/x86/cpu/common.c > @@ -49,6 +49,8 @@ unsigned int vaddr_bits __read_mostly = VADDR_BITS; > static unsigned int cleared_caps[NCAPINTS]; > static unsigned int forced_caps[NCAPINTS]; > > +DEFINE_PER_CPU(bool, full_gdt_loaded); > + > void __init setup_clear_cpu_cap(unsigned int cap) > { > const uint32_t *dfs; > @@ -745,6 +747,7 @@ void load_system_tables(void) > offsetof(struct tss_struct, __cacheline_filler) - 1, > SYS_DESC_tss_busy); > > + per_cpu(full_gdt_loaded, cpu) = false; Indentation. (Although I've got half a mind to do a blanket convert of files like this to Xen style. They are almost completely diverged from their Linux heritage.) ~Andrew
On 02.07.19 18:09, Andrew Cooper wrote: > On 28/05/2019 11:32, Juergen Gross wrote: >> Instead of dynamically decide whether the previous vcpu was using full > > "deciding" > >> or default GDT just add a percpu variable for that purpose. This at > > "was using a full or default GDT, just add" > >> once removes the need for testing vcpu_ids to differ twice. >> >> Cache the need_full_gdt(nd) value in a local variable. > > What's the point of doing this? I know the logic is rather complicated > in __context_switch(), but at least it is visually consistent. After > this change, it is asymmetric and harder to follow. This is a hot path. need_full_gdt() needs two compares, of which one is using evaluate_nospec(). Juergen
On 03/07/2019 07:30, Juergen Gross wrote: > On 02.07.19 18:09, Andrew Cooper wrote: >> On 28/05/2019 11:32, Juergen Gross wrote: >>> Instead of dynamically decide whether the previous vcpu was using full >> >> "deciding" >> >>> or default GDT just add a percpu variable for that purpose. This at >> >> "was using a full or default GDT, just add" >> >>> once removes the need for testing vcpu_ids to differ twice. >>> >>> Cache the need_full_gdt(nd) value in a local variable. >> >> What's the point of doing this? I know the logic is rather complicated >> in __context_switch(), but at least it is visually consistent. After >> this change, it is asymmetric and harder to follow. > > This is a hot path. need_full_gdt() needs two compares, of which one is > using evaluate_nospec(). Urgh. So evalute_nospec() is already broken here because need_full_gdt() isn't always_inline, but surely this isn't the only example impacted in __context_switch()? The choice of 'gdt' is similarly impacted by the looks of things. I'd recommend not worrying about evalute_nospec() for now. There are several fundamental problems atm, and Xen 4.13 cannot ship with it in this state. ~Andrew
On 03.07.19 14:21, Andrew Cooper wrote: > On 03/07/2019 07:30, Juergen Gross wrote: >> On 02.07.19 18:09, Andrew Cooper wrote: >>> On 28/05/2019 11:32, Juergen Gross wrote: >>>> Instead of dynamically decide whether the previous vcpu was using full >>> >>> "deciding" >>> >>>> or default GDT just add a percpu variable for that purpose. This at >>> >>> "was using a full or default GDT, just add" >>> >>>> once removes the need for testing vcpu_ids to differ twice. >>>> >>>> Cache the need_full_gdt(nd) value in a local variable. >>> >>> What's the point of doing this? I know the logic is rather complicated >>> in __context_switch(), but at least it is visually consistent. After >>> this change, it is asymmetric and harder to follow. >> >> This is a hot path. need_full_gdt() needs two compares, of which one is >> using evaluate_nospec(). > > Urgh. So evalute_nospec() is already broken here because > need_full_gdt() isn't always_inline, but surely this isn't the only > example impacted in __context_switch()? The choice of 'gdt' is > similarly impacted by the looks of things. > > I'd recommend not worrying about evalute_nospec() for now. There are > several fundamental problems atm, and Xen 4.13 cannot ship with it in > this state. I did a small performance test with this patch and then removed latching of need_full_gdt(nd) in the local variable: On a 8 cpu system I started 2 mini-os domains (1 vcpu each) doing a busy loop sending events to dom0. On dom0 I did a build of the hypervisor via "make -j 8" and measured the time for that build, then took the average of 5 such builds (doing a make clean in between). elapsed user system Unpatched: 66.51 232.93 109.21 latched: 64.82 232.33 109.18 unlatched: 63.39 231.81 107.49 As there is a small advantage for not latching I'll remove the full_gdt local variable. Juergen
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c index 33f5d32557..8b90356fe5 100644 --- a/xen/arch/x86/cpu/common.c +++ b/xen/arch/x86/cpu/common.c @@ -49,6 +49,8 @@ unsigned int vaddr_bits __read_mostly = VADDR_BITS; static unsigned int cleared_caps[NCAPINTS]; static unsigned int forced_caps[NCAPINTS]; +DEFINE_PER_CPU(bool, full_gdt_loaded); + void __init setup_clear_cpu_cap(unsigned int cap) { const uint32_t *dfs; @@ -745,6 +747,7 @@ void load_system_tables(void) offsetof(struct tss_struct, __cacheline_filler) - 1, SYS_DESC_tss_busy); + per_cpu(full_gdt_loaded, cpu) = false; lgdt(&gdtr); lidt(&idtr); ltr(TSS_ENTRY << 3); diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index adc06154ee..98d2939daf 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1645,6 +1645,8 @@ static inline void load_full_gdt(const struct vcpu *v, unsigned int cpu) }; lgdt(&gdt_desc); + + per_cpu(full_gdt_loaded, cpu) = true; } static inline void load_default_gdt(const seg_desc_t *gdt, unsigned int cpu) @@ -1655,6 +1657,8 @@ static inline void load_default_gdt(const seg_desc_t *gdt, unsigned int cpu) }; lgdt(&gdt_desc); + + per_cpu(full_gdt_loaded, cpu) = false; } static void __context_switch(void) @@ -1665,6 +1669,7 @@ static void __context_switch(void) struct vcpu *n = current; struct domain *pd = p->domain, *nd = n->domain; seg_desc_t *gdt; + bool full_gdt; ASSERT(p != n); ASSERT(!vcpu_cpu_dirty(n)); @@ -1707,11 +1712,13 @@ static void __context_switch(void) gdt = !is_pv_32bit_domain(nd) ? per_cpu(gdt_table, cpu) : per_cpu(compat_gdt_table, cpu); - if ( need_full_gdt(nd) ) + full_gdt = need_full_gdt(nd); + + if ( full_gdt ) write_full_gdt_ptes(gdt, n); - if ( need_full_gdt(pd) && - ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(nd)) ) + if ( per_cpu(full_gdt_loaded, cpu) && + ((p->vcpu_id != n->vcpu_id) || !full_gdt) ) load_default_gdt(gdt, cpu); write_ptbase(n); @@ -1723,8 +1730,7 @@ static void __context_switch(void) svm_load_segs(0, 0, 0, 0, 0, 0, 0); #endif - if ( need_full_gdt(nd) && - ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) ) + if ( full_gdt && !per_cpu(full_gdt_loaded, cpu) ) load_full_gdt(n, cpu); if ( pd != nd ) diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h index 85e83bcefb..ff9ac5f15d 100644 --- a/xen/include/asm-x86/desc.h +++ b/xen/include/asm-x86/desc.h @@ -208,6 +208,7 @@ extern seg_desc_t boot_cpu_gdt_table[]; DECLARE_PER_CPU(seg_desc_t *, gdt_table); extern seg_desc_t boot_cpu_compat_gdt_table[]; DECLARE_PER_CPU(seg_desc_t *, compat_gdt_table); +DECLARE_PER_CPU(bool, full_gdt_loaded); extern void load_TR(void);