Message ID: 7ffeec9f-2ce4-9122-4699-32c3ffb06a5d@suse.com
State: New, archived
Series: x86/AMD: also determine L3 cache size
On 16/04/2021 14:20, Jan Beulich wrote:
> For Intel CPUs we record L3 cache size, hence we should also do so for
> AMD and alike.
>
> While making these additions, also make sure (throughout the function)
> that we don't needlessly overwrite prior values when the new value to be
> stored is zero.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> I have to admit though that I'm not convinced the sole real use of the
> field (in flush_area_local()) is a good one - flushing an entire L3's
> worth of lines via CLFLUSH may not be more efficient than using WBINVD.
> But I didn't measure it (yet).

WBINVD always needs a broadcast IPI to work correctly.

CLFLUSH and friends let you do this from a single CPU, using cache
coherency to DTRT with the line, wherever it is.


Looking at that logic in flush_area_local(), I don't see how it can be
correct. The WBINVD path is a decomposition inside the IPI, but in the
higher level helpers, I don't see how the "area too big, convert to
WBINVD" can be safe.

All users of FLUSH_CACHE are flush_all(), except two PCI
Passthrough-restricted cases. MMUEXT_FLUSH_CACHE_GLOBAL looks to be
safe, while vmx_do_resume() has very dubious reasoning, and is dead code
I think, because I'm not aware of a VT-x capable CPU without
WBINVD-exiting.

~Andrew
On 16.04.2021 16:21, Andrew Cooper wrote:
> On 16/04/2021 14:20, Jan Beulich wrote:
>> For Intel CPUs we record L3 cache size, hence we should also do so for
>> AMD and alike.
>>
>> While making these additions, also make sure (throughout the function)
>> that we don't needlessly overwrite prior values when the new value to be
>> stored is zero.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> I have to admit though that I'm not convinced the sole real use of the
>> field (in flush_area_local()) is a good one - flushing an entire L3's
>> worth of lines via CLFLUSH may not be more efficient than using WBINVD.
>> But I didn't measure it (yet).
>
> WBINVD always needs a broadcast IPI to work correctly.
>
> CLFLUSH and friends let you do this from a single CPU, using cache
> coherency to DTRT with the line, wherever it is.
>
>
> Looking at that logic in flush_area_local(), I don't see how it can be
> correct. The WBINVD path is a decomposition inside the IPI, but in the
> higher level helpers, I don't see how the "area too big, convert to
> WBINVD" can be safe.

Would you mind giving an example? I'm struggling to understand what
exactly you mean to point out.

Jan

> All users of FLUSH_CACHE are flush_all(), except two PCI
> Passthrough-restricted cases. MMUEXT_FLUSH_CACHE_GLOBAL looks to be
> safe, while vmx_do_resume() has very dubious reasoning, and is dead code
> I think, because I'm not aware of a VT-x capable CPU without WBINVD-exiting.
>
> ~Andrew
>
On 16.04.2021 16:21, Andrew Cooper wrote:
> On 16/04/2021 14:20, Jan Beulich wrote:
>> For Intel CPUs we record L3 cache size, hence we should also do so for
>> AMD and alike.
>>
>> While making these additions, also make sure (throughout the function)
>> that we don't needlessly overwrite prior values when the new value to be
>> stored is zero.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> I have to admit though that I'm not convinced the sole real use of the
>> field (in flush_area_local()) is a good one - flushing an entire L3's
>> worth of lines via CLFLUSH may not be more efficient than using WBINVD.
>> But I didn't measure it (yet).
>
> WBINVD always needs a broadcast IPI to work correctly.
>
> CLFLUSH and friends let you do this from a single CPU, using cache
> coherency to DTRT with the line, wherever it is.
>
>
> Looking at that logic in flush_area_local(), I don't see how it can be
> correct. The WBINVD path is a decomposition inside the IPI, but in the
> higher level helpers, I don't see how the "area too big, convert to
> WBINVD" can be safe.
>
> All users of FLUSH_CACHE are flush_all(), except two PCI
> Passthrough-restricted cases. MMUEXT_FLUSH_CACHE_GLOBAL looks to be
> safe, while vmx_do_resume() has very dubious reasoning, and is dead code
> I think, because I'm not aware of a VT-x capable CPU without WBINVD-exiting.

Besides my prior question on your reply, may I also ask what all of
this means for the patch itself? After all you've been replying to
the post-commit-message remark only so far.

Jan
On 29.04.2021 11:21, Jan Beulich wrote:
> On 16.04.2021 16:21, Andrew Cooper wrote:
>> On 16/04/2021 14:20, Jan Beulich wrote:
>>> For Intel CPUs we record L3 cache size, hence we should also do so for
>>> AMD and alike.
>>>
>>> While making these additions, also make sure (throughout the function)
>>> that we don't needlessly overwrite prior values when the new value to be
>>> stored is zero.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> I have to admit though that I'm not convinced the sole real use of the
>>> field (in flush_area_local()) is a good one - flushing an entire L3's
>>> worth of lines via CLFLUSH may not be more efficient than using WBINVD.
>>> But I didn't measure it (yet).
>>
>> WBINVD always needs a broadcast IPI to work correctly.
>>
>> CLFLUSH and friends let you do this from a single CPU, using cache
>> coherency to DTRT with the line, wherever it is.
>>
>>
>> Looking at that logic in flush_area_local(), I don't see how it can be
>> correct. The WBINVD path is a decomposition inside the IPI, but in the
>> higher level helpers, I don't see how the "area too big, convert to
>> WBINVD" can be safe.
>>
>> All users of FLUSH_CACHE are flush_all(), except two PCI
>> Passthrough-restricted cases. MMUEXT_FLUSH_CACHE_GLOBAL looks to be
>> safe, while vmx_do_resume() has very dubious reasoning, and is dead code
>> I think, because I'm not aware of a VT-x capable CPU without WBINVD-exiting.
>
> Besides my prior question on your reply, may I also ask what all of
> this means for the patch itself? After all you've been replying to
> the post-commit-message remark only so far.

As for the other patch just pinged again, unless I hear back on the
patch itself by then, I'm intending to commit this the week after the
next one, if need be without any acks.

Jan
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -240,28 +240,41 @@ int get_model_name(struct cpuinfo_x86 *c
 
 void display_cacheinfo(struct cpuinfo_x86 *c)
 {
-	unsigned int dummy, ecx, edx, l2size;
+	unsigned int dummy, ecx, edx, size;
 
 	if (c->extended_cpuid_level >= 0x80000005) {
 		cpuid(0x80000005, &dummy, &dummy, &ecx, &edx);
-		if (opt_cpu_info)
-			printk("CPU: L1 I cache %dK (%d bytes/line),"
-			       " D cache %dK (%d bytes/line)\n",
-			       edx>>24, edx&0xFF, ecx>>24, ecx&0xFF);
-		c->x86_cache_size=(ecx>>24)+(edx>>24);
+		if ((edx | ecx) >> 24) {
+			if (opt_cpu_info)
+				printk("CPU: L1 I cache %uK (%u bytes/line),"
+				       " D cache %uK (%u bytes/line)\n",
+				       edx >> 24, edx & 0xFF,
+				       ecx >> 24, ecx & 0xFF);
+			c->x86_cache_size = (ecx >> 24) + (edx >> 24);
+		}
 	}
 
 	if (c->extended_cpuid_level < 0x80000006)
 		/* Some chips just has a large L1. */
 		return;
 
-	ecx = cpuid_ecx(0x80000006);
-	l2size = ecx >> 16;
-
-	c->x86_cache_size = l2size;
-
-	if (opt_cpu_info)
-		printk("CPU: L2 Cache: %dK (%d bytes/line)\n",
-		       l2size, ecx & 0xFF);
+	cpuid(0x80000006, &dummy, &dummy, &ecx, &edx);
+
+	size = ecx >> 16;
+	if (size) {
+		c->x86_cache_size = size;
+
+		if (opt_cpu_info)
+			printk("CPU: L2 Cache: %uK (%u bytes/line)\n",
+			       size, ecx & 0xFF);
+	}
+
+	size = edx >> 18;
+	if (size) {
+		c->x86_cache_size = size * 512;
+
+		if (opt_cpu_info)
+			printk("CPU: L3 Cache: %uM (%u bytes/line)\n",
+			       (size + (size & 1)) >> 1, edx & 0xFF);
+	}
 }
 
 static inline u32 _phys_pkg_id(u32 cpuid_apic, int index_msb)
For Intel CPUs we record L3 cache size, hence we should also do so for
AMD and alike.

While making these additions, also make sure (throughout the function)
that we don't needlessly overwrite prior values when the new value to be
stored is zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I have to admit though that I'm not convinced the sole real use of the
field (in flush_area_local()) is a good one - flushing an entire L3's
worth of lines via CLFLUSH may not be more efficient than using WBINVD.
But I didn't measure it (yet).