Message ID | 1496702979-26132-2-git-send-email-cota@braap.org (mailing list archive) |
---|---|
State | New, archived |
On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
> This is a constant used as a hint for padding structs to hopefully avoid
> false cache line sharing.
>
> The constant can be set at configure time by defining QEMU_CACHELINE_SIZE
> via --extra-cflags. If not set there, we try to obtain the value from
> the machine running the configure script. If we fail, we default to
> reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all others.
>
> Note: the configure script only picks up the cache line size when run
> on Linux hosts because I have no other platforms (e.g. Windows, BSD's)
> to test on.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  configure               | 38 ++++++++++++++++++++++++++++++++++++++
>  include/qemu/compiler.h | 17 +++++++++++++++++
>  2 files changed, 55 insertions(+)
>
> diff --git a/configure b/configure
> index 13e040d..6a68cb2 100755
> --- a/configure
> +++ b/configure
> @@ -4832,6 +4832,41 @@ EOF
>      fi
>  fi
>
> +# Find out the size of a cache line on the host
> +# TODO: support more platforms
> +cat > $TMPC<<EOF
> +#ifdef __linux__
> +
> +#include <stdio.h>
> +
> +#define SYSFS "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"
> +
> +int main(int argc, char *argv[])
> +{
> +    unsigned int size;
> +    FILE *fp;
> +
> +    fp = fopen(SYSFS, "r");
> +    if (fp == NULL) {
> +        return -1;
> +    }
> +    if (!fscanf(fp, "%u", &size)) {
> +        return -1;
> +    }
> +    return size;
> +}
> +#else
> +#error Cannot find host cache line size
> +#endif
> +EOF

Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?

Thanks,
On 06/05/2017 10:39 PM, Pranith Kumar wrote:
> Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?
That's an excellent idea. In fact... see reply to 3/3.
r~
On Tue, Jun 06, 2017 at 01:39:45 -0400, Pranith Kumar wrote:
> On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
> > This is a constant used as a hint for padding structs to hopefully avoid
> > false cache line sharing.
> >
> > The constant can be set at configure time by defining QEMU_CACHELINE_SIZE
> > via --extra-cflags. If not set there, we try to obtain the value from
> > the machine running the configure script. If we fail, we default to
> > reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all others.
(snip)
> Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?

I tried using sysconf, but it doesn't work on the PowerPC machine I have
access to (it returns 0). It might be a machine-specific thing though-I
don't know. Here's the machine's `uname -a':
Linux gcc2-power8.osuosl.org 3.10.0-514.10.2.el7.ppc64le #1 SMP Fri Mar \
  3 16:16:38 GMT 2017 ppc64le ppc64le ppc64le GNU/Linux

		E.
On 06/06/2017 09:11 AM, Emilio G. Cota wrote:
> On Tue, Jun 06, 2017 at 01:39:45 -0400, Pranith Kumar wrote:
>> On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
>>> This is a constant used as a hint for padding structs to hopefully avoid
>>> false cache line sharing.
>>>
>>> The constant can be set at configure time by defining QEMU_CACHELINE_SIZE
>>> via --extra-cflags. If not set there, we try to obtain the value from
>>> the machine running the configure script. If we fail, we default to
>>> reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all others.
> (snip)
>> Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?
>
> I tried using sysconf, but it doesn't work on the PowerPC machine I have
> access to (it returns 0). It might be a machine-specific thing though-I
> don't know. Here's the machine's `uname -a':
> Linux gcc2-power8.osuosl.org 3.10.0-514.10.2.el7.ppc64le #1 SMP Fri Mar \
>   3 16:16:38 GMT 2017 ppc64le ppc64le ppc64le GNU/Linux

Well that's unfortunate.

Doing some digging, the kernel has provided the info to userland via elf
auxv data since the beginning of time (aka initial git repository build),
but glibc still does not export that information properly for ppc.

For ppc, you can get what we want from qemu_getauxval(AT_ICACHEBSIZE).
Indeed, we already have 4 different system dependent methods for
determining the icache size in tcg/ppc/tcg-target.inc.c.

So what I think we ought to do is create a new util/cachesize.c like so:

unsigned qemu_icache_linesize = 64;
unsigned qemu_dcache_linesize = 64;

static void init_icache_data(void)
{
#ifdef _SC_LEVEL1_ICACHE_LINESIZE
  {
    long x = sysconf(_SC_LEVEL1_ICACHE_LINESIZE);
    if (x > 0) {
      qemu_icache_linesize = x;
      return;
    }
  }
#endif
#ifdef AT_ICACHEBSIZE
  {
    unsigned long x = qemu_getauxval(AT_ICACHEBSIZE);
    if (x > 0) {
      qemu_icache_linesize = x;
      return;
    }
  }
#endif
  // Other system specific methods.
}

static void init_dcache_data(void)
{
  // Similarly.
}

static void __attribute__((constructor)) init_cache_data(void)
{
  init_icache_data();
  init_dcache_data();
}

In particular, I think you want to be padding to the icache linesize rather
than the dcache linesize since what we're attempting is to avoid writable
data in the icache.


r~
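
The "Similarly" placeholder for the data cache could look roughly like the sketch below. It is only an illustration of the fallback chain proposed above, not code from the thread: the helper name `probe_dcache_linesize` is hypothetical, and it calls glibc's plain `getauxval()` instead of QEMU's `qemu_getauxval()` wrapper so that it compiles on its own. Both probes are guarded by `#ifdef`, so on hosts that lack them the caller-supplied fallback is returned.

```c
#include <assert.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval() and AT_DCACHEBSIZE on Linux libcs */
#endif

/*
 * Hypothetical helper (not part of the patch): probe the L1 data cache
 * line size, trying sysconf() first and the ELF auxiliary vector second.
 * Returns `fallback` if neither method yields a positive value.
 */
static unsigned probe_dcache_linesize(unsigned fallback)
{
    assert(fallback > 0);
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    {
        /* glibc extension; reported to return 0 on some ppc64 hosts */
        long x = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
        if (x > 0) {
            return (unsigned)x;
        }
    }
#endif
#ifdef AT_DCACHEBSIZE
    {
        /* getauxval() returns 0 when the entry is absent */
        unsigned long x = getauxval(AT_DCACHEBSIZE);
        if (x > 0) {
            return (unsigned)x;
        }
    }
#endif
    return fallback;
}
```

A caller would pass the same defaults the patch uses, e.g. `probe_dcache_linesize(64)`, or 128 on ppc64.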
On 06.06.2017 at 19:39, Richard Henderson wrote:
> On 06/06/2017 09:11 AM, Emilio G. Cota wrote:
>> On Tue, Jun 06, 2017 at 01:39:45 -0400, Pranith Kumar wrote:
>>> On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <cota@braap.org> wrote:
>>>> This is a constant used as a hint for padding structs to hopefully
>>>> avoid
>>>> false cache line sharing.
>>>>
>>>> The constant can be set at configure time by defining
>>>> QEMU_CACHELINE_SIZE
>>>> via --extra-cflags. If not set there, we try to obtain the value from
>>>> the machine running the configure script. If we fail, we default to
>>>> reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all
>>>> others.
>> (snip)
>>> Is there any reason not to use sysconf(_SC_LEVEL1_DCACHE_LINESIZE)?
>>
>> I tried using sysconf, but it doesn't work on the PowerPC machine I have
>> access to (it returns 0). It might be a machine-specific thing though-I
>> don't know. Here's the machine's `uname -a':
>> Linux gcc2-power8.osuosl.org 3.10.0-514.10.2.el7.ppc64le #1 SMP Fri
>> Mar \
>>   3 16:16:38 GMT 2017 ppc64le ppc64le ppc64le GNU/Linux
>
> Well that's unfortunate.
>
> Doing some digging, the kernel has provided the info to userland via elf
> auxv data since the beginning of time (aka initial git repository
> build), but glibc still does not export that information properly for ppc.
>
> For ppc, you can get what we want from qemu_getauxval(AT_ICACHEBSIZE).
> Indeed, we already have 4 different system dependent methods for
> determining the icache size in tcg/ppc/tcg-target.inc.c.
>
> So what I think we ought to do is create a new util/cachesize.c like so:
>
> unsigned qemu_icache_linesize = 64;
> unsigned qemu_dcache_linesize = 64;
>
> static void init_icache_data(void)
> {
> #ifdef _SC_LEVEL1_ICACHE_LINESIZE
>   {
>     long x = sysconf(_SC_LEVEL1_ICACHE_LINESIZE);
>     if (x > 0) {
>       qemu_icache_linesize = x;
>       return;
>     }
>   }
> #endif
> #ifdef AT_ICACHEBSIZE
>   {
>     unsigned long x = qemu_getauxval(AT_ICACHEBSIZE);
>     if (x > 0) {
>       qemu_icache_linesize = x;
>       return;
>     }
>   }
> #endif
>   // Other system specific methods.

On a fully patched Windows 10 with an i5-4690 this code works for me (TM):

#ifdef _WIN32
    {
        DWORD bufferSize = 0;
        if (!GetLogicalProcessorInformation(0, &bufferSize) &&
            GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
            PSYSTEM_LOGICAL_PROCESSOR_INFORMATION buffer =
                (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION)g_malloc0(bufferSize);
            if (GetLogicalProcessorInformation(buffer, &bufferSize)) {
                size_t i = 0, numOfProcessors =
                    bufferSize / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
                for (; i < numOfProcessors; i++) {
                    if (buffer[i].Relationship == RelationCache &&
                        buffer[i].Cache.Level == 1 &&
                        (buffer[i].Cache.Type == CacheUnified ||
                         buffer[i].Cache.Type == CacheInstruction)) {
                        qemu_icache_linesize = buffer[i].Cache.LineSize;
                        break;
                    }
                }
            }
            g_free(buffer);
        }
    }
#endif

I don't particularly like that stair of ifs style, so I guess if I were
to do a proper patch this should become a function.

> }
>
> static void init_dcache_data(void)
> {
>   // Similarly.

The code from above, just s/CacheInstruction/CacheData/ and
s/qemu_icache/qemu_dcache/

> }
>
> static void __attribute__((constructor)) init_cache_data(void)
> {
>   init_icache_data();
>   init_dcache_data();
> }
>
> In particular, I think you want to be padding to the icache linesize
> rather than the dcache linesize since what we're attempting is to avoid
> writable data in the icache.
>
>
> r~
>
>

To quote from the documentation:
"RelationCache: [... snip ...]
Windows Server 2003: This value is not supported until Windows Server
2003 with SP1 and Windows XP Professional x64 Edition."
-- https://msdn.microsoft.com/en-us/library/windows/desktop/ms686694(v=vs.85).aspx

I'm not sure if that is considered a problem, as neither system has been
supported for almost 2 years now.

Geert
On Tue, Jun 06, 2017 at 22:28:23 +0200, Geert Martin Ijewski wrote:
> On a fully patched Windows 10 with an i5-4690 this code works for me (TM):
Thanks!
Can you please test this?
Emilio
---
#include "qemu/osdep.h"
#include <windows.h>

static unsigned int linesize_win(PROCESSOR_CACHE_TYPE type)
{
    PSYSTEM_LOGICAL_PROCESSOR_INFORMATION buf;
    DWORD size = 0;
    unsigned int ret = 0;
    BOOL success;
    size_t n;
    size_t i;

    success = GetLogicalProcessorInformation(0, &size);
    if (success || GetLastError() != ERROR_INSUFFICIENT_BUFFER) {
        return 0;
    }
    buf = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION)g_malloc0(size);
    if (!GetLogicalProcessorInformation(buf, &size)) {
        goto out;
    }

    n = size / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
    for (i = 0; i < n; i++) {
        if (buf[i].Relationship == RelationCache &&
            buf[i].Cache.Level == 1 &&
            (buf[i].Cache.Type == CacheUnified ||
             buf[i].Cache.Type == type)) {
            ret = buf[i].Cache.LineSize;
            break;
        }
    }
 out:
    g_free(buf);
    return ret;
}

linesize_win(CacheInstruction);
linesize_win(CacheData);
On 06.06.2017 at 23:38, Emilio G. Cota wrote:
> On Tue, Jun 06, 2017 at 22:28:23 +0200, Geert Martin Ijewski wrote:
>> On a fully patched Windows 10 with an i5-4690 this code works for me (TM):
>
> Thanks!
> Can you please test this?
>
> 		Emilio
> ---
> #include "qemu/osdep.h"
> #include <windows.h>

unnecessary as it's already included by qemu/osdep.h -> sysemu/os-win32.h

>
> static unsigned int linesize_win(PROCESSOR_CACHE_TYPE type)
> {
>     PSYSTEM_LOGICAL_PROCESSOR_INFORMATION buf;
>     DWORD size = 0;
>     unsigned int ret = 0;
>     BOOL success;
>     size_t n;
>     size_t i;
>
>     success = GetLogicalProcessorInformation(0, &size);
>     if (success || GetLastError() != ERROR_INSUFFICIENT_BUFFER) {
>         return 0;
>     }
>     buf = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION)g_malloc0(size);
>     if (!GetLogicalProcessorInformation(buf, &size)) {
>         goto out;
>     }
>
>     n = size / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
>     for (i = 0; i < n; i++) {
>         if (buf[i].Relationship == RelationCache &&
>             buf[i].Cache.Level == 1 &&
>             (buf[i].Cache.Type == CacheUnified ||
>              buf[i].Cache.Type == type)) {
>             ret = buf[i].Cache.LineSize;
>             break;
>         }
>     }
>  out:
>     g_free(buf);
>     return ret;
> }
>
> linesize_win(CacheInstruction);
> linesize_win(CacheData);
>
>

Yes, that works.
Tested-by: Geert Martin Ijewski <gm.ijewski@web.de>
diff --git a/configure b/configure
index 13e040d..6a68cb2 100755
--- a/configure
+++ b/configure
@@ -4832,6 +4832,41 @@ EOF
     fi
 fi

+# Find out the size of a cache line on the host
+# TODO: support more platforms
+cat > $TMPC<<EOF
+#ifdef __linux__
+
+#include <stdio.h>
+
+#define SYSFS "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"
+
+int main(int argc, char *argv[])
+{
+    unsigned int size;
+    FILE *fp;
+
+    fp = fopen(SYSFS, "r");
+    if (fp == NULL) {
+        return -1;
+    }
+    if (!fscanf(fp, "%u", &size)) {
+        return -1;
+    }
+    return size;
+}
+#else
+#error Cannot find host cache line size
+#endif
+EOF
+
+host_cacheline_size=0
+if compile_prog "" "" ; then
+    ./$TMPE
+    host_cacheline_size=$?
+fi
+
+
 ##########################################
 # check for _Static_assert()

@@ -5284,6 +5319,9 @@ fi
 if test "$bigendian" = "yes" ; then
   echo "HOST_WORDS_BIGENDIAN=y" >> $config_host_mak
 fi
+if test "$host_cacheline_size" -gt 0 ; then
+    echo "HOST_CACHELINE_SIZE=$host_cacheline_size" >> $config_host_mak
+fi
 if test "$mingw32" = "yes" ; then
   echo "CONFIG_WIN32=y" >> $config_host_mak
   rc_version=$(cat $source_path/VERSION)
diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 340e5fd..178d831 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -40,6 +40,23 @@
 # define QEMU_PACKED __attribute__((packed))
 #endif

+/*
+ * Cache line size of the host. Can be overriden.
+ * Note that this is just a compile-time hint to hopefully avoid false sharing
+ * of cache lines; code must be correct regardless of the constant's value.
+ */
+#ifndef QEMU_CACHELINE_SIZE
+# ifdef HOST_CACHELINE_SIZE
+#  define QEMU_CACHELINE_SIZE HOST_CACHELINE_SIZE
+# else
+#  if defined(__powerpc64__)
+#   define QEMU_CACHELINE_SIZE 128
+#  else
+#   define QEMU_CACHELINE_SIZE 64
+#  endif
+# endif
+#endif
+
 #define QEMU_ALIGNED(X) __attribute__((aligned(X)))

 #ifndef glue
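
For readers who want to try the sysfs probe outside of configure, the same read can be written as a small function. This is a sketch under assumptions, not part of the patch: the helper name `read_sysfs_linesize` is hypothetical, and reporting every failure as 0 (instead of through the process exit status) is a choice made here for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): read a cache line size in
 * bytes from a sysfs-style text file. Returns 0 if the file cannot be
 * opened or does not start with an unsigned decimal number.
 */
static unsigned read_sysfs_linesize(const char *path)
{
    unsigned size = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        return 0;
    }
    /* fscanf() returns the number of converted items, or EOF on failure */
    if (fscanf(fp, "%u", &size) != 1) {
        size = 0;
    }
    fclose(fp);
    return size;
}
```

On a Linux host it would be called with the path from the patch, `/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size`.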
This is a constant used as a hint for padding structs to hopefully avoid
false cache line sharing.

The constant can be set at configure time by defining QEMU_CACHELINE_SIZE
via --extra-cflags. If not set there, we try to obtain the value from
the machine running the configure script. If we fail, we default to
reasonable values, i.e. 128 bytes for ppc64 and 64 bytes for all others.

Note: the configure script only picks up the cache line size when run
on Linux hosts because I have no other platforms (e.g. Windows, BSD's)
to test on.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 configure               | 38 ++++++++++++++++++++++++++++++++++++++
 include/qemu/compiler.h | 17 +++++++++++++++++
 2 files changed, 55 insertions(+)
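
As an illustration of how such a constant is typically used for padding, here is a minimal sketch. The names `EXAMPLE_CACHELINE_SIZE`, `struct padded_counter` and `per_thread_counters` are hypothetical and do not come from the patch; the sketch assumes a GCC-compatible compiler, since it relies on the same `__attribute__((aligned(X)))` that `QEMU_ALIGNED` wraps.

```c
#include <stddef.h>

/* Assumed stand-in for QEMU_CACHELINE_SIZE; overridable with -D, like the
 * constant in the patch. 64 mirrors the patch's default for non-ppc64. */
#ifndef EXAMPLE_CACHELINE_SIZE
# define EXAMPLE_CACHELINE_SIZE 64
#endif

/*
 * Aligning the struct type to the cache line size also rounds its size up
 * to a multiple of that alignment, so adjacent array elements never share
 * a line (provided the hint matches the host's real line size).
 */
struct padded_counter {
    unsigned long value;
} __attribute__((aligned(EXAMPLE_CACHELINE_SIZE)));

/* One counter per thread; each element starts on its own cache line. */
static struct padded_counter per_thread_counters[4];
```

If the hint is wrong for the host, the code still behaves correctly; only the false-sharing avoidance is lost, which matches the "just a compile-time hint" note in the patch.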