Message ID | 20220721120732.118133-8-david@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | hostmem: NUMA-aware memory preallocation using ThreadContext |
On 7/21/22 14:07, David Hildenbrand wrote:
> Currently, there is no way to configure a CPU affinity inside QEMU when
> the sandbox option disables it for QEMU as a whole, for example, via:
>     -sandbox enable=on,resourcecontrol=deny
>
> While ThreadContext objects can be created on the QEMU commandline and
> the CPU affinity can be configured externally via the thread-id, this is
> insufficient if a ThreadContext with a certain CPU affinity is already
> required during QEMU startup, before we can intercept QEMU and
> configure the CPU affinity.
>
> Blocking sched_setaffinity() was introduced in 24f8cdc57224 ("seccomp:
> add resourcecontrol argument to command line"), "to avoid any bigger of the
> process". However, we only care about this once QEMU is running, not when
> the instance starting QEMU explicitly requests a certain CPU affinity
> on the QEMU commandline.
>
> Right now, for NUMA-aware preallocation of memory backends used for initial
> machine RAM, one has to:
>
> 1) Start QEMU with the memory-backend with "prealloc=off"
> 2) Pause QEMU before it starts the guest (-S)
> 3) Create ThreadContext, configure the CPU affinity using the thread-id
> 4) Configure the ThreadContext as "prealloc-context" of the memory
>    backend
> 5) Trigger preallocation by setting "prealloc=on"
>
> To simplify this handling especially for initial machine RAM,
> allow creation of ThreadContext objects before parsing sandbox options,
> such that the CPU affinity requested on the QEMU commandline alongside the
> sandbox option can be set. As ThreadContext objects essentially only create
> a persistent context thread and set the CPU affinity, this is easily
> possible.
>
> With this change, we can create a ThreadContext with a CPU affinity on
> the QEMU commandline and use it for preallocation of memory backends
> glued to the machine (simplified example):
>
> qemu-system-x86_64 -m 1G \
>  -object thread-context,id=tc1,cpu-affinity=3-4 \
>  -object memory-backend-ram,id=pc.ram,size=1G,prealloc=on,prealloc-threads=2,prealloc-context=tc1 \
>  -machine memory-backend=pc.ram \
>  -S -monitor stdio -sandbox enable=on,resourcecontrol=deny
>
> And while we can query the current CPU affinity:
>   (qemu) qom-get tc1 cpu-affinity
>   [
>       3,
>       4
>   ]
>
> We can no longer change it from QEMU directly:
>   (qemu) qom-set tc1 cpu-affinity 1-2
>   Error: Setting CPU affinity failed: Operation not permitted
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  softmmu/vl.c | 30 +++++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index aabd82e09a..252732cf5d 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -1761,6 +1761,27 @@ static void object_option_parse(const char *optarg)
>      visit_free(v);
>  }
>
> +/*
> + * Very early object creation, before the sandbox options have been activated.
> + */
> +static bool object_create_pre_sandbox(const char *type)
> +{
> +    /*
> +     * Objects should in general not get initialized "too early" without
> +     * a reason. If you add one, state the reason in a comment!
> +     */
> +
> +    /*
> +     * Reason: -sandbox on,resourcecontrol=deny disallows setting CPU
> +     * affinity of threads.
> +     */
> +    if (g_str_equal(type, "thread-context")) {
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
>  /*
>   * Initial object creation happens before all other
>   * QEMU data types are created. The majority of objects
> @@ -1775,6 +1796,11 @@ static bool object_create_early(const char *type)
>       * add one, state the reason in a comment!
>       */
>
> +    /* Reason: already created. */
> +    if (object_create_pre_sandbox(type)) {
> +        return false;
> +    }
> +
>      /* Reason: property "chardev" */
>      if (g_str_equal(type, "rng-egd") ||
>          g_str_equal(type, "qtest")) {
> @@ -1895,7 +1921,7 @@ static void qemu_create_early_backends(void)
>   */
>  static bool object_create_late(const char *type)
>  {
> -    return !object_create_early(type);
> +    return !object_create_early(type) && !object_create_pre_sandbox(type);
>  }
>
>  static void qemu_create_late_backends(void)
> @@ -2365,6 +2391,8 @@ static int process_runstate_actions(void *opaque, QemuOpts *opts, Error **errp)
>
>  static void qemu_process_early_options(void)
>  {
> +    object_option_foreach_add(object_create_pre_sandbox);
> +
>  #ifdef CONFIG_SECCOMP
>      QemuOptsList *olist = qemu_find_opts_err("sandbox", NULL);
>      if (olist) {

Cool, this is processed before -sandbox, so threads can have their
affinity. However, it's also processed before -name debug-threads=on
which means that even though we try to set a thread name in 3/7, it's
effectively dead code because name_threads from
util/qemu-thread-posix.c is still false. Could we shift things a bit?
E.g. like this:

static void qemu_process_early_options(void)
{
    qemu_opts_foreach(qemu_find_opts("name"),
                      parse_name, NULL, &error_fatal);

    object_option_foreach_add(object_create_pre_sandbox);
..

Michal
On 05.08.22 13:01, Michal Prívozník wrote:
> On 7/21/22 14:07, David Hildenbrand wrote:

[...]

>>  static void qemu_process_early_options(void)
>>  {
>> +    object_option_foreach_add(object_create_pre_sandbox);
>> +
>>  #ifdef CONFIG_SECCOMP
>>      QemuOptsList *olist = qemu_find_opts_err("sandbox", NULL);
>>      if (olist) {
>
> Cool, this is processed before -sandbox, so threads can have their
> affinity. However, it's also processed before -name debug-threads=on
> which means that even though we try to set a thread name in 3/7, it's
> effectively dead code because name_threads from
> util/qemu-thread-posix.c is still false. Could we shift things a bit?
> E.g. like this:
>
> static void qemu_process_early_options(void)
> {
>     qemu_opts_foreach(qemu_find_opts("name"),
>                       parse_name, NULL, &error_fatal);
>
>     object_option_foreach_add(object_create_pre_sandbox);

Thanks for pointing that out. IMHO yes, there isn't too much magic to
parse_name().
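The ordering concern discussed in this exchange can be illustrated with a
small stand-alone model. This is only a sketch, not QEMU code: the
model_* functions and the thread name "worker" are hypothetical, and the
single assumption taken from the thread is that a name is applied only if
the name_threads flag is already true when the thread gets created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the name_threads flag referred to in the review. */
static bool name_threads;

/* Hypothetical stand-in for handling "-name debug-threads=on|off". */
void model_parse_name(bool debug_threads)
{
    name_threads = debug_threads;
}

/*
 * Hypothetical stand-in for thread creation: the requested name is
 * applied only if naming was enabled *before* the thread is created;
 * otherwise it is silently dropped (empty string).
 */
void model_create_thread(const char *requested, char *applied, size_t len)
{
    if (name_threads) {
        snprintf(applied, len, "%s", requested);
    } else {
        applied[0] = '\0';
    }
}

/* Ordering as posted: pre-sandbox objects first, -name parsed later. */
void model_objects_first(char *applied, size_t len)
{
    name_threads = false;
    model_create_thread("worker", applied, len);
    model_parse_name(true);
}

/* Suggested ordering: parse -name first, then create the objects. */
void model_name_first(char *applied, size_t len)
{
    name_threads = false;
    model_parse_name(true);
    model_create_thread("worker", applied, len);
}
```

With the posted ordering the model leaves the applied name empty; with
-name handled first, the thread ends up named "worker". Whether moving
parse_name() has other side effects in QEMU itself is not covered by
this model.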
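To see why the patch touches both object_create_early() and
object_create_late(), here is a minimal stand-alone sketch of the three
creation phases. It is not the vl.c code: strcmp() replaces
g_str_equal(), the sketch_* names are hypothetical, and only the types
visible in the hunks are listed. The hunks do not show what
object_create_early() returns for the "chardev" types or by default; the
sketch assumes those types are delayed and everything else is created
early.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Phase 1: before -sandbox is applied (mirrors the new helper). */
bool sketch_create_pre_sandbox(const char *type)
{
    return strcmp(type, "thread-context") == 0;
}

/* Phase 2: early backends. */
bool sketch_create_early(const char *type)
{
    /* Reason: already created in phase 1. */
    if (sketch_create_pre_sandbox(type)) {
        return false;
    }
    /* Assumption: types needing a chardev are delayed to phase 3. */
    if (strcmp(type, "rng-egd") == 0 || strcmp(type, "qtest") == 0) {
        return false;
    }
    /* Assumption: everything else is created early. */
    return true;
}

/* Phase 3 before the patch: anything that is "not early". */
bool sketch_create_late_old(const char *type)
{
    return !sketch_create_early(type);
}

/* Phase 3 with the patch: not early and not pre-sandbox either. */
bool sketch_create_late(const char *type)
{
    return !sketch_create_early(type) && !sketch_create_pre_sandbox(type);
}

/* Number of phases that would create an object of this type. */
int sketch_phase_count(const char *type, bool patched_late)
{
    bool late = patched_late ? sketch_create_late(type)
                             : sketch_create_late_old(type);
    return sketch_create_pre_sandbox(type) + sketch_create_early(type) + late;
}
```

In this model, sketch_phase_count("thread-context", false) is 2, because
the unpatched late predicate would pick the object up a second time; with
the extra check every type is claimed by exactly one phase.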