Message ID | 1554235048-3373-4-git-send-email-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/smt: Runtime SMT controls | expand |
>>> On 02.04.19 at 21:57, <andrew.cooper3@citrix.com> wrote: > Currently, a user can in combine the output of `xl info -n`, the APCI tables, > and some manual CPUID data to figure out which CPU numbers to feed into > `xen-hptool cpu-offline` to effectively disable SMT at runtime. > > A more convenient option is to teach Xen how to perform this action. > > First of all, extend XEN_SYSCTL_cpu_hotplug with two new operations. > Introduce new smt_{up,down}_helper() functions which wrap the > cpu_{up,down}_helper() helpers with logic which understands siblings based on > their APIC_ID. > > Add libxc stubs, and extend xen-hptool with smt-{enable,disable} options. > These are intended to be shorthands for a loop over cpu-{online,offline}. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > --- > CC: Jan Beulich <JBeulich@suse.com> > CC: Wei Liu <wei.liu2@citrix.com> > CC: Roger Pau Monné <roger.pau@citrix.com> > > Slightly RFC. I'm not very happy with the contination situation, but -EBUSY > is the preexisting style and it seems like it is the only option from tasklet > context. Well, offloading the re-invocation to the caller isn't really nice. Looking at the code, is there any reason why couldn't use the usual -ERESTART / hypercall_create_continuation()? This would require a little bit of re-work, in particular to allow passing the vCPU into hypercall_create_continuation(), but beyond that I can't see any immediate obstacles. Though clearly I wouldn't make this a prereq requirement for the work here. > Is it intentional that we can actually online and offline processors beyond > maxcpu? This is a consequence of the cpu parking logic. I think so, yes. That's meant to be a boot time limit only imo. The runtime limit is nr_cpu_ids. > --- a/xen/arch/x86/setup.c > +++ b/xen/arch/x86/setup.c > @@ -60,7 +60,7 @@ static bool __initdata opt_nosmp; > boolean_param("nosmp", opt_nosmp); > > /* maxcpus: maximum number of CPUs to activate. */ > -static unsigned int __initdata max_cpus; > +unsigned int max_cpus; > integer_param("maxcpus", max_cpus); As per above I don't think this change should be needed or wanted, but if so for whatever reason, wouldn't the variable better be __read_mostly? > --- a/xen/arch/x86/sysctl.c > +++ b/xen/arch/x86/sysctl.c > @@ -114,6 +114,92 @@ long cpu_down_helper(void *data) > return ret; > } > > +static long smt_up_helper(void *data) > +{ > + unsigned int cpu, sibling_mask = > + (1u << (boot_cpu_data.x86_num_siblings - 1)) - 1; I don't think this is quite right for higher than 2-thread configurations. In detect_extended_topology() terms, don't you simply mean (1u << ht_mask_width) - 1 here, i.e. just boot_cpu_data.x86_num_siblings - 1 (without any shifting)? > + int ret = 0; > + > + if ( !cpu_has_htt || !sibling_mask ) > + return -EOPNOTSUPP; Why not put the first part of the check right into the sysctl handler? > + opt_smt = true; Perhaps also bail early when the variable already has the designated value? And again perhaps right in the sysctl handler? > + for_each_present_cpu ( cpu ) > + { > + if ( cpu == 0 ) > + continue; Is this special case really needed? If so, perhaps worth a brief comment? > + if ( cpu >= max_cpus ) > + break; > + > + if ( x86_cpu_to_apicid[cpu] & sibling_mask ) > + ret = cpu_up_helper(_p(cpu)); Shouldn't this be restricted to CPUs a sibling of which is already online? And widened at the same time, to also online thread 0 if one of the other threads is already online? Also any reason you use _p() here but not in patch 2? > +static long smt_down_helper(void *data) > +{ > + unsigned int cpu, sibling_mask = > + (1u << (boot_cpu_data.x86_num_siblings - 1)) - 1; > + int ret = 0; > + > + if ( !cpu_has_htt || !sibling_mask ) > + return -EOPNOTSUPP; > + > + opt_smt = false; > + > + for_each_present_cpu ( cpu ) > + { > + if ( cpu == 0 ) > + continue; > + if ( cpu >= max_cpus ) > + break; > + > + if ( x86_cpu_to_apicid[cpu] & sibling_mask ) > + ret = cpu_down_helper(_p(cpu)); Similarly here, wouldn't you better skip this if it would offline the last thread of a core? I also notice that the two functions are extremely similar, and hence it might be worthwhile considering to fold them, with the caller controlling the behavior via the so far unused function parameter (at which point the related remark of mine on patch 2 would become inapplicable). > --- a/xen/include/public/sysctl.h > +++ b/xen/include/public/sysctl.h > @@ -246,8 +246,17 @@ struct xen_sysctl_get_pmstat { > struct xen_sysctl_cpu_hotplug { > /* IN variables */ > uint32_t cpu; /* Physical cpu. */ > + > + /* Single CPU enable/disable. */ > #define XEN_SYSCTL_CPU_HOTPLUG_ONLINE 0 > #define XEN_SYSCTL_CPU_HOTPLUG_OFFLINE 1 > + > + /* > + * SMT enable/disable. Caller must zero the 'cpu' field to begin, and > + * ignore it on completion. > + */ > +#define XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE 2 > +#define XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE 3 Is the "cpu" field constraint mentioned in the comment just a precaution? I can't see you encode anything into that field, or use it upon getting re-invoked. I assume that's because of the expectation that only actual onlining/offlining would potentially take long, while iterating over all present CPUs without further action ought to be fast enough. Jan
On 03/04/2019 10:33, Jan Beulich wrote: >>>> On 02.04.19 at 21:57, <andrew.cooper3@citrix.com> wrote: >> Currently, a user can in combine the output of `xl info -n`, the APCI tables, >> and some manual CPUID data to figure out which CPU numbers to feed into >> `xen-hptool cpu-offline` to effectively disable SMT at runtime. >> >> A more convenient option is to teach Xen how to perform this action. >> >> First of all, extend XEN_SYSCTL_cpu_hotplug with two new operations. >> Introduce new smt_{up,down}_helper() functions which wrap the >> cpu_{up,down}_helper() helpers with logic which understands siblings based on >> their APIC_ID. >> >> Add libxc stubs, and extend xen-hptool with smt-{enable,disable} options. >> These are intended to be shorthands for a loop over cpu-{online,offline}. >> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> --- >> CC: Jan Beulich <JBeulich@suse.com> >> CC: Wei Liu <wei.liu2@citrix.com> >> CC: Roger Pau Monné <roger.pau@citrix.com> >> >> Slightly RFC. I'm not very happy with the contination situation, but -EBUSY >> is the preexisting style and it seems like it is the only option from tasklet >> context. > Well, offloading the re-invocation to the caller isn't really nice. > Looking at the code, is there any reason why couldn't use > the usual -ERESTART / hypercall_create_continuation()? This > would require a little bit of re-work, in particular to allow > passing the vCPU into hypercall_create_continuation(), but > beyond that I can't see any immediate obstacles. Though > clearly I wouldn't make this a prereq requirement for the work > here. The problem isn't really the ERESTART. We could do some plumbing and make it work, but the real problem is that I can't stash the current cpu index in the sysctl data block across the continuation point. At the moment, the loop depends on, once all CPUs are in the correct state, getting through the for_each_present_cpu() loop without taking a further continuation. > >> Is it intentional that we can actually online and offline processors beyond >> maxcpu? This is a consequence of the cpu parking logic. > I think so, yes. That's meant to be a boot time limit only imo. > The runtime limit is nr_cpu_ids. > >> --- a/xen/arch/x86/setup.c >> +++ b/xen/arch/x86/setup.c >> @@ -60,7 +60,7 @@ static bool __initdata opt_nosmp; >> boolean_param("nosmp", opt_nosmp); >> >> /* maxcpus: maximum number of CPUs to activate. */ >> -static unsigned int __initdata max_cpus; >> +unsigned int max_cpus; >> integer_param("maxcpus", max_cpus); > As per above I don't think this change should be needed or > wanted, but if so for whatever reason, wouldn't the variable > better be __read_mostly? __read_mostly, yes, but as to whether the change is needed, that entirely depends on whether the change in semantics to maxcpus= was accidental or intentional. > >> --- a/xen/arch/x86/sysctl.c >> +++ b/xen/arch/x86/sysctl.c >> @@ -114,6 +114,92 @@ long cpu_down_helper(void *data) >> return ret; >> } >> >> +static long smt_up_helper(void *data) >> +{ >> + unsigned int cpu, sibling_mask = >> + (1u << (boot_cpu_data.x86_num_siblings - 1)) - 1; > I don't think this is quite right for higher than 2-thread configurations. > In detect_extended_topology() terms, don't you simply mean > (1u << ht_mask_width) - 1 here, i.e. just > boot_cpu_data.x86_num_siblings - 1 (without any shifting)? Good point, yes. > >> + int ret = 0; >> + >> + if ( !cpu_has_htt || !sibling_mask ) >> + return -EOPNOTSUPP; > Why not put the first part of the check right into the sysctl > handler? Can do. I think this is a side effect of how it developed. > >> + opt_smt = true; > Perhaps also bail early when the variable already has the > designated value? And again perhaps right in the sysctl > handler? That is not safe across continuations. While it would be a very silly thing to do, there could be two callers which are fighting over whether SMT is disabled or enabled. > >> + for_each_present_cpu ( cpu ) >> + { >> + if ( cpu == 0 ) >> + continue; > Is this special case really needed? If so, perhaps worth a brief > comment? Trying to down cpu 0 is a hard -EINVAL. > >> + if ( cpu >= max_cpus ) >> + break; >> + >> + if ( x86_cpu_to_apicid[cpu] & sibling_mask ) >> + ret = cpu_up_helper(_p(cpu)); > Shouldn't this be restricted to CPUs a sibling of which is already > online? And widened at the same time, to also online thread 0 > if one of the other threads is already online? Unfortunately, that turns into a rats nest very very quickly, which is why I gave up and simplified the semantics to strictly "this shall {of,off}line the nonzero siblings threads". This is a convenience for people wanting to do a one-time reconfiguration of the system, and indeed, how has multiple end user requests behind its coming into existence. Users who are already hotplugging aren't going to be interested in this functionality. As the usecases don't overlap, I went for the most simple logic. > Also any reason you use _p() here but not in patch 2? I thought I'd fixed patch 2 up, but I clearly hadn't > I also notice that the two functions are extremely similar, and > hence it might be worthwhile considering to fold them, with the > caller controlling the behavior via the so far unused function > parameter (at which point the related remark of mine on patch > 2 would become inapplicable). By passing the plug boolean in via data? Yes I suppose they are rather more similar than they started out. > >> --- a/xen/include/public/sysctl.h >> +++ b/xen/include/public/sysctl.h >> @@ -246,8 +246,17 @@ struct xen_sysctl_get_pmstat { >> struct xen_sysctl_cpu_hotplug { >> /* IN variables */ >> uint32_t cpu; /* Physical cpu. */ >> + >> + /* Single CPU enable/disable. */ >> #define XEN_SYSCTL_CPU_HOTPLUG_ONLINE 0 >> #define XEN_SYSCTL_CPU_HOTPLUG_OFFLINE 1 >> + >> + /* >> + * SMT enable/disable. Caller must zero the 'cpu' field to begin, and >> + * ignore it on completion. >> + */ >> +#define XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE 2 >> +#define XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE 3 > Is the "cpu" field constraint mentioned in the comment just a > precaution? I can't see you encode anything into that field, or > use it upon getting re-invoked. I assume that's because of the > expectation that only actual onlining/offlining would potentially > take long, while iterating over all present CPUs without further > action ought to be fast enough. Ah - that was stale from before I encountered the "fun" of continuations from tasklet context. I would prefer to find a better way, but short of doing a full vcpu context switch, I don't see an option. ~Andrew
>>> On 03.04.19 at 12:17, <andrew.cooper3@citrix.com> wrote: > On 03/04/2019 10:33, Jan Beulich wrote: >>>>> On 02.04.19 at 21:57, <andrew.cooper3@citrix.com> wrote: >>> Slightly RFC. I'm not very happy with the contination situation, but -EBUSY >>> is the preexisting style and it seems like it is the only option from tasklet >>> context. >> Well, offloading the re-invocation to the caller isn't really nice. >> Looking at the code, is there any reason why couldn't use >> the usual -ERESTART / hypercall_create_continuation()? This >> would require a little bit of re-work, in particular to allow >> passing the vCPU into hypercall_create_continuation(), but >> beyond that I can't see any immediate obstacles. Though >> clearly I wouldn't make this a prereq requirement for the work >> here. > > The problem isn't really the ERESTART. We could do some plumbing and > make it work, but the real problem is that I can't stash the current cpu > index in the sysctl data block across the continuation point. > > At the moment, the loop depends on, once all CPUs are in the correct > state, getting through the for_each_present_cpu() loop without taking a > further continuation. But these are two orthogonal things: One is how to invoke the continuation, and the other is where the continuation is to resume from. I think the former is more important to address, as it affects how the tools side code needs to look like. >>> Is it intentional that we can actually online and offline processors beyond >>> maxcpu? This is a consequence of the cpu parking logic. >> I think so, yes. That's meant to be a boot time limit only imo. >> The runtime limit is nr_cpu_ids. >> >>> --- a/xen/arch/x86/setup.c >>> +++ b/xen/arch/x86/setup.c >>> @@ -60,7 +60,7 @@ static bool __initdata opt_nosmp; >>> boolean_param("nosmp", opt_nosmp); >>> >>> /* maxcpus: maximum number of CPUs to activate. */ >>> -static unsigned int __initdata max_cpus; >>> +unsigned int max_cpus; >>> integer_param("maxcpus", max_cpus); >> As per above I don't think this change should be needed or >> wanted, but if so for whatever reason, wouldn't the variable >> better be __read_mostly? > > __read_mostly, yes, but as to whether the change is needed, that > entirely depends on whether the change in semantics to maxcpus= was > accidental or intentional. Well, as said, I did consider this while putting together the parking series, and I therefore consider it intentional. >>> + opt_smt = true; >> Perhaps also bail early when the variable already has the >> designated value? And again perhaps right in the sysctl >> handler? > > That is not safe across continuations. > > While it would be a very silly thing to do, there could be two callers > which are fighting over whether SMT is disabled or enabled. Oh, and actually not just that: The continuation then wouldn't do anything anymore (unless you first reverted the setting, which in turn wouldn't be right in case any other CPU activity would occur in parallel, while the continuation is still pending). >>> + for_each_present_cpu ( cpu ) >>> + { >>> + if ( cpu == 0 ) >>> + continue; >> Is this special case really needed? If so, perhaps worth a brief >> comment? > > Trying to down cpu 0 is a hard -EINVAL. But here we're on the CPU-up path. Plus, for eventually supporting the offlining of CPU 0, it would feel slightly better if you used smp_processor_id() here. >>> + if ( cpu >= max_cpus ) >>> + break; >>> + >>> + if ( x86_cpu_to_apicid[cpu] & sibling_mask ) >>> + ret = cpu_up_helper(_p(cpu)); >> Shouldn't this be restricted to CPUs a sibling of which is already >> online? And widened at the same time, to also online thread 0 >> if one of the other threads is already online? > > Unfortunately, that turns into a rats nest very very quickly, which is > why I gave up and simplified the semantics to strictly "this shall > {of,off}line the nonzero siblings threads". Okay, if that's the intention, then I can certainly live with this. But it needs to be called out at the very least in the public header. (It might be worthwhile setting up a flag right away for "full" behavior, but leave acting upon it unimplemented). It also wouldn't hurt if the patch description already set expectations accordingly. Then again, considering your "maxcpus=" related question, it would certainly be odd for people to see non-zero threads come online here when they've intentionally left entire cores or nodes offline for whatever reason. Arguably that's not something to expect people would commonly do, and hence it may not be worth wasting meaningful extra effort on. But as above, and such "oddities" should be spelled out, such that it can be recognized that they're not oversights. >> I also notice that the two functions are extremely similar, and >> hence it might be worthwhile considering to fold them, with the >> caller controlling the behavior via the so far unused function >> parameter (at which point the related remark of mine on patch >> 2 would become inapplicable). > > By passing the plug boolean in via data? Yes. >>> --- a/xen/include/public/sysctl.h >>> +++ b/xen/include/public/sysctl.h >>> @@ -246,8 +246,17 @@ struct xen_sysctl_get_pmstat { >>> struct xen_sysctl_cpu_hotplug { >>> /* IN variables */ >>> uint32_t cpu; /* Physical cpu. */ >>> + >>> + /* Single CPU enable/disable. */ >>> #define XEN_SYSCTL_CPU_HOTPLUG_ONLINE 0 >>> #define XEN_SYSCTL_CPU_HOTPLUG_OFFLINE 1 >>> + >>> + /* >>> + * SMT enable/disable. Caller must zero the 'cpu' field to begin, and >>> + * ignore it on completion. >>> + */ >>> +#define XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE 2 >>> +#define XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE 3 >> Is the "cpu" field constraint mentioned in the comment just a >> precaution? I can't see you encode anything into that field, or >> use it upon getting re-invoked. I assume that's because of the >> expectation that only actual onlining/offlining would potentially >> take long, while iterating over all present CPUs without further >> action ought to be fast enough. > > Ah - that was stale from before I encountered the "fun" of continuations > from tasklet context. > > I would prefer to find a better way, but short of doing a full vcpu > context switch, I don't see an option. And I don't think there's a strong need. It should just be made clear (again in the description) that the remark here is just a precaution at this time, unless you want to drop it altogether. One thing you may want to do though: /* Tolerate already-online siblings. */ if ( ret == -EEXIST ) { ret = 0; continue; } to bypass the general_preempt_check() in that case, such that you can guarantee making forward progress. Jan
On 03/04/2019 11:44, Jan Beulich wrote: >>>> On 03.04.19 at 12:17, <andrew.cooper3@citrix.com> wrote: >> On 03/04/2019 10:33, Jan Beulich wrote: >>>>>> On 02.04.19 at 21:57, <andrew.cooper3@citrix.com> wrote: >>>> Slightly RFC. I'm not very happy with the contination situation, but -EBUSY >>>> is the preexisting style and it seems like it is the only option from tasklet >>>> context. >>> Well, offloading the re-invocation to the caller isn't really nice. >>> Looking at the code, is there any reason why couldn't use >>> the usual -ERESTART / hypercall_create_continuation()? This >>> would require a little bit of re-work, in particular to allow >>> passing the vCPU into hypercall_create_continuation(), but >>> beyond that I can't see any immediate obstacles. Though >>> clearly I wouldn't make this a prereq requirement for the work >>> here. >> The problem isn't really the ERESTART. We could do some plumbing and >> make it work, but the real problem is that I can't stash the current cpu >> index in the sysctl data block across the continuation point. >> >> At the moment, the loop depends on, once all CPUs are in the correct >> state, getting through the for_each_present_cpu() loop without taking a >> further continuation. > But these are two orthogonal things: One is how to invoke the > continuation, and the other is where the continuation is to > resume from. I think the former is more important to address, > as it affects how the tools side code needs to look like. Right, but -EBUSY is consistent with how the single online/offline ops function at the moment, which is why I reused it here. > >>>> + for_each_present_cpu ( cpu ) >>>> + { >>>> + if ( cpu == 0 ) >>>> + continue; >>> Is this special case really needed? If so, perhaps worth a brief >>> comment? >> Trying to down cpu 0 is a hard -EINVAL. > But here we're on the CPU-up path. Plus, for eventually supporting > the offlining of CPU 0, it would feel slightly better if you used > smp_processor_id() here. Are there any processors where you can actually take CPU 0 offline? Its certainly not possible on any Intel or AMD CPUs. While I can appreciate the theoretical end goal, it isn't a reality and I see no signs of that changing. Xen very definitely cannot take CPU 0 offline, nor can hardware, and I don't see any value in jumping through hoops for an end goal which doesn't exist. >>>> + if ( cpu >= max_cpus ) >>>> + break; >>>> + >>>> + if ( x86_cpu_to_apicid[cpu] & sibling_mask ) >>>> + ret = cpu_up_helper(_p(cpu)); >>> Shouldn't this be restricted to CPUs a sibling of which is already >>> online? And widened at the same time, to also online thread 0 >>> if one of the other threads is already online? >> Unfortunately, that turns into a rats nest very very quickly, which is >> why I gave up and simplified the semantics to strictly "this shall >> {of,off}line the nonzero siblings threads". > Okay, if that's the intention, then I can certainly live with this. > But it needs to be called out at the very least in the public header. > (It might be worthwhile setting up a flag right away for "full" > behavior, but leave acting upon it unimplemented). It also wouldn't > hurt if the patch description already set expectations accordingly. > > Then again, considering your "maxcpus=" related question, > it would certainly be odd for people to see non-zero threads > come online here when they've intentionally left entire cores > or nodes offline for whatever reason. Arguably that's not > something to expect people would commonly do, and hence it > may not be worth wasting meaningful extra effort on. But as > above, and such "oddities" should be spelled out, such that it > can be recognized that they're not oversights. And we come back to Xen's perennial problem of having no documentation. I'll see if I can find some time to put some Sphinx/RST together for this. As for the maxcpus behaviour, I think that is sufficiently niche to debugging circumstances only that perhaps we can ignore it. I certainly don't expect to see maxcpus= used in production. ~Andrew
>>> On 03.04.19 at 13:33, <andrew.cooper3@citrix.com> wrote: > On 03/04/2019 11:44, Jan Beulich wrote: >>>>> On 03.04.19 at 12:17, <andrew.cooper3@citrix.com> wrote: >>> On 03/04/2019 10:33, Jan Beulich wrote: >>>>>>> On 02.04.19 at 21:57, <andrew.cooper3@citrix.com> wrote: >>>>> Slightly RFC. I'm not very happy with the contination situation, but -EBUSY >>>>> is the preexisting style and it seems like it is the only option from tasklet >>>>> context. >>>> Well, offloading the re-invocation to the caller isn't really nice. >>>> Looking at the code, is there any reason why couldn't use >>>> the usual -ERESTART / hypercall_create_continuation()? This >>>> would require a little bit of re-work, in particular to allow >>>> passing the vCPU into hypercall_create_continuation(), but >>>> beyond that I can't see any immediate obstacles. Though >>>> clearly I wouldn't make this a prereq requirement for the work >>>> here. >>> The problem isn't really the ERESTART. We could do some plumbing and >>> make it work, but the real problem is that I can't stash the current cpu >>> index in the sysctl data block across the continuation point. >>> >>> At the moment, the loop depends on, once all CPUs are in the correct >>> state, getting through the for_each_present_cpu() loop without taking a >>> further continuation. >> But these are two orthogonal things: One is how to invoke the >> continuation, and the other is where the continuation is to >> resume from. I think the former is more important to address, >> as it affects how the tools side code needs to look like. > > Right, but -EBUSY is consistent with how the single online/offline ops > function at the moment, which is why I reused it here. Right, and which also is why I don't think this has to be taken care of right now. >>>>> + for_each_present_cpu ( cpu ) >>>>> + { >>>>> + if ( cpu == 0 ) >>>>> + continue; >>>> Is this special case really needed? If so, perhaps worth a brief >>>> comment? >>> Trying to down cpu 0 is a hard -EINVAL. >> But here we're on the CPU-up path. Plus, for eventually supporting >> the offlining of CPU 0, it would feel slightly better if you used >> smp_processor_id() here. > > Are there any processors where you can actually take CPU 0 offline? Its > certainly not possible on any Intel or AMD CPUs. > > While I can appreciate the theoretical end goal, it isn't a reality and > I see no signs of that changing. Xen very definitely cannot take CPU 0 > offline, nor can hardware, and I don't see any value in jumping through > hoops for an end goal which doesn't exist. Interesting. Why was it then that x86 Linux took quite some steps to make it possible (see BOOTPARAM_HOTPLUG_CPU0 Kconfig option as a possible anchor to locate pieces, which even has a description of conditions that need to be met in order for this to be possible)? IOW I'd appreciate clarification on what it is exactly that you think prevents offlining CPU0 (from a pure hardware / firmware perspective), besides the PIC and suspend/resume aspects (neither of which has to be in use) mentioned there. Jan
>>> On 03.04.19 at 12:17, <andrew.cooper3@citrix.com> wrote: > On 03/04/2019 10:33, Jan Beulich wrote: >>>>> On 02.04.19 at 21:57, <andrew.cooper3@citrix.com> wrote: >>> + if ( x86_cpu_to_apicid[cpu] & sibling_mask ) >>> + ret = cpu_up_helper(_p(cpu)); >> Shouldn't this be restricted to CPUs a sibling of which is already >> online? And widened at the same time, to also online thread 0 >> if one of the other threads is already online? > > Unfortunately, that turns into a rats nest very very quickly, which is > why I gave up and simplified the semantics to strictly "this shall > {of,off}line the nonzero siblings threads". > > This is a convenience for people wanting to do a one-time > reconfiguration of the system, and indeed, how has multiple end user > requests behind its coming into existence. Users who are already > hotplugging aren't going to be interested in this functionality. I'd like to come back to this: I assume by "hotplugging" you really mean hot {on,off}lining. Thinking about the actual physical hotplug case, the flow of events is (if I'm not mistaken) for the Dom0 kernel to issue XENPF_cpu_hotadd (in response to respective ACPI events), which then would still need to be followed by xen-hptool invocations to actually bring up the new cores/threads. While we invoke remove_siblinginfo() from __cpu_disable(), I don't see why we shouldn't be able to clone cpu_sibling_mask into an instance not getting altered when a CPU gets parked. This could then be used here, albeit the context of me thinking about this again is my intended attempt to make core parking work with opt_smt set to false, seeing that Jürgen had pointed out this case as not working. I further wonder whether these new sysctl-s should be constrained to the park_offline_cpus == true case. I don't think it was the intention to bring up/down secondary compute units of AMD Fam15 CPUs, and I also don't think fiddling with AMD Fam17 secondary threads is really intended/necessary here. I wouldn't be worried much about the other, opt_mce, dependency: People shouldn't use "mce=0" on production systems anyway; we should perhaps name this an unsupported configuration. Jan
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index a3628e5..49a6b2a 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1854,6 +1854,8 @@ int xc_pm_reset_cxstat(xc_interface *xch, int cpuid); int xc_cpu_online(xc_interface *xch, int cpu); int xc_cpu_offline(xc_interface *xch, int cpu); +int xc_smt_enable(xc_interface *xch); +int xc_smt_disable(xc_interface *xch); /* * cpufreq para name of this structure named diff --git a/tools/libxc/xc_cpu_hotplug.c b/tools/libxc/xc_cpu_hotplug.c index 58c2a0f..2ea9825 100644 --- a/tools/libxc/xc_cpu_hotplug.c +++ b/tools/libxc/xc_cpu_hotplug.c @@ -46,3 +46,29 @@ int xc_cpu_offline(xc_interface *xch, int cpu) return ret; } +int xc_smt_enable(xc_interface *xch) +{ + DECLARE_SYSCTL; + int ret; + + sysctl.cmd = XEN_SYSCTL_cpu_hotplug; + sysctl.u.cpu_hotplug.cpu = 0; + sysctl.u.cpu_hotplug.op = XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE; + ret = xc_sysctl(xch, &sysctl); + + return ret; +} + +int xc_smt_disable(xc_interface *xch) +{ + DECLARE_SYSCTL; + int ret; + + sysctl.cmd = XEN_SYSCTL_cpu_hotplug; + sysctl.u.cpu_hotplug.cpu = 0; + sysctl.u.cpu_hotplug.op = XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE; + ret = xc_sysctl(xch, &sysctl); + + return ret; +} + diff --git a/tools/misc/xen-hptool.c b/tools/misc/xen-hptool.c index 40cd966..6e27d9c 100644 --- a/tools/misc/xen-hptool.c +++ b/tools/misc/xen-hptool.c @@ -19,6 +19,8 @@ void show_help(void) " mem-online <mfn> online MEMORY <mfn>\n" " mem-offline <mfn> offline MEMORY <mfn>\n" " mem-status <mfn> query Memory status<mfn>\n" + " smt-enable onlines all SMT threads\n" + " smt-disable offlines all SMT threads\n" ); } @@ -304,6 +306,58 @@ static int hp_cpu_offline_func(int argc, char *argv[]) return ret; } +static int main_smt_enable(int argc, char *argv[]) +{ + int ret; + + if ( argc ) + { + show_help(); + return -1; + } + + for ( ;; ) + { + ret = xc_smt_enable(xch); + if ( (ret >= 0) || (errno != EBUSY) ) + break; + } + + if ( ret < 0 ) + fprintf(stderr, "Unable to enable SMT: errno %d, %s\n", + errno, strerror(errno)); + else + printf("Enabled SMT\n"); + + return ret; +} + +static int main_smt_disable(int argc, char *argv[]) +{ + int ret; + + if ( argc ) + { + show_help(); + return -1; + } + + for ( ;; ) + { + ret = xc_smt_disable(xch); + if ( (ret >= 0) || (errno != EBUSY) ) + break; + } + + if ( ret < 0 ) + fprintf(stderr, "Unable to disable SMT: errno %d, %s\n", + errno, strerror(errno)); + else + printf("Disabled SMT\n"); + + return ret; +} + struct { const char *name; int (*function)(int argc, char *argv[]); @@ -314,6 +368,8 @@ struct { { "mem-status", hp_mem_query_func}, { "mem-online", hp_mem_online_func}, { "mem-offline", hp_mem_offline_func}, + { "smt-enable", main_smt_enable }, + { "smt-disable", main_smt_disable }, }; diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c index 3440794..af245eb 100644 --- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c @@ -60,7 +60,7 @@ static bool __initdata opt_nosmp; boolean_param("nosmp", opt_nosmp); /* maxcpus: maximum number of CPUs to activate. */ -static unsigned int __initdata max_cpus; +unsigned int max_cpus; integer_param("maxcpus", max_cpus); int8_t __read_mostly opt_smt = -1; diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c index b3cc4b5..0e2e409 100644 --- a/xen/arch/x86/sysctl.c +++ b/xen/arch/x86/sysctl.c @@ -114,6 +114,92 @@ long cpu_down_helper(void *data) return ret; } +static long smt_up_helper(void *data) +{ + unsigned int cpu, sibling_mask = + (1u << (boot_cpu_data.x86_num_siblings - 1)) - 1; + int ret = 0; + + if ( !cpu_has_htt || !sibling_mask ) + return -EOPNOTSUPP; + + opt_smt = true; + + for_each_present_cpu ( cpu ) + { + if ( cpu == 0 ) + continue; + if ( cpu >= max_cpus ) + break; + + if ( x86_cpu_to_apicid[cpu] & sibling_mask ) + ret = cpu_up_helper(_p(cpu)); + + /* Tolerate already-online siblings. */ + if ( ret == -EEXIST ) + ret = 0; + + if ( ret ) + break; + + if ( general_preempt_check() ) + { + /* In tasklet context - can't create a contination. */ + ret = -EBUSY; + break; + } + } + + if ( !ret ) + printk(XENLOG_INFO "SMT enabled - online CPUs {%*pbl}\n", + nr_cpu_ids, cpumask_bits(&cpu_online_map)); + + return ret; +} + +static long smt_down_helper(void *data) +{ + unsigned int cpu, sibling_mask = + (1u << (boot_cpu_data.x86_num_siblings - 1)) - 1; + int ret = 0; + + if ( !cpu_has_htt || !sibling_mask ) + return -EOPNOTSUPP; + + opt_smt = false; + + for_each_present_cpu ( cpu ) + { + if ( cpu == 0 ) + continue; + if ( cpu >= max_cpus ) + break; + + if ( x86_cpu_to_apicid[cpu] & sibling_mask ) + ret = cpu_down_helper(_p(cpu)); + + /* Tolerate already-offline siblings. */ + if ( ret == -EEXIST ) + ret = 0; + + if ( ret ) + break; + + if ( general_preempt_check() ) + { + /* In tasklet context - can't create a contination. */ + ret = -EBUSY; + break; + } + } + + if ( !ret ) + printk(XENLOG_INFO "SMT disabled - online CPUs {%*pbl}\n", + nr_cpu_ids, cpumask_bits(&cpu_online_map)); + + return ret; +} + void arch_do_physinfo(struct xen_sysctl_physinfo *pi) { memcpy(pi->hw_cap, boot_cpu_data.x86_capability, @@ -155,6 +241,16 @@ long arch_do_sysctl( hcpu = (void *)(unsigned long)cpu; break; + case XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE: + plug = true; + fn = smt_up_helper; + break; + + case XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE: + plug = false; + fn = smt_down_helper; + break; + default: ret = -EOPNOTSUPP; break; diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h index cef3ffb..8ffcffe 100644 --- a/xen/include/asm-x86/processor.h +++ b/xen/include/asm-x86/processor.h @@ -154,6 +154,7 @@ extern void (*ctxt_switch_masking)(const struct vcpu *next); extern bool_t opt_cpu_info; extern u32 cpuid_ext_features; extern u64 trampoline_misc_enable_off; +extern unsigned int max_cpus; /* Maximum width of physical addresses supported by the hardware. */ extern unsigned int paddr_bits; diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h index c49b4dc..5c43aed 100644 --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -246,8 +246,17 @@ struct xen_sysctl_get_pmstat { struct xen_sysctl_cpu_hotplug { /* IN variables */ uint32_t cpu; /* Physical cpu. */ + + /* Single CPU enable/disable. */ #define XEN_SYSCTL_CPU_HOTPLUG_ONLINE 0 #define XEN_SYSCTL_CPU_HOTPLUG_OFFLINE 1 + + /* + * SMT enable/disable. Caller must zero the 'cpu' field to begin, and + * ignore it on completion. + */ +#define XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE 2 +#define XEN_SYSCTL_CPU_HOTPLUG_SMT_DISABLE 3 uint32_t op; /* hotplug opcode */ };
Currently, a user can in combine the output of `xl info -n`, the APCI tables, and some manual CPUID data to figure out which CPU numbers to feed into `xen-hptool cpu-offline` to effectively disable SMT at runtime. A more convenient option is to teach Xen how to perform this action. First of all, extend XEN_SYSCTL_cpu_hotplug with two new operations. Introduce new smt_{up,down}_helper() functions which wrap the cpu_{up,down}_helper() helpers with logic which understands siblings based on their APIC_ID. Add libxc stubs, and extend xen-hptool with smt-{enable,disable} options. These are intended to be shorthands for a loop over cpu-{online,offline}. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Roger Pau Monné <roger.pau@citrix.com> Slightly RFC. I'm not very happy with the contination situation, but -EBUSY is the preexisting style and it seems like it is the only option from tasklet context. Is it intentional that we can actually online and offline processors beyond maxcpu? This is a consequence of the cpu parking logic. --- tools/libxc/include/xenctrl.h | 2 + tools/libxc/xc_cpu_hotplug.c | 26 +++++++++++ tools/misc/xen-hptool.c | 56 ++++++++++++++++++++++++ xen/arch/x86/setup.c | 2 +- xen/arch/x86/sysctl.c | 96 +++++++++++++++++++++++++++++++++++++++++ xen/include/asm-x86/processor.h | 1 + xen/include/public/sysctl.h | 9 ++++ 7 files changed, 191 insertions(+), 1 deletion(-)