Message ID | 20190930100900.660-1-jgross@suse.com
---|---
State | New, archived
Series | None
On 30.09.2019 12:09, Juergen Gross wrote:
> Add documentation for the new "sched-gran" hypervisor boot parameter.
[...]
> +`cpu`: Vcpus will be scheduled individually on single cpus.
> +
> +`core`: As many vcpus as there are hyperthreads on a physical core are
> +scheduled together on a physical core.
> +
> +`socket`: As many vcpus as there are hyperthreads on a physical sockets are
> +scheduled together on a physical socket.

I'd prefer if this didn't end up Intel-centric; ideally it also wouldn't be
x86-specific. AMD has introduced hyperthreading in Fam17 only; Fam15 used
"compute units", grouping together "cores". Internally the Intel side
"core vs hyperthread" is represented in the same variables (cpu_sibling_mask
in particular) as the AMD side "compute unit vs core". Therefore it may be
better to talk here about e.g. "smallest topological sub-unit" and only say
"e.g. a hyperthread" to make a connection to common x86 / Intel terminology.
Of course the AMD side alternative use of the variables also renders the
actual command line option "sched-gran=core" not overly fortunate. Perhaps
we'd want to also use more abstract terms here, e.g. topological "levels"?

> +Note: a value other than `cpu` will result in rejecting a runtime modification
> +of the "smt" setting.

Perhaps add "attempt" here?

Jan
On 30.09.19 12:25, Jan Beulich wrote:
> On 30.09.2019 12:09, Juergen Gross wrote:
>> Add documentation for the new "sched-gran" hypervisor boot parameter.
[...]
> I'd prefer if this didn't end up Intel-centric; ideally it also wouldn't be
> x86-specific. [...] Internally the Intel side "core vs hyperthread" is
> represented in the same variables (cpu_sibling_mask in particular) as the
> AMD side "compute unit vs core".

Yes, it is a mess.

> Therefore it may be better to talk here about e.g. "smallest topological
> sub-unit" [...] Perhaps we'd want to also use more abstract terms here,
> e.g. topological "levels"?

I think regarding usage of "hyperthreads" I'll go with:

+`cpu`: Vcpus will be scheduled individually on single cpus (e.g. a
+ hyperthread using x86/Intel terminology)
+
+ `core`: As many vcpus as there are cpus on a physical core are
+ scheduled together on a physical core.
...

I think using "core" is fine. We have it in multiple places in the
hypervisor which are _not_ specific to Intel. And "core-scheduling" is a
well-known buzzword already.

>> +Note: a value other than `cpu` will result in rejecting a runtime modification
>> +of the "smt" setting.
>
> Perhaps add "attempt" here?

Yes.

Juergen
On 30.09.2019 12:51, Jürgen Groß wrote:
> On 30.09.19 12:25, Jan Beulich wrote:
[...]
> I think using "core" is fine. We have it in multiple places in the
> hypervisor which are _not_ specific to Intel.

Well, what we have in hypervisor sources is one thing - we can
settle on any convention we want there. It's the user (admin)
interface (i.e. the command line option name and description
here) which we may want to be a little more careful with. But
yes, I can see how we use "core" already in similar contexts
in the command line option doc, first and foremost on
"credit2_runqueue". (In retrospect I think this might have been
a mistake though.)

> And "core-scheduling" is a well-known buzzword already.

Let me not get started on buzzwords ;-)

Jan
On 30.09.19 13:02, Jan Beulich wrote:
> On 30.09.2019 12:51, Jürgen Groß wrote:
[...]
>> I think using "core" is fine. We have it in multiple places in the
>> hypervisor which are _not_ specific to Intel.
>
> Well, what we have in hypervisor sources is one thing - we can
> settle on any convention we want there. It's the user (admin)
> interface (i.e. the command line option name and description
> here) which we may want to be a little more careful with. [...]

So what do you suggest?

<Irony on>
"topology-level-just-above-the-smallest-topological-sub-unit"?
<Irony-off>

I can't think of any sensible terminology not resulting in something
which is much harder to understand than "core". And we are using "core"
or "cores" in hypervisor messages, too.

>> And "core-scheduling" is a well-known buzzword already.
>
> Let me not get started on buzzwords ;-)

:-)

Juergen
On 30.09.2019 13:13, Jürgen Groß wrote:
> On 30.09.19 13:02, Jan Beulich wrote:
[...]
> So what do you suggest?
[...]
> I can't think of any sensible terminology not resulting in something
> which is much harder to understand than "core".

Ideally I'd like us to have an arch-independent way of
expressing things - "socket" and "node" look to be common enough,
so perhaps wouldn't need further abstraction, but sub-socket
granularities could perhaps be expressed as "level1" or "level2"?
And then there could be context sensitive meanings of "core",
"cu", and perhaps (in the future) "die".

My concern is that AMD-focused people may, when using "core", not
get what they'd expect (and this concern extends to the existing
uses of "core"). IOW "context sensitive" above would assign
different meaning to "core" depending on the hardware we run on.
Granted I can also see how this might confuse people other than
the example AMD-focused ones.

> And we are using "core" or "cores" in hypervisor messages, too.

That's still slightly different though.

Jan
On 30.09.19 13:20, Jan Beulich wrote:
> On 30.09.2019 13:13, Jürgen Groß wrote:
[...]
> My concern is that AMD-focused people may, when using "core", not
> get what they'd expect (and this concern extends to the existing
> uses of "core"). IOW "context sensitive" above would assign
> different meaning to "core" depending on the hardware we run on.
> Granted I can also see how this might confuse people other than
> the example AMD-focused ones.

And it will be fatal for large scale installations with AMD- and
INTEL- servers. Boot-parameters having the same semantics should be
named the same (regardless of the name or value part) in order to
enable such customers to use the same setting on each server.

Juergen
On 9/30/19 12:26 PM, Jürgen Groß wrote:
> On 30.09.19 13:20, Jan Beulich wrote:
[...]
>> Ideally I'd like us to have an arch-independent way of
>> expressing things - "socket" and "node" look to be common enough,
>> so perhaps wouldn't need further abstraction, but sub-socket
>> granularities could perhaps be expressed as "level1" or "level2"?
>> And then there could be context sensitive meanings of "core",
>> "cu", and perhaps (in the future) "die".

Words like "core" should have a consistent meaning. I did a quick
search and couldn't really find any useful resources describing the
difference.

I think we have a couple of options (not necessarily all of these are
exclusive):

* Use "core / thread" for both, and document the rough mapping of
  these onto AMD terminologies.

* Use "core / thread" for Intel, and AMD-specific terminology for AMD.

* Add higher-level terms, like "secure" and "performance" (or
  "smallest"), so that an administrator can say, "Give me the smallest
  granularity which is still secure", and "Give me the best performance
  regardless of security". If HT is ever fixed on future processors,
  then those processors in the fleet will have thread-based scheduling,
  and insecure processors will have core-based scheduling.

Fundamentally, either the topology levels are similar enough that a
single setting is sensible to use across both, or they are not. If they
are similar enough, then I think using "core / thread" and mapping them
is probably the best option. If they are not similar enough, then
things like "level1" and "level2" aren't actually useful anyway,
because what they mean on different systems is too divergent; i.e., in
all likelihood you'd want "level2" on Intels and "level1" on AMD anyway.

-George
On 30.09.2019 13:45, George Dunlap wrote:
> Fundamentally, either the topology levels are similar enough that a
> single setting is sensible to use across both, or they are not. If they
> are similar enough, then I think using "core / thread" and mapping them
> is probably the best option.

Indeed - hence my comment here and not on the code actually parsing
the option. I.e. while I'd ideally prefer to see even the tokens on the
command line to match what they mean on underlying hardware, I can
accept (the reasons for) a common spelling, as long as the respective
doc parts sufficiently clarify the meaning.

Jan
On 30.09.2019 13:26, Jürgen Groß wrote:
> And it will be fatal for large scale installations with AMD- and
> INTEL- servers. Boot-parameters having the same semantics should be
> named the same (regardless of the name or value part) in order to
> enable such customers to use the same setting on each server.

But such a large scale user would quite likely want the meaning of
"core" in the respective vendor's sense, i.e. CPU scheduling on AMD
(as not being affected by the various HT leaks), and core scheduling
on Intel. Due to AMD Fam17 now actually calling the thing HT too, in
fact such installations would likely want _different_ options when
the primary goal is security, and a secondary one is performance /
throughput. Otoh I guess this is going to be our default eventually,
i.e. no command line option ought to be needed to achieve this.

Jan
```diff
Add documentation for the new "sched-gran" hypervisor boot parameter.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 docs/misc/xen-command-line.pandoc | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index fc64429064..c855246050 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1782,6 +1782,27 @@ Set the timeslice of the credit1 scheduler, in milliseconds. The
 default is 30ms. Reasonable values may include 10, 5, or even 1 for
 very latency-sensitive workloads.
 
+### sched-gran (x86)
+> `= cpu | core | socket`
+
+> Default: `sched-gran=cpu`
+
+Set the scheduling granularity. In case the granularity is larger than 1 (e.g.
+`core`on a SMT-enabled system, or `socket`) multiple vcpus are assigned
+statically to a "scheduling unit" which will then be subject to scheduling.
+This assignment of vcpus to scheduling units is fixed.
+
+`cpu`: Vcpus will be scheduled individually on single cpus.
+
+`core`: As many vcpus as there are hyperthreads on a physical core are
+scheduled together on a physical core.
+
+`socket`: As many vcpus as there are hyperthreads on a physical sockets are
+scheduled together on a physical socket.
+
+Note: a value other than `cpu` will result in rejecting a runtime modification
+of the "smt" setting.
+
 ### sched_ratelimit_us
 
 > `= <integer>`
```