diff mbox

[for-4.8] xen: credit2: make runqueues be per-socket by default.

Message ID 148041702351.16498.17129591426095525570.stgit@Solace.fritz.box (mailing list archive)
State New, archived
Headers show

Commit Message

Dario Faggioli Nov. 29, 2016, 10:57 a.m. UTC
Benchmarks have shown that per-socket runqueues arrangement
behaves better (e.g., we achieve better load balancing)
than the current per-core default.

Here's an example (coming from
https://lists.xen.org/archives/html/xen-devel/2016-06/msg02287.html ):

|=======================================|
| XEN BUILD TIME, LOW LOAD, NO NOISE    |
|---------------------------------------|
|       runq=core   runq=socket         |
|         35.200       33.433           |
|---------------------------------------|------------------------------|
| XEN BUILD TIME, HIGH LOAD, NO NOISE   | IPERF, HIGH LOAD, NO NOISE   |
|---------------------------------------|------------------------------|
|       runq=core   runq=socket         |     runq=core runq=socket    |
|         18.013       18.530           |       23.200     23.466      |
|---------------------------------------|------------------------------|
| XEN BUILD TIME, LOW LOAD, WITH NOISE  |
|-------------------------------------  |
|       runq=core   runq=socket         |
|         45.866       39.493           |
|---------------------------------------|------------------------------|
| XEN BUILD TIME, HIGH LOAD, WITH NOISE | IPERF, HIGH LOAD, WITH NOISE |
|---------------------------------------|------------------------------|
|       runq=core   runq=socket         |     runq=core runq=socket    |
|         36.840       29.080           |       19.967     21.000      |
|=======================================|==============================|

The only reason why we went for per-core, initially, was to
introduce some form of hyperthreading support. Now we have
hyperthreading support, independently from how runqueues
are organized (9bb9c7388 "xen: credit2: implement true SMT
support"), and thus we can switch to per-socket.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
---
Cc: Anshul Makkar <anshul.makkar@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
George's Ack obtained 'out of band', as he is away.
---
 docs/misc/xen-command-line.markdown |    2 +-
 xen/common/sched_credit2.c          |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Wei Liu Nov. 29, 2016, 11:14 a.m. UTC | #1
On Tue, Nov 29, 2016 at 11:57:03AM +0100, Dario Faggioli wrote:
> Benchmarks have shown that per-socket runqueues arrangement
> behaves better (e.g., we achieve better load balancing)
> than the current per-core default.
> 
> Here's an example (coming from
> https://lists.xen.org/archives/html/xen-devel/2016-06/msg02287.html ):
> 
> |=======================================|
> | XEN BUILD TIME, LOW LOAD, NO NOISE    |
> |---------------------------------------|
> |       runq=core   runq=socket         |
> |         35.200       33.433           |
> |---------------------------------------|------------------------------|
> | XEN BUILD TIME, HIGH LOAD, NO NOISE   | IPERF, HIGH LOAD, NO NOISE   |
> |---------------------------------------|------------------------------|
> |       runq=core   runq=socket         |     runq=core runq=socket    |
> |         18.013       18.530           |       23.200     23.466      |
> |---------------------------------------|------------------------------|
> | XEN BUILD TIME, LOW LOAD, WITH NOISE  |
> |-------------------------------------  |
> |       runq=core   runq=socket         |
> |         45.866       39.493           |
> |---------------------------------------|------------------------------|
> | XEN BUILD TIME, HIGH LOAD, WITH NOISE | IPERF, HIGH LOAD, WITH NOISE |
> |---------------------------------------|------------------------------|
> |       runq=core   runq=socket         |     runq=core runq=socket    |
> |         36.840       29.080           |       19.967     21.000      |
> |=======================================|==============================|
> 
> The only reason why we went for per-core, initially, was to
> introduce some form of hyperthreading support. Now we have
> hyperthreading support, independently from how runqueues
> are organized (9bb9c7388 "xen: credit2: implement true SMT
> support"), and thus we can switch to per-socket.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
> George's Ack obtained 'out of band', as he is away.
> ---

Strictly speaking, this is not necessary for 4.8 because even though the
default configuration is not optimal, there is a simple way to change
it.

On the other hand, because credit2 is declared supported in this
release, it is better to have a sensible default setting for credit2.

The risk is that changing the default value would cause osstest to
discover bugs in the scheduler logic, hence blocking the release. But
overall I think the chance is low, but Dario can you please state to
what extend did you test this patch?

Wei.
Dario Faggioli Nov. 29, 2016, 11:33 a.m. UTC | #2
On Tue, 2016-11-29 at 11:14 +0000, Wei Liu wrote:
> On Tue, Nov 29, 2016 at 11:57:03AM +0100, Dario Faggioli wrote:
> Strictly speaking, this is not necessary for 4.8 because even though
> the
> default configuration is not optimal, there is a simple way to change
> it.
> 
Absolutely true.

> On the other hand, because credit2 is declared supported in this
> release, it is better to have a sensible default setting for credit2.
> 
Exactly.

> The risk is that changing the default value would cause osstest to
> discover bugs in the scheduler logic, hence blocking the release. But
> overall I think the chance is low, but Dario can you please state to
> what extend did you test this patch?
> 
I've tried to state that in the cover letter:
https://lists.xen.org/archives/html/xen-devel/2016-11/msg02616.html

In some more details, it is _months_ that I and Anshul run benchmarks
with Credit2 running in this configuration.

It's months that, all the time that I boot my test box, either to do
Credit2 related development and testing, or completely unrelated things
(e.g., recently, when testing the new Fedora 25 as HVM and PV guest), I
use Credit2 in this configuration.

George has done some runs of his schedbench suite with this
configuration (to compare Credit1 vs Credit2 per-core vs Credit2 per-
socket).

So, if you ask me "how much is this tested", my answer would be "really
really really a lot". :-)

Dario
Wei Liu Nov. 29, 2016, 11:42 a.m. UTC | #3
On Tue, Nov 29, 2016 at 12:33:24PM +0100, Dario Faggioli wrote:
> On Tue, 2016-11-29 at 11:14 +0000, Wei Liu wrote:
> > On Tue, Nov 29, 2016 at 11:57:03AM +0100, Dario Faggioli wrote:
> > Strictly speaking, this is not necessary for 4.8 because even though
> > the
> > default configuration is not optimal, there is a simple way to change
> > it.
> > 
> Absolutely true.
> 
> > On the other hand, because credit2 is declared supported in this
> > release, it is better to have a sensible default setting for credit2.
> > 
> Exactly.
> 
> > The risk is that changing the default value would cause osstest to
> > discover bugs in the scheduler logic, hence blocking the release. But
> > overall I think the chance is low, but Dario can you please state to
> > what extend did you test this patch?
> > 
> I've tried to state that in the cover letter:
> https://lists.xen.org/archives/html/xen-devel/2016-11/msg02616.html
> 

Ah, I thought that was a dupe because it had the same subject line as
the patch, so I skipped it.

> In some more details, it is _months_ that I and Anshul run benchmarks
> with Credit2 running in this configuration.
> 
> It's months that, all the time that I boot my test box, either to do
> Credit2 related development and testing, or completely unrelated things
> (e.g., recently, when testing the new Fedora 25 as HVM and PV guest), I
> use Credit2 in this configuration.
> 
> George has done some runs of his schedbench suite with this
> configuration (to compare Credit1 vs Credit2 per-core vs Credit2 per-
> socket).
> 
> So, if you ask me "how much is this tested", my answer would be "really
> really really a lot". :-)
> 

OK. This is rather convincing to me. I will wait a bit for other people
express their opinion.

Wei.

> Dario
> -- 
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Jan Beulich Nov. 29, 2016, 11:51 a.m. UTC | #4
>>> On 29.11.16 at 12:42, <wei.liu2@citrix.com> wrote:
> On Tue, Nov 29, 2016 at 12:33:24PM +0100, Dario Faggioli wrote:
>> On Tue, 2016-11-29 at 11:14 +0000, Wei Liu wrote:
>> > On Tue, Nov 29, 2016 at 11:57:03AM +0100, Dario Faggioli wrote:
>> > Strictly speaking, this is not necessary for 4.8 because even though
>> > the
>> > default configuration is not optimal, there is a simple way to change
>> > it.
>> > 
>> Absolutely true.
>> 
>> > On the other hand, because credit2 is declared supported in this
>> > release, it is better to have a sensible default setting for credit2.
>> > 
>> Exactly.
>> 
>> > The risk is that changing the default value would cause osstest to
>> > discover bugs in the scheduler logic, hence blocking the release. But
>> > overall I think the chance is low, but Dario can you please state to
>> > what extend did you test this patch?
>> > 
>> I've tried to state that in the cover letter:
>> https://lists.xen.org/archives/html/xen-devel/2016-11/msg02616.html 
>> 
> 
> Ah, I thought that was a dupe because it had the same subject line as
> the patch, so I skipped it.

Same here, albeit I had noticed I shouldn't have got 2 copies, so
I did look into the other one.

>> In some more details, it is _months_ that I and Anshul run benchmarks
>> with Credit2 running in this configuration.
>> 
>> It's months that, all the time that I boot my test box, either to do
>> Credit2 related development and testing, or completely unrelated things
>> (e.g., recently, when testing the new Fedora 25 as HVM and PV guest), I
>> use Credit2 in this configuration.
>> 
>> George has done some runs of his schedbench suite with this
>> configuration (to compare Credit1 vs Credit2 per-core vs Credit2 per-
>> socket).
>> 
>> So, if you ask me "how much is this tested", my answer would be "really
>> really really a lot". :-)
>> 
> 
> OK. This is rather convincing to me. I will wait a bit for other people
> express their opinion.

FWIW, I'm in favor of putting it in with the background provided.

Jan
Dario Faggioli Nov. 29, 2016, 11:57 a.m. UTC | #5
On Tue, 2016-11-29 at 04:51 -0700, Jan Beulich wrote:
> > > > On 29.11.16 at 12:42, <wei.liu2@citrix.com> wrote:
> > > I've tried to state that in the cover letter:
> > > https://lists.xen.org/archives/html/xen-devel/2016-11/msg02616.ht
> > > ml 
> > > 
> > Ah, I thought that was a dupe because it had the same subject line
> > as
> > the patch, so I skipped it.
> 
> Same here, albeit I had noticed I shouldn't have got 2 copies, so
> I did look into the other one.
> 
Ah, I see it can indeed look confusing. I hadn't thought about that,
sorry.

> > OK. This is rather convincing to me. I will wait a bit for other
> > people
> > express their opinion.
> 
> FWIW, I'm in favor of putting it in with the background provided.
> 
Whatever the outcome, thanks for having a look (well, to Jan, Wei was
kind of forced to do so! :-P :-P)

Regards,
Dario
Wei Liu Nov. 29, 2016, 12:01 p.m. UTC | #6
On Tue, Nov 29, 2016 at 04:51:04AM -0700, Jan Beulich wrote:
> >>> On 29.11.16 at 12:42, <wei.liu2@citrix.com> wrote:
> > On Tue, Nov 29, 2016 at 12:33:24PM +0100, Dario Faggioli wrote:
> >> On Tue, 2016-11-29 at 11:14 +0000, Wei Liu wrote:
> >> > On Tue, Nov 29, 2016 at 11:57:03AM +0100, Dario Faggioli wrote:
> >> > Strictly speaking, this is not necessary for 4.8 because even though
> >> > the
> >> > default configuration is not optimal, there is a simple way to change
> >> > it.
> >> > 
> >> Absolutely true.
> >> 
> >> > On the other hand, because credit2 is declared supported in this
> >> > release, it is better to have a sensible default setting for credit2.
> >> > 
> >> Exactly.
> >> 
> >> > The risk is that changing the default value would cause osstest to
> >> > discover bugs in the scheduler logic, hence blocking the release. But
> >> > overall I think the chance is low, but Dario can you please state to
> >> > what extend did you test this patch?
> >> > 
> >> I've tried to state that in the cover letter:
> >> https://lists.xen.org/archives/html/xen-devel/2016-11/msg02616.html 
> >> 
> > 
> > Ah, I thought that was a dupe because it had the same subject line as
> > the patch, so I skipped it.
> 
> Same here, albeit I had noticed I shouldn't have got 2 copies, so
> I did look into the other one.
> 
> >> In some more details, it is _months_ that I and Anshul run benchmarks
> >> with Credit2 running in this configuration.
> >> 
> >> It's months that, all the time that I boot my test box, either to do
> >> Credit2 related development and testing, or completely unrelated things
> >> (e.g., recently, when testing the new Fedora 25 as HVM and PV guest), I
> >> use Credit2 in this configuration.
> >> 
> >> George has done some runs of his schedbench suite with this
> >> configuration (to compare Credit1 vs Credit2 per-core vs Credit2 per-
> >> socket).
> >> 
> >> So, if you ask me "how much is this tested", my answer would be "really
> >> really really a lot". :-)
> >> 
> > 
> > OK. This is rather convincing to me. I will wait a bit for other people
> > express their opinion.
> 
> FWIW, I'm in favor of putting it in with the background provided.
> 

Release-acked-by: Wei Liu <wei.liu2@citrix.com>

Feel free to commit it with your other patches.

> Jan
>
diff mbox

Patch

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 87c3023..0138978 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -527,7 +527,7 @@  The default value of `1 sec` is rather long.
 ### credit2\_runqueue
 > `= core | socket | node | all`
 
-> Default: `core`
+> Default: `socket`
 
 Specify how host CPUs are arranged in runqueues. Runqueues are kept
 balanced with respect to the load generated by the vCPUs running on
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index c2c563d..ef8e0d8 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -330,7 +330,7 @@  static const char *const opt_runqueue_str[] = {
     [OPT_RUNQUEUE_NODE] = "node",
     [OPT_RUNQUEUE_ALL] = "all"
 };
-static int __read_mostly opt_runqueue = OPT_RUNQUEUE_CORE;
+static int __read_mostly opt_runqueue = OPT_RUNQUEUE_SOCKET;
 
 static void parse_credit2_runqueue(const char *s)
 {