[4/6] xen: credit2: rearrange members of control structures

Message ID	149821531270.5914.329757532058629047.stgit@Solace (mailing list archive)
State	New, archived

Headers

From: Dario Faggioli <dario.faggioli@citrix.com>
To: xen-devel@lists.xenproject.org
Date: Fri, 23 Jun 2017 12:55:12 +0200
Message-ID: <149821531270.5914.329757532058629047.stgit@Solace>
In-Reply-To: <149821475587.5914.12193327340105859241.stgit@Solace>
References: <149821475587.5914.12193327340105859241.stgit@Solace>
User-Agent: StGit/0.17.1-dirty
MIME-Version: 1.0
Cc: George Dunlap <george.dunlap@citrix.com>,
	Anshul Makkar <anshulmakkar@gmail.com>
Subject: [Xen-devel] [PATCH 4/6] xen: credit2: rearrange members of control
	structures
Precedence: list
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

Commit Message

Dario Faggioli June 23, 2017, 10:55 a.m. UTC

Rearrange the members of the credit2 control structures,
with the aim of improving memory size and layout and, at
the same time, having related fields reside in the same
cacheline.

Here's a summary of the output of `pahole`, with and
without this patch, for the affected data structures.

csched2_runqueue_data:
 * Before:
    size: 216, cachelines: 4, members: 14
    sum members: 208, holes: 2, sum holes: 8
    last cacheline: 24 bytes
 * After:
    size: 208, cachelines: 4, members: 14
    last cacheline: 16 bytes

csched2_private:
 * Before:
    size: 120, cachelines: 2, members: 8
    sum members: 112, holes: 1, sum holes: 4
    padding: 4
    last cacheline: 56 bytes
 * After:
    size: 112, cachelines: 2, members: 8
    last cacheline: 48 bytes

csched2_vcpu:
 * Before:
    size: 112, cachelines: 2, members: 14
    sum members: 108, holes: 1, sum holes: 4
    last cacheline: 48 bytes
 * After:
    size: 112, cachelines: 2, members: 14
    padding: 4
    last cacheline: 48 bytes

While there, improve the wording, style and alignment
of comments too.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshulmakkar@gmail.com>
---
 xen/common/sched_credit2.c |  102 ++++++++++++++++++++++----------------------
 1 file changed, 51 insertions(+), 51 deletions(-)

Comments

George Dunlap July 21, 2017, 5:05 p.m. UTC | #1

On 06/23/2017 11:55 AM, Dario Faggioli wrote:
> With the aim of improving memory size and layout, and
> at the same time trying to put related fields reside
> in the same cacheline.
> 
> Here's a summary of the output of `pahole`, with and
> without this patch, for the affected data structures.
> 
> csched2_runqueue_data:
>  * Before:
>     size: 216, cachelines: 4, members: 14
>     sum members: 208, holes: 2, sum holes: 8
>     last cacheline: 24 bytes
>  * After:
>     size: 208, cachelines: 4, members: 14
>     last cacheline: 16 bytes
> 
> csched2_private:
>  * Before:
>     size: 120, cachelines: 2, members: 8
>     sum members: 112, holes: 1, sum holes: 4
>     padding: 4
>     last cacheline: 56 bytes
>  * After:
>     size: 112, cachelines: 2, members: 8
>     last cacheline: 48 bytes
> 
> csched2_vcpu:
>  * Before:
>     size: 112, cachelines: 2, members: 14
>     sum members: 108, holes: 1, sum holes: 4
>     last cacheline: 48 bytes
>  * After:
>     size: 112, cachelines: 2, members: 14
>     padding: 4
>     last cacheline: 48 bytes
> 
> While there, improve the wording, style and alignment
> of comments too.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

I haven't taken a careful look at these; the idea sounds good and I'll
trust that you've taken a careful look at them:

Acked-by: George Dunlap <george.dunlap@citrix.com>

Dario Faggioli July 21, 2017, 7:53 p.m. UTC | #2

On Fri, 2017-07-21 at 18:05 +0100, George Dunlap wrote:
> On 06/23/2017 11:55 AM, Dario Faggioli wrote:
> > 
> > While there, improve the wording, style and alignment
> > of comments too.
> > 
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> 
> I haven't taken a careful look at these; the idea sounds good and
> I'll
> trust that you've taken a careful look at them:
> 
Hehe... thanks! :-)

I've even done the whole thing twice. In fact, I was about to submit
the series when I discovered that I had optimized the cache layout of a
debug build, and hence had to redo everything from the beginning! :-P

Regards,
Dario

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 15862f2..9814072 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -355,40 +355,41 @@  custom_param("credit2_runqueue", parse_credit2_runqueue);
  * Per-runqueue data
  */
 struct csched2_runqueue_data {
-    int id;
-
-    spinlock_t lock;      /* Lock for this runqueue. */
-    cpumask_t active;      /* CPUs enabled for this runqueue */
-
-    struct list_head runq; /* Ordered list of runnable vms */
-    struct list_head svc;  /* List of all vcpus assigned to this runqueue */
-    unsigned int max_weight;
-    unsigned int pick_bias;/* Last CPU we picked. Start from it next time */
-
-    cpumask_t idle,        /* Currently idle pcpus */
-        smt_idle,          /* Fully idle-and-untickled cores (see below) */
-        tickled;           /* Have been asked to go through schedule */
-    int load;              /* Instantaneous load: Length of queue  + num non-idle threads */
-    s_time_t load_last_update;  /* Last time average was updated */
-    s_time_t avgload;           /* Decaying queue load */
-    s_time_t b_avgload;         /* Decaying queue load modified by balancing */
+    spinlock_t lock;           /* Lock for this runqueue                     */
+
+    struct list_head runq;     /* Ordered list of runnable vms               */
+    int id;                    /* ID of this runqueue (-1 if invalid)        */
+
+    int load;                  /* Instantaneous load (num of non-idle vcpus) */
+    s_time_t load_last_update; /* Last time average was updated              */
+    s_time_t avgload;          /* Decaying queue load                        */
+    s_time_t b_avgload;        /* Decaying queue load modified by balancing  */
+
+    cpumask_t active,          /* CPUs enabled for this runqueue             */
+        smt_idle,              /* Fully idle-and-untickled cores (see below) */
+        tickled,               /* Have been asked to go through schedule     */
+        idle;                  /* Currently idle pcpus                       */
+
+    struct list_head svc;      /* List of all vcpus assigned to the runqueue */
+    unsigned int max_weight;   /* Max weight of the vcpus in this runqueue   */
+    unsigned int pick_bias;    /* Last picked pcpu. Start from it next time  */
 };
 
 /*
  * System-wide private data
  */
 struct csched2_private {
-    rwlock_t lock;
-    cpumask_t initialized; /* CPU is initialized for this pool */
-    
-    struct list_head sdom; /* Used mostly for dump keyhandler. */
+    rwlock_t lock;                     /* Private scheduler lock             */
 
-    cpumask_t active_queues; /* Queues which may have active cpus */
-    struct csched2_runqueue_data *rqd;
+    unsigned int load_precision_shift; /* Precision of load calculations     */
+    unsigned int load_window_shift;    /* Length of load decaying window     */
+    unsigned int ratelimit_us;         /* Rate limiting for this scheduler   */
+
+    cpumask_t active_queues;           /* Runqueues with (maybe) active cpus */
+    struct csched2_runqueue_data *rqd; /* Data of the various runqueues      */
 
-    unsigned int load_precision_shift;
-    unsigned int load_window_shift;
-    unsigned ratelimit_us; /* each cpupool can have its own ratelimit */
+    cpumask_t initialized;             /* CPUs part of this scheduler        */
+    struct list_head sdom;             /* List of domains (for debug key)    */
 };
 
 /*
@@ -403,37 +404,36 @@  static DEFINE_PER_CPU(int, runq_map);
  * Virtual CPU
  */
 struct csched2_vcpu {
-    struct list_head rqd_elem;         /* On the runqueue data list  */
-    struct list_head runq_elem;        /* On the runqueue            */
-    struct csched2_runqueue_data *rqd; /* Up-pointer to the runqueue */
-
-    /* Up-pointers */
-    struct csched2_dom *sdom;
-    struct vcpu *vcpu;
-
-    unsigned int weight;
-    unsigned int residual;
-
-    int credit;
-    s_time_t start_time; /* When we were scheduled (used for credit) */
-    unsigned flags;      /* 16 bits doesn't seem to play well with clear_bit() */
-    int tickled_cpu;     /* cpu tickled for picking us up (-1 if none) */
-
-    /* Individual contribution to load */
-    s_time_t load_last_update;  /* Last time average was updated */
-    s_time_t avgload;           /* Decaying queue load */
-
-    struct csched2_runqueue_data *migrate_rqd; /* Pre-determined rqd to which to migrate */
+    struct list_head rqd_elem;         /* On csched2_runqueue_data's svc list */
+    struct csched2_runqueue_data *rqd; /* Up-pointer to the runqueue          */
+
+    int credit;                        /* Current amount of credit            */
+    unsigned int weight;               /* Weight of this vcpu                 */
+    unsigned int residual;             /* Remainder of div(max_weight/weight) */
+    unsigned flags;                    /* Status flags (16 bits would be ok,  */
+                                       /* but clear_bit() does not like that) */
+    s_time_t start_time;               /* Time we were scheduled (for credit) */
+
+    /* Individual contribution to load                                        */
+    s_time_t load_last_update;         /* Last time average was updated       */
+    s_time_t avgload;                  /* Decaying queue load                 */
+
+    struct list_head runq_elem;        /* On the runqueue (rqd->runq)         */
+    struct csched2_dom *sdom;          /* Up-pointer to domain                */
+    struct vcpu *vcpu;                 /* Up-pointer to vcpu                  */
+
+    struct csched2_runqueue_data *migrate_rqd; /* Pre-determined migr. target */
+    int tickled_cpu;                   /* Cpu that will pick us (-1 if none)  */
 };
 
 /*
  * Domain
  */
 struct csched2_dom {
-    struct list_head sdom_elem;
-    struct domain *dom;
-    uint16_t weight;
-    uint16_t nr_vcpus;
+    struct list_head sdom_elem; /* On csched2_private's sdom list             */
+    struct domain *dom;         /* Up-pointer to domain                       */
+    uint16_t weight;            /* User specified weight                      */
+    uint16_t nr_vcpus;          /* Number of vcpus of this domain             */
 };
 
 /*
