diff mbox

[v2,02/11] xen: credit2: prevent load balancing to go mad if time goes backwards

Message ID 146859415833.10217.1295698882153856098.stgit@Solace.fritz.box (mailing list archive)
State New, archived
Headers show

Commit Message

Dario Faggioli July 15, 2016, 2:49 p.m. UTC
This really should not happen, but:
 1. it does happen! Some more info here:
    http://lists.xen.org/archives/html/xen-devel/2016-06/msg00922.html
 2. independently from 1, it makes sense and is easy enough
    to have a 'safety catch'.

The reason why this is particularly bad for Credit2 is that
negative values of delta mean out of scale high load (because
of the conversion to unsigned). This, for instance in the
case of runqueue load, results in a runqueue having its load
updated to values of the order of 10000% or so, which in turns
means that the load balancer will migrate everything off from
the pCPUs in the runqueue, and leave them idle until the load
gets back to something sane... which may indeed take a while!

This is not a fix for the problem of time going backwards. In
fact, if that happens a lot, load tracking accuracy is still
compromized, but at least the effect is a lot less bad than
before.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
---
Leaving the patch as it was in v1, for reasons explained in these messages, by
both George and me:
 * <CAFLBxZZdS21UGW4TSDGvaDtn3LzD3X6L3f-rpDP4ySrHN0r7RQ@mail.gmail.com>
 * <a5b9e1c0-1d25-4cda-b1b6-a546189b43b0@citrix.com>
 * <1467888805.23985.62.camel@citrix.com>
---
 xen/common/sched_credit2.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)
diff mbox

Patch

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 7e572bf..6cb06e8 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -404,6 +404,12 @@  __update_runq_load(const struct scheduler *ops,
     else
     {
         delta = now - rqd->load_last_update;
+        if ( unlikely(delta < 0) )
+        {
+            d2printk("%s: Time went backwards? now %"PRI_stime" llu %"PRI_stime"\n",
+                     __func__, now, rqd->load_last_update);
+            delta = 0;
+        }
 
         rqd->avgload =
             ( ( delta * ( (unsigned long long)rqd->load << prv->load_window_shift ) )
@@ -455,6 +461,12 @@  __update_svc_load(const struct scheduler *ops,
     else
     {
         delta = now - svc->load_last_update;
+        if ( unlikely(delta < 0) )
+        {
+            d2printk("%s: Time went backwards? now %"PRI_stime" llu %"PRI_stime"\n",
+                     __func__, now, svc->load_last_update);
+            delta = 0;
+        }
 
         svc->avgload =
             ( ( delta * ( (unsigned long long)vcpu_load << prv->load_window_shift ) )