Message ID | 146620512780.29766.17654986428277031886.stgit@Solace.fritz.box (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 18.06.16 at 01:12, <dario.faggioli@citrix.com> wrote: > This really should not happen, but: > 1. it does happen! Investigation is ongoing here: > http://lists.xen.org/archives/html/xen-devel/2016-06/msg00922.html > 2. even when 1 will be fixed it makes sense and is easy enough > to have a 'safety catch' for it. > > The reason why this is particularly bad for Credit2 is that > negative values of delta mean out of scale high load (because > of the conversion to unsigned). This, for instance in the > case of runqueue load, results in a runqueue having its load > updated to values of the order of 10000% or so, which in turns > means that the load balancer will migrate everything off from > the pCPUs in the runqueue, and leave them idle until the load > gets back to something sane... which may indeed take a while! > > This is not a fix for the problem of time going backwards. In > fact, if that happens a lot, load tracking accuracy is still > compromized, but at least the effect is a lot less bad than > before. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > --- > Cc: George Dunlap <george.dunlap@citrix.com> > Cc: Anshul Makkar <anshul.makkar@citrix.com> > Cc: David Vrabel <david.vrabel@citrix.com> > --- > xen/common/sched_credit2.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c > index 50f8dfd..b73d034 100644 > --- a/xen/common/sched_credit2.c > +++ b/xen/common/sched_credit2.c > @@ -404,6 +404,12 @@ __update_runq_load(const struct scheduler *ops, > else > { > delta = now - rqd->load_last_update; > + if ( unlikely(delta < 0) ) > + { > + d2printk("%s: Time went backwards? now %"PRI_stime" llu %"PRI_stime"\n", > + __func__, now, rqd->load_last_update); > + delta = 0; > + } > > rqd->avgload = > ( ( delta * ( (unsigned long long)rqd->load << prv->load_window_shift ) ) > @@ -455,6 +461,12 @@ __update_svc_load(const struct scheduler *ops, > else > { > delta = now - svc->load_last_update; > + if ( unlikely(delta < 0) ) > + { > + d2printk("%s: Time went backwards? now %"PRI_stime" llu %"PRI_stime"\n", > + __func__, now, svc->load_last_update); > + delta = 0; > + } > > svc->avgload = > ( ( delta * ( (unsigned long long)vcpu_load << prv->load_window_shift ) ) Do the absolute times really matter here? I.e. wouldn't it be more useful to simply log the value of delta? Also, may I ask you to use the L modifier in favor of the ll one, for being one byte shorter (and hence, even if just very slightly, reducing both image size and cache pressure)? And finally, instead of logging function names, could the two messages be made distinguishable by other means resulting in less data issued to the log (and potentially needing transmission over a slow serial line)? Jan
On Mon, Jun 20, 2016 at 9:02 AM, Jan Beulich <JBeulich@suse.com> wrote: >>>> On 18.06.16 at 01:12, <dario.faggioli@citrix.com> wrote: >> This really should not happen, but: >> 1. it does happen! Investigation is ongoing here: >> http://lists.xen.org/archives/html/xen-devel/2016-06/msg00922.html >> 2. even when 1 will be fixed it makes sense and is easy enough >> to have a 'safety catch' for it. >> >> The reason why this is particularly bad for Credit2 is that >> negative values of delta mean out of scale high load (because >> of the conversion to unsigned). This, for instance in the >> case of runqueue load, results in a runqueue having its load >> updated to values of the order of 10000% or so, which in turns >> means that the load balancer will migrate everything off from >> the pCPUs in the runqueue, and leave them idle until the load >> gets back to something sane... which may indeed take a while! >> >> This is not a fix for the problem of time going backwards. In >> fact, if that happens a lot, load tracking accuracy is still >> compromized, but at least the effect is a lot less bad than >> before. >> >> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> >> --- >> Cc: George Dunlap <george.dunlap@citrix.com> >> Cc: Anshul Makkar <anshul.makkar@citrix.com> >> Cc: David Vrabel <david.vrabel@citrix.com> >> --- >> xen/common/sched_credit2.c | 12 ++++++++++++ >> 1 file changed, 12 insertions(+) >> >> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c >> index 50f8dfd..b73d034 100644 >> --- a/xen/common/sched_credit2.c >> +++ b/xen/common/sched_credit2.c >> @@ -404,6 +404,12 @@ __update_runq_load(const struct scheduler *ops, >> else >> { >> delta = now - rqd->load_last_update; >> + if ( unlikely(delta < 0) ) >> + { >> + d2printk("%s: Time went backwards? now %"PRI_stime" llu %"PRI_stime"\n", >> + __func__, now, rqd->load_last_update); >> + delta = 0; >> + } >> >> rqd->avgload = >> ( ( delta * ( (unsigned long long)rqd->load << prv->load_window_shift ) ) >> @@ -455,6 +461,12 @@ __update_svc_load(const struct scheduler *ops, >> else >> { >> delta = now - svc->load_last_update; >> + if ( unlikely(delta < 0) ) >> + { >> + d2printk("%s: Time went backwards? now %"PRI_stime" llu %"PRI_stime"\n", >> + __func__, now, svc->load_last_update); >> + delta = 0; >> + } >> >> svc->avgload = >> ( ( delta * ( (unsigned long long)vcpu_load << prv->load_window_shift ) ) > > Do the absolute times really matter here? I.e. wouldn't it be more > useful to simply log the value of delta? > > Also, may I ask you to use the L modifier in favor of the ll one, for > being one byte shorter (and hence, even if just very slightly, > reducing both image size and cache pressure)? > > And finally, instead of logging function names, could the two > messages be made distinguishable by other means resulting in less > data issued to the log (and potentially needing transmission over > a slow serial line)? The reason this is under a "d2printk" is because it's really only to help developers in debugging. In-tree this warning isn't even on with debug=y; you have to go to the top of the file and change the #define to make it even exist. Given that, I don't think the quibbles over the code size or the length of what's logged really matter. I think we should just take it as it is. Reviewed-by: George Dunlap <george.dunlap@citrix.com> -George
>>> On 06.07.16 at 18:21, <george.dunlap@citrix.com> wrote: > On Mon, Jun 20, 2016 at 9:02 AM, Jan Beulich <JBeulich@suse.com> wrote: >>>>> On 18.06.16 at 01:12, <dario.faggioli@citrix.com> wrote: >>> This really should not happen, but: >>> 1. it does happen! Investigation is ongoing here: >>> http://lists.xen.org/archives/html/xen-devel/2016-06/msg00922.html >>> 2. even when 1 will be fixed it makes sense and is easy enough >>> to have a 'safety catch' for it. >>> >>> The reason why this is particularly bad for Credit2 is that >>> negative values of delta mean out of scale high load (because >>> of the conversion to unsigned). This, for instance in the >>> case of runqueue load, results in a runqueue having its load >>> updated to values of the order of 10000% or so, which in turns >>> means that the load balancer will migrate everything off from >>> the pCPUs in the runqueue, and leave them idle until the load >>> gets back to something sane... which may indeed take a while! >>> >>> This is not a fix for the problem of time going backwards. In >>> fact, if that happens a lot, load tracking accuracy is still >>> compromized, but at least the effect is a lot less bad than >>> before. >>> >>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> >>> --- >>> Cc: George Dunlap <george.dunlap@citrix.com> >>> Cc: Anshul Makkar <anshul.makkar@citrix.com> >>> Cc: David Vrabel <david.vrabel@citrix.com> >>> --- >>> xen/common/sched_credit2.c | 12 ++++++++++++ >>> 1 file changed, 12 insertions(+) >>> >>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c >>> index 50f8dfd..b73d034 100644 >>> --- a/xen/common/sched_credit2.c >>> +++ b/xen/common/sched_credit2.c >>> @@ -404,6 +404,12 @@ __update_runq_load(const struct scheduler *ops, >>> else >>> { >>> delta = now - rqd->load_last_update; >>> + if ( unlikely(delta < 0) ) >>> + { >>> + d2printk("%s: Time went backwards? now %"PRI_stime" llu > %"PRI_stime"\n", >>> + __func__, now, rqd->load_last_update); >>> + delta = 0; >>> + } >>> >>> rqd->avgload = >>> ( ( delta * ( (unsigned long long)rqd->load << > prv->load_window_shift ) ) >>> @@ -455,6 +461,12 @@ __update_svc_load(const struct scheduler *ops, >>> else >>> { >>> delta = now - svc->load_last_update; >>> + if ( unlikely(delta < 0) ) >>> + { >>> + d2printk("%s: Time went backwards? now %"PRI_stime" llu > %"PRI_stime"\n", >>> + __func__, now, svc->load_last_update); >>> + delta = 0; >>> + } >>> >>> svc->avgload = >>> ( ( delta * ( (unsigned long long)vcpu_load << > prv->load_window_shift ) ) >> >> Do the absolute times really matter here? I.e. wouldn't it be more >> useful to simply log the value of delta? >> >> Also, may I ask you to use the L modifier in favor of the ll one, for >> being one byte shorter (and hence, even if just very slightly, >> reducing both image size and cache pressure)? >> >> And finally, instead of logging function names, could the two >> messages be made distinguishable by other means resulting in less >> data issued to the log (and potentially needing transmission over >> a slow serial line)? > > The reason this is under a "d2printk" is because it's really only to > help developers in debugging. In-tree this warning isn't even on with > debug=y; you have to go to the top of the file and change the #define > to make it even exist. > > Given that, I don't think the quibbles over the code size or the > length of what's logged really matter. I think we should just take it > as it is. > > Reviewed-by: George Dunlap <george.dunlap@citrix.com> Oh, okay - I agree on those two parts then. But the question on the usefulness of absolute vs relative times remains. Jan
On 07/07/16 08:29, Jan Beulich wrote: >>>> On 06.07.16 at 18:21, <george.dunlap@citrix.com> wrote: >> On Mon, Jun 20, 2016 at 9:02 AM, Jan Beulich <JBeulich@suse.com> wrote: >>>>>> On 18.06.16 at 01:12, <dario.faggioli@citrix.com> wrote: >>>> This really should not happen, but: >>>> 1. it does happen! Investigation is ongoing here: >>>> http://lists.xen.org/archives/html/xen-devel/2016-06/msg00922.html >>>> 2. even when 1 will be fixed it makes sense and is easy enough >>>> to have a 'safety catch' for it. >>>> >>>> The reason why this is particularly bad for Credit2 is that >>>> negative values of delta mean out of scale high load (because >>>> of the conversion to unsigned). This, for instance in the >>>> case of runqueue load, results in a runqueue having its load >>>> updated to values of the order of 10000% or so, which in turns >>>> means that the load balancer will migrate everything off from >>>> the pCPUs in the runqueue, and leave them idle until the load >>>> gets back to something sane... which may indeed take a while! >>>> >>>> This is not a fix for the problem of time going backwards. In >>>> fact, if that happens a lot, load tracking accuracy is still >>>> compromized, but at least the effect is a lot less bad than >>>> before. >>>> >>>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> >>>> --- >>>> Cc: George Dunlap <george.dunlap@citrix.com> >>>> Cc: Anshul Makkar <anshul.makkar@citrix.com> >>>> Cc: David Vrabel <david.vrabel@citrix.com> >>>> --- >>>> xen/common/sched_credit2.c | 12 ++++++++++++ >>>> 1 file changed, 12 insertions(+) >>>> >>>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c >>>> index 50f8dfd..b73d034 100644 >>>> --- a/xen/common/sched_credit2.c >>>> +++ b/xen/common/sched_credit2.c >>>> @@ -404,6 +404,12 @@ __update_runq_load(const struct scheduler *ops, >>>> else >>>> { >>>> delta = now - rqd->load_last_update; >>>> + if ( unlikely(delta < 0) ) >>>> + { >>>> + d2printk("%s: Time went backwards? now %"PRI_stime" llu >> %"PRI_stime"\n", >>>> + __func__, now, rqd->load_last_update); >>>> + delta = 0; >>>> + } >>>> >>>> rqd->avgload = >>>> ( ( delta * ( (unsigned long long)rqd->load << >> prv->load_window_shift ) ) >>>> @@ -455,6 +461,12 @@ __update_svc_load(const struct scheduler *ops, >>>> else >>>> { >>>> delta = now - svc->load_last_update; >>>> + if ( unlikely(delta < 0) ) >>>> + { >>>> + d2printk("%s: Time went backwards? now %"PRI_stime" llu >> %"PRI_stime"\n", >>>> + __func__, now, svc->load_last_update); >>>> + delta = 0; >>>> + } >>>> >>>> svc->avgload = >>>> ( ( delta * ( (unsigned long long)vcpu_load << >> prv->load_window_shift ) ) >>> >>> Do the absolute times really matter here? I.e. wouldn't it be more >>> useful to simply log the value of delta? >>> >>> Also, may I ask you to use the L modifier in favor of the ll one, for >>> being one byte shorter (and hence, even if just very slightly, >>> reducing both image size and cache pressure)? >>> >>> And finally, instead of logging function names, could the two >>> messages be made distinguishable by other means resulting in less >>> data issued to the log (and potentially needing transmission over >>> a slow serial line)? >> >> The reason this is under a "d2printk" is because it's really only to >> help developers in debugging. In-tree this warning isn't even on with >> debug=y; you have to go to the top of the file and change the #define >> to make it even exist. >> >> Given that, I don't think the quibbles over the code size or the >> length of what's logged really matter. I think we should just take it >> as it is. >> >> Reviewed-by: George Dunlap <george.dunlap@citrix.com> > > Oh, okay - I agree on those two parts then. But the question on the > usefulness of absolute vs relative times remains. What is the usefulness of printing the relative time? If you have the absolute time, you have some chance of catching mistakes like one of the times being 0 or something like that; or of being able to correlate it with another time printed somewhere else (for instance, a timestamp from a trace record). In any case, I think it's really a bike shed. Dario is the one who has used this error message to find an actual bug recently, so I'll let him decide what he thinks the most useful thing to print here is. -George
>>> On 07.07.16 at 11:09, <george.dunlap@citrix.com> wrote: > On 07/07/16 08:29, Jan Beulich wrote: >>>>> On 06.07.16 at 18:21, <george.dunlap@citrix.com> wrote: >>> On Mon, Jun 20, 2016 at 9:02 AM, Jan Beulich <JBeulich@suse.com> wrote: >>>>>>> On 18.06.16 at 01:12, <dario.faggioli@citrix.com> wrote: >>>>> This really should not happen, but: >>>>> 1. it does happen! Investigation is ongoing here: >>>>> http://lists.xen.org/archives/html/xen-devel/2016-06/msg00922.html >>>>> 2. even when 1 will be fixed it makes sense and is easy enough >>>>> to have a 'safety catch' for it. >>>>> >>>>> The reason why this is particularly bad for Credit2 is that >>>>> negative values of delta mean out of scale high load (because >>>>> of the conversion to unsigned). This, for instance in the >>>>> case of runqueue load, results in a runqueue having its load >>>>> updated to values of the order of 10000% or so, which in turns >>>>> means that the load balancer will migrate everything off from >>>>> the pCPUs in the runqueue, and leave them idle until the load >>>>> gets back to something sane... which may indeed take a while! >>>>> >>>>> This is not a fix for the problem of time going backwards. In >>>>> fact, if that happens a lot, load tracking accuracy is still >>>>> compromized, but at least the effect is a lot less bad than >>>>> before. >>>>> >>>>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> >>>>> --- >>>>> Cc: George Dunlap <george.dunlap@citrix.com> >>>>> Cc: Anshul Makkar <anshul.makkar@citrix.com> >>>>> Cc: David Vrabel <david.vrabel@citrix.com> >>>>> --- >>>>> xen/common/sched_credit2.c | 12 ++++++++++++ >>>>> 1 file changed, 12 insertions(+) >>>>> >>>>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c >>>>> index 50f8dfd..b73d034 100644 >>>>> --- a/xen/common/sched_credit2.c >>>>> +++ b/xen/common/sched_credit2.c >>>>> @@ -404,6 +404,12 @@ __update_runq_load(const struct scheduler *ops, >>>>> else >>>>> { >>>>> delta = now - rqd->load_last_update; >>>>> + if ( unlikely(delta < 0) ) >>>>> + { >>>>> + d2printk("%s: Time went backwards? now %"PRI_stime" llu >>> %"PRI_stime"\n", >>>>> + __func__, now, rqd->load_last_update); >>>>> + delta = 0; >>>>> + } >>>>> >>>>> rqd->avgload = >>>>> ( ( delta * ( (unsigned long long)rqd->load << >>> prv->load_window_shift ) ) >>>>> @@ -455,6 +461,12 @@ __update_svc_load(const struct scheduler *ops, >>>>> else >>>>> { >>>>> delta = now - svc->load_last_update; >>>>> + if ( unlikely(delta < 0) ) >>>>> + { >>>>> + d2printk("%s: Time went backwards? now %"PRI_stime" llu >>> %"PRI_stime"\n", >>>>> + __func__, now, svc->load_last_update); >>>>> + delta = 0; >>>>> + } >>>>> >>>>> svc->avgload = >>>>> ( ( delta * ( (unsigned long long)vcpu_load << >>> prv->load_window_shift ) ) >>>> >>>> Do the absolute times really matter here? I.e. wouldn't it be more >>>> useful to simply log the value of delta? >>>> >>>> Also, may I ask you to use the L modifier in favor of the ll one, for >>>> being one byte shorter (and hence, even if just very slightly, >>>> reducing both image size and cache pressure)? >>>> >>>> And finally, instead of logging function names, could the two >>>> messages be made distinguishable by other means resulting in less >>>> data issued to the log (and potentially needing transmission over >>>> a slow serial line)? >>> >>> The reason this is under a "d2printk" is because it's really only to >>> help developers in debugging. In-tree this warning isn't even on with >>> debug=y; you have to go to the top of the file and change the #define >>> to make it even exist. >>> >>> Given that, I don't think the quibbles over the code size or the >>> length of what's logged really matter. I think we should just take it >>> as it is. >>> >>> Reviewed-by: George Dunlap <george.dunlap@citrix.com> >> >> Oh, okay - I agree on those two parts then. But the question on the >> usefulness of absolute vs relative times remains. > > What is the usefulness of printing the relative time? If you have the > absolute time, you have some chance of catching mistakes like one of the > times being 0 or something like that; or of being able to correlate it > with another time printed somewhere else (for instance, a timestamp from > a trace record). Well, having had to deal with time going backwards elsewhere (both in the past and recently) I have always found it cumbersome to work out the delta from huge (far into the billions) absolute numbers, and therefore consider logging the delta more useful - apart from seeing at the first glance whether a particular delta is positive or negative, this also allows at almost the first glance to at least recognize the magnitude of the difference. But anyway ... > In any case, I think it's really a bike shed. Dario is the one who has > used this error message to find an actual bug recently, so I'll let him > decide what he thinks the most useful thing to print here is. ... fine with me; it was just a question after all. Jan
On Thu, 2016-07-07 at 03:18 -0600, Jan Beulich wrote: > > > > On 07.07.16 at 11:09, <george.dunlap@citrix.com> wrote: > > > Oh, okay - I agree on those two parts then. But the question on > > > the > > > usefulness of absolute vs relative times remains. > > What is the usefulness of printing the relative time? If you have > > the > > absolute time, you have some chance of catching mistakes like one > > of the > > times being 0 or something like that; or of being able to correlate > > it > > with another time printed somewhere else (for instance, a timestamp > > from > > a trace record). > Well, having had to deal with time going backwards elsewhere > (both in the past and recently) I have always found it cumbersome > to work out the delta from huge (far into the billions) absolute > numbers, and therefore consider logging the delta more useful - > And, in general, I agree with you. In this case, however... > apart from seeing at the first glance whether a particular delta is > positive or negative, this also allows at almost the first glance to > at least recognize the magnitude of the difference. But anyway ... > true, for the magnitude part, but in this case we are only logging anything if delta is negative. So the fact itself that there is something being logged, tells about the sign of the delta. And... > > In any case, I think it's really a bike shed. Dario is the one who > > has > > used this error message to find an actual bug recently, so I'll let > > him > > decide what he thinks the most useful thing to print here is. > ... fine with me; it was just a question after all. > ...I caught two bugs thanks to this. The (more recent) one about lack of monotonicity, having two absolute values or a delta would have been the same. For the other one (the fact that we were using stale time samples because we were calling NOW() and then [potentially] waiting for acquiring a lock) seeing that there were repeating occurrences of the same absolute value of now, did indeed helped both identifying and debugging the issue, while having just a delta wouldn't have been as effective. And I've also done what George refers to, i.e., look for the exact value printed in the trace record, to have an idea of what was happening, and again having absolutes revealed useful for this. So, yes, I think it's actually fine to leave the message as it is. Thanks and Regards, Dario
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index 50f8dfd..b73d034 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -404,6 +404,12 @@ __update_runq_load(const struct scheduler *ops, else { delta = now - rqd->load_last_update; + if ( unlikely(delta < 0) ) + { + d2printk("%s: Time went backwards? now %"PRI_stime" llu %"PRI_stime"\n", + __func__, now, rqd->load_last_update); + delta = 0; + } rqd->avgload = ( ( delta * ( (unsigned long long)rqd->load << prv->load_window_shift ) ) @@ -455,6 +461,12 @@ __update_svc_load(const struct scheduler *ops, else { delta = now - svc->load_last_update; + if ( unlikely(delta < 0) ) + { + d2printk("%s: Time went backwards? now %"PRI_stime" llu %"PRI_stime"\n", + __func__, now, svc->load_last_update); + delta = 0; + } svc->avgload = ( ( delta * ( (unsigned long long)vcpu_load << prv->load_window_shift ) )
This really should not happen, but: 1. it does happen! Investigation is ongoing here: http://lists.xen.org/archives/html/xen-devel/2016-06/msg00922.html 2. even when 1 will be fixed it makes sense and is easy enough to have a 'safety catch' for it. The reason why this is particularly bad for Credit2 is that negative values of delta mean out of scale high load (because of the conversion to unsigned). This, for instance in the case of runqueue load, results in a runqueue having its load updated to values of the order of 10000% or so, which in turns means that the load balancer will migrate everything off from the pCPUs in the runqueue, and leave them idle until the load gets back to something sane... which may indeed take a while! This is not a fix for the problem of time going backwards. In fact, if that happens a lot, load tracking accuracy is still compromized, but at least the effect is a lot less bad than before. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Cc: George Dunlap <george.dunlap@citrix.com> Cc: Anshul Makkar <anshul.makkar@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> --- xen/common/sched_credit2.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)