Message ID | 1463573758-11441-1-git-send-email-jgross@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 05/18/2016 08:15 AM, Juergen Gross wrote: > } > > +void __init xen_time_setup_guest(void) > +{ > + pv_time_ops.steal_clock = xen_steal_clock; > + > + static_key_slow_inc(¶virt_steal_enabled); > + /* > + * We can't set paravirt_steal_rq_enabled as this would require the > + * capability to read another cpu's runstate info. > + */ > +} Won't we be accounting for stolen cycles twice now --- once from steal_account_process_tick()->steal_clock() and second time from do_stolen_accounting()? -boris
On 18/05/16 16:46, Boris Ostrovsky wrote: > On 05/18/2016 08:15 AM, Juergen Gross wrote: >> } >> >> +void __init xen_time_setup_guest(void) >> +{ >> + pv_time_ops.steal_clock = xen_steal_clock; >> + >> + static_key_slow_inc(¶virt_steal_enabled); >> + /* >> + * We can't set paravirt_steal_rq_enabled as this would require the >> + * capability to read another cpu's runstate info. >> + */ >> +} > > Won't we be accounting for stolen cycles twice now --- once from > steal_account_process_tick()->steal_clock() and second time from > do_stolen_accounting()? Uuh, yes. I guess I should rip do_stolen_accounting() out, too? It is a Xen-specific hack, so I guess nobody will cry. Maybe it would be a good idea to select CONFIG_PARAVIRT_TIME_ACCOUNTING for XEN then? Juergen
On 05/18/2016 10:53 AM, Juergen Gross wrote: > On 18/05/16 16:46, Boris Ostrovsky wrote: >> On 05/18/2016 08:15 AM, Juergen Gross wrote: >>> } >>> >>> +void __init xen_time_setup_guest(void) >>> +{ >>> + pv_time_ops.steal_clock = xen_steal_clock; >>> + >>> + static_key_slow_inc(¶virt_steal_enabled); >>> + /* >>> + * We can't set paravirt_steal_rq_enabled as this would require the >>> + * capability to read another cpu's runstate info. >>> + */ >>> +} >> Won't we be accounting for stolen cycles twice now --- once from >> steal_account_process_tick()->steal_clock() and second time from >> do_stolen_accounting()? > Uuh, yes. > > I guess I should rip do_stolen_accounting() out, too? I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If that's indeed the case then we should ifndef do_stolen_accounting(). Or maybe check for paravirt_steal_enabled. -boris > It is a > Xen-specific hack, so I guess nobody will cry. Maybe it would be a > good idea to select CONFIG_PARAVIRT_TIME_ACCOUNTING for XEN then? > > > Juergen >
On 18/05/16 17:25, Boris Ostrovsky wrote: > On 05/18/2016 10:53 AM, Juergen Gross wrote: >> On 18/05/16 16:46, Boris Ostrovsky wrote: >>> On 05/18/2016 08:15 AM, Juergen Gross wrote: >>>> } >>>> >>>> +void __init xen_time_setup_guest(void) >>>> +{ >>>> + pv_time_ops.steal_clock = xen_steal_clock; >>>> + >>>> + static_key_slow_inc(¶virt_steal_enabled); >>>> + /* >>>> + * We can't set paravirt_steal_rq_enabled as this would require the >>>> + * capability to read another cpu's runstate info. >>>> + */ >>>> +} >>> Won't we be accounting for stolen cycles twice now --- once from >>> steal_account_process_tick()->steal_clock() and second time from >>> do_stolen_accounting()? >> Uuh, yes. >> >> I guess I should rip do_stolen_accounting() out, too? > > I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If This is easy to accomplish. :-) > that's indeed the case then we should ifndef do_stolen_accounting(). Or > maybe check for paravirt_steal_enabled. Is this really a sensible thing to do? There is a mechanism used by KVM to do the stolen accounting. I think we should use it instead of having a second implementation doing the same thing in case the generic one isn't enabled. Juergen
On 18/05/16 16:42, Juergen Gross wrote: > On 18/05/16 17:25, Boris Ostrovsky wrote: >> On 05/18/2016 10:53 AM, Juergen Gross wrote: >>> On 18/05/16 16:46, Boris Ostrovsky wrote: >>>> On 05/18/2016 08:15 AM, Juergen Gross wrote: >>>>> } >>>>> >>>>> +void __init xen_time_setup_guest(void) >>>>> +{ >>>>> + pv_time_ops.steal_clock = xen_steal_clock; >>>>> + >>>>> + static_key_slow_inc(¶virt_steal_enabled); >>>>> + /* >>>>> + * We can't set paravirt_steal_rq_enabled as this would require the >>>>> + * capability to read another cpu's runstate info. >>>>> + */ >>>>> +} >>>> Won't we be accounting for stolen cycles twice now --- once from >>>> steal_account_process_tick()->steal_clock() and second time from >>>> do_stolen_accounting()? >>> Uuh, yes. >>> >>> I guess I should rip do_stolen_accounting() out, too? >> >> I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If > > This is easy to accomplish. :-) > >> that's indeed the case then we should ifndef do_stolen_accounting(). Or >> maybe check for paravirt_steal_enabled. > > Is this really a sensible thing to do? There is a mechanism used by KVM > to do the stolen accounting. I think we should use it instead of having > a second implementation doing the same thing in case the generic one > isn't enabled. I agree. Although I don't think selecting PARAVIRT_TIME_ACC' is necessary -- I don't think it's essential (or is it?). David
On 18/05/16 17:45, David Vrabel wrote: > On 18/05/16 16:42, Juergen Gross wrote: >> On 18/05/16 17:25, Boris Ostrovsky wrote: >>> On 05/18/2016 10:53 AM, Juergen Gross wrote: >>>> On 18/05/16 16:46, Boris Ostrovsky wrote: >>>>> On 05/18/2016 08:15 AM, Juergen Gross wrote: >>>>>> } >>>>>> >>>>>> +void __init xen_time_setup_guest(void) >>>>>> +{ >>>>>> + pv_time_ops.steal_clock = xen_steal_clock; >>>>>> + >>>>>> + static_key_slow_inc(¶virt_steal_enabled); >>>>>> + /* >>>>>> + * We can't set paravirt_steal_rq_enabled as this would require the >>>>>> + * capability to read another cpu's runstate info. >>>>>> + */ >>>>>> +} >>>>> Won't we be accounting for stolen cycles twice now --- once from >>>>> steal_account_process_tick()->steal_clock() and second time from >>>>> do_stolen_accounting()? >>>> Uuh, yes. >>>> >>>> I guess I should rip do_stolen_accounting() out, too? >>> >>> I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If >> >> This is easy to accomplish. :-) >> >>> that's indeed the case then we should ifndef do_stolen_accounting(). Or >>> maybe check for paravirt_steal_enabled. >> >> Is this really a sensible thing to do? There is a mechanism used by KVM >> to do the stolen accounting. I think we should use it instead of having >> a second implementation doing the same thing in case the generic one >> isn't enabled. > > I agree. > > Although I don't think selecting PARAVIRT_TIME_ACC' is necessary -- I > don't think it's essential (or is it?). Not doing so will change behavior in case I rip out do_stolen_accounting(). What about "default y if XEN" ? Juergen
On 18/05/16 16:51, Juergen Gross wrote: > On 18/05/16 17:45, David Vrabel wrote: >> On 18/05/16 16:42, Juergen Gross wrote: >>> On 18/05/16 17:25, Boris Ostrovsky wrote: >>>> On 05/18/2016 10:53 AM, Juergen Gross wrote: >>>>> On 18/05/16 16:46, Boris Ostrovsky wrote: >>>>>> On 05/18/2016 08:15 AM, Juergen Gross wrote: >>>>>>> } >>>>>>> >>>>>>> +void __init xen_time_setup_guest(void) >>>>>>> +{ >>>>>>> + pv_time_ops.steal_clock = xen_steal_clock; >>>>>>> + >>>>>>> + static_key_slow_inc(¶virt_steal_enabled); >>>>>>> + /* >>>>>>> + * We can't set paravirt_steal_rq_enabled as this would require the >>>>>>> + * capability to read another cpu's runstate info. >>>>>>> + */ >>>>>>> +} >>>>>> Won't we be accounting for stolen cycles twice now --- once from >>>>>> steal_account_process_tick()->steal_clock() and second time from >>>>>> do_stolen_accounting()? >>>>> Uuh, yes. >>>>> >>>>> I guess I should rip do_stolen_accounting() out, too? >>>> >>>> I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If >>> >>> This is easy to accomplish. :-) >>> >>>> that's indeed the case then we should ifndef do_stolen_accounting(). Or >>>> maybe check for paravirt_steal_enabled. >>> >>> Is this really a sensible thing to do? There is a mechanism used by KVM >>> to do the stolen accounting. I think we should use it instead of having >>> a second implementation doing the same thing in case the generic one >>> isn't enabled. >> >> I agree. >> >> Although I don't think selecting PARAVIRT_TIME_ACC' is necessary -- I >> don't think it's essential (or is it?). > > Not doing so will change behavior in case I rip out > do_stolen_accounting(). What about "default y if XEN" ? Ok. David
On 05/18/2016 11:45 AM, David Vrabel wrote: > On 18/05/16 16:42, Juergen Gross wrote: >> On 18/05/16 17:25, Boris Ostrovsky wrote: >>> On 05/18/2016 10:53 AM, Juergen Gross wrote: >>>> On 18/05/16 16:46, Boris Ostrovsky wrote: >>>>> On 05/18/2016 08:15 AM, Juergen Gross wrote: >>>>>> } >>>>>> >>>>>> +void __init xen_time_setup_guest(void) >>>>>> +{ >>>>>> + pv_time_ops.steal_clock = xen_steal_clock; >>>>>> + >>>>>> + static_key_slow_inc(¶virt_steal_enabled); >>>>>> + /* >>>>>> + * We can't set paravirt_steal_rq_enabled as this would require the >>>>>> + * capability to read another cpu's runstate info. >>>>>> + */ >>>>>> +} >>>>> Won't we be accounting for stolen cycles twice now --- once from >>>>> steal_account_process_tick()->steal_clock() and second time from >>>>> do_stolen_accounting()? >>>> Uuh, yes. >>>> >>>> I guess I should rip do_stolen_accounting() out, too? >>> I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If >> This is easy to accomplish. :-) I looked at KVM code (PARAVIRT_TIME_ACCOUNTING is not selected there neither) and in their case that's presumably because stealing accounting is a CPUID bit, i.e. it might not be supported. In Xen case we always have this interface. >> >>> that's indeed the case then we should ifndef do_stolen_accounting(). Or >>> maybe check for paravirt_steal_enabled. >> Is this really a sensible thing to do? There is a mechanism used by KVM >> to do the stolen accounting. I think we should use it instead of having >> a second implementation doing the same thing in case the generic one >> isn't enabled. > I agree. > > Although I don't think selecting PARAVIRT_TIME_ACC' is necessary -- I > don't think it's essential (or is it?). Looks like it's useful only if paravirt_steal_rq_enabled, which we don't support yet. -boris
On 18/05/16 17:53, Boris Ostrovsky wrote: > On 05/18/2016 11:45 AM, David Vrabel wrote: >> On 18/05/16 16:42, Juergen Gross wrote: >>> On 18/05/16 17:25, Boris Ostrovsky wrote: >>>> On 05/18/2016 10:53 AM, Juergen Gross wrote: >>>>> On 18/05/16 16:46, Boris Ostrovsky wrote: >>>>>> On 05/18/2016 08:15 AM, Juergen Gross wrote: >>>>>>> } >>>>>>> >>>>>>> +void __init xen_time_setup_guest(void) >>>>>>> +{ >>>>>>> + pv_time_ops.steal_clock = xen_steal_clock; >>>>>>> + >>>>>>> + static_key_slow_inc(¶virt_steal_enabled); >>>>>>> + /* >>>>>>> + * We can't set paravirt_steal_rq_enabled as this would require the >>>>>>> + * capability to read another cpu's runstate info. >>>>>>> + */ >>>>>>> +} >>>>>> Won't we be accounting for stolen cycles twice now --- once from >>>>>> steal_account_process_tick()->steal_clock() and second time from >>>>>> do_stolen_accounting()? >>>>> Uuh, yes. >>>>> >>>>> I guess I should rip do_stolen_accounting() out, too? >>>> I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If >>> This is easy to accomplish. :-) > > > I looked at KVM code (PARAVIRT_TIME_ACCOUNTING is not selected there > neither) and in their case that's presumably because stealing accounting > is a CPUID bit, i.e. it might not be supported. In Xen case we always > have this interface. So they added it later and the default is to keep the old behavior. >>>> that's indeed the case then we should ifndef do_stolen_accounting(). Or >>>> maybe check for paravirt_steal_enabled. >>> Is this really a sensible thing to do? There is a mechanism used by KVM >>> to do the stolen accounting. I think we should use it instead of having >>> a second implementation doing the same thing in case the generic one >>> isn't enabled. >> I agree. >> >> Although I don't think selecting PARAVIRT_TIME_ACC' is necessary -- I >> don't think it's essential (or is it?). > > Looks like it's useful only if paravirt_steal_rq_enabled, which we don't > support yet. I think the patch is still useful. It is reducing code size and it is removing arch-specific Xen-hack(s). With the patch Xen's solution for arm and x86 is common and the same as for KVM. Adding paravirt_steal_rq_enabled later will be much easier as only one function needs substantial modification. Juergen
On Wed, 2016-05-18 at 16:53 +0200, Juergen Gross wrote: > On 18/05/16 16:46, Boris Ostrovsky wrote: > > > > Won't we be accounting for stolen cycles twice now --- once from > > steal_account_process_tick()->steal_clock() and second time from > > do_stolen_accounting()? > Uuh, yes. > > I guess I should rip do_stolen_accounting() out, too? It is a > Xen-specific hack, so I guess nobody will cry. Maybe it would be a > good idea to select CONFIG_PARAVIRT_TIME_ACCOUNTING for XEN then? > So, config options aside, if I understand this correctly, it looks like we were actually already doing steal time accounting, although in a non-standard way. And yet, people seem to have issues relating to lack of (proper?) steal time accounting (Cc-ing Tony). I guess this means that, either: - the issue being reported is actually not caused by the lack of steal time accounting, - our current (Xen specific) steal time accounting solution is flawed, - the issue is caused by the lack of the bit of steal time accounting that we do not support yet, - other ideas? Tony? Dario
On 05/18/2016 12:00 PM, Juergen Gross wrote: > On 18/05/16 17:53, Boris Ostrovsky wrote: >> On 05/18/2016 11:45 AM, David Vrabel wrote: >>> On 18/05/16 16:42, Juergen Gross wrote: >>>> On 18/05/16 17:25, Boris Ostrovsky wrote: >>>>> On 05/18/2016 10:53 AM, Juergen Gross wrote: >>>>>> On 18/05/16 16:46, Boris Ostrovsky wrote: >>>>>>> On 05/18/2016 08:15 AM, Juergen Gross wrote: >>>>>>>> } >>>>>>>> >>>>>>>> +void __init xen_time_setup_guest(void) >>>>>>>> +{ >>>>>>>> + pv_time_ops.steal_clock = xen_steal_clock; >>>>>>>> + >>>>>>>> + static_key_slow_inc(¶virt_steal_enabled); >>>>>>>> + /* >>>>>>>> + * We can't set paravirt_steal_rq_enabled as this would require the >>>>>>>> + * capability to read another cpu's runstate info. >>>>>>>> + */ >>>>>>>> +} >>>>>>> Won't we be accounting for stolen cycles twice now --- once from >>>>>>> steal_account_process_tick()->steal_clock() and second time from >>>>>>> do_stolen_accounting()? >>>>>> Uuh, yes. >>>>>> >>>>>> I guess I should rip do_stolen_accounting() out, too? >>>>> I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If >>>> This is easy to accomplish. :-) >> >> I looked at KVM code (PARAVIRT_TIME_ACCOUNTING is not selected there >> neither) and in their case that's presumably because stealing accounting >> is a CPUID bit, i.e. it might not be supported. In Xen case we always >> have this interface. > So they added it later and the default is to keep the old behavior. > >>>>> that's indeed the case then we should ifndef do_stolen_accounting(). Or >>>>> maybe check for paravirt_steal_enabled. >>>> Is this really a sensible thing to do? There is a mechanism used by KVM >>>> to do the stolen accounting. I think we should use it instead of having >>>> a second implementation doing the same thing in case the generic one >>>> isn't enabled. >>> I agree. >>> >>> Although I don't think selecting PARAVIRT_TIME_ACC' is necessary -- I >>> don't think it's essential (or is it?). >> Looks like it's useful only if paravirt_steal_rq_enabled, which we don't >> support yet. > I think the patch is still useful. It is reducing code size and > it is removing arch-specific Xen-hack(s). With the patch Xen's > solution for arm and x86 is common and the same as for KVM. Adding > paravirt_steal_rq_enabled later will be much easier as only one > function needs substantial modification. I am not arguing against having a patch that will remove do_stolen_accounting(). I was responding to David's statement about whether we need to select CONFIG_PARAVIRT_TIME_ACCOUNTING, and I am not sure this is necessary since steal_account_process_tick() (that will take case of things that do_stolen_accounting() currently does) doesn't need it. (And if it is indeed needed --- can we have Xen's Kconfig select it instead of "default y if XEN" ?) -boris
On 05/18/2016 12:10 PM, Dario Faggioli wrote: > On Wed, 2016-05-18 at 16:53 +0200, Juergen Gross wrote: >> On 18/05/16 16:46, Boris Ostrovsky wrote: >>> >>> Won't we be accounting for stolen cycles twice now --- once from >>> steal_account_process_tick()->steal_clock() and second time from >>> do_stolen_accounting()? >> Uuh, yes. >> >> I guess I should rip do_stolen_accounting() out, too? It is a >> Xen-specific hack, so I guess nobody will cry. Maybe it would be a >> good idea to select CONFIG_PARAVIRT_TIME_ACCOUNTING for XEN then? >> > So, config options aside, if I understand this correctly, it looks like > we were actually already doing steal time accounting, although in a > non-standard way. > > And yet, people seem to have issues relating to lack of (proper?) steal > time accounting (Cc-ing Tony). > > I guess this means that, either: > - the issue being reported is actually not caused by the lack of > steal time accounting, > - our current (Xen specific) steal time accounting solution is flawed, > - the issue is caused by the lack of the bit of steal time accounting > that we do not support yet, I believe it's this one. Tony narrowed the problem down to update_curr() where vruntime is calculated, based on runqueue's clock_task value. That value is computed in update_rq_clock_task(), which needs paravirt_steal_rq_enabled. -boris > - other ideas? Tony? > > Dario
On Wed, May 18, 2016 at 11:20 AM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > On 05/18/2016 12:10 PM, Dario Faggioli wrote: >> On Wed, 2016-05-18 at 16:53 +0200, Juergen Gross wrote: >>> On 18/05/16 16:46, Boris Ostrovsky wrote: >>>> >>>> Won't we be accounting for stolen cycles twice now --- once from >>>> steal_account_process_tick()->steal_clock() and second time from >>>> do_stolen_accounting()? >>> Uuh, yes. >>> >>> I guess I should rip do_stolen_accounting() out, too? It is a >>> Xen-specific hack, so I guess nobody will cry. Maybe it would be a >>> good idea to select CONFIG_PARAVIRT_TIME_ACCOUNTING for XEN then? >>> >> So, config options aside, if I understand this correctly, it looks like >> we were actually already doing steal time accounting, although in a >> non-standard way. >> >> And yet, people seem to have issues relating to lack of (proper?) steal >> time accounting (Cc-ing Tony). >> >> I guess this means that, either: >> - the issue being reported is actually not caused by the lack of >> steal time accounting, >> - our current (Xen specific) steal time accounting solution is flawed, >> - the issue is caused by the lack of the bit of steal time accounting >> that we do not support yet, > > I believe it's this one. > > Tony narrowed the problem down to update_curr() where vruntime is > calculated, based on runqueue's clock_task value. That value is computed > in update_rq_clock_task(), which needs paravirt_steal_rq_enabled. > Hi Boris, You are right. The real problem is steal_clock in pv_time_ops is implemented in KVM but not in Xen. arch/x86/include/asm/paravirt_types.h struct pv_time_ops { unsigned long long (*sched_clock)(void); unsigned long long (*steal_clock)(int cpu); unsigned long (*get_tsc_khz)(void); }; (1) KVM implemented both sched_clock and steal_clock. arch/x86/kernel/kvmclock.c pv_time_ops.sched_clock = kvm_clock_read; arch/x86/kernel/kvm.c pv_time_ops.steal_clock = kvm_steal_clock; (2) However, Xen just implemented sched_clock while the steal_clock is still native_steal_clock(). The function native_steal_clock() just simply return 0. arch/x86/xen/time.c .sched_clock = xen_clocksource_read; arch/x86/kernel/paravirt.c static u64 native_steal_clock(int cpu) { return 0; } Therefore, even though update_rq_clock_task() calculates the value and paravirt_steal_rq_enabled option is enabled, the steal value just returns 0. This will cause the problem which I mentioned. update_rq_clock_task --> paravirt_steal_clock --> pv_time_ops.steal_clock --> native_steal_clock (if in Xen) --> 0 The fundamental solution is to implement a steal_clock in Xen(learn from KVM implementation) instead of using the native one. Tony > -boris > >> - other ideas? Tony? >> >> Dario > >
On Wed, May 18, 2016 at 11:59 AM, Tony S <suokunstar@gmail.com> wrote: > On Wed, May 18, 2016 at 11:20 AM, Boris Ostrovsky > <boris.ostrovsky@oracle.com> wrote: >> On 05/18/2016 12:10 PM, Dario Faggioli wrote: >>> On Wed, 2016-05-18 at 16:53 +0200, Juergen Gross wrote: >>>> On 18/05/16 16:46, Boris Ostrovsky wrote: >>>>> >>>>> Won't we be accounting for stolen cycles twice now --- once from >>>>> steal_account_process_tick()->steal_clock() and second time from >>>>> do_stolen_accounting()? >>>> Uuh, yes. >>>> >>>> I guess I should rip do_stolen_accounting() out, too? It is a >>>> Xen-specific hack, so I guess nobody will cry. Maybe it would be a >>>> good idea to select CONFIG_PARAVIRT_TIME_ACCOUNTING for XEN then? >>>> >>> So, config options aside, if I understand this correctly, it looks like >>> we were actually already doing steal time accounting, although in a >>> non-standard way. >>> >>> And yet, people seem to have issues relating to lack of (proper?) steal >>> time accounting (Cc-ing Tony). >>> >>> I guess this means that, either: >>> - the issue being reported is actually not caused by the lack of >>> steal time accounting, >>> - our current (Xen specific) steal time accounting solution is flawed, >>> - the issue is caused by the lack of the bit of steal time accounting >>> that we do not support yet, >> >> I believe it's this one. >> >> Tony narrowed the problem down to update_curr() where vruntime is >> calculated, based on runqueue's clock_task value. That value is computed >> in update_rq_clock_task(), which needs paravirt_steal_rq_enabled. >> > > Hi Boris, > > You are right. > > The real problem is steal_clock in pv_time_ops is implemented in KVM > but not in Xen. > > arch/x86/include/asm/paravirt_types.h > struct pv_time_ops { > unsigned long long (*sched_clock)(void); > unsigned long long (*steal_clock)(int cpu); > unsigned long (*get_tsc_khz)(void); > }; > > > (1) KVM implemented both sched_clock and steal_clock. > > arch/x86/kernel/kvmclock.c > pv_time_ops.sched_clock = kvm_clock_read; > > arch/x86/kernel/kvm.c > pv_time_ops.steal_clock = kvm_steal_clock; > > > (2) However, Xen just implemented sched_clock while the steal_clock is > still native_steal_clock(). The function native_steal_clock() just > simply return 0. > > arch/x86/xen/time.c > .sched_clock = xen_clocksource_read; > > arch/x86/kernel/paravirt.c > static u64 native_steal_clock(int cpu) > { > return 0; > } > > > Therefore, even though update_rq_clock_task() calculates the value and > paravirt_steal_rq_enabled option is enabled, the steal value just > returns 0. This will cause the problem which I mentioned. > > update_rq_clock_task > --> paravirt_steal_clock > --> pv_time_ops.steal_clock > --> native_steal_clock (if in Xen) > --> 0 > > The fundamental solution is to implement a steal_clock in Xen(learn > from KVM implementation) instead of using the native one. > > Tony > Also, I tried the latest long term version of Linux 4.4, this issue still exists there. Hoping the next version can add this patch. Tony >> -boris >> >>> - other ideas? Tony? >>> >>> Dario >> >>
On Wed, 2016-05-18 at 12:03 -0600, Tony S wrote: > On Wed, May 18, 2016 at 11:59 AM, Tony S <suokunstar@gmail.com> > wrote: > > On Wed, May 18, 2016 at 11:20 AM, Boris Ostrovsky > > <boris.ostrovsky@oracle.com> wrote: > > > Tony narrowed the problem down to update_curr() where vruntime is > > > calculated, based on runqueue's clock_task value. That value is > > > computed > > > in update_rq_clock_task(), which needs paravirt_steal_rq_enabled. > > > > > Hi Boris, > > > > You are right. > > > > The real problem is steal_clock in pv_time_ops is implemented in > > KVM > > but not in Xen. > > > > [...] > > > > Therefore, even though update_rq_clock_task() calculates the value > > and > > paravirt_steal_rq_enabled option is enabled, the steal value just > > returns 0. This will cause the problem which I mentioned. > > > > update_rq_clock_task > > --> paravirt_steal_clock > > --> pv_time_ops.steal_clock > > --> native_steal_clock (if in Xen) > > --> 0 > > Ok, thanks again for confirming this. > Also, I tried the latest long term version of Linux 4.4, this issue > still exists there. Hoping the next version can add this patch. > Yes, as Juergen said here: http://lists.xen.org/archives/html/xen-devel/2016-05/msg01712.html he has in his plans to implement the full mechanism. It will just take a bit longer, due to the amount of work necessary for this second part. Regards, Dario
On 18/05/16 18:13, Boris Ostrovsky wrote: > On 05/18/2016 12:00 PM, Juergen Gross wrote: >> On 18/05/16 17:53, Boris Ostrovsky wrote: >>> On 05/18/2016 11:45 AM, David Vrabel wrote: >>>> On 18/05/16 16:42, Juergen Gross wrote: >>>>> On 18/05/16 17:25, Boris Ostrovsky wrote: >>>>>> On 05/18/2016 10:53 AM, Juergen Gross wrote: >>>>>>> On 18/05/16 16:46, Boris Ostrovsky wrote: >>>>>>>> On 05/18/2016 08:15 AM, Juergen Gross wrote: >>>>>>>>> } >>>>>>>>> >>>>>>>>> +void __init xen_time_setup_guest(void) >>>>>>>>> +{ >>>>>>>>> + pv_time_ops.steal_clock = xen_steal_clock; >>>>>>>>> + >>>>>>>>> + static_key_slow_inc(¶virt_steal_enabled); >>>>>>>>> + /* >>>>>>>>> + * We can't set paravirt_steal_rq_enabled as this would require the >>>>>>>>> + * capability to read another cpu's runstate info. >>>>>>>>> + */ >>>>>>>>> +} >>>>>>>> Won't we be accounting for stolen cycles twice now --- once from >>>>>>>> steal_account_process_tick()->steal_clock() and second time from >>>>>>>> do_stolen_accounting()? >>>>>>> Uuh, yes. >>>>>>> >>>>>>> I guess I should rip do_stolen_accounting() out, too? >>>>>> I don't think PARAVIRT_TIME_ACCOUNTING is always selected for Xen. If >>>>> This is easy to accomplish. :-) >>> >>> I looked at KVM code (PARAVIRT_TIME_ACCOUNTING is not selected there >>> neither) and in their case that's presumably because stealing accounting >>> is a CPUID bit, i.e. it might not be supported. In Xen case we always >>> have this interface. >> So they added it later and the default is to keep the old behavior. >> >>>>>> that's indeed the case then we should ifndef do_stolen_accounting(). Or >>>>>> maybe check for paravirt_steal_enabled. >>>>> Is this really a sensible thing to do? There is a mechanism used by KVM >>>>> to do the stolen accounting. I think we should use it instead of having >>>>> a second implementation doing the same thing in case the generic one >>>>> isn't enabled. >>>> I agree. >>>> >>>> Although I don't think selecting PARAVIRT_TIME_ACC' is necessary -- I >>>> don't think it's essential (or is it?). >>> Looks like it's useful only if paravirt_steal_rq_enabled, which we don't >>> support yet. >> I think the patch is still useful. It is reducing code size and >> it is removing arch-specific Xen-hack(s). With the patch Xen's >> solution for arm and x86 is common and the same as for KVM. Adding >> paravirt_steal_rq_enabled later will be much easier as only one >> function needs substantial modification. > > I am not arguing against having a patch that will remove > do_stolen_accounting(). I was responding to David's statement about > whether we need to select CONFIG_PARAVIRT_TIME_ACCOUNTING, and I am not > sure this is necessary since steal_account_process_tick() (that will > take case of things that do_stolen_accounting() currently does) doesn't > need it. Aah, okay. That's a good reason to not add the Kconfig stuff. > (And if it is indeed needed --- can we have Xen's Kconfig select it > instead of "default y if XEN" ?) I've verified that CONFIG_PARAVIRT_TIME_ACCOUNTING is _not_ needed. I've removed it from .config and used my patch with do_stolen_accounting() removed. In an overcommitted guest (4 vcpus on 2 physical cpus) running a parallel make top showed near 50% stolen time. Juergen
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c index 75cd734..9163b94 100644 --- a/arch/arm/xen/enlighten.c +++ b/arch/arm/xen/enlighten.c @@ -84,19 +84,6 @@ int xen_unmap_domain_gfn_range(struct vm_area_struct *vma, } EXPORT_SYMBOL_GPL(xen_unmap_domain_gfn_range); -static unsigned long long xen_stolen_accounting(int cpu) -{ - struct vcpu_runstate_info state; - - BUG_ON(cpu != smp_processor_id()); - - xen_get_runstate_snapshot(&state); - - WARN_ON(state.state != RUNSTATE_running); - - return state.time[RUNSTATE_runnable] + state.time[RUNSTATE_offline]; -} - static void xen_read_wallclock(struct timespec64 *ts) { u32 version; @@ -355,8 +342,8 @@ static int __init xen_guest_init(void) register_cpu_notifier(&xen_cpu_notifier); - pv_time_ops.steal_clock = xen_stolen_accounting; - static_key_slow_inc(¶virt_steal_enabled); + xen_time_setup_guest(); + if (xen_initial_domain()) pvclock_gtod_register_notifier(&xen_pvclock_gtod_notifier); diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index a0a4e55..f478169 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -431,6 +431,8 @@ static void __init xen_time_init(void) xen_setup_timer(cpu); xen_setup_cpu_clockevents(); + xen_time_setup_guest(); + if (xen_initial_domain()) pvclock_gtod_register_notifier(&xen_pvclock_gtod_notifier); } diff --git a/drivers/xen/time.c b/drivers/xen/time.c index 7107842..6648a78 100644 --- a/drivers/xen/time.c +++ b/drivers/xen/time.c @@ -75,6 +75,15 @@ bool xen_vcpu_stolen(int vcpu) return per_cpu(xen_runstate, vcpu).state == RUNSTATE_runnable; } +static u64 xen_steal_clock(int cpu) +{ + struct vcpu_runstate_info state; + + BUG_ON(cpu != smp_processor_id()); + xen_get_runstate_snapshot(&state); + return state.time[RUNSTATE_runnable] + state.time[RUNSTATE_offline]; +} + void xen_setup_runstate_info(int cpu) { struct vcpu_register_runstate_memory_area area; @@ -86,3 +95,13 @@ void xen_setup_runstate_info(int cpu) BUG(); } +void __init xen_time_setup_guest(void) +{ + pv_time_ops.steal_clock = xen_steal_clock; + + static_key_slow_inc(¶virt_steal_enabled); + /* + * We can't set paravirt_steal_rq_enabled as this would require the + * capability to read another cpu's runstate info. + */ +} diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h index 86abe07..5ce51c2 100644 --- a/include/xen/xen-ops.h +++ b/include/xen/xen-ops.h @@ -21,6 +21,7 @@ void xen_resume_notifier_unregister(struct notifier_block *nb); bool xen_vcpu_stolen(int vcpu); void xen_setup_runstate_info(int cpu); +void __init xen_time_setup_guest(void); void xen_get_runstate_snapshot(struct vcpu_runstate_info *res); int xen_setup_shutdown_event(void);
With CONFIG_PARAVIRT_TIME_ACCOUNTING the kernel is capable to account for time a thread wasn't able to run due to hypervisor scheduling. Add support in Xen arch independent time handling for this feature by moving it out of the arm arch into drivers/xen. Signed-off-by: Juergen Gross <jgross@suse.com> --- arch/arm/xen/enlighten.c | 17 ++--------------- arch/x86/xen/time.c | 2 ++ drivers/xen/time.c | 19 +++++++++++++++++++ include/xen/xen-ops.h | 1 + 4 files changed, 24 insertions(+), 15 deletions(-)