Message ID | 20210706131828.22309-1-lukasz.luba@arm.com (mailing list archive) |
---|---|
Headers | show |
Series | Introduce Active Stats framework with CPU performance statistics | expand |
On Tue, Jul 6, 2021 at 3:18 PM Lukasz Luba <lukasz.luba@arm.com> wrote: > > Hi all, > > This patch set introduces a new mechanism: Active Stats framework (ASF), which > gathers and maintains statistics of CPU performance - time residency at each > performance state. > > The ASF tracks all the frequency transitions as well as CPU > idle entry/exit events for all CPUs. Based on that it accounts the active > (non-idle) residency time for each CPU at each frequency. This information can > be used by some other subsystems (like thermal governor) to enhance their > estimations about CPU usage at a given period. This seems to mean that what is needed is something like the cpufreq stats but only collected during the time when CPUs are not in idle states. > Does it fix something in mainline? > Yes, there is thermal governor Intelligent Power Allocation (IPA), which > estimates the CPUs power used in the past. IPA is sampling the CPU utilization > and frequency and relies on the info available at the time of sampling > and this imposes the estimation errors. > The use of ASF solve the issue and enables IPA to make better estimates. Obviously the IPA is not used on all platforms where cpufreq and cpuidle are used. What platforms are going to benefit from this change? > Why it couldn't be done using existing frameworks? > The CPUFreq and CPUIdle statistics are not combined, so it is not possible > to derive the information on how long exactly the CPU was running with a given > frequency. But it doesn't mean that the statistics could not be combined. For instance, the frequency of the CPU cannot change via cpufreq when active_stats_cpu_idle_enter() is running, so instead of using an entirely new framework for collecting statistics it might update the existing cpufreq stats to register that event. And analogously for the wakeup. > This new framework combines that information and provides > it in a handy way. I'm not convinced about the last piece. > IMHO it has to be implemented as a new framework, next to > CPUFreq and CPUIdle, due to a clean design and not just hooks from thermal > governor into the frequency change and idle code paths. As far as the design is concerned, I'm not sure if I agree with it. From my perspective it's all a 1000-line patch that I have to read and understand to figure out what the design is. > Tha patch 4/6 introduces a new API for cooling devices, which allows to > stop tracking the freq and idle statistics. > > The patch set contains also a patches 5/6 6/6 which adds the new power model > based on ASF into the cpufreq cooling (used by thermal governor IPA). > It is added as ifdef option, since Active Stats might be not compiled in. > The ASF is a compile time option, but that might be changed and IPA could > select it, which would allow to remove some redundant code from > cpufreq_cooling.c. > > Comments and suggestions are very welcome. I'm totally not convinced that it is necessary to put the extra 1000 lines of code into the kernel to address the problem at hand.
On 7/6/21 4:28 PM, Rafael J. Wysocki wrote: > On Tue, Jul 6, 2021 at 3:18 PM Lukasz Luba <lukasz.luba@arm.com> wrote: >> >> Hi all, >> >> This patch set introduces a new mechanism: Active Stats framework (ASF), which >> gathers and maintains statistics of CPU performance - time residency at each >> performance state. >> >> The ASF tracks all the frequency transitions as well as CPU >> idle entry/exit events for all CPUs. Based on that it accounts the active >> (non-idle) residency time for each CPU at each frequency. This information can >> be used by some other subsystems (like thermal governor) to enhance their >> estimations about CPU usage at a given period. > > This seems to mean that what is needed is something like the cpufreq > stats but only collected during the time when CPUs are not in idle > states. Yes > >> Does it fix something in mainline? >> Yes, there is thermal governor Intelligent Power Allocation (IPA), which >> estimates the CPUs power used in the past. IPA is sampling the CPU utilization >> and frequency and relies on the info available at the time of sampling >> and this imposes the estimation errors. >> The use of ASF solve the issue and enables IPA to make better estimates. > > Obviously the IPA is not used on all platforms where cpufreq and > cpuidle are used. What platforms are going to benefit from this > change? Arm platforms which still use kernel thermal to control temperature, such as Chromebooks or mid-, low-end phones. > >> Why it couldn't be done using existing frameworks? >> The CPUFreq and CPUIdle statistics are not combined, so it is not possible >> to derive the information on how long exactly the CPU was running with a given >> frequency. > > But it doesn't mean that the statistics could not be combined. > > For instance, the frequency of the CPU cannot change via cpufreq when > active_stats_cpu_idle_enter() is running, so instead of using an > entirely new framework for collecting statistics it might update the > existing cpufreq stats to register that event. True, but keep in mind that the cpufreq most likely works for a few CPUs (policy::related_cpus), while cpuidle in a per-cpu fashion. I would say that cpuidle should check during enter/exit what is the currently set frequency for cluster and account its active period. > > And analogously for the wakeup. > >> This new framework combines that information and provides >> it in a handy way. > > I'm not convinced about the last piece. The handy structure is called Active Stats Monitor. It samples the stats gathered after processing idle. That private structure maintains statistics which are for a given period (current snapshot - previous snapshot). > >> IMHO it has to be implemented as a new framework, next to >> CPUFreq and CPUIdle, due to a clean design and not just hooks from thermal >> governor into the frequency change and idle code paths. > > As far as the design is concerned, I'm not sure if I agree with it. > > From my perspective it's all a 1000-line patch that I have to read and > understand to figure out what the design is. I can help you with understanding it with some design docs if you want. > >> Tha patch 4/6 introduces a new API for cooling devices, which allows to >> stop tracking the freq and idle statistics. >> >> The patch set contains also a patches 5/6 6/6 which adds the new power model >> based on ASF into the cpufreq cooling (used by thermal governor IPA). >> It is added as ifdef option, since Active Stats might be not compiled in. >> The ASF is a compile time option, but that might be changed and IPA could >> select it, which would allow to remove some redundant code from >> cpufreq_cooling.c. >> >> Comments and suggestions are very welcome. > > I'm totally not convinced that it is necessary to put the extra 1000 > lines of code into the kernel to address the problem at hand. > I understand your concerns. If you have another idea than this framework I'm happy to hear it. Maybe better stats in cpuidle, which would be are of the cpufreq?
On Tue, Jul 6, 2021 at 5:56 PM Lukasz Luba <lukasz.luba@arm.com> wrote: > > On 7/6/21 4:28 PM, Rafael J. Wysocki wrote: > > On Tue, Jul 6, 2021 at 3:18 PM Lukasz Luba <lukasz.luba@arm.com> wrote: > >> > >> Hi all, > >> > >> This patch set introduces a new mechanism: Active Stats framework (ASF), which > >> gathers and maintains statistics of CPU performance - time residency at each > >> performance state. > >> > >> The ASF tracks all the frequency transitions as well as CPU > >> idle entry/exit events for all CPUs. Based on that it accounts the active > >> (non-idle) residency time for each CPU at each frequency. This information can > >> be used by some other subsystems (like thermal governor) to enhance their > >> estimations about CPU usage at a given period. > > > > This seems to mean that what is needed is something like the cpufreq > > stats but only collected during the time when CPUs are not in idle > > states. > > Yes So this is a clear problem statement: the cpufreq statistics cover the time when CPUs are in idle states, so they are not suitable for certain purposes, like thermal control. The most straightforward approach to address it seems to be to modify the collection of cpufreq statistics so that they don't include the time spent by CPUs in idle states, or to make it possible to distinguish the time spent in idle states from the "active" time. > >> Does it fix something in mainline? > >> Yes, there is thermal governor Intelligent Power Allocation (IPA), which > >> estimates the CPUs power used in the past. IPA is sampling the CPU utilization > >> and frequency and relies on the info available at the time of sampling > >> and this imposes the estimation errors. > >> The use of ASF solve the issue and enables IPA to make better estimates. > > > > Obviously the IPA is not used on all platforms where cpufreq and > > cpuidle are used. What platforms are going to benefit from this > > change? > > Arm platforms which still use kernel thermal to control temperature, > such as Chromebooks or mid-, low-end phones. Which means that this feature is not going to be universally useful. However, if the time spent by CPUs in idle states were accounted for in the cpufreq statistics, that would be universally useful. > > > >> Why it couldn't be done using existing frameworks? > >> The CPUFreq and CPUIdle statistics are not combined, so it is not possible > >> to derive the information on how long exactly the CPU was running with a given > >> frequency. > > > > But it doesn't mean that the statistics could not be combined. > > > > For instance, the frequency of the CPU cannot change via cpufreq when > > active_stats_cpu_idle_enter() is running, so instead of using an > > entirely new framework for collecting statistics it might update the > > existing cpufreq stats to register that event. > > True, but keep in mind that the cpufreq most likely works for a few > CPUs (policy::related_cpus), while cpuidle in a per-cpu fashion. > I would say that cpuidle should check during enter/exit what is > the currently set frequency for cluster and account its active > period. Yes, that's not particularly difficult to achieve in principle: on idle entry and exit, update the cpufreq statistics of the policy including the current CPU. > > > > And analogously for the wakeup. > > > >> This new framework combines that information and provides > >> it in a handy way. > > > > I'm not convinced about the last piece. > > The handy structure is called Active Stats Monitor. It samples > the stats gathered after processing idle. That private > structure maintains statistics which are for a given period > (current snapshot - previous snapshot). So collecting the statistics should be fast and simple and processing them need not be. Ideally, they should be processed only when somebody asks the data. I'm not sure if that is the case in the current patchset. > > > >> IMHO it has to be implemented as a new framework, next to > >> CPUFreq and CPUIdle, due to a clean design and not just hooks from thermal > >> governor into the frequency change and idle code paths. > > > > As far as the design is concerned, I'm not sure if I agree with it. > > > > From my perspective it's all a 1000-line patch that I have to read and > > understand to figure out what the design is. > > I can help you with understanding it with some design docs if you want. That may help, but let's avoid doing extra work just yet. > > > >> Tha patch 4/6 introduces a new API for cooling devices, which allows to > >> stop tracking the freq and idle statistics. > >> > >> The patch set contains also a patches 5/6 6/6 which adds the new power model > >> based on ASF into the cpufreq cooling (used by thermal governor IPA). > >> It is added as ifdef option, since Active Stats might be not compiled in. > >> The ASF is a compile time option, but that might be changed and IPA could > >> select it, which would allow to remove some redundant code from > >> cpufreq_cooling.c. > >> > >> Comments and suggestions are very welcome. > > > > I'm totally not convinced that it is necessary to put the extra 1000 > > lines of code into the kernel to address the problem at hand. > > > > I understand your concerns. If you have another idea than this framework > I'm happy to hear it. Maybe better stats in cpuidle, which would be > are of the cpufreq? One idea that I have is outlined above and I'm not seeing a reason to put cpufreq statistics into cpuidle.
On 7/6/21 5:34 PM, Rafael J. Wysocki wrote: > On Tue, Jul 6, 2021 at 5:56 PM Lukasz Luba <lukasz.luba@arm.com> wrote: >> >> On 7/6/21 4:28 PM, Rafael J. Wysocki wrote: >>> On Tue, Jul 6, 2021 at 3:18 PM Lukasz Luba <lukasz.luba@arm.com> wrote: >>>> >>>> Hi all, >>>> >>>> This patch set introduces a new mechanism: Active Stats framework (ASF), which >>>> gathers and maintains statistics of CPU performance - time residency at each >>>> performance state. >>>> >>>> The ASF tracks all the frequency transitions as well as CPU >>>> idle entry/exit events for all CPUs. Based on that it accounts the active >>>> (non-idle) residency time for each CPU at each frequency. This information can >>>> be used by some other subsystems (like thermal governor) to enhance their >>>> estimations about CPU usage at a given period. >>> >>> This seems to mean that what is needed is something like the cpufreq >>> stats but only collected during the time when CPUs are not in idle >>> states. >> >> Yes > > So this is a clear problem statement: the cpufreq statistics cover the > time when CPUs are in idle states, so they are not suitable for > certain purposes, like thermal control. Agree, it's better described problem statement. > > The most straightforward approach to address it seems to be to modify > the collection of cpufreq statistics so that they don't include the > time spent by CPUs in idle states, or to make it possible to > distinguish the time spent in idle states from the "active" time. > >>>> Does it fix something in mainline? >>>> Yes, there is thermal governor Intelligent Power Allocation (IPA), which >>>> estimates the CPUs power used in the past. IPA is sampling the CPU utilization >>>> and frequency and relies on the info available at the time of sampling >>>> and this imposes the estimation errors. >>>> The use of ASF solve the issue and enables IPA to make better estimates. >>> >>> Obviously the IPA is not used on all platforms where cpufreq and >>> cpuidle are used. What platforms are going to benefit from this >>> change? >> >> Arm platforms which still use kernel thermal to control temperature, >> such as Chromebooks or mid-, low-end phones. > > Which means that this feature is not going to be universally useful. > > However, if the time spent by CPUs in idle states were accounted for > in the cpufreq statistics, that would be universally useful. True > >>> >>>> Why it couldn't be done using existing frameworks? >>>> The CPUFreq and CPUIdle statistics are not combined, so it is not possible >>>> to derive the information on how long exactly the CPU was running with a given >>>> frequency. >>> >>> But it doesn't mean that the statistics could not be combined. >>> >>> For instance, the frequency of the CPU cannot change via cpufreq when >>> active_stats_cpu_idle_enter() is running, so instead of using an >>> entirely new framework for collecting statistics it might update the >>> existing cpufreq stats to register that event. >> >> True, but keep in mind that the cpufreq most likely works for a few >> CPUs (policy::related_cpus), while cpuidle in a per-cpu fashion. >> I would say that cpuidle should check during enter/exit what is >> the currently set frequency for cluster and account its active >> period. > > Yes, that's not particularly difficult to achieve in principle: on > idle entry and exit, update the cpufreq statistics of the policy > including the current CPU. Sounds good. > >>> >>> And analogously for the wakeup. >>> >>>> This new framework combines that information and provides >>>> it in a handy way. >>> >>> I'm not convinced about the last piece. >> >> The handy structure is called Active Stats Monitor. It samples >> the stats gathered after processing idle. That private >> structure maintains statistics which are for a given period >> (current snapshot - previous snapshot). > > So collecting the statistics should be fast and simple and processing > them need not be. > > Ideally, they should be processed only when somebody asks the data. Correct. > > I'm not sure if that is the case in the current patchset. Which is the case in current implementation, where the Active Stats Monitor (ASM) is run in the context of thermal workqueue. It is scheduled every 100ms to run IPA throttling, which does the statistics gathering and calculation in the ASM. Time accounted for this major calculation is moved to the 'client' (like IPA) context. > >>> >>>> IMHO it has to be implemented as a new framework, next to >>>> CPUFreq and CPUIdle, due to a clean design and not just hooks from thermal >>>> governor into the frequency change and idle code paths. >>> >>> As far as the design is concerned, I'm not sure if I agree with it. >>> >>> From my perspective it's all a 1000-line patch that I have to read and >>> understand to figure out what the design is. >> >> I can help you with understanding it with some design docs if you want. > > That may help, but let's avoid doing extra work just yet. Understood > >>> >>>> Tha patch 4/6 introduces a new API for cooling devices, which allows to >>>> stop tracking the freq and idle statistics. >>>> >>>> The patch set contains also a patches 5/6 6/6 which adds the new power model >>>> based on ASF into the cpufreq cooling (used by thermal governor IPA). >>>> It is added as ifdef option, since Active Stats might be not compiled in. >>>> The ASF is a compile time option, but that might be changed and IPA could >>>> select it, which would allow to remove some redundant code from >>>> cpufreq_cooling.c. >>>> >>>> Comments and suggestions are very welcome. >>> >>> I'm totally not convinced that it is necessary to put the extra 1000 >>> lines of code into the kernel to address the problem at hand. >>> >> >> I understand your concerns. If you have another idea than this framework >> I'm happy to hear it. Maybe better stats in cpuidle, which would be >> are of the cpufreq? > > One idea that I have is outlined above and I'm not seeing a reason to > put cpufreq statistics into cpuidle. > I'm happy to prepare such RFC if you like. I would just need a bit more information. It sounds your proposed solution might be smaller in code size, since the client's statistics accounting might be moved to the cpufreq_cooling.c (which now live in the ASM). Also, there won't be a new framework to maintain (which is a big plus).
Sorry for the delay. On Tue, Jul 6, 2021 at 6:51 PM Lukasz Luba <lukasz.luba@arm.com> wrote: > > > > On 7/6/21 5:34 PM, Rafael J. Wysocki wrote: > > On Tue, Jul 6, 2021 at 5:56 PM Lukasz Luba <lukasz.luba@arm.com> wrote: > >> > >> On 7/6/21 4:28 PM, Rafael J. Wysocki wrote: > >>> On Tue, Jul 6, 2021 at 3:18 PM Lukasz Luba <lukasz.luba@arm.com> wrote: > >>>> > >>>> Hi all, > >>>> > >>>> This patch set introduces a new mechanism: Active Stats framework (ASF), which > >>>> gathers and maintains statistics of CPU performance - time residency at each > >>>> performance state. > >>>> > >>>> The ASF tracks all the frequency transitions as well as CPU > >>>> idle entry/exit events for all CPUs. Based on that it accounts the active > >>>> (non-idle) residency time for each CPU at each frequency. This information can > >>>> be used by some other subsystems (like thermal governor) to enhance their > >>>> estimations about CPU usage at a given period. > >>> > >>> This seems to mean that what is needed is something like the cpufreq > >>> stats but only collected during the time when CPUs are not in idle > >>> states. > >> > >> Yes > > > > So this is a clear problem statement: the cpufreq statistics cover the > > time when CPUs are in idle states, so they are not suitable for > > certain purposes, like thermal control. > > Agree, it's better described problem statement. > > > > > The most straightforward approach to address it seems to be to modify > > the collection of cpufreq statistics so that they don't include the > > time spent by CPUs in idle states, or to make it possible to > > distinguish the time spent in idle states from the "active" time. > > > >>>> Does it fix something in mainline? > >>>> Yes, there is thermal governor Intelligent Power Allocation (IPA), which > >>>> estimates the CPUs power used in the past. IPA is sampling the CPU utilization > >>>> and frequency and relies on the info available at the time of sampling > >>>> and this imposes the estimation errors. > >>>> The use of ASF solve the issue and enables IPA to make better estimates. > >>> > >>> Obviously the IPA is not used on all platforms where cpufreq and > >>> cpuidle are used. What platforms are going to benefit from this > >>> change? > >> > >> Arm platforms which still use kernel thermal to control temperature, > >> such as Chromebooks or mid-, low-end phones. > > > > Which means that this feature is not going to be universally useful. > > > > However, if the time spent by CPUs in idle states were accounted for > > in the cpufreq statistics, that would be universally useful. > > True > > > > >>> > >>>> Why it couldn't be done using existing frameworks? > >>>> The CPUFreq and CPUIdle statistics are not combined, so it is not possible > >>>> to derive the information on how long exactly the CPU was running with a given > >>>> frequency. > >>> > >>> But it doesn't mean that the statistics could not be combined. > >>> > >>> For instance, the frequency of the CPU cannot change via cpufreq when > >>> active_stats_cpu_idle_enter() is running, so instead of using an > >>> entirely new framework for collecting statistics it might update the > >>> existing cpufreq stats to register that event. > >> > >> True, but keep in mind that the cpufreq most likely works for a few > >> CPUs (policy::related_cpus), while cpuidle in a per-cpu fashion. > >> I would say that cpuidle should check during enter/exit what is > >> the currently set frequency for cluster and account its active > >> period. > > > > Yes, that's not particularly difficult to achieve in principle: on > > idle entry and exit, update the cpufreq statistics of the policy > > including the current CPU. > > Sounds good. > > > > >>> > >>> And analogously for the wakeup. > >>> > >>>> This new framework combines that information and provides > >>>> it in a handy way. > >>> > >>> I'm not convinced about the last piece. > >> > >> The handy structure is called Active Stats Monitor. It samples > >> the stats gathered after processing idle. That private > >> structure maintains statistics which are for a given period > >> (current snapshot - previous snapshot). > > > > So collecting the statistics should be fast and simple and processing > > them need not be. > > > > Ideally, they should be processed only when somebody asks the data. > > Correct. > > > > > I'm not sure if that is the case in the current patchset. > > Which is the case in current implementation, where the Active Stats > Monitor (ASM) is run in the context of thermal workqueue. It is > scheduled every 100ms to run IPA throttling, which does the > statistics gathering and calculation in the ASM. Time accounted for this > major calculation is moved to the 'client' (like IPA) context. > > > > >>> > >>>> IMHO it has to be implemented as a new framework, next to > >>>> CPUFreq and CPUIdle, due to a clean design and not just hooks from thermal > >>>> governor into the frequency change and idle code paths. > >>> > >>> As far as the design is concerned, I'm not sure if I agree with it. > >>> > >>> From my perspective it's all a 1000-line patch that I have to read and > >>> understand to figure out what the design is. > >> > >> I can help you with understanding it with some design docs if you want. > > > > That may help, but let's avoid doing extra work just yet. > > Understood > > > > >>> > >>>> Tha patch 4/6 introduces a new API for cooling devices, which allows to > >>>> stop tracking the freq and idle statistics. > >>>> > >>>> The patch set contains also a patches 5/6 6/6 which adds the new power model > >>>> based on ASF into the cpufreq cooling (used by thermal governor IPA). > >>>> It is added as ifdef option, since Active Stats might be not compiled in. > >>>> The ASF is a compile time option, but that might be changed and IPA could > >>>> select it, which would allow to remove some redundant code from > >>>> cpufreq_cooling.c. > >>>> > >>>> Comments and suggestions are very welcome. > >>> > >>> I'm totally not convinced that it is necessary to put the extra 1000 > >>> lines of code into the kernel to address the problem at hand. > >>> > >> > >> I understand your concerns. If you have another idea than this framework > >> I'm happy to hear it. Maybe better stats in cpuidle, which would be > >> are of the cpufreq? > > > > One idea that I have is outlined above and I'm not seeing a reason to > > put cpufreq statistics into cpuidle. > > > > I'm happy to prepare such RFC if you like. Well, it should be quite clear that I would prefer this to the original approach, if viable at all. :-) > I would just need a bit more information. OK > It sounds your proposed solution might be smaller in code > size, since the client's statistics accounting might be moved to the > cpufreq_cooling.c (which now live in the ASM). Also, there won't be a > new framework to maintain (which is a big plus). Right.
On 7/14/21 7:30 PM, Rafael J. Wysocki wrote: > Sorry for the delay. Thank you for coming back with comments. No worries, I've been on holidays last week :) [snip] >>>> >>>> I understand your concerns. If you have another idea than this framework >>>> I'm happy to hear it. Maybe better stats in cpuidle, which would be >>>> are of the cpufreq? >>> >>> One idea that I have is outlined above and I'm not seeing a reason to >>> put cpufreq statistics into cpuidle. >>> >> >> I'm happy to prepare such RFC if you like. > > Well, it should be quite clear that I would prefer this to the > original approach, if viable at all. :-) Sure, let me check if this approach is viable. I'll come back to you in next days... > >> I would just need a bit more information. > > OK For now I have only high level questions: 1. The stats data structures and related manageable code should live in the cpuidle and cpufreq just calls some API to notify about freq transition? (no split of data, code between these two frameworks) 2. The registration/allocation of these structures inside the cpuidle could be done using cpufreq_register_notifier() with notification mechanism? 3. CPU hotplug notification (which is needed for these stats) can be used inside the cpuidle (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,...))?