Message ID | 20201028140847.1018-1-lukasz.luba@arm.com (mailing list archive) |
---|---|
Series | Add sustainable OPP concept |
On 28-10-20, 14:08, Lukasz Luba wrote:
> Hi all,
>
> This patch set introduces a concept of sustainable OPP, which then can be used
> by kernel frameworks or governors for estimating system sustainable system
> state. This kind of estimation is done e.g. in thermal governor Intelligent
> Power Allocation (IPA), which calculates sustainable power of the whole system
> and then derives some coefficients for internal algorithm.
>
> The patch set introduces a new DT bindings 'opp-sustainable', with parsing
> code. It also adds a function (in patch 3/4) which allows device drivers to set
> directly the sustainable OPP. This is helpful when the device drivers populate
> the OPP table by themself (example in patch 4/4).
>

Can we please have some more information about this? What does the
sustainable OPP mean? How will platform guys know or learn about this?
How are we going to use it finally? What does it have to do with the
temperature of the SoC or the thermal effects, etc.?
On 29-10-20, 13:10, Viresh Kumar wrote:
> On 28-10-20, 14:08, Lukasz Luba wrote:
> > Hi all,
> >
> > This patch set introduces a concept of sustainable OPP, which then can be used
> > by kernel frameworks or governors for estimating system sustainable system
> > state. This kind of estimation is done e.g. in thermal governor Intelligent
> > Power Allocation (IPA), which calculates sustainable power of the whole system
> > and then derives some coefficients for internal algorithm.
> >
> > The patch set introduces a new DT bindings 'opp-sustainable', with parsing
> > code. It also adds a function (in patch 3/4) which allows device drivers to set
> > directly the sustainable OPP. This is helpful when the device drivers populate
> > the OPP table by themself (example in patch 4/4).
> >
>
> Can we please have some more information about this ? What does the
> sustainable OPP mean ? How will platform guys know or learn about this
> ? How we are going to use it finally ? What does it have to do with
> temperature of the SoC or the thermal effects, etc.

Also, we need a real user of this first if it is ever going to be
merged.
On 10/29/20 7:53 AM, Viresh Kumar wrote:
> On 29-10-20, 13:10, Viresh Kumar wrote:
>> On 28-10-20, 14:08, Lukasz Luba wrote:
>>> Hi all,
>>>
>>> This patch set introduces a concept of sustainable OPP, which then can be used
>>> by kernel frameworks or governors for estimating system sustainable system
>>> state. This kind of estimation is done e.g. in thermal governor Intelligent
>>> Power Allocation (IPA), which calculates sustainable power of the whole system
>>> and then derives some coefficients for internal algorithm.
>>>
>>> The patch set introduces a new DT bindings 'opp-sustainable', with parsing
>>> code. It also adds a function (in patch 3/4) which allows device drivers to set
>>> directly the sustainable OPP. This is helpful when the device drivers populate
>>> the OPP table by themself (example in patch 4/4).
>>>
>>
>> Can we please have some more information about this ? What does the
>> sustainable OPP mean ? How will platform guys know or learn about this
>> ? How we are going to use it finally ? What does it have to do with
>> temperature of the SoC or the thermal effects, etc.

There were discussions about Energy Model (EM), scale of values (mW or
abstract scale) and relation to EAS and IPA. You can find quite long
discussion below v2 [1] (there is also v3 sent after agreement [2]).
We have in thermal DT binding: 'sustainable-power' expressed in mW,
which is used by IPA, but it would not support bogoWatts.

The sustainable power is used for estimation of internal coefficients
(also for power budget), which I am trying to change to work with
'abstract scale' [3][4].

This would allow to estimate sustainable power of the system based on
CPUs, GPU opp-sustainable points, where we don't have
'sustainable-power' or devices using bogoWatts.

>
> And that we need a real user of this first if it is ever going to be
> merged.
>

IPA would be the first user of this in combination with scmi-cpufreq.c,
which can feed 'abstract scale' into EM. Currently IPA takes the lowest
allowed OPPs into account for this estimation, which is not optimal.
These marked OPPs would make the estimation a lot better.

Regards,
Lukasz

[1] https://lore.kernel.org/lkml/20201002114426.31277-1-lukasz.luba@arm.com/
[2] https://lore.kernel.org/lkml/20201019140601.3047-1-lukasz.luba@arm.com/
[3] https://lore.kernel.org/linux-pm/5f682bbb-b250-49e6-dbb7-aea522a58595@arm.com/
[4] https://lore.kernel.org/lkml/20201009135850.14727-1-lukasz.luba@arm.com/
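
As a rough illustration of the estimation described in this message, the sketch below sums per-device power either at the lowest OPP or at an OPP marked as sustainable. It is a user-space model, not kernel code: the `Opp` and `Device` types, the `sustainable_idx` field, and every frequency and power value are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Opp:
    freq_khz: int
    power: int  # mW or an abstract scale; must be consistent across devices


@dataclass
class Device:
    name: str
    opps: List[Opp]                        # sorted by ascending frequency
    sustainable_idx: Optional[int] = None  # hypothetical "marked" OPP index


def estimate_sustainable_power(devices: List[Device]) -> int:
    """Sum per-device power at the marked OPP, falling back to the lowest OPP."""
    total = 0
    for dev in devices:
        idx = dev.sustainable_idx if dev.sustainable_idx is not None else 0
        total += dev.opps[idx].power
    return total


# Illustrative tables only; the numbers are invented.
cpu = Device("cpu", [Opp(500_000, 100), Opp(1_000_000, 300), Opp(2_000_000, 900)])
gpu = Device("gpu", [Opp(200_000, 150), Opp(600_000, 700)])

lowest_only = estimate_sustainable_power([cpu, gpu])  # no OPP marked yet

cpu.sustainable_idx = 1  # pretend the platform marked the middle CPU OPP
gpu.sustainable_idx = 0
with_marks = estimate_sustainable_power([cpu, gpu])
```

With these invented tables the lowest-OPP estimate is noticeably smaller than the marked-OPP estimate, which is the gap the message refers to.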
On 29-10-20, 09:56, Lukasz Luba wrote:
> There were discussions about Energy Model (EM), scale of values (mW or
> abstract scale) and relation to EAS and IPA. You can find quite long
> discussion below v2 [1] (there is also v3 sent after agreement [2]).
> We have in thermal DT binding: 'sustainable-power' expressed in mW,
> which is used by IPA, but it would not support bogoWatts.

Why so? (I am sorry, I can't dig into such long threads without knowing
which message I am looking for :( ). Let's assume that same property
can be used for bogoWatts: will that be sufficient for you? Or will you
still need this patch set?

> The sustainable power is used for estimation of internal coefficients
> (also for power budget), which I am trying to change to work with
> 'abstract scale' [3][4].
>
> This would allow to estimate sustainable power of the system based on
> CPUs, GPU opp-sustainable points, where we don't have
> 'sustainable-power' or devices using bogoWatts.

Then maybe we should have sustainable-power in those cases too instead
of adding a meaningless (IMHO) binding.

Honestly speaking, as Nishanth said, there is nothing like a
sustainable OPP in reality. Moreover, the DT needs to describe the
hardware as it is (and in some cases the behavior of the firmware).
And what you are trying to add here is none of them and so it should
not go in DT as such. There are too many factors which play a part
here, ambient temperature is one of the biggest ones, and the software
needs to find the sustainable OPP by itself based on the current
situation.

So I don't really see a good reason why such a property should be
added here.

Coming to properties like suspend-opp, it made sense for some of the
platforms as the last configured frequency of the CPU plays a part in
deciding the power consumed by the SoC even when the system is
suspended. And finding an optimal OPP (normally the lowest) there
would make sense and so was that property added.
On 10/30/20 8:29 AM, Viresh Kumar wrote:
> On 29-10-20, 09:56, Lukasz Luba wrote:
>> There were discussions about Energy Model (EM), scale of values (mW or
>> abstract scale) and relation to EAS and IPA. You can find quite long
>> discussion below v2 [1] (there is also v3 sent after agreement [2]).
>> We have in thermal DT binding: 'sustainable-power' expressed in mW,
>> which is used by IPA, but it would not support bogoWatts.
>
> Why so ? (I am sorry, can't dig into such long threads without knowing
> which message I am looking for :( ). Let's assume if that same property
> can be used for bogoWatts, will that be sufficient for you ? Or you
> will still need this patch set ?

I had a patch for that, but I know Rob's opinion on this one [1] (which
is below in that thread).

>
>> The sustainable power is used for estimation of internal coefficients
>> (also for power budget), which I am trying to change to work with
>> 'abstract scale' [3][4].
>>
>> This would allow to estimate sustainable power of the system based on
>> CPUs, GPU opp-sustainable points, where we don't have
>> 'sustainable-power' or devices using bogoWatts.
>
> Then maybe we should have sustainable-power in those cases too instead
> of adding a meaningless (IMHO) binding.

How about dropping the DT binding, but just adding this new field into
dev_pm_opp? There will be no DT parsing code, just the get/set
functions, which will be used in SCMI patch 4/4 and in IPA?
That would not require to change any DT bindings.

>
> Honestly speaking, as Nishanth said, there is nothing like a
> sustainable OPP in reality. Moreover, the DT needs to describe the
> hardware as it is (and in some cases the behavior of the firmware).
> And what you are trying to add here is none of them and so it should
> not go in DT as such. There are too many factors which play a part
> here, ambient temperature is one of the biggest ones, and the software
> needs to find the sustainable OPP by itself based on the current
> situation.
>
> So I don't really see a good reason why such a property should be
> added here.

I see. Just for your information SCMI supports 'Sustained Performance'
expressed in kHz.

>
> Coming to properties like suspend-opp, it made sense for some of the
> platforms as the last configured frequency of the CPU plays a part in
> deciding the power consumed by the SoC even when the system is
> suspended. And finding an optimal OPP (normally the lowest) there
> would make sense and so was that property added.
>

I also found that suspend-opp (83f8ca45afbf041e312909). I hope you
wouldn't mind if I add this new field into dev_pm_opp (no DT support,
just FW).

[1] https://lore.kernel.org/lkml/20201002114426.31277-4-lukasz.luba@arm.com/
On 30-10-20, 09:19, Lukasz Luba wrote:
> How about dropping the DT binding, but just adding this new field into
> dev_pm_opp? There will be no DT parsing code, just the get/set
> functions, which will be used in SCMI patch 4/4 and in IPA?
> That would not require to change any DT bindings.

> I see. Just for your information SCMI supports 'Sustained Performance'
> expressed in kHz.

Even that doesn't sound great (but then I don't have any background on
why that was added there). The problem is not about how we get this
data into the kernel (from DT or firmware), but why it is even
required. I really feel that software can find the sustainable OPP by
itself (which can keep changing).

About moving it into the OPP core, I am open to getting something
added there if it is really useful and if the OPP core is the best
suited place to keep such data. Though I am not sure of that for this
field right now.

Is it ever going to be used by anyone else apart from IPA? If not,
what about adding a helper in IPA to set sustainable-freq for a device?

So only SCMI based platforms will be able to use this stuff? That's
very limited, isn't it? I think we should still try to make it better
for everyone by making the software smarter. It has so much data, the
OPPs, the power it will consume (based on microvolt property?), the
heat we produce from that (from thermal framework), etc. Perhaps
building this information continuously at runtime based on when and
how we hit the trip points? So we know which is the right frequency
where we can refrain from hitting the trip points.

But maybe I am asking too much :(
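
The idea of building this information at runtime from trip-point hits can be sketched as below. This is only one hypothetical way to read the suggestion, not how IPA or any kernel code works: the `SustainableFreqLearner` class, its `min_samples` threshold, and the sample frequencies are all assumptions introduced for illustration.

```python
from collections import defaultdict
from typing import Optional


class SustainableFreqLearner:
    """Toy model: remember which frequencies were running when a trip point was hit."""

    def __init__(self, min_samples: int = 3) -> None:
        self.min_samples = min_samples   # observations needed before trusting a frequency
        self.samples = defaultdict(int)  # freq_khz -> number of observation windows
        self.trips = defaultdict(int)    # freq_khz -> windows in which a trip was hit

    def observe(self, freq_khz: int, hit_trip: bool) -> None:
        self.samples[freq_khz] += 1
        if hit_trip:
            self.trips[freq_khz] += 1

    def estimate(self) -> Optional[int]:
        """Highest well-observed frequency below every frequency that ever tripped."""
        tripping = [f for f, n in self.trips.items() if n > 0]
        ceiling = min(tripping) if tripping else float("inf")
        candidates = [
            f for f, n in self.samples.items()
            if n >= self.min_samples and f < ceiling
        ]
        return max(candidates, default=None)


learner = SustainableFreqLearner()
for _ in range(3):
    learner.observe(1_000_000, hit_trip=False)
    learner.observe(1_500_000, hit_trip=False)
learner.observe(2_000_000, hit_trip=True)

learned = learner.estimate()
```

A real implementation would also have to account for ambient temperature and workload, which this sketch ignores entirely.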
On 10/30/20 9:52 AM, Viresh Kumar wrote:
> On 30-10-20, 09:19, Lukasz Luba wrote:
>> How about dropping the DT binding, but just adding this new field into
>> dev_pm_opp? There will be no DT parsing code, just the get/set
>> functions, which will be used in SCMI patch 4/4 and in IPA?
>> That would not require to change any DT bindings.
>
>> I see. Just for your information SCMI supports 'Sustained Performance'
>> expressed in kHz.
>
> Even that doesn't sound great (but then I don't have any background of
> why that was added there). The problem is not about how do we get this
> data into the kernel (from DT or firmware), but why is it even
> required. I really feel that software can find the sustainable OPP by
> itself (which can keep changing).

IPA tries to do that, even dynamically when e.g. GPU is super busy
in 3D games (~2000W) or almost idle showing 2D home screen.
It tries to find highest 'sustainable' frequencies for the devices,
at those various workloads and temp. But it needs some coefficients to
start, which have big impact on the algorithm. It could slow down IPA a
lot, when those coefficients are calculated based on lowest OPPs.

>
> About moving it into the OPP core, I am open to getting something
> added there if it is really useful and if the OPP core is the best
> suited place to keep such data. Though I am not sure of that for this
> field right now.
>
> Is it ever going to be used by anyone else apart from IPA ? If not,
> what about adding a helper in IPA to set sustainable-freq for a device
> ?

My backup plan was to add a flag into EM em_perf_state, extend SCMI perf
exposing the 'sustained_freq_khz' to scmi-cpufreq, which would set that
field after registering EM. IPA depends on EM, so should be OK.

>
> So only SCMI based platforms will be able to use this stuff ? That's

I don't know who would also use it in future. I just presented you
current user of this, as you asked.

> very limited, isn't it ? I think we should still try to make it better
> for everyone by making the software smarter. It has so much data, the
> OPPs, the power it will consume (based on microvolt property?), the
> heat we produce from that (from thermal framework), etc. Perhaps
> building this information continuously at runtime based on when and
> how we hit the trip points ? So we know which is the right frequency
> where we can refrain from hitting the trip points.

IPA works in this way.

>
> But maybe I am asking too much :(
>

When you asked for user of this, I gave you instantly. This one is
more difficult. I am still not there with IPA tests in LISA. I have some
out-of-tree kernel driver for testing, which also need polishing before
can be used with LISA. Then proper workloads with results processing.
EM for devfreq cooling devices. Then decent 'hot' board running
preferably mainline kernel.
What you requested is on my list, but it needs more work, which
won't be ready overnight.

Regards,
Lukasz
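
The backup plan above (a flag on the EM performance state, set from the firmware's sustained frequency after the EM is registered) might look roughly like the user-space model below. The `PerfState` class only mimics the shape of an EM performance state; the `sustained` flag and the `mark_sustained` helper are hypothetical names for this sketch, and the frequencies are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PerfState:
    frequency_khz: int
    power: int               # mW or abstract scale
    sustained: bool = False  # hypothetical flag, not an existing kernel field


def mark_sustained(table: List[PerfState], sustained_freq_khz: int) -> Optional[PerfState]:
    """Flag the highest state whose frequency does not exceed the firmware value."""
    chosen = None
    for state in table:
        state.sustained = False  # clear any earlier mark
        if state.frequency_khz <= sustained_freq_khz:
            if chosen is None or state.frequency_khz > chosen.frequency_khz:
                chosen = state
    if chosen is not None:
        chosen.sustained = True
    return chosen


# Invented table; 1_200_000 stands in for a firmware-reported sustained frequency.
table = [PerfState(500_000, 100), PerfState(1_000_000, 300), PerfState(2_000_000, 900)]
chosen = mark_sustained(table, 1_200_000)
```

Rounding down to the nearest table entry is a design choice of this sketch; the thread does not say how a firmware value that falls between two states would be handled.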
On 30-10-20, 10:56, Lukasz Luba wrote:
> IPA tries to do that, even dynamically when e.g. GPU is super busy
> in 3D games (~2000W) or almost idle showing 2D home screen.
> It tries to find highest 'sustainable' frequencies for the devices,
> at those various workloads and temp. But it needs some coefficients to
> start, which have big impact on the algorithm. It could slow down IPA a
> lot, when those coefficients are calculated based on lowest OPPs.

I see. So when you say it slows down IPA, what does that really mean?
Is IPA not performing that accurately during the initial period after
booting (any time estimate here)? Does it work fine after a while, or
will it suffer forever?

And maybe you shouldn't start with the lowest OPPs while you calculate
these coefficients dynamically? Maybe start from the middle, as the
sustainable OPP would be somewhere around there or maybe a bit higher.
But yeah, I don't have any idea about how those coefficients are
calculated, so this idea can simply be ignored as well :)

> My backup plan was to add a flag into EM em_perf_state, extend SCMI perf
> exposing the 'sustained_freq_khz' to scmi-cpufreq, which would set that
> field after registering EM. IPA depends on EM, so should be OK.

I think at this point (considering the limited number of users (only
IPA) and providers (only SCMI)), it would be better to do it that way
instead of updating the OPP framework. Of course we can revisit that
if we ever feel that we need a better placeholder for it.

> > So only SCMI based platforms will be able to use this stuff ? That's
> > very limited, isn't it ? I think we should still try to make it better
> > for everyone by making the software smarter. It has so much data, the
> > OPPs, the power it will consume (based on microvolt property?), the
> > heat we produce from that (from thermal framework), etc. Perhaps
> > building this information continuously at runtime based on when and
> > how we hit the trip points ? So we know which is the right frequency
> > where we can refrain from hitting the trip points.
>
> IPA works in this way.

Nice, that's what I thought as well but then got a bit confused with
your patchset.

> > But maybe I am asking too much :(
> >
>
> When you asked for user of this, I gave you instantly. This one is
> more difficult. I am still not there with IPA tests in LISA. I have some
> out-of-tree kernel driver for testing, which also need polishing before
> can be used with LISA. Then proper workloads with results processing.
> EM for devfreq cooling devices. Then decent 'hot' board running
> preferably mainline kernel.
> What you requested is on my list, but it needs more work, which
> won't be ready overnight.

I can understand what you are trying to do here. And this surely
requires a lot of effort.
On 10/30/20 11:17 AM, Viresh Kumar wrote:
> On 30-10-20, 10:56, Lukasz Luba wrote:
>> IPA tries to do that, even dynamically when e.g. GPU is super busy
>> in 3D games (~2000W) or almost idle showing 2D home screen.
>> It tries to find highest 'sustainable' frequencies for the devices,
>> at those various workloads and temp. But it needs some coefficients to
>> start, which have big impact on the algorithm. It could slow down IPA a
>> lot, when those coefficients are calculated based on lowest OPPs.
>
> I see. So when you say it slows down IPA, what does that really mean ?
> IPA isn't performing that accurately during the initial period of
> booting (any time estimate here) ? Does it work fine after a time
> duration? Or will it suffer for ever ?

The coefficients, which determine the temp rising slope, would stay
'forever', until someone changes them via sysfs (the: k_po, k_pu, k_i,
sustainable_power).

>
> And maybe you shouldn't start with the lowest OPPs while you calculate
> these coefficients dynamically ? Maybe start from the middle ? As the
> sustainable OPP would be something there only or maybe a bit higher
> only. But yeah, I don't have any idea about how those coefficients are
> calculated so this idea can be simply ignored as well :)
>
>> My backup plan was to add a flag into EM em_perf_state, extend SCMI perf
>> exposing the 'sustained_freq_khz' to scmi-cpufreq, which would set that
>> field after registering EM. IPA depends on EM, so should be OK.
>
> I think at this point (considering the limited number of users (only
> IPA) and providers (only SCMI)), it would be better that way only
> instead of updating the OPP framework. Of course we can revisit that
> if we ever feel that we need a better placeholder for it.

OK, sounds good.

>
>>> So only SCMI based platforms will be able to use this stuff ? That's
>>> very limited, isn't it ? I think we should still try to make it better
>>> for everyone by making the software smarter. It has so much data, the
>>> OPPs, the power it will consume (based on microvolt property?), the
>>> heat we produce from that (from thermal framework), etc. Perhaps
>>> building this information continuously at runtime based on when and
>>> how we hit the trip points ? So we know which is the right frequency
>>> where we can refrain from hitting the trip points.
>>
>> IPA works in this way.
>
> Nice, that's what I thought as well but then got a bit confused with
> your patchset.
>
>>> But maybe I am asking too much :(
>>>
>>
>> When you asked for user of this, I gave you instantly. This one is
>> more difficult. I am still not there with IPA tests in LISA. I have some
>> out-of-tree kernel driver for testing, which also need polishing before
>> can be used with LISA. Then proper workloads with results processing.
>> EM for devfreq cooling devices. Then decent 'hot' board running
>> preferably mainline kernel.
>> What you requested is on my list, but it needs more work, which
>> won't be ready overnight.
>
> I can understand what you are trying to do here. And this surely
> requires a lot of effort.
>

Thank you Viresh for your opinion. I will take the EM approach, please
ignore this patch set.

Regards,
Lukasz
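
To make the dependence of k_po, k_pu and k_i on the sustainable power estimate concrete, here is a simplified floating-point sketch. The formulas are an assumption, loosely modeled on a proportional scheme in which the gains scale with sustainable power over the temperature window; the kernel's actual fixed-point arithmetic and defaults may differ. The temperatures and power values are invented.

```python
from typing import Dict


def estimate_pid_constants(sustainable_power: float,
                           switch_on_temp: int,
                           control_temp: int) -> Dict[str, float]:
    """Derive default PID gains from a sustainable power estimate (assumed formulas)."""
    threshold = control_temp - switch_on_temp  # temperature window, millidegrees Celsius
    if threshold <= 0:
        raise ValueError("control_temp must be above switch_on_temp")
    return {
        "k_po": sustainable_power / threshold,      # gain used on overshoot
        "k_pu": 2 * sustainable_power / threshold,  # gain used on undershoot
        "k_i": 10 / 1000,                           # small fixed integral gain
    }


# Same thermal zone, two different sustainable power estimates (invented values).
from_lowest = estimate_pid_constants(250, 70_000, 85_000)
from_marked = estimate_pid_constants(450, 70_000, 85_000)
```

Under these assumed formulas the gains scale linearly with the sustainable power estimate, so an estimate taken from the lowest OPPs yields proportionally smaller gains, and they stay that way until overridden via sysfs.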