Message ID | 20200526151619.8779-1-benjamin.gaignard@st.com (mailing list archive) |
---|---|
Series | Introduce cpufreq minimum load QoS |
Hi Benjamin,

On 26/05/20 16:16, Benjamin Gaignard wrote:
> A first round [1] of discussions and suggestions has already taken place on
> this series, but without finding a solution to the problem. I am resending
> it to make progress on this topic.
>

Apologies for sleeping on that previous thread.

So what had been suggested over there was to use uclamp to boost the
frequency of the handling thread; however, if you use threaded IRQs you
get RT threads, which already get the max frequency by default (at least
with schedutil).

Does that not work for you, and if so, why?

> When streaming from the sensor starts, the CPU load can remain very low
> because almost all of the capture pipeline is done in hardware (i.e. without
> using the CPU), which leads the cpufreq governor to believe that it can use
> lower frequencies. If the governor picks too low a frequency, that
> becomes a problem when we need to acknowledge the interrupt during the
> blanking time.
> The delay to ack the interrupt and perform all the other actions before
> the next frame is very short and doesn't allow the cpufreq governor to
> provide the required burst of power. That led to half of the frames being
> dropped.
>
> To avoid this problem, the DCMI driver informs the cpufreq governors by
> adding a cpufreq minimum load QoS request.
>
> Benjamin
>
> [1] https://lkml.org/lkml/2020/4/24/360
>
> Benjamin Gaignard (3):
>   PM: QoS: Introduce cpufreq minimum load QoS
>   cpufreq: governor: Use minimum load QoS
>   media: stm32-dcmi: Inform cpufreq governors about cpu load needs
>
>  drivers/cpufreq/cpufreq_governor.c        |   5 +
>  drivers/media/platform/stm32/stm32-dcmi.c |   8 ++
>  include/linux/pm_qos.h                    |  12 ++
>  kernel/power/qos.c                        | 213 ++++++++++++++++++++++++++++++
>  4 files changed, 238 insertions(+)
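To make the failure mode from the cover letter concrete, here is a small userspace C model (not kernel code) of a load-proportional governor in the style of ondemand. The function names, the 95% threshold, and the frequency values are illustrative assumptions, and the minimum-load floor only approximates the idea of a "minimum load QoS request"; the actual patches are not shown in this thread.

```c
#include <assert.h>

/* Hypothetical tunable: above this load, jump straight to the maximum. */
#define MODEL_UP_THRESHOLD 95

/*
 * Pick a frequency (kHz) from a load percentage, scaling linearly
 * between min_khz and max_khz when the load is at or below the threshold.
 */
unsigned int model_pick_freq(unsigned int load_pct,
                             unsigned int min_khz,
                             unsigned int max_khz)
{
    assert(load_pct <= 100 && min_khz <= max_khz);

    if (load_pct > MODEL_UP_THRESHOLD)
        return max_khz;

    return min_khz + load_pct * (max_khz - min_khz) / 100;
}

/*
 * Same decision, but the measured load is first raised to a requested
 * minimum, so a CPU that looks idle is not clocked down.
 */
unsigned int model_pick_freq_with_floor(unsigned int measured_load_pct,
                                        unsigned int min_load_pct,
                                        unsigned int min_khz,
                                        unsigned int max_khz)
{
    unsigned int load = measured_load_pct > min_load_pct ?
                        measured_load_pct : min_load_pct;

    return model_pick_freq(load, min_khz, max_khz);
}
```

With a measured load of 5% and an assumed range of 150000 to 650000 kHz, this model picks 175000 kHz; raising the load floor to 60% yields 450000 kHz instead, which is the effect the cover letter is after while the capture hardware does the work.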
On 5/27/20 12:09 PM, Valentin Schneider wrote:
> So what had been suggested over there was to use uclamp to boost the
> frequency of the handling thread; however, if you use threaded IRQs you
> get RT threads, which already get the max frequency by default (at least
> with schedutil).
>
> Does that not work for you, and if so, why?

That doesn't work because almost everything is done by the hardware blocks
without loading the CPU, so the thread isn't running. I have run the tests
with the schedutil and ondemand governors (ondemand is the one I'm
targeting). I have no issues when using the performance governor because it
always keeps the highest frequency.
On 27/05/20 12:17, Benjamin GAIGNARD wrote:
> On 5/27/20 12:09 PM, Valentin Schneider wrote:
>> Does that not work for you, and if so, why?
>
> That doesn't work because almost everything is done by the hardware blocks
> without loading the CPU, so the thread isn't running.

I'm not sure I follow; the frequency of the CPU doesn't matter while
your hardware blocks are spinning, right? AIUI what matters is running
your interrupt handler / action at max freq, which you get if you use
threaded IRQs and schedutil.

I think it would help if you could clarify which tasks / parts of your
pipeline you need running at high frequencies. The point is that setting
a QoS request affects all tasks, whereas we could be smarter and only
boost the required tasks.
On Wed, 27 May 2020 at 13:17, Benjamin GAIGNARD <benjamin.gaignard@st.com> wrote:
> On 5/27/20 12:09 PM, Valentin Schneider wrote:
> > Does that not work for you, and if so, why?
> That doesn't work because almost everything is done by the hardware blocks
> without loading the CPU, so the thread isn't running. I have run the tests
> with the schedutil and ondemand governors (ondemand is the one I'm
> targeting). I have no issues when using the performance governor because it
> always keeps the highest frequency.

IMHO, the only way to ensure a min frequency for anything other than a
thread is to use freq_qos_add_request(), just like the cpufreq cooling
device does but for the opposite QoS. This can be applied to only the
frequency domain of the CPU which handles the interrupt.

Have you also checked the wakeup latency of your idle state?
On 5/27/20 2:22 PM, Vincent Guittot wrote:
> IMHO, the only way to ensure a min frequency for anything other than a
> thread is to use freq_qos_add_request(), just like the cpufreq cooling
> device does but for the opposite QoS. This can be applied to only the
> frequency domain of the CPU which handles the interrupt.

I will give this idea a try.
Thanks.

> Have you also checked the wakeup latency of your idle state?

It can only go into WFI, so the latency should be minimal.
On 5/27/20 2:14 PM, Valentin Schneider wrote:
> I'm not sure I follow; the frequency of the CPU doesn't matter while
> your hardware blocks are spinning, right? AIUI what matters is running
> your interrupt handler / action at max freq, which you get if you use
> threaded IRQs and schedutil.

Yes, but not limited to schedutil.
Given the latency needed to change frequencies, I think it could already be
too late to change the CPU frequency when handling the threaded interrupt.

> I think it would help if you could clarify which tasks / parts of your
> pipeline you need running at high frequencies. The point is that setting
> a QoS request affects all tasks, whereas we could be smarter and only
> boost the required tasks.

What makes us drop frames is that the threaded IRQ is scheduled too late.
The non-threaded part of the interrupt handler, where we clear the interrupt
flags, runs fine, but the threaded part does not.
On 5/27/20 2:48 PM, Benjamin GAIGNARD wrote:
> On 5/27/20 2:22 PM, Vincent Guittot wrote:
>> IMHO, the only way to ensure a min frequency for anything other than a
>> thread is to use freq_qos_add_request(), just like the cpufreq cooling
>> device does but for the opposite QoS. This can be applied to only the
>> frequency domain of the CPU which handles the interrupt.
> I will give this idea a try.
> Thanks.

Adding freq_qos_add_request(FREQ_QOS_MIN) when starting to stream frames
solves my problem. I remove the request at the end of streaming to restore
the default value.

Benjamin
On 27/05/20 14:11, Benjamin GAIGNARD wrote:
> Yes, but not limited to schedutil.
> Given the latency needed to change frequencies, I think it could already be
> too late to change the CPU frequency when handling the threaded interrupt.

Right, on my Juno the transition latency (i.e. worst case) is about 1.2ms;
I can see that eating into your time budget, depending on the framerate
you're going for.

Vincent's got a point, if you can limit that max-freq-hold to a single
frequency domain, that would probably be a tad better.

Thanks for persisting through my questioning :-)
On Wed, May 27, 2020 at 4:54 PM Benjamin GAIGNARD <benjamin.gaignard@st.com> wrote:
> Adding freq_qos_add_request(FREQ_QOS_MIN) when starting to stream frames
> solves my problem. I remove the request at the end of streaming to restore
> the default value.

You may as well add the request once at init time, with the request
value set to PM_QOS_MIN_FREQUENCY_DEFAULT_VALUE initially, and update it
as needed going forward.
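The add-once-then-update pattern suggested here can be sketched with a userspace C model (not kernel code). All `model_*` names, the slot count, and the frequency values are hypothetical. The model only assumes that minimum-frequency requests on one frequency domain aggregate to the highest requested floor, and that a default value of 0 means "no constraint".

```c
#include <assert.h>

#define MODEL_SLOTS 4
#define MODEL_MIN_DEFAULT_KHZ 0 /* assumed "no constraint" value */

/* Hypothetical stand-in for one frequency domain's min-frequency requests. */
struct model_domain {
    int used[MODEL_SLOTS];
    int min_khz[MODEL_SLOTS];
};

/* Register a request; returns a slot handle, or -1 if the table is full. */
int model_add_min(struct model_domain *d, int khz)
{
    for (int i = 0; i < MODEL_SLOTS; i++) {
        if (!d->used[i]) {
            d->used[i] = 1;
            d->min_khz[i] = khz;
            return i;
        }
    }
    return -1;
}

/* Change the value of an existing request without re-registering it. */
void model_update_min(struct model_domain *d, int slot, int khz)
{
    assert(slot >= 0 && slot < MODEL_SLOTS && d->used[slot]);
    d->min_khz[slot] = khz;
}

/* Effective floor: the highest value among all active requests. */
int model_effective_min(const struct model_domain *d)
{
    int floor_khz = MODEL_MIN_DEFAULT_KHZ;

    for (int i = 0; i < MODEL_SLOTS; i++)
        if (d->used[i] && d->min_khz[i] > floor_khz)
            floor_khz = d->min_khz[i];
    return floor_khz;
}

/* Driver-side pattern: add once at init with the default value... */
int model_driver_init(struct model_domain *d)
{
    return model_add_min(d, MODEL_MIN_DEFAULT_KHZ);
}

/* ...raise the floor while streaming... */
void model_start_streaming(struct model_domain *d, int slot, int khz)
{
    model_update_min(d, slot, khz);
}

/* ...and drop back to the default when streaming stops. */
void model_stop_streaming(struct model_domain *d, int slot)
{
    model_update_min(d, slot, MODEL_MIN_DEFAULT_KHZ);
}
```

In this model the request stays registered for the driver's whole lifetime, so the start and stop paths cannot fail on registration and only change a value. Another requester's floor on the same domain is unaffected when streaming stops.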