Message ID | 20230818020616.4748-1-chun-jen.tseng@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | cpufreq: mediatek: change transition delay for MT8186 | expand |
Hi Mark, I am not entirely clear by few things in the commit log. On 18-08-23, 10:06, Mark Tseng wrote: > For MT8186, it has policy0 and policy6 by different governor thread,so > it may be call cpufreq->set_target_index() by different core. Why does this matter ? > In general > case, it must check BCPU, LCPU and CCI together then take about 10ms. BCPU is Big CPU ? LCPU is Little CPU ? So are you saying that changing the frequency takes roughly 10 ms for MT8186 ? > Atfer 44295af5019f this patch, it may call cpufreq_out_of_sync() by > cpufreq_verify_current_freq() because current frequency is bigger > than clk_get_rate() ouver 1Mh. By the same time, it may call s/ouver/over/ s/1Mh/1 MHz/ > cpufreq->set_target_index() again. Where was it called for the first time ? > So, the CCI freq may be too lower for > BCPU cause BCPU kernel panic. I am not sure how a low frequency causes kernel panic here. > So, it should change the default transition delay 1ms to 10ms. It can > promise the next freq setting then governor trigger new freq change. There are few typos as well here, please fix them. > Fixes: 44295af5019f ("cpufreq: use correct unit when verify cur freq") I think you should drop this. The issue at hand may be visible now after 44295af5019f is applied, but it certainly didn't cause it.
On Mon, 2023-08-28 at 12:09 +0530, Viresh Kumar wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > Hi Mark, > > I am not entirely clear by few things in the commit log. > > On 18-08-23, 10:06, Mark Tseng wrote: > > For MT8186, it has policy0 and policy6 by different governor > thread,so > > it may be call cpufreq->set_target_index() by different core. > > Why does this matter ? For MT8186 set freq must Big CPU -> Little CPU -> CCI like this order and it takes 10 ms. But in cpufreq & devfreq flow , when Big CPU or Little CPU change freq , it will call CCI setting by different policy. It will become Big CPU -> CCI setting or Little CPU -> CCI setting. Howevery, It will cause CCI setting to wrong value when per 1ms call governor and change freq. > > In general > > case, it must check BCPU, LCPU and CCI together then take about > 10ms. > > BCPU is Big CPU ? LCPU is Little CPU ? Yes, it is. > So are you saying that changing the frequency takes roughly 10 ms for > MT8186 ? > > > Atfer 44295af5019f this patch, it may call cpufreq_out_of_sync() by > > cpufreq_verify_current_freq() because current frequency is bigger > > than clk_get_rate() over 1 Mhz. By the same time, it may call > > s/ouver/over/ > s/1Mh/1 MHz/ > > > > cpufreq->set_target_index() again. > > Where was it called for the first time ? this patch (44295af5019f) , fixed cpufreq_out_of_sync() condition from 1000 Mhz to 1 Mhz. So, when cpufreq_verify_current_freq() call clk_get_rate() over 1 Mhz, it must to do freq sync by cpufreq_out_of_sync(). In MT8186, it offen over 1 Mhz when call clk_get_rate(), so I modify transition delay from 1 ms to 10 ms for make sure freq setting done. > > So, the CCI freq may be too lower for > > BCPU cause BCPU kernel panic. > > I am not sure how a low frequency causes kernel panic here. In MT8186, if CCI freq is lower than Bit CPU freq 70%, it will causes kernel panic on Big CPU. > > So, it should change the default transition delay 1ms to 10ms. It > can > > promise the next freq setting then governor trigger new freq > change. > > There are few typos as well here, please fix them. > > > Fixes: 44295af5019f ("cpufreq: use correct unit when verify cur > freq") > > I think you should drop this. The issue at hand may be visible now > after 44295af5019f is applied, but it certainly didn't cause it. > > -- > viresh >
On 29-08-23, 06:57, Chun-Jen Tseng (曾俊仁) wrote: > For MT8186 set freq must Big CPU -> Little CPU -> CCI like this order > and it > takes 10 ms. > > But in cpufreq & devfreq flow , when Big CPU or Little CPU change freq > , it will call > CCI setting by different policy. It will become Big CPU -> CCI setting > or Little CPU -> > CCI setting. Howevery, It will cause CCI setting to wrong value when > per 1ms call governor > and change freq. > this patch (44295af5019f) , fixed cpufreq_out_of_sync() condition from > 1000 Mhz to 1 Mhz. > So, when cpufreq_verify_current_freq() call clk_get_rate() over 1 Mhz, > it must to do freq sync > by cpufreq_out_of_sync(). In MT8186, it offen over 1 Mhz when call > clk_get_rate(), so I modify > transition delay from 1 ms to 10 ms for make sure freq setting done. > In MT8186, if CCI freq is lower than Bit CPU freq 70%, it will causes > kernel panic > on Big CPU. Okay, I get it now. First of all, the kernel shouldn't crash because of a simple delay anywhere, like the latency delay here. If that is happening it means something else is wrong somewhere else. From what I understand now, your CCI have a constraint compared to the frequency of the BIG CPU (and little CPU too). You need something else that guarantees that the CCI is always configured in the right range. Perhaps a devfreq governor or something else that takes care of this. It shall evaluate the next state based on both big and little CPU frequencies and not only the caller via which we reached CCI. Please see how others have solved this. I am very sure this is working by chance here and will break with some other delay somewhere else. Fix the real cause of it.
On Tue, 2023-08-29 at 12:40 +0530, Viresh Kumar wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > > On 29-08-23, 06:57, Chun-Jen Tseng (曾俊仁) wrote: > > For MT8186 set freq must Big CPU -> Little CPU -> CCI like this > order > > and it > > takes 10 ms. > > > > But in cpufreq & devfreq flow , when Big CPU or Little CPU change > freq > > , it will call > > CCI setting by different policy. It will become Big CPU -> CCI > setting > > or Little CPU -> > > CCI setting. Howevery, It will cause CCI setting to wrong value > when > > per 1ms call governor > > and change freq. > > > this patch (44295af5019f) , fixed cpufreq_out_of_sync() condition > from > > 1000 Mhz to 1 Mhz. > > So, when cpufreq_verify_current_freq() call clk_get_rate() over 1 > Mhz, > > it must to do freq sync > > by cpufreq_out_of_sync(). In MT8186, it offen over 1 Mhz when call > > clk_get_rate(), so I modify > > transition delay from 1 ms to 10 ms for make sure freq setting > done. > > > In MT8186, if CCI freq is lower than Big CPU freq 70%, it will > causes > > kernel panic > > on Big CPU. > > Okay, I get it now. First of all, the kernel shouldn't crash because > of a simple delay anywhere, like the latency delay here. If that is > happening it means something else is wrong somewhere else. > > From what I understand now, your CCI have a constraint compared to > the > frequency of the BIG CPU (and little CPU too). You need something > else > that guarantees that the CCI is always configured in the right range. > Perhaps a devfreq governor or something else that takes care of this. > It shall evaluate the next state based on both big and little CPU > frequencies and not only the caller via which we reached CCI. Please > see how others have solved this. > > I am very sure this is working by chance here and will break with > some > other delay somewhere else. Fix the real cause of it. > Actually, the root cause is the CPU freq setting finish time. If MT8186 needs 10 ms for two clusters findish setting CPU clock done, I should set transition delay 10 ms which avoid call clk_get_rate() get previous clock value. If I get previous CPU clock and it over 1 Mhz, the cpufreq_out_of_sync() will set CPU freq again but it wrong CPU freq. Howervery, transition delay seting is by individual SoC , it should not force 1 ms for all SoC. So, I wish I can do this patch here. > -- > viresh
On 29-08-23, 08:25, Chun-Jen Tseng (曾俊仁) wrote: > Actually, the root cause is the CPU freq setting finish time. If MT8186 > needs 10 ms for two clusters findish setting CPU clock done, I should > set transition delay 10 ms which avoid call clk_get_rate() get previous > clock value. If I get previous CPU clock and it over 1 Mhz, the > cpufreq_out_of_sync() will set CPU freq again but it wrong CPU freq. Even if another attempt is made to update the frequency, it shouldn't result in crashing the kernel. If it crashes, then there is something wrong here. > Howervery, transition delay seting is by individual SoC , it should not > force 1 ms for all SoC. So, I wish I can do this patch here. Its fine if you want to make it 1 second too :), the only thing is that you should do it for the right reason and I don't think we know it yet. Why exactly does the kernel crash here ? Any idea ?
On 29-08-23, 14:01, Viresh Kumar wrote:
> Why exactly does the kernel crash here ? Any idea ?
Also note that cpufreq core has enough locking in place to make sure
two ->target_index() function calls don't run in parallel for the same
policy.
What may be happening in your case is that you are configuring a
common entity (CCI) from both the policies and there is no locking in
place to take care of the races.
diff --git a/drivers/cpufreq/mediatek-cpufreq.c b/drivers/cpufreq/mediatek-cpufreq.c index a0a61919bc4c..5633a5357e8f 100644 --- a/drivers/cpufreq/mediatek-cpufreq.c +++ b/drivers/cpufreq/mediatek-cpufreq.c @@ -23,6 +23,7 @@ struct mtk_cpufreq_platform_data { int sram_min_volt; int sram_max_volt; bool ccifreq_supported; + unsigned int transition_delay_us; }; /* @@ -595,6 +596,7 @@ static int mtk_cpufreq_init(struct cpufreq_policy *policy) policy->freq_table = freq_table; policy->driver_data = info; policy->clk = info->cpu_clk; + policy->transition_delay_us = info->soc_data->transition_delay_us; return 0; } @@ -689,6 +691,7 @@ static const struct mtk_cpufreq_platform_data mt2701_platform_data = { .sram_min_volt = 0, .sram_max_volt = 1150000, .ccifreq_supported = false, + .transition_delay_us = 1000, }; static const struct mtk_cpufreq_platform_data mt7622_platform_data = { @@ -698,6 +701,7 @@ static const struct mtk_cpufreq_platform_data mt7622_platform_data = { .sram_min_volt = 0, .sram_max_volt = 1350000, .ccifreq_supported = false, + .transition_delay_us = 1000, }; static const struct mtk_cpufreq_platform_data mt7623_platform_data = { @@ -705,6 +709,7 @@ static const struct mtk_cpufreq_platform_data mt7623_platform_data = { .max_volt_shift = 200000, .proc_max_volt = 1300000, .ccifreq_supported = false, + .transition_delay_us = 1000, }; static const struct mtk_cpufreq_platform_data mt8183_platform_data = { @@ -714,6 +719,7 @@ static const struct mtk_cpufreq_platform_data mt8183_platform_data = { .sram_min_volt = 0, .sram_max_volt = 1150000, .ccifreq_supported = true, + .transition_delay_us = 1000, }; static const struct mtk_cpufreq_platform_data mt8186_platform_data = { @@ -723,6 +729,7 @@ static const struct mtk_cpufreq_platform_data mt8186_platform_data = { .sram_min_volt = 850000, .sram_max_volt = 1118750, .ccifreq_supported = true, + .transition_delay_us = 10000, }; static const struct mtk_cpufreq_platform_data mt8516_platform_data = { @@ -732,6 +739,7 @@ static const struct mtk_cpufreq_platform_data mt8516_platform_data = { .sram_min_volt = 0, .sram_max_volt = 1310000, .ccifreq_supported = false, + .transition_delay_us = 1000, }; /* List of machines supported by this driver */
For MT8186, it has policy0 and policy6 by different governor thread,so it may be call cpufreq->set_target_index() by different core. In general case, it must check BCPU, LCPU and CCI together then take about 10ms. Atfer 44295af5019f this patch, it may call cpufreq_out_of_sync() by cpufreq_verify_current_freq() because current frequency is bigger than clk_get_rate() ouver 1Mh. By the same time, it may call cpufreq->set_target_index() again. So, the CCI freq may be too lower for BCPU cause BCPU kernel panic. So, it should change the default transition delay 1ms to 10ms. It can promise the next freq setting then governor trigger new freq change. Fixes: 44295af5019f ("cpufreq: use correct unit when verify cur freq") Signed-off-by: Mark Tseng <chun-jen.tseng@mediatek.com> --- drivers/cpufreq/mediatek-cpufreq.c | 8 ++++++++ 1 file changed, 8 insertions(+)