mbox series

[V3,0/4] firmware: arm_scmi: Misc Fixes

Message ID 20241007060642.1978049-1-quic_sibis@quicinc.com (mailing list archive)
Headers show
Series firmware: arm_scmi: Misc Fixes | expand

Message

Sibi Sankar Oct. 7, 2024, 6:06 a.m. UTC
The series addresses the kernel warnings reported by Johan at [1] and are
are required to X1E cpufreq device tree changes [2] to land.

[1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
[2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/

The following warnings remain unadressed:
arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16

Duplicate levels:
arm-scmi arm-scmi.0.auto: Level 2976000 Power 218062 Latency 30us Ifreq 2976000 Index 10
arm-scmi arm-scmi.0.auto: Level 3206400 Power 264356 Latency 30us Ifreq 3206400 Index 11
arm-scmi arm-scmi.0.auto: Level 3417600 Power 314966 Latency 30us Ifreq 3417600 Index 12
arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
arm-scmi arm-scmi.0.auto: Level 4012800 Power 528848 Latency 30us Ifreq 4012800 Index 15

^^ exist because SCP reports duplicate values for the highest sustainable
freq for perf domains 1 and 2. These are the only freqs that appear as
duplicates and will be fixed with a firmware update. FWIW the warnings
that we are addressing in this series will also get fixed by a firmware
update but they still have to land for devices already out in the wild.

V2:
* Include the fix for do_xfer timeout
* Include the fix debugfs node creation failure
* Include Cristian's fix for skipping opp duplication

V1:
* add missing MSG_SUPPORTS_FASTCHANNEL definition.

Base branch: next-20241004

Cristian Marussi (1):
  firmware: arm_scmi: Skip adding bad duplicates

Sibi Sankar (3):
  firmware: arm_scmi: Ensure that the message-id supports fastchannel
  pmdomain: core: Fix debugfs node creation failure
  mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag

 drivers/firmware/arm_scmi/driver.c    |  9 ++++++
 drivers/firmware/arm_scmi/perf.c      | 37 +++++++++++++++++++------
 drivers/firmware/arm_scmi/protocols.h |  2 ++
 drivers/mailbox/qcom-cpucp-mbox.c     |  2 +-
 drivers/pmdomain/core.c               | 40 +++++++++++++++++----------
 include/linux/pm_domain.h             |  1 +
 6 files changed, 66 insertions(+), 25 deletions(-)

Comments

Ulf Hansson Oct. 9, 2024, 11:14 a.m. UTC | #1
On Mon, 7 Oct 2024 at 08:07, Sibi Sankar <quic_sibis@quicinc.com> wrote:
>
> The series addresses the kernel warnings reported by Johan at [1] and are
> are required to X1E cpufreq device tree changes [2] to land.
>
> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
>
> The following warnings remain unadressed:
> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>
> Duplicate levels:
> arm-scmi arm-scmi.0.auto: Level 2976000 Power 218062 Latency 30us Ifreq 2976000 Index 10
> arm-scmi arm-scmi.0.auto: Level 3206400 Power 264356 Latency 30us Ifreq 3206400 Index 11
> arm-scmi arm-scmi.0.auto: Level 3417600 Power 314966 Latency 30us Ifreq 3417600 Index 12
> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> arm-scmi arm-scmi.0.auto: Level 4012800 Power 528848 Latency 30us Ifreq 4012800 Index 15
>
> ^^ exist because SCP reports duplicate values for the highest sustainable
> freq for perf domains 1 and 2. These are the only freqs that appear as
> duplicates and will be fixed with a firmware update. FWIW the warnings
> that we are addressing in this series will also get fixed by a firmware
> update but they still have to land for devices already out in the wild.
>
> V2:
> * Include the fix for do_xfer timeout
> * Include the fix debugfs node creation failure
> * Include Cristian's fix for skipping opp duplication
>
> V1:
> * add missing MSG_SUPPORTS_FASTCHANNEL definition.
>
> Base branch: next-20241004
>
> Cristian Marussi (1):
>   firmware: arm_scmi: Skip adding bad duplicates
>
> Sibi Sankar (3):
>   firmware: arm_scmi: Ensure that the message-id supports fastchannel
>   pmdomain: core: Fix debugfs node creation failure
>   mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag
>

Is there a preferred way to merge this?

I can certainly pick the pmdomain patch, as it looks independent for
the other. Or let me know if you prefer that I take them all?

Kind regards
Uffe
Johan Hovold Oct. 10, 2024, 3:02 p.m. UTC | #2
On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> The series addresses the kernel warnings reported by Johan at [1] and are
> are required to X1E cpufreq device tree changes [2] to land.
> 
> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> 
> The following warnings remain unadressed:
> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16

Are there any plans for how to address these?

Johan
Sibi Sankar Oct. 23, 2024, 7:46 a.m. UTC | #3
On 10/10/24 20:32, Johan Hovold wrote:
> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
>> The series addresses the kernel warnings reported by Johan at [1] and are
>> are required to X1E cpufreq device tree changes [2] to land.
>>
>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
>>
>> The following warnings remain unadressed:
>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> 
> Are there any plans for how to address these?

Hey Johan,
Sorry missed replying to this. The error implies that duplicate
opps are reported by the SCP firmware and appear once during probe.
This particular error can be fixed only by a firmware update and you
should be able to test it out soon on the CRD first.

"FWIW the warnings that we are addressing in this series will also get
fixed by a firmware update but they still have to land for devices
already out in the wild."


> 
> Johan
Johan Hovold Oct. 23, 2024, 4:26 p.m. UTC | #4
On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> On 10/10/24 20:32, Johan Hovold wrote:
> > On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> >> The series addresses the kernel warnings reported by Johan at [1] and are
> >> are required to X1E cpufreq device tree changes [2] to land.
> >>
> >> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> >> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> >>
> >> The following warnings remain unadressed:
> >> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > 
> > Are there any plans for how to address these?

> Sorry missed replying to this. The error implies that duplicate
> opps are reported by the SCP firmware and appear once during probe.

I only see it at boot, but it shows up four times here with the CRD:

[    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
[    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
[    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
[    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16

> This particular error can be fixed only by a firmware update and you
> should be able to test it out soon on the CRD first.

Can you explain why this can only be fixed by a firmware update? Why
can't we suppress these warnings as well, like we did for the other
warnings related to the duplicate entries?

IIUC the firmware is not really broken, but rather describes a feature
that Linux does not (yet) support, right?

Johan
Sibi Sankar Oct. 25, 2024, 6:08 a.m. UTC | #5
On 10/23/24 21:56, Johan Hovold wrote:
> On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
>> On 10/10/24 20:32, Johan Hovold wrote:
>>> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
>>>> The series addresses the kernel warnings reported by Johan at [1] and are
>>>> are required to X1E cpufreq device tree changes [2] to land.
>>>>
>>>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
>>>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
>>>>
>>>> The following warnings remain unadressed:
>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>>
>>> Are there any plans for how to address these?
> 
>> Sorry missed replying to this. The error implies that duplicate
>> opps are reported by the SCP firmware and appear once during probe.
> 
> I only see it at boot, but it shows up four times here with the CRD:

https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/

As explained ^^, we see duplicates for max sustainable performance twice
for each domain.

> 
> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> 
>> This particular error can be fixed only by a firmware update and you
>> should be able to test it out soon on the CRD first.
> 
> Can you explain why this can only be fixed by a firmware update? Why
> can't we suppress these warnings as well, like we did for the other
> warnings related to the duplicate entries?
> 
> IIUC the firmware is not really broken, but rather describes a feature
> that Linux does not (yet) support, right?

We keep saying it's a buggy firmware because the SCP firmware reports
identical perf and power levels for the additional two opps and the
kernel has no way of treating it otherwise and we shouldn't suppress
them. Out of the two duplicate opps reported one is a artifact from how
Qualcomm usually show a transition to boost frequencies. The second opp
which you say is a feature should be treated as a boost opp i.e. one
core can run at max at a lower power when other cores are at idle but
we can start marking them as such once they start advertising their
correct power requirements. So I maintain that this is the best we
can do and need a firmware update for us to address anything more.

-Sibi


> 
> Johan
Dmitry Baryshkov Oct. 25, 2024, 6:14 a.m. UTC | #6
On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
> 
> 
> On 10/23/24 21:56, Johan Hovold wrote:
> > On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> > > On 10/10/24 20:32, Johan Hovold wrote:
> > > > On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> > > > > The series addresses the kernel warnings reported by Johan at [1] and are
> > > > > are required to X1E cpufreq device tree changes [2] to land.
> > > > > 
> > > > > [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> > > > > [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> > > > > 
> > > > > The following warnings remain unadressed:
> > > > > arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > > arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > 
> > > > Are there any plans for how to address these?
> > 
> > > Sorry missed replying to this. The error implies that duplicate
> > > opps are reported by the SCP firmware and appear once during probe.
> > 
> > I only see it at boot, but it shows up four times here with the CRD:
> 
> https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/
> 
> As explained ^^, we see duplicates for max sustainable performance twice
> for each domain.

If existing products were shipped with the firmware that lists single
freq twice, please filter the frequencies like qcom-cpufreq-hw does.

> 
> > 
> > [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > 
> > > This particular error can be fixed only by a firmware update and you
> > > should be able to test it out soon on the CRD first.
> > 
> > Can you explain why this can only be fixed by a firmware update? Why
> > can't we suppress these warnings as well, like we did for the other
> > warnings related to the duplicate entries?
> > 
> > IIUC the firmware is not really broken, but rather describes a feature
> > that Linux does not (yet) support, right?
> 
> We keep saying it's a buggy firmware because the SCP firmware reports
> identical perf and power levels for the additional two opps and the
> kernel has no way of treating it otherwise and we shouldn't suppress
> them. Out of the two duplicate opps reported one is a artifact from how
> Qualcomm usually show a transition to boost frequencies. The second opp
> which you say is a feature should be treated as a boost opp i.e. one
> core can run at max at a lower power when other cores are at idle but
> we can start marking them as such once they start advertising their
> correct power requirements. So I maintain that this is the best we
> can do and need a firmware update for us to address anything more.

Will existing shipping products get these firmware updates?
Sibi Sankar Oct. 25, 2024, 6:45 a.m. UTC | #7
On 10/25/24 11:44, Dmitry Baryshkov wrote:
> On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
>>
>>
>> On 10/23/24 21:56, Johan Hovold wrote:
>>> On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
>>>> On 10/10/24 20:32, Johan Hovold wrote:
>>>>> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
>>>>>> The series addresses the kernel warnings reported by Johan at [1] and are
>>>>>> are required to X1E cpufreq device tree changes [2] to land.
>>>>>>
>>>>>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
>>>>>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
>>>>>>
>>>>>> The following warnings remain unadressed:
>>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>>>>
>>>>> Are there any plans for how to address these?
>>>
>>>> Sorry missed replying to this. The error implies that duplicate
>>>> opps are reported by the SCP firmware and appear once during probe.
>>>
>>> I only see it at boot, but it shows up four times here with the CRD:
>>
>> https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/
>>
>> As explained ^^, we see duplicates for max sustainable performance twice
>> for each domain.
> 
> If existing products were shipped with the firmware that lists single
> freq twice, please filter the frequencies like qcom-cpufreq-hw does.

That was a qualcomm specific driver and hence we could do such
kind of filtering. This however is the generic scmi perf protocol
and it's not something we should ever consider introducing :/

> 
>>
>>>
>>> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>>
>>>> This particular error can be fixed only by a firmware update and you
>>>> should be able to test it out soon on the CRD first.
>>>
>>> Can you explain why this can only be fixed by a firmware update? Why
>>> can't we suppress these warnings as well, like we did for the other
>>> warnings related to the duplicate entries?
>>>
>>> IIUC the firmware is not really broken, but rather describes a feature
>>> that Linux does not (yet) support, right?
>>
>> We keep saying it's a buggy firmware because the SCP firmware reports
>> identical perf and power levels for the additional two opps and the
>> kernel has no way of treating it otherwise and we shouldn't suppress
>> them. Out of the two duplicate opps reported one is a artifact from how
>> Qualcomm usually show a transition to boost frequencies. The second opp
>> which you say is a feature should be treated as a boost opp i.e. one
>> core can run at max at a lower power when other cores are at idle but
>> we can start marking them as such once they start advertising their
>> correct power requirements. So I maintain that this is the best we
>> can do and need a firmware update for us to address anything more.
> 
> Will existing shipping products get these firmware updates?

They are sure to trickle out but I guess it's upto the oem
to decide if they do want to pick these up like some of the
other firmware updates being tested only on CRD. Not sure why
warnings duplicates should block cpufreq from landing for x1e
but if that's what the community wants I can drop reposting
this series!

-Sibi

>
Cristian Marussi Oct. 25, 2024, 8:28 a.m. UTC | #8
On Fri, Oct 25, 2024 at 12:15:59PM +0530, Sibi Sankar wrote:
> 
> 
> On 10/25/24 11:44, Dmitry Baryshkov wrote:
> > On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
> > > 

Hi,

> > > 
> > > On 10/23/24 21:56, Johan Hovold wrote:
> > > > On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> > > > > On 10/10/24 20:32, Johan Hovold wrote:
> > > > > > On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> > > > > > > The series addresses the kernel warnings reported by Johan at [1] and are
> > > > > > > are required to X1E cpufreq device tree changes [2] to land.
> > > > > > > 
> > > > > > > [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> > > > > > > [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> > > > > > > 
> > > > > > > The following warnings remain unadressed:
> > > > > > > arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > > > > arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > > > 
> > > > > > Are there any plans for how to address these?
> > > > 
> > > > > Sorry missed replying to this. The error implies that duplicate
> > > > > opps are reported by the SCP firmware and appear once during probe.
> > > > 
> > > > I only see it at boot, but it shows up four times here with the CRD:
> > > 
> > > https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/
> > > 
> > > As explained ^^, we see duplicates for max sustainable performance twice
> > > for each domain.
> > 
> > If existing products were shipped with the firmware that lists single
> > freq twice, please filter the frequencies like qcom-cpufreq-hw does.
> 
> That was a qualcomm specific driver and hence we could do such
> kind of filtering. This however is the generic scmi perf protocol
> and it's not something we should ever consider introducing :/
> 

+1

In the case of the other warnings, those were similarly related to
duplicates, but the warns themselves were genereated by the OPP
subsystem when trying to register a duplicate...it was indeed a bug
at the SCMI layer to try registering a well-known duplicate with
the OPP subsytem, so it was fixed in the SCMI stack...avoiding to
propagate it to the OPP layer...but the duplicate error condition
indeed still exist (since dependent on what the fw spits out) and they
are trapped at the SCMI level and those noisy warning are just meant
to trap this kind of firmware anomalies...

...IOW who would have ever spotted this thing and considered to fix the
firmware in future releases without the warnings :P ?

> > 
> > > 
> > > > 
> > > > [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > 
> > > > > This particular error can be fixed only by a firmware update and you
> > > > > should be able to test it out soon on the CRD first.
> > > > 
> > > > Can you explain why this can only be fixed by a firmware update? Why
> > > > can't we suppress these warnings as well, like we did for the other
> > > > warnings related to the duplicate entries?
> > > > 
> > > > IIUC the firmware is not really broken, but rather describes a feature
> > > > that Linux does not (yet) support, right?
> > > 
> > > We keep saying it's a buggy firmware because the SCP firmware reports
> > > identical perf and power levels for the additional two opps and the
> > > kernel has no way of treating it otherwise and we shouldn't suppress
> > > them. Out of the two duplicate opps reported one is a artifact from how
> > > Qualcomm usually show a transition to boost frequencies. The second opp
> > > which you say is a feature should be treated as a boost opp i.e. one
> > > core can run at max at a lower power when other cores are at idle but
> > > we can start marking them as such once they start advertising their
> > > correct power requirements. So I maintain that this is the best we
> > > can do and need a firmware update for us to address anything more.
> > 
> > Will existing shipping products get these firmware updates?
> 
> They are sure to trickle out but I guess it's upto the oem
> to decide if they do want to pick these up like some of the
> other firmware updates being tested only on CRD. Not sure why
> warnings duplicates should block cpufreq from landing for x1e
> but if that's what the community wants I can drop reposting
> this series!

Not sure indeed which is the problem with such warnings since they are
just doing their job...in general we tend not to disrupt operation even
when the firmware is buggy (if possible) BUT we definitely try to be
noisy about that to have firmware fixed (...not that fw guys seems so
scared usually about warnings though :P)

Anyway, I'm totally with Sibi here unless there is an impact at the
functional level...Sudeep may think otherwise of course ...

Thanks
Cristian
Dmitry Baryshkov Oct. 25, 2024, 10:11 a.m. UTC | #9
On Fri, 25 Oct 2024 at 09:46, Sibi Sankar <quic_sibis@quicinc.com> wrote:
>
>
>
> On 10/25/24 11:44, Dmitry Baryshkov wrote:
> > On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
> >>
> >>
> >> On 10/23/24 21:56, Johan Hovold wrote:
> >>> On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> >>>> On 10/10/24 20:32, Johan Hovold wrote:
> >>>>> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> >>>>>> The series addresses the kernel warnings reported by Johan at [1] and are
> >>>>>> are required to X1E cpufreq device tree changes [2] to land.
> >>>>>>
> >>>>>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> >>>>>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> >>>>>>
> >>>>>> The following warnings remain unadressed:
> >>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>>>>
> >>>>> Are there any plans for how to address these?
> >>>
> >>>> Sorry missed replying to this. The error implies that duplicate
> >>>> opps are reported by the SCP firmware and appear once during probe.
> >>>
> >>> I only see it at boot, but it shows up four times here with the CRD:
> >>
> >> https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/
> >>
> >> As explained ^^, we see duplicates for max sustainable performance twice
> >> for each domain.
> >
> > If existing products were shipped with the firmware that lists single
> > freq twice, please filter the frequencies like qcom-cpufreq-hw does.
>
> That was a qualcomm specific driver and hence we could do such
> kind of filtering. This however is the generic scmi perf protocol
> and it's not something we should ever consider introducing :/

This depends on the maintainer's discretion.

>
> >
> >>
> >>>
> >>> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>>
> >>>> This particular error can be fixed only by a firmware update and you
> >>>> should be able to test it out soon on the CRD first.
> >>>
> >>> Can you explain why this can only be fixed by a firmware update? Why
> >>> can't we suppress these warnings as well, like we did for the other
> >>> warnings related to the duplicate entries?
> >>>
> >>> IIUC the firmware is not really broken, but rather describes a feature
> >>> that Linux does not (yet) support, right?
> >>
> >> We keep saying it's a buggy firmware because the SCP firmware reports
> >> identical perf and power levels for the additional two opps and the
> >> kernel has no way of treating it otherwise and we shouldn't suppress
> >> them. Out of the two duplicate opps reported one is a artifact from how
> >> Qualcomm usually show a transition to boost frequencies. The second opp
> >> which you say is a feature should be treated as a boost opp i.e. one
> >> core can run at max at a lower power when other cores are at idle but
> >> we can start marking them as such once they start advertising their
> >> correct power requirements. So I maintain that this is the best we
> >> can do and need a firmware update for us to address anything more.
> >
> > Will existing shipping products get these firmware updates?
>
> They are sure to trickle out but I guess it's upto the oem
> to decide if they do want to pick these up like some of the
> other firmware updates being tested only on CRD. Not sure why
> warnings duplicates should block cpufreq from landing for x1e
> but if that's what the community wants I can drop reposting
> this series!

No, the community definitely wants to have cpufreq for X1E.
But at the same time some reviewers prefer to have a warning-free boot
if those warnings can't be really fixed. I don't have such a strict
position, but I'd prefer to see those messages at dev_info or dev_dbg
level.

Also, can we please have some plan or idea if somebody is actually
working with laptop vendors to get corresponding firmware updates onto
their hardware?
Cristian Marussi Oct. 25, 2024, 10:29 a.m. UTC | #10
On Fri, Oct 25, 2024 at 01:11:37PM +0300, Dmitry Baryshkov wrote:
> On Fri, 25 Oct 2024 at 09:46, Sibi Sankar <quic_sibis@quicinc.com> wrote:
> >
> >
> >
> > On 10/25/24 11:44, Dmitry Baryshkov wrote:
> > > On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
> > >>
> > >>
> > >> On 10/23/24 21:56, Johan Hovold wrote:
> > >>> On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> > >>>> On 10/10/24 20:32, Johan Hovold wrote:
> > >>>>> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> > >>>>>> The series addresses the kernel warnings reported by Johan at [1] and are
> > >>>>>> are required to X1E cpufreq device tree changes [2] to land.
> > >>>>>>
> > >>>>>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> > >>>>>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> > >>>>>>
> > >>>>>> The following warnings remain unadressed:
> > >>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > >>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > >>>>>
> > >>>>> Are there any plans for how to address these?
> > >>>
> > >>>> Sorry missed replying to this. The error implies that duplicate
> > >>>> opps are reported by the SCP firmware and appear once during probe.
> > >>>
> > >>> I only see it at boot, but it shows up four times here with the CRD:
> > >>
> > >> https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/
> > >>
> > >> As explained ^^, we see duplicates for max sustainable performance twice
> > >> for each domain.
> > >
> > > If existing products were shipped with the firmware that lists single
> > > freq twice, please filter the frequencies like qcom-cpufreq-hw does.
> >
> > That was a qualcomm specific driver and hence we could do such
> > kind of filtering. This however is the generic scmi perf protocol
> > and it's not something we should ever consider introducing :/
> 
> This depends on the maintainer's discretion.
> 
> >
> > >
> > >>
> > >>>
> > >>> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > >>> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > >>> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > >>> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > >>>
> > >>>> This particular error can be fixed only by a firmware update and you
> > >>>> should be able to test it out soon on the CRD first.
> > >>>
> > >>> Can you explain why this can only be fixed by a firmware update? Why
> > >>> can't we suppress these warnings as well, like we did for the other
> > >>> warnings related to the duplicate entries?
> > >>>
> > >>> IIUC the firmware is not really broken, but rather describes a feature
> > >>> that Linux does not (yet) support, right?
> > >>
> > >> We keep saying it's a buggy firmware because the SCP firmware reports
> > >> identical perf and power levels for the additional two opps and the
> > >> kernel has no way of treating it otherwise and we shouldn't suppress
> > >> them. Out of the two duplicate opps reported one is a artifact from how
> > >> Qualcomm usually show a transition to boost frequencies. The second opp
> > >> which you say is a feature should be treated as a boost opp i.e. one
> > >> core can run at max at a lower power when other cores are at idle but
> > >> we can start marking them as such once they start advertising their
> > >> correct power requirements. So I maintain that this is the best we
> > >> can do and need a firmware update for us to address anything more.
> > >
> > > Will existing shipping products get these firmware updates?
> >
> > They are sure to trickle out but I guess it's upto the oem
> > to decide if they do want to pick these up like some of the
> > other firmware updates being tested only on CRD. Not sure why
> > warnings duplicates should block cpufreq from landing for x1e
> > but if that's what the community wants I can drop reposting
> > this series!
> 
> No, the community definitely wants to have cpufreq for X1E.
> But at the same time some reviewers prefer to have a warning-free boot
> if those warnings can't be really fixed. I don't have such a strict
> position, but I'd prefer to see those messages at dev_info or dev_dbg
> level.

I think dev_info could be an option from the SCMI perspective (as per my
other mail), the important thing in these regards is to NOT go
completely silent against fw anomalies...to avoid the the risk of being
silently ignored .... I'll see what Sudeep thinks about...

Thanks,
Cristian
Dmitry Baryshkov Oct. 25, 2024, 11:37 a.m. UTC | #11
On Fri, 25 Oct 2024 at 13:29, Cristian Marussi <cristian.marussi@arm.com> wrote:
>
> On Fri, Oct 25, 2024 at 01:11:37PM +0300, Dmitry Baryshkov wrote:
> > On Fri, 25 Oct 2024 at 09:46, Sibi Sankar <quic_sibis@quicinc.com> wrote:
> > >
> > >
> > >
> > > On 10/25/24 11:44, Dmitry Baryshkov wrote:
> > > > On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
> > > >>
> > > >>
> > > >> On 10/23/24 21:56, Johan Hovold wrote:
> > > >>> On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> > > >>>> On 10/10/24 20:32, Johan Hovold wrote:
> > > >>>>> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> > > >>>>>> The series addresses the kernel warnings reported by Johan at [1] and are
> > > >>>>>> are required to X1E cpufreq device tree changes [2] to land.
> > > >>>>>>
> > > >>>>>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> > > >>>>>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> > > >>>>>>
> > > >>>>>> The following warnings remain unadressed:
> > > >>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>>>>
> > > >>>>> Are there any plans for how to address these?
> > > >>>
> > > >>>> Sorry missed replying to this. The error implies that duplicate
> > > >>>> opps are reported by the SCP firmware and appear once during probe.
> > > >>>
> > > >>> I only see it at boot, but it shows up four times here with the CRD:
> > > >>
> > > >> https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/
> > > >>
> > > >> As explained ^^, we see duplicates for max sustainable performance twice
> > > >> for each domain.
> > > >
> > > > If existing products were shipped with the firmware that lists single
> > > > freq twice, please filter the frequencies like qcom-cpufreq-hw does.
> > >
> > > That was a qualcomm specific driver and hence we could do such
> > > kind of filtering. This however is the generic scmi perf protocol
> > > and it's not something we should ever consider introducing :/
> >
> > This depends on the maintainer's discretion.
> >
> > >
> > > >
> > > >>
> > > >>>
> > > >>> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>>
> > > >>>> This particular error can be fixed only by a firmware update and you
> > > >>>> should be able to test it out soon on the CRD first.
> > > >>>
> > > >>> Can you explain why this can only be fixed by a firmware update? Why
> > > >>> can't we suppress these warnings as well, like we did for the other
> > > >>> warnings related to the duplicate entries?
> > > >>>
> > > >>> IIUC the firmware is not really broken, but rather describes a feature
> > > >>> that Linux does not (yet) support, right?
> > > >>
> > > >> We keep saying it's a buggy firmware because the SCP firmware reports
> > > >> identical perf and power levels for the additional two opps and the
> > > >> kernel has no way of treating it otherwise and we shouldn't suppress
> > > >> them. Out of the two duplicate opps reported one is a artifact from how
> > > >> Qualcomm usually show a transition to boost frequencies. The second opp
> > > >> which you say is a feature should be treated as a boost opp i.e. one
> > > >> core can run at max at a lower power when other cores are at idle but
> > > >> we can start marking them as such once they start advertising their
> > > >> correct power requirements. So I maintain that this is the best we
> > > >> can do and need a firmware update for us to address anything more.
> > > >
> > > > Will existing shipping products get these firmware updates?
> > >
> > > They are sure to trickle out but I guess it's upto the oem
> > > to decide if they do want to pick these up like some of the
> > > other firmware updates being tested only on CRD. Not sure why
> > > warnings duplicates should block cpufreq from landing for x1e
> > > but if that's what the community wants I can drop reposting
> > > this series!
> >
> > No, the community definitely wants to have cpufreq for X1E.
> > But at the same time some reviewers prefer to have a warning-free boot
> > if those warnings can't be really fixed. I don't have such a strict
> > position, but I'd prefer to see those messages at dev_info or dev_dbg
> > level.
>
> I think dev_info could be an option from the SCMI perspective (as per my
> other mail), the important thing in these regards is to NOT go
> completely silent against fw anomalies...to avoid the the risk of being
> silently ignored .... I'll see what Sudeep thinks about...

Absolutely. SCMI layer knows that such a problem might exist and knows
how to handle it, so it should bug the user in a polite way.
Johan Hovold Oct. 25, 2024, 1:23 p.m. UTC | #12
On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
> On 10/23/24 21:56, Johan Hovold wrote:
> > On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> >> On 10/10/24 20:32, Johan Hovold wrote:
> >>> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> >>>> The series addresses the kernel warnings reported by Johan at [1] and are
> >>>> are required to X1E cpufreq device tree changes [2] to land.
> >>>>
> >>>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> >>>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> >>>>
> >>>> The following warnings remain unadressed:
> >>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>>
> >>> Are there any plans for how to address these?

> >> This particular error can be fixed only by a firmware update and you
> >> should be able to test it out soon on the CRD first.
> > 
> > Can you explain why this can only be fixed by a firmware update? Why
> > can't we suppress these warnings as well, like we did for the other
> > warnings related to the duplicate entries?
> > 
> > IIUC the firmware is not really broken, but rather describes a feature
> > that Linux does not (yet) support, right?
> 
> We keep saying it's a buggy firmware because the SCP firmware reports
> identical perf and power levels for the additional two opps and the
> kernel has no way of treating it otherwise and we shouldn't suppress
> them. Out of the two duplicate opps reported one is a artifact from how
> Qualcomm usually show a transition to boost frequencies. The second opp
> which you say is a feature should be treated as a boost opp i.e. one
> core can run at max at a lower power when other cores are at idle but
> we can start marking them as such once they start advertising their
> correct power requirements. So I maintain that this is the best we
> can do and need a firmware update for us to address anything more.

Fair enough, but if you end up respinning the series, please say
something about this in the cover letter so that we know why those
warnings are (rightly) left in place.

Johan
Johan Hovold Oct. 25, 2024, 1:32 p.m. UTC | #13
On Fri, Oct 25, 2024 at 11:29:05AM +0100, Cristian Marussi wrote:

> > > >>> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16

> I think dev_info could be an option from the SCMI perspective (as per my
> other mail), the important thing in these regards is to NOT go
> completely silent against fw anomalies...to avoid the the risk of being
> silently ignored .... I'll see what Sudeep thinks about...

I agree.

But could the error handling be improved to look less scary for an end
user by saying something about duplicate entries being ignored instead
perhaps?

Printing something at info level and with a FW_BUG ("[Firmware Bug]: ")
prefix as was done here:

	https://lore.kernel.org/all/20230414084619.31524-1-johan+linaro@kernel.org/

should make it clear that this is not something for end users to worry
(too much) about.

Johan
Cristian Marussi Oct. 25, 2024, 1:48 p.m. UTC | #14
On Fri, Oct 25, 2024 at 03:32:53PM +0200, Johan Hovold wrote:
> On Fri, Oct 25, 2024 at 11:29:05AM +0100, Cristian Marussi wrote:
> 
> > > > >>> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > >>> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > >>> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > >>> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> 
> > I think dev_info could be an option from the SCMI perspective (as per my
> > other mail), the important thing in these regards is to NOT go
> > completely silent against fw anomalies...to avoid the the risk of being
> > silently ignored .... I'll see what Sudeep thinks about...
> 
> I agree.
> 
> But could the error handling be improved to look less scary for an end
> user by saying something about duplicate entries being ignored instead
> perhaps?
> 
> Printing something at info level and with a FW_BUG ("[Firmware Bug]: ")
> prefix as was done here:
> 
> 	https://lore.kernel.org/all/20230414084619.31524-1-johan+linaro@kernel.org/
> 
> should make it clear that this is not something for end users to worry
> (too much) about.

Sure...and thanks for the suggestion....I will cook something up around
this....

(I am probably too used to try to scary the FW guys that I forgot there
are also innocent bystanders like final users :P)

Thanks,
Cristian