diff mbox series

[V2,1/2] firmware: arm_scmi: Ensure that the message-id supports fastchannel

Message ID 20240904031324.2901114-2-quic_sibis@quicinc.com (mailing list archive)
State Superseded
Headers show
Series firmware: arm_scmi: Misc Fixes | expand

Commit Message

Sibi Sankar Sept. 4, 2024, 3:13 a.m. UTC
Currently the perf and powercap protocol relies on the protocol domain
attributes, which just ensures that one fastchannel per domain, before
instantiating fastchannels for all possible message-ids. Fix this by
ensuring that each message-id supports fastchannel before initialization.

Fixes: 6f9ea4dabd2d ("firmware: arm_scmi: Generalize the fast channel support")
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
---

v1:
* add missing MSG_SUPPORTS_FASTCHANNEL definition.

 drivers/firmware/arm_scmi/driver.c    | 9 +++++++++
 drivers/firmware/arm_scmi/protocols.h | 2 ++
 2 files changed, 11 insertions(+)

Comments

Johan Hovold Sept. 4, 2024, 7 a.m. UTC | #1
On Wed, Sep 04, 2024 at 08:43:23AM +0530, Sibi Sankar wrote:
> Currently the perf and powercap protocol relies on the protocol domain
> attributes, which just ensures that one fastchannel per domain, before
> instantiating fastchannels for all possible message-ids. Fix this by
> ensuring that each message-id supports fastchannel before initialization.

Please include the warnings that I reported seeing on x1e80100 and that
this patch suppresses to the commit message:

arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:0] - ret:-95. Using regular messaging.
arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:1] - ret:-95. Using regular messaging.
arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:2] - ret:-95. Using regular messaging.
 
> Fixes: 6f9ea4dabd2d ("firmware: arm_scmi: Generalize the fast channel support")

And add:

Reported-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/

(or use Closes: if you prefer).

> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
> ---
> 
> v1:
> * add missing MSG_SUPPORTS_FASTCHANNEL definition.

Unfortunately, this patch breaks resume from suspend on the x1e80100 crd:

        [   26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011]
        [   26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568)
        [   26.987142] cpufreq: cpufreq_online: ->get() failed

and then the machine hangs (mostly, I saw an nvme timeout message after a
while).

Make sure you test suspend as well as some of the warnings I reported
only show up during suspend.

Johan
Konrad Dybcio Sept. 4, 2024, 11:29 a.m. UTC | #2
On 4.09.2024 9:00 AM, Johan Hovold wrote:
> On Wed, Sep 04, 2024 at 08:43:23AM +0530, Sibi Sankar wrote:
>> Currently the perf and powercap protocol relies on the protocol domain
>> attributes, which just ensures that one fastchannel per domain, before
>> instantiating fastchannels for all possible message-ids. Fix this by
>> ensuring that each message-id supports fastchannel before initialization.
> 
> Please include the warnings that I reported seeing on x1e80100 and that
> this patch suppresses to the commit message:
> 
> arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:0] - ret:-95. Using regular messaging.
> arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:1] - ret:-95. Using regular messaging.
> arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:2] - ret:-95. Using regular messaging.
>  
>> Fixes: 6f9ea4dabd2d ("firmware: arm_scmi: Generalize the fast channel support")
> 
> And add:
> 
> Reported-by: Johan Hovold <johan+linaro@kernel.org>
> Link: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> 
> (or use Closes: if you prefer).
> 
>> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
>> ---
>>
>> v1:
>> * add missing MSG_SUPPORTS_FASTCHANNEL definition.
> 
> Unfortunately, this patch breaks resume from suspend on the x1e80100 crd:
> 
>         [   26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011]
>         [   26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568)
>         [   26.987142] cpufreq: cpufreq_online: ->get() failed
> 
> and then the machine hangs (mostly, I saw an nvme timeout message after a
> while).
> 
> Make sure you test suspend as well as some of the warnings I reported
> only show up during suspend.

Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but
the firmware fails to inform us about it through BIT(0) in attrs..

Konrad
Sudeep Holla Sept. 4, 2024, 12:38 p.m. UTC | #3
On Wed, Sep 04, 2024 at 01:29:29PM +0200, Konrad Dybcio wrote:
> On 4.09.2024 9:00 AM, Johan Hovold wrote:

[...]

> >
> > Unfortunately, this patch breaks resume from suspend on the x1e80100 crd:
> >
> >         [   26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011]
> >         [   26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568)
> >         [   26.987142] cpufreq: cpufreq_online: ->get() failed
> >
> > and then the machine hangs (mostly, I saw an nvme timeout message after a
> > while).
> >
> > Make sure you test suspend as well as some of the warnings I reported
> > only show up during suspend.
>
> Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but
> the firmware fails to inform us about it through BIT(0) in attrs..
>

Just trying to understand things better here. So the firmware expects OSPM
to just use FC only for PERF_LEVEL_GET and hence doesn't implement the
default/normal channel for PERF_LEVEL_GET(I assume it returns error ?)
but fails to set the attribute indicating FC is available for the domain.

I am not sure if that is stupid choice or there is some cost benefit in
not implementing PERF_LEVEL_GET via normal channel if that is a fact. I
am very much interested to know the reason either way especially if it
is latter.

--
Regards,
Sudeep
Cristian Marussi Sept. 4, 2024, 2:20 p.m. UTC | #4
On Wed, Sep 04, 2024 at 01:38:55PM +0100, Sudeep Holla wrote:
> On Wed, Sep 04, 2024 at 01:29:29PM +0200, Konrad Dybcio wrote:
> > On 4.09.2024 9:00 AM, Johan Hovold wrote:
> 
> [...]
> 
> > >
> > > Unfortunately, this patch breaks resume from suspend on the x1e80100 crd:
> > >
> > >         [   26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011]
> > >         [   26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568)
> > >         [   26.987142] cpufreq: cpufreq_online: ->get() failed
> > >
> > > and then the machine hangs (mostly, I saw an nvme timeout message after a
> > > while).
> > >
> > > Make sure you test suspend as well as some of the warnings I reported
> > > only show up during suspend.
> >
> > Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but
> > the firmware fails to inform us about it through BIT(0) in attrs..
> >
> 
> Just trying to understand things better here. So the firmware expects OSPM
> to just use FC only for PERF_LEVEL_GET and hence doesn't implement the
> default/normal channel for PERF_LEVEL_GET(I assume it returns error ?)
> but fails to set the attribute indicating FC is available for the domain.
> 

Is not that FCs are optional BUT PERF_LEVEL_GET standard messages is
support is mandatory by the spec anyway ?

Thanks,
Cristian
Konrad Dybcio Sept. 5, 2024, 12:54 p.m. UTC | #5
On 4.09.2024 4:20 PM, Cristian Marussi wrote:
> On Wed, Sep 04, 2024 at 01:38:55PM +0100, Sudeep Holla wrote:
>> On Wed, Sep 04, 2024 at 01:29:29PM +0200, Konrad Dybcio wrote:
>>> On 4.09.2024 9:00 AM, Johan Hovold wrote:
>>
>> [...]
>>
>>>>
>>>> Unfortunately, this patch breaks resume from suspend on the x1e80100 crd:
>>>>
>>>>         [   26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011]
>>>>         [   26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568)
>>>>         [   26.987142] cpufreq: cpufreq_online: ->get() failed
>>>>
>>>> and then the machine hangs (mostly, I saw an nvme timeout message after a
>>>> while).
>>>>
>>>> Make sure you test suspend as well as some of the warnings I reported
>>>> only show up during suspend.
>>>
>>> Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but
>>> the firmware fails to inform us about it through BIT(0) in attrs..
>>>
>>
>> Just trying to understand things better here. So the firmware expects OSPM
>> to just use FC only for PERF_LEVEL_GET and hence doesn't implement the
>> default/normal channel for PERF_LEVEL_GET(I assume it returns error ?)
>> but fails to set the attribute indicating FC is available for the domain.
>>
> 
> Is not that FCs are optional BUT PERF_LEVEL_GET standard messages is
> support is mandatory by the spec anyway ?

So doing a bit of poking I think it's that FC is not marked as supported,
but we need to read out the frequency from the .get_addr.. which is only
populated if we go through fastchannel_init

Konrad
Sibi Sankar Oct. 7, 2024, 6:46 a.m. UTC | #6
On 9/5/24 18:24, Konrad Dybcio wrote:
> On 4.09.2024 4:20 PM, Cristian Marussi wrote:
>> On Wed, Sep 04, 2024 at 01:38:55PM +0100, Sudeep Holla wrote:
>>> On Wed, Sep 04, 2024 at 01:29:29PM +0200, Konrad Dybcio wrote:
>>>> On 4.09.2024 9:00 AM, Johan Hovold wrote:
>>>
>>> [...]
>>>
>>>>>
>>>>> Unfortunately, this patch breaks resume from suspend on the x1e80100 crd:
>>>>>
>>>>>          [   26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011]
>>>>>          [   26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568)
>>>>>          [   26.987142] cpufreq: cpufreq_online: ->get() failed
>>>>>
>>>>> and then the machine hangs (mostly, I saw an nvme timeout message after a
>>>>> while).
>>>>>
>>>>> Make sure you test suspend as well as some of the warnings I reported
>>>>> only show up during suspend.
>>>>
>>>> Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but
>>>> the firmware fails to inform us about it through BIT(0) in attrs..
>>>>
>>>
>>> Just trying to understand things better here. So the firmware expects OSPM
>>> to just use FC only for PERF_LEVEL_GET and hence doesn't implement the
>>> default/normal channel for PERF_LEVEL_GET(I assume it returns error ?)
>>> but fails to set the attribute indicating FC is available for the domain.
>>>
>>
>> Is not that FCs are optional BUT PERF_LEVEL_GET standard messages is
>> support is mandatory by the spec anyway ?
> 
> So doing a bit of poking I think it's that FC is not marked as supported,
> but we need to read out the frequency from the .get_addr.. which is only
> populated if we go through fastchannel_init

On further debug it was found that the SCP was servicing the request
but mailbox had the interrupt disabled during suspend which caused the
timeout. I just re-spun the series wit hte fix. So yeah PERF_LEVEL_GET 
is expectedto work without any problems. There is no dependence on EC as
Konrad speculated. Just a straight forward case of interrupt being
disabled in the resume path.

-Sibi

> 
> Konrad
diff mbox series

Patch

diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index 69c15135371c..95d039152f79 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -56,6 +56,9 @@  static atomic_t transfer_last_id;
 
 static struct dentry *scmi_top_dentry;
 
+static int scmi_protocol_msg_check(const struct scmi_protocol_handle *ph,
+				   u32 message_id, u32 *attributes);
+
 /**
  * struct scmi_xfers_info - Structure to manage transfer information
  *
@@ -1841,6 +1844,7 @@  scmi_common_fastchannel_init(const struct scmi_protocol_handle *ph,
 	int ret;
 	u32 flags;
 	u64 phys_addr;
+	u32 attributes;
 	u8 size;
 	void __iomem *addr;
 	struct scmi_xfer *t;
@@ -1849,6 +1853,11 @@  scmi_common_fastchannel_init(const struct scmi_protocol_handle *ph,
 	struct scmi_msg_resp_desc_fc *resp;
 	const struct scmi_protocol_instance *pi = ph_to_pi(ph);
 
+	/* Check if the MSG_ID supports fastchannel */
+	ret = scmi_protocol_msg_check(ph, message_id, &attributes);
+	if (!ret && !MSG_SUPPORTS_FASTCHANNEL(attributes))
+		return;
+
 	if (!p_addr) {
 		ret = -EINVAL;
 		goto err_out;
diff --git a/drivers/firmware/arm_scmi/protocols.h b/drivers/firmware/arm_scmi/protocols.h
index 8e95f53bd7b7..c1c5260f71c9 100644
--- a/drivers/firmware/arm_scmi/protocols.h
+++ b/drivers/firmware/arm_scmi/protocols.h
@@ -31,6 +31,8 @@ 
 
 #define SCMI_PROTOCOL_VENDOR_BASE	0x80
 
+#define MSG_SUPPORTS_FASTCHANNEL(x)	((x) & BIT(0))
+
 enum scmi_common_cmd {
 	PROTOCOL_VERSION = 0x0,
 	PROTOCOL_ATTRIBUTES = 0x1,