Message ID | 20240904031324.2901114-2-quic_sibis@quicinc.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | firmware: arm_scmi: Misc Fixes | expand |
On Wed, Sep 04, 2024 at 08:43:23AM +0530, Sibi Sankar wrote: > Currently the perf and powercap protocol relies on the protocol domain > attributes, which just ensures that one fastchannel per domain, before > instantiating fastchannels for all possible message-ids. Fix this by > ensuring that each message-id supports fastchannel before initialization. Please include the warnings that I reported seeing on x1e80100 and that this patch suppresses to the commit message: arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:0] - ret:-95. Using regular messaging. arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:1] - ret:-95. Using regular messaging. arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:2] - ret:-95. Using regular messaging. > Fixes: 6f9ea4dabd2d ("firmware: arm_scmi: Generalize the fast channel support") And add: Reported-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/ (or use Closes: if you prefer). > Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com> > --- > > v1: > * add missing MSG_SUPPORTS_FASTCHANNEL definition. Unfortunately, this patch breaks resume from suspend on the x1e80100 crd: [ 26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011] [ 26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568) [ 26.987142] cpufreq: cpufreq_online: ->get() failed and then the machine hangs (mostly, I saw an nvme timeout message after a while). Make sure you test suspend as well as some of the warnings I reported only show up during suspend. Johan
On 4.09.2024 9:00 AM, Johan Hovold wrote: > On Wed, Sep 04, 2024 at 08:43:23AM +0530, Sibi Sankar wrote: >> Currently the perf and powercap protocol relies on the protocol domain >> attributes, which just ensures that one fastchannel per domain, before >> instantiating fastchannels for all possible message-ids. Fix this by >> ensuring that each message-id supports fastchannel before initialization. > > Please include the warnings that I reported seeing on x1e80100 and that > this patch suppresses to the commit message: > > arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:0] - ret:-95. Using regular messaging. > arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:1] - ret:-95. Using regular messaging. > arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:2] - ret:-95. Using regular messaging. > >> Fixes: 6f9ea4dabd2d ("firmware: arm_scmi: Generalize the fast channel support") > > And add: > > Reported-by: Johan Hovold <johan+linaro@kernel.org> > Link: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/ > > (or use Closes: if you prefer). > >> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com> >> --- >> >> v1: >> * add missing MSG_SUPPORTS_FASTCHANNEL definition. > > Unfortunately, this patch breaks resume from suspend on the x1e80100 crd: > > [ 26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011] > [ 26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568) > [ 26.987142] cpufreq: cpufreq_online: ->get() failed > > and then the machine hangs (mostly, I saw an nvme timeout message after a > while). > > Make sure you test suspend as well as some of the warnings I reported > only show up during suspend. Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but the firmware fails to inform us about it through BIT(0) in attrs.. Konrad
On Wed, Sep 04, 2024 at 01:29:29PM +0200, Konrad Dybcio wrote: > On 4.09.2024 9:00 AM, Johan Hovold wrote: [...] > > > > Unfortunately, this patch breaks resume from suspend on the x1e80100 crd: > > > > [ 26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011] > > [ 26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568) > > [ 26.987142] cpufreq: cpufreq_online: ->get() failed > > > > and then the machine hangs (mostly, I saw an nvme timeout message after a > > while). > > > > Make sure you test suspend as well as some of the warnings I reported > > only show up during suspend. > > Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but > the firmware fails to inform us about it through BIT(0) in attrs.. > Just trying to understand things better here. So the firmware expects OSPM to just use FC only for PERF_LEVEL_GET and hence doesn't implement the default/normal channel for PERF_LEVEL_GET(I assume it returns error ?) but fails to set the attribute indicating FC is available for the domain. I am not sure if that is stupid choice or there is some cost benefit in not implementing PERF_LEVEL_GET via normal channel if that is a fact. I am very much interested to know the reason either way especially if it is latter. -- Regards, Sudeep
On Wed, Sep 04, 2024 at 01:38:55PM +0100, Sudeep Holla wrote: > On Wed, Sep 04, 2024 at 01:29:29PM +0200, Konrad Dybcio wrote: > > On 4.09.2024 9:00 AM, Johan Hovold wrote: > > [...] > > > > > > > Unfortunately, this patch breaks resume from suspend on the x1e80100 crd: > > > > > > [ 26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011] > > > [ 26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568) > > > [ 26.987142] cpufreq: cpufreq_online: ->get() failed > > > > > > and then the machine hangs (mostly, I saw an nvme timeout message after a > > > while). > > > > > > Make sure you test suspend as well as some of the warnings I reported > > > only show up during suspend. > > > > Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but > > the firmware fails to inform us about it through BIT(0) in attrs.. > > > > Just trying to understand things better here. So the firmware expects OSPM > to just use FC only for PERF_LEVEL_GET and hence doesn't implement the > default/normal channel for PERF_LEVEL_GET(I assume it returns error ?) > but fails to set the attribute indicating FC is available for the domain. > Is not that FCs are optional BUT PERF_LEVEL_GET standard messages is support is mandatory by the spec anyway ? Thanks, Cristian
On 4.09.2024 4:20 PM, Cristian Marussi wrote: > On Wed, Sep 04, 2024 at 01:38:55PM +0100, Sudeep Holla wrote: >> On Wed, Sep 04, 2024 at 01:29:29PM +0200, Konrad Dybcio wrote: >>> On 4.09.2024 9:00 AM, Johan Hovold wrote: >> >> [...] >> >>>> >>>> Unfortunately, this patch breaks resume from suspend on the x1e80100 crd: >>>> >>>> [ 26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011] >>>> [ 26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568) >>>> [ 26.987142] cpufreq: cpufreq_online: ->get() failed >>>> >>>> and then the machine hangs (mostly, I saw an nvme timeout message after a >>>> while). >>>> >>>> Make sure you test suspend as well as some of the warnings I reported >>>> only show up during suspend. >>> >>> Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but >>> the firmware fails to inform us about it through BIT(0) in attrs.. >>> >> >> Just trying to understand things better here. So the firmware expects OSPM >> to just use FC only for PERF_LEVEL_GET and hence doesn't implement the >> default/normal channel for PERF_LEVEL_GET(I assume it returns error ?) >> but fails to set the attribute indicating FC is available for the domain. >> > > Is not that FCs are optional BUT PERF_LEVEL_GET standard messages is > support is mandatory by the spec anyway ? So doing a bit of poking I think it's that FC is not marked as supported, but we need to read out the frequency from the .get_addr.. which is only populated if we go through fastchannel_init Konrad
On 9/5/24 18:24, Konrad Dybcio wrote: > On 4.09.2024 4:20 PM, Cristian Marussi wrote: >> On Wed, Sep 04, 2024 at 01:38:55PM +0100, Sudeep Holla wrote: >>> On Wed, Sep 04, 2024 at 01:29:29PM +0200, Konrad Dybcio wrote: >>>> On 4.09.2024 9:00 AM, Johan Hovold wrote: >>> >>> [...] >>> >>>>> >>>>> Unfortunately, this patch breaks resume from suspend on the x1e80100 crd: >>>>> >>>>> [ 26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011] >>>>> [ 26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568) >>>>> [ 26.987142] cpufreq: cpufreq_online: ->get() failed >>>>> >>>>> and then the machine hangs (mostly, I saw an nvme timeout message after a >>>>> while). >>>>> >>>>> Make sure you test suspend as well as some of the warnings I reported >>>>> only show up during suspend. >>>> >>>> Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but >>>> the firmware fails to inform us about it through BIT(0) in attrs.. >>>> >>> >>> Just trying to understand things better here. So the firmware expects OSPM >>> to just use FC only for PERF_LEVEL_GET and hence doesn't implement the >>> default/normal channel for PERF_LEVEL_GET(I assume it returns error ?) >>> but fails to set the attribute indicating FC is available for the domain. >>> >> >> Is not that FCs are optional BUT PERF_LEVEL_GET standard messages is >> support is mandatory by the spec anyway ? > > So doing a bit of poking I think it's that FC is not marked as supported, > but we need to read out the frequency from the .get_addr.. which is only > populated if we go through fastchannel_init On further debug it was found that the SCP was servicing the request but mailbox had the interrupt disabled during suspend which caused the timeout. I just re-spun the series wit hte fix. So yeah PERF_LEVEL_GET is expectedto work without any problems. There is no dependence on EC as Konrad speculated. Just a straight forward case of interrupt being disabled in the resume path. -Sibi > > Konrad
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c index 69c15135371c..95d039152f79 100644 --- a/drivers/firmware/arm_scmi/driver.c +++ b/drivers/firmware/arm_scmi/driver.c @@ -56,6 +56,9 @@ static atomic_t transfer_last_id; static struct dentry *scmi_top_dentry; +static int scmi_protocol_msg_check(const struct scmi_protocol_handle *ph, + u32 message_id, u32 *attributes); + /** * struct scmi_xfers_info - Structure to manage transfer information * @@ -1841,6 +1844,7 @@ scmi_common_fastchannel_init(const struct scmi_protocol_handle *ph, int ret; u32 flags; u64 phys_addr; + u32 attributes; u8 size; void __iomem *addr; struct scmi_xfer *t; @@ -1849,6 +1853,11 @@ scmi_common_fastchannel_init(const struct scmi_protocol_handle *ph, struct scmi_msg_resp_desc_fc *resp; const struct scmi_protocol_instance *pi = ph_to_pi(ph); + /* Check if the MSG_ID supports fastchannel */ + ret = scmi_protocol_msg_check(ph, message_id, &attributes); + if (!ret && !MSG_SUPPORTS_FASTCHANNEL(attributes)) + return; + if (!p_addr) { ret = -EINVAL; goto err_out; diff --git a/drivers/firmware/arm_scmi/protocols.h b/drivers/firmware/arm_scmi/protocols.h index 8e95f53bd7b7..c1c5260f71c9 100644 --- a/drivers/firmware/arm_scmi/protocols.h +++ b/drivers/firmware/arm_scmi/protocols.h @@ -31,6 +31,8 @@ #define SCMI_PROTOCOL_VENDOR_BASE 0x80 +#define MSG_SUPPORTS_FASTCHANNEL(x) ((x) & BIT(0)) + enum scmi_common_cmd { PROTOCOL_VERSION = 0x0, PROTOCOL_ATTRIBUTES = 0x1,
Currently the perf and powercap protocol relies on the protocol domain attributes, which just ensures that one fastchannel per domain, before instantiating fastchannels for all possible message-ids. Fix this by ensuring that each message-id supports fastchannel before initialization. Fixes: 6f9ea4dabd2d ("firmware: arm_scmi: Generalize the fast channel support") Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com> --- v1: * add missing MSG_SUPPORTS_FASTCHANNEL definition. drivers/firmware/arm_scmi/driver.c | 9 +++++++++ drivers/firmware/arm_scmi/protocols.h | 2 ++ 2 files changed, 11 insertions(+)