pmdomain: arm: Fix debugfs node creation failure

Message ID 20240703110741.2668800-1-quic_sibis@quicinc.com (mailing list archive)
State Not Applicable

Commit Message

Sibi Sankar July 3, 2024, 11:07 a.m. UTC
The domain attributes returned by the perf protocol can end up
reporting identical names across domains, resulting in debugfs
node creation failure. Fix this duplication by appending the
domain-id to the domain name.

Logs:
debugfs: Directory 'NCC' with parent 'pm_genpd' already present!
debugfs: Directory 'NCC' with parent 'pm_genpd' already present!

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
---
 drivers/pmdomain/arm/scmi_perf_domain.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
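
The collision described in the commit message can be reproduced in user space
with a toy model. This is only a sketch: dir_registry, register_dir() and
register_dir_with_id() are hypothetical names that stand in for a flat
directory such as 'pm_genpd', they are not kernel APIs, and NAME_LEN is an
assumed buffer size.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_DIRS 8
#define NAME_LEN 64 /* assumed buffer size for this sketch */

/* Toy stand-in for a flat directory such as debugfs 'pm_genpd'. */
struct dir_registry {
	char names[MAX_DIRS][NAME_LEN];
	int count;
};

/* Returns 0 on success, -EEXIST if the name is taken, -ENOSPC if full. */
int register_dir(struct dir_registry *reg, const char *name)
{
	for (int i = 0; i < reg->count; i++)
		if (strcmp(reg->names[i], name) == 0)
			return -EEXIST;
	if (reg->count == MAX_DIRS)
		return -ENOSPC;
	snprintf(reg->names[reg->count++], NAME_LEN, "%s", name);
	return 0;
}

/* Register "<name>-<id>", the same composition the patch uses. */
int register_dir_with_id(struct dir_registry *reg, const char *name,
			 unsigned int id)
{
	char unique[NAME_LEN];

	snprintf(unique, sizeof(unique), "%s-%u", name, id);
	return register_dir(reg, unique);
}
```

Registering "NCC" twice through register_dir() fails the second time, while
register_dir_with_id() with ids 0 and 1 succeeds for both domains.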

Comments

Sudeep Holla July 4, 2024, 10:32 a.m. UTC | #1
On Wed, Jul 03, 2024 at 04:37:41PM +0530, Sibi Sankar wrote:
> The domain attributes returned by the perf protocol can end up
> reporting identical names across domains, resulting in debugfs
> node creation failure. Fix this duplication by appending the
> domain-id to the domain name.
> 
> Logs:
> debugfs: Directory 'NCC' with parent 'pm_genpd' already present!
> debugfs: Directory 'NCC' with parent 'pm_genpd' already present!
>

If there are 2 perf domains for a device or group of devices, there must
be something unique about each of these domains. Why can't the firmware
specify the uniqueness or the difference via the name?

The example above seems firmware is being just lazy to update it. Also
for the user/developer/debugger, the unique name might be more useful
than just this number.

So please use the name(we must now have extended name if 16bytes are less)
to provide unique names. Please stop working around such silly firmware
bugs like this, it just makes using debugfs for anything useful harder.

--
Regards,
Sudeep
Sibi Sankar July 5, 2024, 3:46 a.m. UTC | #2
On 7/4/24 16:02, Sudeep Holla wrote:
> On Wed, Jul 03, 2024 at 04:37:41PM +0530, Sibi Sankar wrote:
>> The domain attributes returned by the perf protocol can end up
>> reporting identical names across domains, resulting in debugfs
>> node creation failure. Fix this duplication by appending the
>> domain-id to the domain name.

Hey Sudeep,

Thanks for taking time to review the patch :)

>>
>> Logs:
>> debugfs: Directory 'NCC' with parent 'pm_genpd' already present!
>> debugfs: Directory 'NCC' with parent 'pm_genpd' already present!
>>
> 
> If there are 2 perf domains for a device or group of devices, there must
> be something unique about each of these domains. Why can't the firmware
> specify the uniqueness or the difference via the name?
> 
> The example above seems firmware is being just lazy to update it. Also
> for the user/developer/debugger, the unique name might be more useful
> than just this number.
> 
> So please use the name(we must now have extended name if 16bytes are less)
> to provide unique names. Please stop working around such silly firmware
> bugs like this, it just makes using debugfs for anything useful harder.

This is just meant to address firmware that are already out in the wild.
That being said I don't necessarily agree with the patch either since
it's penalizing firmware that actually uses a proper name by appending
something inherently less useful to it. Since, the using of an unique
domain name isn't required by the spec, the need for it goes under the 
radar for vendors. Mandating it might be the right thing to do since
the kernel seems inherently expect that.

-Sibi

> 
> --
> Regards,
> Sudeep
Sudeep Holla July 5, 2024, 1:04 p.m. UTC | #3
On Fri, Jul 05, 2024 at 09:16:29AM +0530, Sibi Sankar wrote:
>
> On 7/4/24 16:02, Sudeep Holla wrote:
> >
> > If there are 2 perf domains for a device or group of devices, there must
> > be something unique about each of these domains. Why can't the firmware
> > specify the uniqueness or the difference via the name?
> >
> > The example above seems firmware is being just lazy to update it. Also
> > for the user/developer/debugger, the unique name might be more useful
> > than just this number.
> >
> > So please use the name(we must now have extended name if 16bytes are less)
> > to provide unique names. Please stop working around such silly firmware
> > bugs like this, it just makes using debugfs for anything useful harder.
>
> This is just meant to address firmware that are already out in the wild.
> That being said I don't necessarily agree with the patch either since
> it's penalizing firmware that actually uses a proper name by appending
> something inherently less useful to it. Since, the using of an unique
> domain name isn't required by the spec, the need for it goes under the radar
> for vendors. Mandating it might be the right thing to do since
> the kernel seems inherently expect that.
>

Well I would love if spec authors can agree and mandate this. But this is
one of those things I can't argue as I don't necessarily agree with the
argument. There are 2 distinct/unique domains but firmware authors ran out
of unique names for them or just can't be bothered to care about it.

They can't run out of characters as well in above examples, firmware can
add some useless domain ID in the name if they can't be bothered or creative.

So I must admit I can't be bothered as well with that honestly.
--
Regards,
Sudeep
Sibi Sankar July 8, 2024, 10:07 a.m. UTC | #4
On 7/5/24 18:34, Sudeep Holla wrote:
> On Fri, Jul 05, 2024 at 09:16:29AM +0530, Sibi Sankar wrote:
>>
>> On 7/4/24 16:02, Sudeep Holla wrote:
>>>
>>> If there are 2 perf domains for a device or group of devices, there must
>>> be something unique about each of these domains. Why can't the firmware
>>> specify the uniqueness or the difference via the name?
>>>
>>> The example above seems firmware is being just lazy to update it. Also
>>> for the user/developer/debugger, the unique name might be more useful
>>> than just this number.
>>>
>>> So please use the name(we must now have extended name if 16bytes are less)
>>> to provide unique names. Please stop working around such silly firmware
>>> bugs like this, it just makes using debugfs for anything useful harder.
>>
>> This is just meant to address firmware that are already out in the wild.
>> That being said I don't necessarily agree with the patch either since
>> it's penalizing firmware that actually uses a proper name by appending
>> something inherently less useful to it. Since, the using of an unique
>> domain name isn't required by the spec, the need for it goes under the radar
>> for vendors. Mandating it might be the right thing to do since
>> the kernel seems inherently expect that.
>>
> 
> Well I would love if spec authors can agree and mandate this. But this is
> one of those things I can't argue as I don't necessarily agree with the
> argument. There are 2 distinct/unique domains but firmware authors ran out
> of unique names for them or just can't be bothered to care about it.
> 
> They can't run out of characters as well in above examples, firmware can
> add some useless domain ID in the name if they can't be bothered or creative.
> 
> So I must admit I can't be bothered as well with that honestly.

Okay, I guess the conclusion is that if the firmware vendors
don't care enough to provide unique names, they get to live
without those debugfs nodes.

Do we really want to register/expose scmi perf power-domains used by
the CPU nodes? Given that scmi-cpufreq doesn't consume these power
domains and can be voted upon by another consumer, wouldn't this cause
a disconnect?

-Sibi

> --
> Regards,
> Sudeep
Peng Fan Aug. 7, 2024, 12:51 a.m. UTC | #5
> Subject: Re: [PATCH] pmdomain: arm: Fix debugfs node creation failure
> 
> 
> 
> On 7/5/24 18:34, Sudeep Holla wrote:
> > On Fri, Jul 05, 2024 at 09:16:29AM +0530, Sibi Sankar wrote:
> >>
> >> On 7/4/24 16:02, Sudeep Holla wrote:
> >>>
> >>> If there are 2 perf domains for a device or group of devices, there
> >>> must be something unique about each of these domains. Why can't the
> >>> firmware specify the uniqueness or the difference via the name?
> >>>
> >>> The example above seems firmware is being just lazy to update it.
> >>> Also for the user/developer/debugger, the unique name might be more
> >>> useful than just this number.
> >>>
> >>> So please use the name(we must now have extended name if 16bytes are
> >>> less) to provide unique names. Please stop working around such silly
> >>> firmware bugs like this, it just makes using debugfs for anything
> >>> useful harder.
> >>
> >> This is just meant to address firmware that are already out in the wild.
> >> That being said I don't necessarily agree with the patch either since
> >> it's penalizing firmware that actually uses a proper name by
> >> appending something inherently less useful to it. Since, the using of
> >> an unique domain name isn't required by the spec, the need for it
> >> goes under the radar for vendors. Mandating it might be the right
> >> thing to do since the kernel seems inherently expect that.
> >>
> >
> > Well I would love if spec authors can agree and mandate this. But this
> > is one of those things I can't argue as I don't necessarily agree with
> > the argument. There are 2 distinct/unique domains but firmware authors
> > ran out of unique names for them or just can't be bothered to care
> > about it.
> >
> > They can't run out of characters as well in above examples, firmware
> > can add some useless domain ID in the name if they can't be bothered
> > or creative.

As Sibi pointed out, the spec places no restriction on the name.

Linux chose to use genpd to support perf domains, but it now turns out
that Linux is forcing firmware to use different names for power and perf
domains. This will not convince firmware developers.

For example, the firmware might look like this:

struct pd_perf_domain {
	char *name;			/* one name shared by pd and perf */
	int (*power_hook)(int id);
	int (*perf_hook)(int level);
};

From a firmware developer's point of view, the name is shared between pd
and perf. The fix should be on the Linux side.

> >
> > So I must admit I can't be bothered as well with that honestly.
> 
> Okay, I guess the conclusion is that if the firmware vendors don't care
> enough to provide unique names, they get to live without those
> debugfs nodes.
> 
> Do we really want to register/expose scmi perf power-domains used by
> the CPU nodes? 

How about not registering debugfs for perf?

> Given that scmi-cpufreq doesn't consume these power
> domains and can be voted upon by another consumer, wouldn't this
> cause a disconnect?

You might be also interested in [1], which is also scmi cpufreq related.
[1]https://lore.kernel.org/all/20240729070325.2065286-1-peng.fan@oss.nxp.com/

Regards,
Peng.
> 
> -Sibi
> 
> > --
> > Regards,
> > Sudeep
Ulf Hansson Aug. 14, 2024, 12:38 p.m. UTC | #6
+ Peng

On Fri, 5 Jul 2024 at 15:04, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Fri, Jul 05, 2024 at 09:16:29AM +0530, Sibi Sankar wrote:
> >
> > On 7/4/24 16:02, Sudeep Holla wrote:
> > >
> > > If there are 2 perf domains for a device or group of devices, there must
> > > be something unique about each of these domains. Why can't the firmware
> > > specify the uniqueness or the difference via the name?
> > >
> > > The example above seems firmware is being just lazy to update it. Also
> > > for the user/developer/debugger, the unique name might be more useful
> > > than just this number.
> > >
> > > So please use the name(we must now have extended name if 16bytes are less)
> > > to provide unique names. Please stop working around such silly firmware
> > > bugs like this, it just makes using debugfs for anything useful harder.
> >
> > This is just meant to address firmware that are already out in the wild.
> > That being said I don't necessarily agree with the patch either since
> > it's penalizing firmware that actually uses a proper name by appending
> > something inherently less useful to it. Since, the using of an unique
> > domain name isn't required by the spec, the need for it goes under the radar
> > for vendors. Mandating it might be the right thing to do since
> > the kernel seems inherently expect that.
> >
>
> Well I would love if spec authors can agree and mandate this. But this is
> one of those things I can't argue as I don't necessarily agree with the
> argument. There are 2 distinct/unique domains but firmware authors ran out
> of unique names for them or just can't be bothered to care about it.
>
> They can't run out of characters as well in above examples, firmware can
> add some useless domain ID in the name if they can't be bothered or creative.
>
> So I must admit I can't be bothered as well with that honestly.

Sudeep, while I understand your point and I agree with it, it's really
a simple fix that $subject patch is proposing. As the unique name
isn't mandated by the SCMI spec, it looks to me that we should make a
fix for it on the Linux side.

I have therefore decided to queue up $subject patch for fixes. Please
let me know if you have any other proposals/objections moving forward.

Kind regards
Uffe
Sudeep Holla Aug. 14, 2024, 1:31 p.m. UTC | #7
On Wed, Aug 14, 2024 at 02:38:24PM +0200, Ulf Hansson wrote:
>
> Sudeep, while I understand your point and I agree with it, it's really
> a simple fix that $subject patch is proposing. As the unique name
> isn't mandated by the SCMI spec, it looks to me that we should make a
> fix for it on the Linux side.
>

Yes, I did come to the conclusion that this is inevitable but hadn't
thought much on the exact solution. This email and you merging the original
patch made me think a bit quickly now 
Ulf Hansson Aug. 15, 2024, 10:46 a.m. UTC | #8
On Wed, 14 Aug 2024 at 15:31, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Wed, Aug 14, 2024 at 02:38:24PM +0200, Ulf Hansson wrote:
> >
> > Sudeep, while I understand your point and I agree with it, it's really
> > a simple fix that $subject patch is proposing. As the unique name
> > isn't mandated by the SCMI spec, it looks to me that we should make a
> > fix for it on the Linux side.
> >
>
> Yes, I did come to the conclusion that this is inevitable but hadn't
> thought much on the exact solution. This email and you merging the original
> patch made me think a bit quickly now 
Sudeep Holla Aug. 15, 2024, 1:46 p.m. UTC | #9
On Thu, Aug 15, 2024 at 12:46:15PM +0200, Ulf Hansson wrote:
> On Wed, 14 Aug 2024 at 15:31, Sudeep Holla <sudeep.holla@arm.com> wrote:
> >
> > On Wed, Aug 14, 2024 at 02:38:24PM +0200, Ulf Hansson wrote:
> > >
> > > Sudeep, while I understand your point and I agree with it, it's really
> > > a simple fix that $subject patch is proposing. As the unique name
> > > isn't mandated by the SCMI spec, it looks to me that we should make a
> > > fix for it on the Linux side.
> > >
> >
> > Yes, I did come to the conclusion that this is inevitable but hadn't
> > thought much on the exact solution. This email and you merging the original
> > patch made me think a bit quickly now 
Johan Hovold Sept. 4, 2024, 7:21 a.m. UTC | #10
On Wed, Jul 03, 2024 at 04:37:41PM +0530, Sibi Sankar wrote:
> The domain attributes returned by the perf protocol can end up
> reporting identical names across domains, resulting in debugfs
> node creation failure. Fix this duplication by appending the
> domain-id to the domain name.
> 
> Logs:
> debugfs: Directory 'NCC' with parent 'pm_genpd' already present!
> debugfs: Directory 'NCC' with parent 'pm_genpd' already present!
> 
> Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")

Please include:

Reported-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/

> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>

I know that this patch is being reworked, but please note that the
following warnings that I reported are seen also with this patch:

[    9.119117] arm-scmi firmware:scmi: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
[    9.129146] arm-scmi firmware:scmi: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
[    9.160328] arm-scmi firmware:scmi: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
[    9.175229] arm-scmi firmware:scmi: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16

which seems to suggest that the approach taken by this patch is not
necessarily the right one.

Can you please also comment on why this is an issue on x1e80100 in the
commit message?

Johan

Patch

diff --git a/drivers/pmdomain/arm/scmi_perf_domain.c b/drivers/pmdomain/arm/scmi_perf_domain.c
index d7ef46ccd9b8..0af5dc941349 100644
--- a/drivers/pmdomain/arm/scmi_perf_domain.c
+++ b/drivers/pmdomain/arm/scmi_perf_domain.c
@@ -18,6 +18,7 @@  struct scmi_perf_domain {
 	const struct scmi_perf_proto_ops *perf_ops;
 	const struct scmi_protocol_handle *ph;
 	const struct scmi_perf_domain_info *info;
+	char domain_name[SCMI_MAX_STR_SIZE];
 	u32 domain_id;
 };
 
@@ -123,7 +124,12 @@  static int scmi_perf_domain_probe(struct scmi_device *sdev)
 		scmi_pd->domain_id = i;
 		scmi_pd->perf_ops = perf_ops;
 		scmi_pd->ph = ph;
-		scmi_pd->genpd.name = scmi_pd->info->name;
+
+		/* Domain attributes can report identical names across domains */
+		snprintf(scmi_pd->domain_name, sizeof(scmi_pd->domain_name), "%s-%d",
+			 scmi_pd->info->name, scmi_pd->domain_id);
+
+		scmi_pd->genpd.name = scmi_pd->domain_name;
 		scmi_pd->genpd.flags = GENPD_FLAG_ALWAYS_ON |
 				       GENPD_FLAG_OPP_TABLE_FW;
 		scmi_pd->genpd.set_performance_state = scmi_pd_set_perf_state;
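
A note on the snprintf() call in the hunk above, as a user-space sketch rather
than a proposed change. domain_id is declared u32, so "%u" matches it more
closely than "%d". Also, if the firmware name nearly fills the buffer, the
"-<id>" suffix can be truncated and two domains could still end up with the
same name. build_unique_name() is a hypothetical helper, and NAME_BUF_SIZE is
an assumed stand-in for SCMI_MAX_STR_SIZE.

```c
#include <stdbool.h>
#include <stdio.h>

#define NAME_BUF_SIZE 64 /* assumption: stand-in for SCMI_MAX_STR_SIZE */

/*
 * Compose "<name>-<id>" into buf. Returns true if the whole string fit,
 * false if snprintf() had to truncate, in which case the id suffix may be
 * cut short or lost and the result is no longer guaranteed to be unique.
 */
bool build_unique_name(char *buf, size_t size, const char *name,
		       unsigned int id)
{
	int n = snprintf(buf, size, "%s-%u", name, id);

	return n >= 0 && (size_t)n < size;
}
```

A caller could check the return value and warn, or fall back to a name made
of the id alone, when the composed name does not fit.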